Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Structs

A struct is a way to define a type that consists of one or more other types pasted together. Here's a typical struct definition:

   1 struct string {
   2     int length;
   3     char *data;
   4 };

This defines a new type struct string that can be used anywhere you would use a simple type like int or float. When you declare a variable with type struct string, the compiler allocates enough space to hold both an int and a char * (8 bytes on a typical 32-bit machine). You can get at the individual components using the . operator, like this:

   1 struct string {
   2     int length;
   3     char *data;
   4 };
   5 
   6 int
   7 main(int argc, char **argv)
   8 {
   9     struct string s;
  10 
  11     s.length = 4;
  12     s.data = "this string is a lot longer than you think";
  13 
  14     puts(s.data);
  15 
  16     return 0;
  17 }

struct_example.c

Variables of type struct can be assigned to, passed into functions, returned from functions, and tested for equality, just like any other type. Each such operation is applied componentwise; for example, s1 = s2; is equivalent to s1.length = s2.length; s1.data = s2.data; and s1 == s2 is equivalent to s1.length == s2.length && s1.data = s2.data.

These operations are not used as often as you might think: typically, instead of copying around entire structures, C programs pass around pointers, as is done with arrays. Pointers to structs are common enough in C that a special syntax is provided for dereferencing them.¹ Suppose we have:

   1     struct string s;            /* a struct */
   2     struct string *sp;          /* a pointer to a struct */
   3 
   4     s.length = 4;
   5     s.data = "another overly long string";
   6 
   7     sp = &s;

We can then refer to elements of the struct string that sp points to (i.e. s) in either of two ways:

   1     puts((*sp).data);
   2     puts(sp->data);

The second is more common, since it involves typing fewer parentheses. It is an error to write *sp.data in this case; since . binds tighter than *, the compiler will attempt to evaluate sp.data first and generate an error, since sp doesn't have a data field.

Pointers to structs are commonly used in defining AbstractDataTypes, since it is possible to declare that a function returns e.g. a struct string * without specifying the components of a struct string. (All pointers to structs in C have the same size and structure, so the compiler doesn't need to know the components to pass around the address.) Hiding the components discourages code that shouldn't look at them from doing so, and can be used, for example, to enforce consistency between fields.

For example, suppose we wanted to define a struct string * type that held counted strings that could only be accessed through a restricted interface that prevented (for example) the user from changing the string or its length. We might create a file myString.h that contained the declarations:

   1 /* make a struct string * that holds a copy of s */
   2 struct string *makeString(const char *s);
   3 
   4 /* destroy a struct string * */
   5 void destroyString(struct string *);
   6 
   7 /* return the length of a struct string * */
   8 int stringLength(struct string *);
   9 
  10 /* return the character at position index in the struct string * */
  11 /* or returns -1 if index is out of bounds */
  12 int stringCharAt(struct string *s, int index);

myString.h

and then the actual implementation in myString.c would be the only place where the components of a struct string were defined:

   1 #include <stdlib.h>
   2 #include <string.h>
   3 
   4 #include "myString.h"
   5 
   6 struct string {
   7     int length;
   8     char *data;
   9 };
  10 
  11 struct string *
  12 makeString(const char *s)
  13 {
  14     struct string *s2;
  15 
  16     s2 = malloc(sizeof(struct string));
  17     if(s2 == 0) return 0;
  18 
  19     s2->length = strlen(s);
  20 
  21     s2->data = malloc(s2->length);
  22     if(s2->data == 0) {
  23 	free(s2);
  24 	return 0;
  25     }
  26 
  27     strncpy(s2->data, s, s2->length);
  28 
  29     return s2;
  30 }
  31 
  32 void
  33 destroyString(struct string *s)
  34 {
  35     free(s->data);
  36     free(s);
  37 }
  38 
  39 int
  40 stringLength(struct string *s)
  41 {
  42     return s->length;
  43 }
  44 
  45 int
  46 stringCharAt(struct string *s, int index)
  47 {
  48     if(index < 0 || index >= s->length) {
  49 	return -1;
  50     } else {
  51 	return s->data[index];
  52     }
  53 }

myString.c

In practice, we would probably go even further and replace all the struct string * types with a new name declared with typedef.

2. Unions

A union is just like a struct, except that instead of allocating space to store all the components, the compiler only allocates space to store the largest one, and makes all the components refer to the same address. This can be used to save space if you know that only one of several components will be meaningful for a particular object. An example might be a type representing an object in a LISP-like language like Scheme:

   1 struct lispObject {
   2     int type;           /* type code */
   3     union {
   4         int     intVal;
   5         double  floatVal;
   6         char *  stringVal;
   7         struct {
   8             struct lispObject *car;
   9             struct lispObject *cdr;
  10         } consVal;
  11     } u;
  12 };

Now if you wanted to make a struct lispObject that held an integer value, you might write

   1     lispObject o;
   2 
   3     o.type = TYPE_INT;
   4     o.u.intVal = 27;

where TYPE_INT had presumably been defined somewhere. Note that nothing then prevents you from writing

   1     x = 2.7 * o.u.floatVal;

but the effects will be strange, since it's likely that the bit pattern representing 27 as an int represents something very different as a double. Avoiding such mistakes is your responsibility, which is why most uses of union occur inside larger structs that contain enough information to figure out which variant of the union applies.

3. Bit fields

It is possible to specify the exact number of bits taken up by a member of a struct of integer type. This is seldom useful, but may in principle let you pack more information in less space, e.g.:

   1 struct color {
   2     unsigned int red   : 2;
   3     unsigned int green : 2;
   4     unsigned int blue  : 2;
   5     unsigned int alpha : 2;
   6 };

defines a struct that (probably) occupies only one byte, and supplies four 2-bit fields, each of which can hold values in the range 0-3.

CategoryProgrammingNotes

Arguably, this is a bug in the design of the language: if the compiler knows that sp has type struct string *, there is no particular reason why it can't interpret sp.length as sp->length. But it doesn't do this, so you will have to remember to write sp->length instead. (1)