[FrontPage] [TitleIndex] [WordIndex

Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Generic types

The first rule of programming is that you should never write the same code twice. Suppose that you happen to have lying around a dictionary type whose keys are ints and whose values are strings. Tomorrow you realize that what you really want is a dictionary type whose keys are strings and whose values are ints, or one whose keys are ints but whose values are stacks. If you have N different types that may appear as keys or values, can you avoid writing N2 different dictionary implementations to get every possible combination?

Many languages provide special mechanisms to support generic types, ones for which part of the type is not specified. It's as if you could declare an array in C to be an array of some type to be declared later, and then write functions that operate on any such array without knowing what the missing type is going to be (templates in C++ are an example of such a facility). Unfortunately, C does not provide generic types. But by aggressive use of function pointers and void *, it is possible to fake them.

2. Generic dictionary: interface

Below is an example of an interface to a generic dictionary type for storing maps from constant values to constant values. The void * pointers are used to avoid having to declare exactly what kinds of keys and values the dictionary will contain.

   1 /* Set dict[key] = value. */
   2 /* Both key and value are copied internally. */
   3 /* If data is the null pointer, remove dict[key]. */
   4 void dict_set(Dict d, const void *key, const void *value);
   5 
   6 /* Return dict[key], or null if dict[key] has not been set. */
   7 const void *dict_get(Dict d, const void *key);

We'll also need a constructor and destructor, but we'll get to those in a moment. First we need to think about what dict_set and dict_get are supposed to do, and how we might possibly be able to implement them. Suppose we want to build a dictionary with strings as both keys and values. Internally, this might be represented as some sort of hash table or tree. Suppose it's a hash table. Now, given some void *key, we'd like to be able to compute its hash value. But we don't know what type key points to, and if we guess wrong we are likely to end up with segmentation faults or worse. So we need some way to register a hash function for our keys, whatever type they might really be behind that void *.

Similarly, we will want to be able to compare keys for equality (since not all keys that hash together will necessarily be the same), and we may want to be able to copy keys and values so that the data inside the dictionary is not modified if somebody changes a value passed in from the outside. So we need a fair bit of information about keys and values. We'll organize all of this information in a struct made up of function pointers. (This includes a few extra components that came up while writing the implementation.)

   1 /* Provides operations for working with keys or values */
   2 struct dict_contents_operations {
   3     /* hash function */
   4     unsigned long (*hash)(const void *datum, void *arg);
   5 
   6     /* returns nonzero if *datum1 == *datum2 */
   7     int (*equal)(const void *datum1, const void *datum2, void *arg);
   8 
   9     /* make a copy of datum that will survive changes to original */
  10     void *(*copy)(const void *datum, void *arg);
  11 
  12     /* free a copy */
  13     void (*delete)(void *datum, void *arg);
  14 
  15     /* extra argument, to allow further specialization */
  16     void *arg;
  17 };

We could write a similar but smaller struct for values, but to save a little bit of effort in the short run we'll use the same struct for both keys and values. We can now write a constructor for our generic dictionary that consumes two such structs that provide operations for working on keys and values, respectively:

   1 /* create a new dictionary with given key and value operations */
   2 /* Note: value_ops.hash and value_ops.equal are not used. */
   3 Dict dict_create(struct dict_contents_operations key_ops,
   4                  struct dict_contents_operations value_ops);

So now to create a dict, we just need to fill in two dict_contents_operations structures. For convenience, it might be nice if dict.c provided some preloaded structures for common types like ints and strings. We can also use the arg field in struct dict_contents_operations to make the keys and values themselves be parameterized types, for example a type of byte-vectors of given length.

We can declare these various convenience structures in in dict.h as

   1 /* Some predefined dict_contents_operations structures */
   2 
   3 /* 
   4  * DictIntOps supports int's that have been cast to (void *), e.g.:
   5  *     d = dict_create(DictIntOps, DictIntOps);
   6  *     dict_set(d, (void *) 1, (void * 2));
   7  *     x = (int) dict_get(d, (void * 1));
   8  */
   9 struct dict_contents_operations DictIntOps;
  10 
  11 /*
  12  * Supports null-terminated strings, e.g.:
  13  *     d = dict_create(DictStringOps, DictStringOps);
  14  *     dict_set(d, "foo", "bar");
  15  *     s = dict_get(d, "foo");
  16  * Note: no casts are needed since C automatically converts
  17  * between (void *) and other pointer types.
  18  */
  19 struct dict_contents_operations DictStringOps;
  20 
  21 /*
  22  * Supports fixed-size blocks of memory, e.g.:
  23  *     int x = 1;
  24  *     int y = 2;
  25  *     d = dict_create(dict_mem_ops(sizeof(int)), dict_mem_ops(sizeof(int));
  26  *     dict_set(d, &x, &y);
  27  *     printf("%d", *dict_get(d, &x);
  28  */
  29 struct dict_contents_operations dict_mem_ops(int size);

We'll define the operations in DictIntOps to expect ints cast directly to void *, the operations in DictStringOps to expect char * cast to void *, and the operations in dict_mem_ops(size) will expect void * arguments pointing to blocks of the given size. There is a subtle difference between a dictionary using DictIntOps and dict_mem_ops(sizeof(int)); in the former case, keys and values are the ints themselves (after being case), which in the latter, keys and values are pointers to ints.

Implementations of these structures can be found in dict.c in the Appendix.

To make a dictionary that maps strings to ints, we just call:

   1     d = dict_create(DictStringOps, DictIntOps);

and then we can do things like:

   1     dict_set(d, "foo", (void *) 2);
   2     v = (int) dict_get(d, "foo');

If we find ourselves working with an integer-valued dictionary a lot, we might want to define a few macros to avoid having to type casts all the time.

3. Generic dictionary: implementation

To implement our generic dictionary, we just take our favorite non-generic hash table, and replace any calls to fixed hash functions, copier, free, etc. with calls to elements of the appropriate structure. The result is given in the Appendix.

4. Appendix: complete code

4.0.1. dict.h

   1 typedef struct dict *Dict;
   2 
   3 /* Provides operations for working with keys or values */
   4 struct dict_contents_operations {
   5     /* hash function */
   6     unsigned long (*hash)(const void *datum, void *arg);
   7 
   8     /* returns nonzero if *datum1 == *datum2 */
   9     int (*equal)(const void *datum1, const void *datum2, void *arg);
  10 
  11     /* make a copy of datum that will survive changes to original */
  12     void *(*copy)(const void *datum, void *arg);
  13 
  14     /* free a copy */
  15     void (*delete)(void *datum, void *arg);
  16 
  17     /* extra argument, to allow further specialization */
  18     void *arg;
  19 };
  20 
  21 /* create a new dictionary with given key and value operations */
  22 /* Note: value_ops.hash and value_ops.equal are not used. */
  23 Dict dict_create(struct dict_contents_operations key_ops,
  24                  struct dict_contents_operations value_ops);
  25 
  26 /* free a dictionary and all the space it contains */
  27 /* This will call the appropriate delete function for all keys and */
  28 /* values. */
  29 void dict_destroy(Dict d);
  30 
  31 /* Set dict[key] = value. */
  32 /* Both key and value are copied internally. */
  33 /* If data is the null pointer, remove dict[key]. */
  34 void dict_set(Dict d, const void *key, const void *value);
  35 
  36 /* Return dict[key], or null if dict[key] has not been set. */
  37 const void *dict_get(Dict d, const void *key);
  38 
  39 /* Some predefined dict_contents_operations structures */
  40 
  41 /* 
  42  * DictIntOps supports int's that have been cast to (void *), e.g.:
  43  *     d = dict_create(DictIntOps, DictIntOps);
  44  *     dict_set(d, (void *) 1, (void * 2));
  45  *     x = (int) dict_get(d, (void * 1));
  46  */
  47 struct dict_contents_operations DictIntOps;
  48 
  49 /*
  50  * Supports null-terminated strings, e.g.:
  51  *     d = dict_create(DictStringOps, DictStringOps);
  52  *     dict_set(d, "foo", "bar");
  53  *     s = dict_get(d, "foo");
  54  * Note: no casts are needed since C automatically converts
  55  * between (void *) and other pointer types.
  56  */
  57 struct dict_contents_operations DictStringOps;
  58 
  59 /*
  60  * Supports fixed-size blocks of memory, e.g.:
  61  *     int x = 1;
  62  *     int y = 2;
  63  *     d = dict_create(dict_mem_ops(sizeof(int)), dict_mem_ops(sizeof(int));
  64  *     dict_set(d, &x, &y);
  65  *     printf("%d", *dict_get(d, &x);
  66  */
  67 struct dict_contents_operations dict_mem_ops(int size);

4.0.2. dict.c

   1 #include <stdlib.h>
   2 #include <string.h>
   3 #include "dict.h"
   4 
   5 struct dict_elt {
   6     unsigned long hash;           /* full hash of key */
   7     void *key;
   8     void *value;
   9     struct dict_elt *next;
  10 };
  11 
  12 struct dict {
  13     int table_size;          /* number of slots in table */
  14     int num_elements;        /* number of elements */
  15     struct dict_elt **table; /* linked list heads */
  16     /* these save arguments passed at creation */
  17     struct dict_contents_operations key_ops;
  18     struct dict_contents_operations value_ops;
  19 };
  20 
  21 #define INITIAL_TABLE_SIZE (16)
  22 #define TABLE_SIZE_MULTIPLIER (2)
  23 #define TABLE_GROW_DENSITY (1)
  24 
  25 Dict 
  26 dict_create(struct dict_contents_operations key_ops,
  27             struct dict_contents_operations value_ops)
  28 {
  29     Dict d;
  30     int i;
  31 
  32     d = malloc(sizeof(*d));
  33     if(d == 0) return 0;
  34 
  35     d->table_size = INITIAL_TABLE_SIZE;
  36     d->num_elements = 0;
  37     d->key_ops = key_ops;
  38     d->value_ops = value_ops;
  39     d->table = malloc(sizeof(*(d->table)) * d->table_size);
  40     if(d->table == 0) {
  41         free(d);
  42         return 0;
  43     }
  44 
  45     for(i = 0; i < d->table_size; i++) d->table[i] = 0;
  46 
  47     return d;
  48 }
  49 
  50 void
  51 dict_destroy(Dict d)
  52 {
  53     int i;
  54     struct dict_elt *e;
  55     struct dict_elt *next;
  56 
  57     for(i = 0; i < d->table_size; i++) {
  58         for(e = d->table[i]; e != 0; e = next) {
  59             next = e->next;
  60             d->key_ops.delete(e->key, d->key_ops.arg);
  61             d->value_ops.delete(e->value, d->value_ops.arg);
  62             free(e);
  63         }
  64     }
  65     free(d->table);
  66     free(d);
  67 }
  68 
  69 /* return pointer to element with given key, if any */
  70 static struct dict_elt *
  71 dict_fetch(Dict d, const void *key)
  72 {
  73     unsigned long h;
  74     int i;
  75     struct dict_elt *e;
  76 
  77     h = d->key_ops.hash(key, d->key_ops.arg);
  78     i = h % d->table_size;
  79     for(e = d->table[i]; e != 0; e = e->next) {
  80         if(e->hash == h && d->key_ops.equal(key, e->key, d->key_ops.arg)) {
  81             /* found it */
  82             return e;
  83         }
  84     }
  85     /* didn't find it */
  86     return 0;
  87 }
  88 
  89 /* increase the size of the dictionary, rehashing all table elements */
  90 static void
  91 dict_grow(Dict d)
  92 {
  93     struct dict_elt **old_table;
  94     int old_size;
  95     int i;
  96     struct dict_elt *e;
  97     struct dict_elt *next;
  98     int new_pos;
  99 
 100     /* save old table */
 101     old_table = d->table;
 102     old_size = d->table_size;
 103 
 104     /* make new table */
 105     d->table_size *= TABLE_SIZE_MULTIPLIER;
 106     d->table = malloc(sizeof(*(d->table)) * d->table_size);
 107     if(d->table == 0) {
 108         /* put the old one back */
 109         d->table = old_table;
 110         d->table_size = old_size;
 111         return;
 112     }
 113     /* else */
 114     /* clear new table */
 115     for(i = 0; i < d->table_size; i++) d->table[i] = 0;
 116 
 117     /* move all elements of old table to new table */
 118     for(i = 0; i < old_size; i++) {
 119         for(e = old_table[i]; e != 0; e = next) {
 120             next = e->next;
 121             /* find the position in the new table */
 122             new_pos = e->hash % d->table_size;
 123             e->next = d->table[new_pos];
 124             d->table[new_pos] = e;
 125         }
 126     }
 127 
 128     /* don't need this any more */
 129     free(old_table);
 130 }
 131 
 132 void
 133 dict_set(Dict d, const void *key, const void *value)
 134 {
 135     int table_position;
 136     struct dict_elt *e;
 137 
 138     e = dict_fetch(d, key);
 139     if(e != 0) {
 140         /* change existing setting */
 141         d->value_ops.delete(e->value, d->value_ops.arg);
 142         e->value = value ? d->value_ops.copy(value, d->value_ops.arg) : 0;
 143     } else {
 144         /* create new element */
 145         e = malloc(sizeof(*e));
 146         if(e == 0) abort();
 147 
 148         e->hash = d->key_ops.hash(key, d->key_ops.arg);
 149         e->key = d->key_ops.copy(key, d->key_ops.arg);
 150         e->value = value ? d->value_ops.copy(value, d->value_ops.arg) : 0;
 151 
 152         /* link it in */
 153         table_position = e->hash % d->table_size;
 154         e->next = d->table[table_position];
 155         d->table[table_position] = e;
 156 
 157         d->num_elements++;
 158 
 159         if(d->num_elements > d->table_size * TABLE_GROW_DENSITY) {
 160             /* grow and rehash */
 161             dict_grow(d);
 162         }
 163     }
 164 }
 165 
 166 const void *
 167 dict_get(Dict d, const void *key)
 168 {
 169     struct dict_elt *e;
 170 
 171     e = dict_fetch(d, key);
 172     if(e != 0) {
 173         return e->value;
 174     } else {
 175         return 0;
 176     }
 177 }
 178 
 179 /* int functions */
 180 /* We assume that int can be cast to void * and back without damage */
 181 static unsigned long dict_int_hash(const void *x, void *arg) { return (int) x; }
 182 static int dict_int_equal(const void *x, const void *y, void *arg) 
 183 { 
 184     return ((int) x) == ((int) y);
 185 }
 186 static void *dict_int_copy(const void *x, void *arg) { return (void *) x; }
 187 static void dict_int_delete(void *x, void *arg) { ; }
 188 
 189 struct dict_contents_operations DictIntOps = {
 190     dict_int_hash,
 191     dict_int_equal,
 192     dict_int_copy,
 193     dict_int_delete,
 194     0
 195 };
 196 
 197 /* common utilities for string and mem */
 198 static unsigned long hash_mem(const unsigned char *s, int len)
 199 {
 200     unsigned long h;
 201     int i;
 202 
 203     h = 0;
 204     for(i = 0; i < len; i++) {
 205         h = (h << 13) + (h >> 7) + h + s[i];
 206     }
 207     return h;
 208 }
 209 
 210 static void dict_delete_free(void *x, void *arg) { free(x); }
 211 
 212 /* string functions */
 213 static unsigned long dict_string_hash(const void *x, void *arg)
 214 {
 215     return hash_mem(x, strlen(x));
 216 }
 217 
 218 static int dict_string_equal(const void *x, const void *y, void *arg)
 219 {
 220     return !strcmp((const char *) x, (const char *) y);
 221 }    
 222 
 223 static void *dict_string_copy(const void *x, void *arg)
 224 {
 225     const char *s;
 226     char *s2;
 227 
 228     s = x;
 229     s2 = malloc(sizeof(*s2) * (strlen(s)+1));
 230     strcpy(s2, s);
 231     return s2;
 232 }
 233 
 234 struct dict_contents_operations DictStringOps = {
 235     dict_string_hash,
 236     dict_string_equal,
 237     dict_string_copy,
 238     dict_delete_free,
 239     0
 240 };
 241 
 242 /* mem functions */
 243 static unsigned long dict_mem_hash(const void *x, void *arg)
 244 {
 245     return hash_mem(x, (int) arg);
 246 }
 247 
 248 static int dict_mem_equal(const void *x, const void *y, void *arg)
 249 {
 250     return !memcmp(x, y, (size_t) arg);
 251 }    
 252 
 253 static void *dict_mem_copy(const void *x, void *arg)
 254 {
 255     void *x2;
 256 
 257     x2 = malloc((size_t) arg);
 258     memcpy(x2, x, (size_t) arg);
 259     return x2;
 260 }
 261 
 262 struct dict_contents_operations
 263 dict_mem_ops(int len)
 264 {
 265     struct dict_contents_operations mem_ops;
 266 
 267     mem_ops.hash = dict_mem_hash;
 268     mem_ops.equal = dict_mem_equal;
 269     mem_ops.copy = dict_mem_copy;
 270     mem_ops.delete = dict_delete_free;
 271     mem_ops.arg = (void *) len;
 272 
 273     return mem_ops;
 274 }

4.0.3. test-dict.c

   1 #include <stdio.h>
   2 #include <setjmp.h>
   3 #include <signal.h>
   4 #include <unistd.h>
   5 #include <string.h>
   6 
   7 #include "tester.h"
   8 #include "dict.h"
   9 
  10 /* avoid retyping casts */
  11 #define Iget(d, key) ((int) dict_get((d), (const void *) (key)))
  12 #define Iset(d, key, value) \
  13     (dict_set((d), (const void *) (key), (const void *) (value)))
  14 
  15 #define STRESS_TEST_ITERATIONS (100000)
  16 
  17 int
  18 main(int argc, char **argv)
  19 {
  20     Dict s2i;
  21     Dict i2i;
  22     Dict s2s;
  23     Dict m2m;
  24     int i;
  25     short s;
  26     int ivalue;
  27 
  28     tester_init();
  29 
  30     TRY { s2i = dict_create(DictStringOps, DictIntOps); } ENDTRY;
  31     TRY { i2i = dict_create(DictIntOps, DictIntOps); } ENDTRY;
  32     TRY { s2s = dict_create(DictStringOps, DictStringOps); } ENDTRY;
  33 
  34     /* should be empty initially */
  35     TEST(Iget(s2i, "foo"), 0);
  36     TEST(Iget(i2i, 12), 0);
  37     TEST_ASSERT(dict_get(s2s, "foo") == 0);
  38 
  39     /* initial set */
  40     TRY { Iset(s2i, "foo", 1); } ENDTRY;
  41     TRY { Iset(i2i, 12, 2); } ENDTRY;
  42     TRY { dict_set(s2s, "foo", "bar"); } ENDTRY;
  43     TEST(Iget(s2i, "foo"), 1);
  44     TEST(Iget(i2i, 12), 2);
  45     TEST_ASSERT(!strcmp(dict_get(s2s, "foo"), "bar"));
  46 
  47     /* set again */
  48     TRY { Iset(s2i, "foo", 3); } ENDTRY;
  49     TRY { Iset(i2i, 12, 4); } ENDTRY;
  50     TRY { dict_set(s2s, "foo", "baz"); } ENDTRY;
  51     TEST(Iget(s2i, "foo"), 3);
  52     TEST(Iget(i2i, 12), 4);
  53     TEST_ASSERT(!strcmp(dict_get(s2s, "foo"), "baz"));
  54 
  55     /* now delete */
  56     TRY { dict_set(s2i, "foo", 0); } ENDTRY;
  57     TRY { dict_set(i2i, (const void *) 12, 0); } ENDTRY;
  58     TRY { dict_set(s2s, "foo", 0); } ENDTRY;
  59     TEST(Iget(s2i, "foo"), 0);
  60     TEST(Iget(i2i, 12), 0);
  61     TEST_ASSERT(dict_get(s2s, "foo") == 0);
  62 
  63     /* stress test on i2i */
  64     TRY {
  65         for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
  66             Iset(i2i, i, i*37);
  67         }
  68     } ENDTRY;
  69 
  70     for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
  71         TRY { ivalue = Iget(i2i, i); } ENDTRY;
  72         TEST(ivalue, i*37);
  73         if(ivalue != i*37) break;
  74     }
  75 
  76     /* stress test on m2m */
  77     /* We'll use blocks to hold int keys and short values */
  78     TRY { 
  79         m2m = dict_create(dict_mem_ops(sizeof(int)),
  80                           dict_mem_ops(sizeof(short)));
  81     } ENDTRY;
  82     TRY {
  83         for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
  84             s = i % 37;
  85             dict_set(m2m, &i, &s);
  86         }
  87     } ENDTRY;
  88 
  89     for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
  90         TRY { ivalue = *((short *) dict_get(m2m, &i)); } ENDTRY;
  91         TEST(ivalue, i % 37);
  92         if(ivalue != i % 37) break;
  93     }
  94 
  95     TRY { dict_destroy(s2i); } ENDTRY;
  96     TRY { dict_destroy(i2i); } ENDTRY;
  97     TRY { dict_destroy(s2s); } ENDTRY;
  98     TRY { dict_destroy(m2m); } ENDTRY;
  99 
 100     tester_report(stdout, argv[0]);
 101     return tester_result();
 102 }

{{{$ make test gcc -g3 -ansi -pedantic -Wall -c -o test-dict.o test-dict.c gcc -g3 -ansi -pedantic -Wall -c -o dict.o dict.c gcc -g3 -ansi -pedantic -Wall -c -o tester.o tester.c gcc -g3 -ansi -pedantic -Wall -o test-dict test-dict.o dict.o tester.o ./test-dict OK! }}}


CategoryProgrammingNotes


2014-06-17 11:57