1. Generic types
The first rule of programming is that you should never write the same code twice. Suppose that you happen to have lying around a dictionary type whose keys are ints and whose values are strings. Tomorrow you realize that what you really want is a dictionary type whose keys are strings and whose values are ints, or one whose keys are ints but whose values are stacks. If you have N different types that may appear as keys or values, can you avoid writing N2 different dictionary implementations to get every possible combination?
Many languages provide special mechanisms to support generic types, ones for which part of the type is not specified. It's as if you could declare an array in C to be an array of some type to be declared later, and then write functions that operate on any such array without knowing what the missing type is going to be (templates in C++ are an example of such a facility). Unfortunately, C does not provide generic types. But by aggressive use of function pointers and void *, it is possible to fake them.
2. Generic dictionary: interface
Below is an example of an interface to a generic dictionary type for storing maps from constant values to constant values. The void * pointers are used to avoid having to declare exactly what kinds of keys and values the dictionary will contain.
1 /* Set dict[key] = value. */
2 /* Both key and value are copied internally. */
3 /* If data is the null pointer, remove dict[key]. */
4 void dict_set(Dict d, const void *key, const void *value);
5
6 /* Return dict[key], or null if dict[key] has not been set. */
7 const void *dict_get(Dict d, const void *key);
We'll also need a constructor and destructor, but we'll get to those in a moment. First we need to think about what dict_set and dict_get are supposed to do, and how we might possibly be able to implement them. Suppose we want to build a dictionary with strings as both keys and values. Internally, this might be represented as some sort of hash table or tree. Suppose it's a hash table. Now, given some void *key, we'd like to be able to compute its hash value. But we don't know what type key points to, and if we guess wrong we are likely to end up with segmentation faults or worse. So we need some way to register a hash function for our keys, whatever type they might really be behind that void *.
Similarly, we will want to be able to compare keys for equality (since not all keys that hash together will necessarily be the same), and we may want to be able to copy keys and values so that the data inside the dictionary is not modified if somebody changes a value passed in from the outside. So we need a fair bit of information about keys and values. We'll organize all of this information in a struct made up of function pointers. (This includes a few extra components that came up while writing the implementation.)
1 /* Provides operations for working with keys or values */
2 struct dict_contents_operations {
3 /* hash function */
4 unsigned long (*hash)(const void *datum, void *arg);
5
6 /* returns nonzero if *datum1 == *datum2 */
7 int (*equal)(const void *datum1, const void *datum2, void *arg);
8
9 /* make a copy of datum that will survive changes to original */
10 void *(*copy)(const void *datum, void *arg);
11
12 /* free a copy */
13 void (*delete)(void *datum, void *arg);
14
15 /* extra argument, to allow further specialization */
16 void *arg;
17 };
We could write a similar but smaller struct for values, but to save a little bit of effort in the short run we'll use the same struct for both keys and values. We can now write a constructor for our generic dictionary that consumes two such structs that provide operations for working on keys and values, respectively:
So now to create a dict, we just need to fill in two dict_contents_operations structures. For convenience, it might be nice if dict.c provided some preloaded structures for common types like ints and strings. We can also use the arg field in struct dict_contents_operations to make the keys and values themselves be parameterized types, for example a type of byte-vectors of given length.
We can declare these various convenience structures in in dict.h as
1 /* Some predefined dict_contents_operations structures */
2
3 /*
4 * DictIntOps supports int's that have been cast to (void *), e.g.:
5 * d = dict_create(DictIntOps, DictIntOps);
6 * dict_set(d, (void *) 1, (void * 2));
7 * x = (int) dict_get(d, (void * 1));
8 */
9 struct dict_contents_operations DictIntOps;
10
11 /*
12 * Supports null-terminated strings, e.g.:
13 * d = dict_create(DictStringOps, DictStringOps);
14 * dict_set(d, "foo", "bar");
15 * s = dict_get(d, "foo");
16 * Note: no casts are needed since C automatically converts
17 * between (void *) and other pointer types.
18 */
19 struct dict_contents_operations DictStringOps;
20
21 /*
22 * Supports fixed-size blocks of memory, e.g.:
23 * int x = 1;
24 * int y = 2;
25 * d = dict_create(dict_mem_ops(sizeof(int)), dict_mem_ops(sizeof(int));
26 * dict_set(d, &x, &y);
27 * printf("%d", *dict_get(d, &x);
28 */
29 struct dict_contents_operations dict_mem_ops(int size);
We'll define the operations in DictIntOps to expect ints cast directly to void *, the operations in DictStringOps to expect char * cast to void *, and the operations in dict_mem_ops(size) will expect void * arguments pointing to blocks of the given size. There is a subtle difference between a dictionary using DictIntOps and dict_mem_ops(sizeof(int)); in the former case, keys and values are the ints themselves (after being case), which in the latter, keys and values are pointers to ints.
Implementations of these structures can be found in dict.c in the Appendix.
To make a dictionary that maps strings to ints, we just call:
1 d = dict_create(DictStringOps, DictIntOps);
and then we can do things like:
If we find ourselves working with an integer-valued dictionary a lot, we might want to define a few macros to avoid having to type casts all the time.
3. Generic dictionary: implementation
To implement our generic dictionary, we just take our favorite non-generic hash table, and replace any calls to fixed hash functions, copier, free, etc. with calls to elements of the appropriate structure. The result is given in the Appendix.
4. Appendix: complete code
4.0.1. dict.h
1 typedef struct dict *Dict;
2
3 /* Provides operations for working with keys or values */
4 struct dict_contents_operations {
5 /* hash function */
6 unsigned long (*hash)(const void *datum, void *arg);
7
8 /* returns nonzero if *datum1 == *datum2 */
9 int (*equal)(const void *datum1, const void *datum2, void *arg);
10
11 /* make a copy of datum that will survive changes to original */
12 void *(*copy)(const void *datum, void *arg);
13
14 /* free a copy */
15 void (*delete)(void *datum, void *arg);
16
17 /* extra argument, to allow further specialization */
18 void *arg;
19 };
20
21 /* create a new dictionary with given key and value operations */
22 /* Note: value_ops.hash and value_ops.equal are not used. */
23 Dict dict_create(struct dict_contents_operations key_ops,
24 struct dict_contents_operations value_ops);
25
26 /* free a dictionary and all the space it contains */
27 /* This will call the appropriate delete function for all keys and */
28 /* values. */
29 void dict_destroy(Dict d);
30
31 /* Set dict[key] = value. */
32 /* Both key and value are copied internally. */
33 /* If data is the null pointer, remove dict[key]. */
34 void dict_set(Dict d, const void *key, const void *value);
35
36 /* Return dict[key], or null if dict[key] has not been set. */
37 const void *dict_get(Dict d, const void *key);
38
39 /* Some predefined dict_contents_operations structures */
40
41 /*
42 * DictIntOps supports int's that have been cast to (void *), e.g.:
43 * d = dict_create(DictIntOps, DictIntOps);
44 * dict_set(d, (void *) 1, (void * 2));
45 * x = (int) dict_get(d, (void * 1));
46 */
47 struct dict_contents_operations DictIntOps;
48
49 /*
50 * Supports null-terminated strings, e.g.:
51 * d = dict_create(DictStringOps, DictStringOps);
52 * dict_set(d, "foo", "bar");
53 * s = dict_get(d, "foo");
54 * Note: no casts are needed since C automatically converts
55 * between (void *) and other pointer types.
56 */
57 struct dict_contents_operations DictStringOps;
58
59 /*
60 * Supports fixed-size blocks of memory, e.g.:
61 * int x = 1;
62 * int y = 2;
63 * d = dict_create(dict_mem_ops(sizeof(int)), dict_mem_ops(sizeof(int));
64 * dict_set(d, &x, &y);
65 * printf("%d", *dict_get(d, &x);
66 */
67 struct dict_contents_operations dict_mem_ops(int size);
4.0.2. dict.c
1 #include <stdlib.h>
2 #include <string.h>
3 #include "dict.h"
4
5 struct dict_elt {
6 unsigned long hash; /* full hash of key */
7 void *key;
8 void *value;
9 struct dict_elt *next;
10 };
11
12 struct dict {
13 int table_size; /* number of slots in table */
14 int num_elements; /* number of elements */
15 struct dict_elt **table; /* linked list heads */
16 /* these save arguments passed at creation */
17 struct dict_contents_operations key_ops;
18 struct dict_contents_operations value_ops;
19 };
20
21 #define INITIAL_TABLE_SIZE (16)
22 #define TABLE_SIZE_MULTIPLIER (2)
23 #define TABLE_GROW_DENSITY (1)
24
25 Dict
26 dict_create(struct dict_contents_operations key_ops,
27 struct dict_contents_operations value_ops)
28 {
29 Dict d;
30 int i;
31
32 d = malloc(sizeof(*d));
33 if(d == 0) return 0;
34
35 d->table_size = INITIAL_TABLE_SIZE;
36 d->num_elements = 0;
37 d->key_ops = key_ops;
38 d->value_ops = value_ops;
39 d->table = malloc(sizeof(*(d->table)) * d->table_size);
40 if(d->table == 0) {
41 free(d);
42 return 0;
43 }
44
45 for(i = 0; i < d->table_size; i++) d->table[i] = 0;
46
47 return d;
48 }
49
50 void
51 dict_destroy(Dict d)
52 {
53 int i;
54 struct dict_elt *e;
55 struct dict_elt *next;
56
57 for(i = 0; i < d->table_size; i++) {
58 for(e = d->table[i]; e != 0; e = next) {
59 next = e->next;
60 d->key_ops.delete(e->key, d->key_ops.arg);
61 d->value_ops.delete(e->value, d->value_ops.arg);
62 free(e);
63 }
64 }
65 free(d->table);
66 free(d);
67 }
68
69 /* return pointer to element with given key, if any */
70 static struct dict_elt *
71 dict_fetch(Dict d, const void *key)
72 {
73 unsigned long h;
74 int i;
75 struct dict_elt *e;
76
77 h = d->key_ops.hash(key, d->key_ops.arg);
78 i = h % d->table_size;
79 for(e = d->table[i]; e != 0; e = e->next) {
80 if(e->hash == h && d->key_ops.equal(key, e->key, d->key_ops.arg)) {
81 /* found it */
82 return e;
83 }
84 }
85 /* didn't find it */
86 return 0;
87 }
88
89 /* increase the size of the dictionary, rehashing all table elements */
90 static void
91 dict_grow(Dict d)
92 {
93 struct dict_elt **old_table;
94 int old_size;
95 int i;
96 struct dict_elt *e;
97 struct dict_elt *next;
98 int new_pos;
99
100 /* save old table */
101 old_table = d->table;
102 old_size = d->table_size;
103
104 /* make new table */
105 d->table_size *= TABLE_SIZE_MULTIPLIER;
106 d->table = malloc(sizeof(*(d->table)) * d->table_size);
107 if(d->table == 0) {
108 /* put the old one back */
109 d->table = old_table;
110 d->table_size = old_size;
111 return;
112 }
113 /* else */
114 /* clear new table */
115 for(i = 0; i < d->table_size; i++) d->table[i] = 0;
116
117 /* move all elements of old table to new table */
118 for(i = 0; i < old_size; i++) {
119 for(e = old_table[i]; e != 0; e = next) {
120 next = e->next;
121 /* find the position in the new table */
122 new_pos = e->hash % d->table_size;
123 e->next = d->table[new_pos];
124 d->table[new_pos] = e;
125 }
126 }
127
128 /* don't need this any more */
129 free(old_table);
130 }
131
132 void
133 dict_set(Dict d, const void *key, const void *value)
134 {
135 int table_position;
136 struct dict_elt *e;
137
138 e = dict_fetch(d, key);
139 if(e != 0) {
140 /* change existing setting */
141 d->value_ops.delete(e->value, d->value_ops.arg);
142 e->value = value ? d->value_ops.copy(value, d->value_ops.arg) : 0;
143 } else {
144 /* create new element */
145 e = malloc(sizeof(*e));
146 if(e == 0) abort();
147
148 e->hash = d->key_ops.hash(key, d->key_ops.arg);
149 e->key = d->key_ops.copy(key, d->key_ops.arg);
150 e->value = value ? d->value_ops.copy(value, d->value_ops.arg) : 0;
151
152 /* link it in */
153 table_position = e->hash % d->table_size;
154 e->next = d->table[table_position];
155 d->table[table_position] = e;
156
157 d->num_elements++;
158
159 if(d->num_elements > d->table_size * TABLE_GROW_DENSITY) {
160 /* grow and rehash */
161 dict_grow(d);
162 }
163 }
164 }
165
166 const void *
167 dict_get(Dict d, const void *key)
168 {
169 struct dict_elt *e;
170
171 e = dict_fetch(d, key);
172 if(e != 0) {
173 return e->value;
174 } else {
175 return 0;
176 }
177 }
178
179 /* int functions */
180 /* We assume that int can be cast to void * and back without damage */
181 static unsigned long dict_int_hash(const void *x, void *arg) { return (int) x; }
182 static int dict_int_equal(const void *x, const void *y, void *arg)
183 {
184 return ((int) x) == ((int) y);
185 }
186 static void *dict_int_copy(const void *x, void *arg) { return (void *) x; }
187 static void dict_int_delete(void *x, void *arg) { ; }
188
189 struct dict_contents_operations DictIntOps = {
190 dict_int_hash,
191 dict_int_equal,
192 dict_int_copy,
193 dict_int_delete,
194 0
195 };
196
197 /* common utilities for string and mem */
198 static unsigned long hash_mem(const unsigned char *s, int len)
199 {
200 unsigned long h;
201 int i;
202
203 h = 0;
204 for(i = 0; i < len; i++) {
205 h = (h << 13) + (h >> 7) + h + s[i];
206 }
207 return h;
208 }
209
210 static void dict_delete_free(void *x, void *arg) { free(x); }
211
212 /* string functions */
213 static unsigned long dict_string_hash(const void *x, void *arg)
214 {
215 return hash_mem(x, strlen(x));
216 }
217
218 static int dict_string_equal(const void *x, const void *y, void *arg)
219 {
220 return !strcmp((const char *) x, (const char *) y);
221 }
222
223 static void *dict_string_copy(const void *x, void *arg)
224 {
225 const char *s;
226 char *s2;
227
228 s = x;
229 s2 = malloc(sizeof(*s2) * (strlen(s)+1));
230 strcpy(s2, s);
231 return s2;
232 }
233
234 struct dict_contents_operations DictStringOps = {
235 dict_string_hash,
236 dict_string_equal,
237 dict_string_copy,
238 dict_delete_free,
239 0
240 };
241
242 /* mem functions */
243 static unsigned long dict_mem_hash(const void *x, void *arg)
244 {
245 return hash_mem(x, (int) arg);
246 }
247
248 static int dict_mem_equal(const void *x, const void *y, void *arg)
249 {
250 return !memcmp(x, y, (size_t) arg);
251 }
252
253 static void *dict_mem_copy(const void *x, void *arg)
254 {
255 void *x2;
256
257 x2 = malloc((size_t) arg);
258 memcpy(x2, x, (size_t) arg);
259 return x2;
260 }
261
262 struct dict_contents_operations
263 dict_mem_ops(int len)
264 {
265 struct dict_contents_operations mem_ops;
266
267 mem_ops.hash = dict_mem_hash;
268 mem_ops.equal = dict_mem_equal;
269 mem_ops.copy = dict_mem_copy;
270 mem_ops.delete = dict_delete_free;
271 mem_ops.arg = (void *) len;
272
273 return mem_ops;
274 }
4.0.3. test-dict.c
1 #include <stdio.h>
2 #include <setjmp.h>
3 #include <signal.h>
4 #include <unistd.h>
5 #include <string.h>
6
7 #include "tester.h"
8 #include "dict.h"
9
10 /* avoid retyping casts */
11 #define Iget(d, key) ((int) dict_get((d), (const void *) (key)))
12 #define Iset(d, key, value) \
13 (dict_set((d), (const void *) (key), (const void *) (value)))
14
15 #define STRESS_TEST_ITERATIONS (100000)
16
17 int
18 main(int argc, char **argv)
19 {
20 Dict s2i;
21 Dict i2i;
22 Dict s2s;
23 Dict m2m;
24 int i;
25 short s;
26 int ivalue;
27
28 tester_init();
29
30 TRY { s2i = dict_create(DictStringOps, DictIntOps); } ENDTRY;
31 TRY { i2i = dict_create(DictIntOps, DictIntOps); } ENDTRY;
32 TRY { s2s = dict_create(DictStringOps, DictStringOps); } ENDTRY;
33
34 /* should be empty initially */
35 TEST(Iget(s2i, "foo"), 0);
36 TEST(Iget(i2i, 12), 0);
37 TEST_ASSERT(dict_get(s2s, "foo") == 0);
38
39 /* initial set */
40 TRY { Iset(s2i, "foo", 1); } ENDTRY;
41 TRY { Iset(i2i, 12, 2); } ENDTRY;
42 TRY { dict_set(s2s, "foo", "bar"); } ENDTRY;
43 TEST(Iget(s2i, "foo"), 1);
44 TEST(Iget(i2i, 12), 2);
45 TEST_ASSERT(!strcmp(dict_get(s2s, "foo"), "bar"));
46
47 /* set again */
48 TRY { Iset(s2i, "foo", 3); } ENDTRY;
49 TRY { Iset(i2i, 12, 4); } ENDTRY;
50 TRY { dict_set(s2s, "foo", "baz"); } ENDTRY;
51 TEST(Iget(s2i, "foo"), 3);
52 TEST(Iget(i2i, 12), 4);
53 TEST_ASSERT(!strcmp(dict_get(s2s, "foo"), "baz"));
54
55 /* now delete */
56 TRY { dict_set(s2i, "foo", 0); } ENDTRY;
57 TRY { dict_set(i2i, (const void *) 12, 0); } ENDTRY;
58 TRY { dict_set(s2s, "foo", 0); } ENDTRY;
59 TEST(Iget(s2i, "foo"), 0);
60 TEST(Iget(i2i, 12), 0);
61 TEST_ASSERT(dict_get(s2s, "foo") == 0);
62
63 /* stress test on i2i */
64 TRY {
65 for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
66 Iset(i2i, i, i*37);
67 }
68 } ENDTRY;
69
70 for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
71 TRY { ivalue = Iget(i2i, i); } ENDTRY;
72 TEST(ivalue, i*37);
73 if(ivalue != i*37) break;
74 }
75
76 /* stress test on m2m */
77 /* We'll use blocks to hold int keys and short values */
78 TRY {
79 m2m = dict_create(dict_mem_ops(sizeof(int)),
80 dict_mem_ops(sizeof(short)));
81 } ENDTRY;
82 TRY {
83 for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
84 s = i % 37;
85 dict_set(m2m, &i, &s);
86 }
87 } ENDTRY;
88
89 for(i = 0; i < STRESS_TEST_ITERATIONS; i++) {
90 TRY { ivalue = *((short *) dict_get(m2m, &i)); } ENDTRY;
91 TEST(ivalue, i % 37);
92 if(ivalue != i % 37) break;
93 }
94
95 TRY { dict_destroy(s2i); } ENDTRY;
96 TRY { dict_destroy(i2i); } ENDTRY;
97 TRY { dict_destroy(s2s); } ENDTRY;
98 TRY { dict_destroy(m2m); } ENDTRY;
99
100 tester_report(stdout, argv[0]);
101 return tester_result();
102 }
{{{$ make test gcc -g3 -ansi -pedantic -Wall -c -o test-dict.o test-dict.c gcc -g3 -ansi -pedantic -Wall -c -o dict.o dict.c gcc -g3 -ansi -pedantic -Wall -c -o tester.o tester.c gcc -g3 -ansi -pedantic -Wall -o test-dict test-dict.o dict.o tester.o ./test-dict OK! }}}