Comparison
We have seen several ways to construct dictionaries: hash tables, unbalanced binary search trees, red-black trees, and skip lists. There are several factors that influence the choice of an algorithm:
- Sorted output. If sorted output is required, then hash tables are not a viable
alternative. Entries are stored in the table based on their hashed value, with no other ordering.
For binary trees, the story is different. An in-order tree walk will produce a sorted list.
For example:
void WalkTree(Node *P) { if (P == NIL) return; WalkTree(P->Left); /* examine P->Data here */ WalkTree(P->Right); } WalkTree(Root);
To examine skip list nodes in order, simply chain through the level-0 pointers. For example:
Node *P = List.Hdr->Forward[0]; while (P != NIL) { /* examine P->Data here */ P = P->Forward[0]; }
- Space. The amount of memory required to store a value should be minimized. This
is especially true if many small nodes are to be allocated.
- For hash tables, only one forward pointer per node is required. In addition, the hash
table itself must be allocated.
- For red-black trees, each node has a left, right, and parent pointer. In addition, the
color of each node must be recorded. Although this requires only one bit, more space may
be allocated to ensure that the size of the structure is properly aligned. Therefore each
node in a red-black tree requires enough space for 3-4 pointers.
- For skip lists, each node has a level-0 forward pointer. The probability of having a
level-1 pointer is ½. The probability of having a level-2 pointer is ¼. In general,
the number of forward pointers per node is
n= 1 + ½ + ¼ + ... = 2.
- For hash tables, only one forward pointer per node is required. In addition, the hash
table itself must be allocated.
- Time. The algorithm should be efficient. This is especially true if a large dataset
is expected. Table 3-2 compares the search time for each algorithm. Note that worst-case behavior
for hash tables and skip lists is extremely unlikely. Actual timing tests are described below.
- Simplicity. If the algorithm is short and easy to understand, fewer mistakes may be made. This not only makes your life easy, but the maintenance programmer entrusted with the task of making repairs will appreciate any efforts you make in this area. The number of statements required for each algorithm is listed in Table 3-2.
method | statements | average time | worst-case time |
---|---|---|---|
hash table | 26 | O(1) | O(n) |
unbalanced tree | 41 | O(lgn) | O(n) |
red-black tree | 120 | O(lgn) | O(lgn) |
skip list | 55 | O(lgn) | O(n) |
Average time for insert, search, and delete operations on a database of 65,536 (216) randomly input items may be found in Table 3-3. For this test the hash table size was 10,009 and 16 index levels were allowed for the skip list. Although there is some variation in the timings for the four methods, they are close enough so that other considerations should come into play when selecting an algorithm.
method | insert | search | delete |
---|---|---|---|
hash table | 18 | 8 | 10 |
unbalanced tree | 37 | 17 | 26 |
red-black tree | 40 | 16 | 37 |
skip list | 48 | 31 | 35 |
Table 3-4 shows the average search time for two sets of data: a random set, where all values are unique, and an ordered set, where values are in ascending order. Ordered input creates a worst-case scenario for unbalanced tree algorithms, as the tree ends up being a simple linked list. The times shown are for a single search operation. If we were to search for all items in a database of 65,536 values, a red-black tree algorithm would take .6 seconds, while an unbalanced tree algorithm would take 1 hour.
random input |
count | hash table | unbalanced tree | red-black tree | skip list |
---|---|---|---|---|---|
16 | 4 | 3 | 2 | 5 | |
256 | 3 | 4 | 4 | 9 | |
4,096 | 3 | 7 | 6 | 12 | |
65,536 | 8 | 17 | 16 | 31 | |
ordered input |
16 | 3 | 4 | 2 | 4 |
256 | 3 | 47 | 4 | 7 | |
4,096 | 3 | 1,033 | 6 | 11 | |
65,536 | 7 | 55,019 | 9 | 15 |