Comparison

Table 2 shows the resource requirements for a hash table implemented with three strategies: the array method represents nodes as elements of an array, the object method represents nodes as objects, and the collection method uses the built-in hashing of Visual Basic collections.

Table 2: VB Resource Requirements
                         time (ms)
       n  method      insert    find  delete  kBytes  faults
   1,000  array           10      10      10      72      17
          object          50      20      40     104      26
          collection      60      40      40     100      25
   5,000  array           40      50      40     228     132
          object         301      90     261     744     186
          collection     490     220     220     516     129
  10,000  array           90     101      90     412     297
          object         711     200     822   1,604     401
          collection   1,111     471     491   1,044     261
  50,000  array          450     541     540   2,252   1,524
          object       7,481   1,062  13,279   8,480   2,118
          collection   9,394   2,623   2,794   5,420   1,355
 100,000  array          912   1,141   1,122   4,504   3,047
          object      22,182   2,103  48,570  17,072   4,266
          collection  27,830   5,658   5,918  10,896   2,724

Memory requirements and page faults are shown for insertion only. The hash table size for arrays and objects was 1/10th the total node count. Tests were run on a 200 MHz Pentium with 64 MB of memory under Windows NT 4.0. Statistics for memory use and page faults were obtained from the NT Task Manager. Code that implements these tests may be downloaded, so you can repeat the experiment.

It is immediately apparent that the array method is fastest and consumes the least memory. Objects consume roughly four times as much memory as arrays; in fact, the overhead for a single object is about 140 bytes. Collections take about twice as much room as arrays.

An interesting anomaly is the high deletion time associated with objects. When we increase the number of nodes from 50,000 to 100,000 (a factor of 2), the time for deletion increases from 13 to 48 seconds (roughly a factor of 4). No page faults were noted during deletion, so the extra overhead was compute time, not I/O time. One run-time implementation for freeing memory maintains a list of free nodes, ordered by memory location. When memory is freed, the list is traversed so that the released block can be returned to the appropriate place in the list. This is done so that memory chunks may be recombined (coalesced) when adjacent chunks are freed. Unfortunately, this algorithm runs in O(n²) time: execution time is roughly proportional to the square of the number of items being released.

I encountered a similar problem while working on compilers for Apollo. In that case, however, the problem was exacerbated by page faults incurred while traversing links. The solution was an in-core index that reduced the number of links traversed.