Comparison
Table 2 shows resource requirements for a hash table implemented using three strategies. The array method represents nodes as elements of an array, the object method represents nodes as objects, and the collection method uses the built-in hashing of Visual Basic collections.
| n | method | insert (ms) | find (ms) | delete (ms) | kBytes | page faults |
|---|---|---|---|---|---|---|
| 1,000 | array | 10 | 10 | 10 | 72 | 17 |
| | object | 50 | 20 | 40 | 104 | 26 |
| | collection | 60 | 40 | 40 | 100 | 25 |
| 5,000 | array | 40 | 50 | 40 | 228 | 132 |
| | object | 301 | 90 | 261 | 744 | 186 |
| | collection | 490 | 220 | 220 | 516 | 129 |
| 10,000 | array | 90 | 101 | 90 | 412 | 297 |
| | object | 711 | 200 | 822 | 1,604 | 401 |
| | collection | 1,111 | 471 | 491 | 1,044 | 261 |
| 50,000 | array | 450 | 541 | 540 | 2,252 | 1,524 |
| | object | 7,481 | 1,062 | 13,279 | 8,480 | 2,118 |
| | collection | 9,394 | 2,623 | 2,794 | 5,420 | 1,355 |
| 100,000 | array | 912 | 1,141 | 1,122 | 4,504 | 3,047 |
| | object | 22,182 | 2,103 | 48,570 | 17,072 | 4,266 |
| | collection | 27,830 | 5,658 | 5,918 | 10,896 | 2,724 |
Memory requirements and page faults are shown for insertion only. For the array and object methods, the hash table size was one-tenth the node count. Tests were run on a 200MHz Pentium with 64MB of memory, running Windows NT 4.0. Statistics for memory use and page faults were obtained from the NT Task Manager. Code that implements these tests may be downloaded, so you can repeat the experiment.
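To make the three strategies concrete, here is a minimal sketch of the array and collection methods. All names (HashTable, NodeKey, Col, and so on) and array bounds are invented for illustration; this is not the downloadable test code.

```vb
' Illustrative sketch only -- names and bounds are assumptions,
' not the downloadable test code.

' Array method: nodes are slots in parallel arrays and links are
' array indices, so inserting a node performs no allocation.
Const TableSize As Long = 100               ' assumed: 1/10th the node count
Dim HashTable(0 To TableSize - 1) As Long   ' head of each chain; 0 = empty
Dim NodeKey(1 To 1000) As Long
Dim NodeNext(1 To 1000) As Long             ' index of next node in the chain
Dim FreeSlot As Long                        ' next unused slot; starts at 0

' Collection method: Visual Basic collections hash string keys
' internally, so each operation is a single call (see CollectionDemo).
Dim Col As New Collection

Sub ArrayInsert(ByVal Key As Long)
    Dim h As Long
    h = Key Mod TableSize
    FreeSlot = FreeSlot + 1
    NodeKey(FreeSlot) = Key
    NodeNext(FreeSlot) = HashTable(h)   ' link in at the head of the chain
    HashTable(h) = FreeSlot
End Sub

Function ArrayFind(ByVal Key As Long) As Long
    ' Returns the node's slot, or 0 if the key is absent.
    Dim p As Long
    p = HashTable(Key Mod TableSize)
    Do While p <> 0
        If NodeKey(p) = Key Then ArrayFind = p: Exit Function
        p = NodeNext(p)
    Loop
End Function

' Object method: the same chains, but each node is a class instance
' (e.g. a Node class with Public Key As Long, Public NextNode As Node),
' so every insert allocates a separate object on the heap.

Sub CollectionDemo()
    Col.Add Item:=42, Key:=CStr(123)    ' insert
    Dim v As Variant
    v = Col.Item(CStr(123))             ' find
    Col.Remove CStr(123)                ' delete
End Sub
```

In the array method, links are plain Long indices rather than object references and no allocator is involved, which is consistent with its speed and low memory use in Table 2.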
It is immediately apparent that the array method is fastest and consumes the least memory. Objects consume four times as much memory as arrays; in fact, the overhead for a single object is about 140 bytes. Collections take about twice as much room as arrays.
An interesting anomaly is the high deletion time associated with objects. When we increase the number of nodes from 50,000 to 100,000 (a factor of 2), deletion time increases from 13 to 48 seconds (nearly a factor of 4). No page faults were noted during deletion, so the extra overhead was compute time, not I/O time. One run-time implementation of memory deallocation maintains a list of free nodes, ordered by memory location. When memory is freed, the list is traversed so that the released memory can be returned to its proper place in the list; keeping the list sorted allows adjacent chunks to be recombined when both are free. Unfortunately, this algorithm runs in O(n²) time: execution time is roughly proportional to the square of the number of items being released.
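As a rough model of that behavior, the following sketch frees blocks back into an address-ordered list; everything here (the index-as-address scheme, names, and bounds) is invented for illustration, and this is not the actual run-time allocator.

```vb
' Minimal model of the address-ordered free list described above.
' The index-as-address scheme, names, and bounds are assumptions;
' this is not the actual run-time allocator.
Const MaxAddr As Long = 1000
Dim NextFree(0 To MaxAddr) As Long  ' NextFree(a): next free block above a;
                                    ' slot 0 is a dummy head; -1 ends the list

Sub InitFreeList()
    NextFree(0) = -1
End Sub

Sub FreeBlock(ByVal Addr As Long)
    ' Assumes Addr is in 1..MaxAddr and not already free.
    ' Scan from the head to keep the list sorted by address, so that
    ' adjacent blocks can be coalesced when both become free.
    Dim p As Long
    p = 0
    Do While NextFree(p) <> -1 And NextFree(p) < Addr
        p = NextFree(p)             ' each free may walk the whole list
    Loop
    NextFree(Addr) = NextFree(p)    ' splice the freed block in after p
    NextFree(p) = Addr
End Sub
```

Freeing n blocks this way traverses on the order of n² links in total, which is consistent with deletion time nearly quadrupling when the node count merely doubles.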
I encountered a similar problem while working on compilers for Apollo. In this case, however, the problem was exacerbated by page faults that occurred while traversing links. The solution involved an in-core index that reduced the number of links traversed.