Comparison

Table 2 shows the resource requirements for a hash table implemented with three strategies: the array method represents nodes as elements of an array, the object method represents nodes as objects, and the collection method uses the built-in hashing of Visual Basic collections.

Table 2: VB Resource Requirements
                         time (ms)
       n  method      insert    find  delete  kBytes  faults
   1,000  array           10      10      10      72      17
          object          50      20      40     104      26
          collection      60      40      40     100      25
   5,000  array           40      50      40     228     132
          object         301      90     261     744     186
          collection     490     220     220     516     129
  10,000  array           90     101      90     412     297
          object         711     200     822   1,604     401
          collection   1,111     471     491   1,044     261
  50,000  array          450     541     540   2,252   1,524
          object       7,481   1,062  13,279   8,480   2,118
          collection   9,394   2,623   2,794   5,420   1,355
 100,000  array          912   1,141   1,122   4,504   3,047
          object      22,182   2,103  48,570  17,072   4,266
          collection  27,830   5,658   5,918  10,896   2,724

Memory requirements and page faults are shown for insertion only. The hash table size for arrays and objects was 1/10th the total node count. Tests were run on a 200 MHz Pentium with 64 MB of memory under Windows NT 4.0. Statistics for memory use and page faults were obtained from the NT Task Manager. Code that implements these tests may be downloaded, so you can repeat the experiment.

It is immediately apparent that the array method is fastest and consumes the least memory. Objects consume roughly four times as much memory as arrays; in fact, the overhead for a single object is about 140 bytes. Collections take about twice as much room as arrays.

An interesting anomaly is the high deletion time associated with objects. When we increase the number of nodes from 50,000 to 100,000 (a factor of 2), the time for deletion increases from 13 to 48 seconds (roughly a factor of 4). No page faults were noted during deletion, so the extra overhead was compute time, not I/O time. One run-time implementation for freeing memory maintains a list of free nodes, ordered by memory location. When memory is freed, the list is traversed so that the released block can be returned to the appropriate place in the list. This is done so that memory chunks may be recombined (coalesced) when adjacent chunks are freed. Unfortunately, this algorithm runs in O(n²) time: execution time is roughly proportional to the square of the number of items being released.

I encountered a similar problem while working on compilers for Apollo. In that case, however, the problem was exacerbated by page faults incurred while traversing links. The solution was an in-core index that reduced the number of links traversed.