Hash Functions
In the previous example, we determined a hash value by examining the remainder after division. In this section we'll examine several algorithms that compute a hash value.
Division Method (TableSize = Prime)
A hash value, from 0 to (HashTableSize
- 1), is computed by dividing the key
value by the size of the hash table and taking the remainder:
Public Function Hash(ByVal Key As Long) As Long Hash = Key Mod HashTableSize End Function
Selecting an appropriate HashTableSize
is important to the success of this method.
For example, a HashTableSize
divisible by two would yield even hash values for
even keys, and odd hash values for odd keys. This is an undesirable property, as all keys would
hash to even values if they happened to be even. If HashTableSize
is a power of
two, then the hash function simply selects a subset of the key bits as the table index. To
obtain a more random scattering, HashTableSize
should be a prime number not too
close to a power of two.
Multiplication Method (TableSize = 2n)
The multiplication method may be used for a HashTableSize
that is a power of
2. The Key is multiplied by a constant, and then the necessary bits are extracted to index
into the table. One method uses the fractional part of the product of the key and the golden
ratio, or (sqrt(5) - 1)/2. For example, assuming a word size of 8 bits, the golden ratio
is multiplied by 28 to obtain 158. The product of the 8-bit key and 158 results in a 16-bit
integer. For a table size of 25, the 5 most significant bits of the least significant word
are extracted for the hash value. The following definitions may be used for the multiplication
method:
' 8-bit index Private Const K As Long = 158 ' 16-bit index Private Const K As Long = 40503 ' 32-bit index Private Const K As Long = 2654435769 ' bitwidth(index)=w, size of table=2^m Private Const S As Long = 2^(w - m) Private Const N As Long = 2^m - 1 Hash = ((K * Key) And N) \ S
For example, if HashTableSize
is 1024 (210), then a 16-bit index
is sufficient and S
would be assigned a value of 2(16 - 10) = 64. Constant N
would be 210 - 1, or 1023. Thus, we have:
Private Const K As Long = 40503 Private Const S As Long = 64 Private Const N As Long = 1023 Public Function Hash(ByVal Key As Long) As Long Hash = ((K * Key) And N) \ S End Function
Variable String Addition Method (TableSize = 256)
To hash a variable-length string, each character is added, modulo 256, to a total. A hash value, range 0–255, is computed.
Public Function Hash(ByVal S As String) As Long Dim h As Byte Dim i As Long h = 0 For i = 1 to Len(S) h = h + Asc(Mid(S, i, 1)) Next i Hash = h End Function
Variable String Exclusive-Or Method (TableSize = 256)
This method is similar to the addition method, but successfully distinguishes similar words and anagrams. To obtain a hash value in the range 0–255, all bytes in the string are exclusive-or'd together. However, in the process of doing each exclusive-or, a random component is introduced.
Private Rand8(0 To 255) As Byte Public Function Hash(ByVal S As String) As Long Dim h As Byte Dim i As Long h = 0 For i = 1 To Len(S) h = Rand8(h Xor Asc(Mid(S, i, 1))) Next i Hash = h End Function
Rand8
is a table of 256 8-bit unique random numbers. The exact ordering is not
critical. The exclusive-or method has its basis in cryptography, and is quite effective.
Variable String Exclusive-Or Method (TableSize <= 65536)
If we hash the string twice, we may derive a hash value for an arbitrary table size up to 65536. The second time the string is hashed, one is added to the first character. Then the two 8-bit hash values are concatenated together to form a 16-bit hash value:
Private Rand8(0 To 255) As Byte Public Function Hash(ByVal S As String) As Long Dim h1 As Byte Dim h2 As Byte Dim c As Byte Dim i As Long if Len(S) = 0 Then Hash = 0 Exit Function End If h1 = Asc(Mid(S, 1, 1)) h2 = h1 + 1 For i = 2 To Len(S) c = Asc(Mid(S, i, 1)) h1 = Rand8(h1 Xor c) h2 = Rand8(h2 Xor c) Next i ' Hash is in range 0 .. 65535 Hash = (h1 * 256) + h2 ' scale Hash to table size Hash = Hash Mod HashTableSize End Function
Hashing strings is computationally expensive, as we manipulate each byte in the string. A more efficient technique utilizes a DLL written in C to perform the hash function. Included in the download is a test program that hashes strings using C and Visual Basic. The C version is typically 20 times faster.