# Data Structures - Hash Table

Hash table is a data structure which stores data in an array format of fixed in size. In hash table, the data is stored in the form of key / value pair. Using of hash function, it computes an index of an array where the data value to be stored. If the table size is referred to as H_TableSize, then the table have a range from 0 to H_TableSize-1 indexes in which each index is unique.

### Hashing

Hashing is an implementation of hash table. It is a technique used to maps each key value into some number in the range from 0 to H_TableSize.It is mainly used to perform insertion, deletion and finding an element in a constant average time **O(1)** but it does not support any ordering such as FindMin and FindMax. There several hash methods are available such as Direct, Subtraction, Modulo division, Folding, Midsquare, Random generation etc. Here we are going to use modulo division as hash function.

Suppose, if the inputs are integer, then the key mod H_TableSize gives the index in which the keys are to be placed.

#### Example

If the H_TableSize is 10 and key is 27, then

27 mod 10 = 7.

So the key 27 is placed in 7^{th} index.

#### Code for Simple Hash Function

int Hash(int key,int TableSize) { return key % TableSize; }

Let us consider the case, if the table size is 10 and all keys end with zero, then this simple hash function is not appropriate. The reason is that, more than two keys are mapped to same index. In order to avoid this situation, we must select the table size as a prime number.

#### Simple Hash Function for String Input

Usually the keys are strings, so that we need to take care for choosing hash function. When the input is a string, then we add the ASCII value of each character in the string and use hash function (value mod TableSize) to get the index in which this string will be inserted.

#### Code For Simple Hash Function for String Input

int Hash(char *key,int TableSize) { int value=0; while(*key!="\0") value = value + *key++; return value % TableSize; }

### Collision

When two keys are hashed to same location, then this situation is called collision. This will be avoided by using two simple methods.

- Open Hashing(Separate Chaining)
- Closed Hashing(Open Addressing)

### Open Hashing alias Separate Chaining

This method maintains a list of all elements that are hashed to same location. It needs a small modification to the hash data structure. Instead of storing the element into the array, hash table uses linked list, they will be stored in the linked lists. For easy use, the lists have header. Now each index of an array points to one of the linked list. Suppose, if we insert an element into the hash table, it is added to the linked list at the corresponding index in the array. Hence the collisions are no longer occurred, because the linked list has no limit on its length. Therefore, if more than one keys hash to the same index, then both keys are stored in that linked list.

#### Open Hashing Example

Given input {437, 325, 175, 199, 171, 189,127,509} and a hash function H(X) = X mod 10, Show the resulting for separate chaining hash table.

Solution:

For first key: 437 H (437) = 437 % 10 = 7 The index for 437 is 7. For key: 325 H (325) = 325 % 10 = 5 The index for 325 is 5. For key: 175 H (175) = 175 % 10 = 5 The index for 175 is 5. For key: 199 H (199) = 199 % 10 = 9 The index for 199 is 9. For key: 171 H (171) = 171 % 10 = 1 The index for 171 is 1. For key: 189 H (189) = 189 % 10 = 9 The index for 189 is 9. For key: 127 H (127) = 127 % 10 = 7 The index for 127 is =7. For key: 509 H (509) = 509 % 10 = 9 The index for 509 is 9.

#### Searching using Hashing

If you want to search a data, first we calculate an index by using hashing and then go to that location in the array and look down the list. There you see if that data is present in the list. The constant time required for this sear is O(1). For worst case, When every elements hashed to same location, in that we would be really doing a straight linear search on the linked list, which means that the time required to search an element is O(n).

#### Disadvantages

- It requires pointer and this may slow down the process of inserting of an element into the table.
- It also needs the extra memory for linked list data structure.

To overcome the disadvantages, an alternative method can be used to avoid the collision.

### Closed Hashing alias Open Addressing

This is another method in which the elements are hashed to the table itself. In open addressing, if two elements hash to the same location, this is known as collision, then alternative location are chosen until an empty location is found. The hash function for open addressing is given that

h_{i}(X) = ( Hash(X) + F(i)) % TableSize

with F(0) = 0, F(1)=1, F(2)=2,..etc

Where Hash(X) = X % TableSize and the function F(i) is called the collision resolution strategy.

Actually the open addressing needs a large table than separate hashing because all data will be stored inside the hash table. This is also called closed hashing. There are three collision resolution techniques:

- Linear Probing
- Quadratic Probing
- Double Hashing

#### Linear Probing

This is the simple method , sequentially tries the new location until an empty location is found in the table.

For example: inserting the keys {79, 28, 39, 68, 89} into closed hash table by using same function and collision resolution technique as mensioned before and the table size is 10 ( for easy undestanding we are not using prime number for table size). The hash function is h_{i}(X) = ( Hash(X) + F(i)) % TableSize **for i = 0, 1, 2, 3,...etc**.

**Solution:**

Key | Hash Function h(X) | Index | Collision | Alt Index |
---|---|---|---|---|

79 | _{0}(79) | 9 | ||

28 | _{0}(28) | 8 | ||

39 | h_{0}(39) = (Hash(39)+F(0))%10 = ((39%10)+0)%10 | 9 | The first collision occurs | |

h_{1}(39) = (Hash(39)+F(1))%10 = ((39%10)+1)%10 | 0 | 0 | ||

68 | h_{0}(68) = (Hash(68)+F(0))%10 = ((68%10)+0)%10 | 8 | The collision occurs | |

h_{1}(68) = (Hash(68)+F(1))%10 = ((68%10)+1)%10 | 9 | Again collision occurs | ||

h_{2}(68) = (Hash(68)+F(2))%10 = ((68%10)+2)%10 | 0 | Again collision occurs | ||

h_{3}(68) = (Hash(68)+F(3))%10 = ((68%10)+3)%10 | 1 | 1 | ||

89 | h_{0}(89) = (Hash(89)+F(0))%10 = ((89%10)+0)%10 | 9 | The collision occurs | |

h_{1}(89) = (Hash(89)+F(1))%10 = ((89%10)+1)%10 | 0 | Again collision occurs | ||

h_{2}(89) = (Hash(89)+F(2))%10 = ((89%10)+2)%10 | 1 | Again collision occurs | ||

h_{3}(89) = (Hash(89)+F(3))%10 = ((89%10)+3)%10 | 2 | 2 |

The problem with linear probing is primary clustering. This is referred that even if the table is empty, any key that hashes to table requires several attempt to resolve the collision because it has to cross over the blocks of occupied cell. These blocks of occupied cell form the primary clustering. If any key falls into clustering, then we cannot predict that how many attempts are needed to resolves the collision. These long paths affect the performance of hash table.

#### Quadratic Probing

Quadratic probing is another open addressing method for resolving collision in the hash table. This method is used to eliminate the primary clustering problem of linear probing. This technique works by considering of original hash index and adding successive value of an arbitrary quadratic polynomial until the empty location is found. In linear probing, we would use **H+0, H+1, H+2, H+3,.....H+K** hash function sequence. Instead of using this sequence, the quadratic probing would use the another sequence is that **H+1 ^{2}, H+2^{2}, H+3^{2},....H+K^{2}**. Therefore, the hash function for quadratic probing is

h

_{i}(X) = ( Hash(X) + F(i)

^{2}) % TableSize

**for i = 0, 1, 2, 3,...etc**.

Let us examine the same example that is given in linear probing:

**Solution:**

Key | Hash Function h(X) | Index | Collision | Alt Index |
---|---|---|---|---|

79 | h_{0}(79) = ( Hash(79) + F(0) ^{2}) % 10= ((79 % 10) + 0) % 10 | 9 | ||

28 | h_{0}(28) = ( Hash(28) + F(0) ^{2}) % 10= ((28 % 10) + 0) % 10 | 8 | ||

39 | h_{0}(39)= ( Hash(39) + F(0) ^{2}) % 10= ((39 % 10) + 0) % 10 | 9 | The first collision occurs | |

h_{1}(39)= ( Hash(39) + F(1) ^{2}) % 10= ((39 % 10) + 1) % 10 | 0 | 0 | ||

68 | h_{0}(68)= ( Hash(68) + F(0) ^{2}) % 10= ((68 % 10) + 0) % 10 | 8 | The collision occurs | |

h_{1}(68)= ( Hash(68) + F(1) ^{2}) % 10= ((68 % 10) + 1) % 10 | 9 | Again collision occurs | ||

h_{2}(68)= ( Hash(68) + F(2) ^{2}) % 10= ((68 % 10) + 4) % 10 | 2 | 2 | ||

89 | h_{0}(89)= ( Hash(89) + F(0) ^{2}) % 10= ((89 % 10) + 0) % 10 | 9 | The collision occurs | |

h_{1}(89)= ( Hash(89) + F(1) ^{2}) % 10= ((89 % 10) + 1) % 10 | 0 | Again collision occurs | ||

h_{2}(89)= ( Hash(89) + F(2) ^{2}) % 10= ((89 % 10) + 4) % 10 | 3 | 3 |

Although, the quadratic probing eliminates the primary clustering, it still has the problem. When two keys hash to the same location, they will probe to the same alternative location. This may cause secondary clustering. In order to avoid this secondary clustering, double hashing method is created where we use extra multiplications and divisions

#### Double Hashing

This is the last collision resolution techniques where we use two hash functions. For double hashing, we have a hash function

_{i}= ( Hash(X) + F(i) ) % TableSize

where F(i)=i.hash_{2}(X).

This F(i) will generate the sequence such as hash_{2}(X), 2hash_{2}(X) and so on. We use second hash function as

_{2}(X) = R - (X mod R)

where R is the prime which is smaller than TableSize.

Let us consider the same example in which we choose R=7.

**Solution:**

Key | Hash Function h(X) | Index | Collision | Alt Index |
---|---|---|---|---|

79 | h_{0}(79)= ( Hash(79) + F(0)) % 10 = ((79 % 10) + 0) % 10 | 9 | ||

28 | h_{0}(28)= ( Hash(28) + F(0)) % 10 = ((28 % 10) + 0) % 10 | 8 | ||

39 | h_{0}(39)= ( Hash(39) + F(0)) % 10 = ((39 % 10) + 0) % 10 | 9 | The first collision occurs | |

h_{1}(39)= ( Hash(39) + F(1)) % 10 = ((39 % 10) + 1(7-(39 % 7))) % 10 = (9 + 3) % 10 =12 % 10 | 2 | 2 | ||

68 | h_{0}(68)= ( Hash(68) + F(0)) % 10 = ((68 % 10) + 0) % 10 | 8 | The collision occurs | |

h_{1}(68)= ( Hash(68) + F(1)) % 10 = ((68 % 10) + 1(7-(68 % 7))) % 10 = (8 + 2) % 10 =10 % 10 | 0 | 0 | ||

89 | h_{0}(89)= ( Hash(89) + F(0)) % 10 = ((89 % 10) + 0) % 10 | 9 | The collision occurs | |

h_{1}(89)= ( Hash(89) + F(1)) % 10 = ((89 % 10) + 1(7-(89 % 7))) % 10 = (9 + 2) % 10 =10 % 10 | 0 | Again collision occurs | ||

h_{2}(89)= ( Hash(89) + F(2)) % 10 = ((89 % 10) + 2(7-(89 % 7))) % 10 = (9 + 4) % 10=13 % 10 | 3 | 3 |