Hash Table
Store values under keys, which are calculated by hashing the value itself.
When it comes to internal mechanics, a Hash Table often powers a Map or Set. You still choose the logical key, but the table runs that key through a hash function to decide where the entry should live internally.
It is possible that two different keys produce the same bucket position. Such phenomenon is called a collision. In a Map, logical keys are still unique: setting the same key again replaces or updates its value. Collisions are an implementation problem inside the table, not a permission to have duplicate keys.
The place where entries land is called a bucket. Depending on the collision strategy, a bucket may hold one entry, several entries, or a probe sequence may search for another open bucket.
Hash Function
To create an internal bucket position we use a hash function on the key we want to add. Special feature of such function is to provide fixed-length hash for variable-length data. The bigger length of the hash, the lower chance for collision to happen. For the sake of example let’s make it 4 numbers long.
Calculating Hash
The idea for our hash function is dead-simple. It takes a value and calculates its length, a number of characters to be precise. Since hash must have a fixed length we will give it four digits. If number of characters is less than 10, it will be padded with 3 zeroes, for chars number between 10 and 99 with 2 zeroes, and so on.
| Key | Hash | Collides with |
|---|---|---|
| hash(“coffee break”) | 1200 | - |
| hash(“tea pot”) | 7000 | - |
| hash(“vegan milk”) | 1100 | - |
| hash(“bitter chocolate”) | 1600 | - |
| hash(“small cup”) | 9000 | - |
Handling Collisions
So far, so good. No collision took place, no need to worry. But what happens if hash is not unique? Let’s add another string “orange cream”. Oh, dear. Hash which was returned is collided with one generated for “coffee break”.
| Key | Hash | Collides with |
|---|---|---|
| hash(“coffee break”) | 1200 | - |
| … | … | … |
| hash(“orange cream”) | 1200 | hash(“coffee break”) |
What can we do in this situation? There are generally two ways of handling this.
Separate Chaining
In this first resolution, buckets get form of Linked List which is element corresponding to its hash-based key. If another key lands in the same bucket, the entry will be appended to the underlying list.
Open Addressing
Another solution for collision resolution is to keep this value under different hash, which will be resolved by so-called probing, or search for alternative location. To find this within structure it must be started with colliding key and then properly searched.
Usages
The best thing about Hash Table is that it can usually find an entry by hashing its key. In the average case, lookup, insert, and delete are close to constant time. In the worst case, especially with many collisions or a bad hash function, performance can degrade. Collision handling keeps the structure useful, but it does not make collisions free.