szpak

Hash Table

Store values under keys, which are calculated by hashing the value itself.


When it comes to internal mechanics, a Hash Table often powers a Map or Set. You still choose the logical key, but the table runs that key through a hash function to decide where the entry should live internally.

It is possible that two different keys produce the same bucket position. Such phenomenon is called a collision. In a Map, logical keys are still unique: setting the same key again replaces or updates its value. Collisions are an implementation problem inside the table, not a permission to have duplicate keys.

The place where entries land is called a bucket. Depending on the collision strategy, a bucket may hold one entry, several entries, or a probe sequence may search for another open bucket.

Hash Function

To create an internal bucket position we use a hash function on the key we want to add. Special feature of such function is to provide fixed-length hash for variable-length data. The bigger length of the hash, the lower chance for collision to happen. For the sake of example let’s make it 4 numbers long.

Calculating Hash

The idea for our hash function is dead-simple. It takes a value and calculates its length, a number of characters to be precise. Since hash must have a fixed length we will give it four digits. If number of characters is less than 10, it will be padded with 3 zeroes, for chars number between 10 and 99 with 2 zeroes, and so on.

KeyHashCollides with
hash(“coffee break”)1200-
hash(“tea pot”)7000-
hash(“vegan milk”)1100-
hash(“bitter chocolate”)1600-
hash(“small cup”)9000-

Handling Collisions

So far, so good. No collision took place, no need to worry. But what happens if hash is not unique? Let’s add another string “orange cream”. Oh, dear. Hash which was returned is collided with one generated for “coffee break”.

KeyHashCollides with
hash(“coffee break”)1200-
hash(“orange cream”)1200hash(“coffee break”)

What can we do in this situation? There are generally two ways of handling this.

Separate Chaining

In this first resolution, buckets get form of Linked List which is element corresponding to its hash-based key. If another key lands in the same bucket, the entry will be appended to the underlying list.

Open Addressing

Another solution for collision resolution is to keep this value under different hash, which will be resolved by so-called probing, or search for alternative location. To find this within structure it must be started with colliding key and then properly searched.

Usages

The best thing about Hash Table is that it can usually find an entry by hashing its key. In the average case, lookup, insert, and delete are close to constant time. In the worst case, especially with many collisions or a bad hash function, performance can degrade. Collision handling keeps the structure useful, but it does not make collisions free.