Hash Tables

Bin Zhang

March 28, 2021

1. Direct-address tables

To represent a dynamic set in which the keys are drawn from a small universe, we use a direct-address table, in which each slot corresponds to a key in the universe.

2. Hash tables

Hash tables is similar to direct-address tables. With direct addressing, an element with key \(k\) is stored in slot \(k\). With hashing, an element with key \(k\) is stored in slot \(h(k)\), where \(h\) is a hash function.

However, two keys may hash to the same slot. We call this situation a collision. The simplest collision resolution technique is chaining. We place all elements that hash to the same slot into a linked list. Given a hash table with \(m\) slots storing \(n\) elements, a search takes average-case time \(\Theta(1+n/m)\).

3. Hash functions

To map a key \(k\) into one of \(m\) slots, the hash function of division method is

[h(k) = k \mod m,]

and that of multiplication method is

[h(k) = \lfloor m(kA \mod 1) \rfloor,]

where \(0 < A < 1\).

Universal hashing is to choose the hash function randomly in a way that is independent of the keys that are actually going to be stored.

An integer hash table with chaining and division hash function is implemented below.

#include <stdlib.h>
#define HASH_TABLE_SIZE 1024

struct object
{
    int key;
    struct object *prev, *next;
};

struct hash_table
{
    struct object *T[HASH_TABLE_SIZE];
};

struct object *hash_search(struct hash_table *h, int k)
{
    int index = k / HASH_TABLE_SIZE;
    struct object *re = h->T[index];
    while (re != NULL && re->key != k)
        re = re->next;
    return re;
}

void hash_insert(struct hash_table *t, struct object *x)
{
    int index = x->key / HASH_TABLE_SIZE;
    x->next = t->T[index];
    if (t->T[index] != NULL)
        t->T[index]->prev = x;
    x->prev = NULL;
    t->T[index] = x;
    return;
}

void hash_delete(struct hash_table *h, struct object *x)
{
    int index = x->key / HASH_TABLE_SIZE;
    if (x->prev != NULL)
        x->prev->next = x->next;
    else
        h->T[index] = x->next;
    if (x->next != NULL)
        x->next->prev = x->prev;
    return;
}

4. Open addressing

Another collision resolution technique is open addressing. In open addressing, each table entry contains an element, NULL or DELETED. To insert an element, we probe the table until finding an empty slot. The probe sequence is generated by a hash function,

[\{h(k, 0), h(k, 1), \dots, h(k, m-1)\},]

and is required to be a permutation. To search an element, we probe the table until finding the element or a NULL. To delete an element, we mark the corresponding slot with DELETED.

There are three commonly used techniques to compute the probe sequence.

where \(h'(k), h''(k)\) are ordinary hash functions.

Let \(\alpha = n/m\), the expected number of probes in an unsuccessful search is \(1/(1-\alpha)\), and that in a successful search is \(\frac{1}{\alpha} \ln \frac{1}{1-\alpha}\).

5. Perfect hashing

If the set of keys is static, \(O(1)\) memory accesses can be satisfied with perfect hashing technique. The structure is similar to hash tables with chaining. Instead of making a linked list of the keys hashing to the same slot, we use a small secondary hash table. By choosing the hash function carefully, we can guarantee that there are no collisions.

The size of the secondary hash table is the square of the number of keys hashing to this slot. However the total size of hash table is \(O(n)\).