hash table (idea) by WolfKeeper

Hash Tables are used in computer software as a data structure for storing and retrieving data very quickly.

Data to be stored in a hash table require a key. For example a key for a person's data record might be their social security number.

Given the key, the hash table datastructure can find a piece of data in one or at most a few key comparisons (O(1) time independent of the size of the table or the number of entries in the table.)

At its heart a hash table consists of an array of entries. The number of entries in the hash table is usually bigger than the expected number of entries.

Each entry can be a linked list of the data (this is the normal case) or just the data you wish to access (rarer, this is known as an open address hash table).

To insert a data entry in the hash table, the hash function for the key of the data is run to generate an integer called a hash. The hash is then used modulo the length of the table and is used to index into the array. The key and data is then placed in the linked list (or in the case of an open addressed table, inserted in the hash table at that point- or as near as possible).

Similarly, to retrieve the entry the key is used, and hashed to determine the position to search for the entry. Once the entry is found, it is searched until the correct one is found. There's usually only 1 or a very few data records around or at that point in the table, so this is very fast.

One complication with hash functions is known as a collision. A collision occurs when two hashes resolve down to the same array location (which usually happens somewhat unless a perfect hash can be employed). With the linked list concept the key/data pair just gets added to the end of the same list. If open addressing is used, then the key/data pair is just placed in the next free hole in the array.

Because the hash function typically runs quickly and an array lookup is also a fast operation, inserting, deleting and retrieving from a hash table is very fast; the hash function is often the slowest operation performed. However, the comparison can be slow too. Note that issues such as paging and cache limitations can occur on large tables.

In practice, open addressing is rarely employed- the open addressed hash table has a maximum size, whereas linked list hash tables have no limit to their capacity except available memory; also the performance degrades much more slowly with linked lists as more entries are added (open addressed tables basically crawl to a halt as the table fills). Still, open addressed tables have certain advantages, and sometimes the extra space and extra indirection used by linked lists is bothersome.

Even though a hash table is a fairly wonderful data structure, it has been somewhat infrequently used- in most cases modern computers are fast enough to use linear searching for most problems. It's only when the combination of more than a few hundred data entries and extreme speed are called for that it is necessary to employ it. Still, languages like Perl have the data structure built in, and hence its use is in fact increasing.

For more details about hash tables, I recommend you consult The Art of Computer Programming by Donald Knuth.

hash function	The Art of Computer Programming	data structure	hash
linked list	Perfect Hash Functions	array	element
table	CPU	algorithm	Donald Knuth
Advanced C Programming	Perl: Hashes	collision	Loving someone else's wife
bucket	15-bit graphics	Hash bash	hash oil