Packed Hash
by John Kormylo
E.L.F. Software
Hashing takes a set of somewhat random inputs (usually text
strings) and computes index values. A perfect hash produces
unique, consecutive index values for every input in the set. To
this I add a second criterion, that every input NOT in the set
produce the same index (e.g., 0) so as to identify it as an
invalid input.
Most hashing methods involve chaining, which breaks the input
into keys (characters) and hashes each key separately. The
indexes produced by each step refer to a structure array
containing information for the next step in the chain. In this
case, a perfect hash would mean that there are no unused entries
in this array, and that inputs from the valid set always reach
their target in one try (no probing).
A packed hash uses an array of structures of the form
struct {
unsigned short parent; /* index of previous step */
unsigned short base; /* used to calculate next index */
unsigned short code; /* return value if no more keys */
} table[N];
(Obviously, when N < 32K one can use signed shorts, and when
N >= 64K one has to use longs.) The next index is given by
next = table[index].base - map[key];
where 'map[]' is optional. Structure 'table[0]' is used for the
first step of the chain, and 'table[0].parent' is used to store N
(for index range checking). If there are any unused structures
in the array, the 'parent' entry should be initialized to 0xFFFF
(or any value >=N, or <0 for when using signed indexes).
One can reconstruct the key value for a given table index using
key = unmap[table[table[index].parent].base + index];
where 'unmap[]' is the inverse of 'map[]'. Consequently, there
is no need to store key values in the structure.
What makes a perfect packed hash possible is that most sets of
inputs produce lots of "only children" (nodes with only one valid
key) which fill in the gaps formed in the remaining "families"
(groups of nodes with a common parent). Even so, a certain
amount of "packing" is usually required.
Mapping reduces the size of the gaps to be filled. For example,
mapping text characters in the order "jfgbdckmhleaoiuyrtpsnvwzxq"
works well for English words. The algorithm which produced this
mapping was designed to minimize the space needed to hold all the
"families" end-to-end (no packing), operating over a spell
checker dictionary. Note that the most commonly used letters are
in the center.
However, optimizing a mapping function is time consuming and
rarely needed. It is primarily of benefit when a perfect packed
hash does not exist.