The Quest for a Perfect Dictionary Format
                         Sorted Indexes
                         by John Kormylo
                         E.L.F. Software

Each word in the dictionary is assigned an index in  alphabetical 
order from 1 to N.   This is primarily used to display the  words 
in  browse mode.   It could also be used for later expansion  to 
match words to other information.

The original hash table included a count of the children for each 
node in order to compute these index values.   Specifically,  one 
could  search  every possible child for a node  (in  alphabetical 
order) until the desired letter is reached, and add the totals to 
the current index.

This  is  not a particularly fast method for hash  tables  (works 
great  for  binary search trees,  though).   A better  method  is 
possible  for 2 additional bytes/word.   Specifically,  for  each 
short  hash table (max 64K nodes,  or 32K words) one includes  an 
array  of node index values (in alphabetical order).   Given  the 
node one can easily construct the corresponding word.

The  count field in the node's data structure is replaced by  its 
alphabetical  index.   The sign bit is used now to indicate  that 
the  "base" field contains a suffix hash index.   To determine  a 
valid word termination,  look up the node index corresponding  to 
the  alphabetical index.   If it matches the current node  index, 
then the current node is a valid word termination.

It  should  be noted that even words not in  the  dictionary  are 
assigned  an index corresponding to the number of words ahead  of 
it  in  alphabetical order.   If the word is found  in  the  hash 
table,  but is not a valid termination, one simply uses the index 
in the current node.   If the child searched for does not  exist, 
one  then has to search for the next child in alphabetical  order 
and  use  it's index,  shifting to the parent node  if  none  are 
found.

One  will still have to search the long hash table to find  which 
short hash table to use.  This can be minimized using find_next() 
and find_prev() functions.

                     Combining Dictionaries

The  spell  checker  allows one to combine a  varying  number  of 
dictionaries at a time,  depending on context (e.g.  including  a 
dictionary  of TeX keywords).   While it is possible to map  each 
word to its dictionary and index,  an additional 4 bytes/word for 
each combination of dictionaries is not acceptable.

Given a word,  one can easily calculate the alphabetical index by 
adding  the index values for each dictionary.   But to  find  the 
word corresponding to a particular index, one will generally need 
a  binary  search.   Assuming that the main  dictionary  is  much 
larger than any of the others,  one would perform the search over 
index values for the main dictionary, until and unless some other 
dictionary winds up with more words in the range being searched.