The Quest for a Perfect Dictionary Format Sorted Indexes by John Kormylo E.L.F. Software Each word in the dictionary is assigned an index in alphabetical order from 1 to N. This is primarily used to display the words in browse mode. It could also be used for later expansion to match words to other information. The original hash table included a count of the children for each node in order to compute these index values. Specifically, one could search every possible child for a node (in alphabetical order) until the desired letter is reached, and add the totals to the current index. This is not a particularly fast method for hash tables (works great for binary search trees, though). A better method is possible for 2 additional bytes/word. Specifically, for each short hash table (max 64K nodes, or 32K words) one includes an array of node index values (in alphabetical order). Given the node one can easily construct the corresponding word. The count field in the node's data structure is replaced by its alphabetical index. The sign bit is used now to indicate that the "base" field contains a suffix hash index. To determine a valid word termination, look up the node index corresponding to the alphabetical index. If it matches the current node index, then the current node is a valid word termination. It should be noted that even words not in the dictionary are assigned an index corresponding to the number of words ahead of it in alphabetical order. If the word is found in the hash table, but is not a valid termination, one simply uses the index in the current node. If the child searched for does not exist, one then has to search for the next child in alphabetical order and use it's index, shifting to the parent node if none are found. One will still have to search the long hash table to find which short hash table to use. This can be minimized using find_next() and find_prev() functions. Combining Dictionaries The spell checker allows one to combine a varying number of dictionaries at a time, depending on context (e.g. including a dictionary of TeX keywords). While it is possible to map each word to its dictionary and index, an additional 4 bytes/word for each combination of dictionaries is not acceptable. Given a word, one can easily calculate the alphabetical index by adding the index values for each dictionary. But to find the word corresponding to a particular index, one will generally need a binary search. Assuming that the main dictionary is much larger than any of the others, one would perform the search over index values for the main dictionary, until and unless some other dictionary winds up with more words in the range being searched.