Microsoft Interview Question
Software Engineer / DevelopersMiss N I think you mean "Generalised suffix tree". I would rather not use this since it would have huge memory space requirement. Remember Suffix tree requires O(n) where n is the total size of all all the characters in each words, ex: Generalized suffix tree of String set {ABAB,BABA} requires 15 node approximately O(2N). Imagine creating generalized suffix tree for the whole dictionary???
If space is not an issue I would create hashtable out of the dictionary and query the word at O(1)
If space is an issue then I would create BST out of the dictionary and query the word at O(logN)
exactly Mat...
BST for low memory usage
Hashtable for fast access...also words can he hashed to a unique value using base 26 hash function
Erm.. I don't understand how BST will save you space. You still have to store all data which would be O( n). The lookup time is logn not the space
Considering a word dictionary - the database is huge, and not efficient enough to load everything in the memory. Hence, an approach would help where-in we store the key indices to find the words in the dictionary - similar to B trees which are hash based dictionary tree that would give search performance of O(1) for best hash property or worst performance of O(log n) with different block sizes.
Hence, I would suggest using B Trees with hash based index that gives the following
1) Search performance O(1) - worst O(log N) depending upon the hash property and block size selected.
2) Insert / delete for best hash property with zero collisions or at the worst O(log n). In this case insertion and deletions in a word dictionary is rare
3) Memory requirements are O(n) - as only hash indices are stored
@Anonymous
I said BST because a Hashtable needs more than O(n) space in practice. If the load factor is too high then you will have collisions more often so we need more free buckets. I guess in this case since dictionary is constant, we probably won't populate after initial step to build dictionary it's better to use hashtable.
Keeping in mind that on average long English word is about 10 letters long, trie won't take much space. Also bear in mind that words with common suffix shall share the branch. So I don't think memory overhead is huge. Add to it the capability of trie to provide suggestions for misspelled words, it makes sense.
we will use TRIE data structure. So, a dictionary of word can be represented using a TRIE data structure. In order to reTRIEve the word, we can apply DFS algorithm which will start with the first character of the word and will start at the root of the trie and work toward the leaf of the trie to find if the word is present or not.
- Miss N April 15, 2011I am not very sure if this is what they were expecting