What data structure would you

Microsoft Interview Question for Software Engineer / Developers

0

of 0 votes

11
Answers
What data structure would you use to implement spell correction in a document. The goal is to find if a given word typed by the user is in the dictionary or not (no need to correct it).
What is the complexity? What if you have to support multiple languages/dictionaries?
- April 15, 2011 | Report Duplicate | Flag | PURGE
Microsoft Software Engineer / Developer

Email me when people comment.

An error occurred in subscribing you.

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

we will use TRIE data structure. So, a dictionary of word can be represented using a TRIE data structure. In order to reTRIEve the word, we can apply DFS algorithm which will start with the first character of the word and will start at the root of the trie and work toward the leaf of the trie to find if the word is present or not.
I am not very sure if this is what they were expecting

- Miss N April 15, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Miss N I think you mean "Generalised suffix tree". I would rather not use this since it would have huge memory space requirement. Remember Suffix tree requires O(n) where n is the total size of all all the characters in each words, ex: Generalized suffix tree of String set {ABAB,BABA} requires 15 node approximately O(2N). Imagine creating generalized suffix tree for the whole dictionary???

If space is not an issue I would create hashtable out of the dictionary and query the word at O(1)
If space is an issue then I would create BST out of the dictionary and query the word at O(logN)

- Mat April 16, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

exactly Mat...

BST for low memory usage
Hashtable for fast access...also words can he hashed to a unique value using base 26 hash function

- amm April 16, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Erm.. I don't understand how BST will save you space. You still have to store all data which would be O( n). The lookup time is logn not the space

- Anonymous April 17, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Considering a word dictionary - the database is huge, and not efficient enough to load everything in the memory. Hence, an approach would help where-in we store the key indices to find the words in the dictionary - similar to B trees which are hash based dictionary tree that would give search performance of O(1) for best hash property or worst performance of O(log n) with different block sizes.

Hence, I would suggest using B Trees with hash based index that gives the following

1) Search performance O(1) - worst O(log N) depending upon the hash property and block size selected.

2) Insert / delete for best hash property with zero collisions or at the worst O(log n). In this case insertion and deletions in a word dictionary is rare

3) Memory requirements are O(n) - as only hash indices are stored

- srinivas April 17, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Anonymous
I said BST because a Hashtable needs more than O(n) space in practice. If the load factor is too high then you will have collisions more often so we need more free buckets. I guess in this case since dictionary is constant, we probably won't populate after initial step to build dictionary it's better to use hashtable.

- Mat April 17, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Keeping in mind that on average long English word is about 10 letters long, trie won't take much space. Also bear in mind that words with common suffix shall share the branch. So I don't think memory overhead is huge. Add to it the capability of trie to provide suggestions for misspelled words, it makes sense.

- Ashish Kaila June 16, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

you can also use a compressed trie.

- foobar July 01, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Optimal Binary search tree's famous application is Dictionary & spelling correctiob

- Algorithmist April 17, 2011 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Specifically - AVL trees

- srinivas April 19, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

would not a DAWG be an option? As far as space is concerned it is much more optimal than TRIE.

- Anonymous October 17, 2011 | Flag Reply

CareerCup

Microsoft Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews