Morgan Stanley Interview Question
Technical Support Engineers
Country: India
Interview Type: In-Person
I answered: I'd use a trie data structure to store words with their counts.
Each node in the structure has a character, child pointers, and a count variable.
I'd traverse the book character by character, and in the trie I'd keep moving from the root node toward the node for the corresponding character. Whenever the book has a whitespace character (a space, tab, etc., marking a word boundary), the counter of the last character's node is incremented. So in the end, any node's count gives the frequency of the word spelled out by the characters on the path from the root to that node.
Once the structure is built, traverse the tree to find the maximum count, then traverse again from the root to recover that word.
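The trie approach above can be sketched in Python roughly as follows (the node layout and function names are my own illustration, not from the original post):

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> child TrieNode
        self.count = 0      # how many times the word ending at this node occurred

def count_words(text):
    """Build a trie of word counts by scanning the text character by character."""
    root = TrieNode()
    node = root
    for ch in text + " ":          # trailing space flushes the final word
        if ch.isspace():
            if node is not root:   # word boundary: bump the last node's counter
                node.count += 1
                node = root
        else:
            node = node.children.setdefault(ch.lower(), TrieNode())
    return root

def most_frequent(node, prefix=""):
    """DFS over the trie; return (count, word) with the highest count."""
    best = (node.count, prefix)
    for ch, child in node.children.items():
        cand = most_frequent(child, prefix + ch)
        if cand[0] > best[0]:
            best = cand
    return best

root = count_words("the cat and the hat and the bat")
print(most_frequent(root))  # -> (3, 'the')
```

Space usage is proportional to the total number of distinct prefixes, which is why the thread goes on to debate hash maps and external sorting for very large books.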
Please post alternative and better solutions.
I had the same solution too. A hash table could also be a solution, but since the number of words would be huge and the range of possible words is effectively unbounded, a hash table is not in the picture.
What would be the character at the root? Are you proposing multiple trees or a single tree?
You can do an external merge sort on the complete book (the merge logic depends on the RAM available, since the book cannot be loaded into memory completely).
After the sorting is done, you need to read through the sorted output again and check for the highest-occurrence word.
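After an external sort, equal words sit next to each other, so the final step is a single linear pass that tracks the longest run. A sketch of that pass (the external sort itself is omitted; the function name is mine):

```python
def most_frequent_sorted(words):
    """Given an already-sorted iterable of words, return (word, count)
    for the longest run of equal adjacent words."""
    best_word, best_count = None, 0
    cur_word, cur_count = None, 0
    for w in words:
        if w == cur_word:
            cur_count += 1
        else:
            cur_word, cur_count = w, 1
        if cur_count > best_count:
            best_word, best_count = cur_word, cur_count
    return best_word, best_count

print(most_frequent_sorted(sorted("to be or not to be".split())))  # -> ('be', 2)
```

Because the pass only needs the current and best runs, it uses O(1) memory regardless of the book's size, which is the point of sorting externally first.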
This is a tricky question and is expensive to solve using traditional methods.
One alternative method is to use a large array, where each element is a pointer to a string.
Now read each string from the book. Take a random number in the range 0 to Max_Array_Size and replace the existing Array[randomNo] with the recently read string.
After reading all words from the book, check the array and find the string which occurs the maximum number of times.
I was asked a similar question and argued that the solution would not work. The interviewer explained the following:
1. The algorithm works most of the time. I forgot the algorithm's name.
2. The array size has to be chosen according to a formula, which I don't remember.
3. The algorithm is very fast and efficient.
4. It can be used in search-type applications like Google/Bing, where you don't need 100% accurate results.
5. If a word occurs many times in the book, there is a higher probability that the word is present in multiple places in the array.
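I can't name the exact algorithm either, and the sizing formula isn't reproduced in the post, so treat the following as a hedged illustration of the sampling idea as described: frequent words tend to occupy many slots of the fixed-size array, so the array's most common entry approximates the book's most frequent word.

```python
import random
from collections import Counter

def approx_most_frequent(words, array_size=1000, seed=0):
    """Probabilistic estimate of the most frequent word in a stream.

    Each incoming word overwrites a uniformly random slot; the more often
    a word occurs, the more slots it is likely to hold at the end."""
    rng = random.Random(seed)
    slots = [None] * array_size
    for w in words:
        slots[rng.randrange(array_size)] = w   # overwrite a random slot
    filled = [w for w in slots if w is not None]
    return Counter(filled).most_common(1)[0][0] if filled else None
```

For example, on a stream where "the" makes up half of all words, `approx_most_frequent` returns "the" with high probability, but it is only an estimate, matching point 4 above about applications that tolerate inexact answers.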
Since it is a very big book, internal sorting or an internal data structure may not work.
My solution is to split the whole book into K buckets and handle the buckets one by one.
For the first bucket, take the top 20% of words by frequency to build a max-heap.
When the next bucket comes in, maintain the max-heap.
After reading all buckets, the words at the top of the max-heap are the most frequent words.
I do not know if it works; I would be very happy to be corrected.
Another point to this answer: we can use the MapReduce model to optimize the execution.
Split the book into chunks;
assign different machines/threads to walk through their chunk, logging all words and their counts in a HashMap;
then summarize the results, tracking the running maximum as the per-worker maps are merged, so no extra iteration over the final map is needed.
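A single-machine sketch of that MapReduce idea (the chunking and function names are illustrative; a real deployment would run the map step on separate machines or threads):

```python
from collections import Counter

def map_chunk(chunk):
    """Map step: count words within one chunk (one worker per chunk)."""
    return Counter(chunk.lower().split())

def reduce_counts(partials):
    """Reduce step: merge per-chunk counters while tracking the running
    maximum, so no extra pass over the merged map is needed afterwards."""
    total = Counter()
    best = (None, 0)
    for part in partials:
        total.update(part)
        for word in part:
            if total[word] > best[1]:
                best = (word, total[word])
    return best

chunks = ["the quick fox", "the lazy dog", "the end"]
print(reduce_counts(map(map_chunk, chunks)))  # -> ('the', 3)
```

The map steps are independent and embarrassingly parallel; only the reduce step needs to see all partial counters.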
For natural-language words it is better to use a hash map (unordered_map in C++11) than a binary tree, as it is possible to create a good hash function that follows language rules (i.e., which vowels follow which consonants and vice versa in English), so the probability of collisions will be small.
How about a B-tree which stores the number of occurrences of each word?
- Ady, September 19, 2011
http://en.wikipedia.org/wiki/B-tree