Yahoo Interview Question Software Engineer / Developers
0of 0 votesGiven a file, return the Top 5 frequently occuring list of words
Our map should be of the form <word, pair<count, pointer>>. This pointer should be an index to a heap structure of size k (5 here). We should maintain a min - heap, every time the count of a word is greater than root of the heap we remove the minimum count seen and replace it with the current value and max-heapify. Cost of max-heapify is lg k. Even if we have to max-heapify after reading every word, the complexity will come out to be O(n lg k) whereas in case of sorting it'd be O (n lg n).
1)read the document and build a tree of words with its count
2)the bst should contain words in lexicographical ordering
3)increment each count of the word if it is fount in the tree otherwise add it in the tree
4)sort this tree.
novel approach but building a tree and searching for the contents of the tree takes O(log n) time.. hashing is preferable? Insertion and searching takes O(1)

~ read the document - O(n)
- son_of_a_thread on August 03, 2011 Edit | Flag Reply~ maintain a hashmap, and hash the strings as keys with initial count of 1 if the current term isnt present else increment the bucket count by 1 - O(1)
~ sort the map by values - O(n log n)