Microsoft Interview Question Software Engineer / Developers

  • microsoft-interview-questions
    0
    of 0 votes
    8
    Answers

    You have stream of words coming, which data structure should be used to store the words efficiently and to answer word queries and maintain top k high frequency words.

    - Khoa on December 15, 2007 Report Duplicate | Flag
    Microsoft Software Engineer / Developer Algorithm



Comment hidden because of low score. Click to expand.
0
of 0 vote

I guess we will have to maintain 2 data structures, one can be a self balancing tree and the other a hash table. The first tree will store the word as they come in sorted order so that the words can be inserted and extracted in O(lgn) time. Along with that whenever a word is inserted its count is incremented. So then in order to select the topmost k high frequency words we have to just select the top k largest counts from the hashtable which can be done using Randomized Quicksort in O(n) time. Instead of a hashtable we can also use a self balancing tree whose nodes will have a count variable and size variable where size will denote the total no of nodes beneath that node including the node itself. And then we find the kth Largest element in using the tree and thus obtains pointers to top k high frequency words.

Anyone with a better Solution????

- gauravk.18 on April 09, 2008 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

I feel a hashtable and a min heap will do a better job.

As the words are coming maintain the hash, and a min heap of size k.
Suppose a new word comes, then add it to hash and if already present increase its count value to say x.
Now suppose the top element in heap is having a count y.
If x > y
then replace y with x and rebuild the heap.
else do nothing

here time complexity after reading n words = O(nlogk)

- algooz on April 16, 2008 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

A PAT-tree or patricia tree would do the job i think

- Anks.. on June 07, 2008 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

yes, I also think PAT-tree is the most appropriate here: due to storage efficiency (one doesn't need to store all common prefixes) in contrast to hash tables as well as performance issues due possible hash conflicts. Additionaly, we need to store word frequencies in leaf nodes to maintain the heap of k most frequent ones.

- 111 on June 14, 2008 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

While I agree that PAT-tree would be the best in storage and probably in the searching of the new string that is coming.

however I do not agree that storing the count in the leaf node is going to help. How would you search for the K most common words? In this way you would have to traverse the whole tree.

- AD on September 29, 2008 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

hw abt a trie or a DWAG which combines common suffixes also

- Anonymous on October 28, 2008 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

I think we have to maintain two data-structures.."Height balanced BST(AVL)" and "Min Heap"
Struct of BST contains extra count variable..which incremented on every occurence of duplicated string...
After insertion in BST insert the same string in min heap.
Create Min heap wrt updated count value associated with each string...
Heap size is "a[K]"
If heap is full delete the first element from heap(minimum count value)...
At the end u have "k" frequently occurring nodes.
Time complexity..
O(nlogn) + O(nlogk) = O(nlogn)

- Shwetank Gupta on November 20, 2008 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

I think by high frequency, it means that the number of times the word is queried the most and not the number of times it is coming. I think to store the words efficiently, it should be stored first in an AVL tree as the words are coming. The node should contain an extra info about the number of time the word is queried. Now upon every query, the number of corresponding node is incremented and then based upon this number the subtree of its parent should be rebalanced such that if the number is greater than the corresponding query number of its parent, this node should be pushed up as far as it satisfies. Hence this way its both kind of working as a BST wrt wrod as well as heap wrt number. This way usually a high frequency number should relatively be high on the level of tree as well since it will be a BST, the worst case search would be O(n).

- Anonymous on November 23, 2008 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book walking you through every aspect of getting a job at a top tech company, while focuses on software engineering interviews.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More