CareerCup

Microsoft Interview Question for Software Engineer / Developers

0

of 0 votes

7
Answers
R3 | Q4. You have a file with million words in it. Find most frequent 10 word in that file. Node that you can store all word in memory.
(Note : Min-Heap + List )
- dutta.dipankar08 May 19, 2014 in India for MS Office Platform | Report Duplicate | Flag | PURGE
Microsoft Software Engineer / Developer Algorithm

Email me when people comment.

Email me when people comment.

Loading...

An error occurred in subscribing you.

Team: MS Office Platform
Country: India
Interview Type: In-Person

More Questions from This Interview

Email me when people comment.

Email me when people comment.

Loading...

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

1

of 1 vote

Lets say we have N words and K = 100.
First, I would use a hash map to count the number of occurrences of each word. A Trie is another option but it consumes a ton of memory.

Then, we have 2 options:
1) Process all <word, frequency> pairs. Use a min-heap to keep the current 10 most frequent words. If the current word has a higher frequency than the minimum of the heap, remove the minimum and add this word. Finally, extract the 10 words from the heap. This will take O(N log K) time and O(N + K) space.

2) Use the median of medians selection algorithm (search wikipedia "Median_of_medians") to find the 10th highest frequency. As a result, we find the 10th highest and the top 9 are the words to the right of the 10th word.
This will take O(N) time and O(N) space.

- Miguel Oliveira May 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

0

of 0 votes

Ups, K = 10

- Miguel Oliveira May 20, 2014 | Flag

Comment hidden because of low score. Click to expand.

0

of 0 votes

Wouldn't the min heap option take only O(k) space - rather than O(n + k) - since we're only storing the top 10 items?

- Michael.J.Keating May 20, 2014 | Flag

Comment hidden because of low score. Click to expand.

0

of 0 votes

Sure but we also need O(n) for the hash map

- Miguel Oliveira May 20, 2014 | Flag

Comment hidden because of low score. Click to expand.

1

of 1 vote

Here ya go dude

import java.util.*;
import java.io.*;

public class FrequentWords {
	
	public static void main(String args[]) throws FileNotFoundException {
		FrequentWords mine = new FrequentWords();
		mine.getFrequent("Devil.txt");
	}
	
	
	public void getFrequent(String file) throws FileNotFoundException {
		Scanner input = new Scanner(new FileInputStream(file));
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		while(input.hasNext()) {
			String line = input.next();
			if (map.containsKey(line))
				map.put(line, map.get(line) + 1);
			else 
				map.put(line, 1);
		}
		input.close();
		Comparator<WFile> minComp = new minComparator();
		PriorityQueue<WFile> minHeap = new PriorityQueue<WFile>(minComp);
		for (String key: map.keySet()) {
			WFile word = new WFile(key, map.get(key));
			minHeap.offer(word);
		}
		for(int i = 0; i < 10; i++) {
			WFile word = minHeap.poll();
			System.out.println(word.word + " " + word.amount);
		}
	}
	
	private class WFile {
		String word;
		int amount;
		public WFile(String word, int amount) {
			this.word = word;
			this.amount = amount;
		}
	}
	
	private class minComparator implements Comparator<WFile> {
		@Override
		public int compare(WFile a, WFile b) {
			if (a.amount < b.amount) return 1;
			if (a.amount > b.amount) return -1;
			return 0;
		}
	}
	
}

- Sean Paul September 18, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

0

of 0 vote

Tries - How to get word frequency any fancy implementation ? Besides tries will take a lot of memory thats why they came with a variation called Burst tries.

Did the question poster actually meant a MAX-HEAP with word frequency count being the key value? Can you please explain the min-heap plus list approach ?

- P.D May 19, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-2

of 2 vote

Why don't we use trie tree. I think easier to implement than the min-heap. And it saves lots of memory.

- ravio May 19, 2014 | Flag Reply

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.