Amazon Interview Question for Software Engineer / Developers


Team: Kindle
Country: United States
Interview Type: In-Person





Actually the interviewer was right that it is not optimal.
The reason is that you have to hash a word (first pass) and then, after the lookup, compare it with the already existing one (second pass).

The optimal solution is to use a tire and add the words to it. When you reach the end leaf, increase the counter and compare it to the current top 3 (starting with the last, least popular item). If the word is already in the top 3, then just increase the array's value and check it against the next item; if it's larger, swap them.

The top-3 array can be substituted with a linked list which always drops the last item, but that requires a lot of garbage collection, so a fixed array is preferable.

Generally, hash tables are really good, but keep in mind that their average collision rate with strings is high and they require resizing all the time.

On the other hand, tries are static; there is no need to resize them.
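
For illustration, a minimal sketch of that trie-plus-fixed-array idea in Java (the names TrieNode, TopThreeCounter, topWords and topCounts are made up for this sketch):

import java.util.HashMap;
import java.util.Map;

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    int count;                                  // occurrences of the word ending at this node
}

class TopThreeCounter {
    private final TrieNode root = new TrieNode();
    private final String[] topWords = new String[3];   // most frequent first
    private final int[] topCounts = new int[3];

    void add(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
        }
        node.count++;                           // reached the end leaf: bump the counter
        update(word, node.count);
    }

    // Compare the word against the current top 3, starting from its own slot if present,
    // otherwise from the last (least popular) slot.
    private void update(String word, int count) {
        int i = 2;
        for (int j = 0; j < 3; j++) {
            if (word.equals(topWords[j])) { i = j; break; }
        }
        if (!word.equals(topWords[i]) && count <= topCounts[i]) return;
        topWords[i] = word;
        topCounts[i] = count;
        while (i > 0 && topCounts[i] > topCounts[i - 1]) {   // swap upwards while larger
            int c = topCounts[i]; topCounts[i] = topCounts[i - 1]; topCounts[i - 1] = c;
            String w = topWords[i]; topWords[i] = topWords[i - 1]; topWords[i - 1] = w;
            i--;
        }
    }

    String[] top() { return topWords.clone(); }
}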

- Adam January 31, 2012 | Flag Reply

Two passes are not required if we maintain the current top 3 as we go through the book hashing words. Also, inside the hash map, every word is paired with its word count.
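
A rough sketch of that single-pass idea (OnePassTop3 and topThree are made-up names; the input is assumed to be already split into words):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OnePassTop3 {
    static List<String> topThree(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> top = new ArrayList<>(3);            // current top 3, most frequent first

        for (String w : words) {
            int c = counts.merge(w, 1, Integer::sum);     // hash and count in one step
            top.remove(w);                                // drop the word's stale position, if any
            int i = 0;
            while (i < top.size() && counts.get(top.get(i)) >= c) i++;
            top.add(i, w);                                // re-insert at its new rank
            if (top.size() > 3) top.remove(3);            // keep only the best three
        }
        return top;
    }
}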

- Dr.Sai February 01, 2012 | Flag

.. where by 'tire' you mean 'trie', right? Also, you don't need any particular structure to keep the top 3; they could just be separate variables. But an array of 3 is probably more convenient.

- JeffD February 03, 2012 | Flag

A min heap of size 3 should help to track the top 3.
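
For example (a sketch; the counting pass and the class name HeapTop3 are assumptions, the size-3 min-heap is the point):

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HeapTop3 {
    static PriorityQueue<Map.Entry<String, Integer>> topThree(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // Min-heap ordered by count: the root is always the weakest of the current top 3.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > 3) heap.poll();             // evict the smallest of the four
        }
        return heap;                                      // the 3 most frequent words remain
    }
}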

- PK February 05, 2012 | Flag

Not optimal in which sense?

Theorem:
Every correct algorithm that solves this problem is Omega(N).

Proof:
Assume not. That is, assume there exists an algorithm O* such that, for every input I, O* finds the top 3 most frequently used words in a text file in time asymptotically faster than N, i.e., in o(N).

Then, for large enough inputs, O* cannot check every single word in the file; that is, it decides the top-3 property without looking at every word in the input. So there exists at least one word that is never examined. It is then easy to build an instance K in which every word occurs equally often and that single unexamined word is the tie-breaker, so the algorithm produces an incorrect result because it never considered that word, contradicting our assumption that O* is correct.

What was the running time of your algorithm?
And what about map-reduce? Wouldn't this be a good fit?

- I.E January 31, 2012 | Flag Reply

Can you explain it more clearly?

- anon February 18, 2012 | Flag

You are correct, I.E, in the sense that you are only considering time complexity. The hashing technique would give exponential, O(26^N), space complexity, which is clearly unacceptable. In fact, a two-trie is better than even a trie for this case, as it has the time complexity of a trie but better space complexity than a trie's.

- Saksham March 21, 2012 | Flag

- Use an unordered hash map + a min heap.

- Use an ordered map (like STL map, which uses a balanced tree); see the sketch below.
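
The second option in Java might look like this, with TreeMap standing in for std::map, plus one scan over the distinct words to pick the top 3 (OrderedMapTop3 is a made-up name):

import java.util.Map;
import java.util.TreeMap;

class OrderedMapTop3 {
    static String[] topThree(Iterable<String> words) {
        // TreeMap is the Java analogue of std::map: a balanced tree keyed by the word.
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        String[] top = new String[3];
        int[] best = new int[3];
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            for (int i = 0; i < 3; i++) {                 // insert into the fixed top-3 slots
                if (c > best[i]) {
                    for (int j = 2; j > i; j--) { best[j] = best[j - 1]; top[j] = top[j - 1]; }
                    best[i] = c;
                    top[i] = e.getKey();
                    break;
                }
            }
        }
        return top;
    }
}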

- Rayden January 31, 2012 | Flag Reply

Use a max heap

- Anonymous January 31, 2012 | Flag Reply

- Use a list of pairs<string,int>, where the int holds the count of that particular word. Now sort using the second element; see the sketch below.
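
In Java that could be sketched as follows (PairSortTop3 is a made-up name; Map.Entry plays the role of the pair):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PairSortTop3 {
    static List<Map.Entry<String, Integer>> topThree(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // The list of (word, count) pairs, sorted by the second element, descending.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>(counts.entrySet());
        pairs.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return pairs.subList(0, Math.min(3, pairs.size()));
    }
}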

- tarun January 31, 2012 | Flag Reply

Sorry, iPad autocorrection did not help :)

- Adam January 31, 2012 | Flag

Actually, the interviewer is not right.

It doesn't matter whether you do one or two passes over the data; the number of passes is a constant, so the interviewer is NOT correct.

Both O(2N) and O(N) are O(N).

The solution the OP provided is asymptotically as good as the so-called 'optimal' solution from the interviewer.

- I.E January 31, 2012 | Flag

Errrr... There is time complexity and there is space/memory complexity. So you are right that there are quite a few O(N) algorithms (in time!), but we should consider memory usage as well.

Without knowing the overall number of words, the number of distinct words, their distribution, etc., it is hard to say more.

The funny side of it is that the answer is most probably always "a", "the", "in" for books written in English (or something like that) :-) and, if I am not mistaken, a very similar list of the 3 most common words can be compiled for any language.

- Selmeczy, Péter January 31, 2012 | Flag

You are absolutely right.
That is why I asked the OP in which sense it is not optimal.

I admit I would have answered in a similar manner. I am certainly not an expert, but I think that as the file size increases, a map-reduce approach could be a sweet solution; this problem seems tailored for some kind of parallel processing.
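
On a single machine, the map-reduce flavour of this can be approximated with Java parallel streams (a sketch only; ParallelWordCount is a made-up name, and a real Hadoop/Spark job would run the same map and reduce steps across machines):

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelWordCount {
    // "Map" each word to (word, 1), "reduce" by summing per word, then take the top 3 by count.
    static List<Map.Entry<String, Long>> topThree(Stream<String> words) {
        Map<String, Long> counts = words.parallel()
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(3)
                .collect(Collectors.toList());
    }
}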

- I.E January 31, 2012 | Flag

For god's sake, stop saying "max" heap. How are you planning to find out which node to remove when something bigger than the root comes? Are you going to traverse the whole heap to find the minimum?

- Rayden January 31, 2012 | Flag Reply

What?
I think they mean:
=> Get the word counts, e.g. via a dictionary.
=> Max-heapify the (word, count) entries.
=> Delete the maximum 3 times (see the sketch below).
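
A sketch of that reading (MaxHeapTop3 is a made-up name; PriorityQueue.addAll inserts one element at a time rather than doing a bottom-up heapify, but the resulting top 3 is the same):

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class MaxHeapTop3 {
    static String[] topThree(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);

        // Build a max-heap of all (word, count) entries, keyed by count...
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        heap.addAll(counts.entrySet());

        // ...then delete the maximum three times.
        String[] top = new String[3];
        for (int i = 0; i < 3 && !heap.isEmpty(); i++) top[i] = heap.poll().getKey();
        return top;
    }
}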

- kill February 01, 2012 | Flag

Hello,
For such a question, is code not required? No one has posted complete code so far. Or do they just ask for the idea and the algorithm?

- monty March 27, 2012 | Flag Reply

Hash up the text and then apply selection sort on it for the k-th largest value.

- priyank November 07, 2012 | Flag Reply

Here is the code I wrote using a prefix tree (trie). This program will tell you the n most frequently used words. For simplicity I have not read from a file; I just read from an array. If you want to read from a file, read it line by line, split each line into an array of words, and add each word using this program.

package example.concepts;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class TrieTreeExample {

	public static void main(String[] args) {
		
		int n = 3; // No. of most frequently occurring words to track
		TriTree tree = new TriTree(n);
		String[] words = {"prashant","pras","pras","pras","prashant","pras","abc","test","test","prashant","abc","hhh","bbb","bbb","pras","bbb","prashant","prashant","ab","ab","ab","pras"};
        for (int i = 0; i < words.length; i++)
        	tree.insertWord(words[i]);
       
     //   tree.printTree(0, new Character[50], tree.root);
      tree.printTree(0, new StringBuffer(), tree.root);
       
        System.out.println("Is exist :" + tree.findWord("pras")); 
        System.out.println("Max occured word is : " + tree.maxWord1 + " Count is :" + tree.maxWordCount1);
        System.out.println("Max occured word is : " + tree.maxWord2 + " Count is :" + tree.maxWordCount2);
        System.out.println("Max occured word is : " + tree.maxWord3 + " Count is :" + tree.maxWordCount3);
        System.out.println("Max Word :" + Arrays.asList(tree.maxWord));
        System.out.print("Max Count : [ ");
        for(int i =0; i < tree.maxCount.length;i++)
        {
        	System.out.print(tree.maxCount[i] + " ");
        }	
         System.out.println(" ]"); 
	}

}

class TriTree
{
	TriNode root;
	int maxCount[];
	String maxWord[];

	TriTree(int n)
	{
		root= new TriNode();
		maxCount = new int[n];
		maxWord = new String[n];
	}
	
	void insertWord(String word)
	{
		if (root==null)
		{
			System.out.println("Tree does not exist...");
		}	
		boolean isExist=true;
		TriNode current= root;
		Character character;
		for(int i=0; i < word.length();i++)
		{
			character = word.charAt(i);
			if (current.links.get(character) == null)
			{
				current.links.put(character, new TriNode(character, i==(word.length()-1)?true:false,1));
				isExist=false;
			}		
			current = current.links.get(character);
		}	
		if(!current.fullWord)
		{
			isExist=false;
			current.fullWord=true;
		}	
		else if (isExist && current.fullWord)
		{
			current.countOfChar++;
		}	
		
		findMaxWord(current, word);
		
	}
	
	// Keep maxWord/maxCount (most frequent first) in sync after the count of
	// 'word' has just changed to current.countOfChar.
	void findMaxWord(TriNode current, String word)
	{
		// If the word already occupies a slot, update it in place; otherwise it
		// can only displace the last (least frequent) slot.
		int pos = maxCount.length - 1;
		for (int i = 0; i < maxWord.length; i++)
		{
			if (word.equalsIgnoreCase(maxWord[i]))
			{
				pos = i;
				break;
			}
		}
		if (!word.equalsIgnoreCase(maxWord[pos]) && current.countOfChar <= maxCount[pos])
		{
			return; // not frequent enough to enter the top-n list
		}
		maxCount[pos] = current.countOfChar;
		maxWord[pos] = word;
		// Bubble the updated entry up while it beats the entry above it.
		for (int i = pos; i > 0 && maxCount[i] > maxCount[i - 1]; i--)
		{
			int tmpCount = maxCount[i];
			maxCount[i] = maxCount[i - 1];
			maxCount[i - 1] = tmpCount;
			String tmpWord = maxWord[i];
			maxWord[i] = maxWord[i - 1];
			maxWord[i - 1] = tmpWord;
		}
	}
	boolean findWord(String word)
	{
		if (root==null)
		{
			System.out.println("Tree does not exist...");
		}	
		TriNode current= root;
		Character character;
		int i;
		for(i=0; i < word.length();i++)
		{
			character = word.charAt(i);
			if (current.links.get(character) == null)
			{
				return 	false;
			}		
			current = current.links.get(character);
		}
		
		if (i == word.length() && current==null)
		{
			return false;
		}	
		else if (current!=null && !current.fullWord)
		{
			return false;
		}	
		else
		{
			System.out.println("count of word " + current.value + " is :" + current.countOfChar); 
		}
			
		return true;
	}
	void traverseTree()
	{
		
	}
	void printTree(int level, Character branch[], TriNode root)
	{
		 
		if (root==null)
		{
			return;
		}
		Set<Character> setKeys = root.links.keySet();
		for (Character key: setKeys)
		{
			branch[level]= root.value;
			printTree(level+1, branch, root.links.get(key));
		}
		if (root.fullWord)
		{
			branch[level]=root.value;
			for (int i=1; i <=level;i++)
			{
				System.out.print(branch[i]);
			}	
			System.out.println();
		}	
		
	}
	void printTree(int level, StringBuffer buf, TriNode root)
	{
		 
		if (root==null)
		{
			return;
		}
		Set<Character> setKeys = root.links.keySet();
		for (Character key: setKeys)
		{
			buf.insert(level, root.value);
			printTree(level+1, buf, root.links.get(key));
		}
		if (root.fullWord)
		{
			buf.insert(level, root.value);
			System.out.print("Word :");
			for (int i=1; i <=level;i++)
			{
				System.out.print(buf.charAt(i)); 
			}	
			System.out.print("  Count : " + root.countOfChar); 
			System.out.println();
		}	
		
	}

}
class TriNode
{
	Character value;
	boolean fullWord;
	Map<Character,TriNode> links;
	int countOfChar;
	TriNode()
	{
		value ='\0';
		fullWord=false;
		countOfChar=0;
		links = new HashMap<Character,TriNode>();
	}
	TriNode(Character value, boolean isFullWord, int countOfChar)
	{
		this.value = value;
		fullWord=isFullWord;
		this.countOfChar=countOfChar;
		links = new HashMap<Character,TriNode>();
	}
	
	
}

- Prashant April 12, 2013 | Flag Reply

One possible solution:
Sort the words of the file, which is O(N*log(N)), and then with a single pass we can find the top 3 words.
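
A sketch of that approach (SortAndScanTop3 is a made-up name; sorting makes equal words adjacent, so one scan over the runs suffices):

import java.util.Arrays;

class SortAndScanTop3 {
    static String[] topThree(String[] words) {
        Arrays.sort(words);                               // identical words become adjacent runs

        String[] top = new String[3];
        int[] best = new int[3];
        int run = 0;
        for (int i = 0; i < words.length; i++) {
            run++;
            // End of a run: we are at the last word, or the next word differs.
            if (i == words.length - 1 || !words[i].equals(words[i + 1])) {
                for (int s = 0; s < 3; s++) {             // insert the run into the top-3 slots
                    if (run > best[s]) {
                        for (int j = 2; j > s; j--) { best[j] = best[j - 1]; top[j] = top[j - 1]; }
                        best[s] = run;
                        top[s] = words[i];
                        break;
                    }
                }
                run = 0;
            }
        }
        return top;
    }
}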

- Anonymous January 31, 2012 | Flag Reply

Correct, dude. The best and optimal solution.

- Amit September 27, 2012 | Flag

