Amazon Interview Question
For SDE-1s
Country: India
Interview Type: Phone Interview
A trie works well here: store each word's count in its terminal node, and keep the current maximum count and word in a pair of variables, updating them whenever a word's new count exceeds the maximum.
If the N most frequent words have to be found, use a heap ordered by frequency to keep track of the words.
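A minimal sketch of the top-N idea, assuming the word counts have already been collected into a map (the class and method names here are illustrative). Instead of popping N times from a max-heap over all words, it keeps a size-N min-heap and evicts the smallest count, which bounds memory at O(N):

```java
import java.util.*;

public class TopNWords {
    // Returns the n most frequent words from the given counts,
    // most frequent first. The min-heap never grows past n entries.
    static List<String> topN(Map<String, Integer> counts, int n) {
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) heap.poll(); // evict current minimum
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result); // heap pops smallest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("cat", 2);
        counts.put("sat", 1);
        System.out.println(topN(counts, 2)); // [the, cat]
    }
}
```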
A simple HashMap-based solution with O(n) complexity:
package amazon.test.practice;

import java.util.*;
import java.io.*;

public class Wordcount {
    public static void main(String[] args) throws FileNotFoundException {
        Map<String, Integer> hm = new HashMap<>();
        Scanner input = new Scanner(new File("c:\\1.txt"));
        String max = "none"; // sentinel, replaced by the first real word
        hm.put(max, 0);
        while (input.hasNext()) {
            String word = input.next();
            // Increment the word's count, inserting it with count 1 if new.
            if (hm.containsKey(word)) {
                hm.put(word, hm.get(word) + 1);
            } else {
                hm.put(word, 1);
            }
            // Track the running maximum so no second pass is needed.
            if (hm.get(word) > hm.get(max)) {
                max = word;
            }
        }
        System.out.println(max + " ------- " + hm.get(max));
    }
}
In my opinion, it depends on the environment you have, how frequently you have to do this, and what the expected turnaround time is.
If the requirement is to do this often and you have a client/server environment with multiple workers, some sort of MapReduce might be an alternative. A server creates chunks (splits) of the given file and passes each chunk's boundaries (from, to) to the workers; some workers do the "map" and others do the "reduce".
In the map phase, each worker builds a mapping of words to occurrences for its chunk, then writes it to an intermediate store (e.g. disk).
In the reduce phase, workers take the map results produced by the different workers and combine them into a final result.
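The two phases above can be sketched in a single process; this is only an illustration of the map/reduce split, assuming the chunks have already been read from the file (the class name and sample chunks are made up):

```java
import java.util.*;
import java.util.stream.*;

public class MapReduceWordCount {
    // Map phase: each chunk of words becomes its own word->count map.
    static Map<String, Long> mapChunk(List<String> chunk) {
        return chunk.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    // Reduce phase: merge the per-chunk partial maps into one result.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> p : partials)
            p.forEach((w, c) -> total.merge(w, c, Long::sum));
        return total;
    }

    public static void main(String[] args) {
        // Two "workers", each handling one chunk of the file.
        List<List<String>> chunks = List.of(
                List.of("a", "b", "a"),
                List.of("b", "a"));
        List<Map<String, Long>> partials = chunks.parallelStream()
                .map(MapReduceWordCount::mapChunk)
                .collect(Collectors.toList());
        System.out.println(reduce(partials)); // {a=3, b=2}
    }
}
```

In a real deployment the partial maps would travel through the intermediate store rather than a local list, but the merge logic is the same.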
Iterate through the file reading one word at a time and record each word in a dictionary.
Case 1: the word is already present in the dictionary - increment its value.
Case 2: the word is not present - add the word as a key with value 1.
C# code:
// Assumes 'file' is an open StreamReader over the input file.
Dictionary<string, int> dicWordCount = new Dictionary<string, int>();
string w;
while ((w = file.ReadLine()) != null)
{
    if (dicWordCount.ContainsKey(w))
    {
        dicWordCount[w]++; // increment the value by one
    }
    else
    {
        dicWordCount.Add(w, 1);
    }
}

// Now iterate through the entries and find the maximum value.
int maxOccurrence = 0;
string keyWord = "";
foreach (string s in dicWordCount.Keys)
{
    int count = dicWordCount[s];
    if (maxOccurrence < count)
    {
        maxOccurrence = count;
        keyWord = s;
    }
}
If we say huge file, then it can grow arbitrarily large, and even the number of unique elements can exceed what fits in memory (whether you use a trie or a hash table). So I think:
- Ajeet May 17, 2014

The obvious answer is of course to keep a hash map/trie and store a counter of the occurrence of elements as you move through the file. This is (in terms of time complexity) the optimal solution.
If, however, your space requirements are tight, you can perform an external sort on the file and then find the longest consecutive run of equal elements. This has a constant memory footprint beyond the sort itself and can be done in Θ(n log n).
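The second pass over the sorted output can be sketched as follows. After an external sort, equal words are adjacent, so the most frequent word is the longest consecutive run; this scan keeps only the current and best runs in memory (the class name and the in-memory iterator standing in for the sorted file are illustrative):

```java
import java.util.*;

public class LongestRun {
    // One pass over sorted input, O(1) memory: track the current run
    // and remember the longest run seen so far.
    static String mostFrequent(Iterator<String> sortedWords) {
        String best = null, current = null;
        int bestLen = 0, runLen = 0;
        while (sortedWords.hasNext()) {
            String w = sortedWords.next();
            if (w.equals(current)) {
                runLen++; // run continues
            } else {
                current = w; // new run starts
                runLen = 1;
            }
            if (runLen > bestLen) {
                bestLen = runLen;
                best = current;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stands in for the stream produced by an external merge sort.
        Iterator<String> it =
                List.of("apple", "bee", "bee", "bee", "cat", "cat").iterator();
        System.out.println(mostFrequent(it)); // bee
    }
}
```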