Amazon Interview Question
SDE1sCountry: India
Interview Type: In-Person
Create a BST with words from the Files. Each Node in the tree will have below structure
Node{
String word;
String[] fileName;
int count
int[] line
}
For each word increment the count and insert the File name and line# in node structure
Once you are done creating the tree. Remove all Nodes that have count as 1.
Take first one million integer and merge it with next file which will be total 2 million records and now sort it which will be nLog(N) less than 3 million .
Now pick the 1 million records from here and merge with the next file and repeat the process. At last you will have the 1 million sorted records.
Since accessing files for each number is expensive, and since we have enough memory to hold 3 million, maintain arrays for each file in memory.
Fill the arrays with numbers from corresponding files until memory limit is reached (almost).
Now maintain a min-heap of 10 numbers (10 is number of files) and output the smallest number. Move the index in corresponding array from where the smallest number was sourced. Add the next element in this array to the heap. Continue this process until 1 million smallest numbers were found in sorted order.
When any one file array in memory goes empty, refill the array with numbers from corresponding file.
Please correct me if I am wrong.
Steps:
1) You make a file list array with all file names.
2) Process one by one.
3) for each file:
Find a word and store the word and its corresponding line number and file name in a Key Value store.
A Hashmap of type <String, String[]>
String=word in the file.
String[0]= Count of word occurence
String[1]= Line Number
String[2]=Latest File name where the word appears (this is to handle the case where a single word appears in different files)
Each time a word appears increment count and update the line number and file name(if different)
We can use tries here instead of BST
- Brandy May 24, 2014