Amazon Interview Question
Software Engineer / Developers@Jack,
There were no range provided. I initially thought of answering the standard unix command and she said you cant use it. Then i asked her if i could use external memory and she said no. Then, not even required, the first solution I told her was to have a hashmap constructed of the smaller of the files and then use hash function to check the entries in both the files by scanning the second file.
Then I mentioned since we dont have external memory this solution wont work.
Then I proposed that first we need to sort both the files independently using the merge sort, because it breaks the files in smaller segments and sorts them and finally merges them. After each merge sort we have two sorted file.
Then I told her, assuming that there are 10000 elements that can be loaded in memory at a single time,I would take 5000 from each file say file 1 and file 2.
The top most element of file 1(say a) is compared to last element(say b i.e. 5000 element) of file 2. If a >b, I dont need to compare any more elements of list b. I take another next 5000 elements from file 2.
Similarly, if first element of file 2 say c is greater than 5000th element of file 1 say d i.e. c>d you dont need to compare more elements. In that case you take next 5000 elements of file 1.
If both the conditions are false, you need to compare each element from file to file 2 in that 5000 numbers and then get next 5000 from each list.
This approach, was just to decrease number of comparisons in some way. There might be more efficient ways, but this is the best I could think of that spur of time.
Can you build the hash on disk, using lseek64? It should allow the file offset to be set beyond the end of the existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.
This on-disk hash would store offsets in the first file. Now all I need is a good scheme for dealing with hashing conflicts...
4.Was there any range of values of visitor id given in question?Can You please explain how did you approach this question.
- jack July 01, 2008