Amazon has two log files that

Amazon Interview Question for Software Engineer / Developers

1

of 1 vote

7
Answers
Amazon has two log files that have visitor ids not sorted in any order. One is for today and one for yesterday. Find userid common to them. The files have millllllions of entries in both of them and wont fit in memory..either of them.
- min2 June 30, 2008 | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

4.Was there any range of values of visitor id given in question?Can You please explain how did you approach this question.

- jack July 01, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

sry i mean to say question 5

- jack July 01, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@Jack,

There were no range provided. I initially thought of answering the standard unix command and she said you cant use it. Then i asked her if i could use external memory and she said no. Then, not even required, the first solution I told her was to have a hashmap constructed of the smaller of the files and then use hash function to check the entries in both the files by scanning the second file.

Then I mentioned since we dont have external memory this solution wont work.

Then I proposed that first we need to sort both the files independently using the merge sort, because it breaks the files in smaller segments and sorts them and finally merges them. After each merge sort we have two sorted file.

Then I told her, assuming that there are 10000 elements that can be loaded in memory at a single time,I would take 5000 from each file say file 1 and file 2.

The top most element of file 1(say a) is compared to last element(say b i.e. 5000 element) of file 2. If a >b, I dont need to compare any more elements of list b. I take another next 5000 elements from file 2.

Similarly, if first element of file 2 say c is greater than 5000th element of file 1 say d i.e. c>d you dont need to compare more elements. In that case you take next 5000 elements of file 1.

If both the conditions are false, you need to compare each element from file to file 2 in that 5000 numbers and then get next 5000 from each list.

This approach, was just to decrease number of comparisons in some way. There might be more efficient ways, but this is the best I could think of that spur of time.

- min2 July 01, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

For 3), I guess we need to first sort one file using external merge sort. then create an index in the available memory (such as the range etc) which will essentially be our hash. Then we seek to that particular value in the sorted file and look around for the userid.

- nanmaga October 09, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Do an iteration of number of ascii characters.
In i-th iteration, select id's having the first character as i from each file. Store id's from first file in binary search tree. Compare with binary search tree the id's of second file.
Output those id's which match.
Is this correct?

- Anym October 17, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Can you build the hash on disk, using lseek64? It should allow the file offset to be set beyond the end of the existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.

This on-disk hash would store offsets in the first file. Now all I need is a good scheme for dealing with hashing conflicts...

- cristi.vlasceanu June 04, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Can we just use

egrep -f logfile_today logfile_yesterday

- sumansmita June 20, 2009 | Flag Reply

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews