You have yesterday's and today

Amazon Interview Question for Software Engineer / Developers

0

of 0 votes

17
Answers
You have yesterday's and today's log files that contains ids of users who logged in on amazon's website (there could be million entries). Design and implement way to find out user ids who logged in on both days.
- bobbysanders007 July 01, 2012 in United States | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Algorithm Data Structures

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 4 vote

Maintain a BST of customers. The node will contain customer id & a hash of size 2[2 days are there]. when a new customer logs in, search its existence in BST. If its there update the corresponding hash. If no,insert a new node in BST & update the corresponding hash entry.
Finally traverse the BST & output all customers with both days visited.
Note that i think this approach is dynamic because there is no limit on the number of customers. So, it would be better than hash where initially size is fixed[may be resized later but takes time to copy the enteries].

- Aashish July 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

I think this question should be solved by using " find "unix command ....Amazon mostly focus (on one or two) these type's of problem for checking one's unix knowledge ....

- saurabh July 04, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This is wat is expected.

- Nagbhushan July 04, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

simple in unix
comm -1-2 <yesterday's log file> <today's log file>

- varun July 16, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

Another solution could be use a hash of n elements where n is the min(entries in yesterday's files, today's files).

Then just iterate through the second log file and check whether present in hash.
Time - O(m+n)
Space - O(n)

- Luv July 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

@Luv using tries would be a better solution in terms of space if the file size grows large.

- @hubulu July 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Luv - and moreover complexity of ur solution is not O(n+m) as u hav'nt considered the overhead of creating hash.

- @hubulu July 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

First question that comes to my mind is to know that how is user identified in this case? Is it a) UserId b)IP-address c)IPaddress+SessionId d) anything else?

Based on this information, we need to develop a solution.

Btw, it seems to be a classic set intersection problem on which even most search engines work.

- Learner July 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

We might create a list of buckets like trie data structure (The number and type of buckets depends on the 'content of User IDs'). We read one file and fill those buckets. We keep BST maintained under each bucket as said by shondik in above comment,

Now we keep reading the second file and check for an ID in particular bucket.

In this way we can reduce the searching as compared to pure BST!!!

- crazyfundu July 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

we shoulld have list of objects as structure...like
Class A
{
int userId;
int webSiteId
Datetime datetime;
}

1. Use List<A>
2. Read two files data and fill this structure.
3. Read this list of objects data in insert into HashTable with UserID as key.
Key: UserID, value: number of times of he visited
-increment value by one if user is visited
4. Read Hashtable by list of users and then check if value is more than equal to "2" and check for those are visited on yesterday n today date in list....
Print those users who r fall into step4.

- Sri July 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

reading the day one log files. creating a trie tree with the log files for first day(as it will work better than hashmaps in case of string keys.) and for each user on second day if the search returns true he logged in on both days.

- Ashish July 11, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This seems to be the optimal solution..
As searching in BST would be at O(logn) during the second file comparison..

- Zander July 11, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Since there are only two days:
Maintain a Hashset of users from day1. For day 2, add users to the hashset, if there is a collision, the respective user used the site on both days.

- Anonymous July 17, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

1.Let's say there are M, N users logged on the first and second day respectively.
2. Sort the M entries i.e. O(MlogM)
3. Traverse through the second file and for each name that you read do a binary search in the sorted list so for each element that you read the search is going to be LogM and there are N entries so it is NlogM
4. Final complexity is O(MlogM) + O(NlogM)

- Jay August 25, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

One possibility: make a Hash table keyed on the User ID and in each slot save 2 log entries for both days

- Gaurav October 13, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

if sort has to be done. I think it should be external sorting which has very high time and space complexity.
Any idea how external sorting could be done inplace?

- Arulmozhi July 02, 2012 | Flag

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews