Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
I think this question should be solved by using " find "unix command ....Amazon mostly focus (on one or two) these type's of problem for checking one's unix knowledge ....
Another solution could be use a hash of n elements where n is the min(entries in yesterday's files, today's files).
Then just iterate through the second log file and check whether present in hash.
Time - O(m+n)
Space - O(n)
@Luv using tries would be a better solution in terms of space if the file size grows large.
First question that comes to my mind is to know that how is user identified in this case? Is it a) UserId b)IP-address c)IPaddress+SessionId d) anything else?
Based on this information, we need to develop a solution.
Btw, it seems to be a classic set intersection problem on which even most search engines work.
We might create a list of buckets like trie data structure (The number and type of buckets depends on the 'content of User IDs'). We read one file and fill those buckets. We keep BST maintained under each bucket as said by shondik in above comment,
Now we keep reading the second file and check for an ID in particular bucket.
In this way we can reduce the searching as compared to pure BST!!!
we shoulld have list of objects as structure...like
Class A
{
int userId;
int webSiteId
Datetime datetime;
}
1. Use List<A>
2. Read two files data and fill this structure.
3. Read this list of objects data in insert into HashTable with UserID as key.
Key: UserID, value: number of times of he visited
-increment value by one if user is visited
4. Read Hashtable by list of users and then check if value is more than equal to "2" and check for those are visited on yesterday n today date in list....
Print those users who r fall into step4.
reading the day one log files. creating a trie tree with the log files for first day(as it will work better than hashmaps in case of string keys.) and for each user on second day if the search returns true he logged in on both days.
1.Let's say there are M, N users logged on the first and second day respectively.
2. Sort the M entries i.e. O(MlogM)
3. Traverse through the second file and for each name that you read do a binary search in the sorted list so for each element that you read the search is going to be LogM and there are N entries so it is NlogM
4. Final complexity is O(MlogM) + O(NlogM)
Maintain a BST of customers. The node will contain customer id & a hash of size 2[2 days are there]. when a new customer logs in, search its existence in BST. If its there update the corresponding hash. If no,insert a new node in BST & update the corresponding hash entry.
- Aashish July 01, 2012Finally traverse the BST & output all customers with both days visited.
Note that i think this approach is dynamic because there is no limit on the number of customers. So, it would be better than hash where initially size is fixed[may be resized later but takes time to copy the enteries].