Amazon Interview Question
InternsCountry: United States
Interview Type: Phone Interview
public static Set<String> findRepeatCustomers(String fileName){
Set<String> set = new HashSet<String>();
Map<String, Integer> map = new HashMap<String, Integer>();
FileReader fr = null;
BufferedReader br = null;
String line=null, custId=null;
try {
fr = new FileReader(fileName);
br = new BufferedReader(fr);
while((line = br.readLine()) != null && line.length()>0) {
StringTokenizer st = new StringTokenizer(line, ":");
st.nextToken();
custId = st.nextToken();
if (map.containsKey(custId)){
Integer count = map.get(custId) + 1;
map.put(custId, count);
} else
map.put(custId, new Integer(1));
}
for(String cID: map.keySet()) {
if (!map.get(cID).equals(new Integer(1)))
set.add(cID);
}
return set;
} catch (IOException ioex) {
System.out.println(ioex.getMessage());
} finally {
try {
br.close();
fr.close();
} catch(IOException ioex){
System.out.println(ioex.getMessage());
}
}
return set;
}
Your code does not consider the condition "after at least a day. "
without this condition the problem is a pretty trivial hashmap lookup..
To fix your code: Since the file is a *log* file, the entries are going to be sorted by default. We take advantage of this. Maintain one map and a set
For each entry in the log file
if(map.contains(custid))
Timestamp ts = map.get(custId);
if(difference(current rowtimestamp, ts) > 24 hours)
Set.add (custid) // you found your repeat customer here
map.remove(custid) // we don't want to process timestamp difference for this customer again
else{
if(!set.contains(custid))
map.add(custid, timestamp)
}
public void PrintRepeatCustomers()
{
Console.WriteLine("Enter path to log file.");
var path = Console.ReadLine();
if (path == null) return;
var lines = File.ReadAllLines(path);
var listLines = lines.ToList();
//logs will only contain the last visit timestamp of customer
var logs = new Dictionary<long,DateTime>();
foreach (var line in listLines)
{
var terms = line.Split(':');
var customerId = long.Parse(terms[1]);
var timestamp = DateTime.Parse(terms[0]);
if(logs.ContainsKey(customerId) && timestamp - logs[customerId] < TimeSpan.FromDays(1))
{
Console.WriteLine(customerId);
}
logs[customerId] = timestamp;
}
}
its better if we do the above question using BST where each node contain timestamp and customer id . tree is designed using customer id . each time for new entry in log comes,seach will be log(n) only on the basis of customer id. And as node is already present check the time if its more than one day print else
update the time. And if customer id is not present then create a new node for that customer...
Approach:
As mentioned in qus we have to find customer who returns to Amazon site(any page)....
so we should focus on Amazon site customer only.....
all those log entry which describe for non -Amazon site we should ignore.
now there are two case on page ID:
1. page id is kind of numeric, alphanumeric or something we can't just say its Amazon's site page without checking Amazon site page IDs list
2. page id is always containing something common so that we can figure it out its Amazon's site page for eg: pkt.amazon.com/index , pkt.amazon.com/review
-For any of case we will sort out only Amazon pages visited by customer
-Pick a customer Id and calculate date from given timestamp
-Find out next occurrence for that customer id in logs and check for visited date from timestamp
-In case there is no diff in date find for next entry and remove one entry from log list
-In case till the end we dont find the customer visit in >= 1 day remove that customer visit entry
-In case there is diff of at least one day in both log entry data then make a entry of the customer in Result list and remove all the occurrence of that customer from sorted log list so that it wil be easy to do search in log list for other customers.
-Now pick another customer and apply logic as above....
I would take the number of days to be accounted from the first and last lines of the log file. and then sort the lines according to the cust id first and then timestamp if the custid is same. for instance, I keep an ArrayList<Line> and the comparator used will have the compare method as follows
public int compare(Line line1, Line line2){
if(line1.custid.equals(line2.custid){
return line1. timestamp - line2.timestamp;
}
return line1.custid - line2.custid;
}
This can be done in o(nlogn) time, where n is the number of lines.
Now we just have to go through the entire sorted list once and make sure for each customer, the timestamps visited confirms with every day in the range we calculated first.
this will just take o(n) time..
The total time complexity is o(nlogn)
This approach takes the cust_id's and then checks the timestapms
1. Take a cust_id
2. Check for that cust_id in the log file, for every record found compare TimeStamps, if difference is greater than a day, print the cust_id and mark that cust_id as visited(so that we do not count it again,we can use a hash table here)
3. Repeat the process for the unprocessed cust_id's
If cust_id's are integers, we can sort the file based on cust_id's and then compare the timestamps,i.e when they visit and display the result.
I know that this is not the optimal approach but I think it will work....!!
Since you will have to parse the whole logger anyway the average complexity of the program is theta(N) where N is the number of records
here is a simple program in python
- Sandy March 28, 2013