Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
My solution is to keep a heap and a list, the head will keep the elements and should have a "update" operation. the list keeps all the elements arrived in past 1 hour. when a new element arrived, we push it to the back of the list, and check fro the front of the list to delete out-of-time elements; while deleting these elements we need to update the "request number count" of that element in the heap, and delete the element if the count goes to 0 (this prevents the heap becoming too large). when we do the top 10 query just extract heap for 10 times.
Past 1 hour = Current time to cur_min-60 mins.
We need 2 data structures.
1. For each minute maintain map of - {product_id, current_min_frequency}
2. Heap(priority queue) of all product_ids in the last hour, priority by current_hour_frequency
How to maintain:-
As each new request/product_id is read in,
1. Add to current min - prod_id, freq in current min [if already present, increment current_min_freq]
2. Increment current_hour_frequency in priority queue, and re-heapify
After each min,
1. remove the pair for (curre min - 60 mins)
2. decrement current_hour_freq by cur_min_frequency of that oldest mint in Priority Queue, and re-heapify. If cur_hour_freq reaches 0, delete that product_id's node from min_heap.
When queried, get the top 10 elements from heap.
Itcecsa, the question is not clear. It can have many meanings. Did they ask to keep a running list of top 10 items purchased in last one hour with new purchases constantly being received through stream? Or is the list already stored in a file or database and you are asked to write a query to list top 10 items. Please add more details.
This was blazing fast on my i7 laptop with 6 GB RAM and 64 bt OS.. so just have an int matrix of size 60 x 1_000_000.. use some global variable to denote what index is currently used e.g. 35 if its 35th minute.
all function e.g. sum, top10 take less then 1 second..
total memory used was less than 1 GB.. i checked using
Runtime.getRuntime().totalMemory()
in reality process method will also be passed index of min for which largest data is provided. right now i am just using it for all 60 min just to check speed which turned out as around sec.. so actually it will be 60 times faster.
public class Top10 {
public static void main(String[] args) {
int[][] store = new int[60][1_000_000];
int[] lastMinuteData = new int[1_000_000];
// Main algo
process(store, lastMinuteData);
int[] sum = sum(store);
int[] top10 = top10(sum);
}
public static void process(int[][] store, int lastMinuteData[]) {
for (int i = 0; i < 60; i++) { // for each minute
for (int j = 0; j < 1_000_000; j++) { // for each record
store[i][j] = lastMinuteData[j];
}
}
}
public static int[] sum(int[][] store) {
int k[] = new int[1_000_000];
for (int j = 0; j < 1_000_000; j++) {
for (int i = 0; i < 60; i++) {
k[j] += store[i][j];
}
}
return k;
}
public static int[] top10(int[] productCounter) {
int index[] = new int[10];
int count[] = new int[10];
for (int j = 0; j < 1_000_000; j++) {
compare(j, productCounter[j], index, count);
}
return index;
}
private static void compare(int i, int c, int[] index, int[] count) {
for (int j = 0; j < count.length; j++) {
if (c >= count[j]) {
index[j] = i;
count[j] = c;
return;
}
}
}
}
My understanding of top 10 product ids are about statistics (purchase counts, sale income, and etc) associated with 10 product ids.
- jzhu May 18, 2012The minimum heap uses the statistics as node value, instead of the product id itself. Therefore, there should be an auxilary data structure to be used together with the heap so that product id can be mapped to heap node. For this purpose, hash table is perfect.
To get product history in the past hour, I assume that a queue is needed. Each queue element contains product id, properties and timestamp. Periodically, out-dated product records are removed from the queue and recent records are added to the queue.
All the "deltas" will be updated to the heap:
1) Lookup product id in hash table
2) If there is existing heap node, update heap node value using the delta, and heapify to maintain heap property.
3) If there is new product id, add it to the heap.
We can't really keep only 10 nodes in the heap - we need to keep all product ids as long as its property value >0. As a safe approximation, we may keep less nodes, for instance, the smaller value of 256 or number of all relevant product ids. Keeping only 10 could cause nontrivial errors.