Amazon Interview Question
Senior Software Development EngineersCountry: India
Interview Type: In-Person
This approach is best so far on the page.. We can modify it a bit for efficient space complexity by checking and removing entries whose timestamp is older than 10 minutes..
Yes, Hashtable is a better alternative to BST for this problem. I too realized this later. The algorithm can remain the same, but by replacing BST with a hash table.and a counter that maintains the number of users whose current time - last-logged-in time <= 10 mins it would be even better.
The counter is necessary to return the count of users in constant time. Otherwise iterating the hash table every time would require O(n) time. The counter needs to be updated every time we insert/delete elements to the hash table, though the updates to the counter are not needed while updating the hash table or adding to and removing entries from the min-heap.
Since the timeStamp is already kind of sorted elements, I think min-heap is unnecessary here? normal linkedList will do, and the earliest will always be the first one.
I agree with mike. Hash table + doubly link link will do all. Plus bit modification that we dont have to traverse whole DLL to get unique users as retrieval should be as fast as possible.
Here is my approach
1) At each node, there will be total unique visitors till that time since start of server
2) At time of user addition, check if user exists in hash map, and if exist then update its time to current time. and based on difference between lasttimeaccess of user and current time, node's counter will be updated.
3) At time of retrieval, count will be : (count at head - count in last 11th min node).
Update of list will be at time of addition and at time of retrieval.
Update can be done on 1 min timer also.
time_list
{
time_t *nodeTimel
int totalCount;
time_list *next;
time_list *prev;
}
map<userid, time_list*> users;
addUser (userid)
{
node = users[userid];
cur = time(NULL)
oldest = node->nodeTime;
if ( (cur - last) > (10*60) )
{
node = incrementUserCount(cur);
}
users[userid] = node;
}
time_t *head = NULL, * rear=NULL;
time_list * incrementUserCount (updatetime)
{
if (!head)
{
head = new time_list;
head->count = 1;
head->next = NULL;
head->prev = NULL;
rear = head;
}
else
{
if (head->node_time == updatetime)
{
head->count += 1;
}
else
{
//Assuming always newest time entry will come
node = time_list;
node->count = 0;
node->next = NULL;
node->prev = NULL;
//Keeping cumulative sum since start
node->count += head->count;
head->next = node;
node->prev = head;
head = node;
//check if to remove old entries
//keeping window of 11 minute entries
while ( (updatetime - rear->node_time) > ( 11 *60 ) )
{
//removing rear node;
temp = rear;
rear=rear->next;
remove (temp);
}
}
}
return head;
}
//it may possible there was no logging many windows
int getTotalCount ()
{
cur = time(NULL);
//getting last 11th min node
lasttime = cur - 11*60;
while (rear->node_time < lasttime)
{
temp = rear;
rear=rear->next;
remove (temp)
}
if (rear->time < lasttime)
{
return (head->count - rear->count)
}
else
{
return head->count;
}
}
Why can't we use an hashmap in this scenario.
store userid as key and time stamp as value.
whenever a user logins into a server, add that info into the hashmap,
if it is a new user, a new element is inserted into the hashmap.
if the user already logged in, just the value (in this case timestamp) will be updated in the hashmap.
retrieve data using current time - last login time <= 10 mins, in this case we will get unique users
There is one way that I can think of. When ever user logs in put the data into queue. and deque data that is older than 10 minutes say every one minute interval. Parallelly put insert the data (when user logs in) into a map (update timestamp if it already contains this users data). when ever a user info is being dequed from queue, compare data with map and if timestamp is same remove this data.
This map contains information of last login of users upto last 10 minutes. so when it requires detail of number of users logged in at particular time the size of this map will give you the info.
There is one way that I can think of. When ever user logs in put the data into queue. and deque data that is older than 10 minutes say every one minute interval. Parallelly put insert the data (when user logs in) into a map (update timestamp if it already contains this users data). when ever a user info is being dequed from queue, compare data with map and if timestamp is same remove this data.
This map contains information of last login of users upto last 10 minutes. so when it requires detail of number of users logged in at particular time the size of this map will give you the info.
design a data structure with user id and other user information and log in time
create a hash table with this data structure. to count users, discard those user whose last log in is more than 10 mins ago. we can also start a clean up in every 10 mins, delete those entries whose last login is more than 10 mins ago.
Create a Map with userid as key and time stamp that he logged in.
-> when ever user log-in's into the system, store his userid and time stamp into the map and store this into a Queue.
-> When ever time stamp crosses 1 min insert an empty data.
-> So, when the queue reaches 2o min, flush out the first 10 minutes data. As the Queue will hav latest 10 min data.
-> hope this should work.
I though of this solution:
1. Create a min-heap and add users as they login
2. if time > 10 min, delete the root and maintain the min-heap
3. if count needed, traverse the heap (heap sort algo) and tag the corresponding entry of the user in hashtable (this avoids outputting the same user multiple times).
4. Keep the count of new users as you add to the hash
5. return the count (or the entries of the hash, if userIDs are needed) and clear hash.
Generic solution to last N minute
======================================================================
1. Add Login Listener (typically a filter where a user login can be tracked)
[Alternate solution a scheduler @ one minute, but it's the better use rather than scheduler. KInd of ovserver design]
2. Whenever a user logs into the system then Login Event handler manages a TreeMap
where key is minute {1, 2 , ... N .....} and value is corresponding unique user count.
To determine user counter another temporary Set should be used where login user id(s) are stored when next minute comes then Set gets cleared and points to current minute users counter
startTime : Time
// initially 1
currentMinute: Integer
timeMap: TreeMap
function LoginEventHandler(userId : String) {
endTime: Time
loginUsers : Set
if( (endTime - startTime) / 60*currentMinute > 0 ) {
// new minute starts
currentMinute++;
// clear temp set
loginUsers.clear();
loginUsers.add(userId);
// update user counter against minute
timeMap.put(currentMinute, loginUsers.size());
} else {
// continue with current minute
boolean add = loginUsers.add(userId);
if(add) {
//new user
// update user counter against minute
timeMap.put(currentMinute, loginUsers.size());
}
}
}
3. Now actual trick comes into play ;)
timeMap could be used with HashMap but why TreeMap is used, let's understand
From above code it's oviuos for each minute entry may not be there and it should not there like <minute> => 0
Now N minute is given the trick comes if N does not exist so next existing entry ( less than N) should be picked
That's why TreeMap is used.
Here user should not be mapped because billion users login can happen so it won't be good design. If less user logs in then also it won't be good design to map user id because then whole user id set iterates
Using Splay Trees would provide the best solution here...
-Reasons:
1.)As here we want to keep track of the latest data for the past 10 minutes. Splay trees will have it mostly in the 10 closest node to the root.
2.)Also it solves our problem of one user accessing multiple time by providing faster updates.B'coz Splay trees are designed to give faster 2nd time access which BST does not provide.
You can Google for Splay trees and check for there AWESOME properties...
1) hashmap -> whose every element is userid + pointer to corresponding node in doubly linked list
2) doubly linked list (DLL) -> userid + timestamp
Along with these, we store count of users when the last insert/select operation operation is performed and the threshold timestamp value (last insert/select timestamp - 10 mins) and the pointer to last node in doubly linked list satisfying this threshold timestamp value.
a) Insert -
1) Go to corresponding hashmap entry and check if entry already exists.
2) If not exist, create a new DLL node and add it to beginning of DLL and increment count by s. calculate current threshold value and adjust the pointer and count according to that.
3) If already exists, move this to the beginning of DLL. If prev timestamp is above the threshold, do nothing. Otherwise, increment count by 1. calculate current threshold value and adjust the pointer and count according to that.
b) Finding unique user count:
1) calculate current threshold value and adjust the pointer and count according to that. Return the count.
There are two important points to note here:
1. Focus is on "fastest retrieval of unique user count"
2. The query can come in at anytime and not fixed 10 minute intervals.
We can solve it by using a Hash-table with min-heap. The hashtable uses userid as key and stores last logged in time. The min-heap uses last logged in timestamp as key. Every node in min-heap is pointed to by an element in hashtable with corresponding userid.
Everytime a user logs in:
1. Extract-min from min heap while timestamp at root is more than 10 mins older.
2. Update the timestamp in hashtable for given userid.
3. If the element in hashtable does no point to any node in min-heap,
3.a. Insert a node to min-heap with key as new timestamp and point hashtable element to this node.
ELSE
3.b. update timestamp of the pointed min-heap node and min-heapify at this node.
Everytime user count is required:
1. Extract-min from min heap while timestamp at root is more than 10 mins older.
2. Return heap-size.
How is the user login information stored?
If its in a flat log file, then read the file with the proper fields into memory(assuming that the file can be in memory). Create a hashmap with the user id as the key and the value as the count of times the user has logged in.
While scanning the file filter on the time field such that (current_time - user login time) <= 10 mins.
If the user login is stored in database, query it as:
Select distinct user_id
from user_login_table
where datediff(mi, login_time, getdate()) <= 10
Nope, that won't do. If hashset is cleared at time t1, and the query is made at t1+delta, it will return log-ins of last delta seconds, when it should have returned log-ins of last 10 minutes. If by clear you meant cleanup of entries more than 10 minutes old, then clear operation becomes O(n).
In my opinion, a combination of data structures should solve the problem. Use a min-heap and a BST augmented by the size field.
- Murali Mohan June 09, 20131. The BST will store userids as keys along with last-logged-in field.
2. The min heap will have last-logged-in time as key. Along with the key store the userid.
3. When a new user logs in or an old user re-logs into the system, insert a key into the heap. In the BST if the corresponding userid is present, update it's last-logged-in value. Otherwise insert a new key with the userid
4. Whenever we want to find out the unique users logged in the last 10 mins.
a. Extract min from the heap if current time - last-logged-in time > 10 mins
b. For the corresponding Id in the BST, delete the node if current time - last-logged-in time > 10 mins
c. Repeat the above steps a & b until the min element of the heap satisfies current time - last-logged-in time <= 10 mins
d. Return the value of size field of the root node of the BST.