There is a server when a user

Amazon Interview Question for Senior Software Development Engineers

1

of 1 vote

21
Answers
There is a server when a user can login.
A used can login multiple times.
you have to return number of unique users in last 10 minutes.
Retrieve unique user count operation should be as fast as possible.
Note:
A user who has done login in last 10 minutes more than once should be counted only 1.
It is possible that in a particular duration no user has logged in.
- SK June 09, 2013 in India | Report Duplicate | Flag | PURGE
Amazon Senior Software Development Engineer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 7 vote

In my opinion, a combination of data structures should solve the problem. Use a min-heap and a BST augmented by the size field.

1. The BST will store userids as keys along with last-logged-in field.
2. The min heap will have last-logged-in time as key. Along with the key store the userid.
3. When a new user logs in or an old user re-logs into the system, insert a key into the heap. In the BST if the corresponding userid is present, update it's last-logged-in value. Otherwise insert a new key with the userid
4. Whenever we want to find out the unique users logged in the last 10 mins.
a. Extract min from the heap if current time - last-logged-in time > 10 mins
b. For the corresponding Id in the BST, delete the node if current time - last-logged-in time > 10 mins
c. Repeat the above steps a & b until the min element of the heap satisfies current time - last-logged-in time <= 10 mins
d. Return the value of size field of the root node of the BST.

- Murali Mohan June 09, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This approach is best so far on the page.. We can modify it a bit for efficient space complexity by checking and removing entries whose timestamp is older than 10 minutes..

- Alok June 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Why don't you use hash-map instead of BST for indexing userid=>timestamp?

- Epic_coder June 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 2 votes

Yes, Hashtable is a better alternative to BST for this problem. I too realized this later. The algorithm can remain the same, but by replacing BST with a hash table.and a counter that maintains the number of users whose current time - last-logged-in time <= 10 mins it would be even better.

The counter is necessary to return the count of users in constant time. Otherwise iterating the hash table every time would require O(n) time. The counter needs to be updated every time we insert/delete elements to the hash table, though the updates to the counter are not needed while updating the hash table or adding to and removing entries from the min-heap.

- Murali Mohan June 17, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Since the timeStamp is already kind of sorted elements, I think min-heap is unnecessary here? normal linkedList will do, and the earliest will always be the first one.

- mike November 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I agree with mike. Hash table + doubly link link will do all. Plus bit modification that we dont have to traverse whole DLL to get unique users as retrieval should be as fast as possible.
Here is my approach

1) At each node, there will be total unique visitors till that time since start of server
2) At time of user addition, check if user exists in hash map, and if exist then update its time to current time. and based on difference between lasttimeaccess of user and current time, node's counter will be updated.
3) At time of retrieval, count will be : (count at head - count in last 11th min node).

Update of list will be at time of addition and at time of retrieval.
Update can be done on 1 min timer also.

time_list
{
	time_t *nodeTimel
	int totalCount;
	time_list *next;
	time_list *prev;
}
map<userid, time_list*> users;
addUser (userid)
{
	node = users[userid];
	cur = time(NULL)
	oldest = node->nodeTime;
	if ( (cur - last) > (10*60) )
	{
		node = incrementUserCount(cur);
	}
	users[userid] = node;
}
time_t *head = NULL, * rear=NULL;
time_list * incrementUserCount (updatetime)
{
	if (!head)
	{
		head = new time_list;
		head->count = 1;
		head->next = NULL;
		head->prev = NULL;
		rear = head;
	}
	else
	{
		if (head->node_time == updatetime)
		{
			head->count += 1;
		}
		else
		{
			//Assuming always newest time entry will come
			node = time_list; 
			node->count = 0; 
			node->next = NULL;
			node->prev = NULL;

			//Keeping cumulative sum since start
			node->count += head->count;
			head->next = node;
			node->prev = head;
			head = node;

			//check if to remove old entries
			//keeping window of 11 minute entries
			while ( (updatetime - rear->node_time) > ( 11 *60 ) )
			{
				//removing rear node;
				temp = rear;
				rear=rear->next;
				remove (temp);
			}
		}
	}
	return head;
}

//it may possible there was no logging many windows
int getTotalCount ()
{
	cur = time(NULL);
	//getting last 11th min node
	lasttime = cur - 11*60;
	while (rear->node_time < lasttime)
	{
		temp = rear;
		rear=rear->next;
		remove (temp)
	}

	if (rear->time < lasttime)
	{
		return (head->count - rear->count)
	}
	else
	{
		return head->count;
	}

}

- SK January 25, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 7 vote

Why can't we use an hashmap in this scenario.

store userid as key and time stamp as value.
whenever a user logins into a server, add that info into the hashmap,
if it is a new user, a new element is inserted into the hashmap.
if the user already logged in, just the value (in this case timestamp) will be updated in the hashmap.
retrieve data using current time - last login time <= 10 mins, in this case we will get unique users

- Chennu June 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

There is one way that I can think of. When ever user logs in put the data into queue. and deque data that is older than 10 minutes say every one minute interval. Parallelly put insert the data (when user logs in) into a map (update timestamp if it already contains this users data). when ever a user info is being dequed from queue, compare data with map and if timestamp is same remove this data.
This map contains information of last login of users upto last 10 minutes. so when it requires detail of number of users logged in at particular time the size of this map will give you the info.

- Anonymous June 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

- adityaatbitsg June 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

design a data structure with user id and other user information and log in time
create a hash table with this data structure. to count users, discard those user whose last log in is more than 10 mins ago. we can also start a clean up in every 10 mins, delete those entries whose last login is more than 10 mins ago.

- ashfaq.emrul June 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Create a Map with userid as key and time stamp that he logged in.
-> when ever user log-in's into the system, store his userid and time stamp into the map and store this into a Queue.
-> When ever time stamp crosses 1 min insert an empty data.
-> So, when the queue reaches 2o min, flush out the first 10 minutes data. As the Queue will hav latest 10 min data.
-> hope this should work.

- prashanthreddy.mtech June 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

I though of this solution:
1. Create a min-heap and add users as they login
2. if time > 10 min, delete the root and maintain the min-heap
3. if count needed, traverse the heap (heap sort algo) and tag the corresponding entry of the user in hashtable (this avoids outputting the same user multiple times).
4. Keep the count of new users as you add to the hash
5. return the count (or the entries of the hash, if userIDs are needed) and clear hash.

- Anonymous June 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Generic solution to last N minute
======================================================================

1. Add Login Listener (typically a filter where a user login can be tracked)
[Alternate solution a scheduler @ one minute, but it's the better use rather than scheduler. KInd of ovserver design]

2. Whenever a user logs into the system then Login Event handler manages a TreeMap
where key is minute {1, 2 , ... N .....} and value is corresponding unique user count.
To determine user counter another temporary Set should be used where login user id(s) are stored when next minute comes then Set gets cleared and points to current minute users counter

startTime : Time
// initially 1
currentMinute: Integer
timeMap: TreeMap
function LoginEventHandler(userId : String) {
	endTime: Time
	loginUsers : Set
 	if( (endTime - startTime) / 60*currentMinute > 0 ) {
		// new minute starts
		currentMinute++;

		// clear temp set
		loginUsers.clear();
		loginUsers.add(userId);
		
		// update user counter against minute
		timeMap.put(currentMinute, loginUsers.size());
	} else {
		// continue with current minute
		boolean add = loginUsers.add(userId);
		if(add) {
			//new user
			// update user counter against minute
			timeMap.put(currentMinute, loginUsers.size());
		}
	}
}

3. Now actual trick comes into play ;)
timeMap could be used with HashMap but why TreeMap is used, let's understand
From above code it's oviuos for each minute entry may not be there and it should not there like <minute> => 0

Now N minute is given the trick comes if N does not exist so next existing entry ( less than N) should be picked
That's why TreeMap is used.

Here user should not be mapped because billion users login can happen so it won't be good design. If less user logs in then also it won't be good design to map user id because then whole user id set iterates

- Debasis Jana June 12, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Sorry guys,

1. I forgot to add carry forward into minute counter.
2. It can be generic like @ M minute interval (10 + 20 + 30 + ..... M). So in that case we need to increase counter with 10 and we need to check whether current time >= previous mapped time + 10

- Debasis Jana June 13, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Using Splay Trees would provide the best solution here...

-Reasons:
1.)As here we want to keep track of the latest data for the past 10 minutes. Splay trees will have it mostly in the 10 closest node to the root.
2.)Also it solves our problem of one user accessing multiple time by providing faster updates.B'coz Splay trees are designed to give faster 2nd time access which BST does not provide.

You can Google for Splay trees and check for there AWESOME properties...

- peechus June 12, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

1) hashmap -> whose every element is userid + pointer to corresponding node in doubly linked list
2) doubly linked list (DLL) -> userid + timestamp

Along with these, we store count of users when the last insert/select operation operation is performed and the threshold timestamp value (last insert/select timestamp - 10 mins) and the pointer to last node in doubly linked list satisfying this threshold timestamp value.

a) Insert -

1) Go to corresponding hashmap entry and check if entry already exists.
2) If not exist, create a new DLL node and add it to beginning of DLL and increment count by s. calculate current threshold value and adjust the pointer and count according to that.
3) If already exists, move this to the beginning of DLL. If prev timestamp is above the threshold, do nothing. Otherwise, increment count by 1. calculate current threshold value and adjust the pointer and count according to that.

b) Finding unique user count:
1) calculate current threshold value and adjust the pointer and count according to that. Return the count.

- praneeth June 17, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

There are two important points to note here:
1. Focus is on "fastest retrieval of unique user count"
2. The query can come in at anytime and not fixed 10 minute intervals.

We can solve it by using a Hash-table with min-heap. The hashtable uses userid as key and stores last logged in time. The min-heap uses last logged in timestamp as key. Every node in min-heap is pointed to by an element in hashtable with corresponding userid.

Everytime a user logs in:
1. Extract-min from min heap while timestamp at root is more than 10 mins older.
2. Update the timestamp in hashtable for given userid.
3. If the element in hashtable does no point to any node in min-heap,
3.a. Insert a node to min-heap with key as new timestamp and point hashtable element to this node.
ELSE
3.b. update timestamp of the pointed min-heap node and min-heapify at this node.

Everytime user count is required:
1. Extract-min from min heap while timestamp at root is more than 10 mins older.
2. Return heap-size.

- asimprakash June 28, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

How is the user login information stored?

If its in a flat log file, then read the file with the proper fields into memory(assuming that the file can be in memory). Create a hashmap with the user id as the key and the value as the count of times the user has logged in.
While scanning the file filter on the time field such that (current_time - user login time) <= 10 mins.

If the user login is stored in database, query it as:
Select distinct user_id
from user_login_table
where datediff(mi, login_time, getdate()) <= 10

- Subhajit June 09, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Isn't this an ineffective approach? At any instant for finding the number of people eligible for that time window, you will have to scan the entire HashMap. This approach kills the purpose of HashMap.

- Learner June 09, 2013 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

A simple hashset that clears and re-populates every ten minutes should do?

- anony June 09, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Nope, that won't do. If hashset is cleared at time t1, and the query is made at t1+delta, it will return log-ins of last delta seconds, when it should have returned log-ins of last 10 minutes. If by clear you meant cleanup of entries more than 10 minutes old, then clear operation becomes O(n).

- Elle June 09, 2013 | Flag

CareerCup

Amazon Interview Question for Senior Software Development Engineers

Books

Videos

Resume Review

Mock Interviews