Amazon Interview Question
Country: India
Interview Type: In-Person
Its a great and simple idea.
you can use Hashmap to save frequency of page sequence with this idea.That gives o(1) for lookup
@perlscar:User info not needed as per the question
In your algo, sequence 100->11->12 and 1001->1->12 would have the same key. So use any special character b4 each n every page.
Even if the sequence were specific to a user from the table we can generate
User 1 : 1 -> 2 -> 4 -> 1 -> 6
User 2 : 1 -> 2 -> 4
.
.
During the generation of this data itself we can find the frequency of the sequences.
All pages can be mapped to unique numbers :) as there are only 8 pages.
use GSP Algorithm.
Its a datamining technique to find sequential patterns.
set limit to 3
Assuming that the log file is sorted by time.
eg:
Time user page
1 dk 2
1 sp 2
2 dk 3
3 dk 5
2 sp 3
3 sp 5
Approach with Minimal memory usage:
Step1 : Find and store all users O(n)
Step2 :For Each user, Iterate through the file. (users * O(n))
Create a string with page seq like A2B3C5
use this string as Hashkey and increment the value for each time the seq occurs.
Step3: In the HashMap, find the key with the highest value and extract the page sequence from the key. eg: 2->3->5 from A2B3C5
For less time complexity,
step1: store all the records in a structure Array { char *name, int page,} , and sort the structure array by name.
Step2 : Iterate through the structure array only once. (O(n))
Create a string with page seq like A2B3C5
use this string as Hashkey and increment the value for each time the seq occurs.
Step3: In the HashMap, find the key with the highest value and extract the page sequence from the key. eg: 2->3->5 from A2B3C5
Create a counter array of 778 elements (each representing a 3 digit number that represents a 3-page sequence 111 to 888)
Create a hash map of (user:current_sequence) where current sequence is an integer representing the current 3-page sequence. After a page "p" is read for a user, reset the hash value to (cur_seq%100)*10+p. When the resultant number is >100, increment the corresponding index (cur_seq-111) in the counter array.
O(num_users + 800) space complexity. O(n) time complexity, where n is the total number of records.
Create a hash where you use a key made from 3 page ids like 23_145_9806 where 23, 145 & 9806 are 3 consecutive page ids.
In the parsing loop get next 2 page ids also, make this key and see if it exists. If yes, increment the count, if not add this new key to hash with value as 1. At the same time you can store the key with highest count. Every time you increment the count, just check whether incremented count is greater than the biggest count stored. If yes, then replace the biggest counter with this one.
In single loop you will get count of all the page ids as well as the biggest one of them.
Any comments??
as the site only has 8 pages, the possible page sequence is limited to 8x7x6. allocate an counter array to hold 8x7x6 integers.
1. create a Hash map witch each sequence as the key;
2. Parse the file to get the viewing page sequences grouped by each user , sequences are ordered by timestamps as this is a log file. O(N)
user sequences
---------------------------
A 1-2-3-4
B 2-3-5
C 4-3-5-8
3. iterate all the sequences in the hash map , increment the corresponding counter in the 8x7x6 array by 1.O(M*N)
sequeces counter
-----------------------------
1-2-3 1
2-3-4 1
2-3-5 1
4-3-5 1
.....
4. found the largest counter.
My first approach would be to create a HashMap<key = page number, value = number of times the page is opened>. This will be done in O(n).
Next, create an array of size 3 which we will return as a solution (at the end).
We will initialize the array with the first 3 keys from the HashTable.
Ex: A: 1, 2, 3
Now for every other key K in HashMap:
- compute the minimum value between the values assigned to those 3 keys stored in HashMap, and remember the key assigned to the min in Min_key
- check to see if the min is greater or equal to HashMap.get(K) ; if so.. ignore
else replace the Min_key with K
This will be done in O(5 * 3) = O(15)
So the total time : O(n) + O(15) = O(max(n,15)) = O(n)
At the end we have obtained in O(n) time complexity the 3 pages which were frequently used by the users.
Of course we can check , and reorder the values of the array to have in A[0] the most visited page, A[1] less visited, A[2] the least visited page, without changing the complexity.
For the following sequence,
7-8-4-3-6-3-6-3-6-3-6-1-9-12-9-12-9-12-9-12-9-7-8-4-7-8-4 we do not need the maximum page visit but only the sequence. Your algo will get 3,6 and 9 which is not required at all but the seqnce req is 7-8-4
lets number every sequence like 1->3->8 = 138, so every sequence will be unique
- DashDash December 19, 2012Now let say the sequence with time is 1->3->6->3->5->4....
So the sequence combinations can be 136,363,635,354,........
Now counting the sequence which occurs most can be the answer
Please do let me know if I am wrong here.