Amazon Interview Question
Software Engineer / Developer
Country: India
Interview Type: Phone Interview
There's just a little snag here. This works great for unique elements. How will you deal with duplicates?
@eugene: You can maintain a count along with the index in the vector. On delete you decrement the count and if it goes to zero, perform the swap.
Similar to reference counting...
A count is insufficient. You need information on where all the elements having a particular value are.
This was assuming getRandom returns each distinct element with equal probability, so allot only one entry in the vector for each distinct element.
You raise an interesting point though (albeit indirectly I suppose): what if we need the probability to be proportional to the count of the element?
I suppose you are alluding to that case?
A balanced binary tree with count information will work in that case (although run time becomes O(log n)).
I will add an answer with those details.
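For reference, the equal-probability-over-distinct-values variant described above can be sketched like this (class and method names are mine, not from the thread; the hash map stores one {index, count} pair per distinct value):

```java
import java.util.*;

// Sketch: getRandom() is uniform over *distinct* values.
// Each distinct value gets one slot in the list plus an occurrence count.
class DistinctRandomSet {
    private final List<Integer> values = new ArrayList<>();   // one entry per distinct value
    private final Map<Integer, int[]> meta = new HashMap<>(); // value -> {index in values, count}
    private final Random rng = new Random();

    public void insert(int x) {
        int[] m = meta.get(x);
        if (m != null) { m[1]++; return; }                    // duplicate: just bump the count
        meta.put(x, new int[]{values.size(), 1});
        values.add(x);
    }

    public void delete(int x) {
        int[] m = meta.get(x);
        if (m == null) return;
        if (--m[1] > 0) return;                               // occurrences remain, keep the slot
        // count hit zero: swap the slot with the last slot and pop
        int last = values.get(values.size() - 1);
        values.set(m[0], last);
        meta.get(last)[0] = m[0];
        values.remove(values.size() - 1);
        meta.remove(x);
    }

    public int getRandom() {
        return values.get(rng.nextInt(values.size()));
    }

    public int count(int x) {
        int[] m = meta.get(x);
        return m == null ? 0 : m[1];
    }
}
```

The swap-with-last trick is what keeps delete O(1): only the last slot ever moves, so only one hash entry needs its index updated.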
My interpretation of the question was that the probability of each entry must be the same, not each distinct entry. In the distinct case it really is as simple as having a count. Having some sort of balanced order statistic tree seemed like the obvious solution, but the real question is whether we can implement it while keeping all operations in expected O(1) time.
It seems that we can. Instead of your HashMap mapping values to a single index, it could map values to a HashSet of indices. Then we need a getAny() operation on this HashSet, that returns and deletes any one value in the set (with no requirement on which value it should be). We need this operation to run in expected O(1) time. Though it's tricky to prove, I do think that picking a hash bucket within the set at random until we get one that has at least one index in it, and then just taking the first index we see, accomplishes this goal, provided that we rehash periodically to keep the number of buckets within a certain ratio of the number of elements in the set.
If we have this method for the HashSet, given a delete, we get our HashSet of indices for the value to be deleted, and we pick and remove one such index, call it w, in O(1). Now we go to the end of the array, delete the value there, call it x, and put it at index w. We search the Hash*Map* for x and get a set of values where x occurs. We delete array.size()-1 from that set, and we add w to the set.
I think this covers it.
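A sketch of that scheme: an array of occurrences plus a HashMap from value to the set of indices where it occurs. For simplicity this uses the set iterator's first element as getAny() rather than the randomized-bucket trick described above; delete still works on an arbitrary occurrence in expected O(1).

```java
import java.util.*;

// Sketch of the HashMap-of-index-sets idea: getRandom() is uniform over
// occurrences (duplicates included), and all operations are expected O(1).
class RandomizedCollection {
    private final List<Integer> arr = new ArrayList<>();
    private final Map<Integer, Set<Integer>> idx = new HashMap<>(); // value -> positions in arr
    private final Random rng = new Random();

    public void insert(int x) {
        idx.computeIfAbsent(x, k -> new HashSet<>()).add(arr.size());
        arr.add(x);
    }

    public boolean delete(int x) {
        Set<Integer> where = idx.get(x);
        if (where == null || where.isEmpty()) return false;
        int w = where.iterator().next();       // any one occurrence of x
        int lastPos = arr.size() - 1;
        int last = arr.get(lastPos);
        // move the last element into slot w, then shrink the array
        arr.set(w, last);
        where.remove(w);
        Set<Integer> lastSet = idx.get(last);
        lastSet.remove(lastPos);
        if (w < lastPos) lastSet.add(w);       // no-op when x itself was the last element
        arr.remove(lastPos);
        if (where.isEmpty()) idx.remove(x);
        return true;
    }

    public int getRandom() { return arr.get(rng.nextInt(arr.size())); }
    public int size() { return arr.size(); }
}
```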
This seems the same as a simple hashtable(k, v) where k is the VALUE
and v is the count of the VALUE.
What's the difference between this and that?
PS: I'm a novice, please explain, and sorry if I'm being silly.
I'm actually storing a set describing where the values occur. If you only store a count, you won't be able to delete the values efficiently.
"Now do a search through the tree, using the cumulative counts as the key, with the additional step of subtracting counts from the search key when you descend into a right subtree."
Hey, will you please elaborate on this part? I'm not getting it.
please discard the above comment..
I have a doubt about how you are implementing the hash table. What is your hash function? How are you mapping distinct elements to their indices in the vector in O(1) time using the hash table?
@Eugene raised an interesting point about duplicate inserts, and requiring probability to be proportional to the count.
The earlier hashtable + vector can now be replaced with a
hashtable + binary search tree.
Instead of a vector we maintain a balanced binary tree, with the insert order as the key, effectively simulating an array. We maintain a count of how many times a given value has been inserted.
We also maintain a cumulative count: each node stores the sum of the counts of all nodes in its subtree, including itself.
The root contains the total number of elements that are present, say C.
To get a random number, generate a random number in {1,2, ..., C}.
Now do a search through the tree, using the cumulative counts as the key, with the additional step of subtracting the left subtree's total and the node's own count from the search key whenever you descend into the right subtree.
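A rough sketch of this tree idea (my illustration, not the original poster's code; keyed by value rather than insert order for simplicity, with balancing and deletion omitted, so insert here is O(height) rather than guaranteed O(log n)):

```java
import java.util.*;

// Sketch of the cumulative-count tree: each node stores how many times its
// value was inserted, plus the total count of its whole subtree. getRandom()
// draws k in {1..C} and descends, subtracting counts when it goes right.
class CountTree {
    private static class Node {
        int value, count, total;   // total = count + totals of both children
        Node left, right;
        Node(int v) { value = v; count = 1; total = 1; }
    }
    private Node root;
    private final Random rng = new Random();

    public void insert(int v) { root = insert(root, v); }
    private Node insert(Node n, int v) {
        if (n == null) return new Node(v);
        if (v < n.value) n.left = insert(n.left, v);
        else if (v > n.value) n.right = insert(n.right, v);
        else n.count++;
        n.total = n.count + total(n.left) + total(n.right);
        return n;
    }
    private static int total(Node n) { return n == null ? 0 : n.total; }

    // Each value is returned with probability proportional to its insert count.
    public int getRandom() {
        int k = rng.nextInt(root.total) + 1;   // k in {1, ..., C}
        Node n = root;
        while (true) {
            int leftTotal = total(n.left);
            if (k <= leftTotal) { n = n.left; continue; }
            k -= leftTotal;
            if (k <= n.count) return n.value;
            k -= n.count;                      // the "subtract before going right" step
            n = n.right;
        }
    }
    public int totalCount() { return root == null ? 0 : root.total; }
}
```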
How about using HashTable and Array List?
Inserting is nothing but adding the entry to hashmap and adding element to ArrayList.
Deletion means removing the entry from hashmap and element from ArrayList.
Searching means you input the value to the hashtable, and search the value part of the entry.
Random search gets a random number under length of ArrayList and returns element at that index.
hash table + array
insertion------append the element to the end of the array, and store its array index in the hash table, keyed by the element.
deletion-------look up the element's index through the hash table, swap that slot with the last element of the array, update the hash table entry of the swapped element, and remove the last element.
random-----pick a random index and return the element at that index.
How about a combination of a doubly linked list and a hash map?
While inserting any element, insert it into the hash with the key as the element and the value as the node's address in the doubly linked list. With this, insertion/deletion/searching/randomRetrieval are all possible.
Inserting is nothing but adding the entry to hashmap, creating a node for the double linked list.
Deletion means removing the entry from hashmap and node from the double linked list.
Searching means you input the value to the hashtable, and search the value part of the entry.
RandomSearch, get a random number and go to the start of the double linked list and traverse those many iterations, thus, giving a random value each time.
Not sure if this is an overkill,..but if anyone can point any inconsistencies/comments on this one, it would be great :).
The fundamental flaw of this approach is this:
"RandomSearch, get a random number and go to the start of the double linked list and traverse those many iterations, thus, giving a random value each time."
That's a linear time algorithm.
Thanks Eugene. So, if we modify the double linked list approach with say ArrayList as suggested by Varun, we should be in a good shape.
Hashtable + vector.
- Anonymous June 29, 2012
On insert, insert at the end of the vector, and hash with the key as the element and the value as its index in the vector.
On delete, follow hash, get index, swap with last element, and delete last element. Don't forget to update the hash value of swapped element.
On random, pick a random element from vector (generate random index).
Can replace hash with any other lookup structure.
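A minimal sketch of this hashtable + vector answer for unique elements, with Java's ArrayList standing in for the vector:

```java
import java.util.*;

// Sketch of the hashtable + vector answer: for unique elements,
// all three operations run in expected O(1).
class RandomizedSet {
    private final List<Integer> vec = new ArrayList<>();
    private final Map<Integer, Integer> pos = new HashMap<>(); // element -> index in vec
    private final Random rng = new Random();

    public boolean insert(int x) {
        if (pos.containsKey(x)) return false;
        pos.put(x, vec.size());
        vec.add(x);
        return true;
    }

    public boolean delete(int x) {
        Integer i = pos.get(x);
        if (i == null) return false;
        int last = vec.get(vec.size() - 1);
        vec.set(i, last);            // overwrite x's slot with the last element
        pos.put(last, i);            // ...and update the swapped element's hash entry
        vec.remove(vec.size() - 1);
        pos.remove(x);
        return true;
    }

    public int getRandom() { return vec.get(rng.nextInt(vec.size())); }
    public boolean contains(int x) { return pos.containsKey(x); }
}
```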