Amazon Interview Question
Software Engineer / Developer
Country: India
Interview Type: Phone Interview
There's just a little snag here. This works great for unique elements. How will you deal with duplicates?
@eugene: You can maintain a count along with the index in the vector. On delete you decrement the count and if it goes to zero, perform the swap.
Similar to reference counting...
A count is insufficient. You need information on where all the elements having a particular value are.
This was assuming getRandom returns each distinct element with equal probability, so allot only one entry in the vector for each distinct element.
You raise an interesting point though (albeit indirectly I suppose): what if we need the probability to be proportional to the count of the element?
I suppose you are alluding to that case?
A balanced binary tree with count information will work in that case (although run time becomes O(log n)).
I will add an answer with those details.
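For reference, the equal-probability-over-distinct-values variant described above can be sketched like this (class and method names are mine, not from the thread; the hash map stores one {index, count} pair per distinct value):

```java
import java.util.*;

// Sketch: getRandom() is uniform over *distinct* values.
// Each distinct value gets one slot in the list plus an occurrence count.
class DistinctRandomSet {
    private final List<Integer> values = new ArrayList<>();   // one entry per distinct value
    private final Map<Integer, int[]> meta = new HashMap<>(); // value -> {index in values, count}
    private final Random rng = new Random();

    public void insert(int x) {
        int[] m = meta.get(x);
        if (m != null) { m[1]++; return; }                    // duplicate: just bump the count
        meta.put(x, new int[]{values.size(), 1});
        values.add(x);
    }

    public void delete(int x) {
        int[] m = meta.get(x);
        if (m == null) return;
        if (--m[1] > 0) return;                               // occurrences remain, keep the slot
        // count hit zero: swap the slot with the last slot and pop
        int last = values.get(values.size() - 1);
        values.set(m[0], last);
        meta.get(last)[0] = m[0];
        values.remove(values.size() - 1);
        meta.remove(x);
    }

    public int getRandom() {
        return values.get(rng.nextInt(values.size()));
    }

    public int count(int x) {
        int[] m = meta.get(x);
        return m == null ? 0 : m[1];
    }
}
```

The swap-with-last trick is what keeps delete O(1): only the last slot ever moves, so only one hash entry needs its index updated.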
My interpretation of the question was that the probability of each entry must be the same, not each distinct entry. In the distinct case it really is as simple as having a count. Having some sort of balanced order statistic tree seemed like the obvious solution, but the real question is whether we can implement it while keeping all operations in expected O(1) time.
It seems that we can. Instead of your HashMap mapping values to a single index, it could map values to a HashSet of indices. Then we need a getAny() operation on this HashSet, that returns and deletes any one value in the set (with no requirement on which value it should be). We need this operation to run in expected O(1) time. Though it's tricky to prove, I do think that picking a hash bucket within the set at random until we get one that has at least one index in it, and then just taking the first index we see, accomplishes this goal, provided that we rehash periodically to keep the number of buckets within a certain ratio of the number of elements in the set.
If we have this method for the HashSet, given a delete, we get our HashSet of indices for the value to be deleted, and we pick and remove one such index, call it w, in O(1). Now we go to the end of the array, delete the value there, call it x, and put it at index w. We search the Hash*Map* for x and get a set of values where x occurs. We delete array.size()-1 from that set, and we add w to the set.
I think this covers it.
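A sketch of that scheme: an array of occurrences plus a HashMap from value to the set of indices where it occurs. For simplicity this uses the set iterator's first element as getAny() rather than the randomized-bucket trick described above; delete still works on an arbitrary occurrence in expected O(1).

```java
import java.util.*;

// Sketch of the HashMap-of-index-sets idea: getRandom() is uniform over
// occurrences (duplicates included), and all operations are expected O(1).
class RandomizedCollection {
    private final List<Integer> arr = new ArrayList<>();
    private final Map<Integer, Set<Integer>> idx = new HashMap<>(); // value -> positions in arr
    private final Random rng = new Random();

    public void insert(int x) {
        idx.computeIfAbsent(x, k -> new HashSet<>()).add(arr.size());
        arr.add(x);
    }

    public boolean delete(int x) {
        Set<Integer> where = idx.get(x);
        if (where == null || where.isEmpty()) return false;
        int w = where.iterator().next();       // any one occurrence of x
        int lastPos = arr.size() - 1;
        int last = arr.get(lastPos);
        // move the last element into slot w, then shrink the array
        arr.set(w, last);
        where.remove(w);
        Set<Integer> lastSet = idx.get(last);
        lastSet.remove(lastPos);
        if (w < lastPos) lastSet.add(w);       // no-op when x itself was the last element
        arr.remove(lastPos);
        if (where.isEmpty()) idx.remove(x);
        return true;
    }

    public int getRandom() { return arr.get(rng.nextInt(arr.size())); }
    public int size() { return arr.size(); }
}
```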
This seems the same as a simple hashtable(k, v) where k is the VALUE
and v is the count of the VALUE.
What's the difference between this and that?
PS: I'm a novice, please explain, and sorry if I'm being silly.
I'm actually storing a set describing where the values occur. If you only store a count, you won't be able to delete the values efficiently.
"Now do a search through the tree, using the cumulative counts as the key, with the additional step of subtracting counts from the search key when you descend into a right subtree."
Hey, will you please elaborate on this part? I'm not getting it.
please discard the above comment..
I have a doubt about how you are implementing the hash table. What is your hash function? How are you mapping distinct elements to their indices in the vector in O(1) time using the hash table?
@Eugene raised an interesting point about duplicate inserts, and requiring probability to be proportional to the count.
The earlier hashtable + vector can now be replaced with a
hashtable + binary search tree.
Instead of a vector we maintain a balanced binary tree, with the insert order as the key, effectively simulating an array. We maintain a count of how many times a given value has been inserted.
We also maintain a cumulative count: each node stores the sum of the counts of all nodes in its subtree, including itself.
The root contains the total number of elements that are present, say C.
To get a random number, generate a random number in {1,2, ..., C}.
Now do a search through the tree, using the cumulative counts as the key, with the additional step of subtracting the left subtree's total and the node's own count from the search key whenever you descend into the right subtree.
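A rough sketch of this tree idea (my illustration, not the original poster's code; keyed by value rather than insert order for simplicity, with balancing and deletion omitted, so insert here is O(height) rather than guaranteed O(log n)):

```java
import java.util.*;

// Sketch of the cumulative-count tree: each node stores how many times its
// value was inserted, plus the total count of its whole subtree. getRandom()
// draws k in {1..C} and descends, subtracting counts when it goes right.
class CountTree {
    private static class Node {
        int value, count, total;   // total = count + totals of both children
        Node left, right;
        Node(int v) { value = v; count = 1; total = 1; }
    }
    private Node root;
    private final Random rng = new Random();

    public void insert(int v) { root = insert(root, v); }
    private Node insert(Node n, int v) {
        if (n == null) return new Node(v);
        if (v < n.value) n.left = insert(n.left, v);
        else if (v > n.value) n.right = insert(n.right, v);
        else n.count++;
        n.total = n.count + total(n.left) + total(n.right);
        return n;
    }
    private static int total(Node n) { return n == null ? 0 : n.total; }

    // Each value is returned with probability proportional to its insert count.
    public int getRandom() {
        int k = rng.nextInt(root.total) + 1;   // k in {1, ..., C}
        Node n = root;
        while (true) {
            int leftTotal = total(n.left);
            if (k <= leftTotal) { n = n.left; continue; }
            k -= leftTotal;
            if (k <= n.count) return n.value;
            k -= n.count;                      // the "subtract before going right" step
            n = n.right;
        }
    }
    public int totalCount() { return root == null ? 0 : root.total; }
}
```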
How about using HashTable and Array List?
Inserting is nothing but adding the entry to hashmap and adding element to ArrayList.
Deletion means removing the entry from hashmap and element from ArrayList.
Searching means you input the value to the hashtable, and search the value part of the entry.
Random search gets a random number under length of ArrayList and returns element at that index.
hash table + array
insertion------append the element to the end of the array, and store its array index in the hash table, keyed by the element.
deletion-------look up the element's index through the hash table, swap that slot with the last element of the array, update the hash table entry of the swapped element, and remove the last element.
random-----pick a random index and return the element at that index.
How about a combination of a doubly linked list and a hash map?
While inserting any element, insert it into the hash with the key as the element and the value as the node's address in the doubly linked list. With this, insertion/deletion/searching/randomRetrieval are all possible.
Inserting is nothing but adding the entry to hashmap, creating a node for the double linked list.
Deletion means removing the entry from hashmap and node from the double linked list.
Searching means you input the value to the hashtable, and search the value part of the entry.
RandomSearch, get a random number and go to the start of the double linked list and traverse those many iterations, thus, giving a random value each time.
Not sure if this is an overkill,..but if anyone can point any inconsistencies/comments on this one, it would be great :).
The fundamental flaw of this approach is this:
"RandomSearch, get a random number and go to the start of the double linked list and traverse those many iterations, thus, giving a random value each time."
That's a linear time algorithm.
Thanks Eugene. So, if we modify the double linked list approach with say ArrayList as suggested by Varun, we should be in a good shape.
Hashtable + vector.
- Anonymous June 29, 2012
On insert, insert at the end of the vector, and hash with the key as the element and the value as its index in the vector.
On delete, follow hash, get index, swap with last element, and delete last element. Don't forget to update the hash value of swapped element.
On random, pick a random element from vector (generate random index).
Can replace hash with any other lookup structure.
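A minimal sketch of this hashtable + vector answer for unique elements, with Java's ArrayList standing in for the vector:

```java
import java.util.*;

// Sketch of the hashtable + vector answer: for unique elements,
// all three operations run in expected O(1).
class RandomizedSet {
    private final List<Integer> vec = new ArrayList<>();
    private final Map<Integer, Integer> pos = new HashMap<>(); // element -> index in vec
    private final Random rng = new Random();

    public boolean insert(int x) {
        if (pos.containsKey(x)) return false;
        pos.put(x, vec.size());
        vec.add(x);
        return true;
    }

    public boolean delete(int x) {
        Integer i = pos.get(x);
        if (i == null) return false;
        int last = vec.get(vec.size() - 1);
        vec.set(i, last);            // overwrite x's slot with the last element
        pos.put(last, i);            // ...and update the swapped element's hash entry
        vec.remove(vec.size() - 1);
        pos.remove(x);
        return true;
    }

    public int getRandom() { return vec.get(rng.nextInt(vec.size())); }
    public boolean contains(int x) { return pos.containsKey(x); }
}
```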