How would you define a data st

Google Interview Question for Software Engineer / Developers

0

of 0 votes

29
Answers
How would you define a data structure that stores a set of values (i.e., a value cannot appear more than one time), and implements the following functions:
add(p)--adds the value p to the set
delete(p)--removes the value p from the set
getrandom()--returns a random value from the set (all items should be equally likely). (Assume you have access to some nice random() function.)

Need not write actual code, just sketch a structure to implement this efficiently.
- skeptical.scientist January 05, 2013 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer Data Structures

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 8 vote

Use an array to store the elements of the set, and a hash table to store a list of (key,value) pairs, where the keys are elements of the set, and the values are the array indices for the keys. For convenience, we also store the size of the set.

add(p):
   if p is already a key in hash table, return
   array[size]=p \\move array if overflow
   add p:size to the hash table
   size++

delete(p):
   i = value stored in hash table with key p
   size--
   array[i]=array[size]
   update the value in hash table with key array[i] to i \\Thanks, anon.
   delete the key p from the hash table

get_random():
   return array[randrange(0,size)]

The function randrange(x,y) is assumed to return a random integer in the interval [x,y).

This has amortized O(1) time for all three functions.

- skeptical.scientist January 05, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Is linkedlist + hashtable more flexible? array's size if fixed and you can't free the memory of the deleted number

- zyfo2 January 05, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

If you do linkedlist+hashtable, get_random() will take O(size) time instead of O(1) time, because you need to traverse the first k list elements to find the kth.

The array size being fixed is not a big issue - if you run out of space, you can move the array somewhere else where you have twice as much space. That means you get n O(1) time insertions before a single O(n) time insertion (because of an array move), which means that insertion/deletion is still O(1) time on average.

- skeptical.scientist January 05, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Instead of having to really manage the sizes of your array and allocation and stuff like that, simply use an ArrayList, it will handle all of the background grunt work for you, and will offer constant time insert and delete.

- M.Zimmerman6 January 05, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Your delete is incomplete.

- Anonymous January 05, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@skeptical.scientist why is O(size) for get? hashtable has O(1) search time. You can simply get that node in the hashtable instead of traversing.

- zyfo2 January 06, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

zylo, get *which* node? You need to write a function which returns each element with equal probability. I'm not saying you can't do this in O(1) time with just a hash table, but I don't see how you would do it.

- skeptical.scientist January 06, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

get it. I was referring to the delete function. Yes, you are right, for the random function, it'll need O(n).

- zyfo2 January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

you could easily do this in O(1) time by getting the size of a hashmap, using its built in size function in java, then you simple need to return all values or values in the map into an array, and then choose a random index. that should be constant time

- Anonymous January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@anonymous: if you are turning the hashmap into an array every time get_random() is called, that will take O(n) time. Unless you have the array stored all the time, and just update it when you add/remove an element from the set. That is exactly what my solution does.

- skeptical.scientist January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@skeptical. Your delete code has a bug.

- Bugger. January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Okay, two people have now said there's a problem with my delete function, which probably means there is... but what? I'm not seeing it.

- skeptical.scientist January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

In the delete function, instead of deleting key p from the hash table, you can replace the value of key p with the value at index size (array[size]) and can delete the key with that value. This way, your array also will not be sparsed.

- letstryout January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@skeptical.

Consider the following test case:

add A, add B, add C, delete B, add D, delete C.

@letrysout: You have no clue how hashtables work, do you?

- Anonymous January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Ah, thanks anonymous. I was forgetting to update the hash table with the new location of the value formerly at the end of the array when I moved the value.

- skeptical.scientist January 07, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@skeptical: Bingo!

- Anonymous January 08, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

Why not a hashtable itself. Just add the unique element constraint while adding and that should do. Why have an array + hash table?

- Anonymous January 06, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Adding and deleting is fine. How will you return a number randomly

- Anonymous January 06, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Agree

- Anonymous January 09, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Using HashMap size() method. Why can't you just use HashMap?

- Anonymous January 22, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Why can't we use just simple linked list, or an array or whatever data structure where we can add and delete elements and to keep the size of it for the random function?

- Lyubomir January 05, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

If you have a linked list or array, you can't quickly insert/delete, because you can't quickly check whether the element is present and where in the list/array it is. You can do it, but it will be slower.

These operations will take O(n) time once your set grows to size n, whereas in the array+hashtable implementation both operations are O(1).

- skeptical.scientist January 05, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I agree. Tell me if I am wrong, but I didn't sow any time and space requirements in the question :)

- Lyubomir January 05, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 4 vote

How about a balanced tree with the following structure

treeNode* leftTree;
treeNode* rightTree;
int data;
int nosNodesLT; //Number of nodes in left tree
int nosNodesRT; //Number of nodes in right tree

Then obviously
Time Complexity of add = O(log n)
Time Complexity of delete = O(log n)

For the random return, we will use the fields nosNodesLT and nosNodesRT as follows:
1.Choose k to be a random integer between 0 and n−1 inclusive. Let A initially be the root node of the tree.
2.Let m be the number of nodes in the left subtree of A. (If A is a leaf or has only right children, let m=0.)
3.If k=m, choose A as the node we want and stop.
4.Otherwise, if k<m, replace A with its left child node and repeat from step 2.
5.Otherwise (i.e. if k>m), subtract m+1 from k, replace A with its right child node and repeat from step 2.

Again Time Complexity is O( log n)
Space Complexity : O(n)

- mukherjr January 05, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This is worse than skeptical's solution.

- Anonymous January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

The only problem seems to be with random() function. Otherwise an array and a hashtable would have sufficed.
Lets say that we delete a number from array, and its corresponding entry from HashTable. Now if we want to find results of random() we need to find exact number of empty/vacant spaces in the Array. This is required because if random() function returns a random value between 'a' and 'b' and if it turns out to be some position which is vacant, then we need to run random() function again.

- Learner January 07, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

when you delete a number, you always update the array as well as hashmap by putting the last elem to the place of the deleted number. so, no empty spaces anytime

- Boba January 09, 2013 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

doubly linked list + hashset

- aileen May 28, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This doesn't implement getrandom() efficiently.

- skeptical.scientist May 28, 2013 | Flag

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews