Google Interview Question
Software Engineer / Developer
Team: YouTube Engineering
Country: United States
Interview Type: Phone Interview
Seems like a doubly linked list + hashmap is a viable solution.
Add the servers to a doubly linked list, and keep track of where they are on the list with a hashmap. This gives you O(1) on add, delete, and pick random (given that you have a random number generator that works in O(1)).
Assume that you inserted three servers:

a -> b -> c

hashtable:
1 -> a
2 -> b
3 -> c

Then you deleted b:

new hashtable:
1 -> a
2 -> null
3 -> c

Now the random generator gives you 2, and you get null, so you need to execute the random generator again. The tricky part is how to deal with this situation in constant time.
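The stale-slot problem described above can be reproduced in a few lines (a minimal sketch; the table layout and names are my own illustration of the example, not from the thread):

```python
import random

# index -> server table, as in the example above
table = {1: "a", 2: "b", 3: "c"}

table[2] = None  # "delete" b by nulling its slot

def pick_random():
    # Naive fix: re-roll until we hit a live slot. With many
    # nulled slots the expected number of rolls grows, so this
    # is not constant time in general -- which is exactly the
    # tricky part the comment points out.
    while True:
        server = table[random.randint(1, 3)]
        if server is not None:
            return server
```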
Not every add can be completed in O(1) with your data structure. If you are given a position at which to insert, reaching that position in the doubly linked list takes O(n). If you are instead handed the previous node directly, you still need to know the positions in order to update the hash map, and counting positions takes O(n) as well. Correct me if I am wrong.
They asked you to create your own data structure and write methods, in any programming language, for add, delete, and pick at random.
The hashmap answer is correct, if lacking in details.
The problem spec does not require that we pull the servers from a list, so I assume that the list representation is not required.
Adding it is more or less trivial.
We want an implementation of sets that supports some form of random selection. My personal favorite language for writing implementations in interviews is Python, because it provides this sort of utility for us. You could look up the details in the Python svn repository if you want to translate to another language.
I recommend looking at set_next() and set_table_resize() if you go down that path.
You'll see that set_next does the naive walk over indices.
Essentially, by keeping the table appropriately sized, you avoid O(n) degradation in the average case.
Of course, there are pathological cases for this approach, but I'd consider that acceptable in an interview scenario.
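A rough sketch of that naive walk over indices (my own simplification in the spirit of CPython's set_next, not the actual implementation):

```python
import random

def pick_random(slots):
    """Pick a random live entry from an open-addressed table.

    `slots` is a fixed-size list where empty buckets hold None;
    at least one bucket is assumed to be occupied. Start at a
    random index and walk forward (wrapping around) to the next
    occupied bucket. If the table is kept appropriately dense by
    resizing, the expected walk length stays O(1).
    """
    n = len(slots)
    i = random.randrange(n)
    while slots[i] is None:
        i = (i + 1) % n
    return slots[i]
```

Note that the walk is only approximately uniform: an element sitting after a long run of holes is likelier to be picked, which is one flavor of the pathological cases mentioned above.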
Create two sets of servers. Call them "good" and "bad".
Whenever you detect a server failure, remove it from good, and add it to bad. If a server comes back up, just remove it from bad and add it to good, naturally.
The trickiness comes in having a good implementation of sets, but many languages provide this for you. If not, python's C implementation is about as definitive as any.
If you're asked this in a phone interview, you'll almost certainly be allowed to assume the existence of a good hashtable implementation.
During the onsite, you had better know how to implement a good hashtable in any language. The easy strategy for resizing is to do it when the number of elements drops below some fraction of the table length.
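That shrink strategy can be sketched like this (illustrative thresholds and a toy open-addressed table, not from the thread):

```python
def maybe_shrink(table, used):
    """Rebuild the table at half size when it gets too sparse.

    `table` is a list of slots with None for empty buckets and
    `used` is the number of live entries. The 1/4 fraction is an
    illustrative choice; any constant fraction works, and keeping
    the table dense is what keeps random probing near O(1).
    """
    if used >= len(table) // 4 or len(table) <= 8:
        return table
    new_table = [None] * (len(table) // 2)
    for item in table:
        if item is not None:
            # re-insert with simple linear probing; a real table
            # would share this code with its insert routine
            i = hash(item) % len(new_table)
            while new_table[i] is not None:
                i = (i + 1) % len(new_table)
            new_table[i] = item
    return new_table
```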
If all they want is a data structure that efficiently supports add, delete, and pick one at random, i.e. a toy case rather than a full load-balancing system design, you could do it with a simple array. Allocate an array big enough for what you think you might need, and keep an integer variable as a pointer to the next unused element. To add an item, put it in the next free element of the array and increment the pointer. To delete an item at a known index, copy the last item over it and decrement the pointer (if you need to delete by value in O(1), pair the array with a hashmap from value to index). To pick one at random, generate a random integer between 0 and the index of the last item. All of that is O(1). If you run out of space when adding, a typical implementation allocates a new array twice the size of the current one, copies the elements over, and frees the memory of the original array.
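A minimal sketch of that array approach (a Python list stands in for the raw array; `delete` takes an index, as the comment describes, and the class name is my own):

```python
import random

class RandomPool:
    """Array-backed pool with O(1) add, delete-by-index, pick-random."""

    def __init__(self, capacity=8):
        self.items = [None] * capacity
        self.size = 0  # index of the next unused element

    def add(self, item):
        if self.size == len(self.items):
            # out of space: double the array, as described above
            self.items.extend([None] * len(self.items))
        self.items[self.size] = item
        self.size += 1

    def delete(self, index):
        # copy the last item over the deleted slot, shrink by one
        self.size -= 1
        self.items[index] = self.items[self.size]
        self.items[self.size] = None

    def pick_random(self):
        return self.items[random.randrange(self.size)]
```

The swap-with-last trick in `delete` is what avoids the null-hole problem from the linked-list discussion earlier in the thread: the live items always occupy a contiguous prefix, so a single random index is always valid.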
- pollywog December 04, 2012