Amazon Interview Question
SDE1sTeam: International expansion
Country: United States
Interview Type: In-Person
I like the thought in solution proposed. But scaling that will by a stupid attempt as with each addition, we are possibly creating another thread/processor performing possible reorganization in parallel.
Interviewer will be very happy with this response, especially if you clarify assumptions, show why it is impossible to do it in "standard" way and articulate when exactly parallelized solution will be effective Namely, when comparison operation is more expensive than processor allocation and inter-process communication. Obviously, the proposed communication scheme must be rational (e.g. there is no need for re-organization but it is enough to send each request to each node every time)
When asking questions like this interviewer clearly want to see whether candidate is able to think out of the box. You can argue whether this question is good or not. Usually interviewer provide enough hints. It is open-ended question, the discussion can even end up with designing your own hardware architecture for this problem.
Nice proof, it's all there.
For the parallel solution, 3-color problem would not be "hard" if we had 3^n boxes on demand ;)
We can just have a background thread that searches for new max after deleting max (deleting non-max does not change it, inserting greater than max is trivial update)
If you are getting max and this thread is still running, let it finish. Also, if you are doing insert or delete operation and this thread is still running, let it finish so you can be certain whether you are deleting non-max or inserting greater than max.
It is obvious that worst case scenario is n when you do operation that has to wait for background thread while it is running. However, given this thread starts up only on deleting max we can be certain in average case of O(1)
P.S. When inserting, you don't have to wait for thread to finish as soon as it finds greater element than is being inserted. Also, when deleting, as soon as it finds greater element than is being deleted.
VERSION2: Stop (or pause) background thread on every insert and delete and then restart it. When the only one operation that may wait for it is get max. If insert/delete call frequency never let to complete it, it will eventually complete on next getting max and will not start until next delete max.
The proof doesn't cover non-comparison based models. Simply quoting the NlogN lower bound doesn't say much because we were using a hash table (a data structure that's not comparison-based) from the beginning.
Still, I like this answer and it's probably what an interviewer would look for.
>The proof doesn't cover non-comparison based models
it doesn't need to. We're within comparison model, since data structure implements MAX() operation. Finding maximum of two (or more) numbers is, by definition, a comparison operation, so building sorting algorithm with the help of series of MAX() yields a comparison sorting algorithm.
In fact I don't even need to prove that it must be a comparison sorting.
> how can N processors find out max element in O(1)?
Easy. For instance, we create a distributed "linked list" using N processors, where each processor stores single number and we keep all numbers/processors sorted. In order to organize a linked list, each processor might keep an index of the peer. Finding MAX() in this case is single message round-trip with the last processor, since it holds the maximum.
insert, delete and search operations are implemented by broadcasting to all processors, and in worst case the processor might forward the data the peer at most once.
Insert is the most complex operation here. It can be implemented in O(1) time complexity in following way. Let's say processor receives a broadcasted message with instruction to insert number X. If it is the last processor then new processor is allocated and becomes the last one. If it is not he last processor, but X greater than the number which is stored locally, it asks the processor on the right whether its number is equal or less than X - if reply is positive then new processor is allocated and "inserted" in between of those two, otherwise nothing happens.
Why can't we build a Max-Stack?
stackoverflow.com/questions/7134129/stack-with-find-min-find-max-more-efficient-than-on
Yeah, the presence of the delete operation wouldn't allow things to be as simple as saving the max value in an extra variable.
Appears that the interviewer may have been looking for the same argument.
You should use a hashmap to save all your numbers and a stack to track the current maximum value; Whenever the number being inserted to greater than the top of the stack, you should push the number to the stack. Whenever the number being deleted is greater than the top of the stack, you pop the stack. The top of the stack will always be the maximum value.
I don't think that it is possible that way. For example, the top of the stack contains 200 and you insert 100, then 150. just 200 will be in the stack. If I delete 200, how are you going to find out the current max after deletion (which is 150 and not in the stack)?
Can't we use a max-heap? In the case of the previous poster's scenario (add 100 -> add 200 -> add 150), after heapifying the data-structure we would end up with max 200, then when we execute the final condition (remove 200), the node with 150 will swim up and become max. The sink and swim operations will be O(logn) but extracting max will be O(1). We can check during delete operation if we are deleting max of heap, then heapify the remaining nodes accordingly.
a data structure with a hash map and max heap.
insert, delete and search is O(1) with hash map.
Insert and delete also trigger a background thread to update the heap. This thread is running in background, not affecting the time for insert and delete.
Max return the top of heap, which is the max to our knowledge so far.
what about adding hashmap for the insert,delete,search operation and in the data field of value part stored last max element. Also stored max element value in this data structure for getting max in O(1).
For eg. at first max=0, when new element is added max will be set that value.After this say new element is added having value > max, then add new value<value of second element,max> in hashmap and update max.
Well this type of data structure is really interesting
What I can think of is:
For Insert/Delete(a particular element)/Search: O(1)
Hashmap is the best solution
If want to have max operation as well
We can have Hashmap having hashing algorithm implemented in such a way that takes comparison into account (means greater element will map to greater hash key). (Though empty keys could be a problem but that can be solved using parallelism or other useful data structure for keys management)
Hope it will help!
You should use a hashmap to save all your numbers and a stack to track the current maximum value; Whenever the number being inserted to greater than the top of the stack, you should push the number to the stack. Whenever the number being deleted is equal to the top of the stack, you pop the stack. The top of the stack will always be the maximum value.
You should use a hashmap to save all your numbers and a stack to track the current maximum value; Whenever the number being inserted to greater than the top of the stack, you should push the number to the stack. Whenever the number being deleted is greater than the top of the stack, you pop the stack. The top of the stack will always be the maximum value.
Consider the following sequence:
Add 100 --> add 200 --> Add 150 --> remove 200
At the end, the stack will only have 100, but the max is 150. Next max call will return 100 not 150. Hence keeping a stack won't help.
I like this question.
- 0xF4 December 30, 2014I assume here that data structure can hold arbitrary numbers (data), otherwise the problem isn't interesting.
First of all, your intuition is right:
> But not sure Max is even possible in O(1) with the presence of delete operation?
Such datastructure/algorithm is impossible on a turing machine.
Proof by contradiction:
suppose this data structure exists. If this is the case then we can build general purpose comparison sorting algorithm by pushing all data to this structure and then pulling one-by-one using max and delete operations with O(N) complexity. However we know, that no comparison sorting algorithm on a turing machine can give us better than O(NlogN). Q.E.A./Q.E.D.
Now, we're engineers let's solve the problem :)
Given the proof above, the only possible loophole here - is to avoid using turing machine model. Since question is asked at Amazon where interviewers are passionate about scale I'd propose to parallelize the computation and leverage O(N) processors/computers. In this model all requested operations (including search) can be implemented with time complexity O(1)