Google Interview Question
InternsTeam: Software Engineering
Country: Brazil
Interview Type: In-Person
What does he mean? From the context of google, I would assume they mean requests from a client to a back-end server that have not gotten a response for a second or more. This would be RPC requests sent from a "client" (e.g. a front end server) to a "server" (e.g. a back end server). This can be 10 thousands of requests but it's probably save to assume it's less than 100'000 or let's say 500'000 requests in any 1 second timeframe: < 500K RQS
I'd pretend requests are executed strictly sequential so I place them in a queue together with the timestamp. New requests are placed at the head (front), requests at tail that are >= 1 second can be canceled and removed. I would use a ring buffer with 500K entries fix allocated with a start and end index. So, the index I'd get for a call is as well an identifier that I need back on the request's response so I can mark the request as completed. If a request is completed that is = to the tail, I can advance the tail as far as there are entries marked completed.
Complexity: O(1) to insert, O(1) to find > 1s and O(1) amortized to complete a call.
The problem with the apporach is, that a call that really takes 1s compared with other calls being returned in fractions of milliseconds will keep the queue from being emptied, so the queue can get full because of only one call. If it can be restricted in terms there are never more than 500'000 requests a second, this can still be a good solution, but if the number of calls lasting long increases and if the accepted timeout is changed to maybe 10 seconds, the solution starts to break.
I think the following might work:
Start with a doubly link list with 2 nodes and 3 pointers ( head, next, tail ) .
head is head
next is head.next
and tail is last node .
every second the head breaks connection to the next node,
head.right=null;
next.left=null;
head=next;
and add a node to the tail
tail.next=newNode
tail = newNode.
The server keeps reading data from the head of the doubly link list.
@NoOne: The thing with this problem is, that 99% of the requests will probably be super fast and they aim to cut off the long tail latency (just interpretation). Now with a heap I see two issues:
- How to remove an item from the heap, e.g. how to identify it since the heap is modified, if a response comes back, we have to find that item in the heap to remove it. How do you do this fast?
- O(Lg(N)) on insert and remove might not be such a big thing, since the insert can be optimized to O(1)
@ChrisK - the question says - ignore request more than 1 sec old. That would mean - it is using this whatever as the queuing mechanism for incoming requests. In fact I am not so sure about the response at all. Hence the proposition of a heap to keep most current at the top level - and then one separate thread can peacefully trim the heap...
@NoOne: sure, it could be incomming requests on a server, but still, a heap is not necessary, since requests will arrive in sequential order. So, they are already ordered in the sense that the oldest is next on a FIFO, there is no need to change the order. One can just consume in the order they arrived and only process them if they are younger than a second. Seems a bit simple, well, I maybe just didn't get the question at all...
I don't understand why you even need a data structure for this at all:
- srterpe January 26, 2017Your req handler could simply look at the req time at the moment it responds and if the current time - req time > 1000ms just call req.write(429). req.close(). To me, that seems far simpler and faster than building a data structure to handle this.