Google Interview Question
Software EngineersCountry: United States
Interview Type: In-Person
You can probably enqueue a video id , each time video is viewed and some background process can pick items from queue and can maintain dictionary which it increments every time a video is picked and can write back to DB when dictionary reaches a certain size.
Since counts are always increasing so if multiple copies of background processes can run
Do you think a general queue of video ids would be better idea ? because there is too much request to enqueue it and consumer which is your background process will keep waiting. For me as soon as some one will request for video, server should add the view count. So every time a Map which has key - value pair where key is video id, is accessed, it should increase the view count at same time.Value can be pointer to an object which should have static view count. As soon as it will call view function, it should increase the counter and provide the object with that view number. Once it has to update in Database, it will update with video id and count after specific time.
I'm not sure how technical I would get right away with a question like this. I feel like you have to tackle the more abstract questions first. Registering that somebody has requested the page a video is on is trivial on the surface. Server gets a request for a video id, finds that id in a DB, increments it. How these things are done can be expanded upon at the DB level, at the level of the code in the server, etc etc
The bigger questions:
- What is a view? If I refresh a page, certainly this doesn't seem like a view, except what if I view it again a week later? Or I don't refresh the page at all, but simply keep replaying the video?
Since we are 'designing' the view count, it's our call. I would only allow one view per day per ip, and ask the interviewer about what holes would exist in blocking by ip. I'd defend this design by saying it's more important to keep view counts overly-honest, because the error will be much lower than those trying to abuse it, where they might boost a video to 1 million rather than a video having 100 views instead of 300.
@asen
We don't really want a hard lock in this situation. I'm assuming a discrepancy of say 10 counts is acceptable.
You can always let your current browser video view count increment and send the count update request to the server. The server can queue the requests and update the count when processing the requests. With this scheme, the difference in view counts can be greater than one when you refresh the video.
I want to point out one observation that hasn't been bought up in the previous discussion. The data you need is simply in your web server's log. Have a reducer to aggregate the log and periodically send back the data to a key-value store. Record-level locking would be fine, because latency requirement is not real-time here.
Always simplify things and start building up.
- ahmad.m.bakr July 14, 2015Assume that we have a small set of videos that we need to track the counts. Also assume that we are not handling any spam. A basic approach is using relational databases (video_id, count) that is being incremented whenever a user clicks on.
What is the problem with this approach? The database operation is very expensive for each click. A possible solution we can have Memcache or Redis layer above the DB that counts the number of clicks for each video. And having a background job that reads the cached counts and reflects that in the DB. Pros: much faster, Cons: more complex, can lose data in case of server failure and cached values have not populated in the DB yet and the values in the DB are not the most recent values.
Another side of improvement is bandwidth wise. Assuem that it is not necessary that the number of hits to be really accurate. We can convert the connection request for increment to be based on UDP rather than TCP.
Again discuss with the interview the pros and cons and let him lead which direction should the discussion goes.