Google Interview Question for Software Engineers

Country: United States
Interview Type: In-Person

Comment hidden because of low score. Click to expand.
of 2 vote

Use a eventual consistency data store. Something like NoSQL. Of course that data is distributed. For likes and other stuff, strong consistency should hamper performance.

- Noobie January 30, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
of 2 vote

For me on hight level, there would be different application servers which would be there to handle the requests (might be load balancer can decide them). Initially in memory of server there would me many queues for up and down. At one time one queue would be filled with video id for up and similarly one queue with down. As the queue will full, another queue would be active and it will signal the background process.
Background process will update database by number and with version of master DB which has syncronized last. This table will only contain how many videos has liked or disliked at this session on their end.
When master DB will synchronize, it will check last updated number and add, reduce the count. Once master DB will get information, it will start updating databases with updated info. We can use primary secondary concept. So after Master DB will update secondary DB, it would be primary and so table lock would be required and then application server will start pointing new primary DB. Then after that new secondary will update.

- phoenix August 05, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
of 0 vote

consider a high volume, wide audience context

- rjrush January 30, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
of 0 vote

I believe the interviewer expects you to talk about sharding.

- anan January 30, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
of 0 vote

* For each video - have a counter for happy, and a counter for non-happy
* One user can either like or don't like a video. You can cancel you like or dislike

One machine design
* Users table
* Video table
* User to video table - this table will have a third column for like\dislike. Since the user can only choose one option.

Capacity in large systems
* 1 billion users (100 million active users)
* Videos 1 billion per day => 365 billion in one year. 5 * 365 = 1,825 billion in 5 years
* Average of likes per video: 100 (many videos with no likes, but some videos with a lot of likes. So in 5 years this is the number of records in the like table: 182500 billion.
* If we keep all the relationship between user and movie in a table:
* Keep one record: 64 byte for the ID of the video + 64 byte for the ID of user + 1 byte for the like\dislike = 130 bytes = 0.15 KB for each record. Total size 18250 billion KB ==> 18250 TB ==> 18 Petabyte
* So we will do something else. For each movie we will keep the count of like and dislike. And for each user, we will keep the list of movies that he liked.
* For each movie 2 counters, size of int: 8 byte * 1825 billion = 14600 billion byte = 14.5 TB
* For each user the list of likes: each user can have 1000 likes = 1000 billion * 64 byte = 64000 billion byte = 64 TB
* Keep the movie counters in DB, and also for the recent movies keep it in memory
* Keep user to movie data in DB

Handle concurrency
* What happen when 2 users update the count of a movie? The server should get the requests as "user X, Movie y, + 1". For example use the database capability to do "select for update".
* Make sure to update memory cache each time the counter updates in DB.

How many machines?
* Let say that 50% of the videos are seen daily, so we need 7 TB in memory for the counters. So considering 70 GB of RAM, this means 100 servers. And 10 more servers for handling machines failures.
* Data base: considering 4 TB SSD drive per machine: 16 machines for the user likes, and 4 machines for movie count. Let say that each machine have a slave so 40 machines

How the data is spread in machines? Should be according to location. Both the users data and the movies data.

- hava.babay February 17, 2017 | Flag Reply
Comment hidden because of low score. Click to expand.
of 2 vote

in a high level view, I would suppose it would consist of new machines/virtual machines being spun ( with initial counter being 0) by other machines whenever their load threshold is reached. a load balancer equaly distributes the load between all the machines. each of the machine knows about all the other machines and whenever they are idle they seek some other machine to merge their count into, and then shutdown. until finally only one machine is left with the counter.

- Adi February 18, 2015 | Flag Reply

Add a Comment

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.


is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More


CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More