Groupon Interview Question
Senior Software Development Engineer
Country: United States
Interview Type: Phone Interview
Preprocess the DB and create files with a title -> movieInfo mapping: calculate the hash of each title and append that movie's record to the corresponding file. Given a query, calculate the hash and load the corresponding file; if the file is larger than 250 MB, load it in parts and search the contents.
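The bucketing step described above can be sketched in Python. This is a minimal illustration, assuming the DB can be iterated as (title, movieInfo) pairs; the bucket count and file naming are illustrative choices, not part of the original answer.

```python
import hashlib
import os

NUM_BUCKETS = 64  # chosen so each bucket file stays well under the 250 MB RAM limit

def bucket_for(title: str) -> int:
    """Map a title to a bucket number via its MD5 hash."""
    digest = hashlib.md5(title.encode("utf-8")).digest()
    return digest[0] % NUM_BUCKETS

def preprocess(records, out_dir="buckets"):
    """Append each title -> movieInfo record to its hash bucket file."""
    os.makedirs(out_dir, exist_ok=True)
    for title, info in records:
        path = os.path.join(out_dir, f"bucket_{bucket_for(title)}.txt")
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{title}\t{info}\n")

def lookup(title, out_dir="buckets"):
    """Load only the matching bucket file and scan it for the title."""
    path = os.path.join(out_dir, f"bucket_{bucket_for(title)}.txt")
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        for line in f:
            t, _, info = line.rstrip("\n").partition("\t")
            if t == title:
                return info
    return None
```

A query then touches only one bucket file (roughly 10 GB / 64 ≈ 160 MB each, under these assumptions) instead of the whole dataset.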
Stats:
10GB data
250 MB RAM
How many movies are there in the world? Perhaps 0.5 million?
Amount of information available per movie = 10 GB / 0.5 M = 20 KB. So it is most likely textual information about each movie, not the movies themselves.
How much can 250 MB of RAM hold? About 12.5K movies at a time, which means up to 40 loads of the data per query when searching for a title. (Whether you scan linearly or binary-search, you still have to load each section of data just to confirm it is not the one you are looking for!)
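The brute-force fallback above, scanning the 10 GB file in chunks that fit in 250 MB of RAM, can be sketched as follows. The one-record-per-line "title<TAB>info" layout is an assumption for illustration.

```python
def chunked_search(path, title, chunk_bytes=250 * 1024 * 1024):
    """Scan the data file chunk by chunk, never holding more than one chunk in RAM."""
    needle = title + "\t"
    leftover = ""  # partial line carried over from the previous chunk
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            buf = leftover + chunk
            lines = buf.split("\n")
            leftover = lines.pop()  # last line may be cut mid-record
            for line in lines:
                if line.startswith(needle):
                    return line[len(needle):]
    if leftover.startswith(needle):  # file may not end with a newline
        return leftover[len(needle):]
    return None
```

In the worst case this reads all 40 chunks before concluding the title is absent, which is exactly the cost the answer is trying to avoid.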
Alternative approach: MD5-hash the movie title (16 bytes, 128 bits), plus up to ~200 bytes to store the movie's location on disk, roughly 250 bytes per movie in a hash map. So about 1 M movies can be looked up entirely from RAM!
So, when a query comes in: MD5 the movie title, look it up in the hash map, go to disk (one I/O) and fetch the description to display.
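The in-memory index plus single disk read can be sketched like this, again assuming the "title<TAB>info" line layout. Keys are raw 16-byte MD5 digests, matching the 16-bytes-per-key estimate above; the (offset, length) pair stands in for the "movie location on disk".

```python
import hashlib

def build_index(path):
    """One pass over the data file, recording where each movie's record lives."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            title, _, _ = line.partition(b"\t")
            key = hashlib.md5(title).digest()  # 16 bytes per key
            index[key] = (offset, len(line))
            offset += len(line)
    return index

def fetch(path, index, title):
    """Single disk read per query: seek to the stored offset and return the info."""
    entry = index.get(hashlib.md5(title.encode("utf-8")).digest())
    if entry is None:
        return None
    offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        line = f.read(length)
    return line.split(b"\t", 1)[1].rstrip(b"\n").decode("utf-8")
```

Building the index still requires one full 10 GB pass, but every query afterwards is a hash lookup plus one seek-and-read.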
- Multi-disk arrays? Store the disk location in the hash map.
- RAID for greater IOPS? (Prefer read-friendly RAID levels for cheaper cost.)
Use multi-core machine to parallelize request processing.
Some stats on the given data:
- venkatvi November 28, 2015

RAM: 250 MB
Data size: 10 GB. We don't know the nature of this data: is it just text information about each movie, or does it include the movies themselves? I am presuming the former.
How many movies are there in the world? Let's put an upper bound on it: 0.5 million?
Around 20 KB of information is present for each of the 0.5 million movies in the given 10 GB of data.
If one were to load these 20 KB records into RAM, we could hold only about 12.5K movies at a time. In that case, loading partial data from disk and searching it for the movie of choice is one way to do the job; one would need to load data up to 40 times to finish searching for a single movie.
Alternatively, the movie title can be hashed: an MD5 hash is 128 bits, i.e. 16 bytes, or 32 hex characters (2 hex chars per byte).
The movie info's location on disk can be represented by a path of approximately 20 words (each word taking ~10 bytes). So in all, each key (title hash) : value (location on disk) pair would take at most about 250 bytes.
Given 250 MB of RAM, we can store up to about 1 million movies, which is above our original guess at the number of movies.
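The back-of-envelope numbers above can be checked directly with Python's hashlib: an MD5 digest really is 16 bytes (32 hex characters), and at ~250 bytes per entry a 250 MB hash map covers about a million titles.

```python
import hashlib

digest = hashlib.md5(b"The Godfather").digest()
assert len(digest) == 16            # 128 bits
assert len(digest.hex()) == 32      # 2 hex chars per byte

bytes_per_entry = 16 + 200          # digest + a generous ~200-byte disk path
ram = 250 * 1024 * 1024
print(ram // 250)                   # entries at 250 B each: just over a million
```

So the 250-byte estimate leaves some slack (16 + 200 = 216 bytes of payload) for hash-map overhead.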
So, when a user searches for a title: MD5-hash the title (MD5 throughput is on the order of 400 MB per second, so hashing short titles can easily sustain thousands of requests per second), look it up in the in-memory hash map, and do one disk I/O to fetch the description and show it to the user.
The trouble is disk I/O under concurrent requests.
The nature of this application is read-heavy rather than write-heavy, so using multi-disk arrays or read-friendly RAID levels could help deliver higher IOPS.
One could also use a multi-core machine to parallelize request processing.
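Parallelizing the request path could be sketched with a thread pool, since each query is dominated by I/O. This is one simple option, not the only one; `handle_query` is a placeholder standing in for the hash-then-seek lookup described above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def handle_query(title: str) -> str:
    """Placeholder for: hash the title, consult the in-memory map, read from disk."""
    return hashlib.md5(title.encode("utf-8")).hexdigest()

titles = ["Heat", "Alien", "Casablanca", "Vertigo"]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_query, titles))
```

Threads work here because the workers spend most of their time blocked on disk reads; with multiple disks or a RAID array, several reads can genuinely proceed at once.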