LiveRamp Interview Question
Senior Software Development Engineer
Country: United States
Interview Type: Phone Interview
We should exploit the fact that the key is a byte[], which lets us build multiple levels of paging. For 1 GB, key[0] can point directly to the data file and key[1] can hold an offset within it. For 1 TB, we make 1000 key files, each covering 1 GB of data, plus one directory file that stores the start/offset of each of those 1000 files.
To retrieve a value, we traverse the key's byte array up to index n - 1, where n is the array length - while (index < key.length - 1) - summing the offsets to reach the start of the block that actually contains the data; key[key.length - 1] then locates the value inside that block. To speed up the file/table holding the start/offset entries, we can put it in a memory-mapped file.
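The traversal above can be sketched in Python. The per-level tables of 256 block start offsets, and the toy block/slot sizes in the usage, are assumptions for illustration; the answer does not fix a concrete layout:

```python
def locate(key: bytes, level_tables):
    """Walk key bytes 0..n-2, summing each level's start offset to find the
    data block; the final byte then indexes the value within that block.

    level_tables[i][b] is the start offset contributed by byte value b at
    level i (standing in for the on-disk start/offset files).
    """
    offset = 0
    index = 0
    while index < len(key) - 1:      # traverse up to n - 1
        offset += level_tables[index][key[index]]
        index += 1
    return offset, key[-1]           # (block start, slot inside the block)

# Toy two-level layout: 4 KiB blocks at level 0, 16-byte slots at level 1.
tables = [[b * 4096 for b in range(256)], [b * 16 for b in range(256)]]
block_start, slot = locate(bytes([2, 3, 7]), tables)  # → (8240, 7)
```

With a real store, each `level_tables[i]` would be read from (or memory-mapped over) the corresponding offset file rather than held as a Python list.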
- tjcbs2, March 05, 2015

1 GB:
The data is stored in a file, as a linear array of records. I would implement an in-memory hash table that maps keys to file offsets. In addition, I would implement an in-memory LRU cache so that the most frequently accessed data can be retrieved nearly instantly.
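A minimal Python sketch of this design, under an assumed record framing (4-byte big-endian length prefix per value) that the answer does not specify:

```python
import os
import tempfile
from collections import OrderedDict

class OffsetStore:
    """In-memory key -> file-offset index over one append-only data file,
    plus an LRU cache of recently read values (a sketch, not the answer's
    exact implementation)."""

    def __init__(self, path, cache_size=1024):
        self.path = path
        self.index = {}               # key -> offset of record in data file
        self.cache = OrderedDict()    # LRU: key -> value bytes
        self.cache_size = cache_size
        open(path, "ab").close()      # ensure the data file exists

    def put(self, key, value: bytes):
        with open(self.path, "ab") as f:
            self.index[key] = f.tell()
            f.write(len(value).to_bytes(4, "big") + value)

    def get(self, key):
        if key in self.cache:         # cache hit: refresh recency
            self.cache.move_to_end(key)
            return self.cache[key]
        with open(self.path, "rb") as f:
            f.seek(self.index[key])
            n = int.from_bytes(f.read(4), "big")
            value = f.read(n)
        self.cache[key] = value       # insert; evict least-recently-used
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
        return value

# Usage with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
store = OffsetStore(path)
store.put("a", b"hello")
store.put("b", b"world")
first = store.get("a")    # read from disk, now cached
again = store.get("a")    # served from the in-memory LRU cache
```

The `OrderedDict` doubles as the recency list: `move_to_end` on a hit and `popitem(last=False)` on overflow give LRU behavior without a separate linked list.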
1 TB:
The data file from the 1 GB example becomes a second-level, disk-based LRU cache. If the records are of uniform size, the implementation is easy and efficient: to eject old data, simply overwrite it in the master file and update the key -> file-offset table under the new key. If they are not, the cache instead consists of files on disk named by key value. An additional mapping from key to server (in memory if feasible, disk-based if not) is used to retrieve data that cannot be found in either cache.
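The non-uniform case can be sketched as below. Hashing the key into a filename and using file mtimes as the recency signal are both assumptions made to keep the sketch small, not details from the answer:

```python
import hashlib
import os
import tempfile

class DiskLRUCache:
    """Second-level cache: each value lives in its own file, named by a
    hash of its key. Eviction removes the least-recently-touched file;
    mtimes stand in for an explicit recency list here."""

    def __init__(self, root, max_files=1000):
        self.root = root
        self.max_files = max_files
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, hashlib.sha1(key.encode()).hexdigest())

    def put(self, key, value: bytes):
        with open(self._path(key), "wb") as f:
            f.write(value)
        files = [os.path.join(self.root, n) for n in os.listdir(self.root)]
        while len(files) > self.max_files:   # evict oldest-touched files
            oldest = min(files, key=os.path.getmtime)
            os.remove(oldest)
            files.remove(oldest)

    def get(self, key):
        p = self._path(key)
        if not os.path.exists(p):
            return None                      # miss: fall back to key->server map
        os.utime(p)                          # mark as recently used
        with open(p, "rb") as f:
            return f.read()

# Usage with a throwaway directory.
cache = DiskLRUCache(tempfile.mkdtemp(), max_files=4)
cache.put("k1", b"v1")
hit = cache.get("k1")
miss = cache.get("unknown-key")
```

A `get` miss (returning `None`) is where the key-to-server mapping would be consulted.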
Small: No disk involvement is needed; the LRU cache holds the entire data set.
Assumptions: Key values are small relative to the data (looking at the last sentence of the question, this might not be true). Key access is not uniformly random; if it is, the caches do no good.