LiveRamp Interview Question
Senior Software Development Engineer
Country: United States
Interview Type: Phone Interview
We should exploit the fact that the key is a byte[], which lets us build multiple levels of paging. For 1 GB, key[0] can point directly to the data file and key[1] can hold an offset within it. For 1 TB, we make 1000 key files, each covering 1 GB of data, plus one directory file that stores the start/offset of each of those 1000 files.
To retrieve a value, we traverse the key's byte array up to index n - 1, where n is the array length - while (index < key.length - 1) - summing the offsets to reach the start of the block that actually contains the data; key[key.length - 1] then locates the value inside that block. To speed up the file/table holding the start/offset entries, we can put it in a memory-mapped file.
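The traversal above can be sketched in Python. The per-level tables of 256 block start offsets, and the toy block/slot sizes in the usage, are assumptions for illustration; the answer does not fix a concrete layout:

```python
def locate(key: bytes, level_tables):
    """Walk key bytes 0..n-2, summing each level's start offset to find the
    data block; the final byte then indexes the value within that block.

    level_tables[i][b] is the start offset contributed by byte value b at
    level i (standing in for the on-disk start/offset files).
    """
    offset = 0
    index = 0
    while index < len(key) - 1:      # traverse up to n - 1
        offset += level_tables[index][key[index]]
        index += 1
    return offset, key[-1]           # (block start, slot inside the block)

# Toy two-level layout: 4 KiB blocks at level 0, 16-byte slots at level 1.
tables = [[b * 4096 for b in range(256)], [b * 16 for b in range(256)]]
block_start, slot = locate(bytes([2, 3, 7]), tables)  # → (8240, 7)
```

With a real store, each `level_tables[i]` would be read from (or memory-mapped over) the corresponding offset file rather than held as a Python list.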
- tjcbs2, March 05, 2015

1 GB:
The data is stored in a file, as a linear array of records. I would implement an in-memory hash table that maps keys to file offsets. In addition, I would implement an in-memory LRU cache so that the most frequently accessed data can be retrieved nearly instantly.
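A minimal Python sketch of this design, under an assumed record framing (4-byte big-endian length prefix per value) that the answer does not specify:

```python
import os
import tempfile
from collections import OrderedDict

class OffsetStore:
    """In-memory key -> file-offset index over one append-only data file,
    plus an LRU cache of recently read values (a sketch, not the answer's
    exact implementation)."""

    def __init__(self, path, cache_size=1024):
        self.path = path
        self.index = {}               # key -> offset of record in data file
        self.cache = OrderedDict()    # LRU: key -> value bytes
        self.cache_size = cache_size
        open(path, "ab").close()      # ensure the data file exists

    def put(self, key, value: bytes):
        with open(self.path, "ab") as f:
            self.index[key] = f.tell()
            f.write(len(value).to_bytes(4, "big") + value)

    def get(self, key):
        if key in self.cache:         # cache hit: refresh recency
            self.cache.move_to_end(key)
            return self.cache[key]
        with open(self.path, "rb") as f:
            f.seek(self.index[key])
            n = int.from_bytes(f.read(4), "big")
            value = f.read(n)
        self.cache[key] = value       # insert; evict least-recently-used
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
        return value

# Usage with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
store = OffsetStore(path)
store.put("a", b"hello")
store.put("b", b"world")
first = store.get("a")    # read from disk, now cached
again = store.get("a")    # served from the in-memory LRU cache
```

The `OrderedDict` doubles as the recency list: `move_to_end` on a hit and `popitem(last=False)` on overflow give LRU behavior without a separate linked list.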
1 TB:
The data file from the 1 GB example becomes a second-level, disk-based LRU cache. If the records are of uniform size, the implementation is easy and efficient: to eject old data, simply overwrite it in the master file and update the key -> file-offset table under the new key. If they are not, the cache instead consists of files on disk named by key value. An additional mapping from key to server (in memory if feasible, disk-based if not) is used to retrieve data that cannot be found in either cache.
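The non-uniform case can be sketched as below. Hashing the key into a filename and using file mtimes as the recency signal are both assumptions made to keep the sketch small, not details from the answer:

```python
import hashlib
import os
import tempfile

class DiskLRUCache:
    """Second-level cache: each value lives in its own file, named by a
    hash of its key. Eviction removes the least-recently-touched file;
    mtimes stand in for an explicit recency list here."""

    def __init__(self, root, max_files=1000):
        self.root = root
        self.max_files = max_files
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, hashlib.sha1(key.encode()).hexdigest())

    def put(self, key, value: bytes):
        with open(self._path(key), "wb") as f:
            f.write(value)
        files = [os.path.join(self.root, n) for n in os.listdir(self.root)]
        while len(files) > self.max_files:   # evict oldest-touched files
            oldest = min(files, key=os.path.getmtime)
            os.remove(oldest)
            files.remove(oldest)

    def get(self, key):
        p = self._path(key)
        if not os.path.exists(p):
            return None                      # miss: fall back to key->server map
        os.utime(p)                          # mark as recently used
        with open(p, "rb") as f:
            return f.read()

# Usage with a throwaway directory.
cache = DiskLRUCache(tempfile.mkdtemp(), max_files=4)
cache.put("k1", b"v1")
hit = cache.get("k1")
miss = cache.get("unknown-key")
```

A `get` miss (returning `None`) is where the key-to-server mapping would be consulted.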
Small: No disk involvement is needed; the LRU cache holds the entire data set.
Assumptions: Key values are small relative to the data (looking at the last sentence of the question, this might not be true). Key access is not uniformly random; if it is, the caches do no good.