Goldman Sachs Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
Split the input file into the "N" chunks (in this case, 2Gb each) that total to size "K" bytes (10Gb in this case). Load the chunks per request. When the memory is full, unload the oldest chunk/chunks to make space for new ones. Among other possible scenarios, You can unload a chunk based on the reference count of the given chunk (i.e., ref = 0).
A disk file typically has sequential access and hence poses challenges in terms of accessing a record randomly.
Therefore, make a copy of the file such that records of it form a doubly-linked list(which means records will have pointer to the previous one also). Use a double-ended queue(deque) of size ~2GB, so that as records are accessed from first to last or vice-versa, old records are de-queued from one end and new records are added at the other end.
if only data is to be stored ,we may follow a good compression techniques .Otherwise we have to make logical storage in 2GB memory of 10 GB data.Please comment if i misunderstood problem
Folks, the entire file (or data) has to fit into 2G of memory. Partial writing/reading of chunks of data is not an acceptable solution.
Put it on drop box, and store the path! only 2GB? pshaw.
Otherwise possibly discuss compression algorithms, might be domain specific etc.
We Can DO Something Like This:
- acJavaFreak September 04, 2013Physical Data Logical Data
2 GB --> Lines of File [1-10]
2 GB --> Lines of File [11-20]
2 GB --> Lines of File [21-30]
2 GB --> Lines of File [31-40]
2 GB --> Lines of File [41-50]
2 GB --> Lines of File [51-60]
2 GB --> Lines of File [61-70]
2 GB --> Lines of File [71-80]
2 GB --> Lines of File [81-90]
2 GB --> Lines of File [91-100]
And We can Have a HASH MAP , storing the Key Range and The Physical Contents it Has IN a LOGICAL MANNER.
MAP Eg:-
{1$10 , Lines 1-10},
{11$20 , Lines 11-20} . etc.....
In This Manner , no need to load the entire file at one go , Instead we can Have OUR Map Decide WHich Portion of FIle to Load based on the Line to be Read.
Moreover we can index words that are non redundant, to help us in better searching phrases or form meanings from the sentences.
Eg: "Chemicals" can be index to be containing in the the following keys of the Hash Map
"31$40" & "82$90"
weapons is another word that is found in key "82$90". Such combination of non redundant and import words can be used to search important pharases from the file without loading the entire File.
- ARgho Chatterjee.