Facebook Interview Question
Software Engineer / Developers

That's wrong. In order to run a classic merge algorithm you would need to load all the data into memory, and you would have the same problem.
You certainly need to run a merge algorithm on the sorted files, but you only need to keep a min-heap in memory that holds the current minimum element of every file that hasn't been fully processed.
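A minimal sketch of that idea in Python, assuming each sorted chunk file holds one integer per line (the file names are hypothetical): the standard library's heapq.merge maintains exactly such a heap, holding one pending element per input at a time.

import heapq

def run_stream(path):
    # Lazily yield the integers stored one per line in a sorted chunk file.
    with open(path) as f:
        for line in f:
            yield int(line)

# Hypothetical names for the 1024 sorted chunk files. Note that keeping
# 1024 files open at once may require raising the OS file-descriptor limit.
runs = [run_stream(f"chunk_{i:04d}.txt") for i in range(1024)]

with open("sorted.txt", "w") as out:
    # heapq.merge keeps one pending element per run in a min-heap,
    # so memory use is O(number of runs), not O(total data).
    for value in heapq.merge(*runs):
        out.write(f"{value}\n")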
That exercise is well explained in the book "Algorithms for Interviews".
No guys, Anonymous is right. He splits the 1TB file into 1024 smaller files, sorts each file separately in memory (each small file is 1GB), then does a k-way (1024-way) merge of the sorted smaller files and writes the result out to a new file.
For the merge part we don't need all 1TB in memory at once. We just need the current smallest element from each chunk (smaller file); the smallest of these 1024 candidates gets written to the final result file, and we can maintain them in a min-heap. The merge's heap requires only 1024 * 4 bytes = 4 KB (assuming one element is a 4-byte integer).
So we have plenty of memory left for extras that improve performance: 1024 input buffers for reading elements from the smaller files, and an output buffer for writing elements to the result file (the head of the heap gets popped 2^38 times), so we don't have to read and write elements from and to disk one by one.
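A sketch of that buffered merge, under the same one-integer-per-line assumption (the buffer size and function names are illustrative): the explicit (value, run index) heap is the 4 KB structure described above, and the output file object's own buffering stands in for the output buffer.

import heapq
from itertools import islice

BUFFER_LINES = 64 * 1024  # assumed per-run read-buffer size, in elements

def buffered_ints(path):
    # Refill a block of BUFFER_LINES integers at a time instead of
    # touching the disk for every single element.
    with open(path) as f:
        while True:
            block = [int(line) for line in islice(f, BUFFER_LINES)]
            if not block:
                return
            yield from block

def kway_merge(chunk_paths, out_path):
    streams = [buffered_ints(p) for p in chunk_paths]
    # Seed the heap with the first element of every run; (value, run index)
    # tuples make ties compare deterministically.
    heap = []
    for i, s in enumerate(streams):
        first = next(s, None)
        if first is not None:
            heap.append((first, i))
    heapq.heapify(heap)
    with open(out_path, "w") as out:
        while heap:
            value, i = heapq.heappop(heap)
            out.write(f"{value}\n")
            # Pull the next element from whichever run just lost its minimum.
            nxt = next(streams[i], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt, i))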
Two-way? That would mean splitting the file into 2 smaller files, each 512 GB in size. To perform the merge sort you have to sort the smaller files first, but they don't fit into main memory... Of course, you can recursively two-way external-sort each of the smaller files, if that's what you meant. ;)
Load a 1 GB chunk of the file into main memory, sort it using an in-place heapsort, write the sorted data to a temporary file, do this for all 1024 chunks, and then merge the resulting files.
- Anonymous May 11, 2011
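A sketch of that first split-and-sort pass, under the same assumptions as above (the 1 GB constant is illustrative, and in practice Python's per-object overhead would force smaller chunks; sorted() stands in for the in-place heapsort):

CHUNK_BYTES = 1 << 30  # assumed chunk size: 1 GB

def split_and_sort(input_path):
    # First pass: cut the big file into memory-sized chunks,
    # sort each chunk in RAM, and write it back out as a sorted run.
    paths = []
    with open(input_path) as f:
        while True:
            # readlines with a size hint reads roughly CHUNK_BYTES worth
            # of complete lines.
            lines = f.readlines(CHUNK_BYTES)
            if not lines:
                break
            values = sorted(int(x) for x in lines)  # stand-in for in-place heapsort
            path = f"chunk_{len(paths):04d}.txt"
            with open(path, "w") as run:
                run.writelines(f"{v}\n" for v in values)
            paths.append(path)
    return paths

Passing the returned paths to a k-way merge like the ones sketched above completes the external sort.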