Epic Systems Interview Question
Software Engineer / Developer
1. Read n elements from the file, sort them, and save them in a new file
2. Repeat step 1 until the end of the original file is reached
3. Select one element from each of the first n files, sort these, and append them to a new file
4. Repeat step 3 n times
5. Select the next set of n files and repeat steps 3-4
6. Select the sets of n files produced in the previous step and repeat step 3, and so on
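The steps above amount to a two-phase external merge sort: produce sorted runs of n records, then repeatedly merge groups of up to n run files until one file remains. A minimal sketch in Python (the file handling, helper names, and the use of heapq.merge are my assumptions, not from the post):

```python
import heapq, os, tempfile

def sort_runs(in_path, n):
    """Steps 1-2: read n lines at a time, sort each batch, write it to a run file."""
    runs = []
    with open(in_path) as f:
        while True:
            batch = [line for _, line in zip(range(n), f)]
            if not batch:
                break
            batch.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as out:
                out.writelines(batch)
            runs.append(path)
    return runs

def merge_group(paths):
    """Steps 3-4: merge a group of sorted run files into one new sorted file."""
    fd, out_path = tempfile.mkstemp(text=True)
    files = [open(p) for p in paths]
    with os.fdopen(fd, "w") as out:
        # heapq.merge streams the open files, one line per file in memory
        out.writelines(heapq.merge(*files))
    for f in files:
        f.close()
        os.remove(f.name)
    return out_path

def external_sort(in_path, n):
    """Steps 5-6: keep merging groups of up to n runs until one file remains."""
    runs = sort_runs(in_path, n)
    while len(runs) > 1:
        runs = [merge_group(runs[i:i + n]) for i in range(0, len(runs), n)]
    return runs[0]
```

Because heapq.merge pulls lazily from all the open run files at once, only about one record per run needs to be resident in memory during a merge pass.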
Comments are welcome
@anon
By "Let's say 100 million entries occupy 900 MB, but we have only 90 MB. Divide the 900 MB into chunks of 90 MB each and sort them.", pussy meant that you first make 10 chunks of 90 MB each (i.e. your original data = 90 MB x 10 = 900 MB).
Now you load the 1st chunk into RAM (only 90 MB capacity) and sort it with some in-memory technique (quicksort, merge sort, etc.). Then write it back to the same position in the file residing on external storage.
Now you bring the next 90 MB chunk into memory and repeat the same steps for it.
Similarly, you do the same thing for the rest of the chunks.
Now your 10 chunks are each sorted within themselves. This is the end of phase 1.
Our task is to merge these 10 sorted chunks together.
Phase 2 follows steps 2 & 3 of pussy's answer.
Merge sort is an O(n log n) comparison-based sorting algorithm, so it is practical to run it using slow tape drives as input and output devices. It requires very little memory, and the memory required does not depend on the number of data elements. For the same reason it is also useful for sorting data on disk that is too large to fit entirely into primary memory. On tape drives that can run both backwards and forwards, merge passes can be run in both directions, avoiding rewind time. If you have 4 tape drives, it works as follows:
1. Divide the data to be sorted in half and put half on each of two tapes
2. Merge individual pairs of records from the two tapes; write two-record chunks alternately to each of the two output tapes
3. Merge the two-record chunks from the two output tapes into four-record chunks; write these alternately to the original two input tapes
4. Merge the four-record chunks into eight-record chunks; write these alternately to the original two output tapes
5. Repeat until you have one chunk containing all the data, sorted; this takes about log2 n passes, where n is the number of records
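The four-tape procedure above can be simulated with in-memory queues standing in for tapes. This is a sketch of the balanced merge in steps 1-5, not real tape I/O:

```python
from collections import deque

def merge_two(a, b):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def tape_merge_sort(records):
    """Balanced merge sort on four simulated 'tapes' (deques).
    Run length doubles each pass: 1, 2, 4, ... as in steps 2-5."""
    # Step 1: distribute single records alternately onto two input tapes.
    tapes = [deque(), deque(), deque(), deque()]
    for i, r in enumerate(records):
        tapes[i % 2].append(r)
    run = 1
    src, dst = (0, 1), (2, 3)
    while tapes[src[0]] or tapes[src[1]]:
        out = 0
        # One pass: merge run-length chunks from the two source tapes,
        # writing 2*run chunks alternately to the two destination tapes.
        while tapes[src[0]] or tapes[src[1]]:
            a = [tapes[src[0]].popleft() for _ in range(min(run, len(tapes[src[0]])))]
            b = [tapes[src[1]].popleft() for _ in range(min(run, len(tapes[src[1]])))]
            tapes[dst[out]].extend(merge_two(a, b))
            out ^= 1
        run *= 2
        if not tapes[dst[1]]:  # everything landed on one tape: fully sorted
            return list(tapes[dst[0]])
        src, dst = dst, src    # swap roles of input and output tapes
    return []
```

Each pass reads both source tapes once and writes both destination tapes once, which is why the sequential-access pattern suits tape drives.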
One example of external sorting is the external merge sort algorithm, which sorts chunks that each fit in RAM, then merges the sorted chunks together.[1][2] For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:
Read 100 MB of the data in main memory and sort by some conventional method, like quicksort.
Write the sorted data to disk.
Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (there are 900MB / 100MB = 9 chunks), which now need to be merged into one single output file.
Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk into input buffers in main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file and empty it. Whenever any of the 9 input buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available. This is the key step that makes external merge sort work externally -- because the merge algorithm only makes one pass sequentially through each of the chunks, each chunk does not have to be loaded completely; rather, sequential parts of the chunk can be loaded as needed.
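The buffered k-way merge of the last two steps can be sketched as follows. The sorted chunks are assumed to be exposed as iterators of lines (e.g. open file objects); buffer_records and write_out are illustrative names, not from the article:

```python
import heapq

def kway_merge(chunk_iters, buffer_records, write_out):
    """K-way merge of sorted chunk streams using small input buffers.
    Each buffer holds at most `buffer_records` records; when one drains,
    it is refilled from its chunk, so no chunk is ever fully in memory."""
    buffers = []  # per-chunk list of buffered records
    heap = []     # (record, chunk_index); the index breaks ties

    def refill(i):
        buffers[i] = [rec for _, rec in zip(range(buffer_records), chunk_iters[i])]

    for i in range(len(chunk_iters)):
        buffers.append([])
        refill(i)
        if buffers[i]:
            heapq.heappush(heap, (buffers[i].pop(0), i))

    while heap:
        record, i = heapq.heappop(heap)
        write_out(record)   # in the full algorithm this goes to the output buffer
        if not buffers[i]:
            refill(i)       # input buffer i emptied: fetch its next block
        if buffers[i]:
            heapq.heappush(heap, (buffers[i].pop(0), i))
```

The heap holds one record per chunk, so the merge itself needs only k comparisons' worth of state; all the memory budget goes to the input and output buffers.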
Source: Wikipedia
We can use external merge sort.
- pussy September 21, 2009
Let's say 100 million entries occupy 900 MB, but we have only 90 MB of RAM.
1. Divide the 900 MB into 10 chunks of 90 MB each and sort each chunk.
2. Now pick a small block from each of these sorted chunks as input buffers (e.g. 8 MB x 10 = 80 MB) and keep the remaining ~10 MB as an output buffer.
3. Proceed as in the merge step of merge sort; when the output buffer gets filled, empty it to disk.