Epic Systems Interview Question
Software Engineer / Developer
1. Read n elements from the file, sort them, and save them in a new file
2. Repeat step 1 until the end of the original file is reached
3. Select one element from each of the first n files, sort these, and append them to a new file
4. Repeat step 3 n times
5. Select the next set of n files and repeat steps 3-4
6. Select the sets of n files produced in the previous step and repeat step 3, and so on
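The steps above amount to a two-phase external merge sort: produce sorted runs of n records, then repeatedly merge groups of up to n run files until one file remains. A minimal sketch in Python (the file handling, helper names, and the use of heapq.merge are my assumptions, not from the post):

```python
import heapq, os, tempfile

def sort_runs(in_path, n):
    """Steps 1-2: read n lines at a time, sort each batch, write it to a run file."""
    runs = []
    with open(in_path) as f:
        while True:
            batch = [line for _, line in zip(range(n), f)]
            if not batch:
                break
            batch.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as out:
                out.writelines(batch)
            runs.append(path)
    return runs

def merge_group(paths):
    """Steps 3-4: merge a group of sorted run files into one new sorted file."""
    fd, out_path = tempfile.mkstemp(text=True)
    files = [open(p) for p in paths]
    with os.fdopen(fd, "w") as out:
        # heapq.merge streams the open files, one line per file in memory
        out.writelines(heapq.merge(*files))
    for f in files:
        f.close()
        os.remove(f.name)
    return out_path

def external_sort(in_path, n):
    """Steps 5-6: keep merging groups of up to n runs until one file remains."""
    runs = sort_runs(in_path, n)
    while len(runs) > 1:
        runs = [merge_group(runs[i:i + n]) for i in range(0, len(runs), n)]
    return runs[0]
```

Because heapq.merge pulls lazily from all the open run files at once, only about one record per run needs to be resident in memory during a merge pass.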
Comments are welcome
@anon
By "Let's say 100 million entries occupy 900 MB, but we have only 90 MB. Divide the 900 MB into chunks of 90 MB each and sort them.", pussy meant that you first make 10 chunks of 90 MB each (i.e. your original data = 90 MB x 10 = 900 MB).
Now you load the 1st chunk into RAM (only 90 MB capacity) and sort it with some in-memory technique (quicksort, merge sort, etc.). Then write it back to the same position in the file residing on external storage.
Now you bring the next 90 MB chunk into memory and repeat the same steps for it.
Similarly, you do the same thing for the rest of the chunks.
Now your 10 chunks are each sorted within themselves. This is the end of phase 1.
Our task is to merge these 10 sorted chunks together.
Phase 2 follows steps 2 & 3 of pussy's answer.
Merge sort is an O(n log n) comparison-based sorting algorithm, so it is practical to run it using slow tape drives as input and output devices. It requires very little memory, and the memory required does not depend on the number of data elements. For the same reason it is also useful for sorting data on disk that is too large to fit entirely into primary memory. On tape drives that can run both backwards and forwards, merge passes can be run in both directions, avoiding rewind time. If you have 4 tape drives, it works as follows:
1. Divide the data to be sorted in half and put half on each of two tapes
2. Merge individual pairs of records from the two tapes; write two-record chunks alternately to each of the two output tapes
3. Merge the two-record chunks from the two output tapes into four-record chunks; write these alternately to the original two input tapes
4. Merge the four-record chunks into eight-record chunks; write these alternately to the original two output tapes
5. Repeat until you have one chunk containing all the data, sorted; this takes about log2 n passes, where n is the number of records
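The four-tape procedure above can be simulated with in-memory queues standing in for tapes. This is a sketch of the balanced merge in steps 1-5, not real tape I/O:

```python
from collections import deque

def merge_two(a, b):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def tape_merge_sort(records):
    """Balanced merge sort on four simulated 'tapes' (deques).
    Run length doubles each pass: 1, 2, 4, ... as in steps 2-5."""
    # Step 1: distribute single records alternately onto two input tapes.
    tapes = [deque(), deque(), deque(), deque()]
    for i, r in enumerate(records):
        tapes[i % 2].append(r)
    run = 1
    src, dst = (0, 1), (2, 3)
    while tapes[src[0]] or tapes[src[1]]:
        out = 0
        # One pass: merge run-length chunks from the two source tapes,
        # writing 2*run chunks alternately to the two destination tapes.
        while tapes[src[0]] or tapes[src[1]]:
            a = [tapes[src[0]].popleft() for _ in range(min(run, len(tapes[src[0]])))]
            b = [tapes[src[1]].popleft() for _ in range(min(run, len(tapes[src[1]])))]
            tapes[dst[out]].extend(merge_two(a, b))
            out ^= 1
        run *= 2
        if not tapes[dst[1]]:  # everything landed on one tape: fully sorted
            return list(tapes[dst[0]])
        src, dst = dst, src    # swap roles of input and output tapes
    return []
```

Each pass reads both source tapes once and writes both destination tapes once, which is why the sequential-access pattern suits tape drives.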
One example of external sorting is the external merge sort algorithm, which sorts chunks that each fit in RAM, then merges the sorted chunks together.[1][2] For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:
Read 100 MB of the data in main memory and sort by some conventional method, like quicksort.
Write the sorted data to disk.
Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (there are 900MB / 100MB = 9 chunks), which now need to be merged into one single output file.
Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk into input buffers in main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file and empty it. Whenever any of the 9 input buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available. This is the key step that makes external merge sort work externally -- because the merge algorithm only makes one pass sequentially through each of the chunks, each chunk does not have to be loaded completely; rather, sequential parts of the chunk can be loaded as needed.
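The buffered k-way merge of the last two steps can be sketched as follows. The sorted chunks are assumed to be exposed as iterators of lines (e.g. open file objects); buffer_records and write_out are illustrative names, not from the article:

```python
import heapq

def kway_merge(chunk_iters, buffer_records, write_out):
    """K-way merge of sorted chunk streams using small input buffers.
    Each buffer holds at most `buffer_records` records; when one drains,
    it is refilled from its chunk, so no chunk is ever fully in memory."""
    buffers = []  # per-chunk list of buffered records
    heap = []     # (record, chunk_index); the index breaks ties

    def refill(i):
        buffers[i] = [rec for _, rec in zip(range(buffer_records), chunk_iters[i])]

    for i in range(len(chunk_iters)):
        buffers.append([])
        refill(i)
        if buffers[i]:
            heapq.heappush(heap, (buffers[i].pop(0), i))

    while heap:
        record, i = heapq.heappop(heap)
        write_out(record)   # in the full algorithm this goes to the output buffer
        if not buffers[i]:
            refill(i)       # input buffer i emptied: fetch its next block
        if buffers[i]:
            heapq.heappush(heap, (buffers[i].pop(0), i))
```

The heap holds one record per chunk, so the merge itself needs only k comparisons' worth of state; all the memory budget goes to the input and output buffers.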
Source: Wikipedia
We can use external merge sort.
- pussy September 21, 2009
Let's say 100 million entries occupy 900 MB, but we have only 90 MB of RAM.
1. Divide the 900 MB into 10 chunks of 90 MB each and sort each chunk.
2. Now pick a small block from each of these sorted chunks as input buffers (e.g. 8 MB x 10 = 80 MB) and keep the remaining ~10 MB as an output buffer.
3. Proceed as in the merge step of merge sort; when the output buffer gets filled, empty it to disk.