Amazon Interview Question for Software Engineer / Developers

Given a large file with a million lines of data (phone numbers), give the most efficient way to sort the phone numbers.
- qwerty March 10, 2011
Amazon Software Engineer / Developer Sorting

This is called external mergesort.

- Maxim Stepanenko March 10, 2011

Split the file so that each split holds a number of phone numbers that fits in memory. Sort each split with any efficient sorting algorithm (radix sort would be linear) and write it back to the same location in the file.

Once all the splits are sorted, keep an array of file pointers, one to the start of each split. Then merge: repeatedly take the smallest number among those the pointers currently reference, write it to another file, and advance that pointer.
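
The split, sort, and merge steps above can be sketched in Python. This is only a minimal sketch under assumptions that are not in the thread: one phone number per line, every number with the same digit count (so comparing the text lines matches numeric order), and sorted splits written to separate temporary files instead of back into the original file. The names `external_sort` and `chunk_lines` are hypothetical, and Python's built-in `sorted` stands in for the radix sort mentioned above.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, chunk_lines=100_000):
    """Sort a one-number-per-line text file via sorted runs and a k-way merge.

    Assumes every number has the same digit count, so comparing the
    text lines gives the same order as comparing the numbers.
    """
    run_paths = []
    with open(src_path) as src:
        while True:
            lines = list(islice(src, chunk_lines))  # one in-memory split
            if not lines:
                break
            chunk = sorted(s for s in (ln.strip() for ln in lines) if s)
            if not chunk:
                continue  # split contained only blank lines
            with tempfile.NamedTemporaryFile("w", suffix=".run", delete=False) as run:
                run.writelines(s + "\n" for s in chunk)
                run_paths.append(run.name)

    # One open handle per run plays the role of the "array of file pointers".
    runs = [open(p) for p in run_paths]
    try:
        with open(dst_path, "w") as dst:
            # heapq.merge lazily yields the smallest line among the run heads.
            dst.writelines(heapq.merge(*runs))
    finally:
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)
```

A call such as `external_sort("phones.txt", "sorted.txt")` (hypothetical file names) would keep at most `chunk_lines` numbers in memory at a time while building the runs; the merge holds only one line per run.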

- novice March 10, 2011

Use a bitmap.

- Anonymous March 10, 2011

+1

Create a bit vector of size 1 million, iterate through the phone numbers, and set the corresponding bit for each one (O(n)).

Then output the positions of the set bits in order (O(n)).

- airfang613 March 10, 2011

A 10-digit phone number needs a bitmap of 10 billion bits (about 1.25 GB), no?
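
To make the size question concrete, here is a small Python sketch of the bitmap approach together with the memory arithmetic. It rests on assumptions not stated in the thread: the phone numbers are treated as non-negative integers below a known bound, and duplicates are absent or may be dropped (a plain bitmap cannot count repeats). The names `bitmap_sort`, `bitmap_bytes`, and `universe` are hypothetical.

```python
def bitmap_sort(numbers, universe):
    """Return the distinct values of `numbers` in ascending order.

    Every value must lie in range(universe); duplicates collapse to one.
    """
    bits = bytearray((universe + 7) // 8)  # one bit per possible value
    for n in numbers:
        bits[n >> 3] |= 1 << (n & 7)       # set bit n
    # Scan the whole universe once; the work is O(universe), not O(len(numbers)).
    return [n for n in range(universe) if bits[n >> 3] & (1 << (n & 7))]


def bitmap_bytes(digits):
    """Bytes needed for one bit per possible number with `digits` decimal digits."""
    return (10 ** digits + 7) // 8
```

Here `bitmap_bytes(7)` is 1,250,000 bytes, while `bitmap_bytes(10)` is 1,250,000,000 bytes: the bitmap is sized by the range of possible numbers, not by how many lines the file has.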

- M March 15, 2011

This is the first problem in the book Programming Pearls.

- Anonymous March 11, 2011

Several million entries can be sorted very quickly (1 second is more than enough) using any traditional algorithm, because they fit in memory. For telephone numbers I would prefer radix sort.
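
For the in-memory case, a least-significant-digit radix sort could look roughly like the Python sketch below. It assumes the phone numbers are held as non-negative integers with at most `digits` decimal digits (so leading zeros are not preserved); the function name and the base-10 buckets are illustrative choices, not something the thread specifies.

```python
def radix_sort(numbers, digits=10):
    """LSD radix sort for non-negative ints with at most `digits` decimal digits."""
    items = list(numbers)
    place = 1
    for _ in range(digits):
        buckets = [[] for _ in range(10)]
        for n in items:
            # Appending keeps equal digits in their previous order (stable),
            # which is what makes the digit-by-digit passes correct.
            buckets[(n // place) % 10].append(n)
        items = [n for bucket in buckets for n in bucket]
        place *= 10
    return items
```

Each pass is linear in the number of entries, so the total work is proportional to `digits` times the input size.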

- ftfish March 11, 2011

Use radix sort.

- forkloop January 08, 2012

External sort + internal sort

Step 1: Partitioning
Divide the original file into k small files, where k = original file size / available memory size (rounded up).

Step 2: Internal sort
Load the small files into memory one at a time, sort each, and write it back. Radix sort may be the better choice here (linear time, although a linked-list implementation carries a large overhead); a traditional comparison sort is also fine (quicksort may be the best; it is called “QuickSort” for a reason ☺).

Step 3: Merge
Choose an M-way merge, with 2 <= M <= k.
Pass = ceil(log_M(k))

The key point is to keep the number of I/O operations as small as possible. In every pass, every file is read into memory and written back once.
Total I/O = 2 * Pass * (number of blocks in the files)
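
As a quick check of the two formulas above, the following Python sketch computes the pass count and the total I/O. The helper names are hypothetical, and the numbers in the note after the block are made-up illustrations. Passes are counted with integer arithmetic rather than a floating-point logarithm to avoid rounding surprises.

```python
def merge_passes(k, m):
    """Number of M-way merge passes needed to reduce k sorted files to one."""
    if m < 2:
        raise ValueError("merge order M must be at least 2")
    passes = 0
    while k > 1:
        k = -(-k // m)  # ceil(k / m): files remaining after one pass
        passes += 1
    return passes


def total_io(k, m, blocks):
    """Block transfers: each pass reads and writes every block once."""
    return 2 * merge_passes(k, m) * blocks
```

With 100 initial files, a 2-way merge needs 7 passes and a 10-way merge needs 2, so for a 1,000-block data set the formula gives 14,000 versus 4,000 block transfers.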

This means we need to reduce the number of passes. There are 2 ways to achieve this:
1. Enlarge M
Note: if M becomes large, it is better to use a “loser tree”, which reduces the comparisons per output record to about log2(M); otherwise we need M-1 comparisons to find each winner.
2. Reduce k
We can use “replacement selection” to build the initial files. The rule of thumb is that it halves k, because on randomly ordered input the files it produces average about twice the memory size. (With replacement selection the initial files have different sizes, so a Huffman-tree-style merge order is recommended to minimize I/O.)
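
The run-generation idea in point 2 can be sketched in Python as follows. This is a simplified in-memory illustration, not a tuned implementation: it assumes the records are mutually comparable values, yields each run as a list instead of writing it to disk, and uses `heapq` in place of a dedicated selection tree. The names `replacement_selection`, `memory`, and `_END` are hypothetical.

```python
import heapq
from itertools import islice

_END = object()  # sentinel marking exhausted input


def replacement_selection(records, memory):
    """Yield sorted runs while holding at most `memory` records at once."""
    it = iter(records)
    heap = list(islice(it, memory))
    heapq.heapify(heap)
    run, pending = [], []  # pending: records that must wait for the next run
    while heap:
        smallest = heapq.heappop(heap)
        run.append(smallest)
        nxt = next(it, _END)
        if nxt is not _END:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)  # can still join the current run
            else:
                pending.append(nxt)        # too small: belongs to the next run
        if not heap:
            # Current run is finished; start the next one from the pending records.
            yield run
            run = []
            heap, pending = pending, []
            heapq.heapify(heap)
```

On input that is already sorted, every incoming record joins the current run, so the whole input comes out as a single run regardless of `memory`; on shuffled input the runs vary in length, which is why the merge order matters afterwards.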

- iatbst February 02, 2012

CareerCup