## Amazon Interview Question Software Engineer / Developers

- 0of 0 votes
I have 10 million 10-bit integers to sort, how would you sort them and what's the time complexity?

Follow-on question: Instead of sorting integers, I now have 10 million pairs to sort. Each pair consists of a 10-bit integer and an object, the sort order is determined by the 10-bit integer. Will your original sort algorithm hold or do you need sort it differently?

(A word of advice: Ask as many questions as you want during the interview, but you MUST be quick. Also, don't mention anything until you've thought it through clearly, otherwise you're just inviting more questions. Time is of essence, you're too slow if this question takes you more than 15 minutes to come up with the optimal solution, because remember, you have to leave time for explanations and other questions)

**Country:**United States

Imagine that you need to sort and store the following one-digit numbers:

098723459681623612093851092837561827349878210

now, you could sort all of those and store them in order, or you could just have an int[10] and for each number, add to that cell in the array. so for the first digit, you do sort[0]++, for the second digit you do sort[9]++, etc.

At the end, rather than actually having the list, you have a summary of the list. You know that the list starts with four zeroes, but instead of storing that as 0000, you just have sort[0] = 4. To reproduce the list in sorted order, you would need to loop through the entire list and for each index, print that index a number of times equal to the value in that cell.

Assuming that there's no sub-ordering within the pairs, so it doesn't matter which of the objects associated with 0 comes first, what we're going to do is instead of having sort[0] = the number of zeroes, we're going to have sort[0] = ArrayList<Object> where that list is all of the objects associated with 0. Adding to the front of a linked list is O(1) so that's nice and fast. Does that make sense?

Nice. Once you realize that 10 bit integer is nothing but 1024, it is a simple count sort.

Sorting usually implies that all the data from the input exist in the output.

The problem says that there are 10 million integers but the solution Mike provided gives 1024 integers, so it just loses the data and therefore seems to be incorrect. Am I missing something?

Also, consider the input that consists of 5 million of number 512 and 5 millions of number 3. The proposed solution does not produce the correct answer, doesn't it?

okay. counting sort is not in place algorithm, it needs O(N) extra memory. shouldn't it be considered as the stop sign?

1) why wouldn't it get the correct answer for the case of 5 million 2s and 5 million 3s?

2) as far as needing extra memory, it doesn't require O(N), it only requires 1024 extra ints, so 4096 bytes plus minimal overhead for the definition of the array, and that doesn't need to grow unless N gets larger than an int (and even then it just doubles in size when you use longs, and technically it could be an unsigned int in languages that support that so N needs to be twice as big). That doesn't seem like a big concession to make in return for sorting in O(N) time, and if there isn't anything important in the original data's ordering, you could actually look at this as a really efficient form of compression and throw out the original list.

Okay, the code you have posted seems to be the 1st part of the counting sort(the code below copied from wikipedia article)

```
# allocate an array Count[0..k] ; THEN
# initialize each array cell to zero ; THEN
for each input item x:
increment Count[key(x)]
```

You didn't complete the code, so strictly speaking it IS invalid

Then, the counting algorithm analysis clearly stays that because it uses arrays of length k + 1 and n, the total space usage of the algorithm is also O(n + k).

I would be shocked if my initial answer wasn't sufficient in an interview, but if they actually needed you to re-expand the compressed data into the original sorted list, it's pretty straightforward:

```
ArrayList<Integer> output;
for (int i = 0; i < 1024; i++) {
for (int j = 0; j < sort[i]; j++) {
output.add(i);
}
}
```

Strictly speaking this code is invalid as well. The output should be the array, not the array list because the input is array. At least you should call output.toArray() or something like that.

Then, this code is ignoring the fact that all the elements of the input array are 10 bit integers and is using Integer type(32 bit), so 40Mb of extra memory instead of 10Mb(or 20Mb for 16-bit integers) is being used.

Finally your code is 1) invalid 2) not-optimal. I would be shocked if you were hired with this level of answer :P

now you are just trying to make yourself feel good by arguing on unimportant points. number of upvoting of Mike's solution is a proof enough for his ingenuity specially for the second part. i would have walked out of the interviewer if the interviewer argued the way you are doing. Looking back at all your comment i am damn sure that until yesterday you didn't even know about Count Sort. Looking at youur comment about the O(n) extra space makes you look even more stupid. sweetie there's nothing wrong in accepting the fact that you did not know something and someone else helped you learn it. demeaning/destroyin someone's work would not make you greater than the creator of that work.

Thanks for your valuable comment, Mr. Artist, I am glad that someone is tracking my messages :) So good to not feel lonely.

if seeking attention is your intent behind all your fuss over nothing, then you are at the wrong forum buddy

agree to mike above...10 bit integer means just 1024 distinct values and count sort is the best way. mike's suggestion to the second part can be achieved with an array of structures with struct definition as follows

```
typedaf struct array_element
{
object obj; //origianal original def
array_element next;
}
```

define an array like below

array_element a[1024];

where a[i] = first pair in the given list with integer value i. all following pairs with int i should be stored in as the next node of a[i] and so on

great solution, but a negative integer in the list would throw ArrayIndexOutOfBound error. Right, as it is not mentioned positive integers? Which would mean the number range from -1024 to 1024. A slight modification would suffice though

int[1024] sort;

int[1024] sortNegative;

for (i in list) {

if(i < 0) {

sortNegative[i*(-1)]++;

} else {

sort[i]++;

}

}

@Mike Lewis, your solution will work if the algorithm need not be stable. ie none of the given elements have any satellite data, that must be preserved after the sort. In case of repeating numbers, this algorithm will break.

Counting sort requires O(n) space in order to be stable. For this problem that means O(1000000) extra space.

Radix sort would be a good choice in this problem. It is stable and requires constant extra space. Running time is O(nk) where k is the max number of digits.

In this particular case, it beats comparison sort algorithms as k(4) is much smaller than logn(1000).

logn>>>k

part 1:

which is O(n).

- Mike Lewis on November 12, 2012 Edit | Flag Replypart 2: if the pairs are really just integer and object, I would probably just have an array of linked lists of some sort, which will still be O(n) (get the first item in the appropriate list and add the new item before that)