Yahoo Interview Question
Software Engineer / Developer
Country: United States
Yes, suppose the values range from 50,000 to 50,000,000. Since this is a huge set of values, I would pick counting sort as the algorithm.
This is how I would answer. Not sure if it is right though :)
1) Merge sort: If the data is too big to fit into memory, I would use something like an external k-way merge sort. - O(n log n)
2) Insertion sort: If the data is almost sorted, this is the way to go!
3) Quick sort: If the data is mostly unsorted, small enough to fit into memory, and extra space is not allowed, this is the way to go!
4) Counting sort: If additional O(n) space is allowed, the data is mostly unsorted and small enough to fit into memory, and, most importantly, the range of the values is much smaller than the number of elements, this is the way to go!
5) Bubble sort: If the data is small enough to fit in memory and is almost sorted in ascending order, this can be used. But I would still lean toward insertion sort.
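The external k-way merge from point 1 can be sketched as follows. This is a minimal in-memory simulation (the `chunk_size` parameter is an assumed stand-in for what fits in RAM), not a real disk-backed implementation:

```python
import heapq

def external_merge_sort(data, chunk_size=4):
    """Sketch of external k-way merge sort: sort fixed-size chunks
    independently (each chunk stands in for one on-disk run), then
    k-way merge the sorted runs with a heap."""
    # Phase 1: sort each chunk that "fits in memory".
    runs = [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    # Phase 2: merge all runs; heapq.merge keeps only one element
    # per run in memory at a time.
    return list(heapq.merge(*runs))

print(external_merge_sort([50_000_000, 50_000, 7_654_321, 99_999, 123_456]))
# [50000, 99999, 123456, 7654321, 50000000]
```

In a real external sort, each run would be written to and streamed back from disk; `heapq.merge` is lazy, so the merge phase only needs one buffered element per run.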
1) I don't think an external k-way merge sort runs in O(n log n) like a traditional one does.
4) Not O(n) space but O(range), where range is 50,000,000 - 50,000
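That O(range) cost is easy to see in a sketch (a hypothetical `counting_sort` helper, shown with a tiny range for illustration; the real range would need 50,000,000 - 50,000 + 1 counters):

```python
def counting_sort(values, lo, hi):
    """Counting sort for integers in [lo, hi]. The counts array has
    hi - lo + 1 slots, i.e. O(range) extra space, regardless of how
    many values there are."""
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[v - lo] += 1
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)
    return out

print(counting_sort([50_005, 50_001, 50_003, 50_001], 50_000, 50_010))
# [50001, 50001, 50003, 50005]
```

Subtracting the known minimum (50,000 here) keeps the counts array sized by the range rather than by the maximum value.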
This question would be best answered with more questions to narrow down the problem. Asking things like:
Where is the data coming from?
How will the data be used?
etc.
This will give you a better understanding of whether the data is partially sorted and, later in the implementation, whether the data needs to be randomly accessed. You can then work out whether the worst case for the sort you choose will ever be reached, and whether a different sort would do better because the data matches its best case.
Quick sort can have a worst-case running time of O(n^2) if we don't implement the partitioning function properly, for instance, if we are unlucky enough to pick the largest element as the pivot. I think merge sort is a good call since its performance is very stable. Counting sort is good, but the range of values is too big: it needs additional space of O(array size) for the output and O(range of values) for the counts.
Quick sort has a worst-case running time of O(n^2) no matter how good your partition function is (unless the number distribution is known in advance). A good partition function can only decrease the probability of hitting O(n^2) on a random input.
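One common mitigation, sketched here as a generic illustration rather than anything from the thread, is picking the pivot at random; the worst case is still O(n^2), but no fixed input triggers it deterministically:

```python
import random

def quicksort(a):
    """Quicksort with a random pivot. Worst case remains O(n^2),
    but an adversary cannot pre-build a killer input because the
    pivot choice is randomized."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    # Three-way partition also handles duplicate keys gracefully.
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))
# [1, 1, 2, 3, 4, 5, 6, 9]
```

This list-building version is for clarity; an in-place partition would avoid the O(n) extra space the commenters care about.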
No one has mentioned that the values will not fit into an unsigned 4-byte integer (int). You can use an 8-byte integer, though there can be an extra cost associated with 8-byte integers, depending on the hardware.
But deciding which sort to use really depends on the amount of data to sort, as well as whether it is likely to be already partially sorted.
The values go up to 50 million, whereas an unsigned 4-byte int holds values on the order of 4 billion.
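A quick sanity check on that bound:

```python
# 50 million fits comfortably in an unsigned 32-bit integer.
UINT32_MAX = 2**32 - 1  # 4,294,967,295
print(50_000_000 <= UINT32_MAX)  # True
```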
We should ask how many values are in the array. If we have millions of elements, counting sort is the way to go since the range fits easily in memory. Otherwise, I would pick quick sort.
- Miguel Oliveira July 30, 2013