Microsoft Interview Question
Software Engineer / Developers

I mentioned counting sort, but the interviewer said that since we assume the range of the numbers is unpredictable, counting sort doesn't work (counting sort requires all the numbers to fall within a finite range, which determines the amount of memory we need).
I am not sure about this one either. Here is another pigeonhole-style sorting approach.
Iterate through the entire array to find the minimum and maximum; this takes O(n). Now, instead of counting-sorting each element into an individual bucket, use buckets that each cover a range of values.
Now, say you have buckets such that
b[0]: 0-100000
b[1]: 100001-200000
b[2]: 200001-300000
....
b[9]: top of range
You can say that the topmost buckets, plus x items from some middle bucket, add up to 1,000,000 items, and those are the top 1 million numbers.
I guess my point is that even though the range might be unpredictable, reading through it once does not hurt your O(n) algorithm.
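The range-bucketing idea above can be sketched as follows. This is a minimal illustration, assuming the data fits in memory; the function name and bucket count are made up for the example:

```python
def top_m_by_buckets(nums, m, num_buckets=10):
    # One O(n) pass to find the range.
    lo, hi = min(nums), max(nums)
    width = (hi - lo) // num_buckets + 1
    buckets = [[] for _ in range(num_buckets)]
    for x in nums:  # second O(n) pass: distribute into range buckets
        buckets[(x - lo) // width].append(x)
    # Take whole buckets from the top until we have at least m items,
    # then sort only the last (partial) bucket to trim the excess.
    result = []
    for b in reversed(buckets):
        if len(result) + len(b) <= m:
            result.extend(b)
        else:
            b.sort(reverse=True)
            result.extend(b[: m - len(result)])
            break
    return result
```

Only one bucket ever needs sorting, so if the values are reasonably spread out the work stays close to O(n).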
Use a heap.
Constructing the heap is O(n).
Then it takes O(c * lg n) to find the largest c items.
So the total time is still
O(n + c * lg n) = O(n), since c is a constant.
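A quick sketch of the build-heap-then-pop idea using Python's heapq (which is a min-heap, so values are negated to simulate a max-heap); heapify is O(n) and each pop is O(log n):

```python
import heapq

def top_c_via_heap(nums, c):
    # Build a max-heap in O(n) by negating values (heapq is a min-heap).
    heap = [-x for x in nums]
    heapq.heapify(heap)  # O(n) bottom-up heap construction
    # Pop the c largest: O(c log n) total.
    return [-heapq.heappop(heap) for _ in range(c)]
```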
The construction of the heap does not take place in O(n) time; it takes O(n log n). The solution proposed would not work.
Construction of a heap takes O(n) time. It's a tighter bound. Refer to Cormen for more details.
Say we have to find the k largest elements in an array. We can use the logic in Cormen/Rivest for finding the kth largest element, partitioning the array around a pivot.
Here is the pseudocode (note the pivot at index i is already in its final position, so the left recursion ends at i - 1):
FindKLargest(arr, int start, int end, int k)
{
    i = Randomized_Partition(arr, start, end)
    if (i > k)
        FindKLargest(arr, start, i - 1, k)
    else if (i < k)
        FindKLargest(arr, i + 1, end, k)
}
The array will then contain the k largest elements at indices (n-k) through n-1.
What do you think?
Should the pseudocode be:
FindKLargest(arr, int start, int end, int k, int length)
{
    i = Randomized_Partition(arr, start, end)
    if ((length - i) < k)
        FindKLargest(arr, start, i, k - (length - i))
    else if ((length - i) > k)
        FindKLargest(arr, i + 1, end, k)
}
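A runnable version of the partition-based selection being discussed might look like the sketch below. `randomized_partition` follows the CLRS scheme, a base case terminates the loop, and the function names are mine:

```python
import random

def randomized_partition(arr, start, end):
    # Move a random pivot to the end, then partition (CLRS-style).
    p = random.randint(start, end)
    arr[p], arr[end] = arr[end], arr[p]
    pivot = arr[end]
    i = start - 1
    for j in range(start, end):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[end] = arr[end], arr[i + 1]
    return i + 1  # pivot's final index

def find_k_largest(arr, k):
    # Partition until index n-k holds its final value; everything at or
    # after that index is then one of the k largest. Expected O(n).
    n = len(arr)
    target = n - k
    start, end = 0, n - 1
    while start < end:
        i = randomized_partition(arr, start, end)
        if i == target:
            break
        elif i > target:
            end = i - 1
        else:
            start = i + 1
    return arr[target:]
```

The returned k elements are not themselves sorted, which matches the discussion above.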
The solution proposed by temp1 will probably work. We can also use partial sorting here. Take the first million numbers and keep them in a minimum priority queue. If the next number in the input list is greater than the minimum of the priority queue, we replace that minimum with it. This way, we get an O(n log m) solution, and since "m" is small, as it is in this case, we can say the solution is O(n).
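The streaming min-heap approach sketches out naturally with Python's heapq; the function name is illustrative:

```python
import heapq

def top_m_streaming(stream, m):
    # Keep a min-heap of the m largest values seen so far: O(n log m).
    heap = []
    for x in stream:
        if len(heap) < m:
            heapq.heappush(heap, x)
        elif x > heap[0]:  # new value beats the smallest of the top m
            heapq.heapreplace(heap, x)
    return heap
```

Because it only ever holds m values, this also works when the billion numbers arrive as a stream that does not fit in memory.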
temp1's solution is based on the Quick Sort algorithm...
One more solution...
Use a bit vector, initialized to zero.
The bit vector will contain 1 million bits, covering the values just below 1 billion. A number less than 1 billion is subtracted from 1 billion, and the resulting value is the index of that number's bit in the vector. Set that bit to 1.
Once all the numbers are scanned, walk the vector from the 0th location upward and print only the numbers whose bits are set; the number at a given location is 1 billion minus that index.
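A sketch of this bit-vector idea, with the caveat that it only works under strong assumptions: the values are distinct integers, and at least a million of them fall in the window just below the top of the range. The constants are shrunk here for illustration:

```python
def top_m_bitvector(nums, top_of_range, window):
    # Bit vector covering the 'window' values just below top_of_range.
    # Index of value v is top_of_range - v; values outside the window
    # are ignored. Assumes the input values are distinct.
    bits = bytearray(window // 8 + 1)
    for v in nums:
        idx = top_of_range - v
        if 0 <= idx < window:
            bits[idx // 8] |= 1 << (idx % 8)
    # Scan from index 0 (largest value) upward and emit set bits.
    out = []
    for idx in range(window):
        if bits[idx // 8] & (1 << (idx % 8)):
            out.append(top_of_range - idx)
    return out
```

If fewer than a million distinct values land in the window, the scheme misses part of the answer, which is the weakness of this approach.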
Since we have the internal memory to hold the 1 billion numbers, why can't we use radix sort or bucket sort?
Comments are appreciated.
What I initially thought of is that we construct a max heap, which takes O(n) time. At the first level of the heap we have only 1 element, at the second level we have 2, and at height h we have 2^h elements. We also know that elements at each level are smaller than the elements one level above. If we just want to pick the top 1 million numbers without caring that they are sorted, we can scan through the heap, printing elements until we have reached the 20th level. This is obtained as follows: 2^0 + 2^1 + 2^2 + ... + 2^h = 2^(h+1) - 1 ≈ 1 million = 2^20, so h = 19. So we only need to scan down to level 19 and we have the top 1 million elements (a little more than a million), obviously not sorted. Any comments?
Gaurav's solution will not work. A heap only guarantees that a parent is greater than its children, which means you can have a heap like
100
85 90
80 82 17 19
75 76 55 56
So in this case a level-by-level scan would include 17 and 19 but miss out on 75, 76, 55, and 56.
The kth-largest-element approach is the correct way to go.
Say we are supposed to find the k largest integers from n integers, where n >= k.
1) Sort the integers using any good sorting algorithm like merge sort or heap sort, which has O(n log n) complexity.
2) Now take a pointer from the beginning of the list and move it n-k positions forward.
Now we have all k largest integers from the n integers.
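The two steps above reduce to a one-liner in Python; a trivial sketch (function name is mine):

```python
def k_largest_by_sorting(nums, k):
    # O(n log n): sort ascending, then take the last k elements.
    return sorted(nums)[-k:]
```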
Use a bit vector. Keep setting bits as each number is visited. Once all billion numbers are recorded, output the largest one million numbers.
I guess we need to use the worst-case linear-time selection method (median of medians) to find the kth smallest/largest number.
Once we have gotten the (billion minus million)-th number using this approach in O(n) time, we do a scan of the entire list to output the numbers >= that number.
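The select-then-scan idea can be sketched like this; `select` is a textbook median-of-medians implementation (not in-place, for clarity), and with duplicates the final scan may return slightly more than k values, exactly as the ">=" in the comment above implies:

```python
def select(arr, k):
    # Worst-case O(n) selection (median of medians): returns the k-th
    # smallest element of arr (k is 0-based).
    if len(arr) <= 5:
        return sorted(arr)[k]
    # Median of each group of 5, then the median of those medians as pivot.
    medians = [sorted(arr[i:i + 5])[len(arr[i:i + 5]) // 2]
               for i in range(0, len(arr), 5)]
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in arr if x < pivot]
    highs = [x for x in arr if x > pivot]
    pivots = [x for x in arr if x == pivot]
    if k < len(lows):
        return select(lows, k)
    elif k < len(lows) + len(pivots):
        return pivot
    else:
        return select(highs, k - len(lows) - len(pivots))

def k_largest_by_select(arr, k):
    # Find the (n-k)-th smallest as a threshold, then scan once: O(n).
    threshold = select(arr, len(arr) - k)
    return [x for x in arr if x >= threshold]
```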
Million is a constant. Billion is a constant. A million multiplied by a billion, raised to any power, etc., is still a constant. So any algorithm will do, because the result is O(1). It's only when the size of the input is a variable like n that you get O(n), O(n log n), O(n^2) complexities.
Look up counting sort. Using an approach similar to this would give you linear time.
- vodangkhoa June 13, 2007