Riverbed Interview Question
Software Engineer / DevelopersCountry: United States
This approach is a bit slow, like you mentioned. One way to improve it would be to use a min heap of size m to keep track of the next smallest element of each array. Until there is no more data remaining, remove the min element in the heap, and then add to the heap the next element from the same array the removed element originated from (so you'd have to keep track of that info together with the values themselves). What this does is cut your search time for getting the minimum of m arrays down to O(log m) from your earlier O(m), giving a total complexity of O(m*log(m)*n). Note that heapsort is a special case of this algorithm where n = 1.
_
Merging pairs of arrays is a decent approach too. It's basically like making the final log(m) passes of merge sort, and so will run in O(m*n*log(m)) time. A standard merge sort is a special case of this algorithm where n=1 (and then there are m*1 = m total elements, and they are sorted in O(m*log(m) time)).
_
Which of the two efficient algos discussed here is better probably depends on use case. The heap algo requires more operations on highly localized data but makes only one pass over the input. In an environment where the data doesn't fit into memory and disk reads are expensive, the heap approach may perform better. On the other hand, the second algorithm (which reads each element a logarithmic number of times) is easy to parallelize across multiple machines.
Merge pair of arrays
blockSize = n;
totalLength = m * n;
while (blockSize < totalLength)
{
//each iteration
int currentPos = 0;
while (currentPos + blockSize - 1 < totalLength - 1)
{
int firstBlockAddress = currentPos ;
int firstBlockEndAddress = currentPos + blockSize - 1;
int secondBlockStartAddr = firstBlackEndAddress + 1;
int secondBlockEndAddr = min (secondBlockStartAddr + blockSize - 1, totalLength - 1);
merge (first, firstEnd, second, secondBlock);
currentPos = secondBlockStartEndAddr + 1;
}
blockSize += blockSize;
}
One more approach:
It may not too efficient, but simpler one.
Create a new big array of size mxn elements. Copy all the arrays one by one to this new array. Sort the new array using merge sort.
Time Complexity :
1) First to copy all elements - O(n)
2) Then merge sort - O(nlog n)
Total O(n) + O(n logn)?
I am thinking merging arrays of group two to create m/2 bigger array and then merging those arrays in groups of two to create m/4 array etc would give us the best time complexity.
However, I am trying a simple approach below. At each iteration, I am finding the minimum of all the m arrays and inserting them in the big array. This is just an extension of simple merging of two sorted arrays.
Getting minimum of m arrays is O(m) and there are m*n elements. Therefore total complexity is going to be O(n*m^2). I hope someone comes up with a smarter way! Space complexity is also O(m*n) but that's not a problem because the question asks us to merge them in a (supposedly new) big array.
- vijay January 14, 2012