Microsoft Interview Question
Computer Scientists
Country: United States
Interview Type: In-Person
@blackfever
>> total time complexity will be m, where m is the total number of elements, i.e. m = k*n
What about the heap insert operation, which is O(log k)? You will have O(nk) inserts into the heap.
The time complexity will be m*log k; assuming log k << m, it can be taken as m. Of course, we do need to do n*k inserts into the heap.
>> assuming log k << m so it can be taken as m
FYI, in big-O notation you can conventionally ignore constants, but you can't drop functions. Algorithms with O(log*(n)) complexity are still described as having asymptotic dependence on n; they are never considered constant-time just because log*(n) grows extremely slowly.
How do we know which array the removed element (from the heap) came from? Do we have to check all the arrays for that element?
@guest: in the heap we keep two things, the value of the element and the array it belongs to, so when the minimum is removed we know which array to fetch the next element from.
What is the constant factor of heapsort on k sorted arrays?
As far as I know, the constant factor of heapsort is about 11.62 * n*ln(n),
and that of quicksort about 2 * n*ln(n).
But in the case of k sorted arrays, will it remain the same or change, and why?
The way the question has been framed, I understand the total time should be m, so here is a pairwise-merge solution. (As the analysis below shows, it actually comes out to O(m log k), not O(m).)
1. Pair the arrays so that we have k/2 pairs, and merge the arrays in each pair. Merging one pair of n-element arrays takes O(n) comparisons, so merging all the pairs takes O(n*k/2) = O(nk).
2. Repeat step 1 until we have our final sorted array. After each round the arrays double in length, so every round still touches all nk elements.
Total time complexity for all merges:
= O(nk) for round 1 + O(nk) for round 2 + ... over log k rounds
= O(nk * log k)
= O(m log k)
(The geometric series k/2 + k/4 + k/8 + ... only counts the number of merges per round; it misses that round j merges arrays of length 2^(j-1)*n, so each round's cost stays at O(nk) rather than halving.)
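As a sketch of this pairwise-merge idea (illustrative C++, not from the thread; `mergeTwo` and `mergePairwise` are made-up names): each round halves the number of arrays while touching all m elements, over about log2(k) rounds.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Merge two sorted vectors in O(|a| + |b|).
std::vector<int> mergeTwo(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Repeatedly pair up the arrays and merge each pair until one remains.
// Each round halves the number of arrays but touches all m = k*n elements,
// and there are about log2(k) rounds.
std::vector<int> mergePairwise(std::vector<std::vector<int>> arrays) {
    if (arrays.empty()) return {};
    while (arrays.size() > 1) {
        std::vector<std::vector<int>> next;
        for (std::size_t i = 0; i + 1 < arrays.size(); i += 2)
            next.push_back(mergeTwo(arrays[i], arrays[i + 1]));
        if (arrays.size() % 2 == 1)   // odd array out advances to the next round
            next.push_back(arrays.back());
        arrays = std::move(next);
    }
    return arrays[0];
}
```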
you can consider this approach:
a. In your example there are 5 arrays and they are sorted.
b. Suppose the arrays are a, b, c, d, e, each of size m. Take indices i = 0 and j = 0, a write index w, and a destination array didx large enough to hold all 5m elements.
c. Then do this:
1. Merge a and b with a standard two-way merge:
while (i < m && j < m)
    if (a[i] < b[j]) { didx[w] = a[i]; i++; w++; }
    else { didx[w] = b[j]; j++; w++; }
then copy over whichever array still has elements left.
2. Reset i = 0 and merge c against the 2m elements already in didx in the same way (reading ahead of where you write, so the write index never overtakes the read index), and repeat for d and e.
At the end didx holds all the elements in sorted order. Note that each later merge re-scans everything merged so far, so this sequential scheme costs O(k^2 * n) overall.
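A cleaned-up sketch of this sequential-merge idea in C++ (illustrative; `mergeSequential` is a made-up name): each new array is merged against everything accumulated so far.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Merge the k sorted arrays one at a time into an accumulator.
// The j-th merge re-scans all elements accumulated so far, so the total
// work is O(n * (1 + 2 + ... + k)) = O(k^2 * n).
std::vector<int> mergeSequential(const std::vector<std::vector<int>>& arrays) {
    std::vector<int> acc;
    for (const std::vector<int>& a : arrays) {
        std::vector<int> merged;
        merged.reserve(acc.size() + a.size());
        std::merge(acc.begin(), acc.end(), a.begin(), a.end(),
                   std::back_inserter(merged));
        acc = std::move(merged);
    }
    return acc;
}
```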
How about if we create a BST using the first array and then, for each remaining array, keep inserting its elements into the tree as per BST rules? If each insertion were counted as one operation, the total would be k*n = m, but each insertion actually costs O(height), so even a balanced tree gives O(m log m) in total.
Once the tree is complete, simply traverse it using an in-order traversal (pre-order would not produce sorted output).
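A minimal sketch of the BST idea, substituting `std::multiset` (a self-balancing red-black tree) for a hand-rolled BST; reading the container front to back is exactly an in-order traversal.

```cpp
#include <set>
#include <vector>

// Insert every element into a self-balancing BST (std::multiset is a
// red-black tree), then read it back front to back, which is exactly an
// in-order traversal. Each insert costs O(log m), so the total is
// O(m log m), not O(m).
std::vector<int> mergeViaBST(const std::vector<std::vector<int>>& arrays) {
    std::multiset<int> tree;
    for (const std::vector<int>& a : arrays)
        tree.insert(a.begin(), a.end());   // duplicates are kept
    return std::vector<int>(tree.begin(), tree.end());
}
```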
Since space is not an issue, we can create a value-indexed table with two entries per slot: one for the actual value and a second for its frequency. It takes O(k*n) to tally all the elements of these lists by value. Because the slots are indexed by the node values themselves (this is direct addressing, as in counting sort, rather than a general hash map, so it assumes a bounded integer range), the entries come out sorted. Next, we write these values, repeated according to their frequencies, into the final destination. The pseudocode looks like:
for (i = 0; i < HashSize; i++)
{
    for (j = 0; j < hash[i][0]; j++)   // hash[i][0] is the frequency, hash[i][1] the actual value
        destination[index++] = hash[i][1];
}
This is also O(n*k) overall, plus O(HashSize) to sweep the table; the inner loop only replays repeated values, and the total number of writes equals the number of elements, so it does not affect the overall complexity.
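A sketch of this counting idea in C++ (illustrative; it assumes non-negative integers bounded by a known `maxValue`, which is what makes the table come out sorted):

```cpp
#include <vector>

// Tally how many times each value occurs (direct addressing, i.e. counting
// sort, rather than a general hash map), then replay the counts in
// increasing order. Assumes values lie in [0, maxValue].
std::vector<int> mergeByCounting(const std::vector<std::vector<int>>& arrays,
                                 int maxValue) {
    std::vector<int> freq(maxValue + 1, 0);
    for (const std::vector<int>& a : arrays)
        for (int v : a)
            ++freq[v];                      // O(k*n) to tally everything
    std::vector<int> dest;
    for (int v = 0; v <= maxValue; ++v)     // O(maxValue + k*n) to emit
        dest.insert(dest.end(), freq[v], v);
    return dest;
}
```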
I don't understand why people are bent on using a heap here.
Wouldn't it be more efficient to just keep track of the current pointer (call it curPtr) in each of the k lists?
1.) All k curPtr values are initialized to 0. We find the minimum of the k elements (list1[curPtr1], list2[curPtr2], etc.).
2.) We output that minimum element and increment the corresponding curPtr.
3.) We make sure that once the curPtr for some list reaches n, we ignore that list in later iterations.
4.) Since we output one element in each iteration, the function will run in O(K*n) time.
Maybe I have missed something or committed some grave blunder, so please feel free to comment.
UPDATE: Here is some quick code:
#include <stdio.h>

int main(void)
{
    int k, n, mat[100][100];

    printf("Enter K:");
    scanf("%d", &k);
    printf("Enter N:");
    scanf("%d", &n);
    for (int i = 0; i < k; i++)
    {
        for (int j = 0; j < n; j++)
        {
            scanf("%d", &mat[i][j]);
        }
    }

    int ptr[100], min, index;  /* ptr[i] = current position in list i */
    int done = 0;
    for (int i = 0; i < k; i++)
    {
        ptr[i] = 0;
    }

    index = 0;
    for (int i = 0; done == 0; i++)  /* one iteration per output element */
    {
        /* scan all k lists for the smallest un-consumed element */
        min = mat[index][ptr[index]];
        for (int j = 0; j < k; j++)
        {
            if ((ptr[j] < n) && (min > mat[j][ptr[j]]))
            {
                min = mat[j][ptr[j]];
                index = j;
            }
        }
        printf("%d\n", min);
        ptr[index]++;

        /* stop once every list is exhausted; otherwise leave index
           pointing at a list that still has elements */
        done = 1;
        for (int j = 0; j < k; j++)
        {
            if (ptr[j] < n)
            {
                done = 0;
                index = j;
                break;
            }
        }
    }
    return 0;
}
Your step 1 runs in O(k) per output element. Using a heap speeds that up to O(log k); that's the reason.
OK. But my overall algorithm seems to be O(k*n), which looks better than O(k*n*log k). Isn't it?
Let M = K*N be the total amount of numbers.
Your algorithm runs in O(M * K), while the heap version runs in O( M * log K ).
Note that your "for(int i = 0; done == 0; i++)" loop runs M times, not N, and then you have a nested search over each of the K arrays.
The algorithm is as follows:
- blackfever, August 01, 2013
1. Create a min-heap of the first elements of all the arrays. (A heap node is an object containing a number's value and the index of the array it belongs to.) O(k log k)
2. Remove the minimum element from the min-heap.
3. Add the next element of the array the removed minimum belonged to into the heap.
Repeat steps 2 and 3.
Total time complexity will be m, where m is the total number of elements, i.e. m = k*n.
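The heap steps above can be sketched compactly using `std::priority_queue` as the min-heap (illustrative C++, not from the thread; `mergeKSorted` is a made-up name; each node stores the value, its source array, and its position within that array):

```cpp
#include <queue>
#include <tuple>
#include <vector>

// A heap node: (value, index of source array, position within that array).
// Tuples compare lexicographically, so the value dominates the ordering.
using Node = std::tuple<int, int, int>;

std::vector<int> mergeKSorted(const std::vector<std::vector<int>>& arrays) {
    // std::greater turns priority_queue into a min-heap.
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> heap;
    for (int a = 0; a < (int)arrays.size(); ++a)   // step 1: O(k log k)
        if (!arrays[a].empty())
            heap.emplace(arrays[a][0], a, 0);

    std::vector<int> out;
    while (!heap.empty()) {                        // m pops, O(log k) each
        auto [value, a, i] = heap.top();           // step 2: remove the minimum
        heap.pop();
        out.push_back(value);
        if (i + 1 < (int)arrays[a].size())         // step 3: push its successor
            heap.emplace(arrays[a][i + 1], a, i + 1);
    }
    return out;                                    // total: O(m log k)
}
```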