Google Interview Question
Software Engineer / DevelopersYou are inserting same element multiple times in the heap.
For exmaple
when you cosider a[0]+b[0] as first element you will inert a[0]+b[1] and a[1]+b[0]
Now heap contains 2 element. Suppose a[0]+b[1] is largest in heap than you will add a[1]+b[1] and a[0]+b[2] in heap now heap contains a[1]+b[0], a[1]+b[1] and a[0]+b[2]
Now if a[1]+b[0] is largest in heap you will add a[1]+b[1] and a[2]+b[0] in heap.
but heap already contains a[1]+b[1] in heap and due to this you will loose track of elements.
Fix: for a[i] + b[j] insert a[i+1]+b[j] and a[i]+b[j+1] if i ==0
insert a[i] + b[i+1] else
I think this problem has a simple dynamic programming solution without using any heaps. We maintain a table t containing pairs, with t[0] = (0,0) (that is, a pair of zeros). The ith element of this table denotes the indices of the elements from "a" and "b" which we should add together to get the ith maximum sum possible.
Now t[k] is (x1, y1) where
x = t[k-1].first;
y = t[k-1].second;
if ((a[x+1]+b[y]) > (a[x]+b[y+1])) {
x1 = x+1
y1 = y
} else {
x1 = x
y = y+1
}
So, if you want to calculate the 5th maximum sum we fill our table from 1..5, then return a[t[5].first]+b[t[5].second]
I dont think that it would work because considering the following arrays:
A: 39,31,27,21
B: 70,29,5,1
It is clear that the largest would be A[0]+B[0] but for the next largest you need do some comparisons which is why a[t[5].first]+b[t[5].second] will not work.
In this case the next largest sum is either 31+70 or 39+29. There are no other possibilities. And that's exactly how we fill up our table's 2nd element.
Anyway, I also think there is an error in the solution I suggested. Basically I think we should check 4 values and decide which one is the largest at any given moment.
Let's say that the 5th largest element is a[42]+b[33]. Now what is the 6th largest element? It is either:
1. a[43]+b[33] (we step to a smaller one in a[])
2. a[42]+b[34] (we step to a smaller one in b[])
3. a[41]+b[34] (we step to a bigger one in a[] and a smaller one in b[])
4. a[43]+b[32] (we step to a smaller one in a[] and a bigger one in a[]).
I think...
Since arrays are sorted in decresing order, we can solve it in O(1).
Consider:
A[]={10, 7, 6, 5, 4, 1} hence m = 6
B[]={20, 19, 3, 2}
Largest: 10+20 = 30, i.e. A[0] + B[0]
Second largest: 10+19 = 29, i.e. max(A[0]+B[1], B[0]+A[1])
Each index i, from 0 to {min(m, n) - 1} would give us 3 values. With this we can, sort of, get pair of three values and put them in decreasing order. Each pair comes from a certain magicIndex which can be obtained by dividing k (where k start from 0, i.e. 0th index holds largest) by 3.
Algo:
maxk = (min(m,n) - 1) * 3 + 1;
if(m != n) maxk++;
if(k > maxk)
'Not possible to return kth';
k--;
magicIndex=k/3;
j=k%3;
if(j == 0)
return A[magicIndexI]+B[magicIndexI];
else
{
a=A[magicIndex]+B[magicIndex+1];
b=B[magicIndexI]+A[magicIndex+1];
if(j==1)
return a>b?a:b;
else
return a<b?a:b;
}
1) Compare the largest of a[m] and b[n]
Suppose max(a[m])>=max(b[n]), otherwise, same handling, just swap a/b and m/n
2) Add max(a[m]) to all of Array b, takes O(n)
3) Then we will have a larger array Array-b[n] + Array-a[m] still in decreasing order
4) Compute the index: i=k/m j=k%m
5) kth elem=a[i]+b[j]-max(a)
Consider this :
we have
a1,a2...an
and b1,b2...bm
for b1 K max values will be b1+a1,b1+a2,b1+a3 ... ,b1+ak
similarly
for bk K max values will be bk+a1,bk+a2,bk+a3, ...bk+ak
hence we will have
k*k
values
and we have to get
k'th largeest value out of these 'k*k' OR k^2 VALUES
i think complexity will be
O(k^2 + Finding k th largest )..
IS this correct .????
or tell me if i am WRONG ..!!!
This will work. If you use the selection algorithm for finding the kth largest value amongst your k*k values, the overall complexity will be O(k^2). But you can do better than O(k^2).
The complexity can be O(k lg k). Think along the lines of carefully picking elements from the k*k matrix formed by all the possible sums and pushing each one into a max-heap. Which element would you push first. After that which element(s) to push next? Draw a diagram of the matrix with the "largest" value at the apex, and use this diagram to visualize what to do.
Do merge sort of the two arrays in descending order.. Take the 0th element and Kth element for Kth largest sum.(Presuming the sorted array is avoided with duplicate values) Complexity is (nlogn)+1.
Oops.. this will take into account of a[i] + a[j]. This has to be filtered out.. So the solution will not work!
I thought of a O(k*(logm + logn) solution.
The idea is to find the 1th minimum sum(which is a1+b1), the 2nd minimum sum, the 3rd..., until the kth minumum sum.
Denote the ith minimum sum is min(i). The key observation is that if min(i) = a(p) + b(q), min(i+1) is either a(p+1) + b(x) or a(y) + b(q+1), where b(x) is the minimum element in array b that hold b(x) + a(p+1) >= min(i), a(y) is the minimum element in array a that hold a(y) + b(q+1) >= min(i). min(i+1) is the smaller of the two.
Obviously, m(1) = a(1) + b(1), from where we can use iteration to compute m(k). Use binary search to locate a(x) need log(m) time, and locate b(y) need log(n) time. So k iteration need O(k*(logm + logn)) time.
Request for your comments.
LuanJunYi, you claim that every single time either the row or the column must advance by 1. This is not true. Sometimes, the column can go backwards by one and the row can advance by two. In any case, the advancements of min sums is not as simple as you stated. Just try [1, 4, 8] and [1, 3, 5].
First of all, finding k-th element is not about sorting but about partitioning. So, the brute force solution is to compute all sums and then treat the mxn matrix as one long array and find the k-th largest element using standard algorithms (there is one that runs in expected O(mxn) time, there's a more complicated one that runs in worst-case O(mxn) time. Both use partitioning similar to Quick Sort. They both would need additional O(mxn) space.
int kLargest(int[]a, int m, int b[], int n, int k){
if (k==0)return (a[m]+b[n]);
int i =1;
while(a[m-1]>=b[n-i]){
k--; i++;
}
kLargest(b, n-i, a, m-1,k);
}
golu
(a) Inserting code without any explanation wastes people's time.
(b) Inserting code that's obviously incorrect is even worse
-- a[m] is outside the array
-- relationshiop between a and b is irrelevant--think about it, if I add 1000 to each element of a, nothing should change, only the k-th element will be larger by 1000
(c) Even if your algorithm were correct, it would run in O(k) time which can be as bad as O(m*n). So nothing interesting here ...
czpete
(1) I thought after such long discussion, ppl here are acquainted with problem and obviously smart enough to get the logic behind the solution but seeing your response I feel you are right.
(2) Obviously talking in air is bad, especially commenting on a code. You were right, I should have written some explanation, because your comment shows that you need some easy understanding
-- m is index value passed to the system by a calling function.. how come it is outside of the array ??
--I dont think a and b is not related in context of this question. You need to go through the question requirement again
(3) again this is big mistake on you part thinking that O(k) is as bad as O(m*n) because Obviously m*n makes it quadratic .. while O(k) is linear. right :)
So pls go through the solution again. It needs a calling fucntion
int kLarge = kLargest(a, a.length-1, b, b.length-1, kLargest)
(1) People are acquainted with the problem but not with the solution. So if you believe that you have it (and no one ahead of you came up with it) than you should explain it.
(2)
-- m is size of the array, a[m] is undefined because it's the (m+1)-st element. Get it? If you use m for something else than what the problem says, you should explain it. Or better yet, don't do that.
-- I think you need to re-read the problem. I even gave you a counter-example and you still don't get it! So let me repeat it again--adding any constant to all elements of array a will not change the problem but it will change your solution! That's a clear indication that your solution is completely wrong (you asked for it).
(3) There are m*n elements, k can be any number between 1 and m*n. If k = m*n/2 then O(k) = O(m*n/2)=O(m*n).
OK, so here is an algorithm that runs in expected O(m * (log n)^2). I have not yet figured out how to really prove this but my simulations show this to be true. A very rough analysis (see below) similar to that of QuickSort also suggests this.
To make my life easier, arrays are sorted increasingly and I am looking for the k-th smallest element (k starts from 1). I assume m<=n. If not, switch the arrays a and b.
The idea is roughly this--keep partitioning the matrix M[i,j] = a[i]+b[j] into smaller and smaller chunks until you get a 1-element chunk with the k-th smallest element.
Details:
1) For each 0 <= i < m keep 3 arrays: begin[i], partition[i], last[i]. Originally, begin[i] = 0, last[i] = n - 1. We call "active" those elements M[i,j] for which begin[i] <= j <= last[i].
2) Now repeat the following:
2A) Find a pivot element
2Aa) Find a row with maximum (last[i]-begin[i]). To speed up computation, resolve ties by choosing the median of all maximum rows. Call this row maxRow.
2Ab) Choose pivot to be the median of currently active elements on maxRow; i.e. pivot = M[maxRow,(begin[maxRow]+last[maxRow])/2].
2B) Partition the active elements into 3 pieces: left, pivot, and right. All values in left are <= pivot, all values in right are >= pivot.
2Ba) Partition each row of M with pivot using Binary Search (rows are sorted). Assign partition index into partition[i].
2Bb) Figure out if k-th element is the pivot element, or it is in the left partition or in the right partition.
---- If pivot is the k-th element, we are done.
---- If k-th element is in the right partition, set begin[i]=partition[i]+1.
---- If k-th element is in the left partition, set last[i]=partition[i] for all i. Remeber to exclude pivot, i.e. last[maxRow]--;
Analysis (with some hand-waving):
- We can assume that at each step, the chunk will be about 1/2 of the previous chunk. This is clearly not true in the worst case, but on average, sometimes I get smaller chunk than 1/2, sometimes bigger, so 1/2 is a good expected/average value. We have a total of m*n elements, so one can expect step 2 to repeat lg(m*n) times = lg(m)+lg(n) = O(log n) since m <= n.
- At each execution of step 2, we do BinarySearch on each row. We have m rows, at most n elements in each row, so it takes O(m*log n) time to do each step 2.
- Multiply these 2 together and get the total expected/average running time to be O(m*(log n)^2).
I won't attach the code (it's about 90 lines just for the function itself without the supporting code for Binary Search and main) unless somebody really wants it. The idea is more interesting than the code :-)
One more interesting fact--there is no need to actually compute the matrix M[i,j]. Just that would of course take m*n time. In fact, the only time I need to compute M[i,j] is when I compute the pivot! The binary search itself is run on the array b looking for the value pivot - a[i] (a[i] is constant for the whole row).
"the rough idea is to keep martitioning the Matrix[i,j]".
From where did you get this matrix czpete? To get all elements of this matrix you will need to fill m*n elements or 1/2(m*n) elements as you suggest. Thats O(mn) task.
Looks like I have solution with time and memory usage like O(sqrt(k)). I tested on several examples and it works. Please comment on explanation and algorithm.
Let's put S(i,j)=A[i]+B[j].
Finding 1-st sum is easy: S(1,1).
Finding 2-nd and third requires checking and ordering S(1,1), S(1,2) and S(2,1).
In general, finding k-th sum requires checking and ordering S(i,j) for all combinations of i,j|i+j=m+1 (this is tricky moment, I think I can prove it using partial ordering of sums S(i,j).
The number of combinations is N(m)=(m+1)*m/2, which means that we'll check all sums from 1-st to N(m)-th.
We can find m from equation m*m+m-2k=0 and round it to larger number.
For example, for k=5, m=3.
Also, we know that first N(m-1) sums will be obtained from combinations of i,j|i+j=m.
Thus, we should check combinations where i+j=m+1, EXCLUDING all possible combinations where i+j=m.
We store found sums in ordered collection (for example piramid). Collection size is O(m).
These will be sums from N(m-1)+1 to N(m), so we just should find sum with number k-N(m-1).
The algorithm looks like:
int find_k_sum(A, B, k)
{
P - ordered collection
m=roundup((-1+sqrt(1+8*k))/2);
for(int i=1;i<=m;++i)
P<-(A[i]+B[m+1-i]); //storing sum in the collection
return P[k-(m*m-m)/2].
}
Here is a sketch of my solution.
Draw a grid M by N. We know for k = 1 we have to search (0,0). For k = 2 or k = 3 we have to search (1,0) or (0,1). If we label the grid appropriately we see that we have to search ~ sqrt(T) where T is the next largest triangle number greater than k.
Then this is O(sqrt(k))
The rest is details to get the indices correct.
@czpete : This is the first problem for which I could not understand your solution..generally I keep looking out for your solution as u give a good analysis of what ever u write and never believe in spamming others...If you don't mind can you please explain this in a still better way ?
thanks
I don't think using heap is necessary. Here is my algo,please help me checking if it's correct:
x = 0;
y = 0;
loops for k-2 times:(the biggest is a[0]+b[0])
(we suppose x+1<a.Length and y+1<b.Length )
compares a[x]+[y+1] with a[x+1]+b[y]
if a[x]+b[y+1] bigger
y++
else
x++
finally compares a[x]+[y+1] with a[x+1]+b[y], return the bigger.
the case : x out of range or y out of range is very trivial ,don't need to be put here.
// Complexity - O(k logk) time O(k) space
//a[] → 9 7 5 2 1 0 -1 -5
//b[] → 11 2 0 -4 -7
//Using PriorityQueue -> take the best element and add its neighbours
//i.e(a,b): add(9, 11),
//poll(9,11), add(9,2), add(7,11)
//poll(7,11), add(7,2), add(5,11)
//poll(5,11), add(5,2), add(2,11)
//etc....
here is the code:
public int selectLargestSum(int[] a, int[] b, int k) {
int resultSum = -1;
int currK = 0;
PriorityQueue<Item> q = new PriorityQueue<Item>(11, Collections.reverseOrder());
int idxA = 0;
int idxB = 0;
addItem(q, a, b, idxA, idxB);
Item qItem = null;
while (!q.isEmpty()) {
currK++;
qItem = q.poll();
if (currK == k) {
resultSum = qItem.value;
break;
} else {
addItem(q, a, b, qItem.idxA + 1, qItem.idxB);
addItem(q, a, b, qItem.idxA, qItem.idxB + 1);
}
}
return resultSum;
}
protected void addItem(PriorityQueue<Item> q, int[] a, int[] b, int idxA, int idxB) {
if (idxA < a.length && idxB < b.length())
q.add(new Item(a[idxA] + b[idxB], idxA, idxB));
}
protected class Item implements Comparable<Item> {
int value;
int idxA;
int idxB;
protected Item(int value, int idxA, int idxB) {
this.value = value;
this.idxA = idxA;
this.idxB = idxB;
}
@Override
public int compareTo(Item x) {
return this.value - x.value;
}
}
This code works for two arrays of increasing sequences. The idea is the same just that you will need to search in the opposite direction. Solution is based on finding the median of two sorted arrays. Complexity is logn
public static int findKth(int arr1[], int arr2[], int left, int right, int k) {
if(left > right) {
return findKth(arr2, arr1, 0, arr2.length - 1, k);
} else {
int i = (left + right) / 2;
int y = k - i - 1;
int y1 = y - 1;
int l2 = arr2.length - 1;
if(i >= k || (y >= 0 && y <= l2 && arr1[i] > arr2[y])) { // if i is >= k (we use >= instead of 0 since arrays are zero indexed) elements don't bother
return findKth(arr1, arr2, left, i-1, k);
} else if(y1 > l2 || (y1 >= 0 && arr1[i] < arr2[y1])) { // if the length of y2 is greater than the length of l2 we must increase since there is no way for i to be the kth
return findKth(arr1, arr2, i+1, right, k);
} else
return arr1[i];
}
}
Take first K sums. i.e.
a[ ] = { # , # , # , # , ....... }
| X | X | X | ......
b[ ] = { # , # , # , # , ...... }
}
void Summer ( int index, int k )
{
static sums = 0;
//hypothetical heap
Heap myHeap;
myHeap.Add(a[index] + b[index]); sums++; //a[i] + b[i]
if (sums >= k) return;
myHeap.Add(a[index] + b[index+1]); sum++; //a[i] + b[i+1]
if (sums >= k) return;
myHeap.Add(a[index+1] + b[index]); sum++ //a[i+1] + b[i]
if (sums >= k) return;
Summer(index+1, k);
}
we now have the first k highest sums (doesn't matter if you need the first k lowest sums index changes and index+1 will be index-1 in the function). Go through the heap to get the kth highest or lowest depending on your need and max or min heap constructed.
Here is my solution. Basically you would maintain a max-heap such that when you consider a[i] + b[j] as the (c - 1)th sum, you insert a[i + 1] + b[j] and a[i] + b[j + 1] (if not out of border) to the heap. And then take the top element of the heap as the cth sum. Obviously this process should start from a[0] + b[0].
- BXH August 29, 2010The complexity is O(klgn) for heap insertion and deletion of the top consumes O(lgn).
Please correct me if I am wrong. thx