## Google Interview Question

Software Engineers **Country:** United States

The merge operations take O(nm lg n).

In these operations, you merge n sorted blocks (length m each) into 1 block (length mn). If we assume that n is a power of two, the merge tree has lg n + 1 levels. Each level of merging takes O(nm).

So the total is O(nm(lg n + 1)) = O(nm lg n).

Yes, it takes O(m) to merge each pair of rows, but you have to do it for every two consecutive rows. So in each merge pass you have to check each number once, which means each pass takes O(nm), and there are log n passes in total. So it's O(nm log n).
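To make the level-by-level cost concrete, here is a minimal Python sketch of the pairwise merge described above. The function names (`merge_two`, `merge_rows_pairwise`, `median_of_sorted`) are made up for illustration and are not from any poster's code.

```python
def merge_two(a, b):
    """Merge two sorted lists into one sorted list in O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_rows_pairwise(rows):
    """Merge sorted rows level by level; every level touches each element once."""
    blocks = [list(r) for r in rows]
    levels = 0
    while len(blocks) > 1:
        merged = [merge_two(blocks[i], blocks[i + 1])
                  for i in range(0, len(blocks) - 1, 2)]
        if len(blocks) % 2 == 1:      # an odd block out is carried to the next level
            merged.append(blocks[-1])
        blocks = merged
        levels += 1
    return (blocks[0] if blocks else []), levels


def median_of_sorted(values):
    """Median of a non-empty sorted list."""
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2
```

With n rows the loop runs about lg n times, and each pass does O(nm) work, which is where the O(nm lg n) total comes from.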

There is a better way to achieve this. We can find the kth smallest element of the 2D array by using a min heap. Then the median will be the (nm/2)th smallest element (or the average of the two middle elements when nm is even).

For example, we can start by building a min heap from the first column of the 2D array in O(nlgn) time. Now, at each iteration, extract the min from the heap and insert the next element from the row of the extracted element (if the min is at the end of its row, then no insert). This ensures we are traversing in ascending order. That is, the element extracted at the kth iteration will be the kth smallest element. The complexity of such a traversal is O(klgn) if k>n.

```
import java.util.PriorityQueue;

public static int kthSmallestElement(int[][] A, int k) {
    int n = A.length;
    int m = A[0].length;
    MatrixElement kthSmallest = null;
    PriorityQueue<MatrixElement> minHeap = new PriorityQueue<MatrixElement>();
    // add column 0 into minHeap - O(nlgn)
    for (int i = 0; i < n; i++) {
        minHeap.offer(new MatrixElement(A[i][0], i, 0));
    }
    // extract min from minHeap and insert next element from the same row of the extracted min
    int count = 0;
    while (!minHeap.isEmpty() && count < k) {
        kthSmallest = minHeap.poll();
        count++;
        // push the successor from the same row, if the row is not exhausted
        if (kthSmallest.col + 1 < m) {
            minHeap.offer(new MatrixElement(A[kthSmallest.row][kthSmallest.col + 1],
                    kthSmallest.row, kthSmallest.col + 1));
        }
    }
    return kthSmallest.val;
}

public static class MatrixElement implements Comparable<MatrixElement> {
    public int val;
    public int row;
    public int col;

    public MatrixElement(int val, int row, int col) {
        this.val = val;
        this.row = row;
        this.col = col;
    }

    @Override
    public int compareTo(MatrixElement o) {
        return Integer.compare(this.val, o.val);
    }
}
```

In this problem K=(M*N)/2, which means the time complexity of your algorithm, even using a min-heap, is still O(nmlogn).

Also, note that creating a heap out of n elements is not O(nlogn), it's O(n) :)

I hope you know the difference between a tight bound and an upper bound. Theta(n) is the tight bound for the bottom-up build. Also, you may already know why building a heap can take O(nlgn): it all depends on whether you push elements down or pull them up as you build.
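The two construction strategies being argued about can be put side by side. This is a small Python sketch using the standard `heapq` module; the wrapper names `build_by_heapify` and `build_by_pushes` are hypothetical.

```python
import heapq


def build_by_heapify(values):
    """Bottom-up construction: heapq.heapify runs in linear time."""
    heap = list(values)
    heapq.heapify(heap)
    return heap


def build_by_pushes(values):
    """Top-down construction: one O(lg n) sift-up per insert, O(n lg n) worst case."""
    heap = []
    for v in values:
        heapq.heappush(heap, v)
    return heap
```

Both produce a valid min-heap over the same values; only the construction cost differs, and either way the build is dominated by the O(nm lg n) extraction phase in this problem.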

Since the rows are already sorted, you can find the row median by indexing at m/2 or averaging the two mid values.

Next you want to compute the new median for 2 such rows. Note that the combined median lies between the previous two medians. You find it by stepping lower from the high median and stepping higher from the low median until your new values match or pass each other. At each step of this process you are increasing the left side of one row and increasing the right side of the other row. The combined rows thus maintain equal amounts on the left and right of the median.

Doing this linearly costs m/2 per row, and you repeat it n - 1 times, plus the merge cost.

At this point you would normally have to mergesort the two rows before proceeding. So let us instead extend the idea to all n rows at once.

Again we note that the overall median will lie between the existing medians. The new strategy is to sort the existing medians. Then starting at the outermost medians start shifting them towards each other. As the lowest and highest new median passes the median of another row we add it to a list of the highest or lowest ones we are operating on. At each iteration we choose the lowest member to add to the left and highest member to add to the right from the lowest / highest lists. When the swap values match or pass each other we are done.

You could maybe use a binary search to speed this up instead of a linear scan. Take the lowest and highest medians and use binary search to quickly find their combined median. Insert them at their new location in the sorted median list and repeat.

Or maybe inside out is best. Sort the medians, then start with the center ones, as they are likely closest to the overall median. The center coalesces into a pool, and you pick the highest / lowest current value to shift as you merge in from a higher / lower median row not yet in the pool. This feels like a win to me, especially considering that no data movement takes place.

Average (not completely sure of this) would be log(m/4) * n shifts. Each row is already split in half, so max m/2 shifts with an average of m/4. So O(n log m)?

The code complexity for this possibly best case seems a bit high for an interview.

Here's an O(nm) average-case algorithm using k-select (quickselect) on the 2-D array, addressed as if it were a flat array. It takes O(1) extra memory but reorders the original matrix to get the solution.

```
public static int getMedian(int[][] mat) {
    if (mat == null) {
        throw new NullPointerException();
    }
    int rowSize = mat[0].length;
    int lo = 0;
    // flattened index of the last cell: rowSize * rows - 1
    int hi = convertIndex(rowSize, mat.length - 1, rowSize - 1);
    int target = hi / 2; // lower median when the element count is even
    int mid = select(mat, rowSize, lo, hi);
    while (mid != target) {
        if (mid < target) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
        mid = select(mat, rowSize, lo, hi);
    }
    int[] indices = convertIndex(rowSize, mid);
    return mat[indices[0]][indices[1]];
}

// Partitions the flattened range [lo, hi] around mat[lo]; returns the pivot's final index.
private static int select(int[][] mat, int rowSize, int lo, int hi) {
    if (lo >= hi) {
        return lo; // a single element is already in place
    }
    int pivotIndex = lo;
    int[] indices = convertIndex(rowSize, pivotIndex);
    int pivot = mat[indices[0]][indices[1]];
    int tempLo = lo + 1;
    int tempHi = hi;
    while (tempLo < tempHi) {
        indices = convertIndex(rowSize, tempLo);
        int check = mat[indices[0]][indices[1]];
        if (check > pivot) {
            swap(mat, rowSize, tempLo, tempHi);
            tempHi--;
        } else {
            tempLo++;
        }
    }
    // one cell (tempLo == tempHi) is still unexamined; place the pivot next to it
    indices = convertIndex(rowSize, tempLo);
    if (mat[indices[0]][indices[1]] < pivot) {
        swap(mat, rowSize, tempLo, pivotIndex);
        return tempLo;
    } else {
        swap(mat, rowSize, tempLo - 1, pivotIndex);
        return tempLo - 1;
    }
}

private static void swap(int[][] mat, int rowSize, int pos1, int pos2) {
    int[] pos1Indices = convertIndex(rowSize, pos1);
    int[] pos2Indices = convertIndex(rowSize, pos2);
    int temp = mat[pos1Indices[0]][pos1Indices[1]];
    mat[pos1Indices[0]][pos1Indices[1]] = mat[pos2Indices[0]][pos2Indices[1]];
    mat[pos2Indices[0]][pos2Indices[1]] = temp;
}

// (row, col) -> flattened index
private static int convertIndex(int rowSize, int row, int col) {
    return rowSize * row + col;
}

// flattened index -> {row, col}
private static int[] convertIndex(int rowSize, int index) {
    int[] indices = new int[2];
    indices[0] = index / rowSize;
    indices[1] = index - (indices[0] * rowSize);
    return indices;
}
```

Here's an approximation that should run faster, in O(n log n) time with O(n) memory: find the median of each row (an O(1) lookup, since the rows are sorted), sort the row medians, then take the median of the row medians.

```
import java.util.Arrays;

public static int getMedian(int[][] mat) {
    if (mat == null) {
        throw new NullPointerException();
    }
    // one median per row, so the array needs mat.length entries
    int[] rowMedian = new int[mat.length];
    for (int row = 0; row < mat.length; row++) {
        rowMedian[row] = getSortedMedian(mat[row]);
    }
    Arrays.sort(rowMedian); // sorts in place and returns void
    return getSortedMedian(rowMedian);
}

private static int getSortedMedian(int[] arr) {
    if (arr == null || arr.length == 0) {
        throw new IllegalArgumentException();
    }
    if (arr.length % 2 == 0) {
        int index = arr.length / 2;
        // integer division: the average of the two middle values is truncated
        return (arr[index - 1] + arr[index]) / 2;
    } else {
        return arr[arr.length / 2];
    }
}
```
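Since the median of row medians is only an approximation, it helps to see a case where it misses. The Python sketch below uses a made-up 3x3 matrix and hypothetical helper names to compare it against the exact median.

```python
def sorted_median(values):
    """Median of an already sorted, non-empty list."""
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def median_of_row_medians(mat):
    """The approximation: median of the per-row medians."""
    return sorted_median(sorted(sorted_median(row) for row in mat))


def exact_median(mat):
    """Reference answer: sort everything, then take the middle."""
    return sorted_median(sorted(v for row in mat for v in row))


mat = [[1, 2, 3],
       [1, 2, 3],
       [10, 20, 30]]
approx = median_of_row_medians(mat)   # row medians are 2, 2, 20
exact = exact_median(mat)             # sorted: 1 1 2 2 3 3 10 20 30
print(approx, exact)
```

Here the approximation gives 2 while the exact median is 3, so this approach only fits when an estimate is acceptable.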

The best solution should be O(nm/2)~O(nm).

We have:

18 30 42 51

13 21 50 62

70 85 91 100

Min (18,13,70) = 13 -> we can remove it because it is the smallest number in the matrix

Min (18,21,70) = 18 -> we can remove it because it is the 2nd smallest number in the matrix

Min (30,21,70) = 21 -> we can remove it because it is the 3rd smallest number in the matrix

Min (30,50,70) = 30 -> we can remove it because it is the 4th smallest number in the matrix

Min (42,50,70) = 42 -> we can remove it because it is the 5th smallest number in the matrix

Min (51,50,70) = 50 -> we keep it because it is the 6th smallest number in the matrix

Min (51,62,70) = 51 -> we keep it because it is the 7th smallest number in the matrix

The Min function is O(n). We repeat Min m/2 or m/2+1 times, depending on the number of elements in the matrix.

So the total is O(nm/2).

If the number of elements in the matrix is even, we have to average the two middle values (here (50 + 51) / 2 = 50.5).
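The walkthrough above can be written out as a short Python sketch. The function name `median_by_min_scan` and the `pos` bookkeeping are assumptions made for illustration; the loop keeps one cursor per row and repeatedly removes the smallest row front.

```python
def median_by_min_scan(mat):
    """Advance one cursor per row, always removing the smallest front element."""
    n, m = len(mat), len(mat[0])
    total = n * m
    pos = [0] * n                 # pos[r] = index of the next unused element in row r
    removed = []
    for _ in range(total // 2 + 1):
        # Linear scan over the row fronts: this is the O(n) Min step.
        best = min((r for r in range(n) if pos[r] < m),
                   key=lambda r: mat[r][pos[r]])
        removed.append(mat[best][pos[best]])
        pos[best] += 1
    if total % 2 == 1:
        return removed[-1]
    return (removed[-2] + removed[-1]) / 2
```

Note that the loop body runs nm/2 + 1 times and each pass scans up to n row fronts, which is worth keeping in mind when counting the total work.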

Won't the complexity here be O(n*(nm/2)) => O((n^2)*m). Please correct me if I'm wrong.

Alright, here is my answer: O(n^2 * log^2(m)). It finds the k'th order statistic. Basic idea: find the longest array, take its middle element, and find its rank in the whole matrix by doing a binary search in each array.

Now we know whether k > rank or k < rank, so we can cut away either the left or the right half. We cut and repeat until only one non-empty array is left. The code below slices lists, but that can be avoided by carefully keeping bounds for each array.

```
import copy
import random


def insert_pos(lst, elem):
    # index of the first element >= elem, i.e. the count of elements < elem
    left = 0
    right = len(lst)
    while left < right:
        mid_idx = left + (right - left - 1) // 2
        mid = lst[mid_idx]
        if elem <= mid:
            right = mid_idx
        else:
            left = mid_idx + 1
    return left


def kth_order(lists, k):
    # k is 0-based; the lists are modified in place
    longest_list = max(lists, key=len)
    total_length = sum(map(len, lists))
    if total_length == len(longest_list):
        return longest_list[k]
    mid_idx = (len(longest_list) - 1) // 2
    mid_elem = longest_list[mid_idx]
    # rank of mid_elem in the union of all lists
    mid_elem_rank = mid_idx
    for lst in lists:
        if lst is not longest_list:
            mid_elem_rank += insert_pos(lst, mid_elem)
    if mid_elem_rank < k:
        del longest_list[:mid_idx + 1]
        return kth_order(lists, k - mid_idx - 1)
    elif mid_elem_rank > k:
        del longest_list[mid_idx:]
        return kth_order(lists, k)
    else:
        return mid_elem


# randomized self-check against a full sort
num_arrays = 10
array_size = 5
arrays = []
for i in range(num_arrays):
    arrays.append(sorted(random.randrange(100) for j in range(array_size)))
new_array = []
for k in range(num_arrays * array_size):
    new_array.append(kth_order(copy.deepcopy(arrays), k))
assert new_array == sorted(sum(arrays, []))
```

Once we have the kth_order function, it's trivial to implement the median function.

There are probably better solutions.

If I understood correctly, you first have an O(n) operation when you look for the list with the maximum length; then you have O(n*log(m)) to do a binary search on each row. But why do you have one more O(log(m))? Don't you do this O(log(n*m)) times in the worst case? So the final complexity would be O(n^2 * log(m) * log(n*m))?

I think this can be improved to O(n * log(n) * log(m) * log(n*m)) if instead of doing a linear search for the row with the maximum number of elements, you would keep them in a max-heap. In a language like C/C++/Java this can be done efficiently because you can store a reference/pointer to the list and getting the size is an O(1) operation.

Here's my O(nlogm) solution:

Main Idea: Binary search on answer:

I do a binary search on the integer values to find the median (in the range -2^30 to 2^30).

Two terms:

lower_bound(V,x): returns the position of the first item in V that is greater than or equal to x.

upper_bound(V,x): returns the position of the first item in V that is greater than x.

Let's say I want to check whether x is the median or not.

All I need to know is how many numbers are less than x (I'll explain equality later); let's call it cnt1. Because each row is sorted, I can easily do a lower_bound on each row and add the result to cnt1.

Now, based on cnt1, I can decide whether x is the median or whether I have to look at bigger or smaller values of x.

So total time complexity would be log(2^30)*N*log(M)=(30*N*logM)=O(N*logM)

The tough part of implementation of this idea is how to handle equality.

1 1 (3 3 3) 5 5

Let x = 3.

Now I need to know the range of positions occupied by all the copies of x.

I handled this using an upper_bound; then, based on lower_bound and upper_bound, we can decide whether x is the median or not.
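As a quick illustration of the equality handling, here is a Python sketch of the counting step on the `1 1 (3 3 3) 5 5` example, using the standard `bisect` module in place of lower_bound / upper_bound. The helper name `count_range` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_range(rows, x):
    """Return (count of elements < x, count of elements <= x) over all sorted rows."""
    low = sum(bisect_left(row, x) for row in rows)
    high = sum(bisect_right(row, x) for row in rows)
    return low, high


rows = [[1, 1, 3, 3, 3, 5, 5]]
low, high = count_range(rows, 3)      # copies of 3 occupy sorted positions low..high-1
median_pos = (len(rows[0]) - 1) // 2  # 0-based median position for an odd count
is_median = low <= median_pos < high
```

The copies of 3 cover positions 2 through 4 and the median position 3 falls inside that range, so x = 3 is accepted as the median.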

```
#include <algorithm>  // lower_bound, upper_bound, min
#include <iostream>
#include <utility>    // pair
#include <vector>
using namespace std;
pair<int,int> countElements(vector< vector<int> > &V,int val)
{
int N=V.size();
int ansLow=0,ansHigh=0;
for (int i=0;i<N;i++){
int cnt= lower_bound(V[i].begin(),V[i].end(),val) - V[i].begin();
ansLow+=cnt;
cnt = upper_bound(V[i].begin(),V[i].end(),val) - V[i].begin();
ansHigh+=cnt;
}
return {ansLow,ansHigh};
}
double findMedian(vector< vector<int> > V)
{
int N=V.size(),M=V[0].size();
int needed=(N*M)/2;
int s=-(1<<30);
int e=(1<<30);
bool found=false;
int ans1 = 0; // initialized so it is never read uninitialized after the loop
while (s<=e){
int mid = (s+1LL+e) >> 1;
auto tmp = countElements(V,mid);
int low=tmp.first,hi=tmp.second;
if ((N*M)%2==0){
if (low == needed && hi == low ){ // 1 2 4 5 --> (3)
found=true;
ans1=mid;
break;
}
if (hi - low >1 && low<needed && hi>needed){ // (1 1) 2 5 3 3 1 1
found = true;
ans1=mid;
break;
}
if (hi==needed){ // 1 (2) 3 4
ans1=mid;
break;
}
}
else{
if (hi - low >1 && low<=needed && hi>needed){ // 1 (3 3 3) 5 or 1 1 (3 3) 5
found = true;
ans1=mid;
break;
}
if (low == needed && hi-low==1){ // 1 2 (3) 5 7
ans1=mid;
found=true;
break;
}
}
if (hi <= needed){ // (1 1) 3 4 5
s=mid+1;
}else
e = mid-1;
}
if (found)
return ans1;
int ans2=(1<<30);
for (int i=0;i<N;i++){
int ind = upper_bound(V[i].begin(),V[i].end(),ans1) - V[i].begin();
if (ind != V[i].size() ){
ans2= min (ans2 , V[i][ind]);
}
}
cout << ans1 << ":" << ans2 << endl;
return (ans1+ans2)/2.0;
}
int main()
{
int N,M;
vector< vector<int> > V;
cin >> N >> M;
V.resize(N);
for (int i=0;i<N;i++){
V[i].resize(M);
for (int j=0;j<M;j++){
cin >> V[i][j];
}
}
cout << findMedian(V) << endl;
return 0;
}
```

Nice one. You could probably adapt it to floating-point numbers, since IEEE floats map to integers while preserving order (with some trickery for positive vs. negative values).

@emb: Yes, you're right. With some modification it could work on floating point as well.

How is the order of this algorithm O(nlgn)? Can you explain a little bit more? It looks like countElements() by itself takes O(nlgm).

It is finding kth rank in the n*m matrix, where k = (n+m)/2

Is that correct?

time complexity O(n+m)

Just do the following:

1. Put every row in a vector. O(n*m)

2. Pop the last element from each vector and put them in a max heap (maintain pointer from element to its row vector). O(n log(n))

3. Pop the max element from heap and if the corresponding row vector has elements then pop and add the element from the row vector to the heap. O(log(n))

4. Do step 3 in a loop m*n / 2 times to access the median.

Total O(m n log(n)).
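The four steps above can be sketched in Python as follows. This is only an illustration: the name `median_from_top` is made up, a non-empty matrix is assumed, and since the standard `heapq` module is a min-heap, values are negated to simulate the max heap.

```python
import heapq


def median_from_top(mat):
    """Pop the largest remaining element until the middle of the order is reached."""
    rows = [list(r) for r in mat]            # step 1: copy each row, O(nm)
    total = sum(len(r) for r in rows)
    # step 2: seed the heap with the last element of each row (negated for max-heap order)
    heap = [(-r.pop(), i) for i, r in enumerate(rows) if r]
    heapq.heapify(heap)
    popped = []
    for _ in range(total // 2 + 1):          # steps 3 and 4
        neg, i = heapq.heappop(heap)
        popped.append(-neg)
        if rows[i]:                          # refill from the same row, if any remain
            heapq.heappush(heap, (-rows[i].pop(), i))
    if total % 2 == 1:
        return popped[-1]
    return (popped[-2] + popped[-1]) / 2
```

Storing the row index next to each value plays the role of the pointer from an element back to its row vector.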

Here is my O(m*log(n)*(log(n)+log(m))) solution; for m and n equal it will be n*log(n)*log(n).

```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
class Row
{
int[] a;
int N;
public Row(int[] a)
{
this.a = a;
this.N = a.Length;
}
public void split(int dx, out int nl, out int lmax, out int ng, out int rmin, out int ne)
{
int i1 = -1;
int i2 = N;
while (i1 + 1 < i2)
{
int m = (i1 + i2) / 2;
if (2 * a[m] >= dx)
i2 = m;
else
i1 = m;
}
nl = 2 * i2;
lmax = nl > 0 ? a[i1] : 0;
i1 = -1;
i2 = N;
while (i1 + 1 < i2)
{
int m = (i1 + i2) / 2;
if (2 * a[m] <= dx)
i1 = m;
else
i2 = m;
}
ng = 2 * (N - i2);
rmin = ng > 0 ? a[i2] : 0;
ne = 2 * N - nl - ng;
}
}
class Program
{
static int matMedianSort(int[][] a)
{
List<int> all = new List<int>();
foreach (int[] r in a)
all.AddRange(r);
all.Sort(delegate(int x, int y)
{
return x.CompareTo(y);
});
if (all.Count % 2 == 0)
return (all[all.Count / 2 - 1] + all[all.Count / 2]) / 2;
return all[all.Count / 2];
}
static int matMedian(int[][] a)
{
int M = a.Length;
if (M == 0)
throw new Exception("Invalid");
int N = a[0].Length;
if (N == 0)
throw new Exception("Invalid");
if (M == 1 && N == 1)
return a[0][0];
Row[] rows = new Row[M];
int min = 0;
int max = 0;
for (int i = 0; i < M; i++)
{
rows[i] = new Row(a[i]);
if (i == 0 || min > a[i][0])
min = a[i][0];
if (i == 0 || max < a[i][N - 1])
max = a[i][N - 1];
}
int twon = N * M * 2;
int n = N * M;
while (true)
{
int lmax = min;
int rmin = max;
int dmix = min + max;
int nl, ng, ne;
nl = ng = ne = 0;
foreach (Row r in rows)
{
int nli, ngi, lmaxi, rmini, nei;
r.split(dmix, out nli, out lmaxi, out ngi, out rmini, out nei);
if (nli > 0 && lmaxi > lmax)
lmax = lmaxi;
if (ngi > 0 && rmini < rmin)
rmin = rmini;
nl += nli;
ng += ngi;
ne += nei;
}
if (nl <= n && ng <= n)
{
if (nl < n || ng < n)
return dmix / 2;
return (lmax + rmin) / 2;
}
if (nl < ng)
min = rmin;
else
max = lmax;
}
}
static void Main(string[] args)
{
Random r = new Random();
int M = 4;
int N = M;
int[][] a = new int[M][];
a[0] = new int[] { 174, 178, 184, 192 };
a[1] = new int[] { 311, 319, 321, 329 };
a[2] = new int[] { 319, 319, 319, 324 };
a[3] = new int[] { 327, 329, 338, 344 };
Console.WriteLine("median={0}", matMedian(a));
}
}
```

In order to find the median you have to merge all n rows and take the middle element. To merge the n rows of m elements you can use a heap: insert the first element of every row into the heap, then repeat this operation: extract the min and insert the next element from the extracted element's row. Note that you have to do this (n*m)/2 times, which is the position of the median.

1 - Create the heap and insert the first n elements - O(n logn)

2 - Until the element at position (n*m)/2 is extracted: extract the min, insert the next element - O(nmlogn)

So the time complexity is O(nmlogn)

* When n*m is even, the median is the average of the two numbers in the middle; the complexity is the same because the averaging step takes O(1) time.
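For comparison, the heap-based merge described above can be expressed compactly with Python's standard `heapq.merge`, which lazily merges sorted inputs using an internal heap. The function name `median_by_heap_merge` is an assumption for this sketch, and a non-empty matrix is assumed.

```python
import heapq
from itertools import islice


def median_by_heap_merge(mat):
    """Walk the lazily merged rows only as far as the middle position."""
    total = sum(len(row) for row in mat)
    k = total // 2
    merged = heapq.merge(*mat)            # yields all elements in ascending order
    if total % 2 == 1:
        return next(islice(merged, k, None))
    lo, hi = islice(merged, k - 1, k + 1)  # the two middle elements
    return (lo + hi) / 2
```

Only about nm/2 elements are ever pulled from the merge, each at O(log n) heap cost, matching the O(nmlogn) bound given above.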

Two things to consider:

1- We can find the median of an array in O(n) with a selection algorithm. So the time complexity for this problem can easily be O(n*m), just by creating an array of all the elements and running the selection algorithm on it. For sure it can be solved better than O(nm), and of course O(nmlogn) is not an acceptable solution.

2- Creating a heap is O(n) not O(nlogn)

Do a merge operation (as we do in the merge phase of merge sort) on each row successively. Then the median is the (m*n/2)th element if m*n is odd, or, if it's even, the average of the two middle elements. Total complexity is O(n*m)

Do a merge operation (as we do in the merge phase of merge sort) on each row successively. Then the median is the (m*n/2)th element if m*n is odd, or, if it's even, the average of the two middle elements. Complexity of this solution is O(nmlg(n)).

- zahidbuet106 September 08, 2015