Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Could you please elaborate your idea of solving the problem using quaternary search tree in words? Quaternary search tree is new to me. Thanks!
Actually I should correct it, it should be call quad (range search) tree which each internal node has exactly four children. The range in each node is defined by minimum (at the top left corner) and maximum (at the bottom right corner). We only traverse the node if the range of the node contain the lookup value. Due to the mutual exclusion of the first and the fourth leaves we can always trim one leave while we do searching. We choose the pivot at the middle to make our tree balanced. So in term of complexity in the worst case it similar to binary tree with (1/4, 3/4) cut when the search always happens at the 3/4 part. However the complexity still O(log N) with N = m * n
@eugen.yarovoi: You have flaws in your analysis since you have wrong constant 3 (if you consider binary tree) or n/2 (if you consider quad tree) that is why the runtime is incorrect. BTW: my analysis based on the total number (m*n) as the pivot point divide the space onto 4 sub quad. See my first explanation for more details
@LinhHA05
Though I haven't understood your solution thoroughly, I wanted to ask you, if your solution is similar to having a 'divide and conquer' approach, where the nxn(assume nxn instead of nxm) matrix is divided into four sub matrices and that in you search in a submatrix based on the min and min value of that sub-matrix?
Does your analysis fall on similar lines?
This is blatantly wrong! As Eugene says there are well know Omega(n) lower bounds (for the case m=n).
You answer clearly is not consistent with that. There is no need to check your recursion etc, the burden of proof is upto you.
Of course, you can choose to delude yourself.
@Apostle: yes, my analysis falls on that similar line. However, we should see the idea as is
- Each sub-quad has the same properties of the original (the value increase from left to right, and from top to bottom)
- The range is determined in constant time based on the left top corner and right bottom corner. So checking range is the same cost as your check in binary search
-I choose the pivot point in the way that I ballance size of the sub-quad
So in my analysis
T(n*m) = 3 * T(m*n/ 4) + c
- c is the checking cost
- m * n / 4 is the size of the sub-quad
- Constant 3 reflects the mutual exclusion of first-and forth quads
as
-- if a[i][j] < v then the all the value of the first quads < v without doing any futher investigation.
-- if (a[i][j] > v) then all the value of the fourth quads > v without doing any futher investigation
So we alway have to check only 3/4 quads at most. The range checking can further trimp down the branch in constant time.
When one of the size become 1 : we fall back to binary search
Good explanation, thanks for it!. But, though the recurrence is right, sorry to say that you have got the analysis wrong buddy. For your recurrence, you can apply the master theorem as mentioned in page nos 58 and 59 of cs.berkeley.edu/~vazirani/algorithms/all.pdf. The solution for your recurrence is obtained by case 3 of the master theorem mentioned there and is O((mn)^b(3)), for the worst case and O((mn)^b(1)) for the best case, where b is a log function with base 4.
The best-case time complexity with this approach is therefore O(mn).
@Apostle: Thank you for the constructive discussion. I think I don't have tight bound for my analysis. Believe me or not Saddleback is not the best solution available
Google this and you will see
"Improving Saddleback Search: A Lesson in Algorithm Design"
Thank you (Apostle) for leading me to this paper. I will not post the solution here since it does not belong to me. BTW: if you can not access to the paper, you can find it in the book "Pearls of Functional Algorithm Design"
@Anonymous: I don't know who you are but I often do not trust someone who can not reveal his identity. We are here to discuss and find the truth and also find the weankness of our theory. If you have nothing to share then don't steer us away from that
I'm pretty supprised of some anonymous one here. This is what he said
"Oh shut up. The reason O(m+n) was mentioned was because the interviewer did so. This was a comment to claim that the interviewer was a Moron. Seems like you missed the point again, idiot "
And this is my answer
Before you call someone moron or idiot, read the paper that I put there first, if you can read
As I said earlier, you are free to delude yourself, with whatever reasons you want (eg: it has been posted by some with no name). At least you should have tried listening what eugene said. btw, for your reading pleasure, I am posting under a real name.
There is an Omega(n) (at least for the nxn case) lower bound. Take a set of n numbers in random order. Put them as the diagonal which goes bottom left to top right. Now we can fill the rest of elements of the matrix so that the sorted property holds (It is not that hard, try it yourself)
Now for any number of that diagonal, the worst case is necessarily Omega(n) for any algorithm.
As to the Anonymous who called someone else an idiot, I believe he was provoked by being called a 'pedantic jerk'. Posting comments without the proper context?
Anyway, it is your loss if you choose to ignore these comments.
I agree with the T(n*m) = 3 * T(m*n/ 4) + c. Sorry I misquoted you. I had believed you to be claiming T(n) = 3*T(n/4) + C.
My central point that your analysis is incorrect, however, will have to remain. What do you believe your recurrence relation evaluates to? Do you believe it's O(log (MN))? It's actually O((MN)^log_4(3)). When you consider the M = N case, you get the bound of O(N^log_2(3)) that I mentioned earlier. For the N x N case, this runtime is asymptotically worse than the runtime of the O(M + N) algorithm.
@eugene.varovoj: my analysis is incorrect, agree as I said with Apostle (see the discussion above). If you read the solution (not mine but the one I refer) you will know that O(M+N) is not the best. Actually mine is not as good as O(M+N) when M is not much different from N, however if M >> N, it is a better one. Just show you a simple example, when N = 1, while the saddle back complexity still M + 1, we know a binary search with log(M) complexity.
Short answer "Saddleback Search"
This is a sorted matrix, i.e., the rows and columns are sorted, so you can take advantage of that.
You can either start from the top-right M[0][n-1]
While until key found in M
Let M[i][j] be the current element
if M[i][j] < key, go down (M[i+1][j])
else if M[i][j] > key, go to left (M[i][j-1])
Or you can start from bottom left M[n-1][0]
While until key found in M
Let M[i][j] be the current element
if M[i][j] < key, go to right (M[i][j+1])
else if M[i][j] > key, go up (M[i-1][j])
Total running time is O(n)
Sorry, I've just realized that you've changed the question, after I submit the answer. I said this algorithm is O(n) for square matrix because O(2n) = O(n), but i general case it runs O(n+m) as you mentioned.
@oOZz
Nice! Just one question: is the name "Saddleback Search" your invention or it is commonly used?...
Its a famous algorithm which is used for these types of problems. You will find the solution to it in here
cs.genesco.edu
@Chih.Chiu.19
I wish I could claim credit lol, but no it's not mine and it's been around for quiet some time.
If I ever invent one though, I wouldn't call it "Saddleback", maybe "Oz Sort" :))
but interviewer is looking for better then o(n+m)......better then this approach..there is one more approach which is 'improved binary partition'....this approach is fastest approach for searching element in sorted 2d array....
first we do binary search along either the middle row or middle column and find two element a1 and a2 such that a1< search element > a2 or we find required element if element is present in middle row or column
after this our problem divided into two subproblem
solve these subproblem recursively
for example we have to find 22 in following matrix
[1 14 25 35]
[2 16 28 38]
[5 20 28 40]
[16 22 38 41]
first we do binary search in middle row which is row 2
in binary search if we find required element then return,otherwise we find two element like 16< 22 >28 after this our problem divided in two sub problem
first
[5 20]
[16 22]
second is [25 35]
then we solve this two problem recursively
this approach is much faster
Binary search o(log(m)*log(n))
binarySearch( matrix M, rows n, columns m, value v){
// Binary search the rows
nLow = 0, nHigh = n-1
while(nLow <= nHigh ){
nMid = (nLow + nHigh)/2
// Binary search through a column
mLow = 0, mHigh = m-1
while(mLow <= mHigh){
mMid = (mLow + mHigh)/2
if (v < M [nMid] [mMid])
mHigh = mMid-1
else if (v > M [nMid] [mMid])
mLow = mMid+1
else
return 1
}
// TODO: Figure out how to change nHigh and nLow
/*
* I got stuck here and I have to go to bed but this is what i have so far!
*/
}
}
search rows
in order to achieve better complexity we can iterate rows
and for each row do a binary search in their colums O( n * log(m) )
The algorithm works using elimination of rows and columns and its worst case complexity is O(n+m), where <n> is the count of rows and <m> is the count of columns.
Solution:
When moving left we eliminate all the values below that column.
Similarly, when moving down we eliminate all the elements to the left
of that cell in the current row.
bool find_int(int mat[][], int rows, int cols, int val)
{
int n = 0; // row index
int m = cols - 1; // column index
while(m >= 0 && n < rows){
int curval = mat[n][m];
if (curval == val) {
printf("found at: %-3d %-3d\n", n, m);
return true;
}
else if (curval > val)
--m;
else
++n;
}
return false;
}
public static void main(String[] args) {
int a[][] = { { 1, 14, 25, 35 }, { 2, 16, 28, 38 }, { 5, 20, 28, 40 },
{ 16, 22, 38, 41 }};
int noToFind = 41;
boolean found = false;
int foundi = 0;
int foundj = 0;
System.out.println(a.length);
System.out.println(a[0].length);
for (int i = 0; i < a.length; i++) {
if (a[i][0] < noToFind && a[i][a[i].length-1] < noToFind) {
continue;
} else if (a[i][0] < noToFind && a[i][a[i].length-1] >= noToFind) {
for (int j = 0; j < a[i].length; j++) {
if (a[i][j] == noToFind) {
found = true;
foundi = i;
foundj = j;
break;
}
}
}
if (found == true) {
break;
}
}
System.out.println("foundi -- "+foundi);
System.out.println("foundj -- "+foundj);
}
I have tried implementing the quareternary search
void quarternary_search(int a[][5],int r_begin,int r_end,int c_begin,int c_end,int x)
{
if (r_begin>r_end || c_begin>c_end)
return;
if (r_begin==r_end==c_begin==c_end)
{
if(x==a[r_begin][r_end])
{
std::cout<<"\nMatch found:";
return;
}
else
{
std::cout<<"\nMatch not found:";
return;
}
}
int r_mid=(r_begin+r_end)/2;
int c_mid=(c_begin+c_end)/2;
if (r_mid<r_begin || r_mid>r_end || c_mid<c_begin || c_mid>c_end)
return;
if(a[r_mid][c_mid]==x)
std::cout<<"\nMatch Found:";
else
{
int x1=abs(a[r_begin][c_begin]-x);
int x2=abs(a[r_begin][c_mid]-x);
int x3=abs(a[r_mid][c_begin]-x);
int x4=abs(a[r_mid][c_mid]-x);
if(x1==0 || x2==0 || x3==0 || x4==0)
{
std::cout<<"\nMatch Found:";
return;
}
if(x1<x2 && x1<x3 && x1<x4)
quarternary_search(a,r_begin,r_mid-1,c_begin,c_mid-1,x);
else if(x2<x4 && x3<x4)
quarternary_search(a,r_begin,r_mid,c_mid,c_end,x);
else if (x3<x4)
quarternary_search(a,r_mid,r_end,c_begin,c_mid-1,x);
else
quarternary_search(a,r_mid,r_end,c_mid,c_end,x);
}
return;
}
And I made a mistake, I have corrected my code
void quarternary_search(int a[][6],int r_begin,int r_end,int c_begin,int c_end,int x)
{
if (r_begin>r_end || c_begin>c_end)
return;
if (r_begin==r_end==c_begin==c_end)
{
if(x==a[r_begin][r_begin])
{
std::cout<<"\nMatch found:";
return;
}
else
{
std::cout<<"\nMatch not found:";
return;
}
}
int r_mid=(r_begin+r_end)/2;
int c_mid=(c_begin+c_end)/2;
if(a[r_mid][c_mid]==x)
std::cout<<"\nMatch Found:";
else
{
int x1=abs(a[r_begin][c_begin]-x);
int x2=abs(a[r_begin][c_mid]-x);
int x3=abs(a[r_mid][c_begin]-x);
int x4=abs(a[r_mid][c_mid]-x);
if(x1==0 || x2==0 || x3==0 || x4==0)
{
std::cout<<"\nMatch Found:";
return;
}
if(x1<x2 && x1<x3 && x1<x4)
quarternary_search(a,r_begin,r_mid,c_begin,c_mid,x);
else if(x2<x4 && x3<x4)
quarternary_search(a,r_begin,r_mid,c_mid+1,c_end,x);
else if (x3<x4)
quarternary_search(a,r_mid+1,r_end,c_begin,c_mid,x);
else
quarternary_search(a,r_mid+1,r_end,c_mid+1,c_end,x);
}
return;
}
//// Split the matrix into 4 rectangles along the diagonals and recursively search through /// left bottom and right top
#include<iostream>
#include<cstdlib>
#include<algorithm>
#define p(x) std::cout<< #x" : " << x << "\n"
int count,tra;
class Coordinate
{
public:
int x;
int y;
Coordinate( int X=0, int Y=0):x(X),y(Y){}
Coordinate operator+(int add){ x += add; y += add ; return *this; }
Coordinate operator-(int add){ x -= add; y -= add ; return *this; }
///void operator=(Coordinate& c) {this->x = c.x ; this->y = c.y }
bool isBefore(Coordinate& p) { return (this->x <= p.x && this->y <= y );}
void setToAverage( Coordinate& b, Coordinate& e) { this->x = (b.x+e.x)/2 ; this->y = (b.y+e.y)/2 ;}
bool inBound(Coordinate& end) { return ( this->x >=0 && this->x <= end.x && this->y >= 0 && this->y <= end.y ); }
bool isValid() { return !(this->x == -1 && this->y == -1 ); }
friend std::ostream& operator << (std::ostream& o, Coordinate& p);
};
Coordinate traverse(int **mat, int ele, Coordinate beg, Coordinate end, Coordinate matEnd)
{
Coordinate start(beg.x,end.y);
///p(start);
while ( start.inBound(matEnd) )
{
if ( mat[start.x][start.y] == ele )
return start;
else if ( ele < mat[start.x][start.y])
start.y -= 1;
else
start.x += 1;
//start = start - 1;
///p(start);
tra++;
}
return Coordinate(-1,-1);
}
std::ostream& operator << (std::ostream& o, Coordinate& p)
{
o << "(x,y) : ( " << p.x << " , " << p.y << ") \n";
return o;
}
Coordinate lookUpElement(int **mat, Coordinate begin, Coordinate end, Coordinate matEnd,int element);
Coordinate binarySearch(int **m, int fixedIndex, int beg, int end, bool isRow, int element);
Coordinate partitionAndSearch(int **mat, Coordinate begin, Coordinate end, Coordinate pivot ,int element, Coordinate matEnd)
{
Coordinate lowerLeftBegin ( pivot.x, begin.y), lowerLeftEnd( end.x, pivot.y-1 );
Coordinate lower = lookUpElement(mat,lowerLeftBegin, lowerLeftEnd, matEnd,element);
p(lowerLeftBegin);
p(lowerLeftEnd);
if ( lower.isValid() )
{
return lower;
}
Coordinate upperRightBegin ( begin.x, pivot.y), upperRightEnd( pivot.x-1, end.y );
p( upperRightBegin );
p( upperRightEnd );
return lookUpElement(mat, upperRightBegin, upperRightEnd , matEnd,element);
}
/*
10 20 30 40 50
100 200 300 400 500
1000 1020 1030 1040 1050
1031 1040 1050 1060 1090
*/
Coordinate lookUpElement(int **mat, Coordinate begin, Coordinate end, Coordinate matEnd,int element)
{
count++;
if ( !begin.inBound(matEnd) || !end.inBound(matEnd) )
{
return Coordinate(-1,-1);
}
else if ( mat[begin.x][begin.y] == element )
{
return begin;
}
else if ( mat[end.x][end.y] == element )
{
return end;
}
else if ( !begin.isBefore(end))
{
return Coordinate(-1,-1);
}
int diagDistance = std::min( abs(begin.x - end.x), abs(begin.y-end.y));
p(diagDistance);
p(begin);
p(end);
#ifndef OPTIMIZE
if ( !diagDistance )
{
bool isRow ( begin.x == end.x );
Coordinate o =binarySearch(mat, isRow ? begin.x : begin.y , isRow ? begin.y : begin.x , isRow ? end.y : end.x, isRow, element );
if ( o.isValid() )
return o;
}
#endif
Coordinate start(begin),p;
Coordinate last(begin.x+diagDistance, begin.y + diagDistance );
while( start.isBefore(last) )
{
p.setToAverage(start,last);
if ( element > mat[p.x][p.y] )
{
start = p + 1;
///p(start);
}
else
{
last = p - 1;
//p(last);
}
count++;
}
//p( start );
return partitionAndSearch(mat,begin,end,start,element,matEnd);
}
Coordinate binarySearch(int **m, int fixedIndex, int beg, int end, bool isRow, int element)
{
count++;
if ( beg > end )
return Coordinate(-1,-1);
int mid = beg + (end-beg)/2;
int chekElement = isRow ? m[fixedIndex][mid] : m[mid][fixedIndex];
if ( chekElement == element )
return ( !isRow ? Coordinate( mid, fixedIndex) : Coordinate(fixedIndex, mid) );
else if ( chekElement > element)
return binarySearch(m, fixedIndex, beg, mid-1, isRow, element);
else
return binarySearch(m,fixedIndex, mid+1, end, isRow, element);
}
/// To search call :
Coordinate res = lookUpElement(m, Coordinate(), Coordinate(N-1,N-1), Coordinate(N-1,N-1), key);
if ( res.isValid() )
{
std::cout << "element found at " << res ;
}
1. Compile without -DOPTIMIZE
2. For an matrix of 150*150 it took an average of 46 searches to find element by using the above method.
ie searched each of m[i][j] element for 150*150 elements and it took an average of 46 comparisons.
Whereas if you use traverse from top right to bottom left it took an average of 149 comparisons. ( see traverse function above )
For 1024 * 1024 matrix it takes an average of 72 searches only !!
I did this algorithm based in the next rules:
a matrix have corners (0,0) and (n,m)
if you have a matrix of 1x1 dimension check if the number is that cell.
if number < (0,0) ---> we know the number is not in the matrix.
if number > (n, m) ---> we know the number is not in the matrix.
I divide the matrix in 4 submatrix and do recursion wich each of them.
#!/usr/bin/perl
#
# search an element in a double sorted table
#
#
#
#
use warnings;
use strict;
#-------------------------------------------------------------------------------
sub table_fill
{
my $t = shift;
my $r = shift;
my $c = shift;
my $i = 0;
my $j = 0;
for($i = 0; $i < $r; ++$i){
for($j = 0; $j < $c; ++$j){
$t->[$i][$j] = ($i + 1) * 10 + $j;
}
}
}
#-------------------------------------------------------------------------------
sub table_print
{
my $t = shift;
my $r = shift;
my $c = shift;
my $i = 0;
my $j = 0;
print " table_print {\n";
for($i = 0; $i < $r; ++$i){
print " ";
for($j = 0; $j < $c; ++$j){
print " ".(@$t)[$i][$j];
}
print "\n";
}
print " }\n";
}
#-------------------------------------------------------------------------------
#
# log(rows+columns)
#
sub table_search
{
my $t = shift;
my $rows = shift;
my $columns = shift;
#submatrix
my $r1 = shift;
my $c1 = shift;
my $r2 = shift;
my $c2 = shift;
my $x = shift; #number to search
my $n = shift; #iterations
my $tab = '';
#check if we are out of the matrix
if( $r1 >= $rows ||
$r2 >= $rows ||
$c1 >= $columns ||
$c2 >= $columns )
{
print " out of matrix [$rows, $columns] ($r1,$c1) ($r2,$c2)\n";
return;
}
foreach my $i (0..$n) {
$tab .= ' ';
}
print "$tab ($r1, $c1) $t->[$r1][$c1] ($r2, $c2) $t->[$r2][$c2] ";
if( ($r1 == $r2 ) &&
($c1 == $c2 )
)
{
if( $x == $t->[$r1][$c1] )
{
print " t[$r1][$c1] == $x <==== FOUNDED in $n iterations\n";
return;
}
print "\n";
return;
}
#check if $x is in the left of the table
if( $x < $t->[$r1][$c1] )
{
print "sub matrix descarted $x < $t->[$r1][$c1]\n";
return;
}
#check if $x is in the right of the table
if( $x > $t->[$r2][$c2] )
{
print "sub matrix descarted $x > $t->[$r2][$c2]\n";
return;
}
print "\n";
my $pivot_r = int( ($r1 + $r2) / 2 );
my $pivot_c = int( ($c1 + $c2) / 2 );
table_search($t ,$rows, $columns
, $r1 , $c1 , $pivot_r, $pivot_c, $x, $n + 1);
table_search($t ,$rows, $columns
, $r1 , $pivot_c + 1 , $pivot_r, $c2 , $x, $n + 1);
table_search($t ,$rows, $columns
, $pivot_r + 1 , $c1 , $r2 , $pivot_c, $x, $n + 1);
table_search($t ,$rows, $columns
, $pivot_r + 1 , $pivot_c + 1 , $r2 , $c2 , $x, $n + 1);
}
#-------------------------------------------------------------------------------
sub table_search_all
{
my $t = shift;
my $rows = 10;
my $columns = 20;
table_fill($t ,$rows,$columns);
table_print($t,$rows,$columns);
my $i = 0;
my $j = 0;
print " table_search_all {\n";
for($i = 0; $i < $rows; ++$i){
for($j = 0; $j < $columns; ++$j){
print " searching $t->[$i][$j] : \n";
table_search($t, $rows, $columns,0,0,$rows - 1, $columns - 1,$t->[$i][$j],0);
}
}
print " }\n";
}
#-------------------------------------------------------------------------------
sub main
{
my @t;
print "search and element in a double sorted table\n";
table_search_all(\@t);
}
#-------------------------------------------------------------------------------
main();
Assuming the standard rows and columns sorted property, you cannot do better than O(n+m). There are known proofs. People doing binary search etc should try proving the correctness of their algorithm once in a while.
Perhaps you should try proving the correctness of your lower bound -- there's an O(min(m, n) log (max(m, n)/min(m, n)))-time algorithm, which is o(m + n) if one dimension is significantly larger than the other.
@Anon: Take this statement for the case m=n. Now this blows every other binary search answer out the water.
What you are quibbling about is minor and missed the point entirely.
If you're going to be a pedantic jerk, at least be right, like the interviewer was in believing that there's a solution that, in general, bests the obvious O(m + n) algorithm.
The idea is to use quaternary search tree so the complexity T(m*n) = 3 * T(m*n/4) + c it results in O((m*n)^log 3 base 4) complexity - it is not the best result available, see below discussion for a better result
- LinhHA05 July 26, 2013