Amazon Interview Question for SDETs
Country: India
I don't think you can do better than O(n^2), since there are O(n^2) pairs, and if you want to process them all in some way (print, append to a vector, etc.) you're going to have to do O(n^2) prints, appends, whatever.
Not sure why this got downvoted. As anon above me said, the output can contain O(n^2) pairs, so how could you find an O(n) solution?...
A more interesting problem might be to return the number of pairs given these constraints.
Yes, it requires O(n^2) time.
int a[] = {5, 4, 3, 2, 1};
In this case we need to print all 10 possible pairs, so O(n^2).
In O(n) time you can find a single pair a[i] > a[j] with j - i maximal.
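That O(n) claim for the side problem (a pair a[i] > a[j] with j - i maximal) can be sketched with prefix-max/suffix-min arrays plus two pointers; the class and method names below are mine, not from the thread:

```java
// Sketch: maximum j - i such that a[i] > a[j] (i < j), in O(n).
// lmax[i] = max of a[0..i]; rmin[j] = min of a[j..n-1]; both are monotone,
// so a single two-pointer pass over them suffices.
public class MaxIndexDiff {
    public static int maxDiff(int[] a) {
        int n = a.length;
        int[] lmax = new int[n];
        int[] rmin = new int[n];
        lmax[0] = a[0];
        for (int i = 1; i < n; i++) lmax[i] = Math.max(lmax[i - 1], a[i]);
        rmin[n - 1] = a[n - 1];
        for (int j = n - 2; j >= 0; j--) rmin[j] = Math.min(rmin[j + 1], a[j]);
        int i = 0, j = 0, ans = -1;
        while (i < n && j < n) {
            if (lmax[i] > rmin[j]) {   // some i' <= i, j' >= j form a valid pair
                ans = Math.max(ans, j - i);
                j++;                   // try to widen the gap
            } else {
                i++;                   // need a larger left value
            }
        }
        return ans > 0 ? ans : -1;     // -1 means no pair with i < j exists
    }

    public static void main(String[] args) {
        System.out.println(maxDiff(new int[]{5, 4, 3, 2, 1})); // prints 4 (pair (0,4))
    }
}
```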
> Not sure why this got downvoted
Upvote it :) After all, this is an asymptotically optimal solution written in a simple form.
A = 20, 19, 18, 17, 16
B = 1, 2, 3, 4, 5
For each i < j there is a pair,
so there will be O(N^2) possible pairs, hence the complexity.
While it's true that this is asymptotically optimal if the number of pairs returned is as large as it can be, a more interesting way to solve the problem might be to give an algorithm that runs in, say, O(n log n + r), where r is the number of results.
How would it be more interesting to solve the problem in O(n log n + r)? In the worst case r = n^2, so you are pretty much asking to solve this in n log n + n^2 as opposed to n^2, both of which are O(n^2), except what you suggest is slightly worse than the solution we are commenting on.
> How would it be more interesting to solve the problem in O(nlogn + r)?
This makes perfect sense. There's a special term in CS for this; google "output-sensitive algorithm".
I think a suitable approach could be to use a binary search tree holding the elements of B that have an index j > i.
We iterate from i = A.length-1 down to 0 and update the BST at each iteration by adding the single element B[j] with j = i+1 (adding an element to a BST costs O(log n)). Complexity O(n log n).
If you have the pairs:
(1,2)(1,3)(1,4)(1,7)
you can retrieve them as:
(1,(2,3,4,7))
and you won't lose any information.
With my algorithm, for each i you go down the BST built from the elements of B[] with index j > i, and associate that i with the sublist of the elements already in the BST that are smaller than A[i].
So:
Start with an (initially empty) binary search tree for the elements of array B with index j > i.
For each i from A.length-1 down to 0: O(n)
- add the element B[i+1] (if it exists) to the BST; O(log n)
- go down the BST (which holds the elements of B[] with index j > i) and find where A[i] would sit in the BST; O(log n)
- associate i with the sublist of smaller elements in the BST.
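A minimal sketch of this idea, with java.util.TreeMap standing in for the hand-rolled self-balancing BST (all names are mine, and equal-length arrays are assumed): walking i from right to left, the map always holds exactly the elements B[j] with j > i, and headMap yields the sublist smaller than A[i].

```java
import java.util.*;

public class PairsByBst {
    // Returns i -> sublist of B-values B[j] (j > i) that are smaller than A[i].
    public static Map<Integer, List<Integer>> pairs(int[] a, int[] b) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();      // B-value -> multiplicity
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        int n = a.length;                                        // assumed == b.length
        for (int i = n - 2; i >= 0; i--) {
            counts.merge(b[i + 1], 1, Integer::sum);             // O(log n) insert of B[i+1]
            List<Integer> smaller = new ArrayList<>();
            // headMap(a[i]) views every stored B-value strictly less than A[i]
            for (Map.Entry<Integer, Integer> e : counts.headMap(a[i]).entrySet()) {
                for (int k = 0; k < e.getValue(); k++) smaller.add(e.getKey());
            }
            if (!smaller.isEmpty()) result.put(i, smaller);
        }
        return result;
    }

    public static void main(String[] args) {
        // A = {5,4,3,2,1}, B = {1,2,3,4,5}: i=0 pairs with B-values 2,3,4; i=1 with 3
        System.out.println(pairs(new int[]{5, 4, 3, 2, 1}, new int[]{1, 2, 3, 4, 5}));
    }
}
```

Maintaining the map costs O(n log n) in total; materializing the sublists additionally costs the size of the output, which is where an O(n^2) worst case can still appear.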
Adding an element to a plain BST is O(n) in the worst case, not O(log n).
Consider a binary tree with 3 elements {6,7,9}. Now imagine you add 10 to it: starting from the root, you will need to walk past every element. In other words, the height of the tree can be n when the array happens to be sorted.
So even for the BST solution you proposed, the complexity would be O(n^2).
I'm pretty sure that adding an element to a BST is O(log n), just like searching for or deleting an element. The cost of those operations is proportional to the height of the BST, and the height of a BST is log(n). The worst case occurs only if the height of the BST is n, that is, a completely unbalanced tree. But in that case the tree behaves like a linked list and is called a degenerate tree.
How do you figure that the tree behaves like a linked list and is in fact a degenerate tree?
Consider a scenario where array B is sorted in increasing order. That leads to a completely unbalanced BST.
In such a situation the height of the tree equals the number of nodes, and both add and search operations are O(n).
Your solution would no doubt be a good one to discuss with the interviewer, but the worst-case time complexity would still be O(n^2).
Therefore you recognize that this has O(n log n) average complexity and an O(n^2) worst case, which is definitely better than an algorithm that is O(n^2) in both the average and the worst case. For example: quicksort has an average complexity of O(n log n) and a worst case of O(n^2), and is definitely better than bubble sort, which is O(n^2) in both the average and the worst case...
Saying O(something) by itself usually means worst-case complexity.
You can argue that your approach has a better average complexity than O(n^2), but average complexity is not what a bare 'O' denotes.
Sorry, I still do not agree... To describe the complexity of an algorithm you can use both average-case and worst-case complexity, both written in big-O notation. Again, take quicksort, a well-known algorithm with average-case performance O(n log n) and worst-case performance O(n^2). It is always presented as one of the best-performing sorting algorithms, often with better performance than merge sort, which is O(n log n) in both the average and the worst case.
In practice quicksort is usually implemented with random pivot selection and with optimizations like Dutch-flag partitioning. That makes the algorithm's *expected* time complexity O(N log N) regardless of the input!
You won't be able to construct anything like this for the problem we are discussing.
Even unmodified quicksort works well on random input; it has average complexity O(N log N). For the problem we are discussing, no algorithm will work well even on random input, since in our case the expected number of pairs in a random input is O(N^2).
So this is fundamentally an O(N^2) problem. The only thing you can do is optimize special cases where the expected number of pairs is asymptotically lower than O(N^2).
Alternatively, you can cross the line and apply various forms of cheating - see the idea of the O(1) "algorithm" described in the comments.
Think dynamically... Suppose we have an array a1, a2, a3, a4, ..., an. Now divide it into two parts: (a1 to a(n-1)) and an. Comparing the max of the second part with the min of the first part of b (that is, b1, b2, ..., b(n-1)) will give us a valid result (adjust the invariant case when n > m). Now recurse, using n iterations. The subproblem is finding the max of a subpart of a and the min of a subpart of b efficiently. I think you know the answer: use a priority queue.
There *is* a way we can talk about a worst-case bound here while making apparent the O(NlogN) nature of this solution. We can say it is O(NlogN + R), where R is the size of the result set. That is a worst-case bound if we use a self-balancing BST, which guarantees O(logN) per lookup in the worst case.
@teli: standard library BST implementations actually do have an O(log N) worst case, so while many of your points were good, I can see why someone would downvote that particular comment (though it wasn't me). These implementations use tree-rebalancing techniques to provide those worst-case guarantees.
Let's try not to use the word "idiot" around here because there may be aspects you're not considering, or even when someone's wrong they may just be confused or uninformed :)
The problem is basically counting the number of inversions in an array.
Use modified merge sort:
1. Combine arrays A and B into C.
2. Label each element in C with either A or B.
3. Count the number of inversion pairs using merge sort.
4. For each inversion pair, reject it if label(i) == label(j).
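For reference, step 3 on its own (the standard merge-sort inversion count for a single array) looks roughly like this; it is a textbook routine, not code from the thread:

```java
import java.util.*;

public class InversionCount {
    // Counts pairs (i, j) with i < j and a[i] > a[j], in O(n log n).
    public static long count(int[] a) {
        return sort(Arrays.copyOf(a, a.length), 0, a.length - 1);
    }

    private static long sort(int[] a, int lo, int hi) {
        if (lo >= hi) return 0;
        int mid = (lo + hi) / 2;
        long inv = sort(a, lo, mid) + sort(a, mid + 1, hi);
        int[] merged = new int[hi - lo + 1];
        int i = lo, j = mid + 1, k = 0;
        while (i <= mid && j <= hi) {
            if (a[i] <= a[j]) merged[k++] = a[i++];
            else { inv += mid - i + 1; merged[k++] = a[j++]; } // a[i..mid] all exceed a[j]
        }
        while (i <= mid) merged[k++] = a[i++];
        while (j <= hi) merged[k++] = a[j++];
        System.arraycopy(merged, 0, a, lo, merged.length);
        return inv;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{5, 4, 3, 2, 1})); // prints 10
    }
}
```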
I voted you up, then I voted you down.
Your solution may be correct, but it's not more efficient than the naive approach.
Assuming A[i] > B[j] for all i, j (or even only for i < j), there are O(len(A)*len(B)) pairs, and simply printing them takes O(len(A)*len(B)) - the same as the naive approach.
If you only meant to return the number of pairs, your solution might have been faster, were it not for the need to check every pair's labels, which makes it O(#pairs) = O(len(A)*len(B)) again and no better than the naive solution.
Since the algorithm has to produce ALL pairs, it is easy to show that there is a bunch of worst cases where the algorithm has to produce O(N^2) pairs. Thus, you won't be able to construct anything asymptotically better than O(N^2).
This is the correct answer. If you don't have to specifically report all the pairs in (int1, int2) format, you might be able to build an algorithm with a better time complexity (something like gigi84's algorithm), for example if all the O(n^2) pairs can be easily extracted from a data structure whose construction costs less than O(n^2). But in that case where do you draw the line? You could just hand back the original arrays plus a naive pair-extraction algorithm and claim O(1) time complexity.
Maybe a custom quad-nary tree over A (or B), such that each node has four children instead of two:

                    node (A[i], i)
  x > A[i], y > i | x > A[i], y < i | x < A[i], y > i | x < A[i], y < i

And so on; but we would also need some internal balancing scheme, similar to a red-black tree, to fully ensure logarithmic run time, which would probably be non-trivial to implement.
Another approach could be a Java TreeMap. Store all the elements of the second array in a TreeMap, with the array value as the key and the array index as the value.
Now, for each element X of the first array, call headMap(X) on the TreeMap. It returns all the entries whose keys are less than X. Then choose the appropriate indices from the submap.
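A sketch of that TreeMap idea (names are mine, plus one small variation: since duplicate values in B would collide as keys, the map goes value -> list of indices rather than value -> single index):

```java
import java.util.*;

public class TreeMapPairs {
    // Returns all index pairs (i, j) with j > i and a[i] > b[j].
    public static List<int[]> pairs(int[] a, int[] b) {
        TreeMap<Integer, List<Integer>> byValue = new TreeMap<>();
        for (int j = 0; j < b.length; j++) {
            byValue.computeIfAbsent(b[j], v -> new ArrayList<>()).add(j);
        }
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            // headMap(a[i]) is a view of every entry with key strictly < a[i]
            for (List<Integer> idxs : byValue.headMap(a[i]).values()) {
                for (int j : idxs) {
                    if (j > i) result.add(new int[]{i, j}); // keep the index constraint
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] p : pairs(new int[]{5, 4, 3, 2, 1}, new int[]{1, 2, 3, 4, 5})) {
            System.out.println("(" + p[0] + "," + p[1] + ")");
        }
    }
}
```

Note that filtering on j > i still touches O(n*m) candidates in the worst case, so this tidies the lookup rather than improving the asymptotics.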
#include <bits/stdc++.h>
using namespace std;

// Fenwick tree (binary indexed tree) over the value range of B.
long long tree[200005];
int A[200005];
int B[200005];

// Sum of counts for values 1..idx, i.e. how many inserted B-values are <= idx.
long long query(int idx)
{
    long long res = 0;
    while (idx > 0) {
        res += tree[idx];
        idx -= (idx & (-idx));
    }
    return res;
}

// Record one occurrence of value idx.
void update(int idx)
{
    while (idx <= 100001) {
        tree[idx] += 1;
        idx += (idx & (-idx));
    }
}

int main()
{
    int t = 1, n;
    while (t--) {
        cin >> n;
        for (int i = 0; i < n; i++) cin >> A[i];
        for (int i = 0; i < n; i++) cin >> B[i];
        memset(tree, 0, sizeof(tree));
        long long ans = 0;
        // Walk right to left: the tree holds B[j] for all j > i,
        // and query(A[i]-1) counts those with B[j] < A[i].
        for (int i = n - 1; i >= 0; i--) {
            ans += query(A[i] - 1);
            update(B[i]);
        }
        cout << ans << endl;
    }
    return 0;
}
OK, as many of us concluded, it's impossible to construct an algorithm for this problem with worst-case time complexity better than O(N^2). What to do when the interviewer still expects an O(N) solution for the worst case [where O(N^2) pairs have to be reported]?
1) Provide a formal proof that this is impossible in a classic computation model (a Turing machine) [an argument about the number of pairs is enough].
2) Tell the interviewer that we're engineers and we are going to solve this no matter what. Show which constraints we can change to reach O(N) time complexity or better.
For instance, we can escape the classical sequential model and design a system (either new hardware or a distributed system with many processors/computers) where we can leverage O(N) processors with very efficient communication between each pair of processors. In this case an O(N) solution is possible: we can find the O(N^2) pairs and report them in O(N) time into a distributed store across our O(N) nodes.
If I were an interviewer at Amazon, I'd be happy to hear such an answer.
Maybe I'm missing something but this seems really trivial:
have two opposing indices and increment/decrement comparing every time they move.
Javaish:
int i = 0;
int j = B.size() - 1;
boolean increment = true;
while (i < j) {
    if (A[i] > B[j]) {
        // add pair
    }
    if (increment) {
        i++;
    } else {
        j--;
    }
    increment = !increment;
}
// return pairs
import java.util.HashMap;
public class Inversion{
int sum =0;
public int [] merge(int a[], int s, int m, int e, HashMap<Integer, Integer> index){
int l1 = m-1;
int [] helper = clone(a);
int r = s;
        int inv = 0;  // right-half elements already merged past the left half
        int sum = 0;  // inversions contributed by this merge
        while(s <= l1 && m <= e){
            if(index.get(a[s]) <= index.get(a[m])){
                sum += inv;  // every right element taken earlier is smaller than a[s]
                helper[r++] = a[s++];
            }
            else{
                inv++;
                helper[r++] = a[m++];
            }
        }
        while(r < e+1){
            if(s > l1){
                helper[r++] = a[m++];
            }
            else{
                sum += inv;  // remaining left elements each invert with all taken right elements
                helper[r++] = a[s++];
            }
        }
this.sum += sum;
return helper;
}
public int [] countInversions(int a[], int s, int e, HashMap<Integer, Integer> index){
if(s > e || s > a.length-1 || e < 0){
return null;
}
if(s == e){
return a;
}
else{
int m = (s+e)/2;
a = countInversions(a, s, m, index);
a = countInversions(a, m+1, e, index);
a = merge(a, s, m+1, e, index);
return a;
}
}
public int countInversions(int a [], HashMap<Integer, Integer> index){
a = countInversions(a, 0, a.length-1, index);
for(int i:a){
System.out.print(i+" ");
}
System.out.println();
return this.sum;
}
private int [] clone(int a[]){
int helper [] = new int[a.length];
for(int i=0; i<a.length; i++){
helper[i] = a[i];
}
return helper;
}
private static HashMap<Integer, Integer> reverseIndex(int a[]){
HashMap<Integer, Integer> index = new HashMap<Integer, Integer> ();
for(int i =0; i<a.length; i++){
index.put(a[i],i);
}
return index;
}
public static void main(String [] args){
int a[] = {5,4,3,2,1};
int b[] = {1,2,3,4,5};
HashMap<Integer, Integer> index = reverseIndex(b);
Inversion inv = new Inversion();
System.out.println(inv.countInversions(a,index));
}
}
Sort the two arrays and start an imaginary merge, keeping indices i and j into a and b respectively. Whenever we find a[i] > b[j] and i < j, find ceil(a[i]) in b[j+1 ... b.size]; that gives us the count of pairs corresponding to that a[i]. Continuing in this manner, all pairs corresponding to each a[i] can be found in O((M+N) log(M+N)).
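If the i < j constraint is dropped entirely, counting the pairs with a[i] > b[j] really is a sort-plus-binary-search exercise in O((M+N) log N); a sketch under that assumption (names are mine):

```java
import java.util.*;

public class CountBySearch {
    // Counts pairs (i, j) with a[i] > b[j], ignoring any index constraint.
    public static long count(int[] a, int[] b) {
        int[] sorted = Arrays.copyOf(b, b.length);
        Arrays.sort(sorted);
        long total = 0;
        for (int x : a) {
            total += lowerBound(sorted, x); // number of b-values strictly below x
        }
        return total;
    }

    // First index whose value is >= x (i.e. the count of values < x).
    private static int lowerBound(int[] s, int x) {
        int lo = 0, hi = s.length;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (s[mid] < x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{5, 4, 3}, new int[]{1, 2, 3, 4, 5})); // prints 9
    }
}
```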
/* package whatever; // don't place package name! */
import java.util.*;
import java.lang.*;
import java.io.*;
/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
public static void printPair(int[]a ,int[] b)
{
for(int i=0;i<a.length;i++)
{
for(int j= i+1;j<b.length;j++)
{
if(a[i] > b[j])
{
System.out.println("("+a[i]+","+b[j]+")");
}
}
}
}
public static void main (String[] args) throws java.lang.Exception
{
// your code goes here
int a[] ={2,7,3,4,9,10};
int b[] ={8,4,9,3,2};
printPair(a,b);
}
}
According to me, the complexity of the for-loop solution is n log m and not n^2, because the number of inner-loop iterations decreases as the value of i increases.
#include <stdio.h>

int main(int argc, char *argv[])
{
    int A[] = {1,3,4,8,2,5,6};
    int B[] = {8,2,7,9,4,1};
    int i, j;
    /* sizeof-based lengths: the original while(A[i]) scan reads past the
       end of the array unless a 0 terminator happens to be present */
    int len1 = sizeof(A) / sizeof(A[0]);
    int len2 = sizeof(B) / sizeof(B[0]);

    printf("length of A %d\n", len1);
    printf("length of B %d\n", len2);

    for (i = 0; i < len1; i++)
    {
        for (j = i + 1; j < len2; j++)
        {
            if (A[i] > B[j])
            {
                printf("pair is %d,%d\n", i + 1, j + 1);
            }
        }
    }
    return 0;
}
My approach:
1. Create a binary search tree from the longer array, where each node in the BST holds the value and the index.
2. For each element in the other array, search for the appropriate nodes (greater or lesser, as needed) and print all the nodes from that subtree.
Complexity: it is still quadratic, but it has the potential to halve the elements parsed from the larger array.
@0xF4, if the tree is self balancing, then the complexity will be what I have mentioned.
Victor: even if the tree is self-balancing, the complexity is still O(N^2).
Note that you have to find ALL the pairs. In the worst case (or on a random input) there will be O(N^2) pairs to report, so the overall complexity is O(N^2). The fact that you propose to use a self-balancing tree doesn't help here.
You increment both i and j. But what if you should increment only i, and the next maximum is bigger than B[j]?
- Anonymous December 26, 2014