Amazon Interview Question
Country: United States
Interview Type: Phone Interview
Your analysis is incorrect. This is an O(n^2) solution in the worst case. In the best case your solution is O(n), but only if you get lucky and choose the pivot correctly on the first try.
On average this solution should be O(n log(n)).
If the pivot is chosen using the right algorithm, quickselect gives linear time complexity in the average case, since we always discard one of the two halves. Quick Sort is known as an O(n log n) algorithm because of its average case; by the same reasoning this should be known as an O(n) algorithm, although its worst case is still O(n^2).
This is the algorithm that I gave him:
1) Go through all the points once to calculate the distance of each point from (0, 0).
2) Copy the first 100 points into an array.
3) Sort the array.
4) Starting from point number 101, compare each point to the last element of the array.
If the new point is smaller, swap it with the last element, then swap the new last element with the elements before it until the array is sorted again.
5) Repeat step 4 until the end.
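The sorted-array bookkeeping above can be sketched as follows. This is a minimal sketch over precomputed distances; the helper name is mine, not from the thread:

```cpp
#include <algorithm>
#include <vector>

// Keep the k smallest distances seen so far in a sorted array.
// Each qualifying insertion shifts at most k elements, so the whole
// run is O(n * k); a max-heap would bring this down to O(n log k).
std::vector<double> kSmallestSorted(const std::vector<double> &dists, size_t k) {
    std::vector<double> window(dists.begin(),
                               dists.begin() + std::min(k, dists.size()));
    if (window.empty()) return window;
    std::sort(window.begin(), window.end());                // sort the first k
    for (size_t i = window.size(); i < dists.size(); ++i) { // remaining points
        if (dists[i] < window.back()) {
            window.back() = dists[i];                       // replace the largest
            // bubble the new value left until the array is sorted again
            for (size_t j = window.size() - 1;
                 j > 0 && window[j] < window[j - 1]; --j)
                std::swap(window[j], window[j - 1]);
        }
    }
    return window;
}
```

For example, kSmallestSorted({5, 1, 9, 3, 7, 2}, 3) returns {1, 2, 3}.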
For any element that fits into the smallest 100, you need to rearrange the heap/array to keep it in sorted order, which takes log 100. You do this for all n elements, so the time complexity is O(n log 100) and the space complexity is O(100).
O(n log 100) = O(n * K), since log 100 is a constant K, which is O(n).
So the solution has time complexity O(n).
A max heap of size 100 is better than the array implementation: the array's per-insertion cost is on the order of 100 (shifting elements to keep it sorted), while for a max heap it is just log 100.
Nope, it's a max heap. Insert the first 100 distances into the max heap, so the root node holds the maximum of those 100 distances. For every following distance, check whether it is less than the root (max) distance in the heap. If it is less, delete the root and insert the new distance.
What if the next point is greater than the root: do we discard it? And when the root node is deleted, is the heap rebalanced again? Could you please elaborate the answer a bit.
I would use a min heap, computing distances as I go. Retrieval of the minimum is then O(1). The complexity will be O(N) + O(N log N). But when N is a million, log N can be smaller than the constants in the previous solutions.
Here is my C++ solution:

#include <iostream>
#include <vector>
#include <limits>
#include <algorithm>
#include <cmath>   // for sqrt

using namespace std;

struct dist_ind {
    float dist;
    int index;
    dist_ind(float dist_, int i): dist(dist_), index(i) {}
};

struct pos {
    int x;
    int y;
    pos(int x_, int y_): x(x_), y(y_) {}
};

vector<pos> find_k_clostest_points(vector<pos> vec, int x, int y, int k) {
    vector<dist_ind> v;
    vector<pos> out;
    if((int)vec.size() <= k) {   // fewer points than requested: return them all
        return vec;
    }
    for(int i = 0; i < (int)vec.size(); i++) {
        // Euclidean distance from (x, y); sqrt is monotonic, so the squared
        // distance would rank points identically and avoid the sqrt.
        float dx = (float)(x - vec[i].x);
        float dy = (float)(y - vec[i].y);
        float distance = sqrt(dx * dx + dy * dy);
        v.push_back(dist_ind(distance, i));
    }
    // Selection: k full passes over the distances, O(n * k) overall.
    for(int i = 0; i < k; i++) {
        float min = numeric_limits<float>::max();
        int min_index = 0;
        for(int j = 0; j < (int)v.size(); j++) {
            if(v[j].dist < min) {
                min = v[j].dist;
                min_index = j;
            }
        }
        out.push_back(vec[min_index]);
        v[min_index].dist = numeric_limits<float>::max(); // mark as taken
    }
    return out;
}

int main() {
    vector<pos> vec = {{1,2}, {3,4}, {5,6}, {20,30}, {2,2}, {10,11}, {4,4}};
    vector<pos> k_points = find_k_clostest_points(vec, 0, 0, 3);
    for_each(k_points.begin(), k_points.end(),
             [](pos p) { cout << p.x << ", " << p.y << endl; });
    return 0;
}
Here is the improved version with a heap:

#include <iostream>
#include <vector>
#include <limits>
#include <algorithm>
#include <queue>
#include <cmath>   // for sqrt

using namespace std;

struct dist_ind {
    float dist;
    int index;
    dist_ind(float dist_, int i): dist(dist_), index(i) {}
};

struct pos {
    int x;
    int y;
    pos(int x_, int y_): x(x_), y(y_) {}
};

// Max-heap ordering: the largest distance sits on top of the priority queue.
struct comparison {
    bool operator() (const dist_ind &lhs, const dist_ind &rhs) const {
        return lhs.dist < rhs.dist;
    }
};

vector<pos> find_k_clostest_points2(vector<pos> vec, int x, int y, int k) {
    priority_queue<dist_ind, vector<dist_ind>, comparison> q;
    vector<pos> out;
    if((int)vec.size() <= k) {   // fewer points than requested: return them all
        return vec;
    }
    for(int i = 0; i < (int)vec.size(); i++) {
        float dx = (float)(x - vec[i].x);
        float dy = (float)(y - vec[i].y);
        float distance = sqrt(dx * dx + dy * dy);
        if((int)q.size() == k) {
            // Heap is full: keep the new point only if it beats the current worst.
            if(q.top().dist > distance) {
                q.pop();
                q.push(dist_ind(distance, i));
            }
        }
        else {
            q.push(dist_ind(distance, i));
        }
    }
    // Drain the heap. Overall: O(n log k) time, O(k) extra space.
    while(!q.empty()) {
        out.push_back(vec[q.top().index]);
        q.pop();
    }
    return out;
}

int main() {
    vector<pos> vec = {{1,2}, {3,4}, {5,6}, {20,30}, {2,2}, {10,11}, {4,4}};
    vector<pos> k_points = find_k_clostest_points2(vec, 0, 0, 3);
    for_each(k_points.begin(), k_points.end(),
             [](pos p) { cout << p.x << ", " << p.y << endl; });
    return 0;
}
(1) Find the distances of the points from (0,0): O(N)
(2) Find the minimum value in the list, store that point, and remove it from the list: O(N)
(3) Repeat step (2) until 100 points are found.
Complexity:
O(N) + O(100 * N) = O(N)
Yes, but what if we are given N/3 in place of 100, or some other really big number? It becomes O(n^2) then, right?
The top solution using quickselect has better odds than this one in that case...
This ranks points from (0,0) by |x| + |y|, i.e. the Manhattan distance. (Note that this does not always match the Euclidean order: (5,0) has a smaller |x| + |y| than (3,3) but is farther from the origin; comparing x^2 + y^2 ranks points correctly.)
I use two stacks for this algorithm.
1. If the main stack is empty, insert (x, y);
else
2. If |x| + |y| < the value for Main Stack.top(), pop the top and put it in the second stack. Keep popping until the top's value is less than or equal to |x| + |y|.
3. Insert (x, y) into the main stack.
4. Pop all elements from the second stack and put them back into the main stack.
At any time the main stack holds the points sorted by distance from (0,0), with the closest at the bottom.
Let me know if it can be improved
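The insertion step described above can be sketched as follows. I use squared Euclidean distance x^2 + y^2 rather than |x| + |y|, since the latter is the Manhattan distance and can order points differently; the names are mine. Note that each insertion is O(n), so building the whole stack is O(n^2), and it keeps all n points rather than only the closest k:

```cpp
#include <stack>
#include <utility>

using Pt = std::pair<int, int>;

// Squared Euclidean distance from (0,0). Ranking by |x| + |y| (Manhattan)
// can disagree with this, e.g. for (5,0) vs (3,3).
static long long d2(const Pt &p) {
    return (long long)p.first * p.first + (long long)p.second * p.second;
}

// Insert p so that mainStk stays sorted with the farthest point on top.
void sortedInsert(std::stack<Pt> &mainStk, const Pt &p) {
    std::stack<Pt> aux;                       // the second stack
    // Step 2: pop points farther than p into the auxiliary stack.
    while (!mainStk.empty() && d2(mainStk.top()) > d2(p)) {
        aux.push(mainStk.top());
        mainStk.pop();
    }
    mainStk.push(p);                          // step 3: insert p in place
    while (!aux.empty()) {                    // step 4: restore the rest
        mainStk.push(aux.top());
        aux.pop();
    }
}
```

After inserting (3,4), (1,1), and (2,2) into an empty stack, the top is (3,4) and the bottom is (1,1), i.e. closest at the bottom as the post describes.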
Can be done in O(n) without additional memory. Just modify quickselect (the selection problem) a bit. Here is a tested implementation in Java:
- Anshul December 02, 2012