## Amazon Interview Question

Software Engineer / Developers**Country:**India

This looks like a clustering problem except that we need to find just one cluster of k closest elements rather than finding clusters for all elements (i.e., k-means clustering which is NP-hard). We can have an O(n^2 lgn) solution where for each point we compute its distance from other points. For each point, we have a list of n distances. We can sort each list in O(nlgn) and find k-1 smallest element and compare the sum of distances of these points with the already calculated minimum sum of distances and replace if needed.

Total time complexity = n * (O(n) + O(nlgn)) = O(n^2 lgn).

We can use the order statistics method to find k-1 smallest elements in each list, which makes the algorithm O(n^2).

This won't work. You are considering distance from a point to other k-1 points only for minimization rather you should minimize sum of pairwise distance.

e.g. When we find closest triplet of points then it means we have to find a triangle from set of points which has minimum perimeter i.e. we minimize AB+BC+CA not AB+BC.

That's a great point Cerberuz, and might make the problem more challenging. There could be two interpretations possible from the problem statement.

1. Find k closest points such that sum of distance among all the points is minimized.

2. Find a circle with smallest diameter that can contain k points (k closest points in as small space as possible).

I think the problem statement hints towards the second interpretation, so we need to find k elements such that the max distance between any two points is minimized (smallest diameter). However, even if we consider the second case, things are not easy. Minimizing the sum of distances of k-1 points from the center (kth point) does not ensure minimizing the diameter of the circle (which my earlier post assumed). The diameter represents the maximum distance between any two points in the sub-cluster. Let us see how can we minimize the diameter :

(i) Initially maintain an NxN matrix and compute distances of all points from each other O(n^2).

(ii) Take a point p1 and find k-1 closest elements to p1.O(n)

(iii) Maintain a heap of distances of all k points with each other. Build heap - O(k^2) time, O(k^2) space. Also find Sk for k points which is the sum of distances from the rest of k-1 points.

(iv) Get the next point (k+1) and find its maximum distance from any of the k points. If this maximum is greater than the max Heap, ignore this point and move to the next one. Else remove that point of the max Heap distance (each distance has two points associated with it) whose Sk is larger. Now remove all k distances of that point from the heap, and insert new k distances in the heap. O(klgk + klgk) = O(klgk).

(v) Repeat point (iv) for the rest of the points.

Total time complexity = O(n^2) + O(k^2 + nklgk) = O(n^2).

-sort the points based on the distance between them and (0,0). Now start with two pointers: one = 0, and two = k; Compute the total distance between one and two , if its < min then store it in min. increment one and two , the new distance is D = previous_distance - distanceBetween(point[one-1], point[one]) + distanceBetween(point[two], point[two-1]); if distance < min store it in min and so on till two goes over the point.length.

Each time we store a new value in min we also store in 2 variables the values of "one" and "two" . At the end we return all the points betweeen one and two.

What is the meaning of k closest here ?

- Cerberuz January 08, 2013If its minimizing sum of pairwise distance then i think there is no polynomial time solution for this problem.