Amazon Interview Question
SDE1sCountry: United States
The algorithm is correct. When you compare across the "y" axis for seven consecutive points, it makes sure that any point with distance smaller than that of sub-problems is detected.
For a proof of correctness, I suggest you refer to the Algorithms course in Stanford by "Tim Roughgarden". He solved it as an example for divide and conquer (I took it on coursera).
Yes, but this solves only the case when k is 2, What about the others? I think this algorithm is NP hard and only an approximate algorithm would work in the general case
I guess the question is finding "k" closest pair of nodes. That is, we consider n choose 2 cases and sort them based on distance and then return the "k" smallest. Otherwise, depending on the definition of "k closest points", the problem can be much harder.
For the former case, the algorithm above is better than listing all pairs since it is O(knlogn). If k = n, the complexity will be O(n2 logn) which is that of sorting all pairs.
Clarification Required:
K closest point among each other?
OR
K closest point from origin?
OR
K closest point from some given point?
Just a random thought:
If the problem were on 1D, the solution would be to sort the points in ascending order, then you would take the first k items and consider that your k-set of closest points with a cost equal to the difference of the k point minus the first one, then you'd visit k + 1 point and see if the distance between the k + 1 point and second one is smaller than the current minimum cost, if so, shift the k-item one point to the right, otherwise start a new k window with its first index located at the current point and repeat the procedure until you cannot create a k window anymore. By that time you'd have the k-set with the closest items, though, of course, you have to consider the edge case where you have less than k items in your original set.
That algorithm takes O(nlgn + n)
If we take close notice, the invariant we need to hold for the algorithm to work is that for all 1 <= i <= n - 1 point it is true that the euclidean distance d(i - 1, i) <= d(i - 1, i + 1), so can we still hold this property on a 2D scenario? I think so.
If you sort the items by its euclidean distance to the origin (in the scenario where all points are positive), then by triangulation the property holds. In the edge case where you have negative values for x or y axis, the property may break, so to fix that you'd have to locate the new origin with its x-axis equal to the smallest x value from the list of points and similarly, with the y-axis equal to the smallest y value from the collection.
This algorithm takes O(n) to find the smallest x and y values, O(lgn) to sort them in ascending order by its distance to the new origin and finally O(n) to get the k-set of closest points (using the euclidean distance as a way to calculate the cost of the current k-window).
You can use k means algorithm.
For example n = 12 and k = 3. So we have to create 4 clusters (call it 'nc' - number of clusters) which contain 3 points.
1) Choose randomly 'nc' - 4 points as an initial cluster's centers.
2) Assign remaining points to the centers (closest points to the center's center are assigned to it). in this way we have just created clusters.
3) Recompute a center of each cluster. New cluster's center is the average of coordinates of all points in the cluster.
4) Repeat 2,3 till there is no changes in the clusters. In other words we found nc clusters which contain k closest points.
5) Check the total distance between points in each cluster and choose the one for which the total distance is minimum.
For small "k" this can be done in O(k N log(N)) using divide and conquer.
- Ehsan January 12, 2014Start by sorting the list of points with respect to "x" and "y". Let's form two sorted arrays this way, namely Ax and Ay.
1- Take Ax and divide it into lower and upper halves (roughly N / 2 length). That is, divide the array around its median (say Mx).
2- Run the algorithm for the first half and then second half, independently (divide and conquer: this is a recursive call).
3- Lets assume by solving it over the two subarrays, "delta" is the minimum length found.
NOW We want to see if the two closest points are in different subarrays.
4- Now choose all points where "|x - Mx| < delta".
5- Go through all such points, (at most N). For a given point (x, y), look for the next "7" points on the array sorted with respect to "y". This will take O(N) time. At the end, if you find a pair with distance less than "delta", return it as solution, otherwise, return "delta".
I saw this algorithms in some course notes from Algorithm I in Stanford.