Microsoft Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
I think this question is similar to the closest pair in a given set of point. Instead of taking the mid point we consider the given point p(x,y) and compute the points based on x co-ordinate which are closest. Repeat for the closest point with the y - co ordinate. I think this will give the answer.
I wrote code for this problem using KDTree, you can read the theory of KD Tree in wikipedia. Here is the implementation:
int compare(point p1, point p2, bool divX)
{
if(divX)
return (p1.x < p2.x)?-1:((p1.x>p2.x)?1:0);
else
return (p1.y < p2.y)?-1:((p1.y>p2.y)?1:0);
}
int partition(point *p, int s, int e, int pivot, bool divX)
{
swap(p[pivot], p[s]);
pivot = s;
while(s <= e)
{
while(s <= e && compare(p[s], p[pivot], divX) <= 0) s++;
while(s <= e && compare(p[e], p[pivot], divX) >= 0) e--;
if(s < e) swap(p[s], p[e]);
}
swap(p[pivot], p[e]);
return s-1;
}
// amortized O(n) method to find the nth element(0-based) in an array
int nthElement(point *p, int s, int e, int n, bool divX)
{
if(s == e) return s;
// get a random index between [s,e] inclusive
int RND = s + rand()%(e-s+1);
int pivot = partition(p, s, e, RND, divX);
int k = pivot - s;
if(n < k){
return nthElement(p, s, pivot-1, n, divX); // go left
}
if(n > k){
return nthElement(p, pivot+1, e, n-k-1, divX); // go right
}
return pivot;
}
// build the KD tree
// third parameter is for the division dimension
void build(point *p, int s, int e, bool divX)
{
if(s>=e) return;
// we will try to find the median of p[s..e]
int mid = (e+s)/2;
int n = mid - s; // we are interested in the median
nthElement(p, s, e, n, divX);
build(p, s, mid-1, !divX);
build(p, mid+1, e, !divX);
}
long bestDist = 1<<30;
int bestNode = -1;
void findNearestNeighbour(point *p, int s, int e, int x, int y, bool divX)
{
if(s >= e) return;
int mid = (e+s)/2;
point midp = p[mid];
long dx = x - midp.x;
long dy = y - midp.y;
long d = dx*dx + dy*dy;
if(bestDist > d)
{
bestDist = d;
bestNode = mid;
}
int s1 = s, e1 = mid-1;
int s2 = mid+1, e2 = e;
long delta = divX?dx:dy;
if(delta > 0)
{
swap(s1, s2);
swap(e1, e1);
}
findNearestNeighbour(p, s1, e1, x, y, !divX);
long delta2 = delta*delta;
if( delta2 < bestDist )
findNearestNeighbour(p, s2, e2, x, y, !divX);
}
Usage:
You need to first build the kd tree. Above method builds the tree in the array itself, it uses the quicksort partition algorithm to keep the tree balanced with max O(logn) height. This ensures that the query execution will tak O(logn).
// build the tree
build(points, 0, n-1, true); // n no. of points, true means we start with splitting by x-axis
// how to query the tree
findNearestNeighbour(points, 0, n-1, x, y, true);
// here (x,y) is the input point
I almost got this, but not quite. You can see that (4, 4) throws my approach due to hits at both x=4 and y=4, neither of which are the true nearest neighbor. Basically, I'm using a balanced BST for x-coords and y-coords, with the intuition being that we always want to minimize sqrt(x-delta^2 + y-delta^2), which means we want to only compare those cities which are closest in x and y:
package question;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
public class Main {
public static class City {
private String label;
private int x;
private int y;
public City(String label, int x, int y) {
this.label = label;
this.y = y;
this.x = x;
}
public double distance(int x, int y) {
return Math.sqrt(Math.pow(this.x - x, 2) + Math.pow(this.y - y, 2));
}
}
public static class Node {
Node l, r;
int val;
List<City> cities = new LinkedList<City>();
public Node(int v) {
val = v;
}
}
public static class Tree {
Node root = null;
public void insert(int n, City c) {
if(root == null) {
root = new Node(n);
root.cities.add(c);
}
else {
insertHelper(root, n, c);
}
}
private void insertHelper(Node node, int n, City c)
{
if(n == node.val) {
node.cities.add(c);
}
else if (n < node.val)
{
if (node.l == null) {
node.l = new Node(n);
node.l.cities.add(c);
}
else
insertHelper(node.l, n, c);
}
else
{
if (node.r == null) {
node.r = new Node(n);
node.r.cities.add(c);
}
else
insertHelper(node.r, n, c);
}
}
public List<City> nearest(int n) {
Node prev = root, curr = root;
while(curr != null && curr.val != n) {
prev = curr;
if(n < curr.val) {
curr = curr.l;
}
else {
curr = curr.r;
}
}
return (curr == null)? prev.cities : curr.cities;
}
}
private static final City DENVER = new City("Denver", 0, 0);
private static final City SEATTLE = new City("Seattle", -4, 5);
private static final City PORTLAND = new City("Portland", -2, 4);
private static final City SANFRANCISCO = new City("San Francisco", -3, -2);
private static final City LOSANGELES = new City("Los Angeles", -5, -3);
private static final City CHICAGO = new City("Chicago", 2, 3);
private static final City ATLANTA = new City("Atlanta", 3, 0);
private static final City MIAMI = new City("Miami", 4, -5);
private static List<City> cities = Arrays.asList(DENVER, SEATTLE, PORTLAND,
SANFRANCISCO, LOSANGELES,
CHICAGO, ATLANTA, MIAMI);
// Trees balanced by insertion order, not self-balancing
private static Tree xt = new Tree();
static {
xt.insert(DENVER.x, DENVER);
xt.insert(SEATTLE.x, SEATTLE);
xt.insert(LOSANGELES.x, LOSANGELES);
xt.insert(SANFRANCISCO.x, SANFRANCISCO);
xt.insert(PORTLAND.x, PORTLAND);
xt.insert(ATLANTA.x, ATLANTA);
xt.insert(CHICAGO.x, CHICAGO);
xt.insert(MIAMI.x, MIAMI);
}
private static Tree yt = new Tree();
static {
yt.insert(DENVER.y, DENVER);
yt.insert(ATLANTA.y, ATLANTA);
yt.insert(PORTLAND.y, PORTLAND);
yt.insert(CHICAGO.y, CHICAGO);
yt.insert(SEATTLE.y, SEATTLE);
yt.insert(LOSANGELES.y, LOSANGELES);
yt.insert(SANFRANCISCO.y, SANFRANCISCO);
yt.insert(MIAMI.y, MIAMI);
}
public static void main(String[] args) {
search(0, 0);
search(0, 1);
search(0, 2);
search(2, 2);
search(4, 4);
search(0, -1);
search(0, -2);
search(-2, 2);
search(4, -4);
}
private static void search(int x, int y) {
System.out.println("Searching for city nearest (" + x + ", " + y + ")");
linearSearch(x, y);
binarySearch(x, y);
}
private static void linearSearch(int x, int y) {
double min = Double.MAX_VALUE;
City nearest = null;
for(City c : cities) {
double dist = c.distance(x, y);
if(dist < min) {
nearest = c;
min = dist;
}
}
System.out.println("Linear search determined that " + nearest.label +
" is closest with " + cities.size() + " calculations.");
}
private static void binarySearch(int x, int y) {
List<City> candidates = new LinkedList<City>();
candidates.addAll(xt.nearest(x));
candidates.addAll(yt.nearest(y));
double min = Double.MAX_VALUE;
City nearest = null;
for(City c : candidates) {
double dist = c.distance(x, y);
if(dist < min) {
nearest = c;
min = dist;
}
}
System.out.println("Binary search determined that " + nearest.label +
" is closest with " + candidates.size() + " calculations.");
}
}
can someone summarize the preprocessing options here? couldn't you sort based on distance to the user location?
Something like a quadtree (en.wikipedia.org/wiki/Quadtree) would probably work very well.
A simpler solution would be to split the world up into an arbitrary number of 'squares' and keep track of which cities each square contains. Depending on the size of the 'square' the solution would be faster than O(N). The solution of assigning one 'square' to every kilometer or so would obviously have time complexity of about O(1) but also very significant space complexity, so you'd probably want to trade that off to some extent. Not as good as a quadtree in terms of space, but simpler to implement.
My first thought is that if you're taking "You can preprocess the data in any way you like" literally, and generally assuming the horse is a sphere, its trivial to do this in constant time. Make a hash map with every possible coordinate pair as a key to the closest city. This is probably not practical, but you could easily use some sort of rounding rule to group coordinates into reasonably sized block that act as Hashkeys to a list of cities. Thus reducing the search space by a few orders of magnitude with a constant time operation.
kd-tree is the answer and can guarantee O(lg(n)) complexity for queries. However I think it takes O(nlg(n)) pre-computation complexity.
Another option is: as the input data is location of cities we may be able to assume they're spreading normally through the whole grid (usually its not!). in this case a simple O(n) clustering works with less than O(n) search complexity. At least one can test this approach for the input data and see if this type of clustering works or not! (this will be a good thing to mention in interview!)
If you're allowed massive preprocessing, you might want to connect your cities in a graph, roughly analogous to a highway system. Start by taking each city, and find its closest three neighbors and make highways between them. This will create a forest of highways (sorry to mix analogies). You may have isolated foursomes of cities that are only connected to each other (e.g. four cities in Alaska). For isolated foursomes, connect each of the cities to its next closest neighbor. After this step, you may still have an isolated cluster of eight cities; if so, repeat the process. After your preprocessing, you'll eventually have a highway system that connects all cities, but your highways will have the property that a trip between two cities always takes one time unit, no matter the actual distance.
Now imagine you're the traveler trying to get closest to home on the highway system. Start at any city, and look at the map to find neighboring cities. There will be at least three cities reachable by highway. Whichever city is closer to your destination, then go to that city next. If you're already in the closest city, then you're done.
There are edge cases where this approach won't produce the optimal solution, but it will always be pretty close. Consider continental U.S. geography and the top 50 US cities. Imagine creating a relatively minimal highway system between those 50 cities and think about your route.
We go by tree,
We consider a tree in which each node can have at max 4 children. and each children contain a range in "x" and "y", which decide weather the value lies in this node or someone else.we go upon the point where each child get only a single point. and we know that here linear case does not exist, always search, delete,insert complexities goes down to O(logn).
First sort all the cities by their x coordinate. Then do a binary search for the input point's x coordinate. Say, city x coordinates were 3 6 8 9 23 and user's x was 12. The O(log n) binary search will give that user lies between 9 and 23. At this point all we need to choose between 9 and 23 by comparing the y coordinates O(1). Overall O(log n).
get difference between co ordinates i.e. x and y and calculate distance by Pythagoras theorem.
Python like psuedo code (don't remember exact syntax for *args)
def min(lst, func, *args):
min = None
for elem in lst
felem = func(elem, args)
if min == None or felem < min:
min = felem
return min
def distance(coord1, coord2):
xdiff, ydiff = coord1[0] - coord2[0], coord1[1] - coord2[1]
return math.sqrt(xdiff*xdiff + ydiff*ydiff)
def closest(lst, co-ords):
return min(lst, distance, co-ords)
@HauntedGhost: Given the current info, you cannot do better than linear.
Are you sure of the question?
Perhaps they allowed preprocessing (like creating a Voronoi diagram) after which nearest point queries can be sub-linear, but you have to pay the Theta(n) cost (in fact, probably Theta(n log n) for Voronoi preprocessing).
Preprocessing is allowed, you can sort them, build kd tree or whatever you want. But you need to minimize the query execution time. Since it is an interview question, I'm guessing interviewer wouldn't be looking for complex solution like Voronoi diagram.
@HauntedGhost. This is supposed to be a classic application of Voronoi diagrams :-) You are right though, it is too complex to be coding it up during an interview.
Downvoting this because the question got updated. Sorry. Nice answer for the original question, though.
If need faster than O(n) then need to look at tree search:
- uugan March 10, 2013for example Kd-tree's search gives O(logn) --> en.wikipedia. org/wiki/Kd-tree
or try to divide into hemisphere zones on binary tree.