Given coordinates (X[i],Y[i])

Microsoft Interview Question for Software Engineer / Developers

0

of 0 votes

29
Answers
Given coordinates (X[i],Y[i]) of all the cities in the world(for simplicity lets consider we have 2D world map). You are given a user's location (x,y), find out the closest city to the user. Write code for it.

Update:
Time complexity of the solution should be better than O(n) as there will be multiple lookup queries with different input points.
You can preprocess the data in any way you like, but you need to minimize the query execution time complexity.
- HauntedGhost March 09, 2013 in United States | Report Duplicate | Flag | PURGE
Microsoft Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

If need faster than O(n) then need to look at tree search:
for example Kd-tree's search gives O(logn) --> en.wikipedia. org/wiki/Kd-tree
or try to divide into hemisphere zones on binary tree.

- uugan March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

seems a bit much for an interviewer to come up and design/implement a k-d tree on the spot

- Anonymous March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

I think this question is similar to the closest pair in a given set of point. Instead of taking the mid point we consider the given point p(x,y) and compute the points based on x co-ordinate which are closest. Repeat for the closest point with the y - co ordinate. I think this will give the answer.

- Nascent March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

I wrote code for this problem using KDTree, you can read the theory of KD Tree in wikipedia. Here is the implementation:

int compare(point p1, point p2, bool divX)
{
    if(divX)
        return (p1.x < p2.x)?-1:((p1.x>p2.x)?1:0);
    else
        return (p1.y < p2.y)?-1:((p1.y>p2.y)?1:0);
}

int partition(point *p, int s, int e, int pivot, bool divX)
{
    swap(p[pivot], p[s]);
    pivot = s;
    
    while(s <= e)
    {
        while(s <= e && compare(p[s], p[pivot], divX) <= 0) s++;
        while(s <= e && compare(p[e], p[pivot], divX) >= 0) e--;
        if(s < e)   swap(p[s], p[e]);
    }
    swap(p[pivot], p[e]);
    return s-1;
}

// amortized O(n) method to find the nth element(0-based) in an array
int nthElement(point *p, int s, int e, int n, bool divX)
{
    if(s == e) return s;
    // get a random index between [s,e] inclusive
    int RND = s + rand()%(e-s+1);
    int pivot = partition(p, s, e, RND, divX);
    
    int k = pivot - s;
    if(n < k){
        return nthElement(p, s, pivot-1, n, divX); // go left
    }
    if(n > k){
        return nthElement(p, pivot+1, e, n-k-1, divX); // go right
    }
    return pivot;
}
// build the KD tree
// third parameter is for the division dimension
void build(point *p, int s, int e, bool divX)
{
    if(s>=e) return;
    
    // we will try to find the median of p[s..e] 
    int mid = (e+s)/2;
    int n = mid - s; // we are interested in the median
    nthElement(p, s, e, n, divX);
    
    build(p, s, mid-1, !divX);
    build(p, mid+1, e, !divX);
}

long bestDist = 1<<30;
int bestNode = -1;
void findNearestNeighbour(point *p, int s, int e, int x, int y, bool divX)
{
    if(s >= e) return;
    
    int mid = (e+s)/2;
    
    point midp = p[mid];
    long dx = x - midp.x;
    long dy = y - midp.y;
    
    long d = dx*dx + dy*dy;
    if(bestDist > d)
    {
        bestDist = d;
        bestNode = mid;
    }
    
    int s1 = s, e1 = mid-1;
    int s2 = mid+1, e2 = e;
    long delta = divX?dx:dy;
    if(delta > 0)
    {
        swap(s1, s2);
        swap(e1, e1);
    }
    
    findNearestNeighbour(p, s1, e1, x, y, !divX);
    long delta2 = delta*delta;
    if( delta2 < bestDist )
        findNearestNeighbour(p, s2, e2, x, y, !divX);
        
}

Usage:

You need to first build the kd tree. Above method builds the tree in the array itself, it uses the quicksort partition algorithm to keep the tree balanced with max O(logn) height. This ensures that the query execution will tak O(logn).

// build the tree
build(points, 0, n-1, true); // n no. of points, true means we start with splitting by x-axis
// how to query the tree
findNearestNeighbour(points, 0, n-1, x, y, true);
// here (x,y) is the input point

- HauntedGhost March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

I almost got this, but not quite. You can see that (4, 4) throws my approach due to hits at both x=4 and y=4, neither of which are the true nearest neighbor. Basically, I'm using a balanced BST for x-coords and y-coords, with the intuition being that we always want to minimize sqrt(x-delta^2 + y-delta^2), which means we want to only compare those cities which are closest in x and y:

package question;

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class Main {
	public static class City {
		private String label;
		private int x;
		private int y;
		
		public City(String label, int x, int y) {
			this.label = label;
			this.y = y;
			this.x = x;
		}
		
		public double distance(int x, int y) {
			return Math.sqrt(Math.pow(this.x - x, 2) + Math.pow(this.y - y, 2));
		}
	}

	public static class Node {
		Node l, r;
		int val;
		List<City> cities = new LinkedList<City>();
		
		public Node(int v) {
			val = v;
		}
	}
	
	public static class Tree {
		Node root = null;
		
		public void insert(int n, City c) {
			if(root == null) {
				root = new Node(n);
				root.cities.add(c);
			}
			else {
				insertHelper(root, n, c);
			}
		}
		 
		private void insertHelper(Node node, int n, City c)
		{
			if(n == node.val) {
				node.cities.add(c);
			}
		    else if (n < node.val)
		    {
		        if (node.l == null) {
		            node.l = new Node(n);
		            node.l.cities.add(c);
		        }
		        else
		            insertHelper(node.l, n, c);
		    }
		    else
		    {
		        if (node.r == null) {
		            node.r = new Node(n);
		            node.r.cities.add(c);
		        }
		        else
		            insertHelper(node.r, n, c);
		    }
		}
		
		public List<City> nearest(int n) {
			Node prev = root, curr = root;
			while(curr != null && curr.val != n) {
				prev = curr;
				if(n < curr.val) {
					curr = curr.l;
				}
				else {
					curr = curr.r;
				}
			}
			
			return (curr == null)? prev.cities : curr.cities;
		}
	}
	
	private static final City DENVER = new City("Denver", 0, 0);
	private static final City SEATTLE = new City("Seattle", -4, 5);
	private static final City PORTLAND = new City("Portland", -2, 4);
	private static final City SANFRANCISCO = new City("San Francisco", -3, -2);
	private static final City LOSANGELES = new City("Los Angeles", -5, -3);
	private static final City CHICAGO = new City("Chicago", 2, 3);
	private static final City ATLANTA = new City("Atlanta", 3, 0);
	private static final City MIAMI = new City("Miami", 4, -5);
	private static List<City> cities = Arrays.asList(DENVER, SEATTLE, PORTLAND, 
													 SANFRANCISCO, LOSANGELES, 
													 CHICAGO, ATLANTA, MIAMI);
	
	// Trees balanced by insertion order, not self-balancing
	private static Tree xt = new Tree();
	static {
		xt.insert(DENVER.x, DENVER);
		xt.insert(SEATTLE.x, SEATTLE);
		xt.insert(LOSANGELES.x, LOSANGELES);
		xt.insert(SANFRANCISCO.x, SANFRANCISCO);
		xt.insert(PORTLAND.x, PORTLAND);
		xt.insert(ATLANTA.x, ATLANTA);
		xt.insert(CHICAGO.x, CHICAGO);
		xt.insert(MIAMI.x, MIAMI);
	}
	private static Tree yt = new Tree();
	static {
		yt.insert(DENVER.y, DENVER);
		yt.insert(ATLANTA.y, ATLANTA);
		yt.insert(PORTLAND.y, PORTLAND);
		yt.insert(CHICAGO.y, CHICAGO);
		yt.insert(SEATTLE.y, SEATTLE);
		yt.insert(LOSANGELES.y, LOSANGELES);
		yt.insert(SANFRANCISCO.y, SANFRANCISCO);
		yt.insert(MIAMI.y, MIAMI);
	}
	
	public static void main(String[] args) {
		search(0, 0);
		search(0, 1);
		search(0, 2);
		search(2, 2);
		search(4, 4);
		search(0, -1);
		search(0, -2);
		search(-2, 2);
		search(4, -4);
	}
	
	private static void search(int x, int y) {
		System.out.println("Searching for city nearest (" + x + ", " + y + ")");
		linearSearch(x, y);
		binarySearch(x, y);
	}
	
	private static void linearSearch(int x, int y) {
		double min = Double.MAX_VALUE;
		City nearest = null;
		for(City c : cities) {
			double dist = c.distance(x, y);
			if(dist < min) {
				nearest = c;
				min = dist;
			}
		}
		System.out.println("Linear search determined that " + nearest.label + 
				           " is closest with " + cities.size() + " calculations.");
	}
	
	private static void binarySearch(int x, int y) {
		List<City> candidates = new LinkedList<City>();
		candidates.addAll(xt.nearest(x));
		candidates.addAll(yt.nearest(y));
		double min = Double.MAX_VALUE;
		City nearest = null;
		for(City c : candidates) {
			double dist = c.distance(x, y);
			if(dist < min) {
				nearest = c;
				min = dist;
			}
		}
		System.out.println("Binary search determined that " + nearest.label + 
				           " is closest with " + candidates.size() + " calculations.");
	}
}

- DJ March 11, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

can someone summarize the preprocessing options here? couldn't you sort based on distance to the user location?

- Anonymous March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

There could be multiple queries (i.e. different users).

Think of it as: You are given a set of city location.

You preprocess that as you want.

Now you are constantly bombarded with user queries.

- Anonymous March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Anyone with the solution to this? I can't seem to be able to do it in better than O(N)

- hack14 March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Something like a quadtree (en.wikipedia.org/wiki/Quadtree) would probably work very well.

A simpler solution would be to split the world up into an arbitrary number of 'squares' and keep track of which cities each square contains. Depending on the size of the 'square' the solution would be faster than O(N). The solution of assigning one 'square' to every kilometer or so would obviously have time complexity of about O(1) but also very significant space complexity, so you'd probably want to trade that off to some extent. Not as good as a quadtree in terms of space, but simpler to implement.

- cork March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

the cities can be categorized on the basis of countries and the max and min (x,y) coordinates of the countries can be stored. There might be an overlap, but will definitely reduce the no of countries and thus to be searched for.

- alex March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

My first thought is that if you're taking "You can preprocess the data in any way you like" literally, and generally assuming the horse is a sphere, its trivial to do this in constant time. Make a hash map with every possible coordinate pair as a key to the closest city. This is probably not practical, but you could easily use some sort of rounding rule to group coordinates into reasonably sized block that act as Hashkeys to a list of cities. Thus reducing the search space by a few orders of magnitude with a constant time operation.

- Anonymous March 11, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Input location (x,y) can be different than the world cities locations. :P

- HauntedGhost March 11, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Why can't I preprocess and put in hash table
key is x-y. e.g. x=100, y=200 then key is "100-200" and value is city name.

- Anon March 11, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

kd-tree is the answer and can guarantee O(lg(n)) complexity for queries. However I think it takes O(nlg(n)) pre-computation complexity.

Another option is: as the input data is location of cities we may be able to assume they're spreading normally through the whole grid (usually its not!). in this case a simple O(n) clustering works with less than O(n) search complexity. At least one can test this approach for the input data and see if this type of clustering works or not! (this will be a good thing to mention in interview!)

- iman.goodarzi March 11, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

If you're allowed massive preprocessing, you might want to connect your cities in a graph, roughly analogous to a highway system. Start by taking each city, and find its closest three neighbors and make highways between them. This will create a forest of highways (sorry to mix analogies). You may have isolated foursomes of cities that are only connected to each other (e.g. four cities in Alaska). For isolated foursomes, connect each of the cities to its next closest neighbor. After this step, you may still have an isolated cluster of eight cities; if so, repeat the process. After your preprocessing, you'll eventually have a highway system that connects all cities, but your highways will have the property that a trip between two cities always takes one time unit, no matter the actual distance.

Now imagine you're the traveler trying to get closest to home on the highway system. Start at any city, and look at the map to find neighboring cities. There will be at least three cities reachable by highway. Whichever city is closer to your destination, then go to that city next. If you're already in the closest city, then you're done.

There are edge cases where this approach won't produce the optimal solution, but it will always be pretty close. Consider continental U.S. geography and the top 50 US cities. Imagine creating a relatively minimal highway system between those 50 cities and think about your route.

- showell30@yahoo.com March 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Why do we need K-D tree here? simple binary tree will work. Cant we use heap here?

- dbms March 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

We go by tree,
We consider a tree in which each node can have at max 4 children. and each children contain a range in "x" and "y", which decide weather the value lies in this node or someone else.we go upon the point where each child get only a single point. and we know that here linear case does not exist, always search, delete,insert complexities goes down to O(logn).

- sonesh March 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

QR-Quadtree

- Kevin March 19, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

First sort all the cities by their x coordinate. Then do a binary search for the input point's x coordinate. Say, city x coordinates were 3 6 8 9 23 and user's x was 12. The O(log n) binary search will give that user lies between 9 and 23. At this point all we need to choose between 9 and 23 by comparing the y coordinates O(1). Overall O(log n).

- vstudio April 04, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

get difference between co ordinates i.e. x and y and calculate distance by Pythagoras theorem.

- abhi March 09, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

O(n) is simple. they needed something better.

- HauntedGhost March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Downvoting this because question got updated. Sorry.

- Loler March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Python like psuedo code (don't remember exact syntax for *args)

def min(lst, func, *args):
    min = None
    for elem in lst
        felem = func(elem, args)
        if min == None or felem < min:
            min = felem
    
    return min

def distance(coord1, coord2):
    xdiff, ydiff = coord1[0] - coord2[0], coord1[1] - coord2[1]
    return math.sqrt(xdiff*xdiff + ydiff*ydiff)
  
def closest(lst, co-ords):
    return min(lst, distance, co-ords)

- Anonymous March 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

O(n) is simple. they needed something better.

- HauntedGhost March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@HauntedGhost: Given the current info, you cannot do better than linear.

Are you sure of the question?

Perhaps they allowed preprocessing (like creating a Voronoi diagram) after which nearest point queries can be sub-linear, but you have to pay the Theta(n) cost (in fact, probably Theta(n log n) for Voronoi preprocessing).

- Loler March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Preprocessing is allowed, you can sort them, build kd tree or whatever you want. But you need to minimize the query execution time. Since it is an interview question, I'm guessing interviewer wouldn't be looking for complex solution like Voronoi diagram.

- HauntedGhost March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@HauntedGhost. This is supposed to be a classic application of Voronoi diagrams :-) You are right though, it is too complex to be coding it up during an interview.

- Loler March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Downvoting this because the question got updated. Sorry. Nice answer for the original question, though.

- Loler March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Why cant we put all the city co ordinates in a 2*2 array. Then get the user location map it into the array and try to find nearest city.

- GT March 15, 2013 | Flag

CareerCup

Microsoft Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews