Given a seria of points (Xi, Y

Google Interview Question for Software Engineer / Developers

1

of 1 vote

27
Answers
Given a seria of points (Xi, Yi), find the line containing the highest number of points from the list.
Per my question he mentioned that I can assume that there is a given function that receives two points and returns the a and b of the line euqation (aX+b)
- Grr1967 November 29, 2012 | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer Algorithm Data Structures

Email me when people comment.

An error occurred in subscribing you.

Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 3 vote

My suggestion was to create a map with key pair<a, b> and value - number of points contained in line aX+b.
Then traverse each pair of points (Xi, Yi) (Xj Yj), send to the given function to receive a, b of the line connecting those two points
And update the map by adding 1 to the key <a,b> (creating it with value of 1 if not yet in the map).
In addition keeping aside the <a, b> for which the number of points is the highest.
Order of this solution is O(N^2)
I was wondering if there is a solution with better performance.

- Grr1967 November 29, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

i don't think it can't be solved better than O(n^2) as there are n(n-1) distinct pair of points

- dgbfs November 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Why add 1? Since we are using a pair, for every match of <a,b> in the map 2 should be added. Also sometimes 2 pairs will have a point common and in this case addition of 2 is also wrong.

Not really sure what to do

- Arvind December 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Good point Arvind. As long as we need to output the line with maximum points, the algo should still be okay with +1 increments. However, if we need to know the exact number of points that fall on a single line, we can do the following:

for each line, we nominate a parent point based on which point on this line was encountered first. Then in the two loops of i and j for two points(i being outer loop), we increment line count by 2 for the first occurrence in map for two points, and later on increment line count by 1 on each subsequent occurrence of the same line with same parent. For next occurrences of the line, if parent is different, we do not increment the count. As an example, for 3 points (1,1), (2,2), (3,3) all lying on the same line, we increment the count by 2 for (1,1),(2,2) and by 1 for (1,1),(3,3) both lines with same parent (1,1), but do not increment the line count for (2,2),(3,3) because these two points have already been accounted for in an earlier loop with parent (1,1). So we have a count of 3 in the map which is the correct answer. This means at each map check, we also need to check the parent of the line, which adds to the complexity of the algorithm, but is needed to get the exact number of points lying on a line

- famee December 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

You need to account for the fact that the list of tuples (x_i, y_i) may not contain unique points. If you don't do this, your counts will be off. For example suppose my list of points is {A, B, C, A, D}. Then the pairings without removing the extra A would be

A, B B, C C, A A, D
A, C B, A C, D
A, A B, D
A, D

Here A, D occurs twice, A, A doesn't make sense, and I should not count A, B separate from B, A nor A, C separate from C, A.

Therefore, you should first trim this list to contain only unique points. If after trimming there are K unique points, then there are K(K-1)/2 unique pairings of points, not K(K-1). The reason for this is that we should not consider the pairing A, B different from B, A.

- Killian July 19, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Another problem is that you cannot represent a vertical line in slope-intercept form since the slope is necessarily infinity. A better representation consists of a direction vector and a point on the line. For example if A and B are tuples in R^2, then the line through A and B is traced out by the vector function (x,y) = (B-A)t + A, where t is a real-valued parameter. Unfortunately, if you use this representation it becomes more difficult to determine if two lines are equivalent. You need to check that their direction vectors are scalar multiples and if either of the points lie on the line you are comparing with. Normalization of the direction vector to a unit vector removes the first issue.

- Killian July 19, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 3 vote

Your algorithm is O(N^2*lgN) { insertion in map is lgN }

Another approach:

ans = 0
For each point A:
       store all possible slopes with the remaining points in an array slopes[]
       sort the array slopes[] and now you can easily find the slope with max. occurence in this array i.e ans = MAX( ans, current_max occurence of slope )

ans gives you the final result

This method has same time complexity O(N^2lgN) but will be relatively faster due to much smaller constact factor. You may also try to hash the slopes and then find the max but hashing floating point values will be extra overhead.

- Cerberuz November 29, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

@Charles is right. Slopes are not enough because parallel lines have same slope but do not contain same points.

- heuristican November 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Two lines with same slope which pass through same point are coincident :)

- Cerberuz November 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Insert into an unordered map is O(1). In this case I think unordered map is enough.

- Leo December 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Java hashmap never guarantees that insertion and searching will be O(1) always.

- Cerberuz December 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Let's be honest - unless your hash function is terrible, your hashmap complexities will nearly always be constant.

Though this design might not work completely due to floating point calculation limitations (slopes might not appear to be the same though they really are)

- dchang.me December 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How about Simple linear regression???

- Bevan December 09, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

We could use Moore’s Voting Algorithm(see "find majority element" task) to find maximum slopes number in array of slopes for each iteration(instead of sorted array keeping). Such way we could achieve O(n^2) complexity.

- Vern December 11, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Cerberuz, at any point of time, array slopes[] contains slope for all the pair of points or only for pair of points with a point common?

I think it's later (in that case indentation is screwing up the understanding)

- Anonymous December 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Yes later is correct. In every iteration array slopes[] will only have slope of lines formed by ith point and rest of the points.

- Cerberuz December 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

If (pt1, pt2) and (pt3, pt4) have same slope though they are parallel lines, the algo will consider them passing through single line.
Without storing y-intercept, it is not possible to decide if different points with same slopes pass through single line or not.
Moreover, storing duplicate slopes in array will waste space. HashMap count mechanism will take care of that.
I think Grr1967's own solution is the best possible solution. It is safe to assume that each hashtable insert operation has amortized O(1) cost, so that the overall complexity of the algorithm is O(n^2).

- famee December 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

first sort all points according to there polar angle i have given code for sorting point according to polar angle 

typedef struct point{
	int x;
	int y;
}point;

int cross_product(point a, point b){
	if((a.x*b.y-a.y*b.x)>0)
	return 1;
	return -1;
}

int partition(arr,p,r){
	point pivot = arr[r-1];
	int i=p,flag,j=-1;
	while(i<r-1){
		flag = cross_product(pivot,arr[i]);
		if(flag == 1){
			if(j==-1){
				j = p+1;
			}
			while(j<r-1){
				flag = cross_product(pivot,a[j])
				if(flag!=1){
					swap(arr[i],arr[j]);
					j=j+1;
					break	
				}
				j++;
			}
			if(j == r-1){
				swap(arr[i+1],arr[r])
				return(i+1);
				break
			}
			
		}
		i++;
	}
	swap(arr[i+1],arr[r])
	return(i+1);
	
}

void polar_sort(point arr[],int start,int end){
	
	int stack[end],top =-1,p,r,k;
	stack[++top] = start;
	stack[++top] = end;
	while(isEmpty(stack)){
	
		r = stack[top--];
		p = stack[top--];
		if(p<r){
			k = partition(arr,p,r);
			if(k+1<r){
				stack[++top] = k+1;
				stack[++top] = r;
			}
			if(p<k){
				stack[++top] = p;
				stack[++top] = k;
			}
		}
	
	}
	
}


in this way u sort the point according to polar angle now just we have to trverse this sorted point array and for comparing
,is they lie on same line or not do cross product and if cross product is 0 then yup they are in same line and as array is sorted 
according to polar angle, all points are in same line coming together just find max group whose cross product is 0 that u can do esily
in one traversal and for finding the line ,found two farthest point in the group and make a line with those two points

correct me if i m wrong complexity(nlogn)

- raunak.gupta29 November 30, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

It will not work. A point lying in 1st and 3rd quadrant will have different polar angle so they will not be together when you sort the points acc. to polar angle but they may belong to same line.

- Cerberuz November 30, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Yup correct it will not work for all cases if the line(which contains max points) will go from origin in that case it will work but if i/p is like {(1,2),(1,4),(1,5),(1,7)..(3,2)}or some other in that case it wll not work .

- raunak.gupta29 November 30, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

First, choose pairs of points and find slope and intercept for each pair using the given function and add them to a map keyed by say "slope:intercept" and value as value+1 (count). The key with the highest count is the answer.

- Metta December 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

initialize linkedhashmap<point,<slope,count>> point_slope_counts
initialize linkedhashmap<slope,count> : point_slopes
1. for each point p
   1.i   clear 
   1.ii. for each other point q
       1.ii.a.   slope = (yq - yp)/(xq - xp) // possible error divide by 0, for lines ||lel to y axis
       1.ii.b    count = point_slopes.get(slope) // handle null
       1.ii.c    count += 1
       1.ii.d    point_slopes.put(count)
   1.iii. <slope.count> = argmax-count (point_slopes)
   1.iv. point_slope_counts.put(p,<slope,count>) // line with max points passing thru p
2. final_point <point,<slope,count>> = argmax-count (point_slope_counts)

This would be O(n^2).

- Anonymous December 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

initialize linkedhashmap<point,<slope,count>> point_slope_counts
initialize linkedhashmap<slope,count> : point_slopes
1. for each point p
   1.i   clear point_slopes
   1.ii. for each other point q
       1.ii.a.   slope = (yq - yp)/(xq - xp) // possible error divide by 0, for lines ||lel to y axis
       1.ii.b    count = point_slopes.get(slope) // handle null
       1.ii.c    count += 1
       1.ii.d    point_slopes.put(count)
   1.iii. <slope.count> = argmax-count (point_slopes)
   1.iv. point_slope_counts.put(p,<slope,count>) // line with max points passing thru p
2. final_point <point,<slope,count>> = argmax-count (point_slope_counts)

This would be O(n^2).

- Anonymous December 01, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

it looks like the problem "find the most common number in a list of numbers"

- ranran December 03, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

A line will have maximum points if they have same slope.

Its a DP problem for finding the maximum slope pair across all the points, Like Knapsack problem.

- Praveen December 04, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

check my answer and if any mistakes are there suggest me..
#include<stdio.h>
char * eq(int x1,int y1,int x2,int y2);
main()
{
int x1,y1,x2,y2;
char *equ;
printf("enter 2 pts(x1,y1)&(x2,y2)\n");
scanf("%d%d%d%d",&x1,&y1,&x2,&y2);
equ=eq(x1,y1,x2,y2);
printf("%s\n",equ);
}
char * eq(int x1,int y1,int x2,int y2)
{
float sl;
sl=(x2-x1)/(y2-y1);
char equ[]="ax+y";
equ[0]=sl+48;
return(equ);
}

- yaswanth December 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Apply the RANSAC algorithm with threshold = 0. In the worst case the number of choices is choose(n,2), however randomizing the choice of 2 points reduces the expected time complexity. This is the best that is done in line fitting.

Linear regression finds the line of best fit, it is NOT robust to outliers. RANSAC is the way to go here.

- Anonymous February 03, 2014 | Flag Reply

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews