Google Interview Question
Software Engineer / DevelopersInterview Type: Phone Interview
i don't think it can't be solved better than O(n^2) as there are n(n-1) distinct pair of points
Why add 1? Since we are using a pair, for every match of <a,b> in the map 2 should be added. Also sometimes 2 pairs will have a point common and in this case addition of 2 is also wrong.
Not really sure what to do
Good point Arvind. As long as we need to output the line with maximum points, the algo should still be okay with +1 increments. However, if we need to know the exact number of points that fall on a single line, we can do the following:
for each line, we nominate a parent point based on which point on this line was encountered first. Then in the two loops of i and j for two points(i being outer loop), we increment line count by 2 for the first occurrence in map for two points, and later on increment line count by 1 on each subsequent occurrence of the same line with same parent. For next occurrences of the line, if parent is different, we do not increment the count. As an example, for 3 points (1,1), (2,2), (3,3) all lying on the same line, we increment the count by 2 for (1,1),(2,2) and by 1 for (1,1),(3,3) both lines with same parent (1,1), but do not increment the line count for (2,2),(3,3) because these two points have already been accounted for in an earlier loop with parent (1,1). So we have a count of 3 in the map which is the correct answer. This means at each map check, we also need to check the parent of the line, which adds to the complexity of the algorithm, but is needed to get the exact number of points lying on a line
You need to account for the fact that the list of tuples (x_i, y_i) may not contain unique points. If you don't do this, your counts will be off. For example suppose my list of points is {A, B, C, A, D}. Then the pairings without removing the extra A would be
A, B B, C C, A A, D
A, C B, A C, D
A, A B, D
A, D
Here A, D occurs twice, A, A doesn't make sense, and I should not count A, B separate from B, A nor A, C separate from C, A.
Therefore, you should first trim this list to contain only unique points. If after trimming there are K unique points, then there are K(K-1)/2 unique pairings of points, not K(K-1). The reason for this is that we should not consider the pairing A, B different from B, A.
Another problem is that you cannot represent a vertical line in slope-intercept form since the slope is necessarily infinity. A better representation consists of a direction vector and a point on the line. For example if A and B are tuples in R^2, then the line through A and B is traced out by the vector function (x,y) = (B-A)t + A, where t is a real-valued parameter. Unfortunately, if you use this representation it becomes more difficult to determine if two lines are equivalent. You need to check that their direction vectors are scalar multiples and if either of the points lie on the line you are comparing with. Normalization of the direction vector to a unit vector removes the first issue.
Your algorithm is O(N^2*lgN) { insertion in map is lgN }
Another approach:
ans = 0
For each point A:
store all possible slopes with the remaining points in an array slopes[]
sort the array slopes[] and now you can easily find the slope with max. occurence in this array i.e ans = MAX( ans, current_max occurence of slope )
ans gives you the final result
This method has same time complexity O(N^2lgN) but will be relatively faster due to much smaller constact factor. You may also try to hash the slopes and then find the max but hashing floating point values will be extra overhead.
@Charles is right. Slopes are not enough because parallel lines have same slope but do not contain same points.
Let's be honest - unless your hash function is terrible, your hashmap complexities will nearly always be constant.
Though this design might not work completely due to floating point calculation limitations (slopes might not appear to be the same though they really are)
We could use Moore’s Voting Algorithm(see "find majority element" task) to find maximum slopes number in array of slopes for each iteration(instead of sorted array keeping). Such way we could achieve O(n^2) complexity.
@Cerberuz, at any point of time, array slopes[] contains slope for all the pair of points or only for pair of points with a point common?
I think it's later (in that case indentation is screwing up the understanding)
Yes later is correct. In every iteration array slopes[] will only have slope of lines formed by ith point and rest of the points.
If (pt1, pt2) and (pt3, pt4) have same slope though they are parallel lines, the algo will consider them passing through single line.
Without storing y-intercept, it is not possible to decide if different points with same slopes pass through single line or not.
Moreover, storing duplicate slopes in array will waste space. HashMap count mechanism will take care of that.
I think Grr1967's own solution is the best possible solution. It is safe to assume that each hashtable insert operation has amortized O(1) cost, so that the overall complexity of the algorithm is O(n^2).
first sort all points according to there polar angle i have given code for sorting point according to polar angle
typedef struct point{
int x;
int y;
}point;
int cross_product(point a, point b){
if((a.x*b.y-a.y*b.x)>0)
return 1;
return -1;
}
int partition(arr,p,r){
point pivot = arr[r-1];
int i=p,flag,j=-1;
while(i<r-1){
flag = cross_product(pivot,arr[i]);
if(flag == 1){
if(j==-1){
j = p+1;
}
while(j<r-1){
flag = cross_product(pivot,a[j])
if(flag!=1){
swap(arr[i],arr[j]);
j=j+1;
break
}
j++;
}
if(j == r-1){
swap(arr[i+1],arr[r])
return(i+1);
break
}
}
i++;
}
swap(arr[i+1],arr[r])
return(i+1);
}
void polar_sort(point arr[],int start,int end){
int stack[end],top =-1,p,r,k;
stack[++top] = start;
stack[++top] = end;
while(isEmpty(stack)){
r = stack[top--];
p = stack[top--];
if(p<r){
k = partition(arr,p,r);
if(k+1<r){
stack[++top] = k+1;
stack[++top] = r;
}
if(p<k){
stack[++top] = p;
stack[++top] = k;
}
}
}
}
in this way u sort the point according to polar angle now just we have to trverse this sorted point array and for comparing
,is they lie on same line or not do cross product and if cross product is 0 then yup they are in same line and as array is sorted
according to polar angle, all points are in same line coming together just find max group whose cross product is 0 that u can do esily
in one traversal and for finding the line ,found two farthest point in the group and make a line with those two points
correct me if i m wrong complexity(nlogn)
It will not work. A point lying in 1st and 3rd quadrant will have different polar angle so they will not be together when you sort the points acc. to polar angle but they may belong to same line.
initialize linkedhashmap<point,<slope,count>> point_slope_counts
initialize linkedhashmap<slope,count> : point_slopes
1. for each point p
1.i clear
1.ii. for each other point q
1.ii.a. slope = (yq - yp)/(xq - xp) // possible error divide by 0, for lines ||lel to y axis
1.ii.b count = point_slopes.get(slope) // handle null
1.ii.c count += 1
1.ii.d point_slopes.put(count)
1.iii. <slope.count> = argmax-count (point_slopes)
1.iv. point_slope_counts.put(p,<slope,count>) // line with max points passing thru p
2. final_point <point,<slope,count>> = argmax-count (point_slope_counts)
This would be O(n^2).
initialize linkedhashmap<point,<slope,count>> point_slope_counts
initialize linkedhashmap<slope,count> : point_slopes
1. for each point p
1.i clear point_slopes
1.ii. for each other point q
1.ii.a. slope = (yq - yp)/(xq - xp) // possible error divide by 0, for lines ||lel to y axis
1.ii.b count = point_slopes.get(slope) // handle null
1.ii.c count += 1
1.ii.d point_slopes.put(count)
1.iii. <slope.count> = argmax-count (point_slopes)
1.iv. point_slope_counts.put(p,<slope,count>) // line with max points passing thru p
2. final_point <point,<slope,count>> = argmax-count (point_slope_counts)
This would be O(n^2).
check my answer and if any mistakes are there suggest me..
#include<stdio.h>
char * eq(int x1,int y1,int x2,int y2);
main()
{
int x1,y1,x2,y2;
char *equ;
printf("enter 2 pts(x1,y1)&(x2,y2)\n");
scanf("%d%d%d%d",&x1,&y1,&x2,&y2);
equ=eq(x1,y1,x2,y2);
printf("%s\n",equ);
}
char * eq(int x1,int y1,int x2,int y2)
{
float sl;
sl=(x2-x1)/(y2-y1);
char equ[]="ax+y";
equ[0]=sl+48;
return(equ);
}
Apply the RANSAC algorithm with threshold = 0. In the worst case the number of choices is choose(n,2), however randomizing the choice of 2 points reduces the expected time complexity. This is the best that is done in line fitting.
Linear regression finds the line of best fit, it is NOT robust to outliers. RANSAC is the way to go here.
My suggestion was to create a map with key pair<a, b> and value - number of points contained in line aX+b.
- Grr1967 November 29, 2012Then traverse each pair of points (Xi, Yi) (Xj Yj), send to the given function to receive a, b of the line connecting those two points
And update the map by adding 1 to the key <a,b> (creating it with value of 1 if not yet in the map).
In addition keeping aside the <a, b> for which the number of points is the highest.
Order of this solution is O(N^2)
I was wondering if there is a solution with better performance.