Amazon Interview Question
Software Engineer / Developer, Country: United States
Interview Type: Phone Interview
The target is to find the closest 100 stars.
One way to approach this is with the k-means clustering algorithm (the steps below describe k-means, not K-NN).
Solution Steps
1) Assume K is 10 here. The selection of K can be tuned using RMSE; that is a bigger topic, so let's not dive deep into it.
2) We are selecting 10 centroids, not 1. Why 10? Because star clusters can occur at more than one location.
3) Let's take an analogy: assume the US is a space (3-dimensional) and people are stars. We have to find where the population density is highest. NY is one cluster, LA is another, the Bay Area is a third, and similarly there are more clusters. Out of these we have to find the most densely populated location. The obvious answer is NY city, but since we don't know the answer in advance, how do we calculate it?
4) Read the first 10 rows from the file (because we assume our K is 10) and make them the initial centroids.
5) Read rows 1 to 100000 and for each row find the closest centroid.
for each row i in 1 to 100000
    for each cluster j in 1 to 10
        closestCluster = min(closestCluster, distance(centroid[j], row[i]))
    end
end
6) Assume we randomly select centroids to find the densely populated location. Say we selected 10 locations in Kansas (because it is in the middle of the US); we calculate the distance from each person to all 10 locations, find the minimum distance, and assign that person to that centroid.
7) Once we have assigned all the points/people to centroids, recalculate the centroids.
8) Each centroid has some points/people assigned to it. Recalculate each centroid coordinate-wise for x, y, z using (x1 + x2 + x3 + ... + xn) / n.
This is the new centroid. After this step the centroids move in different directions and spread out.
9) Repeat steps 5-8 until the centroids converge.
10) By "converge" I mean that each centroid moves only an insignificant distance from the previous iteration to the current one.
11) While finding the closest centroid for each point/person, maintain a min-heap structure per cluster that tracks the 100 closest points to that cluster.
12) For each cluster, find the distance from the centroid to each of its points and sum them up.
int[] distances;
for each cluster i
    sum = 0
    for each point p in the 100 points of cluster i
        sum = sum + distance(p, centroid[i])
    end
    distances[i] = sum
end
13) After this step, find the minimum distance in the array; all the points belonging to that centroid/cluster are the closest points.
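Steps 12-13 can be sketched as follows (a minimal sketch; the `densest_cluster` name and the shape of `closest_points`, the per-cluster lists kept by the heaps of step 11, are assumptions, not from the original post):

```python
import math

def densest_cluster(centroids, closest_points):
    """For each cluster, sum the distances from its centroid to its
    retained closest points (step 12), then return the index of the
    cluster with the smallest total (step 13)."""
    totals = []
    for centroid, pts in zip(centroids, closest_points):
        totals.append(sum(math.dist(centroid, p) for p in pts))
    return min(range(len(totals)), key=totals.__getitem__)
```

The points assigned to the returned cluster are then reported as the 100 closest stars.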
Note: In some places I used data structures generously. I was concentrating more on finding a solution and an implementation than on the best implementation. If you explain the above, it takes more than 60 minutes, and the interviewer would be happy with the solution.
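For reference, the k-means loop of steps 4-10 above can be sketched in Python (a minimal sketch; the `kmeans` name and its parameters are illustrative, not from the original post):

```python
import math

def kmeans(points, k=10, max_iters=100, tol=1e-6):
    """Plain k-means on 3-D points, following steps 4-10: centroids are
    seeded from the first k rows, each point is assigned to its nearest
    centroid, centroids are recomputed as the mean of their members, and
    we stop once the centroids barely move."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iters):
        # Steps 5-6: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Steps 7-8: recompute each centroid coordinate-wise as a mean.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                n = len(members)
                new_centroids.append([sum(dim) / n for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave empty clusters alone
        # Steps 9-10: converged when no centroid moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, clusters
```

Note that the result depends on the initial seeding, which is why the post's choice of the first K rows is only one possible heuristic.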
Find the centroid of the X, Y, Z points. Then find the closest 100 points to the centroid using a max-heap of size 100.
@Anonymous: What would be the formula for centroid calculation here in 3 dimensional space?
@anon: the selection algorithm finds the kth max/min in O(n), but we have to find all k such stars, so it won't remain linear. I wonder if we can do it in O(N)? Let me know if anyone has an idea.
Guys, use a max-heap here; basically that interviewer was fixated on a particular answer. You can have a look at more applications of max-heaps at the link below.
Here is the algorithm for the above:
create a max-heap of k = 100 points, keyed on distance: time O(k)
let the total number of stars be n; for each of the remaining n - k stars, compare it with the root of the max-heap; if it is closer than the root, make it the root and call heapify
time required: O(n - k) * O(log k)
Finally, the max-heap holds the k = 100 closest elements, and its root is the kth closest element.
So the total complexity is O(n - k) * O(log k) + O(k).
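That max-heap procedure can be sketched in Python, where the standard `heapq` module (a min-heap) is turned into a max-heap by negating distances. The `origin` reference point is an assumption, since the question doesn't say which point the stars should be closest to:

```python
import heapq
import math

def closest_k_stars(stars, origin=(0.0, 0.0, 0.0), k=100):
    """Keep a size-k max-heap keyed on distance from `origin` (heapq is a
    min-heap, so distances are stored negated). A star closer than the
    current farthest kept star replaces the root. O(k + (n - k) log k)."""
    heap = []  # entries: (-distance, star)
    for s in stars:
        d = math.dist(s, origin)
        if len(heap) < k:
            heapq.heappush(heap, (-d, s))
        elif d < -heap[0][0]:  # closer than the farthest star we kept
            heapq.heapreplace(heap, (-d, s))
    return [s for _, s in sorted(heap, reverse=True)]  # nearest first
```

Because only k entries are ever held in memory, this also works when the n stars are streamed from a file too large to load at once.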
Correct me if anything is wrong.
Thanks
Shashank
visit us shashank7s dot blogspot dot com
www dot facebook dot com/LestCode
Wrong. The task is to find the nearest points, not the smallest values.
Simply copying the solution for "find the minimum K numbers out of N numbers" doesn't work here.
Were you asked to code this? If so, I couldn't help, as I don't know how to calculate the distance between two 3-dimensional points. If not, you would do it like this:
Run through each coordinate, calculating the distance and storing it in a priority queue where a smaller distance has a higher priority. Then pop the first 100 values and do whatever you like with them (print to screen, store in an array, etc.).
Where n is the number of star coordinates, the complexity is O(n+100).
I messed up the complexity. It would not be O(n) because generating a priority queue involves traversing a heap. I assume it would be O(nlogn), but it's early, so forgive me if I'm wrong.
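The priority-queue idea can be sketched as follows (assuming distances are measured from an `origin` such as Earth, which the post leaves implicit; for two 3-D points the distance is sqrt((x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2), which `math.dist` computes). Building the heap with `heapify` is O(n) and the 100 pops cost O(100 log n), so it is cheaper than a full sort:

```python
import heapq
import math

def closest_100_via_heap(stars, origin=(0.0, 0.0, 0.0), k=100):
    """Push every star with its distance from `origin` as the priority,
    then pop the k smallest distances off the heap, nearest first."""
    heap = [(math.dist(s, origin), s) for s in stars]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]
```

This holds all n distances in memory at once, unlike the bounded-size max-heap approach suggested elsewhere in the thread.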
I thought of this. Please let me know what you think.
1- Sort the points based on their distance.
2- Set the minimal distance to the distance between the 1st and the 100th points. Then iterate through the points, calculating the distance between the 2nd and 101st, the 3rd and 102nd, ..., and update the minimal distance accordingly.
I assumed the set of 100 closest points is defined as the set whose two farthest points are closer together than those of any other possible set of 100 points.
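Under that interpretation, the sliding-window idea might look like this (a hedged sketch: the reference point for the initial sort is an assumption, and consecutive distances along one radial ordering do not bound the true 3-D spread of a window, so this is a heuristic rather than an exact solution):

```python
import math

def tightest_window(points, k=100, origin=(0.0, 0.0, 0.0)):
    """Sort points by distance from `origin`, then slide a window of k
    consecutive points and keep the window whose first and last points
    are closest together, as described in the steps above."""
    ranked = sorted(points, key=lambda p: math.dist(p, origin))
    best_start, best_spread = 0, float("inf")
    for i in range(len(ranked) - k + 1):
        spread = math.dist(ranked[i], ranked[i + k - 1])
        if spread < best_spread:
            best_start, best_spread = i, spread
    return ranked[best_start:best_start + k]
```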
This is my solution
Put the first 100 star coordinates in an array and calculate the bounding sphere (Wikipedia: Smallest_enclosing_sphere) of the points. While doing so, also mark the nodes in the array that lie on the surface of the bounding sphere.
From then on, keep picking the rest of the points from the file. If a point falls outside (the distance from the center of the current bounding sphere to this coordinate is greater than the sphere's radius), ignore it. Otherwise, find the marked (surface) point that is farthest from this point, replace it, recalculate the bounding sphere, and proceed. The time complexity should be O(k*n), k = 100.
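A rough sketch of this idea follows, with an important caveat: it approximates the smallest enclosing sphere by the centroid of the kept points plus the distance to the farthest of them, rather than computing the exact sphere (which, e.g. via Welzl's algorithm, is considerably more involved). The function name and eviction rule are illustrative assumptions:

```python
import math

def closest_100_bounding(points_iter, k=100):
    """Keep k points; for each new point, approximate the bounding sphere
    of the kept set (center = centroid, radius = distance to the farthest
    kept point). A new point strictly inside the sphere evicts the
    current farthest point; points outside are ignored."""
    kept = []
    for p in points_iter:
        if len(kept) < k:
            kept.append(p)
            continue
        center = [sum(dim) / len(kept) for dim in zip(*kept)]
        far_i = max(range(len(kept)), key=lambda i: math.dist(kept[i], center))
        radius = math.dist(kept[far_i], center)
        if math.dist(p, center) < radius:
            kept[far_i] = p  # p tightens the cluster: swap out the farthest point
    return kept
```

Like the original proposal, this reads the file in a single pass with O(k) memory.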
The problem is similar to the kth-smallest-of-N-items problem, so it can be solved using a selection algorithm, which runs in O(N) time with O(1) extra space.
Here is the pseudocode of the selection algorithm, taken from Wikipedia:
function select(list, left, right, k)
    if left = right // If the list contains only one element
        return list[left] // Return that element
    select pivotIndex between left and right
    pivotNewIndex := partition(list, left, right, pivotIndex)
    pivotDist := pivotNewIndex - left + 1
    // The pivot is in its final sorted position,
    // so pivotDist reflects its 1-based position if list were sorted
    if pivotDist = k
        return list[pivotNewIndex]
    else if k < pivotDist
        return select(list, left, pivotNewIndex - 1, k)
    else
        return select(list, pivotNewIndex + 1, right, k - pivotDist)
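Applied to this problem, the selection routine above might look as follows in Python (a sketch assuming distances are measured from a fixed `origin`; note that the 100 points come back unordered, and that the partitioning below is an iterative Hoare-style variant of the recursive pseudocode):

```python
import math
import random

def select_k_closest(stars, origin=(0.0, 0.0, 0.0), k=100):
    """Quickselect on distances: partition in place until the k smallest
    distances occupy the first k slots. Expected O(n) time, O(1) extra
    space beyond the distance array."""
    dists = [(math.dist(s, origin), s) for s in stars]
    left, right = 0, len(dists) - 1
    while left < right:
        pivot = dists[random.randint(left, right)][0]
        i, j = left, right
        while i <= j:
            while dists[i][0] < pivot:
                i += 1
            while dists[j][0] > pivot:
                j -= 1
            if i <= j:
                dists[i], dists[j] = dists[j], dists[i]
                i += 1
                j -= 1
        if k - 1 <= j:       # target index lies in the left partition
            right = j
        elif k - 1 >= i:     # target index lies in the right partition
            left = i
        else:                # target index sits between j and i: done
            break
    return [s for _, s in dists[:k]]
```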
Sorry mate, this does not work at all in this case.
We are not measuring closeness to a specific point, but closeness to each other. (If we measure against a specific point, your algorithm fails anyway; a good approach in that case is to order the points by Manhattan distance, take the top candidates, and then apply the real distance.)
So this is pretty much a clustering problem, and a selection algorithm does not help.
- analog76 February 11, 2012