is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.
CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.
Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.
Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.
Target is to find closest 100 starts.
- analog76 February 11, 2012The best way to implement is K-NN algorithm.
Solution Step
1) Here assume K is 10 . Selection of K is find out using RMSE. this is another bigger topic and let's not deep dive into that.
2) Here are selecting 10 centroid not 1 centroid. Why 10 centroid?. Because some star cluster can happen more than one location.
3) Let's take an analogy. let's assume in US is a space (3 dimension) and people are stars. We have to find where the population density is more. NY is one cluster, LA is another cluster. Bay area is 3rd cluster and similary we have more clusters. Out of these we have to find the densely populated location. Of course the obvious answer is NY city. Of course we don't know the answer how do we calculate.
4) Read first 10 rows from the fille ( because we assume our K is 10) and make them as centroid.
5) Read rows from 1 to 100000 and for each row find closest centroid.
for rows 1 to10000
for cluster 1 to 10
closestCluster = min(closestCluster,(cluster[j]-row[i]))
end
end
6) Assume we randomly select centroid to find densely populated location. Assume we selected 10 location in kansas state (because they are in the middle of US) and we calculate distance between each person to all the 10 location. Find the min distance and assign that person to that centroid.
7) Once we assign all the points/person to the centroid , recalculate the centroid.
8) Each centroid has some points/person assigned to them. Recalculate the centroid for x,y,z using ((x1+x2+x3+...+xn)/n
This is the new centroid. After this step the centroids are moved to different direction and they are scattered.
9) Continue the step 5,6,7,8 till the centroid is converged.
10) When I say converge centroid doesn't change much distance from previous runtime to current time and change is really insignificant.
11) While finding centroid for each point/person , maintaing a minheap structure for each cluster and this maintains informations about 100 closest point from the cluster.
12) For each cluster find distance between centroid to each point and sum them up.
int[] distances;
for each cluster c
c=0;
for each points 1 to 100
c=c+ distance between point and cluster[i]
end
distance[i]=c;
end
13) After this step find the mindistance from the array and all the points belongs to that centroid/cluster are closest points.
Note : In some places I used generously data structure. I was more concentrating on finding the solution and implementation rather than best implementation. If you explain the stuff above it takes more than 60 minutes and he would be more happy with solution.