Google Interview Question
Software Engineer / Developersvector<int> data;
int i;
data is the list of records. we choose a sample of k records from data in a single pass.
vector<int> sample(data.begin(),data.begin()+k);
data.end() be a marker to the end of list data .
int random;
for(i=k;i!=data.end();i++)
{
random=rand()%(k+1234);
if(random<k)
sample[random]=data[i];
}
Say line count is a. For a=1, return the line read since prob is 1 and store it in variable b. For a=2 and thereafter, for every line read, find a random number between 0 and 1. If it lies between 0 and a-1/a, then return the new line and store it in variable b. If it lies between a-1/a and 1, then return the line previously stored in a.
The idea is that prob of line 1 is 1, 2 is (1 * 1/2), 3 is (1 * 1/2 * 2/3) and so on
Google really seems to be fond of this question. This has appeared many times on many sites, including this one.
- LOLer July 17, 2009bing "reservoir sampling" for the answer.