## Google Interview Question

Software Engineer / Developers**Country:**United States

**Interview Type:**Phone Interview

The question sounds like, always pick a country with highest population at random, this is impossible.

Question should be, probability of picking the higher populated country should be high.

No problem. All the best for your results :)

Can you share more questions of your interview?

@anonymous: I think you're misinterpreting the question. It's not an online algorithm. You get to look at all populations before you have to decide. The probability of picking a country i should be Pop(i)/(Sum of Pop(k) over all k)

The problem is also not about picking the country with the highest population, but picking a country at random such that the more people a country has, the more likely it is to be picked (proportionally, we are to assume).

I dont believe that drawing a number line will help here. Number line solution would be somewhat similar to sorting the list.

I think the solution will be in terms of probability. Also, for finding the highest populated country, we need to know the limits of list.

Not sure how it can be done by randomly selecting. If we know the Min and Max then we can pick some values randomly and decide.

What data structure will you use for storing lines and for finding a country by a given random number?

There is no ds called lines so I think we need to pick some DS such as Hash Table or Dictionary. and the scan all of the elements and fill it into Hash Table.

Could you elaborate on how you would use HashTable provided we have N countries with total population of B?

Hi Andy,

Number line was given just to think in terms of that. Implementation wise its very simple. Corresponding to each country you just need to store the min number and max number. If the random number falls in that range, return that country. If you are still not convinced I'll code it for you by tonight.

I was thinking to Scan the countries and start storing as Key as Value of Population and Country name as Value of Hash Key

We could use a property that if one country has 99% of total population, we don't need to go over the list of all countries.

It is correct, but we can search faster using the binary search. if we have population such as n1, n2, ... we can make the sequence a1=n1, a2=n1+n2, a3=n1+n2+n3, .... an then store all ai values in a binary search tree. then we select a random number between 0 to N=n1+n2+.. = an and search that number in the tree. the associated range can be found by looking at the final node and its parent which can give us the associated country.

Edit: the probability was calculated wrong at the first time. i have fixed it.

1. randomly select one country c1

2. put it back to the country set,

3. randomly pick another country c2

4. return the one with more population.

suppose there are n countries.

the probability that the country with the smallest population being picked is 1/n * 1/ n

the probability that the second smallest country being picked is 3/n^2

the probability that the kth smallest country being picked is 2 * 1/n * (k - 1)/n + 1/n * 1/n = (2k - 1) / n^2

I think my solution is also correct. can anybody see if there is any problem with the solution?

That depends on your interpretation of the problem statement. I would think the probabilities need to be proportional.

I doubt it is correct. Randomly selecting could also mean that we are picking the same value again and again. Also if you will pick from 2 countries then we might be picking max of 2 smallest.

Hopefully this shows the shortcomings in your solution. (Now this is aside from the fact as per your algorithm the country with the smallest population will never get picked)

You propose that you will pick 2 countries and return the one with max population.

Now just take this further and lets pick three countries and return the max of these.

Lets keep increasing it from 2 to 3 to 4...to almost infinity. Now every pick will always pick the country with the highest population.

So I originally dismissed this solution, but it could actually be correct and quite efficient, depending on your interpretation of the question. The k-th smallest country has (2*(k-1))/(n*(n-1)) probability of being picked. So when you calculate the probabilities correctly (they are calculated incorrectly in the answer), they do add to 1. And it's clear that countries with larger populations have better chances of being picked.

My interpretation of the question was different, and this wouldn't be correct under my interpretation. I viewed the question as saying that the probability of a country being picked should be *proportional* to its size. Of course this is to be clarified with the interviewer.

Knuth gives an algorithm attributed to A. J. Walker. It requires at most two lookups per random choice.

The basic idea is that if you have different colored cubes, n_j of color C_j, j = 1, 2, ..., k, and you have k boxes each holding n cubes and n_1 + n_2 + ... + n_k = k*n, then it is possible to distribute the cubes such that box B_i contains at most two colors one of which is C_i. (Simple induction proof.)

Now generate a random floating-point number x in the range [1, k+1). Let i be the integer part and y the excess. If n*y is less than the number of cubes of color C_i in box B_i output C_i otherwise output the other color in box B_i.

To better understand the code I suggest that you change the countries to the possible outcomes of throwing a pair of dice, 2, 3, ..., 12 and change the population counts to the number of ways you can get the different outcomes. So, 1 for 2, 2 for 3 etc.

```
#define _HAS_CPP0X 1
#include <algorithm>
#include <deque>
#include <iostream>
#include <map>
#include <numeric>
#include <ostream>
#include <random>
#include <string>
#include <utility>
#include <vector>
std::vector<std::pair<double, int>>
preprocess(std::vector<std::pair<std::string, double>>& container)
{
double sum = 0;
std::for_each(container.cbegin(), container.cend(),
[&sum](std::pair<std::string, double> p){ sum += p.second; });
const unsigned int n = container.size();
std::deque<std::pair<int, double>> deq(n);
{
int index = 0;
for (auto iter = container.cbegin(); iter != container.cend(); ++iter)
{
deq[index] = std::make_pair(index, n * iter->second / sum);
++index;
}
}
std::sort(deq.begin(), deq.end(),
[](const std::pair<int, double> lhs, const std::pair<int, double> rhs)
{
return lhs.second < rhs.second;
});
std::vector<std::pair<double, int>> result(n);
for (unsigned int i = 0; i != n - 1; ++i)
{
auto p = deq.front();
result[p.first].first = p.second;
auto iter = std::find_if(deq.begin(), deq.end(),
[](std::pair<int, double> p){ return p.second >= 1.0; });
result[p.first].second = iter->first;
iter->second -= 1.0 - p.second;
deq.pop_front();
}
// Handle last entry
auto p = deq.front();
result[p.first] = std::make_pair(1.0, p.first);
return result;
}
int main()
{
std::vector<std::pair<std::string, double>> countries;
countries.push_back(std::make_pair(std::string("China"), 1347350000));
countries.push_back(std::make_pair(std::string("India"), 1210193422));
countries.push_back(std::make_pair(std::string("United States"), 314303000));
countries.push_back(std::make_pair(std::string("Indonesia"), 237641326));
countries.push_back(std::make_pair(std::string("Brazil"), 193946886));
countries.push_back(std::make_pair(std::string("Pakistan"), 180582000));
countries.push_back(std::make_pair(std::string("Nigeria"), 166629000));
countries.push_back(std::make_pair(std::string("Bangladesh"), 152518015));
countries.push_back(std::make_pair(std::string("Russia"), 143142000));
countries.push_back(std::make_pair(std::string("Japan"), 127570000));
countries.push_back(std::make_pair(std::string("Mexico"), 112336538));
const unsigned int n = countries.size();
std::vector<std::pair<double, int>> table = preprocess(countries);
std::default_random_engine e;
std::vector<int> counts(n);
const int m = 4186; // Expected counts = population in millions
for (int i = 0; i != m; ++i)
{
// r is a random variable in the range [0, n)
double r = static_cast<double>(e()) / (1.0 + e.max());
r *= n;
int index = static_cast<int>(r);
double excess = r - index;
if (excess <= table[index].first)
{
std::cout << countries[index].first << std::endl;
++counts[index];
}
else
{
index = table[index].second;
std::cout << countries[index].first << std::endl;
++counts[index];
}
}
return 0;
}
```

Can you explain in detail what you believe the problem statement to be? It's ambiguous, and it seems our understanding of it differs. Given my current understanding of the problem statement, this makes no sense to me.

My code will return "China" with probability 1247350000/4186212187 = 0.32, "India" with probability 0.29, "United States" with probability 0.075, ..., Mexico with probability 0.027. So, it returns a random country but the higher the population is, the more likely the country is to be picked.

Pseudocode:

Given Country Cn with Population Pn.

int sum=sum of all P

for(int i=0,i<n;i++){

if(Pi/sum>Random(0,1)){

return Ci;

}else{

sum=sum-Pi;

}

}

import java.util.*;

class Country {

String name;

int pop;

public Country(String name, int pop) {

this.name = name;

this.pop = pop;

}

}

public class task3 {

public static int binarySearch(int[] a, int target) {

int mid, lo = 0, hi = a.length - 1;

while (lo < hi) {

mid = (hi - lo) / 2 + lo;

if (a[mid] < target)

lo = mid;

else if (a[mid - 1] > target)

hi = mid - 1;

else

// if (a[mid] >= target && a[])

return mid;

}

return -1;

}

public static String randomCountry(ArrayList<Country> arr) {

Random rand = new Random();

int[] a = new int[arr.size()];

a[0] = arr.get(0).pop;

for (int i = 1; i < a.length; i++)

a[i] = a[i - 1] + arr.get(i).pop;

int rand_num = rand.nextInt(a[a.length - 1]);

int country_pop = binarySearch(a, rand_num);

return arr.get(country_pop).name;

}

public static void main(String[] args) {

ArrayList<Country> arr = new ArrayList<Country>();

arr.add(new Country("x", 5));

arr.add(new Country("y", 10));

arr.add(new Country("z", 13));

arr.add(new Country("w", 2));

System.out.println(randomCountry(arr));

}

}

What you are doing man? Just Binary Search? I think if interviewer were agree with solving this with Binary search, question would loose the value.

How is this approach any different than the number line method? Array a is the number line. With the data in the main method, the line is 5, 15, 28, 30. Then you draw a random number from 0 to 30 say 29 and search it in the number line and return left index of the range. (although there are issues in the binary search algo here. infinite loop for 28), but the idea is right.

```
const char * random_one_country(char* countries[],
const int population[], const int N)
{
int* count = new int[N];
count[0] = population[0];
for(int i = 1; i < N; i++)
count[i] = count[i-1] + population[i];
const int r = 1 + rand() % count[N-1];
// binary search: A[l] < r <= A[u].
int l = 0, u = N - 1;
while(l <= u)
{
int m = l + (u - l) / 2;
if(count[m] < r)
l = m + 1;
else
u = m - 1;
} // while
return countries[l];
} // random_one_country
int main(void)
{
char *countries[] = {"ammerica", "china", "taiwan"};
int population[] = {10, 100, 20};
for(int i = 0; i < 10; i++)
{
std::cout <<
random_one_country(countries, population,
sizeof(countries)/ sizeof(countries[0]))
<< std::endl;
}
return 0;
}
```

i think another way of luking into the same problem ia as follows:

1)sort the countries in decending order of their population.

2)generate a random number between 0 to n-1

3)let output[rand(N)]=i

4)choose the subarray arr[0,i]

5)generate a random number between 0 to i

6)let rand(i)=k

7)return arr[k];

by this method probability(arr[i])<probability(arr[j]) such that i<j

```
public String getRandom(String[] countryName, Double[] countryPopulation){
double countryPopulationSum = 0;
for(int i=0; i<countryName.length; i++){
countryPopulationSum += countryPopulation[i];
}
for(int i=0; true; i = (i+1)%countryName.length){
if(Math.random() < countryPopulation[i]){
return countryName[i];
}
}
}
```

division was missing:

```
public String getRandom(String[] countryName, int[] countryPopulation){
int countryPopulationSum = 0;
for(int i=0; i<countryName.length; i++){
countryPopulationSum += countryPopulation[i];
}
for(int i=0; true; i = (i+1)%countryName.length){
if(Math.random() < countryPopulation[i]/countryPopulationSum){
return countryName[i];
}
}
}
```

Another idea:

1. Find the GCD of population of all the countries.

2. Create an array and repeat name of country X, K times in the array where K=(population of country X)/GCD

3. Now pick up a random element from the array

It could be easily proved mathematically that probability of choosing a country with higher population is higher in appropriate proportion.

No. The top-voted solution uses O(n) memory where n is the number of countries. There is no way your solution can use any less, and will use exponentially more in some cases.

How do you plan to represent n points on number line where n is the sum fo all populations?

Well, I'd just store them as a cumulative total, like pop[0], pop[0]+pop[1], pop[0]+pop[1]+pop[2] , ...

Then after choosing an index I'd binary search. Technically this may be O(logN) for a lookup, but the number of people in a country is potentially large, so I would prioritize avoiding storing 1 entry per person.

Create an array of size equal to number of countries. Loop through the list and add the cumulative population on each cell of the array. Get random of ceil value equal to the last array cell value. Check where the number falls in the range of the array. Return the index of that array.

Example: A->5, B->10, C->3, D->8, E->7

array = { 5, 15, 18, 26, 33 }

int r = Random.next(33);

int i = 0;

while (r > a[i]) { i++; }

return list.get(i).CountryName

My issue with this question is that it asks us choose something random (a country) that to some degree depends on something else (population). The mathematical definition of random is "chosen without regard to any characteristics of the individual members of the population so that each has an equal chance of being selected." Given the mathematical definition of random, how does one reconcile these differences and still preserve true randomness? Is this function REALLY random? I am not aware of an algorithm that can accomplish this.

I would think that if we could draw the countries in number line and then limit the random number generation with max distance on the number lines. Like generate random number between a given range and in that we can give a range of farthest distance of line. And in every call change the range of Random number.

First of all, sum the population of all countries. Let the sum be N.

- deadman September 05, 2012Now think of it in terms of a number line. Have a number line from 1 to N. Now allocate proportional part of the number line to corresponding country based on its population (Higher the population more numbers in number line and vice versa). Let say 1 to n1 is for 1st country , n1+1 to n2 to second and so on.

Now generate a random number between 1 to N. Just return the country which owns that number in that number line.