If you're given a list of coun

Google Interview Question for Software Engineer / Developers

1

of 1 vote

52
Answers
If you're given a list of countries and its corresponding population, write a function that will return a random country but the higher the population of the country, the more likely it is to be picked at random.
- Curious September 05, 2012 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 32 vote

First of all, sum the population of all countries. Let the sum be N.
Now think of it in terms of a number line. Have a number line from 1 to N. Now allocate proportional part of the number line to corresponding country based on its population (Higher the population more numbers in number line and vice versa). Let say 1 to n1 is for 1st country , n1+1 to n2 to second and so on.
Now generate a random number between 1 to N. Just return the country which owns that number in that number line.

- deadman September 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

You're absolutely right. He hinted at using a number line but I didn't get it.

- Curious September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

The question sounds like, always pick a country with highest population at random, this is impossible.
Question should be, probability of picking the higher populated country should be high.

- Anonymous September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

No problem. All the best for your results :)
Can you share more questions of your interview?

- deadman September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@anonymous: I think you're misinterpreting the question. It's not an online algorithm. You get to look at all populations before you have to decide. The probability of picking a country i should be Pop(i)/(Sum of Pop(k) over all k)

- Anonymous September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

The problem is also not about picking the country with the highest population, but picking a country at random such that the more people a country has, the more likely it is to be picked (proportionally, we are to assume).

- Anonymous September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

great solution !!

- saraf2206 September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I dont believe that drawing a number line will help here. Number line solution would be somewhat similar to sorting the list.

I think the solution will be in terms of probability. Also, for finding the highest populated country, we need to know the limits of list.

Not sure how it can be done by randomly selecting. If we know the Min and Max then we can pick some values randomly and decide.

- Andy2000 September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

What data structure will you use for storing lines and for finding a country by a given random number?

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

There is no ds called lines so I think we need to pick some DS such as Hash Table or Dictionary. and the scan all of the elements and fill it into Hash Table.

- Andy2000 September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Could you elaborate on how you would use HashTable provided we have N countries with total population of B?

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Hi Andy,
Number line was given just to think in terms of that. Implementation wise its very simple. Corresponding to each country you just need to store the min number and max number. If the random number falls in that range, return that country. If you are still not convinced I'll code it for you by tonight.

- deadman September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I was thinking to Scan the countries and start storing as Key as Value of Population and Country name as Value of Hash Key

- Andy2000 September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

We could use a property that if one country has 99% of total population, we don't need to go over the list of all countries.

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

It is correct, but we can search faster using the binary search. if we have population such as n1, n2, ... we can make the sequence a1=n1, a2=n1+n2, a3=n1+n2+n3, .... an then store all ai values in a binary search tree. then we select a random number between 0 to N=n1+n2+.. = an and search that number in the tree. the associated range can be found by looking at the final node and its parent which can give us the associated country.

- mk September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

We can have an array, choosing[no_of_country]
loop : for all the countries
choosing[country_i] = rand(0,1)* (population[i]/total_population)
return max(choosing[country_i])

- ss11 September 10, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 4 vote

Edit: the probability was calculated wrong at the first time. i have fixed it.

1. randomly select one country c1
2. put it back to the country set,
3. randomly pick another country c2
4. return the one with more population.

suppose there are n countries.
the probability that the country with the smallest population being picked is 1/n * 1/ n
the probability that the second smallest country being picked is 3/n^2

the probability that the kth smallest country being picked is 2 * 1/n * (k - 1)/n + 1/n * 1/n = (2k - 1) / n^2

- gnahzy September 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I think my solution is also correct. can anybody see if there is any problem with the solution?

- gnahzy September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

That depends on your interpretation of the problem statement. I would think the probabilities need to be proportional.

- Anonymous September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I doubt it is correct. Randomly selecting could also mean that we are picking the same value again and again. Also if you will pick from 2 countries then we might be picking max of 2 smallest.

- Andy2000 September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Hopefully this shows the shortcomings in your solution. (Now this is aside from the fact as per your algorithm the country with the smallest population will never get picked)

You propose that you will pick 2 countries and return the one with max population.

Now just take this further and lets pick three countries and return the max of these.

Lets keep increasing it from 2 to 3 to 4...to almost infinity. Now every pick will always pick the country with the highest population.

- raj September 07, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

So I originally dismissed this solution, but it could actually be correct and quite efficient, depending on your interpretation of the question. The k-th smallest country has (2*(k-1))/(n*(n-1)) probability of being picked. So when you calculate the probabilities correctly (they are calculated incorrectly in the answer), they do add to 1. And it's clear that countries with larger populations have better chances of being picked.

My interpretation of the question was different, and this wouldn't be correct under my interpretation. I viewed the question as saying that the probability of a country being picked should be *proportional* to its size. Of course this is to be clarified with the interviewer.

- eugene.yarovoi September 08, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Good! A very efficient idea if the probability doesn't need to be "proportional" to the population.

- Alva0930 February 20, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Knuth gives an algorithm attributed to A. J. Walker. It requires at most two lookups per random choice.
The basic idea is that if you have different colored cubes, n_j of color C_j, j = 1, 2, ..., k, and you have k boxes each holding n cubes and n_1 + n_2 + ... + n_k = k*n, then it is possible to distribute the cubes such that box B_i contains at most two colors one of which is C_i. (Simple induction proof.)
Now generate a random floating-point number x in the range [1, k+1). Let i be the integer part and y the excess. If n*y is less than the number of cubes of color C_i in box B_i output C_i otherwise output the other color in box B_i.

- Anonymous September 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

@anonymous Can you please post the solution?

- Andy2000 September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

To better understand the code I suggest that you change the countries to the possible outcomes of throwing a pair of dice, 2, 3, ..., 12 and change the population counts to the number of ways you can get the different outcomes. So, 1 for 2, 2 for 3 etc.

#define _HAS_CPP0X 1

#include <algorithm>
#include <deque>
#include <iostream>
#include <map>
#include <numeric>
#include <ostream>
#include <random>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<double, int>> 
    preprocess(std::vector<std::pair<std::string, double>>& container)
{
    double sum = 0;
    std::for_each(container.cbegin(), container.cend(), 
        [&sum](std::pair<std::string, double> p){ sum += p.second; });

    const unsigned int n = container.size();
    std::deque<std::pair<int, double>> deq(n);
    {
        int index = 0;
        for (auto iter = container.cbegin(); iter != container.cend(); ++iter)
        {
            deq[index] = std::make_pair(index, n * iter->second / sum);
            ++index;
        }
    }

    std::sort(deq.begin(), deq.end(), 
        [](const std::pair<int, double> lhs, const std::pair<int, double> rhs)
        {
            return lhs.second < rhs.second;
        });
    
    std::vector<std::pair<double, int>> result(n);
    for (unsigned int i = 0; i != n - 1; ++i)
    {
        auto p = deq.front();
        result[p.first].first = p.second;
        auto iter = std::find_if(deq.begin(), deq.end(), 
            [](std::pair<int, double> p){ return p.second >= 1.0; });
        result[p.first].second = iter->first;
        iter->second -= 1.0 - p.second;
        deq.pop_front();
    }

    // Handle last entry
    auto p = deq.front();
    result[p.first] = std::make_pair(1.0, p.first);

    return result;
}

int main()
{
    std::vector<std::pair<std::string, double>> countries;
    countries.push_back(std::make_pair(std::string("China"), 1347350000));
    countries.push_back(std::make_pair(std::string("India"), 1210193422));
    countries.push_back(std::make_pair(std::string("United States"), 314303000));
    countries.push_back(std::make_pair(std::string("Indonesia"), 237641326));
    countries.push_back(std::make_pair(std::string("Brazil"), 193946886));
    countries.push_back(std::make_pair(std::string("Pakistan"), 180582000));
    countries.push_back(std::make_pair(std::string("Nigeria"), 166629000));
    countries.push_back(std::make_pair(std::string("Bangladesh"), 152518015));
    countries.push_back(std::make_pair(std::string("Russia"), 143142000));
    countries.push_back(std::make_pair(std::string("Japan"), 127570000));
    countries.push_back(std::make_pair(std::string("Mexico"), 112336538));

    const unsigned int n = countries.size();

    std::vector<std::pair<double, int>> table = preprocess(countries);

    std::default_random_engine e;

    std::vector<int> counts(n);
    const int m = 4186;         // Expected counts = population in millions
    for (int i = 0; i != m; ++i)
    {
		// r is a random variable in the range [0, n)
        double r = static_cast<double>(e()) / (1.0 + e.max());
        r *= n;

		int index = static_cast<int>(r);
        double excess = r - index;

        if (excess <= table[index].first)
        {
            std::cout << countries[index].first << std::endl;
            ++counts[index];
        }
        else
        {
            index = table[index].second;
            std::cout << countries[index].first << std::endl;
            ++counts[index];
        }
    }

    return 0;
}

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Can you explain in detail what you believe the problem statement to be? It's ambiguous, and it seems our understanding of it differs. Given my current understanding of the problem statement, this makes no sense to me.

- eugene.yarovoi September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

My code will return "China" with probability 1247350000/4186212187 = 0.32, "India" with probability 0.29, "United States" with probability 0.075, ..., Mexico with probability 0.027. So, it returns a random country but the higher the population is, the more likely the country is to be picked.

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

The is a nice description of Walker's alias method at ActiveState's code repository. The title is "Walker's alias method for random objects with different probabilities (Python recipe)".

- Anonymous September 07, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Pseudocode:
Given Country Cn with Population Pn.
int sum=sum of all P
for(int i=0,i<n;i++){
if(Pi/sum>Random(0,1)){
return Ci;
}else{
sum=sum-Pi;
}
}

- Anonymous September 06, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Idea is from Reservoir sampling

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Also sorting countries in decreasing order of population will improve this solution

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

I came up with an answer that wasn't very efficient using 2 arrays. Any other ideas?

- Curious September 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

can you post that solution?

- Anonymous September 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 4 vote

import java.util.*;

class Country {
String name;
int pop;

public Country(String name, int pop) {
this.name = name;
this.pop = pop;
}
}

public class task3 {

public static int binarySearch(int[] a, int target) {
int mid, lo = 0, hi = a.length - 1;

while (lo < hi) {
mid = (hi - lo) / 2 + lo;
if (a[mid] < target)
lo = mid;
else if (a[mid - 1] > target)
hi = mid - 1;
else
// if (a[mid] >= target && a[])
return mid;
}
return -1;
}

public static String randomCountry(ArrayList<Country> arr) {
Random rand = new Random();

int[] a = new int[arr.size()];
a[0] = arr.get(0).pop;
for (int i = 1; i < a.length; i++)
a[i] = a[i - 1] + arr.get(i).pop;

int rand_num = rand.nextInt(a[a.length - 1]);

int country_pop = binarySearch(a, rand_num);

return arr.get(country_pop).name;

}

public static void main(String[] args) {

ArrayList<Country> arr = new ArrayList<Country>();
arr.add(new Country("x", 5));
arr.add(new Country("y", 10));
arr.add(new Country("z", 13));
arr.add(new Country("w", 2));

System.out.println(randomCountry(arr));

}

}

- tanlit91 September 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

What you are doing man? Just Binary Search? I think if interviewer were agree with solving this with Binary search, question would loose the value.

- Andy2000 September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How is this approach any different than the number line method? Array a is the number line. With the data in the main method, the line is 5, 15, 28, 30. Then you draw a random number from 0 to 30 say 29 and search it in the number line and return left index of the range. (although there are issues in the binary search algo here. infinite loop for 28), but the idea is right.

- dc360 September 08, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

const char * random_one_country(char* countries[],
	const int population[], const int N)
{
	int* count = new int[N];
	count[0] = population[0];

	for(int i = 1; i < N; i++)
		count[i] = count[i-1] + population[i];
	
	const int r = 1 + rand() % count[N-1];
	// binary search: A[l] < r <= A[u].
	int l = 0, u = N - 1;
	while(l <= u)
	{
		int m = l + (u - l) / 2;
		if(count[m] < r)
			l = m + 1;
		else
			u = m - 1;
	} // while

	return countries[l];
} // random_one_country

int main(void)
{
	char *countries[] = {"ammerica", "china", "taiwan"};
	int population[] = {10, 100, 20};
	for(int i = 0; i < 10; i++)
	{
		std::cout << 
		random_one_country(countries, population,
		sizeof(countries)/ sizeof(countries[0]))
		<< std::endl;
	}

	return 0;
}

- hwen.tmp September 06, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

i think another way of luking into the same problem ia as follows:

1)sort the countries in decending order of their population.
2)generate a random number between 0 to n-1
3)let output[rand(N)]=i
4)choose the subarray arr[0,i]
5)generate a random number between 0 to i
6)let rand(i)=k
7)return arr[k];

by this method probability(arr[i])<probability(arr[j]) such that i<j

- sandeep September 07, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public String getRandom(String[] countryName, Double[] countryPopulation){
	double countryPopulationSum = 0;
	for(int i=0; i<countryName.length; i++){
		countryPopulationSum += countryPopulation[i];
	}
	for(int i=0; true; i = (i+1)%countryName.length){
		if(Math.random() < countryPopulation[i]){
			return countryName[i];
		}
	}
}

- Anonymous September 16, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

division was missing:

public String getRandom(String[] countryName, int[] countryPopulation){
	int countryPopulationSum = 0;
	for(int i=0; i<countryName.length; i++){
		countryPopulationSum += countryPopulation[i];
	}
	for(int i=0; true; i = (i+1)%countryName.length){
		if(Math.random() < countryPopulation[i]/countryPopulationSum){
			return countryName[i];
		}
	}
}

- Anonymous September 16, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

Another idea:
1. Find the GCD of population of all the countries.
2. Create an array and repeat name of country X, K times in the array where K=(population of country X)/GCD
3. Now pick up a random element from the array
It could be easily proved mathematically that probability of choosing a country with higher population is higher in appropriate proportion.

- Epic_coder October 23, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Sorry, Updated the solution.

- Epic_coder October 23, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I think this might take lesser memory if GCD>1.

- Epic_coder October 23, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

No. The top-voted solution uses O(n) memory where n is the number of countries. There is no way your solution can use any less, and will use exponentially more in some cases.

- eugene.yarovoi October 23, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How do you plan to represent n points on number line where n is the sum fo all populations?

- Epic_coder October 23, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Well, I'd just store them as a cumulative total, like pop[0], pop[0]+pop[1], pop[0]+pop[1]+pop[2] , ...

Then after choosing an index I'd binary search. Technically this may be O(logN) for a lookup, but the number of people in a country is potentially large, so I would prioritize avoiding storing 1 entry per person.

- eugene.yarovoi October 30, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Create an array of size equal to number of countries. Loop through the list and add the cumulative population on each cell of the array. Get random of ceil value equal to the last array cell value. Check where the number falls in the range of the array. Return the index of that array.

Example: A->5, B->10, C->3, D->8, E->7

array = { 5, 15, 18, 26, 33 }

int r = Random.next(33);
int i = 0;
while (r > a[i]) { i++; }
return list.get(i).CountryName

- Jasiri January 28, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@deadman Can you post your solution?

- Anonymous September 08, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

My issue with this question is that it asks us choose something random (a country) that to some degree depends on something else (population). The mathematical definition of random is "chosen without regard to any characteristics of the individual members of the population so that each has an equal chance of being selected." Given the mathematical definition of random, how does one reconcile these differences and still preserve true randomness? Is this function REALLY random? I am not aware of an algorithm that can accomplish this.

- Anonymous September 06, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Pick a random person. But instead of outputting the person's name, you output his/hers country.

- Anonymous September 06, 2012 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

I would think that if we could draw the countries in number line and then limit the random number generation with max distance on the number lines. Like generate random number between a given range and in that we can give a range of farthest distance of line. And in every call change the range of Random number.

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews