## Google Interview Question for Software Engineer / Developers

• 1
of 1 vote

Country: United States
Interview Type: Phone Interview

Comment hidden because of low score. Click to expand.
32
of 32 vote

First of all, sum the population of all countries. Let the sum be N.
Now think of it in terms of a number line. Have a number line from 1 to N. Now allocate proportional part of the number line to corresponding country based on its population (Higher the population more numbers in number line and vice versa). Let say 1 to n1 is for 1st country , n1+1 to n2 to second and so on.
Now generate a random number between 1 to N. Just return the country which owns that number in that number line.

Comment hidden because of low score. Click to expand.
0

You're absolutely right. He hinted at using a number line but I didn't get it.

Comment hidden because of low score. Click to expand.
0

The question sounds like, always pick a country with highest population at random, this is impossible.
Question should be, probability of picking the higher populated country should be high.

Comment hidden because of low score. Click to expand.
0

No problem. All the best for your results :)
Can you share more questions of your interview?

Comment hidden because of low score. Click to expand.
0

@anonymous: I think you're misinterpreting the question. It's not an online algorithm. You get to look at all populations before you have to decide. The probability of picking a country i should be Pop(i)/(Sum of Pop(k) over all k)

Comment hidden because of low score. Click to expand.
0

The problem is also not about picking the country with the highest population, but picking a country at random such that the more people a country has, the more likely it is to be picked (proportionally, we are to assume).

Comment hidden because of low score. Click to expand.
0

great solution !!

Comment hidden because of low score. Click to expand.
0

I dont believe that drawing a number line will help here. Number line solution would be somewhat similar to sorting the list.

I think the solution will be in terms of probability. Also, for finding the highest populated country, we need to know the limits of list.

Not sure how it can be done by randomly selecting. If we know the Min and Max then we can pick some values randomly and decide.

Comment hidden because of low score. Click to expand.
0

What data structure will you use for storing lines and for finding a country by a given random number?

Comment hidden because of low score. Click to expand.
0

There is no ds called lines so I think we need to pick some DS such as Hash Table or Dictionary. and the scan all of the elements and fill it into Hash Table.

Comment hidden because of low score. Click to expand.
0

Could you elaborate on how you would use HashTable provided we have N countries with total population of B?

Comment hidden because of low score. Click to expand.
0

Hi Andy,
Number line was given just to think in terms of that. Implementation wise its very simple. Corresponding to each country you just need to store the min number and max number. If the random number falls in that range, return that country. If you are still not convinced I'll code it for you by tonight.

Comment hidden because of low score. Click to expand.
0

I was thinking to Scan the countries and start storing as Key as Value of Population and Country name as Value of Hash Key

Comment hidden because of low score. Click to expand.
0

We could use a property that if one country has 99% of total population, we don't need to go over the list of all countries.

Comment hidden because of low score. Click to expand.
0

It is correct, but we can search faster using the binary search. if we have population such as n1, n2, ... we can make the sequence a1=n1, a2=n1+n2, a3=n1+n2+n3, .... an then store all ai values in a binary search tree. then we select a random number between 0 to N=n1+n2+.. = an and search that number in the tree. the associated range can be found by looking at the final node and its parent which can give us the associated country.

Comment hidden because of low score. Click to expand.
0

We can have an array, choosing[no_of_country]
loop : for all the countries
choosing[country_i] = rand(0,1)* (population[i]/total_population)
return max(choosing[country_i])

Comment hidden because of low score. Click to expand.
2
of 4 vote

Edit: the probability was calculated wrong at the first time. i have fixed it.

1. randomly select one country c1
2. put it back to the country set,
3. randomly pick another country c2
4. return the one with more population.

suppose there are n countries.
the probability that the country with the smallest population being picked is 1/n * 1/ n
the probability that the second smallest country being picked is 3/n^2

the probability that the kth smallest country being picked is 2 * 1/n * (k - 1)/n + 1/n * 1/n = (2k - 1) / n^2

Comment hidden because of low score. Click to expand.
0

I think my solution is also correct. can anybody see if there is any problem with the solution?

Comment hidden because of low score. Click to expand.
0

That depends on your interpretation of the problem statement. I would think the probabilities need to be proportional.

Comment hidden because of low score. Click to expand.
0

I doubt it is correct. Randomly selecting could also mean that we are picking the same value again and again. Also if you will pick from 2 countries then we might be picking max of 2 smallest.

Comment hidden because of low score. Click to expand.
0

Hopefully this shows the shortcomings in your solution. (Now this is aside from the fact as per your algorithm the country with the smallest population will never get picked)

You propose that you will pick 2 countries and return the one with max population.

Now just take this further and lets pick three countries and return the max of these.

Lets keep increasing it from 2 to 3 to 4...to almost infinity. Now every pick will always pick the country with the highest population.

Comment hidden because of low score. Click to expand.
0

So I originally dismissed this solution, but it could actually be correct and quite efficient, depending on your interpretation of the question. The k-th smallest country has (2*(k-1))/(n*(n-1)) probability of being picked. So when you calculate the probabilities correctly (they are calculated incorrectly in the answer), they do add to 1. And it's clear that countries with larger populations have better chances of being picked.

My interpretation of the question was different, and this wouldn't be correct under my interpretation. I viewed the question as saying that the probability of a country being picked should be *proportional* to its size. Of course this is to be clarified with the interviewer.

Comment hidden because of low score. Click to expand.
0

Good! A very efficient idea if the probability doesn't need to be "proportional" to the population.

Comment hidden because of low score. Click to expand.
1
of 1 vote

Knuth gives an algorithm attributed to A. J. Walker. It requires at most two lookups per random choice.
The basic idea is that if you have different colored cubes, n_j of color C_j, j = 1, 2, ..., k, and you have k boxes each holding n cubes and n_1 + n_2 + ... + n_k = k*n, then it is possible to distribute the cubes such that box B_i contains at most two colors one of which is C_i. (Simple induction proof.)
Now generate a random floating-point number x in the range [1, k+1). Let i be the integer part and y the excess. If n*y is less than the number of cubes of color C_i in box B_i output C_i otherwise output the other color in box B_i.

Comment hidden because of low score. Click to expand.
0

@anonymous Can you please post the solution?

Comment hidden because of low score. Click to expand.
0

To better understand the code I suggest that you change the countries to the possible outcomes of throwing a pair of dice, 2, 3, ..., 12 and change the population counts to the number of ways you can get the different outcomes. So, 1 for 2, 2 for 3 etc.

``````#define _HAS_CPP0X 1

#include <algorithm>
#include <deque>
#include <iostream>
#include <map>
#include <numeric>
#include <ostream>
#include <random>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<double, int>>
preprocess(std::vector<std::pair<std::string, double>>& container)
{
double sum = 0;
std::for_each(container.cbegin(), container.cend(),
[&sum](std::pair<std::string, double> p){ sum += p.second; });

const unsigned int n = container.size();
std::deque<std::pair<int, double>> deq(n);
{
int index = 0;
for (auto iter = container.cbegin(); iter != container.cend(); ++iter)
{
deq[index] = std::make_pair(index, n * iter->second / sum);
++index;
}
}

std::sort(deq.begin(), deq.end(),
[](const std::pair<int, double> lhs, const std::pair<int, double> rhs)
{
return lhs.second < rhs.second;
});

std::vector<std::pair<double, int>> result(n);
for (unsigned int i = 0; i != n - 1; ++i)
{
auto p = deq.front();
result[p.first].first = p.second;
auto iter = std::find_if(deq.begin(), deq.end(),
[](std::pair<int, double> p){ return p.second >= 1.0; });
result[p.first].second = iter->first;
iter->second -= 1.0 - p.second;
deq.pop_front();
}

// Handle last entry
auto p = deq.front();
result[p.first] = std::make_pair(1.0, p.first);

return result;
}

int main()
{
std::vector<std::pair<std::string, double>> countries;
countries.push_back(std::make_pair(std::string("China"), 1347350000));
countries.push_back(std::make_pair(std::string("India"), 1210193422));
countries.push_back(std::make_pair(std::string("United States"), 314303000));
countries.push_back(std::make_pair(std::string("Indonesia"), 237641326));
countries.push_back(std::make_pair(std::string("Brazil"), 193946886));
countries.push_back(std::make_pair(std::string("Pakistan"), 180582000));
countries.push_back(std::make_pair(std::string("Nigeria"), 166629000));
countries.push_back(std::make_pair(std::string("Russia"), 143142000));
countries.push_back(std::make_pair(std::string("Japan"), 127570000));
countries.push_back(std::make_pair(std::string("Mexico"), 112336538));

const unsigned int n = countries.size();

std::vector<std::pair<double, int>> table = preprocess(countries);

std::default_random_engine e;

std::vector<int> counts(n);
const int m = 4186;         // Expected counts = population in millions
for (int i = 0; i != m; ++i)
{
// r is a random variable in the range [0, n)
double r = static_cast<double>(e()) / (1.0 + e.max());
r *= n;

int index = static_cast<int>(r);
double excess = r - index;

if (excess <= table[index].first)
{
std::cout << countries[index].first << std::endl;
++counts[index];
}
else
{
index = table[index].second;
std::cout << countries[index].first << std::endl;
++counts[index];
}
}

return 0;
}``````

Comment hidden because of low score. Click to expand.
0

Can you explain in detail what you believe the problem statement to be? It's ambiguous, and it seems our understanding of it differs. Given my current understanding of the problem statement, this makes no sense to me.

Comment hidden because of low score. Click to expand.
0

My code will return "China" with probability 1247350000/4186212187 = 0.32, "India" with probability 0.29, "United States" with probability 0.075, ..., Mexico with probability 0.027. So, it returns a random country but the higher the population is, the more likely the country is to be picked.

Comment hidden because of low score. Click to expand.
0

The is a nice description of Walker's alias method at ActiveState's code repository. The title is "Walker's alias method for random objects with different probabilities (Python recipe)".

Comment hidden because of low score. Click to expand.
1
of 1 vote

Pseudocode:
Given Country Cn with Population Pn.
int sum=sum of all P
for(int i=0,i<n;i++){
if(Pi/sum>Random(0,1)){
return Ci;
}else{
sum=sum-Pi;
}
}

Comment hidden because of low score. Click to expand.
0

Idea is from Reservoir sampling

Comment hidden because of low score. Click to expand.
0

Also sorting countries in decreasing order of population will improve this solution

Comment hidden because of low score. Click to expand.
0
of 2 vote

I came up with an answer that wasn't very efficient using 2 arrays. Any other ideas?

Comment hidden because of low score. Click to expand.
0

can you post that solution?

Comment hidden because of low score. Click to expand.
0
of 4 vote

import java.util.*;

class Country {
String name;
int pop;

public Country(String name, int pop) {
this.name = name;
this.pop = pop;
}
}

public static int binarySearch(int[] a, int target) {
int mid, lo = 0, hi = a.length - 1;

while (lo < hi) {
mid = (hi - lo) / 2 + lo;
if (a[mid] < target)
lo = mid;
else if (a[mid - 1] > target)
hi = mid - 1;
else
// if (a[mid] >= target && a[])
return mid;
}
return -1;
}

public static String randomCountry(ArrayList<Country> arr) {
Random rand = new Random();

int[] a = new int[arr.size()];
a[0] = arr.get(0).pop;
for (int i = 1; i < a.length; i++)
a[i] = a[i - 1] + arr.get(i).pop;

int rand_num = rand.nextInt(a[a.length - 1]);

int country_pop = binarySearch(a, rand_num);

return arr.get(country_pop).name;

}

public static void main(String[] args) {

ArrayList<Country> arr = new ArrayList<Country>();

System.out.println(randomCountry(arr));

}

}

Comment hidden because of low score. Click to expand.
0

What you are doing man? Just Binary Search? I think if interviewer were agree with solving this with Binary search, question would loose the value.

Comment hidden because of low score. Click to expand.
0

How is this approach any different than the number line method? Array a is the number line. With the data in the main method, the line is 5, 15, 28, 30. Then you draw a random number from 0 to 30 say 29 and search it in the number line and return left index of the range. (although there are issues in the binary search algo here. infinite loop for 28), but the idea is right.

Comment hidden because of low score. Click to expand.
0
of 0 vote

``````const char * random_one_country(char* countries[],
const int population[], const int N)
{
int* count = new int[N];
count[0] = population[0];

for(int i = 1; i < N; i++)
count[i] = count[i-1] + population[i];

const int r = 1 + rand() % count[N-1];
// binary search: A[l] < r <= A[u].
int l = 0, u = N - 1;
while(l <= u)
{
int m = l + (u - l) / 2;
if(count[m] < r)
l = m + 1;
else
u = m - 1;
} // while

return countries[l];
} // random_one_country

int main(void)
{
char *countries[] = {"ammerica", "china", "taiwan"};
int population[] = {10, 100, 20};
for(int i = 0; i < 10; i++)
{
std::cout <<
random_one_country(countries, population,
sizeof(countries)/ sizeof(countries[0]))
<< std::endl;
}

return 0;
}``````

Comment hidden because of low score. Click to expand.
0
of 2 vote

i think another way of luking into the same problem ia as follows:

1)sort the countries in decending order of their population.
2)generate a random number between 0 to n-1
3)let output[rand(N)]=i
4)choose the subarray arr[0,i]
5)generate a random number between 0 to i
6)let rand(i)=k
7)return arr[k];

by this method probability(arr[i])<probability(arr[j]) such that i<j

Comment hidden because of low score. Click to expand.
0
of 0 vote

``````public String getRandom(String[] countryName, Double[] countryPopulation){
double countryPopulationSum = 0;
for(int i=0; i<countryName.length; i++){
countryPopulationSum += countryPopulation[i];
}
for(int i=0; true; i = (i+1)%countryName.length){
if(Math.random() < countryPopulation[i]){
return countryName[i];
}
}
}``````

Comment hidden because of low score. Click to expand.
0

division was missing:

``````public String getRandom(String[] countryName, int[] countryPopulation){
int countryPopulationSum = 0;
for(int i=0; i<countryName.length; i++){
countryPopulationSum += countryPopulation[i];
}
for(int i=0; true; i = (i+1)%countryName.length){
if(Math.random() < countryPopulation[i]/countryPopulationSum){
return countryName[i];
}
}
}``````

Comment hidden because of low score. Click to expand.
0
of 2 vote

Another idea:
1. Find the GCD of population of all the countries.
2. Create an array and repeat name of country X, K times in the array where K=(population of country X)/GCD
3. Now pick up a random element from the array
It could be easily proved mathematically that probability of choosing a country with higher population is higher in appropriate proportion.

Comment hidden because of low score. Click to expand.
0

Sorry, Updated the solution.

Comment hidden because of low score. Click to expand.
0

I think this might take lesser memory if GCD>1.

Comment hidden because of low score. Click to expand.
0

No. The top-voted solution uses O(n) memory where n is the number of countries. There is no way your solution can use any less, and will use exponentially more in some cases.

Comment hidden because of low score. Click to expand.
0

How do you plan to represent n points on number line where n is the sum fo all populations?

Comment hidden because of low score. Click to expand.
0

Well, I'd just store them as a cumulative total, like pop[0], pop[0]+pop[1], pop[0]+pop[1]+pop[2] , ...

Then after choosing an index I'd binary search. Technically this may be O(logN) for a lookup, but the number of people in a country is potentially large, so I would prioritize avoiding storing 1 entry per person.

Comment hidden because of low score. Click to expand.
0
of 0 vote

Create an array of size equal to number of countries. Loop through the list and add the cumulative population on each cell of the array. Get random of ceil value equal to the last array cell value. Check where the number falls in the range of the array. Return the index of that array.

Example: A->5, B->10, C->3, D->8, E->7

array = { 5, 15, 18, 26, 33 }

int r = Random.next(33);
int i = 0;
while (r > a[i]) { i++; }
return list.get(i).CountryName

Comment hidden because of low score. Click to expand.
0
of 0 vote

Comment hidden because of low score. Click to expand.
-1
of 1 vote

My issue with this question is that it asks us choose something random (a country) that to some degree depends on something else (population). The mathematical definition of random is "chosen without regard to any characteristics of the individual members of the population so that each has an equal chance of being selected." Given the mathematical definition of random, how does one reconcile these differences and still preserve true randomness? Is this function REALLY random? I am not aware of an algorithm that can accomplish this.

Comment hidden because of low score. Click to expand.
0

Pick a random person. But instead of outputting the person's name, you output his/hers country.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

I would think that if we could draw the countries in number line and then limit the random number generation with max distance on the number lines. Like generate random number between a given range and in that we can give a range of farthest distance of line. And in every call change the range of Random number.

Comment hidden because of low score. Click to expand.
-2
of 4 vote

Good question and equally good answer..

Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

### Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

### Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.