Google Interview Question
Software Engineer / Developers
Country: United States
This is what I would do as well but with a binary search if n is large to locate the range. There's also a small bug in your code where the first range is +1/n more likely and the last range is -1/n less likely because you use <= instead of <.
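To see the off-by-one concretely, here is a small Python sketch. It assumes the code under discussion draws r uniformly from 0..sum-1 and returns the first index whose cumulative sum satisfies the comparison; `bucket_counts` is a name made up for this illustration.

```python
from itertools import accumulate
import operator

def bucket_counts(weights, compare):
    """Count how many of the draws r = 0..sum-1 land on each index."""
    cumulative = list(accumulate(weights))
    counts = [0] * len(weights)
    for r in range(cumulative[-1]):
        for i, bound in enumerate(cumulative):
            if compare(r, bound):      # first bucket whose bound matches
                counts[i] += 1
                break
    return counts

weights = [1, 2, 3, 2, 1]
print(bucket_counts(weights, operator.le))  # [2, 2, 3, 2, 0]
print(bucket_counts(weights, operator.lt))  # [1, 2, 3, 2, 1]
```

With `<=` the first index gains one slot and the last index loses one; with `<` the counts equal the weights.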
@ravishankar.balaji
I apologize but I am unable to grasp the logic here.
Where exactly is the concept of "Example probabilities:
w / sum = 1/9, 2/9, 1/3, 2/9, 1/9 "
is used? Please explain.
Ok. Good. I'd thought of a similar approach. C# code below.
/* Returns an index n in [0, w.Length) with probability w[n] / sum(w). */
public static int[] w = new int[]{1, 2, 3, 2, 1};
private static readonly Random r = new Random();
public static int? NextInt()
{
    int sum = 0;
    for (int i = 0; i < w.Length; i++)
    {
        sum += w[i];
    }
    // percentageArr[i] is the cumulative upper bound (in percent) of bucket i.
    double[] percentageArr = new double[w.Length];
    double currentPercentage = 0;
    for (int i = 0; i < w.Length; i++)
    {
        // Cast to double first; int / int would truncate to 0.
        currentPercentage += ((double)w[i] / sum) * 100;
        percentageArr[i] = currentPercentage;
    }
    double randNum = r.NextDouble() * 100;
    for (int i = 0; i < w.Length; i++)
    {
        if (randNum < percentageArr[i])
            return i; // return the index, not the weight w[i]
    }
    // Only reached if all weights are zero or rounding leaves the last bound below 100.
    return null;
}
import random
class weightedRandom:
    def __init__(self, weights):
        # aliases[i] maps each integer i in [0, sum) to the slot that owns it
        self.aliases = {}
        self.sum = sum(weights)
        slot = 0
        dsum = weights[slot]
        for i in range(self.sum):
            while i >= dsum:  # 'while' also steps over zero-weight slots
                slot += 1
                dsum += weights[slot]
            self.aliases[i] = slot
    def next(self):
        return self.aliases[random.randint(0, self.sum - 1)]
wr = weightedRandom([1, 2, 3, 2, 1])
wr.next()
Here's a solution using floating-point numbers. The generation is not as efficient as it could be since it's going through the array every time, but given the array is small it's negligible anyways. You could make it faster by binary-searching where you're at in the range.
import java.util.Random;
//I Got this Crazy Question on PHONE INTERVIEW AT GOOGLE:
//
//Design and implement a class to generate random numbers in an arbitrary probability distribution given by an array of integer weights, i.e. for int[] w return a number, n, from 0 to w.length - 1 with probability w[n] / sum(w). Using an existing random number generator with a uniform distribution is permitted.
//
//Example distribution:
//w = 1, 2, 3, 2, 1
//
//Example probabilities:
//w / sum = 1/9, 2/9, 1/3, 2/9, 1/9
//
//Example results:
//n = 0, 1, 2, 3, 4
//
//Documentation:
//
//Class java.util.Random
//
//public int nextInt(int n)
//
//Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive), drawn from this random number generator's sequence. The general contract of nextInt is that one int value in the specified range is pseudorandomly generated and returned. All n possible int values are produced with (approximately) equal probability.
//
//Parameters:
//n - the bound on the random number to be returned. Must be positive.
//Returns:
//the next pseudorandom, uniformly distributed int value between 0 (inclusive) and n (exclusive) from this random number generator's sequence
//Throws:
//IllegalArgumentException - if n is not positive
public class ProbGen {
/**
* @param args
*/
public static void main(String[] args)
{
//1/9, 2/9, 1/3, 2/9, 1/9
float [] distro = {1.0f/9, 2.0f/9, 1.0f/3, 2.0f/9, 1.0f/9};
int [] values = {0,1,2,3,4};
ProbGen gen = new ProbGen(values, distro);
int [] valuesCount = new int[5];
for(int i = 0; i < 100000; i++)
{
valuesCount[gen.next()]++;
}
for(int i = 0; i < 5; i++)
{
System.out.println((float)valuesCount[i]/100000 + ", " + distro[i]);
}
}
float [] distribution;
int [] values;
private Random random;
public ProbGen(int [] values, float [] distribution)
{
this.distribution = distribution;
this.values = values;
this.random = new Random();
}
public int next()
{
float prob = random.nextFloat();
float probAt = distribution[0];
int valAt = 0;
while(probAt < prob && valAt < distribution.length - 1) // bound check guards against float rounding
{
probAt += distribution[valAt+1];
valAt++;
}
return values[valAt];
}
}
One more thing about your solution: we shouldn't hard-code the distribution list "distro".
Even then, you always assign float values, i.e. (1/9) etc.
There is no way to get whole values like (1, 2, 3) without multiplying by some value.
Your result will be a list of floats.
Hm? No, the output is one of the values in the 'values' array, a whole number. The fractions are only there to calculate the distribution.
Try the code out for yourself, it includes a test which shows that it works in producing the desired distribution.
And the main function is only a test case. The values are not hard-coded, they are passed to the class constructor as arrays.
When I said "distro" should not be hard-coded, I meant it should be calculated from the weights rather than passed in as an input.
Also, your code does return fractions, not the desired output.
This is your output.
0.11172, 0.11111111
0.22012, 0.22222222
0.33512, 0.33333334
0.22142, 0.22222222
0.11162, 0.11111111
It does *not* return fractions. Just look at the 'next' method. It returns an int. The values that are output are not the result, but the tabulation of the probabilities. The print statement shows that the method is behaving correctly.
If you mean that you want to input values as strings like "1/9", "1/3" instead of floats, then that's simple enough: use a regexp to find the '/' character, convert the substring before it to an int and the substring after it to an int, and then divide the one by the other.
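If the input really is text such as "1/9", Python does not strictly need a regexp: the standard `fractions.Fraction` constructor parses that form. A minimal sketch (the helper name `parse_distribution` is made up here):

```python
from fractions import Fraction

def parse_distribution(texts):
    """Turn strings like "1/9" into floats; Fraction accepts "a/b" directly."""
    return [float(Fraction(t)) for t in texts]

distro = parse_distribution(["1/9", "2/9", "1/3", "2/9", "1/9"])
print(distro)
```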
Given: w={w1, w2, .... wN}
output: n with probability of (wn/sum(w))
Create a non zero weight array, W, with another array containing their corresponding index+1, K.
for example: w={11,22,13,0,2}
create 2 arrays W={11,22,13,2} and K={1,2,3,5} --- O(N)
Generate a cumulative array of weights, Wc = {Wc1, Wc2, ... WcN} where Wci=Wc(i-1)+Wi for i=2...N and WC1= W1 --- O(N)
For every input call generate a random number, r.
let R=floor(r*WcN), so that R will be distributed randomly between 0 and WcN-1, remember WcN = sum(w)
I can think of two ways to solve this, one in O(log N) time and another in O(1) time but needs O(sum of w, i.e.WcN) memory.
O(log N) solution:
Do a binary search on Wc to find i such that Wc(i-1)<=R<Wci, taking Wc0=0 for the first bucket
Then output K[i]
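A minimal Python sketch of the O(log N) lookup, assuming 1-based outputs as in K above. The function names `build` and `lookup` are made up for this illustration.

```python
import random

def build(w):
    """Drop zero weights; return cumulative sums Wc and 1-based indices K."""
    Wc, K, total = [], [], 0
    for index, weight in enumerate(w):
        if weight > 0:
            total += weight
            Wc.append(total)
            K.append(index + 1)
    return Wc, K

def lookup(Wc, K, R):
    """Binary search for the first i with R < Wc[i]; needs 0 <= R < Wc[-1]."""
    lo, hi = 0, len(Wc) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if R < Wc[mid]:
            hi = mid          # the answer is at mid or to its left
        else:
            lo = mid + 1      # the answer is strictly to the right
    return K[lo]

Wc, K = build([3, 1, 2, 0, 2])
R = random.randrange(Wc[-1])  # uniform integer in [0, sum(w))
print(lookup(Wc, K, R))
```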
O(1) solution:
Precompute output for full range of R (0 ... WcN-1) and store in an array of length WcN. You can look up an array after computing R for each input in O(1)
p=0
for i from 0 to W.length-1
for j=1 to W[i]
O[p]=K[i]
p++
end
end
E.g.
input: w={3,1,2,0,2}
W={3,1,2,2}
K={1,2,3,5}
Wc={3,4,6,8}
R={0,1,2,3,4,5,6,7}
O={1,1,1,2,3,3,5,5} (precomputed output)
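The precomputation can be sketched in a few lines of Python. This is an illustration of the table-building loop above, with `build_table` as a made-up name; memory use is proportional to sum(w).

```python
import random

def build_table(w):
    """O[p] holds the 1-based index that owns slot p; len(O) == sum(w)."""
    O = []
    for index, weight in enumerate(w):
        O.extend([index + 1] * weight)   # zero weights add nothing
    return O

O = build_table([3, 1, 2, 0, 2])
print(O)                               # [1, 1, 1, 2, 3, 3, 5, 5]
print(O[random.randrange(len(O))])     # one O(1) draw
```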
Here is my C/C++ based solution.
The idea is to calculate the cumulative frequency array.
Suppose the weight 10 appears at index 3; then cumu[3] will contain 10 + cumu[3-1].
Each index therefore owns a bucket of numbers, out of the total sum, equal in size to the weight assigned to it.
Now take a random function which returns a number "output" uniformly between 0 and (sum-1), that is rand()%sum. Check which bucket this output falls in and return the index of that bucket.
Hence this function returns each index with probability proportional to the weight it holds in the original array.
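#include <cstdlib>
using namespace std;
int getRandom(int *Array, int len)
{
    int sum = 0;
    int *cumu = new int[len];
    for(int i = 0; i < len; i++)
    {
        sum += Array[i];
        cumu[i] = sum; // cumulative weight up to and including index i
    }
    int output = rand() % sum; // in [0, sum-1]; modulo bias is ignored here
    int result = len - 1;
    for(int i = 0; i < len; i++)
    {
        // Bucket i covers cumu[i-1] <= output < cumu[i]. The strict '<'
        // keeps the first bucket from being over-weighted.
        if(output < cumu[i]) { result = i; break; }
    }
    delete[] cumu; // the original leaked this array
    return result;
}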
Please let me know your comments :)
Hi Guys,
I thought of two approaches :
Approach 1 :
1. input = {1,2,3,2,1}; modify it so that each index = current value + sum of previous values
modified input = {1,3,6,8,9}
2. choose random value between 1 & 9 (9 is the total sum of weights) using existing random function
3. Output is the correct index for the value, e.g. if the value is 5, it falls under the cumulative value 6, which is the 3rd element (index 2 counting from 0). This can be found using binary search.
This approach takes O(log k) time (where k is the length of input array) and no additional memory.
Approach 2:
1. Make a copy of the array. eg. array a = {1,2,3,2,1}; array b = copy of a.
Note the total sum s (in this case, 9), and tempSum = s
2. If tempSum > 0
Randomly select a number x between (1, b.length)
else
return to step 1
3. If b[x] > 0
3.1 b[x]--
3.2 select x
else
return to step 2
This approach takes constant time and O(k) extra memory (where k is the length of the array).
Please correct me if I am wrong.
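One way to read Approach 2 in Python is sketched below. The class name `DealingSampler` is hypothetical, and the sketch assumes tempSum is decremented on every accepted pick and that at least one weight is positive.

```python
import random

class DealingSampler:
    """Hands out index i exactly a[i] times per cycle of sum(a) calls."""
    def __init__(self, a):
        self.a = list(a)
        self._reset()

    def _reset(self):
        self.b = list(self.a)          # step 1: working copy of the weights
        self.remaining = sum(self.a)   # tempSum

    def next(self):
        if self.remaining == 0:        # step 2: cycle used up, start over
            self._reset()
        while True:
            x = random.randrange(len(self.b))
            if self.b[x] > 0:          # step 3: accept and use up one unit
                self.b[x] -= 1
                self.remaining -= 1
                return x

sampler = DealingSampler([1, 2, 3, 2, 1])
print([sampler.next() for _ in range(9)])
```

Worth noting: each accepted pick is uniform over the indices that still have units left, so only the totals over a full cycle of sum(a) calls match the weights, and the retry loop is not strictly constant time.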
For solution 1, why is it O(log k)?
The summation part alone already takes O(k)!?
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
public class MyRandom {
private List<Integer> values;
private Random random;
public MyRandom(int[] w) {
values = new ArrayList<Integer>();
for (int i = 0; i < w.length; i++) {
int amount = w[i];
for (int j = 0; j < amount; j++) {
values.add(i);
}
}
random = new Random();
}
public int nextInt() {
return values.get(random.nextInt(values.size()));
}
}
Can you please explain what you're trying to do with the
distribution?
And this part?
while(probAt < prob)
{
probAt += distribution[valAt+1];
valAt++;
}
For example, for the input the probability of returning the values is 1/9, 2/9, 1/3, 2/9, 1/9 which works out to 0.11, 0.22, 0.33, 0.22, 0.11.
The loop simply checks whether we are over the generated floating point value, if not it adds the next probability step to the current value.
So for example, if we generate 0.3, then in the first step the result is 0.3 > 0.11, so we're not there yet. We add the next value, 0.22, and get 0.3 < 0.33 so we've found the correct value to return.
At the moment the algorithm only works properly if the weights add up to 1 (which they do in this case) but it could be adjusted to work with arbitrary weights by dividing each weight by the total sum of weights.
As to how it's using the distribution, it uses the distribution values as 'step' values to check which part of the distribution we fall into.
So for a distribution of 1/3, 2/3, randomly generated values under 0.333 would cause the first value to be returned, values over that the second. For distributions with more different cases, the same applies, there are just more steps to check.
As a further addendum, my solution includes something that wasn't required in the specification. The 'values' array contains the values to be returned, so that you could return values other than 0...n if you wanted to.
If that's not needed, you could remove the values array and replace its use with a simple counter which is incremented in the loop.
My solution:
Use an existing random function with a lower bound of 0 and an upper bound of N (exclusive), where N = sum(int[] w). Ranges of the random number then map naturally to outputs:
If w[0] = 2, then if the random number equals 0 or 1, it falls into n = 0;
If w[1] = 3, then if the random number equals 2, 3 or 4, it falls into n = 1...
If w[count(w)-1] = x, then if the random number equals N-x, N-x+1, ... or N-1, it falls into n = count(w)-1;
This provides a way to generate a random number with the expected probability.
So you have a space of n possible outputs and you have their weights in w[0 .. n - 1]. You can use rand() assuming it returns random numbers based on an uniform distribution.
One thing you could do is create an array of size sum(w) and fill it with numbers i from 0 .. n - 1. Each i will have w[i] copies in this new array (regardless of where). Then just call rand() % sum(w) and return the number stored at this array's index.
#include <iostream>
#include <stdlib.h>
using namespace std;
int getNext(int p[], int N)
{
int c[N];
c[0] = p[0];
for (int i = 1; i < N; i++) {
c[i] = p[i] + c[i-1];
}
int next = rand() % c[N-1];
for (int i = 0; i < N; i++) {
if (next < c[i]) {
return i;
}
}
return (N-1);
}
#define N 5
#define LOOP 100000000
int main()
{
int p[N] = {1, 2, 3, 2, 2};
int c[N] = {0, 0, 0, 0, 0};
for (int i = 0; i < LOOP ; i++) {
c[getNext(p, N)]++;
}
for (int i = 0; i < N; i++) {
cout << c[i] << " ";
}
cout << "Loop : " << LOOP << endl;
}
This paper has an extremely elegant linear time init, constant time sampling algorithm to do just this:
(I cannot post a link, so just search for it)
A Linear Algorithm For Generating Random Numbers With a Given Distribution
Michael D. Vose
at web.eecs.utk.edu in the directory /~vose/Publications/random.pdf
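For readers who want a feel for the idea, here is a Python sketch of the alias method as it is commonly described (often attributed to Vose). It is written from the general technique, not transcribed from the paper, and the function names are made up.

```python
import random

def build_alias(weights):
    """O(n) setup: each column i keeps itself with prob[i], else alias[i]."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]   # scaled so the average is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        alias[s] = g                          # g tops up column s
        prob[g] = (prob[g] + prob[s]) - 1.0   # what g has left over
        (small if prob[g] < 1.0 else large).append(g)
    for i in small + large:                   # leftovers are 1 up to rounding
        prob[i] = 1.0
    return prob, alias

def sample_alias(prob, alias):
    """O(1) draw: pick a column uniformly, then flip a biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

weights = [1, 2, 3, 2, 1]
prob, alias = build_alias(weights)
print(sample_alias(prob, alias))
```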
Here's a simple solution. Please note that the numbers to be selected are assumed to be 0 to w.length-1. This can be replaced with an array of length w.length from which the corresponding element is selected.
import java.util.Random;
public class RandGen {
public static void main(String[] args) {
int w[] = { 1, 2, 3, 2, 1 };
int arr[] = gen(w);
for (int i = 0; i < arr.length; i++)
System.out.print(arr[i] + " ");
}
public static int[] gen(int w[]) {
int sum = 0;
for (int i = 0; i < w.length; i++)
sum += w[i];
Random gen = new Random();
int newArr[] = new int[sum];
int index;
//choose based on weight distribution
for (int i = 0; i < sum; i++) {
do {
index = gen.nextInt(w.length);
} while (w[index] < 1);
newArr[i] = index;//can be newArr[i] = elements[index];
w[index]--;
}
return newArr;
}
}
@hrishi can you please explain your comment in detail? I have used the function nextInt with the correct signature, correct?
Hi, please check the second section of the question. You are returning an array (int[]) instead of a single random number based on the weights.
The point is, the return value should be a single integer, and any number of calls can be made to get any number of random numbers, irrespective of the total number of items in the weight array.
A simple solution in python, which iteratively maps the weights to a 0 - 1.0 space, which random.random() is applied to
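import random
def weighted_random(W):
    r = random.random()
    weight_sum = float(sum(W))
    last_theta = 0
    for i, weight in enumerate(W):
        theta = last_theta + (weight / weight_sum)
        if r < theta:  # strict '<' so a zero weight is never picked when r == 0.0
            return i
        last_theta = theta
    return len(W) - 1  # guard against floating-point rounding in the last step
print(weighted_random([1,2,1,2,1]))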
from random import random, randrange, randint
def myrand(den):
    # Rejection sampling: pick a uniform index and accept it with
    # probability den[i] / max(den); otherwise try again.
    m = float(max(den))
    while True:
        i = randrange(len(den))
        if random() < den[i] / m:
            return i
Test
den = [randint(1, 50) for i in range(20)]
print(den)
[4, 40, 14, 45, 2, 36, 49, 44, 35, 13, 11, 42, 9, 18, 1, 36, 26, 44, 10, 11]
myrand(den)
17
I *think* this answers the question. This code will return an int value between 0 and n with the probability of a[n]. I have seen this question elsewhere specified a bit differently. (I think.)
/*
Distribute probabilities with weights read from an array.
Usage:
./a.out ntests array_size array[1] ... array[array_size]
./a.out 100000 6 0 2 3 4 5 6
That will run 100000 tests for a list of 6 numbers with relative probabilities
2 3 4 5 6. (The initial 0 is required.)
What is stored in the array is the sequence of accumulated total probabilities
to that point in the array. The algorithm does a binary search for the input
until
a[mid-1] <= number < a[mid]
OR
a[mid] <= number < a[mid+1]. So the
"array" could be considered a very primitive form of hash.
another example:
./a.out 100000 9 0 2 2 4 10 3 3 10 10
int array[9] = { 0, 2, 4, 8, 18, 21, 24, 34, 44 };
input: 0 2 2 4 10 3 3 10 10
lookup values: 0-1, 2-3 4-7 8-17 18-20 21-23 24-33 34-43 NA
probability : 2/44 2/44 4/44 10/44 3/44 3/44 10/44 10/44 NA
The total is 44 and each number after the initial zero is the weight
assigned to the *preceding* index.
First entry MUST be lowest number you expect to get from random
generator, 0 is not assumed, but if you are going to use '%'
to get your random values, you'll get 0s.
Running the first example:
./a.out 100000 6 0 2 3 4 5 6
probabilities for the usage example are:
1/10 3/20 1/5 1/4 3/10
./a.out 1000000 6 0 2 3 4 5 6
array_size=6 { 0, 2, 5, 9, 14, 20, }
running 1000000 tests with max_number=20
value=0 count=100022
value=1 count=150392
value=2 count=199435
value=3 count=250420
value=4 count=299731
value=5 count=0
for even distribution:
./a.out 1000000 6 0 1 1 1 1 1
Notes:
Doesn't currently deal with probability 0, but that can be handled by
checking for contiguous equivalent sums in the array.
The max_num check is unnecessary here, but left in because that
may not always be the case. Also, the recursion runaway hasn't happened,
but I haven't tested with all possibly weird inputs.
All '0's will give a float exception at the '%'.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
int max_recursion_level;
int find_index( int array[], int number, int start, int end );
int *create_array(int array_size, int argc, char **argv, int *ptr);
int find_index( int array[], int number, int start, int end ) {
int mid;
mid = (end - start)/2;
mid += start;
if (!--max_recursion_level) {
printf("Exceeded max recursion level!\n");
return(-1);
}
if (number < array[mid] ) {
int lbound = mid-1;
if (number >= array[lbound]) { return lbound; }
end = lbound;
} else if ( number == array[mid] ) {
return mid;
} else if ( number > array[mid] ) {
int ubound = mid+1;
if (number < array[ubound]) { return mid; }
if (number == array[ubound] ) { return ubound; }
start = ubound;
}
return find_index( array, number, start, end );
}
int *create_array(int array_size, int argc, char **argv, int *ptr) {
int i, sum;
int *curr;
if (argv != NULL && array_size != argc ) {
printf("Argument count %d is inconsistent with array size %d.\n",argc,array_size);
exit(1);
}
ptr = calloc(sizeof(int),array_size);
if (!ptr) {
printf("calloc() failed.\n");
exit(1);
}
if (argv == NULL) {
return ptr;
}
curr = ptr;
printf("array[%d] = { ",array_size);
sum = 0;
for(i=0; i<array_size; i++) {
sum += atoi(*argv++);
printf("%d%c",sum,i==array_size-1 ? ' ' : ',' );
*curr++ = sum;
}
printf("};\n");
return ptr;
}
int main(int argc, char **argv) {
int ntests;
int value,number;
int *array, *result_array;
int max_number, min_number, array_size;
*argv++;
ntests = atoi(*argv++);
array_size = atoi(*argv++);
argc -= 3;
result_array = create_array(array_size, 0, NULL, result_array);
array = create_array(array_size, argc, argv, array );
min_number = array[0];
max_number = array[array_size-1];
printf("running %d tests with max_number=%d\n",ntests,max_number);
while (ntests--) {
max_recursion_level = 8;
number = rand() % max_number;
if (number >= max_number) {
printf("Number %d out_of_range 0-%d.\n", number,array[array_size-1]);
continue;
}
value = find_index(array, number, 0, array_size );
result_array[value]++;
}
for (value=0; value<array_size; value++) {
printf("value=%d count=%d\n",value,result_array[value]);
}
free(array);
free(result_array);
exit(0);
}
First compute partial sums for w:
partial_sums = [0] * len(w)
partial_sums[0] = w[0]
for i in range(1, len(w)):
    partial_sums[i] = partial_sums[i-1] + w[i]
For nextInt, just generate random number, multiply by partial sum of that length, and go through it until you hit.
from __future__ import division  # must come before any other import
import random
...
# assuming n <= len(w)
def nextInt(n):
    # scale the uniform draw to [0, partial_sums[n-1]) and walk the
    # cumulative sums until one exceeds it
    prob = random.random()*partial_sums[n-1]
    index = 0
    while partial_sums[index] <= prob and index < n-1:
        index += 1
    return index  # the chosen index, not the weight w[index]
Actually there's an even more efficient way to do this such that nextInt is O(log(n)). First, generate cumulative sums. You could make these cumulative probabilities, but that could cause rounding errors. Thus, for [2,3,2,4,5] we have cumulative_sums = [2,5,7,11,16]. Then, consider nextInt(n). Set x = random.random()*cumulative_sums[n-1]. Then do a binary search in the cumulative sums for x to find the first index whose cumulative sum is greater than x, and return that index.
The idea is to create a probability distribution out of the given probability density and then choose the random numbers out of it.
Here is the example. For a given probability density, say Den = [1,2,4,5,1,3], the distribution is the cumulative sum of the array. So, Dist = [1,3,7,12,13,16].
Now, generate a uniform random number between 0 and Dist[n] = 16. The number to be returned is the index of whichever interval the generated random number fell into. Let's say we generated 10. 10 lies in [7,12], so the return value is index(12) = 3.
Here is a sample code:
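The author's code is not included above, so what follows is a minimal Python sketch of the described idea. It is an assumption about what was intended, not the original code, and `make_sampler` is a made-up name. It uses the standard `bisect` module on the cumulative sums.

```python
import bisect
import random
from itertools import accumulate

def make_sampler(den):
    dist = list(accumulate(den))          # cumulative sums of the density
    def sample():
        r = random.randrange(dist[-1])    # integer in [0, dist[-1])
        # index of the first cumulative sum strictly greater than r
        return bisect.bisect_right(dist, r)
    return dist, sample

dist, sample = make_sampler([1, 2, 4, 5, 1, 3])
print(dist)                               # [1, 3, 7, 12, 13, 16]
print(bisect.bisect_right(dist, 10))      # 3
print(sample())
```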
PS: My apologies if this solution has already been proposed. I could not understand some of the code in the comment section - my bad.
- ekalavya April 12, 2013