Google Interview Question
Software Engineer / DevelopersCountry: Israel
Interview Type: In-Person
Shouldn`t we move random p * sum and condition outside the while loop and before the return?
@autoboli, excellent approach to think about simpler problem first, but you don't even have solution for a simpler problem. We don't know number of elements ahead of time, how can we select k?
Regarding replacing - are you sure you are not giving preference based on order of appearance?
@CT: For simple problem the 'k' is number of samples seen so far. One can compute the probability of each sample being replaced after 'k-th' sample appears. It can be proven that for each sample, the probability is equal. Now, the problem is essentially the same except that probability being sampled is driven by weight.
@autoboli, let's just talk about simple version for the time being. It seems to me you are talking about extending selection from k-1 elements to selection from k elements. Each element has 1/k probability to be selected, including the last one, I got this part about the last one, it will be selected with probability 1/k. The other ones will be selected with probability (k-1/k)/(k-1) =1/k, Wow, it works ...
Incredible solution, autoboli. Probabilities are indeed equal. I'd never have thought it myself.
Any link to a theory of why this works? Meaning why samples have equal probability of being chosen (according to the weights) in this approach. I think I grasp how it works, but not quite why. Any theorem, rule, or Wikipedia article?
@Valoris: Samples with higher weights have better chances of being chosen, now consider a sample with some weight, if it's near the beginning, chances of being chosen is high, but chances of being replaced is high as well. if it's near the end, chances of being chosen is low, but chances of being replaced is low as well.
Bottom line, no matter where a sample is, it's chances of being chosen is proportional to its weight
I created a document which mathematically proves the answer given by autoboli is correct, If interested, please email me (koosha.nejad@gmail.com) and I'll send it to you.
@valoris, I put some proof sketch below the code above. I hope it will help to understand the concept :)
Currently the getRandom() function is O(n). We can reduce it to O(log n) if we sort the tuples in order of non-decreasing weights and then do binary search.
import java.util.Scanner;
import java.util.List;
import java.util.LinkedList;
class WeightedProbability {
public static void main (String [] args) {
Scanner s = new Scanner(System.in);
List<Tuple> tuples = new LinkedList<Tuple>();
int totalWeight = 0;
Tuple tuple;
do {
tuple = readTuple(s);
totalWeight += tuple.weight;
tuples.add(tuple);
} while (!tuple.isLastTuple());
// Remove the last tuple from the set
for (int i = 0; i < 100; i++) {
getRandom(tuples, totalWeight);
}
}
public static Tuple readTuple(Scanner s) {
Tuple t = new Tuple();
t.num = s.nextInt();
t.weight = s.nextInt();
return t;
}
public static int getRandom(List<Tuple> tuples, int totalWeight) {
int rand = (int)(Math.random() * totalWeight);
System.out.println("rand = " + rand);
int i = 0;
Tuple t = null;
while (rand >= 0) {
t = tuples.get(i);
rand -= t.weight;
i++;
}
System.out.println("Number = " + t.num);
return t.num;
}
}
class Tuple {
int num;
int weight;
boolean isLastTuple() {
if (num == 0 && weight == 0) {
return true;
}
return false;
}
}
Keep track of the total_weight;
At each iteration (id,weight)
• total_weight += weight
• x = generate a random number between 1 and total_weight
• if (total_weight - x < weight ) => selected_id = id
Trace:
1. (1,2) => with 2/2 chance selected_id = 1
2. (2,2) => with 2/4 chance selected_id = 2
3. (3,4) => with 4/8 chance selected_id = 3 => with 4/8 chance selected_id = 2
4. (4,10) => with 10/18 chance selected_id = 4 => with 8/18 chance selected_id = 2
Chances of "selected_id = 2" at the end: (2/4)*(4/8)*(8/18)=1/9
Assume the input from [0,W0], [1,W1],...[n,Wn], the selected index is i. Let totalWeight=sum(W0,...,Wn).
Now consider a new input [n+1,Wn+1], the probability to keep index I is totalWeight / (totalWeight + Wn+1), and probbability to have a new index n+1 is Wn+1 / (totalWeight + Wn+1).
See my analysis and code: allenlipeng47.com/PersonalPage/index/view/127/nkey
package feb;
public class ReturnWithProbabilityProportionalToItsWeight {
public static void main(String[] args) {
int[][] input = { { 1, 4 }, { 2, 2 }, { 3, 2 }, { 0, 0 } };
int result = getRandom(input);
System.out.println(result);
/*
* Following runs for 1000 times to check the answer correctness.
*
* int[] test = new int[3]; for (int i = 0; i < 1000; i++) {
* test[getRandom(input)]++; } for (int i = 0; i < test.length; i++) {
* System.out.print(test[i] + "\t"); }
*/
}
public static int getRandom(int[][] input) {
double totalWeight = input[0][1];
int result = 0;
for (int i = 1; i < input.length; i++) {
if (input[i][0] == 0 && input[i][1] == 0) {
break;
}
double newTotalWeight = totalWeight + input[i][1];
double newChance = ((double) input[i][1]) / newTotalWeight;
if (Math.random() <= newChance) {
result = i;
}
totalWeight = newTotalWeight;
}
return result;
}
}
Considering that we are allowed to modify the given list of pairs, then we can store the cumulative sum in the list itself.
Let list[i] store the cumulative sum till index i.
The weight of the given index can be retrieved as list[i] - list[i-1].
Once we have stored the cumulative sum (say, totalSum), we may call the random function as rand()*totalSum. As the cumulative sum array is an nondecreasing list we can do a binary search on the list to obtain the key corresponding to "rand()*totalSum".
Time complexity : O(n) to traverse the list + O(log n) to do the binary search.
Correct me if i am wrong.
class Program
{
static void Main(string[] args)
{
int size = Convert.ToInt32(Console.ReadLine());
int[] a = new int[size];
for (int i = 0; i < a.Length; i++)
{
a[i] = Convert.ToInt32(Console.ReadLine());
}
int unique = 0;
int x = a[0];
for (int i = 1; i < a.Length; i++)
x ^= a[i];
for (int i = 0; i < a.Length; i++)
{
if (Convert.ToBoolean(x) & Convert.ToBoolean(a[i]))
unique ^= a[i];
}
Console.WriteLine(unique);
}
}
class Program
{
static void Main(string[] args)
{
int size = Convert.ToInt32(Console.ReadLine());
int[] a = new int[size];
for (int i = 0; i < a.Length; i++)
{
a[i] = Convert.ToInt32(Console.ReadLine());
}
int unique = 0;
int x = a[0];
for (int i = 1; i < a.Length; i++)
x ^= a[i];
for (int i = 0; i < a.Length; i++)
{
if (Convert.ToBoolean(x) & Convert.ToBoolean(a[i]))
unique ^= a[i];
}
Console.WriteLine(unique);
}
}
If there was no weight the problem would be to sample from a stream with equal probability: Sample the first input from the stream. Given the k-th input from the stream, generate random number p=1..k, if p=1 replace the sample.
Now, the problem is that selection is no longer uniform but driven by 'weight'. One can use very similar approach: (i) cumulate the weights seen so far and (ii) decide wheather to replace based on current weight and cumulated value.
A sample code is shown below:
EDIT:
- autoboli January 27, 2015As some people requested, here is a proof of the simpler problem:
The goal is to find a random sample from the stream. After receiving the k-th input, the sample will be replaced with probability 1/k with this new input. Let's compute a probability that the first input is still the selected ranodm sample after k iterations. Several consecutive events must happen: (i) The first input is selected as a sample and (ii) the sample is never replaced afterwards.
The first input is selected with probability p1 = 1/k = 1/1, the j-th input does not replace the sample with probability not_pj = (j-1)/j. Hence, the probability of the first input is a sample after k-th iteration can be written as:
p1_isSample = p1 * not_p2 * not_p3 ... * not_pk= 1/1 * 1/2 * 2/3 ... (k-1)/k = 1/k
Notice that this result holds for the second, third, and any other n-th element
pn_isSample = 1/n * n/(n+1) * (n+1)/(n+2) ... (k-1)/k = 1/k
hence, each element can be selected as the sample with equal probability of 1/k.
Q.E.D.
This proof can be now generalized to the non-uniform distribution case with weights.