Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
What if N=2*i? There is a case that 2 slots with i elements in each and you are counting this case twice. What if N=k*i?
I'm not sure what's posted is correct so far.
Looks like what joe kid posted gives the probability of this outcome:
(1) Exactly i items will land in one particular bucket b. All other N-i items land in any of the other buckets. To me, this seems incomplete because it considers just one special bucket b.
As a correction, Charlie suggests multiplication by M to make this the probability of exactly i items landing in *any* bucket. This is using the assumption that the outcomes defined by (1) are mutually exclusive for each bucket b, because you would need that assumption to use the sum all M of them to get the probability of any of them occuring. However, they are not mutually exclusive. You can have i in one bucket and also have i land in another bucket by chance.
Here is the only correct answer on the page, you can find the details in my blog post: goo.gl/Ex9an1
private static double calculateKcollisions(int m, int n, int k) {
double probability = 0;
int sign = 1;
for (int i = 1; n - i * k >= 0 && i <= m; i++) {
double p = binomial(m, i) * pow(1.0 / m, i * k);
p *= pow((double) (m - i) / m, n - i * k);
for (int j = 0; j < i; j++) {
p *= binomial(n - j * k, k);
}
probability += sign * p;
sign *= -1;
}
return probability;
}
Base case, i=0.
Probability of an element hashing in a slot = 1/M
Probability of an element not hashing in a particular slot (ending in remaining M-1 slots) = (M-1)/M
All N elements not hashing in that slot = (M-1)/M * (M-1)/M * (M-1)/M ...
= ((M-1)/M)^N
Let us take the case of i=1.
First element can hash at any possible slot, probability = 1.
Now, probability of remaining N-1 elements not hashing in that particular slot = ((M-1)/M)^(N-1)
Let us take the case of i=2.
First element can hash at any possible slot, probability = 1.
Probability of second element going into same slot = 1/M
Now, probability of remaining N-2 elements not hashing in that particular slot = ((M-1)/M)^(N-2)
Probability of a slot having 2 elements = (1/M)*((M-1)/M)^(N-2)
To make it generic:
P(i) = ((M-1)/M)^N, when i=0
P(i) = (1/M)^(i-1) * ((M-1)/M)^N, otherwise
I'm afraid it's wrong. You are calculating the probability that a particular slot has i elements, not "the probability that there will be a slot with i elements".
This is wrong. The above formula is for question "what is the probability that slot x(i.e. a particular slot) has i elements".
The question is asking what is the probability that THERE WILL BE A SLOT with i elements.
Do you mean (1/M)^i (1-1/M)^(N-i) ?
For example, say i=2 and suppose you have keys labeled K0, K1, K2... This would be the probability that a certain pair of keys, say K1 and K2 for example land in a particular spot (while all other keys land somewhere else). joe kid multilplies the above expression by N choose i and gets (for this example) the probably any possible pair landing in a particular bucket.
This is statistics math problem.
I would think it this way.
(1)P(there is one slot with i elements) = 1- P(there is no slot with i elements)
(2)P(there is no slot with i elements) = P(the first slot does not have i elements)*P(the second slot does not have i elements)*...*P(the last slot does not have i elements)
Because the probability of any slot does not have i elements is the same, we get
(2*) P(there is no slot with i elements) = P(a slot does not have i elements)^N
(3) P(a slot does not have i elements) = 1- P(a slot has i elements)
(4) P(a slot has i elements) = [(1/N)^i] * [(1-1/N)^(N-i)]
Use (1), (2*), (3), (4) to solve the problem.
# of ways to drop N samples into M slots = M^N (denote this as denominator).
binding i samples together, dropping the bundle into one of M slots, and dropping the other samples into the remaining M-1 slots, the number of ways to do this = C(N, i) * M * (M-1) ^ (N-i) (denote this as numerator).
The probability is therefore numerator / denominator.
The probability that a particular slot has i elements = (1/M)^i * ((M-1)/M)^(N-i)
No. of ways that slot can be chosen = C(M,1) = M
Therefore probability that a slot has i elements = M * (1/M)^i * ((M-1)/M)^(N-i)
Note: There would be no C(N,i) because we do not get to choose which i elements go to that slot. It is already pre-decided by the hash function.
Probability of an event = No of occurences of an event / All possible event occurences
Total No of ways to put N objects in M slots = M^N
Total No of ways such that there is a slot with i objects = No of ways to put i objects in a slot * no of ways to put N-i objects in M-1 slots
No of ways to put i objects in 'a' slot = choose a slot * choose i objects * 1^i (since there is a singel way to put i objects into a single chosen slot)
No of ways to put i objects in 'a' slot = MC1 * NCi * 1
No of ways to put N-i objects in M-1 slots = (M-1) ^ (N-i)
Probability = MC1 * NCi * (M-1)^(N-i) / (M^N)
but I am worried about how to eliminate the duplicates.
Let any one of m slots contains exactly i element and remaining n-i elements distributed among rest (m-i) slots =
C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) ..1
But above formula counts multiple time if there can be more than 1 slot with i elements. Probability that more than 1 slot with i element =
Summation over k = 2 to [n/i], (k-1)*C(m, k)*C(n, k*i)* C(k*i, i)*C((k-1)*i, i)*...C(i,i) *((1/m)^k*i)*((1-1/m)^(n-k*i)) ..2
Substract ..2 from ..1, we get the answer:
C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) -
Summation over k = 2 to [n/i], (k-1)*C(m, k)*C(n, k*i)* C(k*i, i)*C((k-1)*i, i)*...C(i,i) *((1/m)^k*i)*((1-1/m)^(n-k*i))
Edit int
Let any one of m slots contains exactly i element and remaining n-i elements distributed among rest (m-i) slots =
C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) ..1
But above formula counts multiple time if there can be more than 1 slot with i elements. Probability that more than 1 slot with i element =
Summation over k = 1 to [n/2i], C(m, 2*k)*C(n, 2*k*i)* C(2*k*i, i)*C((2*k-1)*i, i)*...C(i,i) *((1/m)^2*k*i)*((1-1/m)^(n-2*k*i)) ..2
//Prob of odd no. of slots with i elements cancels out, so summation over 2k number of slots with i elements included
Substract ..2 from ..1, we get the answer:
C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) -
Summation over k = 1 to [n/2i], C(m, 2*k)*C(n, 2*k*i)* C(2*k*i, i)*C((2*k-1)*i, i)*...C(i,i) *((1/m)^2*k*i)*((1-1/m)^(n-2*k*i))
This follows the binomial distribution with parameters (N, 1/M) and so P(X = i) = N Ci (1/M)^i(M-1/M)N-i. This is because all M slots were equally probable. If the probabilities of falling into a slot varies, that would be a multinomial distribution(with different probabilities for individual trial for each slot)
My attempt, feels right but probably wrong.
What is the probability that there will be at least 1 of m slots with (exactly) k elements in a hash table with n inserted elements.
Assuming k < n.
Total ways to bucket n elements into m slots = m^n
Pr [ the first slot has exactly k elements ] =
= [number of ways n-k elements are in one of the other m - 1 buckets (which effectively enforces that k elements are in the first bucket)] / [number of ways n elements are in m slots]
= (m-1)^(n-k) / (m^n)
Pr [ exactly 0 slots have exactly k elements]
= product over i=1...m of Pr [ith slot does not have k elements]
= product over i=1...m {1 - Pr [ith slot has exactly k elements]}
= {1 - [(m-1)^(n-k) / m^n]}^m
Pr [ > 0 slots have exactly k elements] = 1 - Pr [ 0 slots have exactly k elements] = 1 - {1 - [(m-1)^(n-k) / m^n]}^m
This is a problem of poisson approximation to the binomial distribution.
Total number of keys = N
In an average if two keys hash to the same bucket, then N = 2M
P(i) ==> Probability that i keys collide
P(i) = e ^ (-lambda) * (lamda^i) / i!
where lambda = np
where n = total number of keys = 2M
where p = probability that a bucket is chosen = 1/M
therefore lambda = 2.
Hence p(i) = e^-2 * (2^i)/i!
The solutions using Bernoulli trial are not correct, because they count multiple occurrence of i multiple times. They are quite close though.
Let me introduce dynamic programming that may be helpful with this problem. Hopefully there are no bugs, but if they are, feel free to raise your hands.
Let P(M,N) be a function that counts probability that within N fields, there will be at least one with exactly i matches from the N numbers hashed inside.
P(M,n < i) =0
P(0,i) =0
P(M,N) = sum (for x <- 0..N:
if( x==i) :"probability that exactly x integers matches into Nth slot"
else: "probability that exactly x integers matches into Nth slot" * P(M-1,N-x)
)
The probability in quotes is (1/M)^x * ( (M-1)/M ^ (N-x) ) * (n choose x)
We first calculate the probability P that none of the slots have i elements, then the answer should be 1 - P.
Calculating P is a traditional interview question:
find all possible M non-negative (<=N) numbers such that their sum is N and none of them == i.
This can be done by a recursion algorithm.
std::vector<std::size_t> v;
v.reserve(M);
double g() {
return (1/M)^N;
}
void f(std::vector<std::size_t> &v, int i, double &p, int sum) {
assert(i<=M);
if (i == M) {
p += g();
return;
}
for (int j=0; j<N; ++j) {
if (sum + j < N) {
v[i] = j;
f(i+1, v, p, sum+j);
}
}
}
f(0, v, 0, 0);
Isn't the question "what the probability that there's at least one slot with i elements" ?
Let
N = 2
M = 4
Then….
0 0 0 2
0 0 2 0
0 2 0 0
2 0 0 0
0 0 1 1
0 1 0 1
1 0 0 1
0 1 1 0
1 0 1 0
1 1 0 0
=> Total 10 cases
(now......... "i" can be anywhere from 0 to N)
Assume i = 0?
100%: all cases have at least one slot with 0 element
Assume i = 1?
60%: 6 out of 10 cases have at least one slot with 1 element
Assume i = 2?
40%
A typo in my above anwser:
(2*) P(there is no slot with i elements) = P(a slot does not have i elements)^M
(4) P(a slot has i elements) = [(1/M)^i] * [(1-1/M)^(N-i)]
If M is 1, then (1-1/M)^(N-i) becomes 0, so the probability becomes 0, when it should be 1.
Alex, lets say we have only one slot: M = 1. The values are always hashed in the same slot. So, we have only 2 possible results: probability = 1 if i = N and 0 otherwise.
N = i, leads to (1-1/M)^(N-i) = 0^0 = 1 by convention
N != i leads to 0
the formula gives the expected values
I believe your answer is correct, except for one small thing in Step 4. The expression should be multiplied by c(N, i).
@Zoro, can you explain how did you reach the formula in step 4? I think, it should be (1/M)^(i-1) in the first part. See explanation below.
Let's re-sentence (2*) (3) (4) to avoid confusion here:
(2*) P(there is no slot with i elements) = P(the first slot does not have i elements)^M
(3) P(the first slot does not have i elements) = 1- P(the first slot has i elements)
(4) P(the first slot has i elements) = [(1/N)^i] * [(1-1/N)^(N-i)]
The reasoning starts from (1) and goes to (4), to solve the problem, solve the equations in reverse order to solve (1).
(1)P(there is one slot with i elements) = 1- P(there is no slot with i elements)
(2)P(there is no slot with i elements) = P(the first slot does not have i elements)*P(the second slot does not have i elements)*...*P(the last slot does not have i elements)
(2)->(2*) because the slots have equal chance of having any element fall into it
(2*) P(there is no slot with i elements) = P(the first slot does not have i elements)^M
(3) P(the first slot does not have i elements) = 1- P(the first slot has i elements)
(4) P(the first slot has i elements) = [(1/M)^i] * [(1-1/M)^(N-i)]
I just realize that Anish made a great point. To have i elements in the first slot, you are freely choosing any i from the N, therefore there's cases, each with the same probability.
(4) P(the first slot has i elements) = [(1/M)^i] * [(1-1/M)^(N-i)] * C(N,i)
step 1. (1)P(there is one slot with i elements) = 1- P(there is no slot with i elements)
should it be like (1)P(there is one slot with i elements) = 1- P(there is no slot with i elements) - P(there is two slots with i elements) ...etc?
Probability of 1 slot having i elements P(1)=
(1/M)^i * ((M-1)/M)^N-i
Probability of 2 slots having i elements P(2)=
(1/M)^2i * ((M-1)/M)^N-2i
Probability of 3 slots having i elements P(3)=
(1/M)^3i * ((M-1)/M)^N-3i
...
and so on, so the probability of having A slot with i elements is: P(1)+P(2)+....+P(M)
but I am not sure of a formula to generalize this equation.
Why not to consider this as Bernoulli trial, where:
- joe kidd August 19, 20131/M - probability of single success (that a value goes to a single slot)
i - expected numbers of success (the success = the value goes exactly to the same slot)
N - numbers of tries
1 - 1/M - a probability of failure:
So then having a Bernoulli trial formula where (N i) stands for binomial coefficient:
P(i)=(N i) * (1/M)^i * (1 - 1/M) ^(N - i)