## Google Interview Question

Software Engineer / Developers**Country:**United States

**Interview Type:**In-Person

What if N=2*i? There is a case that 2 slots with i elements in each and you are counting this case twice. What if N=k*i?

I'm not sure what's posted is correct so far.

Looks like what joe kid posted gives the probability of this outcome:

(1) Exactly i items will land in one particular bucket b. All other N-i items land in any of the other buckets. To me, this seems incomplete because it considers just one special bucket b.

As a correction, Charlie suggests multiplication by M to make this the probability of exactly i items landing in *any* bucket. This is using the assumption that the outcomes defined by (1) are mutually exclusive for each bucket b, because you would need that assumption to use the sum all M of them to get the probability of any of them occuring. However, they are not mutually exclusive. You can have i in one bucket and also have i land in another bucket by chance.

Base case, i=0.

Probability of an element hashing in a slot = 1/M

Probability of an element not hashing in a particular slot (ending in remaining M-1 slots) = (M-1)/M

All N elements not hashing in that slot = (M-1)/M * (M-1)/M * (M-1)/M ...

= ((M-1)/M)^N

Let us take the case of i=1.

First element can hash at any possible slot, probability = 1.

Now, probability of remaining N-1 elements not hashing in that particular slot = ((M-1)/M)^(N-1)

Let us take the case of i=2.

First element can hash at any possible slot, probability = 1.

Probability of second element going into same slot = 1/M

Now, probability of remaining N-2 elements not hashing in that particular slot = ((M-1)/M)^(N-2)

Probability of a slot having 2 elements = (1/M)*((M-1)/M)^(N-2)

To make it generic:

P(i) = ((M-1)/M)^N, when i=0

P(i) = (1/M)^(i-1) * ((M-1)/M)^N, otherwise

I'm afraid it's wrong. You are calculating the probability that a particular slot has i elements, not "the probability that there will be a slot with i elements".

Here is the only correct answer on the page, you can find the details in my blog post: goo.gl/Ex9an1

```
private static double calculateKcollisions(int m, int n, int k) {
double probability = 0;
int sign = 1;
for (int i = 1; n - i * k >= 0 && i <= m; i++) {
double p = binomial(m, i) * pow(1.0 / m, i * k);
p *= pow((double) (m - i) / m, n - i * k);
for (int j = 0; j < i; j++) {
p *= binomial(n - j * k, k);
}
probability += sign * p;
sign *= -1;
}
return probability;
}
```

This is wrong. The above formula is for question "what is the probability that slot x(i.e. a particular slot) has i elements".

The question is asking what is the probability that THERE WILL BE A SLOT with i elements.

Do you mean (1/M)^i (1-1/M)^(N-i) ?

For example, say i=2 and suppose you have keys labeled K0, K1, K2... This would be the probability that a certain pair of keys, say K1 and K2 for example land in a particular spot (while all other keys land somewhere else). joe kid multilplies the above expression by N choose i and gets (for this example) the probably any possible pair landing in a particular bucket.

This is statistics math problem.

I would think it this way.

(1)P(there is one slot with i elements) = 1- P(there is no slot with i elements)

(2)P(there is no slot with i elements) = P(the first slot does not have i elements)*P(the second slot does not have i elements)*...*P(the last slot does not have i elements)

Because the probability of any slot does not have i elements is the same, we get

(2*) P(there is no slot with i elements) = P(a slot does not have i elements)^N

(3) P(a slot does not have i elements) = 1- P(a slot has i elements)

(4) P(a slot has i elements) = [(1/N)^i] * [(1-1/N)^(N-i)]

Use (1), (2*), (3), (4) to solve the problem.

# of ways to drop N samples into M slots = M^N (denote this as denominator).

binding i samples together, dropping the bundle into one of M slots, and dropping the other samples into the remaining M-1 slots, the number of ways to do this = C(N, i) * M * (M-1) ^ (N-i) (denote this as numerator).

The probability is therefore numerator / denominator.

The probability that a particular slot has i elements = (1/M)^i * ((M-1)/M)^(N-i)

No. of ways that slot can be chosen = C(M,1) = M

Therefore probability that a slot has i elements = M * (1/M)^i * ((M-1)/M)^(N-i)

Note: There would be no C(N,i) because we do not get to choose which i elements go to that slot. It is already pre-decided by the hash function.

Probability of an event = No of occurences of an event / All possible event occurences

Total No of ways to put N objects in M slots = M^N

Total No of ways such that there is a slot with i objects = No of ways to put i objects in a slot * no of ways to put N-i objects in M-1 slots

No of ways to put i objects in 'a' slot = choose a slot * choose i objects * 1^i (since there is a singel way to put i objects into a single chosen slot)

No of ways to put i objects in 'a' slot = MC1 * NCi * 1

No of ways to put N-i objects in M-1 slots = (M-1) ^ (N-i)

Probability = MC1 * NCi * (M-1)^(N-i) / (M^N)

but I am worried about how to eliminate the duplicates.

Let any one of m slots contains exactly i element and remaining n-i elements distributed among rest (m-i) slots =

C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) ..1

But above formula counts multiple time if there can be more than 1 slot with i elements. Probability that more than 1 slot with i element =

Summation over k = 2 to [n/i], (k-1)*C(m, k)*C(n, k*i)* C(k*i, i)*C((k-1)*i, i)*...C(i,i) *((1/m)^k*i)*((1-1/m)^(n-k*i)) ..2

Substract ..2 from ..1, we get the answer:

C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) -

Summation over k = 2 to [n/i], (k-1)*C(m, k)*C(n, k*i)* C(k*i, i)*C((k-1)*i, i)*...C(i,i) *((1/m)^k*i)*((1-1/m)^(n-k*i))

Edit int

Let any one of m slots contains exactly i element and remaining n-i elements distributed among rest (m-i) slots =

C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) ..1

But above formula counts multiple time if there can be more than 1 slot with i elements. Probability that more than 1 slot with i element =

Summation over k = 1 to [n/2i], C(m, 2*k)*C(n, 2*k*i)* C(2*k*i, i)*C((2*k-1)*i, i)*...C(i,i) *((1/m)^2*k*i)*((1-1/m)^(n-2*k*i)) ..2

//Prob of odd no. of slots with i elements cancels out, so summation over 2k number of slots with i elements included

Substract ..2 from ..1, we get the answer:

C(m,1)*C(n,i)*((1/m)^i)*((1-1/m)^n-1) -

Summation over k = 1 to [n/2i], C(m, 2*k)*C(n, 2*k*i)* C(2*k*i, i)*C((2*k-1)*i, i)*...C(i,i) *((1/m)^2*k*i)*((1-1/m)^(n-2*k*i))

This follows the binomial distribution with parameters (N, 1/M) and so P(X = i) = N Ci (1/M)^i(M-1/M)N-i. This is because all M slots were equally probable. If the probabilities of falling into a slot varies, that would be a multinomial distribution(with different probabilities for individual trial for each slot)

My attempt, feels right but probably wrong.

What is the probability that there will be at least 1 of m slots with (exactly) k elements in a hash table with n inserted elements.

Assuming k < n.

Total ways to bucket n elements into m slots = m^n

Pr [ the first slot has exactly k elements ] =

= [number of ways n-k elements are in one of the other m - 1 buckets (which effectively enforces that k elements are in the first bucket)] / [number of ways n elements are in m slots]

= (m-1)^(n-k) / (m^n)

Pr [ exactly 0 slots have exactly k elements]

= product over i=1...m of Pr [ith slot does not have k elements]

= product over i=1...m {1 - Pr [ith slot has exactly k elements]}

= {1 - [(m-1)^(n-k) / m^n]}^m

Pr [ > 0 slots have exactly k elements] = 1 - Pr [ 0 slots have exactly k elements] = 1 - {1 - [(m-1)^(n-k) / m^n]}^m

This is a problem of poisson approximation to the binomial distribution.

Total number of keys = N

In an average if two keys hash to the same bucket, then N = 2M

P(i) ==> Probability that i keys collide

P(i) = e ^ (-lambda) * (lamda^i) / i!

where lambda = np

where n = total number of keys = 2M

where p = probability that a bucket is chosen = 1/M

therefore lambda = 2.

Hence p(i) = e^-2 * (2^i)/i!

A typo in my above anwser:

(2*) P(there is no slot with i elements) = P(a slot does not have i elements)^M

(4) P(a slot has i elements) = [(1/M)^i] * [(1-1/M)^(N-i)]

If M is 1, then (1-1/M)^(N-i) becomes 0, so the probability becomes 0, when it should be 1.

Alex, lets say we have only one slot: M = 1. The values are always hashed in the same slot. So, we have only 2 possible results: probability = 1 if i = N and 0 otherwise.

N = i, leads to (1-1/M)^(N-i) = 0^0 = 1 by convention

N != i leads to 0

the formula gives the expected values

I believe your answer is correct, except for one small thing in Step 4. The expression should be multiplied by c(N, i).

@Zoro, can you explain how did you reach the formula in step 4? I think, it should be (1/M)^(i-1) in the first part. See explanation below.

Let's re-sentence (2*) (3) (4) to avoid confusion here:

(2*) P(there is no slot with i elements) = P(the first slot does not have i elements)^M

(3) P(the first slot does not have i elements) = 1- P(the first slot has i elements)

(4) P(the first slot has i elements) = [(1/N)^i] * [(1-1/N)^(N-i)]

The reasoning starts from (1) and goes to (4), to solve the problem, solve the equations in reverse order to solve (1).

(1)P(there is one slot with i elements) = 1- P(there is no slot with i elements)

(2)P(there is no slot with i elements) = P(the first slot does not have i elements)*P(the second slot does not have i elements)*...*P(the last slot does not have i elements)

(2)->(2*) because the slots have equal chance of having any element fall into it

(2*) P(there is no slot with i elements) = P(the first slot does not have i elements)^M

(3) P(the first slot does not have i elements) = 1- P(the first slot has i elements)

(4) P(the first slot has i elements) = [(1/M)^i] * [(1-1/M)^(N-i)]

I just realize that Anish made a great point. To have i elements in the first slot, you are freely choosing any i from the N, therefore there's cases, each with the same probability.

(4) P(the first slot has i elements) = [(1/M)^i] * [(1-1/M)^(N-i)] * C(N,i)

step 1. (1)P(there is one slot with i elements) = 1- P(there is no slot with i elements)

should it be like (1)P(there is one slot with i elements) = 1- P(there is no slot with i elements) - P(there is two slots with i elements) ...etc?

Probability of 1 slot having i elements P(1)=

`(1/M)^i * ((M-1)/M)^N-i`

Probability of 2 slots having i elements P(2)=

`(1/M)^2i * ((M-1)/M)^N-2i`

Probability of 3 slots having i elements P(3)=

`(1/M)^3i * ((M-1)/M)^N-3i`

...

and so on, so the probability of having A slot with i elements is: P(1)+P(2)+....+P(M)

but I am not sure of a formula to generalize this equation.

Why not to consider this as Bernoulli trial, where:

- joe kidd August 19, 20131/M - probability of single success (that a value goes to a single slot)

i - expected numbers of success (the success = the value goes exactly to the same slot)

N - numbers of tries

1 - 1/M - a probability of failure:

So then having a Bernoulli trial formula where (N i) stands for binomial coefficient:

P(i)=(N i) * (1/M)^i * (1 - 1/M) ^(N - i)