Goldman Sachs Interview Question
Position: Analyst
Team: Strats division
Country: United States
Interview Type: Phone Interview
In the case without replacement, every draw reduces the size of the set by one. As a result, we draw from n, then n-1, n-2, n-3, ..., i.e., the probability of any particular ordering of all the draws is 1/n!.
Suppose you pick m.
Let X be the number of attempts needed to pick a smaller number.
X = 1 + Z(m+1) + Z(m+2) + ... + Z(n), where Z(j) = 1 if j is picked before any number smaller than m, and 0 otherwise.
E(X) = 1 + (n-m)*E(Z), since all the E(Z(j)) are equal by symmetry.
Also, E(Z) = 1/((m-1)+1) = 1/m, since each larger number is equally likely to fall anywhere among itself and the m-1 smaller numbers.
Therefore E(X) = n/m.
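If it helps, the n/m result can be brute-force checked for small n by enumerating all orders of the remaining chits (a quick Python sketch; the function name is mine):

```python
from fractions import Fraction
from itertools import permutations

def exact_expected_picks(n, m):
    # Exact E(X): average, over all orders of the remaining n-1 chits,
    # of the position of the first number smaller than m.
    rest = [v for v in range(1, n + 1) if v != m]
    total, count = Fraction(0), 0
    for perm in permutations(rest):
        pos = next(i for i, v in enumerate(perm, 1) if v < m)
        total += pos
        count += 1
    return total / count

print(exact_expected_picks(5, 3))  # 5/3, i.e. n/m
```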
Also, your answer for the 2nd case is smaller than for the 1st case, which makes sense: the numbers greater than m get removed here, so the picks needed are reduced.
The number you pick, m, is still a random variable. What you gave is n/(m-1) = E(y | m), where y is the number of picks needed to find a number smaller than m. Ultimately, we need to find E(y) = E(E(y|m)) = E(n/(m-1)). Since m ranges from 2 to n, each with probability 1/n (excluding m = 1, which can never succeed), we get E(n/(m-1)) = sum over m = 2 to n of (n/(m-1)) * (1/n) = 1 + 1/2 + 1/3 + ... + 1/(n-1), which is a partial sum of the harmonic series.
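That partial harmonic sum is easy to check exactly for a small n, e.g. n = 4:

```python
from fractions import Fraction

n = 4
# E(n/(m-1)) over m = 2..n, each with probability 1/n, as above
e = sum(Fraction(n, m - 1) * Fraction(1, n) for m in range(2, n + 1))
print(e)  # 11/6 = 1 + 1/2 + 1/3
```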
For the second case, it gets messier... hard to find a closed form.
can't believe this is a phone interview.
I have solved this problem partially but I am unable to continue at this point:
case 1:
There are 2 approaches: logical and recursive.
If you pick a number i, the expected number of picks to find a number smaller than the one you have is:
Exp(num<i) = 1*(i-1)/(n-1) + 2*(n-i)(i-1)/(n-1)^2 + 3*(n-i)^2(i-1)/(n-1)^3 + ...
which reduces to Exp(num<i) = (n-1)/(i-1).
Also, recursively:
E(i) = 1*(i-1)/(n-1) + (E(i)+1)*(n-i)/(n-1)
When you solve it, you get
E(i) = (n-1)/(i-1)
Now, this is where I get stuck.
To find the final expectation:
E = (1/n)*sum(E(i)) for i=1:n = infinite :/
And if you exclude the case of picking up 1, you get a harmonic sum. So, what do I do next?
case 2:
This doesn't lead to infinity, but it poses a logical question: if you pick the number 1, then even if you look through all of the remaining n-1 chits, there is nothing you can find smaller than 1, so it's a failure. Meaning you never find it, not that you find it after n-1 picks, so we can't assign it any finite number of picks.
So what do I do for that??
Continuing: if I exclude 1 and try to solve it directly (since I couldn't get the recursion formula to work):
If you pick:
n -> 1
n-1 -> 1*(n-2)/(n-1) + 2*1/(n-1)
n-2 -> 1*(n-3)/(n-1) + 2*2(n-3)/((n-1)(n-2)) + 3*2/((n-1)(n-2))
and so on
2 -> 1*1/(n-1) + 2*(n-2)*1/((n-1)(n-2)) + 3*(n-2)(n-3)*1/((n-1)(n-2)(n-3)) + ...
How do I simplify it?
It's about 2.
The expectation is
1*(K-1)/(n-1) + 2*((n-K)/(n-1))*((K-1)/(n-1)) + 3*((n-K)/(n-1))^2*((K-1)/(n-1)) + 4*...
Because K is randomly chosen, K's expectation is about n/2,
so (n-K)/(n-1) and (K-1)/(n-1) are both roughly 1/2.
The above expression becomes the series f:
1*(1/2) + 2*(1/2)^2 + 3*(1/2)^3 + ...
We can solve it by setting f = x; then 2x = 1*1 + 2*(1/2) + 3*(1/2)^2 + ...
2x - x = x = 1 + 1*(1/2) + 1*(1/2)^2 + 1*(1/2)^3 + ... = 1 + 1 = 2
x = 2
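The series sum_{k>=1} k*(1/2)^k can also be checked numerically; its partial sums converge to 2:

```python
# Partial sums of sum_{k>=1} k * (1/2)^k; the tail beyond k=59 is negligible
s = sum(k * 0.5**k for k in range(1, 60))
print(round(s, 10))  # 2.0
```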
Nice solution. Your answer is almost correct. It is not trivial to find the exact value of the expectation, especially for the case with replacement.
There is a minor flaw (in my opinion) in your argument. Mathematically speaking, the problem involves choosing a random number m and then finding the number of picks that follow until we obtain a number less than m, say k picks. For each m, we have a specific distribution for k. The answer, then, is to calculate E[k|m] for fixed m, and then take the expected value of the result with respect to m.
You are instead finding E[k | E[m]], which is different from E[E[k|m]].
Another perspective: the expression you have above is nonlinear in K, which is another reason you shouldn't just plug in K = n/2 and solve.
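A tiny numeric illustration of the difference, using the with-replacement model with m uniform on 2..n (the toy numbers are mine):

```python
from fractions import Fraction

n = 10
# E[E[k|m]]: average n/(m-1) over m = 2..n
outer = sum(Fraction(n, m - 1) for m in range(2, n + 1)) / (n - 1)
# E[k|E[m]]: plug the average m = (n+2)/2 straight into n/(m-1)
plug_in = Fraction(n, 1) / (Fraction(n + 2, 2) - 1)
print(outer, plug_in)  # outer > plug_in (Jensen's inequality)
```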
Short answer: 2 + O(1 / n).
Reason: Note that picking a larger number is as probable as picking a smaller number. For large n, picking the same number has a chance of 1 / n. So ignore it. Then, at each pick you might pick a larger number with probability 1 / 2 and a smaller number with probability 1 / 2. Therefore, the average number of picks in both cases is almost 2.
This approximation is shaky if n is not large, e.g., n = 2, 3, 4.
NOTE: Also, there is a technical problem with the question as stated.
If you are looking for a number less than the initial pick you should note that if the first pick is "1", there is no way to find a smaller number. Then, the average number of picks will be infinity.
My first thought:
Let X be random variable counting the number of picks until we get success (that is, we get success on Xth pick).
Case 1) With replacement.
--------------------------------------
Probability of success on any trial is always (k-1)/n (since k-1 nums < k)
Let p= ( that junk )
So PMF of X is:
f(x) = (probability of x-1 failures)*(probability of success on xth trial)
= (1-p)^(x-1) * p
So,
E[X] = SUM_to_inf {x*f(x) } = 1/p (use derivative of geom. series to prove)
so E[X] = n/(k-1)
Case 2) Without replacement.
----------------------------------------
PMF of X is:
f(x) = (probability of x-1 failed picks)*(probability of a success pick after)
= (a)*(b) <-- defining a and b for convenience
a = P(the first x-1 picks are all bad)
= (number of ways to choose x-1 nums out of the n-k+1 bad nums) / (number of ways to choose x-1 nums out of the full n nums)
= (n-k+1)C(x-1) / (n)C(x-1) <--- ratio of binomial coefficients
Now for b:
x-1 nums have been removed from the n choices, so we have n-(x-1) = n-x+1 leftover nums to pick from, and all k-1 good nums are still among them
b = (k-1) / (n-x+1) <--- probability of picking a good num from the leftovers
so the PMF looks like:
f(x) = [(n-k+1)C(x-1) / (n)C(x-1)] * (k-1)/(n-x+1)
but this is not an infinite random variable, as
x is limited to 1, 2, ..., (n-k+2)
So we are done:
E[X] = SUM_{x=1 to n-k+2} x*f(x) <-- plug the junk in for f(x)
I am not obligated to simplify this further for Goldman Sachs because it is a finite sum.
I didn't check my work above (fingers crossed).
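A quick sanity check of this PMF (with the k-1 numerator) for small values: the probabilities sum to 1, and the expectation comes out to (n+1)/k here:

```python
from fractions import Fraction
from math import comb

def pmf(n, k, x):
    # f(x) = C(n-k+1, x-1)/C(n, x-1) * (k-1)/(n-x+1)
    a = Fraction(comb(n - k + 1, x - 1), comb(n, x - 1))
    b = Fraction(k - 1, n - x + 1)
    return a * b

n, k = 5, 3
xs = range(1, n - k + 3)
print(sum(pmf(n, k, x) for x in xs))      # 1
print(sum(x * pmf(n, k, x) for x in xs))  # 2, i.e. (n+1)/k
```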
Ugh... I hate the formatting issues. To repeat:
P(first x-1 choices were all bad choices) =
a = (n-k+1)C(x-1) / (n)C(x-1) <--- ratio of binomial coefficients
Now for b:
x-1 nums have been removed from the n choices, so we have n-(x-1) = n-x+1 leftover nums to pick from
b = (k-1) / (n-x+1) <--- probability of picking a good num from the leftovers
so the PMF looks like:
P(X=x) = f(x) = [(n-k+1)C(x-1) / (n)C(x-1)] * (k-1)/(n-x+1)
m: number of picks
With replacement:
p = (k - 1) / n
Pr(number of picks = m) = (1 - p)^(m - 1) * p
E(m) = sum over m = 1, 2, ... of (1 - p)^(m - 1) * p * m = 1/p
Without replacement:
number of picks = m, with 1 <= m <= n - k + 2
Pr(number of picks = m) = (n - k + 1)/n * (n - k)/(n - 1) * ... * (n - k - m + 3)/(n - m + 2) * (k - 1)/(n - m + 1)
E(m) = sum over m = 1, 2, ..., n - k + 2 of Pr(number of picks = m) * m
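These formulas are easy to evaluate exactly; a sketch for the no-replacement case (function name mine):

```python
from fractions import Fraction

def expected_picks_no_replace(n, k):
    # E(m) = sum over m of m * Pr(number of picks = m), per the formula above
    total = Fraction(0)
    for m in range(1, n - k + 3):
        pr = Fraction(1)
        for j in range(m - 1):                # the m-1 failed picks
            pr *= Fraction(n - k + 1 - j, n - j)
        pr *= Fraction(k - 1, n - m + 1)      # the final successful pick
        total += m * pr
    return total

print(expected_picks_no_replace(5, 3))   # 2
print(expected_picks_no_replace(11, 8))  # 3/2
```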
With replacement:
We need to find E(X), where X counts rounds; in each round I pick a number m <= n and draw a number k from the hat, and the round succeeds if k < m.
Let p = P(I pick a number m <= n and draw a number k from the hat with k < m).
E(X) = p*(1 + 0 {we have already succeeded on the first round}) + (1-p) {probability of failure} * (1 + E(X) {we need to repeat the round}),
i.e. E(X) = 1/p.
Now we need to find the value of p.
Let A be the event of picking a random number m <= n,
and B the event of drawing a number k from the hat with k < m.
We can calculate p by adding up all the ways to succeed: picking 1 and drawing < 1, or picking 2 and drawing < 2, or picking 3 and drawing < 3, ..., or picking n and drawing < n.
P(picking the number m from 1 to n) = 1/n
P(drawing a number from the hat < m) = (m-1)/n
Therefore,
p = (1/n)*((1-1)/n) + (1/n)*((2-1)/n) + (1/n)*((3-1)/n) + ... + (1/n)*((n-1)/n)
p = (1/n^2) * (0 + 1 + 2 + 3 + ... + (n-1)) = n(n-1)/(2n^2) = (n-1)/(2n)
And hence E(X) = 1/p = 2n/(n-1).
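Note this 2n/(n-1) answer is for the model where a fresh m is picked every round; a quick simulation of that model (setup mine) agrees:

```python
import random

def rounds_until_success(n, rng):
    # One trial: each round picks a fresh m and draws once, with replacement
    count = 0
    while True:
        count += 1
        m = rng.randint(1, n)
        if rng.randint(1, n) < m:
            return count

rng = random.Random(1)
n = 10
trials = 200_000
avg = sum(rounds_until_success(n, rng) for _ in range(trials)) / trials
print(avg)  # close to 2n/(n-1) = 2.222...
```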
This should be correct for the version without replacement (the code removes each drawn chit), validated with simulation in Python:
import numpy as np
import time

def simulate(n, x):
    # draw without replacement until we see a number smaller than x
    arr = list(range(1, n + 1))
    count = 0
    while True:
        draw_idx = np.random.randint(low=0, high=len(arr))  # exclusive end
        draw = arr[draw_idx]
        arr = arr[:draw_idx] + arr[draw_idx + 1:]
        count += 1
        if draw < x:
            return count

def test(num_trials, n, x):
    results = []
    t0 = time.time()
    for i in range(num_trials):
        if i % 10000 == 0:
            print("%d k, time elapsed %.2f m" % (i // 1000, (time.time() - t0) / 60))
        results.append(simulate(n, x))
    return np.mean(results)

def get_prob(n, x, num_toss):
    # P(first success on toss num_toss), drawing without replacement
    cumprod = 1.0
    for term_i in range(1, num_toss + 1):
        if term_i == 1:
            term = (x - 1) * 1.0 / (n - (num_toss - 1))
        else:
            term = (n - (x - 1 + num_toss - term_i)) * 1.0 / (n - (num_toss - 1) + (term_i - 1))
        if term <= 0:
            return 0
        # print(term_i, term)
        cumprod *= term
    return cumprod

def cal(n, x):
    expectation = 0
    for num_toss in range(1, n):
        p = get_prob(n, x, num_toss)
        expectation += p * num_toss
    return expectation

num_trials = 8 * 10**5
n = 11
x = 8
print(test(num_trials, n, x))
print(cal(n, x))
# both give 1.5
If getting a number smaller than the number that is picked first is considered a success, and otherwise a failure, then these events become Bernoulli trials.
- Murali Mohan August 23, 2013
The question asks:
>> Find the expectation value of *number of times* you need to pick numbers...
The answer to the question: "How many trials before a success" is provided by geometric distribution, whose mean, E[X] = 1/p, where p is the probability of success.
Case 1: If you replace
Let m be the number picked; then the probability of choosing a smaller number is (m-1)/n. The expected number of picks is therefore 1/((m-1)/n) = n/(m-1).
Case 2: If you don't replace
Let m be the number picked; then the probability of choosing a smaller number on the first pick is (m-1)/(n-1), which would give 1/((m-1)/(n-1)) = (n-1)/(m-1). (Note, though, that without replacement the success probability changes on every pick, so the geometric-distribution formula is only an approximation here.)