Google Interview Question
Research ScientistsCountry: United States
Interview Type: Phone Interview
Part 1: The distribution of the number of (say) heads is Binomial. The p-value is the probability of obtaining a number of heads that is as extreme or more extreme. You can test (for example) if the probability of the coin yielding heads is different from 1/2. Then the p-value is the probability of getting 1,2,9, or 10 heads (you can also add 3 and 8 if you opt for a non-strict inequality). We reject the hypothesis that the coin is fair if the p-value is too small (i.e. if we have witnessed an event that is highly unlikely given our assumption).
Part 2: Assuming all ten coins are of the same mint, we find P(average of head-counts for the 10 coins is extreme). To do that we need to know the distribution of this average. It can be calculated explicitly with a bit of probability. In fact, the sum of (independent) Binomials with the same probability of success is another Binomial.
And again, if the p-value is too small, we reject the null hypothesis of P(heads)=1/2.
In the first case: p(h)=0.8, p(t)=0.2 For a fair coin, p(h)=p(t)=0.5
So, in the first case, toss the coin ten times again u get p(h_run2), do couple more times
then we are looking at
(p(h_run1) + p(h_run2) + p(h_run3)+...p(h_runN) ) /N should be close to 0.5
Also, as we increase the runs, the problem should get closer to 0.5 if the coin is really fair.
May be u could do the same for tail and observe the probability getting close to 0.5 as well
Same thing can be extended to the second case...u do p(h_coin1_run1) + p(h_coin1_run2)+.....p(h_coint1_run10) /10 +
same thing for coin 2 + .... till coint 10
entire thing divided by 10 should be close to 5
I thought the first part of the question was very straightforward. Calculating the p-value of 56/1024 was easy. But Google was very interested in how you analyzed it. I was asked what else I could do to test the fairness of thecoin. To answer it, I mentioned chi-squared distribution and its test statistic. Then Google guy asked me that "is this an approximation?" He went on and asked me "if I tossed the coin more than 10 times, say 100 times, would it help?"
For the second part of the question, I had to pause for about five seconds before I opened my mouth. I though I needed to modify it because if you stick to 5% significant level for each coin, then the probability for failing to detect a biased coin would decrease for 10 coin case. I mean 0.95^10 = 0.6. So you could fail to find a biased coin 4 our of 10. In order to be 95% sure, you need to raise the significance level to 0.05/10 = 0.5% for each coin. I think this is similar to Bonferroni confidence band.
I don't know the answer. Gooogle didn't give me any answer during the interview.
May I ask 2 more questions:
1. in first part, why you use chi-square? Because you want to test the variance?
2. In the second part, what's the reason you mention 5%/10=0.5%? Is it because (1-0.005)^10=0.995^10=95.11%?
Thanks.
You can analyze this several ways. First way is summing up the binomial probabilities like others did. You can also have a chi square test where you are comparing sampled ratios to the expected ratio of 0.5 with 1 degree of freedom. Although the results would not be good due to small number of observations. But for the second part of the question, it could be used.You can also do a students t-test to see if the ratio 0.2 is within the confidence interval around 0.5 (expected) with that many samples.
I think that the interviewer wanted you to mention the t-test. Its a bit harder to compute the sample standard deviation each time, but it's easy to see how increasing n will affect the p value: t=(x-u)/(s/sqrt(n))
Check out my blog at meditationsofmarcus.blogspot.com
For the second part, I am assuming they meant that you need to add multiple hypothesis correction (assuming each coin is an independent hypothesis). You can use Bonforroni, which will tell you that you should have a new threshold that is divided by the number of coins (so now you need to reach <0.005) or Benjamini Hochberg
As others have noted, tossing a coin N times results in a random variable $X ~ Binomial(n, p)$ with pdf $P(X=k) = {n \choose k} p^k (1-p)^{n-k}$ and standard deviation $\sigma = \sqrt{np(1-p)}$
However, first we should assess our statistical power. This depends on the statistical test used and the alpha used as the threshold for significance. I'd normally use a two-sided t-test (which is good for small data) and a threshold of 0.05. The null hypothesis to test is H_0: p = 1/2; if we can reject this, then the coin is unfair.
However, the statistical power calculation yields that the minimum detectable effect size is $c_\alpha \sigma/\sqrt{n} = 1.96/2\sqrt(10) \approx 0.98/3.16 \approx 1/3$ (ok, a little less than 1/3). In other words, we could only detect whether p is > 5/6 or < 1/6 (only very extremely unfair coins). I think we don't really have enough statistical power to answer the question unless we got 9 or 10 (or 0 or 1) heads out of it.
Anyway, the p-value is the probability of getting data this or more extreme if the null hypothesis were true. That works out to p-value = $P(X \ge 8) = 1/2^{10} ({10 \choose 8} + {10 \choose 9} + {10 \choose 10}) = 56/1024 \approx .05x$. This is outside the usual confidence threshold of 0.05, but we might relax that given our low statistical power and maybe accept it. I think it's an underpowered test and we should insist on more trials.
If we toss 10 independent coins 10 times, we don't learn any additional information about the other coins' fairness, but we are now multiple testing which brings family-wise error rate (FWER) into play. If we relaxed the confidence threshold to 0.1, then we will expect one false positive in ten trials. We should apply a multiple-testing correction to account for this, such as Bonferroni (which replaces alpha with alpha/m where m is the number of trials; here that would drop the threshold to 0.01) or Holm-Bonferroni (which sorts the p-values ascending and chooses the first k such that p_k > alpha/(1 + m - k) and treats the results up to k as significant and those beyond k as insignificant). Given our already too-low statistical power, I think we have no chance of detecting an unfair coin from among the 10 coins tested. Any apparent detections are too likely to be false positives.
I think you might be expected to jump to bayesian analysis for their follow up questions.
Assume you want to get the probability of getting head by given observation:
p(theta | Y=y1, y2, ... yn)
You could have a prior assumption: p(theta) ~ beta(a, b) [let a = b = 1: uniform distribution)
Then your observation y1, y2, .... yn ~ i.i.d. binomial distribution.
That is p(Y | theta) ~ binomial (n, theta)
Based on this, the conjugate posterior distribution of theta p(theta | Y) = beta(a+sum(y), b+n-sum(y))
Then the expected theta: E(theta | Y ) = (a + sum(y))/(a + b + n)
The above term can be explained as weight of prior * prior_mean + weight of sampling * sampling_mean: when sample size is small, the posterior_mean is more rely on prior_mean, while when the sample size goes up, the posterior_mean will goes to sampling_mean finally. Also, you can get the 95% confidence interval for posterior_theta and then find whether 0.5 is inside or outside the area.
I will set up a Hypothesis testing. Let theta be the probability that the coin is head, k be the number of times we observed that the coin is head.
- shuaixin.david November 02, 2013H0 (null hypothesis): the coin is fair, i.e. theta=0.5
H1 (alternative): the coin is not fair, i.e. theta!=0.5
And the p value is the probability that the observed results are obtained under H0.
Because P(k) follows binomial distribution so P(k) = (C10,k)*theta^(k)*(1-theta)^(10-k)
Under H0, theta=0.5, so we compute p value as P(k=8)=C10,8*0.5^(10)=0.044<0.05
If we set the significance level as 0.05 then we can reject the null hypothesis H0, in other words, we select H1, i.e. this is not a fair coin.
If we have more (i.e. 10) coins, because each coin is independent, so I can judge the fairness of each coin use the same method above.