Tossing a coin ten times resul

Google Interview Question for Research Scientists

-3

of 5 votes

16
Answers
Tossing a coin ten times resulted in 8 heads and 2 tails. How would you analyze whether a coin is fair? What is the p-value?
In addition, more coins are added to this experiment. Now you have 10 coins. You toss each coin 10 times (100 tosses in total) and observe results. Would you modify your approach to the the way you test the fairness of coins?
- midtownguru November 01, 2013 in United States | Report Duplicate | Flag | PURGE
Google Research Scientist Probability

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 5 vote

I will set up a Hypothesis testing. Let theta be the probability that the coin is head, k be the number of times we observed that the coin is head.

H0 (null hypothesis): the coin is fair, i.e. theta=0.5
H1 (alternative): the coin is not fair, i.e. theta!=0.5

And the p value is the probability that the observed results are obtained under H0.
Because P(k) follows binomial distribution so P(k) = (C10,k)*theta^(k)*(1-theta)^(10-k)
Under H0, theta=0.5, so we compute p value as P(k=8)=C10,8*0.5^(10)=0.044<0.05
If we set the significance level as 0.05 then we can reject the null hypothesis H0, in other words, we select H1, i.e. this is not a fair coin.

If we have more (i.e. 10) coins, because each coin is independent, so I can judge the fairness of each coin use the same method above.

- shuaixin.david November 02, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Part 1: The distribution of the number of (say) heads is Binomial. The p-value is the probability of obtaining a number of heads that is as extreme or more extreme. You can test (for example) if the probability of the coin yielding heads is different from 1/2. Then the p-value is the probability of getting 1,2,9, or 10 heads (you can also add 3 and 8 if you opt for a non-strict inequality). We reject the hypothesis that the coin is fair if the p-value is too small (i.e. if we have witnessed an event that is highly unlikely given our assumption).

Part 2: Assuming all ten coins are of the same mint, we find P(average of head-counts for the 10 coins is extreme). To do that we need to know the distribution of this average. It can be calculated explicitly with a bit of probability. In fact, the sum of (independent) Binomials with the same probability of success is another Binomial.
And again, if the p-value is too small, we reject the null hypothesis of P(heads)=1/2.

- mathytime September 23, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

In the first case: p(h)=0.8, p(t)=0.2 For a fair coin, p(h)=p(t)=0.5
So, in the first case, toss the coin ten times again u get p(h_run2), do couple more times
then we are looking at
(p(h_run1) + p(h_run2) + p(h_run3)+...p(h_runN) ) /N should be close to 0.5
Also, as we increase the runs, the problem should get closer to 0.5 if the coin is really fair.

May be u could do the same for tail and observe the probability getting close to 0.5 as well

Same thing can be extended to the second case...u do p(h_coin1_run1) + p(h_coin1_run2)+.....p(h_coint1_run10) /10 +
same thing for coin 2 + .... till coint 10
entire thing divided by 10 should be close to 5

- adi November 01, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

If you divide the 'entire thing' by 10, you will get to know the fairness of the group of 10 coins (rather fairness of '10 tosses of 10 coins').
But to get to know the fairness of each coin, we will have to divide individually , right?

- nharryp April 02, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Clear solution. 1 comment, you are using 1 tail test here. If 2 tailed test, conclusion will be different.

Thanks.

- Datacoder November 03, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I thought the first part of the question was very straightforward. Calculating the p-value of 56/1024 was easy. But Google was very interested in how you analyzed it. I was asked what else I could do to test the fairness of thecoin. To answer it, I mentioned chi-squared distribution and its test statistic. Then Google guy asked me that "is this an approximation?" He went on and asked me "if I tossed the coin more than 10 times, say 100 times, would it help?"

For the second part of the question, I had to pause for about five seconds before I opened my mouth. I though I needed to modify it because if you stick to 5% significant level for each coin, then the probability for failing to detect a biased coin would decrease for 10 coin case. I mean 0.95^10 = 0.6. So you could fail to find a biased coin 4 our of 10. In order to be 95% sure, you need to raise the significance level to 0.05/10 = 0.5% for each coin. I think this is similar to Bonferroni confidence band.

I don't know the answer. Gooogle didn't give me any answer during the interview.

- midtownguru November 04, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

some google guy took u for a ride

- anony November 04, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

May I ask 2 more questions:

1. in first part, why you use chi-square? Because you want to test the variance?

2. In the second part, what's the reason you mention 5%/10=0.5%? Is it because (1-0.005)^10=0.995^10=95.11%?

Thanks.

- datacoder November 04, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Chi-squared test in this example will be an approximation since the sample is not normal distributed so the sum of the (samples)^2 is not chi-squared distributed.

- andycuisong September 25, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

You can analyze this several ways. First way is summing up the binomial probabilities like others did. You can also have a chi square test where you are comparing sampled ratios to the expected ratio of 0.5 with 1 degree of freedom. Although the results would not be good due to small number of observations. But for the second part of the question, it could be used.You can also do a students t-test to see if the ratio 0.2 is within the confidence interval around 0.5 (expected) with that many samples.

- memo November 18, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I don't get the second part, there are 10 coins right?, is it the question how to tell which one is fair and which one is not? I don't see how observations on one coin give information about the others. I would exactly the same hypothesis test as for the single coin case, but, 10 times.

- cronopio November 26, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think that the interviewer wanted you to mention the t-test. Its a bit harder to compute the sample standard deviation each time, but it's easy to see how increasing n will affect the p value: t=(x-u)/(s/sqrt(n))

Check out my blog at meditationsofmarcus.blogspot.com

- Lucas_adams November 28, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I think t-test needs normal approximation, the condition is np>5 and n(1-p)>5. But his problem does not meet this condition. So either binomial test or chi-square test works.

- Joyce December 23, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Could we make use of a bootstrap approach for the second part when we have 10 coins and see what the empirical distribution converges to? I think that would be more asymptotically accurate?

- Jakob November 25, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

For the second part, I am assuming they meant that you need to add multiple hypothesis correction (assuming each coin is an independent hypothesis). You can use Bonforroni, which will tell you that you should have a new threshold that is divided by the number of coins (so now you need to reach <0.005) or Benjamini Hochberg

- Assaf March 04, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

As others have noted, tossing a coin N times results in a random variable $X ~ Binomial(n, p)$ with pdf $P(X=k) = {n \choose k} p^k (1-p)^{n-k}$ and standard deviation $\sigma = \sqrt{np(1-p)}$

However, first we should assess our statistical power. This depends on the statistical test used and the alpha used as the threshold for significance. I'd normally use a two-sided t-test (which is good for small data) and a threshold of 0.05. The null hypothesis to test is H_0: p = 1/2; if we can reject this, then the coin is unfair.

However, the statistical power calculation yields that the minimum detectable effect size is $c_\alpha \sigma/\sqrt{n} = 1.96/2\sqrt(10) \approx 0.98/3.16 \approx 1/3$ (ok, a little less than 1/3). In other words, we could only detect whether p is > 5/6 or < 1/6 (only very extremely unfair coins). I think we don't really have enough statistical power to answer the question unless we got 9 or 10 (or 0 or 1) heads out of it.

Anyway, the p-value is the probability of getting data this or more extreme if the null hypothesis were true. That works out to p-value = $P(X \ge 8) = 1/2^{10} ({10 \choose 8} + {10 \choose 9} + {10 \choose 10}) = 56/1024 \approx .05x$. This is outside the usual confidence threshold of 0.05, but we might relax that given our low statistical power and maybe accept it. I think it's an underpowered test and we should insist on more trials.

If we toss 10 independent coins 10 times, we don't learn any additional information about the other coins' fairness, but we are now multiple testing which brings family-wise error rate (FWER) into play. If we relaxed the confidence threshold to 0.1, then we will expect one false positive in ten trials. We should apply a multiple-testing correction to account for this, such as Bonferroni (which replaces alpha with alpha/m where m is the number of trials; here that would drop the threshold to 0.01) or Holm-Bonferroni (which sorts the p-values ascending and chooses the first k such that p_k > alpha/(1 + m - k) and treats the results up to k as significant and those beyond k as insignificant). Given our already too-low statistical power, I think we have no chance of detecting an unfair coin from among the 10 coins tested. Any apparent detections are too likely to be false positives.

- Michael September 06, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think you might be expected to jump to bayesian analysis for their follow up questions.

Assume you want to get the probability of getting head by given observation:
p(theta | Y=y1, y2, ... yn)
You could have a prior assumption: p(theta) ~ beta(a, b) [let a = b = 1: uniform distribution)
Then your observation y1, y2, .... yn ~ i.i.d. binomial distribution.
That is p(Y | theta) ~ binomial (n, theta)
Based on this, the conjugate posterior distribution of theta p(theta | Y) = beta(a+sum(y), b+n-sum(y))
Then the expected theta: E(theta | Y ) = (a + sum(y))/(a + b + n)
The above term can be explained as weight of prior * prior_mean + weight of sampling * sampling_mean: when sample size is small, the posterior_mean is more rely on prior_mean, while when the sample size goes up, the posterior_mean will goes to sampling_mean finally. Also, you can get the 95% confidence interval for posterior_theta and then find whether 0.5 is inside or outside the area.

- Impromptu March 01, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Job title? Google Research, some sort of machine learning team?

- S O U N D W A V E November 01, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Yes. This team does some machine learning in Google. They are called quantitative analysts or statistician engineers. They work on projects to improve search engine and ads.

- midtownguru November 04, 2013 | Flag

CareerCup

Google Interview Question for Research Scientists

Books

Videos

Resume Review

Mock Interviews