assume your computer is readin

Amazon Interview Question for Software Engineer / Developers

0

of 0 votes

22
Answers
assume your computer is reading characters one by one from a stream (you don't know the length of the stream before ending). Note that you have only one character of storage space (so you cann't save the characters you've read to a something like a strong). When you've finished reading you should return a character out of the stream with equal probability.
- other January 30, 2006 | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Brain Teasers Algorithm

Email me when people comment.

An error occurred in subscribing you.

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 4 vote

Wow, that's kinda a cool question. I think I have an answer to this, but let me explain the way I'm thinking about it.

So, first of all, we don't have a real buffer - that's the complication. Our buffer can only store one character.

--
This problem reminded me of a problem from my algorithms class : "How can you create a website hit counter that counts to 100,000 with only 2 digits (you don't need an exact count - an estimate is fine)? Answer: Increment the count with probability 100/100000 = 0.1%. Multiply your count by 1000 to get your estimate.

The point of the second problem is that it uses a probabilistic method.

--
Getting back to this problem... We only have a one character buffer, so that means that we have the following choice: do we want to keep the current character in the buffer, or swap it for the current character in the stream? Clearly, we must make this decision probabilistically. But, with what probability do we keep the swap the character?

Stream = A, B, C, D, E, ...
n = length of stream
P_n(A) = probability that A is our chosen character after we've read n characters from the stream.

---
As a first try, let's try swapping the character in our buffer with the current character with probability 1/2.

This doesn't seem to make sense, however, because the last character will always have a 50% chance of being chosen, which can't be equal odds (do the math for 3 characters if you want).

--
Hmm, ok. What if we change the probability based on how many characters we read? If you think about it, we <i>must</i> have the odds of swapping get lower and lower, so that later character aren't favored.

Well, let's work this out.

If n = 2, we want all probabilities to be 1/2:
P_2(A) = 1/2
P_2(B) = 1/2
So, P_2(swap) = 1/2. Doesn't help us much.

If n = 3, we want all probabilities to be 1/3:
P_3(A) = P_2(A) * P_3(swap) = 1/2 * P_3(no swap) = 1/2 * (1 - P_3(swap)) = 1/3
P_3(B) = P_2(B) * P_3(swap) = 1/2 * P_3(no swap) = 1/2 * (1 - P_3(swap)) = 1/3
P_3(C) = P_3(swap) = 1/3
So, if n = 3, P_3(swap) = 1/3.

This suggests that we want P_n(swap) = 1/n, but we don't really know that until we've proven it.

So, do this:
1. Try it for n = 4.
2. Prove it mathematically.

- Gayle L McDowell January 31, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

B.E.A.utiful

- JH January 01, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Hi Gale, there is a small typo in your solution:
Should change
P_3(A) = P_2(A) * P_3(swap)
to
P_3(A) = P_2(A) * (1-P_3(swap))

- YL June 21, 2019 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Anyone? Gayle?

- Khoa January 31, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I don't understand your answer. What do you do when reading a character from the stream? Swap it with the stored character in a probability of 1/n? But you don't know n, because you don't have storage space for a counter.

- Erniu February 05, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Well, you only have storage space for one character from the buffer. You do have storage space for a counter.

- Gayle L McDowell February 05, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I agree with your solution. However I also agree with the other poster that the problem is poorly worded.

- Jack November 14, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

randomly choose between 0 and 1 (so probability of 1/2)
whenever you read the next char to see if swap or
no swap. Comments!

- tariq February 12, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

tariq - if you use a probability of 1/2, it will be grossly waited to the end of the string. The last character will have a 50% probability of being picked, which alone tells you that not all the characters will have equal probability.

- Gayle L McDowell February 12, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Hi

Gayle, I still coudlnt get the algorithm you proposed.i didnt understand P_3(A) = P_2(A) * P_3(swap) = 1/2 * P_3(no swap) = 1/2 * (1 - P_3(swap)) = 1/3;

- Monty February 12, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think what Gayle meants was following.

Let p_k be the probability for the kth incoming char to be saved in the buffer. p_k = 1/k, which means:

p_1=1 : we always put the first char in the buffer.
p_2=1/2 : the 2nd incoming char has 1/2 chance of being put in the buffer
p_3=1/3 : the 3rd incoming char has 1/3 chance of being put in the buffer.
...
p_n=1/n : the nth, which is the last incoming char has the chance of 1/n being put in the buffer. and n is unknown until we reached the end of the stream.

so the probability for the ith character being the output char is:
p_i*(1-p_(i+1))(1-p_(i+2))...(1-p_n) =
1/i*i/(i+1)*(i+1)/(i+2)*...(n-1)/(n)= 1/n

- kikongoo February 12, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Well...let's examin it more closely:

Let e1, e2, ..., en be the characters. and P({ei}t) be the
probability of selection at any given time, t. when you have
seen 1 character then P({e1}t)=1 at time t+1
P({e1}t+1) = P({e2}t+1) = 1/2
P({e1Ve2}t+2) = 2/3 and P({e3}t+2) = 1/3
Now, P({e2}t+2|{e1}t+1) = 0, that is, probability
of selecting e2 given that e1 had been selected is 0.
Similarly P({e1}t+2) = 2/3. Why? because at the t+2
we do not have e2 anymore, so the selection has to be
done between e1 and e3. In order to keep the overall
probabilities equal at the end of the sequence, we have
to weight P({ei}t+k) appropriately, otherwise they will
not be equal (as you point out). What we are essentailly
doing is weighting the existing ei and new coming ej
to keep the probabilities at any point in time the same
for all. However, if we do that then at P({en}t+end) = 1/n
and that of the other one, say ei, P({ei}t+end) = n-1/n.
so say n = 30. then P(en) = 1/30 at time = end and
P(ei at time=end) = 29/30!
(one can see the above reasoning and probabilities by
drawing a tree diagram:
e1e2
e1(e3) e2(e3)
e1(e4) e1(e4) e3(e4) e2(e4) e2(e4) e3(e4)
: : :
At any given level the leaves are all outcomes and
number of ei are same. The (ei) simly means next character if ei-1
is not the terminal)

IMO, we can treat sampling at each time as independent
events, yes?

- tariq February 13, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

This is the same as the hiring problem. It can be proved that the expected number of times the character is replaced is log(n). BTW, Gayle's soln is very clear and correct

- Nat February 13, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Great analogy!

- tb February 09, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

kikongo is correct

- sp March 20, 2006 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Why not generate a random number for each incoming charachter and then see if it's even or odd. If it's even then we swap otherwise we don't.

- logan July 15, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Another explanation.

Suppose there are N elements. Then we want to choose any with a probability of 1/N. We propose a method of getting there inductively.

To do so, we record the number of elements we have seen (inclusive of the current sample) and the random selection.

After you sample the first element, you have probability of 1/1 of each, so trivially, you meet the condition.

By the induction hypothesis, at time k, all elements have probability of 1/k. Then at sample k+1, retain the new value with probability 1/(k+1). The current value will be retained with probability k/(k+1), which means that all of the previous values will have probability of 1/k * k/(k+1) = 1/(k+1), fulfilling the induction hypothesis.

- Jack November 14, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I don't believe that you are able to store how many elements you've seen....

- jblow December 03, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

For those of you worried about whether or not you can store the counter, check the range of characters you're expecting. If it's just 'A' through 'Z', that's 26 characters so you only need 4 bits to represent them Just subtract 'A' from each incoming character. If our buffer is an 8 bit char, that gives us an extra 4 bits to use as a counter.

Of course that makes our counters max value 31, so if the stream has 32 characters, we'll reliably pick the very last one. Would there be any huge problem with just keeping the counter at 31 once it gets there?

But the 1/n probability wouldn't be even though, either way. As the counter grows, it gets less and less likely that the current number will be picked, so we end up favoring the earlier numbers, right?

- vt_mruhlin December 08, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

i'm yet to get the hang of it. can someone explain the solution involving probability?
Gayle? thank you.

- donkeyMan August 13, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

The web has a lot of great sites. Search for Reservoir Sampling.

- LOLer August 13, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

The question says when we have finished reading the stream we want to return a character with equal probability. We know when we have finished reading the stream the length. So when we say equal probability we mean each of the characters in the pool should have equal probability to be selected i.e if we have n characters in the pool our probability should be 1/n.
We can generate any random number between 1 to n and thus return that particular number.

- Free Bird March 14, 2012 | Flag Reply

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews