Microsoft Interview Question

Country: United States

Comment hidden because of low score. Click to expand.
of 2 vote

The question starts with a probability topic and restriction on space both point to a possible solution using Bloom Filter.

Now Bloom Filter is a probabilistic data structure that uses a relatively small fixed-size array to record hashes for the elements that it has seen before. It may give you a false positive with some probability P but never a false negative. (So it may say that the element exists in a set when the element may not really exist)

Usually, a bloom filter has an "add" and "contains" operation which both are very fast (time complexity O(1)). But there is an implementation/extension of the bloom filter called, the Counting Bloom filter with a "remove" operation at the cost of incurring an additional space overhead for counting.

I would assume, the first part of the question where smaller numbers have more probability of appearing first in the stream to suggest that once you start getting larger numbers, you can remove hashes generated using smaller numbers to further reduce the false positive scenario.

- Saurabh March 18, 2021 | Flag Reply
Comment hidden because of low score. Click to expand.
of 0 vote

Bloom filter will work only for the false case, not the true case.

One other way to optimize for space could be to keep two levels of hashsets. One regular one, one for contiguous ranges. If in your regular hashset you find that you have hashed numbers in, say, a contiguous block of 10 (like 0-9 inclusive), you can remove them from regular hashshet and then add 1 to your range hashset. Now, as you are checking a new number n, check to see if n/10 exists in the range hashset. If not, check in the regular hashshet.

This can incredibly reduce your space usage.

One more thing, this seems more like an open ended question to see how many questions you can ask about details and come up with different solutions for it rather than one with one correct answer.

- SK April 04, 2021 | Flag Reply
Comment hidden because of low score. Click to expand.
of 0 vote

The fact that smaller numbers are more likely to appear before large numbers is a hint that the data will be clustered in continues ranges.

Here is a Python code that would keep track of seen numbers by storing contiguous ranges in a binary tree:

class MyRange:
    def __init__(self, l: int, r: int) -> None:
        self.l = l
        self.r = r

class IntegerSetNode:
    def __init__(self, i: int):
        self.range = MyRange(i, i+1)
        self.left = None
        self.right = None

class IntegerSet:
    def __init__(self):
        self.root = None

    def seen(self, i: int) -> bool:
        if self.root is None:
            self.root = IntegerSetNode(i)
            return False
            node = self.root
            while True:
                if node.range.l > i:
                    if node.range.l == i + 1:
                        node.range.l = i
                        return False
                    elif node.left is None:
                        node.left = IntegerSetNode(i)
                        return False
                        node = node.left
                elif node.range.r <= i:
                    if node.range.r == i:
                        node.range.r = i + 1
                        return False
                    elif node.right is None:
                        node.right = IntegerSetNode(i)
                        return False
                        node = node.right
                    return True

def test():
    in_set = IntegerSet()

    for i in range(10):
        if in_set.seen(i):
            print(f"should not have seen {i}")

    for i in range(10):
        if not in_set.seen(i):
            print(f"should have have seen {i}")

# Press the green button in the gutter to run the script.
if __name__ == '__main__':

- Vassili Gorshkov April 09, 2021 | Flag Reply
Comment hidden because of low score. Click to expand.
of 0 vote

A question for the interviewer would be - are we allowed to use a data store. In that case, you can have a sorted linked list to store the largest 50 or so integers and store the rest in the data store. Searching in a constant size list should take constant time - of course the constant itself is proportional to the size of the list. The moment you encounter a larger number, you extend the list and then pop out the smallest number from the tail and put it in a data store. If you occasionally encounter a number smaller than the smallest in the list, just search in data store.

- yossee May 31, 2022 | Flag Reply

Add a Comment

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.


is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More


CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More