Given a stream of integers whe

Microsoft Interview Question

5

of 5 votes

5
Answers
Given a stream of integers where probability of a smaller number being first in stream is more than probability of larger number in stream.

Implement a method that returns true if you have already seen that element in stream otherwise return false.

eg. stream could look like 1,2,3,5,6,11,7,3......
more examples - 3,4,1,2,5,9,22,12

---
I explained using a hashset but that solution was not viable because of the space requirement.
I don't think I handled this question well. So would love people's opinion on it.
- James March 17, 2021 in United States | Report Duplicate | Flag | PURGE
Microsoft Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 3 vote

The question starts with a probability topic and restriction on space both point to a possible solution using Bloom Filter.

Now Bloom Filter is a probabilistic data structure that uses a relatively small fixed-size array to record hashes for the elements that it has seen before. It may give you a false positive with some probability P but never a false negative. (So it may say that the element exists in a set when the element may not really exist)

Usually, a bloom filter has an "add" and "contains" operation which both are very fast (time complexity O(1)). But there is an implementation/extension of the bloom filter called, the Counting Bloom filter with a "remove" operation at the cost of incurring an additional space overhead for counting.

I would assume, the first part of the question where smaller numbers have more probability of appearing first in the stream to suggest that once you start getting larger numbers, you can remove hashes generated using smaller numbers to further reduce the false positive scenario.

- Saurabh March 18, 2021 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Bloom filter will work only for the false case, not the true case.

One other way to optimize for space could be to keep two levels of hashsets. One regular one, one for contiguous ranges. If in your regular hashset you find that you have hashed numbers in, say, a contiguous block of 10 (like 0-9 inclusive), you can remove them from regular hashshet and then add 1 to your range hashset. Now, as you are checking a new number n, check to see if n/10 exists in the range hashset. If not, check in the regular hashshet.

This can incredibly reduce your space usage.

One more thing, this seems more like an open ended question to see how many questions you can ask about details and come up with different solutions for it rather than one with one correct answer.

- SK April 04, 2021 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

The fact that smaller numbers are more likely to appear before large numbers is a hint that the data will be clustered in continues ranges.

Here is a Python code that would keep track of seen numbers by storing contiguous ranges in a binary tree:

class MyRange:
    def __init__(self, l: int, r: int) -> None:
        self.l = l
        self.r = r


class IntegerSetNode:
    def __init__(self, i: int):
        self.range = MyRange(i, i+1)
        self.left = None
        self.right = None


class IntegerSet:
    def __init__(self):
        self.root = None

    def seen(self, i: int) -> bool:
        if self.root is None:
            self.root = IntegerSetNode(i)
            return False
        else:
            node = self.root
            while True:
                if node.range.l > i:
                    if node.range.l == i + 1:
                        node.range.l = i
                        return False
                    elif node.left is None:
                        node.left = IntegerSetNode(i)
                        return False
                    else:
                        node = node.left
                elif node.range.r <= i:
                    if node.range.r == i:
                        node.range.r = i + 1
                        return False
                    elif node.right is None:
                        node.right = IntegerSetNode(i)
                        return False
                    else:
                        node = node.right
                else:
                    return True


def test():
    in_set = IntegerSet()

    for i in range(10):
        if in_set.seen(i):
            print(f"should not have seen {i}")

    for i in range(10):
        if not in_set.seen(i):
            print(f"should have have seen {i}")


# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    test()

- Vassili Gorshkov April 09, 2021 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

A question for the interviewer would be - are we allowed to use a data store. In that case, you can have a sorted linked list to store the largest 50 or so integers and store the rest in the data store. Searching in a constant size list should take constant time - of course the constant itself is proportional to the size of the list. The moment you encounter a larger number, you extend the list and then pop out the smallest number from the tail and put it in a data store. If you occasionally encounter a number smaller than the smallest in the list, just search in data store.

- yossee May 31, 2022 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

#include <iostream>
#include <unordered_set>
#include <vector>

class StreamChecker {
public:
    StreamChecker() { }

    bool hasSeen(int number) {
        // Check if the number is in the set of seen numbers
        if (seen.find(number) != seen.end()) {
            return true; // Number has already been seen
        } else {
            seen.insert(number); // Add the number to the set of seen numbers
            return false; // This is the first time seeing the number
        }
    }

private:
    std::unordered_set<int> seen; // Set to store seen numbers
};

int main() {
    StreamChecker checker;
    std::vector<int> stream = {1, 2, 3, 5, 6, 11, 7, 3, 4, 1, 2, 5, 9, 22, 12};
    
    for (int num : stream) {
        if (checker.hasSeen(num)) {
            std::cout << num << " has been seen before." << std::endl;
        } else {
            std::cout << num << " is seen for the first time." << std::endl;
        }
    }
    
    return 0;
}

- igvedmak April 06, 2024 | Flag Reply

CareerCup

Microsoft Interview Question

Books

Videos

Resume Review

Mock Interviews