Microsoft Interview Question for Senior Software Development Engineers


Country: United States
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
3
of 3 vote

This problem is not about hash. Why? If you use hash you can't guarantee "no false positive". It means if you say "YES" it should be 100% "YES". If you store words with losing information you can't say "yes, i'm sure".

PLEASE stop use hash EVERYWHERE.

So, we need such data structure which say "Yes, I'm sure" and "Probably no".

1st approach

Use Trie (aka digital tree/radix tree/etc). Try to append as much words as you can. Trie is good choice because of good time complexity for search and better then plane storing memory efficiency if words intersecting each other.

2nd approach

Use archiving algorithms without losing of data. Try to append as much as you can.

Pros: you will be close to best memory efficiency.
Cons: veeeeery sloooow.

Both approaches can be merged depending on nature of data.

- Sergey October 19, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
1
of 1 vote

Very imprecise formulation. Given the problem is :
1. False positives are not allowed : --> Do not say found, when it is not actually there.
2. False negatives are allowed --> Can say not found, when it is there.

These two axioms trivialises the problem. Just store the portion of the list as set what is possible in the memory.
Say "yes" when it exists in the set, otherwise say "No".

Was it a trick question?

- NoOne October 19, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Compute hashes for all words in the list and store in memory all values not in the computed set. On lookup, compute hash for the input and return "found" if it's not among stored values.

- kanukadze October 18, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 2 vote

how much memory do we have? how many words?
this quesion wants optimal memory usage, since we accept false negatives, we should minimize the probability of false negatives by maximizing a hash value and picking a "good" hash function. Hashtables are not optimal for such a problem due to the overhead not required in this problem (e.g. for a given hash-value we just need a bit, conflicts don't matter, no bucket lists or open addressing required, filling all possible values is just a huge overhead - not in terms of big-O but practically). Use a bit-array and index it with a hash value. Scale the size of the array to the memory available to minimize probability of false negatives.

Preparation step:
1) create an array of max size n with int's (n ints with b bits)
2) fill the table with 0xffffffff (depending on int size)
3) for each word compute a hash: 0<= hash < n*b and unmaks the bit hash%b of array[hash/b]

Query:
compute hash for word
check if bit hash%b of array[has/b] is set. If set word is for sure not in there
If not set word is probably in there (or an other word colliding with the same hash-value is in there...)

- Chris October 18, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

this is what I came up with, but seems it wasn't good enough:

class WordList
  {
    const int mask = (1<<6) - 1;

    ulong[] ba;

    public WordList(string[] l) {
      ba = new ulong[1<<21];
      foreach(string w in l) {
        int h = Hash(w);
        ba[h>>6] |= 1ul<<(h & mask); 
      }
      for(int i = 0; i < ba.Length; ++i) ba[i] = ~ba[i];
    }

    public bool Contains(string w) {
      int h = Hash(w);
      return (ba[h>>6] & 1ul<<(h & mask)) == 0;
    }

    int Hash(string s) {...}
  }

- kanukadze October 18, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

@ kanukadze
it looks ok to me. I don't get your constants 1 << 21 (roughly 16 MB memory, can't you use more?) The hash function has no upper limit, so it creates you an int 2^32 or 2^64 (don't kow c# that well anymore), anyway best case is 32 - 6 bits: 2^26 is > 2^21 so you will sooner or later access elements that aren't there... a remainder % would help although it messes up the hash distribution a bit, better is a hash function that takes the range into account and perfectly distributes in this range...
what are actually this ";6" ";21" close to the shift operators?
the idea was certainly right, I find it elegant to invert the array and work with OR before inverting the whole thing. What interview was it? was it just one or several?

- Chris October 18, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

@ChrisZ
thanks for walking me through this.
you're right 1<<21 does look strange. there wasn't any explicit cap on memory except fitting into a single box, so prob. just miscalculated. "<<;6" must be a website's formatting issue, no ';' there.
# of words was in billions, and there was a hint of using a hash function with lots of collisions which I interpreted as a way to limit the range => bitarray size.
it was of the two technical questions on a phone screen for senior dev with MS big data team

- kanukadze October 18, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

@ChrisZ
The problem definition is very blurry from my perspective.

If it's allowed to tune hash function until you're are sure your function has no collisions on this set. Well... It might work.

But what if you have only one iteration of reading?

I agree that trie and arch algs is not good solution. But we have strong constraint "no false positive", so we have to save some of the words without loosing of information.

At least I can create an example of words, which have a lot of hash collisions for popular hashing algorithms. I mean if nature of words is random you have low probability of collisions. But if you face artificially designed set of words it would be problem.

My conclusion for now: not enough information about this problem and about nature of words in particular. What probability of a mistake is acceptable.

PS by "merging approaches" I mean if alphabet is wide but words have a lot of same parts you can switch to another alphabet, like arch algs do. Then create trie over new alphabet.
But I never did this on practice. Such tricks usually used in math proofs.

PSS if you create a trie u not always need to use ptrs for all edges. You can make quasi-trie. And sub trie will be indexed inside little blocks. Very like to CPU functional blocks inside CPU.
I hope u get the idea, couse it's hard to describe with my english knowledge :)

- Sergey October 21, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

@Sergey
I am not sure I understood the quasi-trie-approach.
But you are right, if you can't inverse the problem statement to say "it's certainly not in there"
you may need to pack as many words into memory as possible. I researched a bit, Minimal Acyclic Finite State Automaton (DAWG) is a modification of the trie that could do it.
Anyway upvoted your post ;-)

- Chris October 21, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

@ChrisZ
If we have inverse problem: false positive ok, false negative disallowed. It would be good old Bloom filter problem. That's why I was so angry about hash solutions in this topic.

Of course keep in mind I might be wrong :) Good luck.

- Sergey October 21, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Bloom Filters. :)

- Anonymous October 24, 2016 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More