Facebook Interview Question
Software Engineer InternsCountry: United States
how about the use of a bitset?
#include <iostream>
#include <bitset>
#include <list>
using namespace std;
int is_duplicate(bitset<32000000> &bs, int number) {
if(bs.test(number)) {
return true;
}
else {
bs.set(number);
}
return false;
}
int main() {
// your code goes here
bitset<32000000> bs;
list<int> l = {1, 2, 4, 1, 12};
for(auto it = l.begin(); it != l.end(); it++) {
cout << is_duplicate(bs, *it) << ' ';
}
return 0;
}
If there are no duplicates, then we can create array of buckets. The array size is 1M, and each element is a 12-bit integer count. Then scan the numbers, for each number N, increase the count in the (N>>12)'th bucket. At the end if the count in the bucket is less than 2^12-1, then the missing number is in this bucket. The rest is easy - scan the numbers again, discard the numbers not in the bucket, for the number is this bucket, use bitmap to indicate if the number is visited.
We can split maximum integer into ranges and sum integers in every range, in order to avoid overflow we have to subtract from every number the range start value. At the end all buckets must contain the same value. If value in bucket is different that is our searched range. Then we have to add all possible numbers of that range to bucket again and if total amount matches total amount of other buckets that is our missing number. But it works only if all integers are unique.
If the constraints are that all the numbers are unique and because there is only 1 integer missing.
Step 1. Take an integer -> temp1=0.
Step 2. Iterate through the array and then keep xor-ing each item with temp1.
Step 3. Take another integer -> temp2=0;
Step 4. Iterate till 4 billion / limit and keep xor-ing each item with temp2.
Step 5 . Xor temp1 and temp2 to get the missing element.
If however, unique elements aren't guaranteed or there are more than 1 elements missing then, we can use a bit vector. In the absence of STLs ,we can use an array of integers as a bit stream and set the bit appropriately and scan through this array to find the missing elements.
Java Code based on above discussion
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
// Given a list of 4 billion integers,
// find an integer not in the list using 4MB of memory. (interview was in Java)
public class MissingIntFinder {
// 4 MB can accommodate -> 4*8*1024*1024 = 32*(2^20) = 2^25
// 32 bit integer can have possible -> 2 ^ 32 numbers
// so if we assume the Integer range is 32 bit and we'll do the scan in 2 stages
// (0 -> 16bit) -> (16 -> 32bit)
// Following is a result of 1 stage scan
public static void main(String[] args) {
List<Integer> data = Arrays.asList(0,1,5,3,4,5,12,2,16,21);
BitSet bitSet = new BitSet(2^16);
for (Integer val : data) {
bitSet.set(val);
}
// nextClearBit : Returns the index of the first bit that is set to false that occurs
// on or after the specified starting index.
int nextClearBit = bitSet.nextClearBit(0);
System.out.println(nextClearBit);
}
}
You have to clarify if the integer is 32 bit or 64 bit. For 64 bit case, it is actually easier to generate an integer randomly. Since you have 4 billion = 2^32 integer, if you generate a random 64 bit integer, the chance that it is unique is (2^64 - 2^32) / 2^64 = 1.0 (with good precision)
- Ehsan February 26, 2014So the question should be, having 4 billion 32 bit integers, how do you know if one of them is repeated?
This is what came to my mind.
Let's assume you have 4 MB = 2^20 integer words. Define an array of length 2^20 of integers. Each value (unsigned) can hold up to 2^32
Round A)
Now read the stream of numbers (4 billion):
For number X Let Y = X % 2^20. and array[Y] = array[Y] + 1
After reading all, go through the array and find a "Y"with array[Y] < 2^12
Round B)
We know something about the missing number. The 20 LSB are same as Y.
Clear memory except for "Y" and make another array of length 2^12.
Read all X again. For each X, if X%2^20 == Y, increase the memory location at array[Z].
Round C)
For through array Z and find the location where "array[Z] = 0". Report Z * 2^20 + Y as the missing number.