Deshaw Inc Interview Question
Software Engineer / Developers
Instead of storing the counters for each pair, you can save precious memory by writing out the pair of complements to a file as you encounter it. Why keep a count for pairs?
If repetitions of pairs are to be avoided, we don't even need the occurrence counters. Instead, just use 2 bits per number to record whether u has occurred and whether it has already been used in a pair, and output the pair when u occurs, its complement has already been seen, and u hasn't been used in a pair yet.
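A minimal Python sketch of the 2-bit idea, under some assumptions that are not in the thread: the flags live in a `bytearray` indexed by the number's value (so 2^30 values at 2 bits each is 256 MiB for 30-bit input), and the names `complement_pairs`, `SEEN` and `USED` are made up for illustration.

```python
def complement_pairs(numbers, bits=30):
    """Yield each complement pair at most once, using 2 flag bits per value."""
    mask = (1 << bits) - 1
    # 4 values per byte: 2**bits / 4 bytes in total (256 MiB for bits=30).
    flags = bytearray(((1 << bits) + 3) // 4)
    SEEN, USED = 1, 2

    def get(v):
        # Read the 2-bit field that belongs to value v.
        return (flags[v >> 2] >> ((v & 3) * 2)) & 3

    def set_flag(v, flag):
        # Set one bit of the 2-bit field that belongs to value v.
        flags[v >> 2] |= flag << ((v & 3) * 2)

    for u in numbers:
        c = u ^ mask  # bitwise complement within `bits` bits
        set_flag(u, SEEN)
        if get(c) & SEEN and not get(u) & USED:
            set_flag(u, USED)
            set_flag(c, USED)
            yield (u, c)
```

For example, `list(complement_pairs([5, 10, 5, 10, 3], bits=4))` reports the pair 10/5 only once, even though both numbers occur twice.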
The above solution works fine when the input is small enough that all the numbers can be loaded into RAM at once, because the data structure lives in RAM and has to hold every number.
The phrase “millions of 30 bit binary numbers” suggests that all the numbers can’t be accommodated in RAM at once, so we need a better solution. B-tree data structures are used for secondary storage, but I'm not sure whether they would be useful in this case.
Given that there are millions of numbers, let's assume that all the data is on disk.
Sort the numbers in this fashion (assume the memory available is M):
1. Create sorted sublists by reading the data in chunks of M numbers, sorting each chunk in memory and writing it back to disk.
2. Merge the sorted sublists: fill one memory buffer per sublist with the first elements of that sublist from disk, repeatedly find the smallest number across all buffers and write it back to disk, and re-populate a buffer from its sublist whenever it becomes empty.
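The two sorting steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions of my own: numbers are stored one decimal value per line in temporary run files, `chunk_size` plays the role of M, the merge relies on the standard library's `heapq.merge`, and the sorted result is yielded to the caller instead of being written to a final file. The function name `external_sort` is hypothetical.

```python
import heapq
import os
import tempfile
from itertools import islice

def external_sort(numbers, chunk_size):
    """Yield the ints from `numbers` in sorted order, sorting chunk_size at a time."""
    it = iter(numbers)
    run_paths = []
    # Step 1: cut the input into sorted runs of chunk_size numbers on disk.
    while True:
        chunk = sorted(islice(it, chunk_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{n}\n" for n in chunk)
        run_paths.append(path)

    # Step 2: k-way merge; heapq.merge pulls one pending item per run at a time.
    files = [open(p) for p in run_paths]
    try:
        for n in heapq.merge(*(map(int, f) for f in files)):
            yield n
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

If there are more runs than can be kept open at once, the merge would have to be done in several passes; the sketch ignores that case.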
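Once everything is sorted, the rest of the problem is trivial, because the complement of a 30 bit number x is (2^30 - 1) - x, so a number and its complement always sum to 2^30 - 1.
1. Start one pointer at the head and another at the tail.
2. Repeatedly move the head pointer forward and the tail pointer backward, looking for complements.
3. When the head and tail pointers meet, finish the program.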
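A short Python sketch of the head/tail scan, assuming for simplicity that the sorted numbers are in a list (on disk you would read one file forward and one backward instead). It uses the fact that a k-bit number and its complement sum to 2^k - 1; the name `complement_pairs_sorted` is made up for this example.

```python
def complement_pairs_sorted(sorted_nums, bits=30):
    """Return complement pairs from an ascending list, pairing each occurrence once."""
    mask = (1 << bits) - 1  # x + complement(x) == mask for any bits-wide x
    pairs = []
    head, tail = 0, len(sorted_nums) - 1
    while head < tail:
        total = sorted_nums[head] + sorted_nums[tail]
        if total == mask:
            pairs.append((sorted_nums[head], sorted_nums[tail]))
            head += 1
            tail -= 1
        elif total < mask:
            head += 1  # head value is too small to pair with anything left
        else:
            tail -= 1  # tail value is too large to pair with anything left
    return pairs
```

With 4-bit numbers, `complement_pairs_sorted([2, 5, 9, 10], bits=4)` skips 2 and then pairs 5 with 10.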
What is the significance of 30 bits here? Why not a commonly available width, e.g. 16 or 32 bits? I am just trying to find out whether 30 bits causes any issue...
Any comment on the complexity of the file?
Can't we use tries to store all the different binary numbers? A trie would take less memory to store all the numbers, and searching for a number's complement can be done in O(30), i.e. constant, time.
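To make the trie suggestion concrete, here is a small Python sketch. It assumes a plain binary trie built from nested two-element lists, which is an illustrative choice and not something the thread specifies; how much memory a trie actually saves depends on how many prefixes the numbers share and on the node representation. The class name `BitTrie` and its methods are hypothetical.

```python
class BitTrie:
    """Binary trie over fixed-width numbers; each node is a [zero_child, one_child] list."""

    def __init__(self, bits=30):
        self.bits = bits
        self.root = [None, None]

    def _path(self, n):
        # Bits of n from most significant to least significant.
        return ((n >> i) & 1 for i in reversed(range(self.bits)))

    def insert(self, n):
        node = self.root
        for bit in self._path(n):
            if node[bit] is None:
                node[bit] = [None, None]
            node = node[bit]

    def contains_complement(self, n):
        # Walk the opposite branch at every level: at most `bits` steps.
        node = self.root
        for bit in self._path(n):
            node = node[1 - bit]
            if node is None:
                return False
        return True
```

Usage: after `t = BitTrie(bits=4)` and `t.insert(0b0101)`, the call `t.contains_complement(0b1010)` returns `True`, since the lookup follows the opposite branch at each of the 4 levels.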
- kg January 15, 2007
----------------------------Start of Pseudocode---------------------------------------------------
b = string of bits
u = integer
D = suitable data structure (a hashtable would be a good idea)
    where element = (key, value)
    value = (occurrence counter OC of key, pairing counter PC of key)
for each number b in file
{
    u = binary string to int(b)
    if (complement(u) exists in D AND OC(complement(u)) > 0)
    {
        // u pairs with an earlier, still unpaired occurrence of its complement
        OC(complement(u)) = OC(complement(u)) - 1
        PC(complement(u)) = PC(complement(u)) + 1
    }
    else
    {
        if (u exists in D)
        {
            OC(u) = OC(u) + 1
        }
        else
        {
            D.add(u, 1, 0) // 1 = OC, 0 = PC
        }
    }
}
// Scan through the data structure D to generate pairing information
----------------------------End of Pseudocode---------------------------------------------------
Runtime = O(nk) where n=number of binary numbers, k=number of bits in each number.
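For anyone who wants to run the hashtable approach, here is a hedged Python rendering. It assumes the input is an iterable of binary strings, uses a `dict` for D, and checks the occurrence counter of the complement (not of u) before pairing; the function name `count_complement_pairs` and the shape of the returned dictionary are my own choices.

```python
def count_complement_pairs(lines, bits=30):
    """Return {(key, complement): pair_count} for an iterable of binary strings."""
    mask = (1 << bits) - 1
    table = {}  # key -> [OC, PC]: unpaired occurrences, pairs formed
    for b in lines:
        u = int(b.strip(), 2)
        c = u ^ mask  # complement of u within `bits` bits
        entry = table.get(c)
        if entry is not None and entry[0] > 0:
            # Pair u with an earlier, still unpaired occurrence of its complement.
            entry[0] -= 1
            entry[1] += 1
        elif u in table:
            table[u][0] += 1
        else:
            table[u] = [1, 0]
    # Final scan over the table to report the pairing information.
    return {(k, k ^ mask): v[1] for k, v in table.items() if v[1] > 0}
```

For the 4-bit input `["0101", "1010", "0101", "1010", "0011"]` this gives `{(5, 10): 2}`. Depending on arrival order, pairs of the same two values can end up split between the key for u and the key for its complement, so a caller that wants one total per pair would need to add the two entries together.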