Microsoft Interview Question
Software Engineer InternsCountry: United States
Interview Type: In-Person
IPv4 is X1.X2.X3.X4 where X can be from 0 to 255, that's mean it can take 2^8 values.
4*X - is 4*8 bit and is it 4 byte.
1. we can use 4 byte as key for one element. In this case we will never have collisions.
2. we can use less then 4 byte, then the best way to impliment hash function will be to ignore first numbers, and use last (X4, X3.X4 or X2.X3.X4 dependence of how many memory we want to use) as hashkey.
We use only end of id address, because in real life ip addresses usually in one area have same first 3 digits, more over in one (home/work) network they usually have same 3*3 digits.
So collisions will be more evenly distributed if we skip first part.
Trie is how routing tables are stored. Best part about trie is there is no wasted space. And at each iteration (byte) you only bring in corresponding subtree into memory
Really? Trie has no 'wasted space'? 255 branches and only 2 IPs stored..will it be fully utilized?
it depends how much collisions you want, just take it as 32 bit unsigned number, then for example if I want to contain it in 5 bits itself do H(IP)=IP%31, if I want 6 bit I might do H(IP)=IP%61 something like thatbut of course the more you decrease the bits the more will be collisions so depends upon your application how much it can bear and how much space it can allocate
If we have 32 bits of data, we could enumerate all possible values (treated as an unsigned int), but we'd use too much space if we allocated for all of them. We only need enough space to store ~10 million ints. Depending on how many collisions we want to see, we can chop off some of the bits.
2^23 = 8 million
2^24 = 16 million
We could choose to only use the lower 23 or 24 bits of the addresses. I wouldn't choose the higher 24 bits because the variability is less.
Sticking with the lower 24 bits is nicer because we chop a whole byte off our 32 bit addresses, but it may be a better idea for memory reasons to stick with the lower 23 bits instead.
Maybe question is about creating hash table? Not hash function? For this we can use Bloom filter, if some false positives are allowed.
I guess a matrix of the type hash[4][32] can be used.
char hash[4][32]; /* set all fields to 0 */
Each byte index is a row number and each byte value gives bit index into particular entry.
void bitset(int i, unsigned char x)
{
int j = x>>3; /* divide x by 8 */
int bitpos = x%8 /* bit position */
x[i][j] |= (0x1 << bitpos);
}
void hash_insert(uint32_t n)
{
char *ip = n; /* n is numeric 32 bit IP address */
for(i = 0; i < 4 ; i++)
bitset(i, ip[i])
}
To delete is symmetric
void bitreset(int i, unsigned char x)
{
int j = x>>3; /* divide x by 8 */
int bitpos = x%8 /* bit position */
x[i][j] &= ~(0x1 << bitpos);
}
void hash_delete(uint32_t n)
{
char *ip = n; /* n is numeric 32 bit IP address */
for(i = 0; i < 4 ; i++)
bitunset(i, ip[i])
}
To search check if bit position is set or not
void isbitset(int i, unsigned char x)
{
int j = x>>3; /* divide x by 8 */
int bitpos = x%8 /* bit position */
return ( x[i][j] & (0x1 << bitpos) );
}
boolean hash_search(uint32_t n)
{
boolean found = true;
char *ip = n; /* n is numeric 32 bit IP address */
for(i = 0; i < 4 ; i++)
if( isbitset(i, ip[i]) == 1 )
continue;
else
{
found = false;
break;
}
return found;
}
All operations take O(1) and space is also O(1).
or we can use multilevel hashtable
- shsf October 31, 2013