You have 10 million IP address

Microsoft Interview Question for Software Engineer Interns

2

of 2 votes

20
Answers
You have 10 million IP addresses. (IPv4 4 byte addresses). Create a hash function for these IP addresses.

Hint: Using the IP's themselves as a key is a bad idea because there will be a lot of wasted space.
- Aasen October 30, 2013 in United States | Report Duplicate | Flag | PURGE
Microsoft Software Engineer Intern

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

or we can use multilevel hashtable

- shsf October 31, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 3 vote

@ivermette - a 6 character string would take 6 bytes while a simple 32 bit int address takes only 4 bytes if used as key.

- ab7 October 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

IPv4 is X1.X2.X3.X4 where X can be from 0 to 255, that's mean it can take 2^8 values.
4*X - is 4*8 bit and is it 4 byte.

1. we can use 4 byte as key for one element. In this case we will never have collisions.
2. we can use less then 4 byte, then the best way to impliment hash function will be to ignore first numbers, and use last (X4, X3.X4 or X2.X3.X4 dependence of how many memory we want to use) as hashkey.
We use only end of id address, because in real life ip addresses usually in one area have same first 3 digits, more over in one (home/work) network they usually have same 3*3 digits.
So collisions will be more evenly distributed if we skip first part.

- Darida October 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Use trie... root with 255 branches and and then next level 255 branches.....

- shsf October 31, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

- anon October 31, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Trie is how routing tables are stored. Best part about trie is there is no wasted space. And at each iteration (byte) you only bring in corresponding subtree into memory

- IntwPrep November 23, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Really? Trie has no 'wasted space'? 255 branches and only 2 IPs stored..will it be fully utilized?

- confused_banda December 18, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

then there will be no 255 branches, or if you insist on having some room for future IPs, I guess all other data structures have the same problem, right?

- kk February 28, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

Treat it as a 32 bit unsigned integer, and use some famous hash (google one).
Or design an unsigned integer hash as per your algorithms text.

- Urik's twin Cookie Monster (dead now) October 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Probably naive, but I might convert the IP to a network address (long int) and convert that to base 36. That gets it down to a 6-character unique string.

- jvermette October 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

it depends how much collisions you want, just take it as 32 bit unsigned number, then for example if I want to contain it in 5 bits itself do H(IP)=IP%31, if I want 6 bit I might do H(IP)=IP%61 something like thatbut of course the more you decrease the bits the more will be collisions so depends upon your application how much it can bear and how much space it can allocate

- anonymous October 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

If we have 32 bits of data, we could enumerate all possible values (treated as an unsigned int), but we'd use too much space if we allocated for all of them. We only need enough space to store ~10 million ints. Depending on how many collisions we want to see, we can chop off some of the bits.
2^23 = 8 million
2^24 = 16 million

We could choose to only use the lower 23 or 24 bits of the addresses. I wouldn't choose the higher 24 bits because the variability is less.

Sticking with the lower 24 bits is nicer because we chop a whole byte off our 32 bit addresses, but it may be a better idea for memory reasons to stick with the lower 23 bits instead.

- Andy G October 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

@Andy, you are better off % down to table size.
What if you have IP addresses from around the world (e.g. Amazon web stats for a day) ? What if you are top level router/gateway/switch located on the island of Atlantis?

- Urik's twin Cookie Monster (dead now) October 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Maybe question is about creating hash table? Not hash function? For this we can use Bloom filter, if some false positives are allowed.

- Viacheslav October 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

And what does it mean "use Bloom filter". How will you use it without hash function? Can you provide good hash function(s) for Bloom filter?

- den October 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Can we use some thing like network mask 255.255.255.0 or 255.255.0.0 as a hash function......it really depends whether you need 255 buckets or 255 * 255 buckets

- Sai October 31, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think 10 million is 2^23<=10^7 <=2^24
hence with 23 bits we will have arround 1 0r 2 collosions and we will save a lots of space as compared to 24 bit has...i.e using last 3 actates//...

- Neta_Ji November 16, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Largest IPv4 address can be 254.254.254.254, find out its 32 bit representation.
Chose nearest prime number and hash function will be (IPv4 in 32 bit representation)%prime.

- confused_banda December 18, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I guess a matrix of the type hash[4][32] can be used.

char hash[4][32]; /* set all fields to 0 */

Each byte index is a row number and each byte value gives bit index into particular entry.

void bitset(int i, unsigned char x)
{
	int j = x>>3; /* divide x by 8 */
	int bitpos = x%8 /* bit position */
	
	x[i][j] |= (0x1 << bitpos);
}

void hash_insert(uint32_t n)
{
	char *ip = n; /* n is numeric 32 bit IP address */
	for(i = 0; i < 4 ; i++)
		bitset(i, ip[i])
}

To delete is symmetric

void bitreset(int i, unsigned char x)
{
	int j = x>>3; /* divide x by 8 */
	int bitpos = x%8 /* bit position */
	
	x[i][j] &= ~(0x1 << bitpos);
}

void hash_delete(uint32_t n)
{
	char *ip = n; /* n is numeric 32 bit IP address */
	for(i = 0; i < 4 ; i++)
		bitunset(i, ip[i])
}

To search check if bit position is set or not

void isbitset(int i, unsigned char x)
{
	int j = x>>3; /* divide x by 8 */
	int bitpos = x%8 /* bit position */
	
	return ( x[i][j] & (0x1 << bitpos) );
}

boolean hash_search(uint32_t n)
{
	boolean found = true;
	char *ip = n; /* n is numeric 32 bit IP address */
	for(i = 0; i < 4 ; i++)
		if( isbitset(i, ip[i]) == 1 )
			continue;
		else
		{
			found = false;
			break;
		}
	return found;
}

CareerCup

Microsoft Interview Question for Software Engineer Interns

Books

Videos

Resume Review

Mock Interviews