Yahoo Interview Question
Software Engineer / DevelopersUsername has a length of 5 and is made up of A-Z,0-9, and a space.
So we have:
" " //space value 0
0-9:
0 //1
1 //2
2 //3
.
.
9 //10
A-Z:
A //11
B //12
.
.
Z //36
So we have 37 symbols to choose from in our username. Lowercase letters were not mentioned, so I'm assuming they are not allowed. Since order in which the symbols are arranged is important, and each symbol can be used numerous times, I believe to guarentee we have no collisions, our hashtable will need to be at least the size of the permutation of 37 choose 5 with replacement (37^5). For each space, we have a choice of 37 different symbols, so there are 37*37*37*37*37 different usernames we must expect.
We need to be able to recognize different usernames, even if they contain the same symbols (ie: " I AM" and "Am I "). So I'm going to assign a weight to each character position. The weight is going to allow us to recognize different orders of symbols, and also allow us to directly map each value to a unique position. Each position will be multiplied by size of the list raised to some power. So the first position will be multiplied by 37^0, which will allow us to fill in the first 37 smallest numbers. (" Z" = 37). The next position will be multiplied by 37^1, which will allow us to fill up the next 37 positions. Our whole username will hash something like this:
ZZZZZ ( Z = 36)
(36*37^4) + (36*37^3) + (36*37^2) + (36*37^1) + (36*37^0)
which hashes to 69343956 which is the last index in our array of size 37^5.
This is also used in the String class in Java.
The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
Little Modification
All U explained is fine but i think it should have been mentioned that we have base system of 37 & instead of assigning values to 0..9...A...Z to 1 to 36 we can use their ASCII Values
as u have said memory efficient solution so overall is fine
here is my effort so our hash function looks like as A Said
int hash(String key)
{ int h = 0;
for (int i = 0; i < key.length; i++)
{
h = 37*h + s.charAt(i); }
if(h>INT_MAX)
return h-INT_MAX // or h % table_size
else if(h<INT_MIN)
return h+INT_MAX;
return h; //else return h
}
Correct me if i said wrong ??
I think if we consider only A-Z there is a good solution. Each of the character does not need more that 6 bit. Even, the name contain a-z, it is also possible to represent them with 6 bits. That requires a simple trick. The key will be 32 bit long. Here it is:
unsigned int gKey( char name[5])
{
unsigned int key = 0;
for (int i = 0; i<5; i++)
{
key+=name[i]-32;
key=(key<<6) ^ (key>>(32-6));
}
return key;
}
Algorithm:
1. If you rearrange the ascii values into a sequence, you can form a number system with (26+10+1) symbols starting at 0.
2. Hence, the sum of the username characters will be between 0 and 5*(max symbol value).
3. Then do sum%(max symbol value+1).
Note: you can calculate the sequence max symbol value during preprocessing. To calculate the sum of the username characters, you will need to check if the character is a letter/digit/space. Afterward, do the appropriate subtraction based on ascii table.
This algorithm uses an integer iterator and int sum variable. Accessing all the characters is inevitable to identify the permutations.
@Jack your solutions are awesome.. you are like the small joke in between a serious boring class... good.. keep going.. :D
here is nice post on this question
- Jack December 12, 2014campuscoke.blogspot.in/2014/12/design-hash-table-to-store-phone-s-your.html