Google Interview Question
Software Engineer / DevelopersCountry: United States
Here is my idea.
First, I think we can use Huffman Encode to give each letter the specific representing bits. I think the avg bits needed is about 3. (because there are 26 letters, so you can use 5 bits to represent a letter in general). Suppose we need 4 bits to represent a letter after encoding. Then if we use 4 Bytes to represent each word. The max length with this method can hold the word which has no more 8 letters. I think that's a threshold, because the words longer than 8 letters are much less. So if the word is longer than 8 letters, we can map it with the first 8 letters(of course, need collision resolved but the length of collision nodes won't be very large).
The problem is the space needed is still large.
Waiting others good ideas.
Hash function for a string:
In Python:
def do_hash( my_str ):
# we want it to fit into 1 byte... (it would be better to use the first prime number that is less than 256)
return (hash(my_str, 256 ))
def hash( my_str, hash_size ):
hash_val = 0
alphabet_size = 26
for curr_char in my_str:
if curr_char >= 'A' and curr_char <= 'Z':
char_val = ord(curr_char) - ord('A'):
elif curr_char >= 'a' and curr_char <= 'z':
char_val = ord(curr_char) - ord('a')
else:
raise Exception("Not a string...")
hash_val += hash_val*26 + char_val
return hash_val % hash_size
How about bit-wise XOR all chars in a word? Very easy implementation but few flows:
-Even number of repeated letters will cancel itself though. "aka" will collide "k"
-As ascii values of letters are always between same interval mapping image of the hash will not use [0,255] range homogeneously.
- hin February 13, 2013