Microsoft Interview Question
SDE1sCountry: India
Interview Type: Phone Interview
Correct but the sentence is not very clear. I think what it means is:
1. Create a list of all the possible anagrams of a word
2. For each item in the list, check in the dictionary and see if it is a real word
I described this option but the interviewer was not that happy about it. But this is a good approach. I think he wanted me to pre process the dictionary may be. Not sure though.
Create a hashmap<String><List>
iterate over the words in the dictionary
Sort the words and store it in the hashmap as a key (if the key didnt exist)
add the word to the current hashmap.
Similarly, when you encounter the second word and sort it, check if there is already an entry for that sorted word in hasmap, if yes, then simply call hashmap.get(keyname), and add the word to the list that is returned by hashmap.
After this process, you can sort the word to be searched and get the list from the hashmap
the only way i can think of is:
1. compare length. not length match means no anagrams. this will reduce a lots of words.
2. Split the remaining words in group of characters. god-> g,o,d
3. Sort the letters. and compare hash.
I like this, but step #3 requires a sort of each word and then the hash? If we could get rid of that sort, we could save that nlogn time per word. What if we could do a hash that was insensitive to the order of the letters? Then we wouldn't need to sort them, saving that nlogn time. What if the sort were as silly as adding up the values of each letter. We could get false hits and then have to check each possible hit against other possible hits, maybe we would come out ahead here.
In any case, this is processing all the words in the dictionary, which would seem to be against the "not every words should be processed..." stipulation.
add a step 0 -
- Search only the words whose letter starts with a letters in given word.
Some modification that can be done after can be done after step 2 but i don't think these are elegant:
step 2.1 -store the original words(split-ed) in a array and in a set.
step 2.2 -add the letters in the new word in the set. If the length of the set changes its a miss. those words which pass the tests will require further processing
A note before step 3:-
I am not aware of any hash insensitive to the order. We might take sum of hash of each letters and compare the aggregate hash. a miss means no anagrams. i seriously doubt vice-versa is true.
Finally step3:
To find length of each word in dictionary, you have to access it, and thus is not a good solution.
Plus say the word is "Anagram"
Possible permutations of this word = 7!/3! = 7*6*5*4 = 840 only
While the dictionary will have a lot of words in comparison.
Therefore to decrease the number of comparisons, we should check for the permutations rather than the dictionary itself.
Assuming its large enough. Not sure what the interviewer was expecting though. I told him about length of words, Starting alphabet matching etc but he was not satisfied.
He stressed about the fact that not every word should be processed from the dictionary.
1. Generate permutations of the word.
2. Binary search through the "dictionary" (e.g. if given as a string array, or file directories - basically, avoid loading the entire dictionary into memory).
O(N! * log(P)), where N is the length of the word, and P is the number of words in the dictionary.
Dictionary Creation:
Data Structure: HashMap(AnagramHash key) contains HashMap(uniqueHash key)
hash1: anagramHash() - > Returns same hashcodes for all anagrams and not for non-anagram words
hash2: unique for each word
search:
Find hash 1 and hash 2 for any word and get the info
Time: O(1) i.e constant time
Anagram search:
find hash1 and return all objects in the inner hashmap
Time: O(1)
1) from the given word, separate all the alphabets and take only those sets from the dictionary starting with those alphabets
eg. given word "god", then tk the set of words starting with 'g','o' and 'd'
2) find the length off the given word and make a group of words with the same length.
3) that's it...
"interviewer stressed on not processing all the words in the dictionary."
That's the key point in this problem. I read algo for this problem long back somewhere, it was like this :
Map each alphabet with a prime number and product of prime values of letters in a word will act as hash. Now search in the given dictionary this hash value. If dictionary has been implemented as trie then search in the trie and break from a particular word if its prime product exceeds the corresponding value for the given word.
You need to take care of oveflow by carefully chosing prime values for letters, so higher frequency letters should be given lower prime value.
sort the word...find all anagrams of the word...
store the anagrams in lexicographically greater to smaller order!
find first word....in dict..
next word will b present from beginning till where first word was found...
next word will b present from beginning till where second word was found...
hence reducing search space on evry iteration!
Build a anagram HashMap of the dictionary itself. Key is the sorted string of characters of the words and value is a list of all the anagrams. Given a word, sort it and get the value of the sorted string from the HashMap. The list should have all the anagrams of the given words.
You only build the HashMap once. Sorting the input word has a complexity based on the sorting algorithm used (mostly O(nlogn)). Searching for list of anagrams is O(1) and looping through the list should also take O(m). Assuming the size of the word is small enough and the anagram list is mostly small, complexity should be O(1) once the HashMap of dictionary is created.
This seems the best :)
Pre process dictionary:
While inserting each word, sort the word & add it to trie according to this sorted word. & store the actual word.
When anagram for a word is to be find, just sort the word & lookup into trie. You will find list of words there.
Complexity will be O(nlogn) in worst case which will be of sorting word to be searched but as the words can't be too long, this should not be the issue.
Actually you need to find all permutations of a given word, which exists in given dictionary.
- googlybhai May 10, 2013