Flipkart Interview Question
Software Engineer / DevelopersSorting of word would itself take O(nlogn) where n is the length of the string.
Instead we can take 1st 26 prime number and allocate it to a-z. Now for every word we compute the product of the characters, this will be unique as we have chosen prime numbers.
Now for all the words Map it to HashMap<Integer,Word> if the key is there we have found one pair, else we insert the element into the map.
I think for anagrams best solution is to use Prime numbers assign A-2, B-3, C-5 etc., and put them in hash then CAB is 5*3*2 = 30, use this as the key and value can be list of Strings.
Do this for all words and finally iterate over the hash to print the anagrams.
This deals away the sorting burden.
The solution works. But the downside is that if we have very large sentence or text, the cost of multiplying i.e. tokenising the entire text would be very high.
For log words, i.e. of 10 characters, there is a potential chance of arithmetic overflow and a high cost of multiplication.
I'd suggest keep it simple by sum of ascii values and the no. of characters in a word. There would be collisions but would be very minimal.
tac would have the same weightage as tbb, act etc.
A trie would suffice. No hashtable required.
Staying with the assumption from coder, you can use a trie (prefix tree) instead of a hash. When determining if a word is an anagram, you would check to see if the word's reverse is in the trie. The advantages of using a trie:
1) No need for hashing algorithm
2) No collisions
3) Less storage (not entirely sure on this one)
My soln assumes that two words are anagrams if they are separated from each other by given delimiters like space etc and they contain same letters.
- coder December 05, 2009e.g. I saw a cat at the tac.
Here I can say that <cat, tac> is a anagram pair.
Algorithm:
Tokenize the words
As you are tokenizing, put the words in the hashtable as <word sorted by characters, actual word> e.g.if you find cat put <key, value> as <act, cat>
If you already find that word in hash table while putting in, you have found the anagram.
Time Complexity - O(len of str) assuming words are of constant size (lets say max size = 25).
Space complexity O(n) for the hash table.
Does anyone have better answer for this?