Progress Interview Question
Software Engineer / DevelopersCountry: India
Interview Type: Phone Interview
A hashmap such as
HashMap<String, LinkedList<String>> anagramHaspMap
such that the key is the string which is alphabet sorting of the input strings and the linkedlist is the list of the anagrams for the String.
For example for "eat" the data will be stored as:
"aet" : "eat" -> "tea" -> "ate" -> null
so for a given input word sort it in alphabetic order and then perform a hash look up.
Based on the observation that all words with the same anagrams shared the same histogram (counting of each individual letter), if the only query that we need is counting query (count the number of words in the same anagrams) then we should use hash map.
The mapping processing is following
word -> histogram -> count
Complexity
- word -> histogram : O(l) with l is the length of the words
- histogram -> count : O(1) as we can pre-compute the value of each entry (count)
The overall complexity is O(l), the solution require additional space O(m) with m is number of anagrams in the lists.
Now if we tight of space, then the best approach is to use a sorted list of words. We ordered all words based on its representative anagrams for example: 'abc', 'bac', 'cab' all be presented by 'abc'. We now can use a binary search to find the first element in the anagram (lower_bound() function in C++) and find the first element of the next anagram (upper_bound() function in C++) the number of words with same anagram is the difference of these two number.
The complexity of the approach is O(l) * O(log N) with N = 10000 and l is the length of the words however this approach require no additional space, the list of the same anagram is available immediately. Note that for the hashmap approach it takes O(N) to build this list.
If we have to count\return only number of anagrams for a string than why should we maintain words .. ?
We can use a HashMap, that will store key as a hash code (with hash collision for all anagrams - similar to my answer) and count as a value.
Each time a words occurs we will check in hashMap and just increase the count .. :)
Complexity will be O(1) to find all anagrams and space will be O(N), n = number of different anagram sets.
You have unique hash code because all anagrams share the same histogram. Using your hash code is one way to map between the histogram and the count but that is not the only hash function we can use. Basically any good hash function computed from the histogram would work. The reason that I don't think your hash function is a good choice due to the potential to overflow.
I will discuss to interviewer about what are the queries that application will support. if counting is the only query then we don't need to maintain the whole list.
import java.util.*;
public class Anagrams
{
private static Map<Character, Integer> charHashValuemap = new HashMap<Character, Integer>(26);
public static void main(String[] args)
{
populateCharHashValuemap();
generateAnagrams();
}
private static void generateAnagrams()
{
Map<Integer, Map<Integer, List<String>>> anagramMap = new HashMap<Integer, Map<Integer, List<String>>>();
Scanner sc = new Scanner(System.in);
System.out.println("Please enter the count of list:");
int count = sc.nextInt();
System.out.println("Please insert strings(press enter after each insert):");
int hashValue = 0;
for (int i = 0; i < count; i++)
{
String string = sc.next().toLowerCase();
hashValue = getStringHashValue(string);
if (anagramMap.containsKey(string.length()))
{
if (anagramMap.get(string.length()).containsKey(hashValue))
{
anagramMap.get(string.length()).get(hashValue).add(string);
}
else
{
List<String> innerList = new ArrayList<String>();
innerList.add(string);
anagramMap.get(string.length()).put(hashValue, innerList);
}
}
else
{
Map<Integer, List<String>> innerHashMap = new HashMap<Integer, List<String>>();
List<String> innerList = new ArrayList<String>();
innerList.add(string);
innerHashMap.put(hashValue, innerList);
anagramMap.put(string.length(), innerHashMap);
}
}
while (true)
{
System.out.println("Please insert string you want to check anagrams:");
String outputString = sc.next().toLowerCase();
int hashCode = getStringHashValue(outputString);
if (anagramMap.containsKey(outputString.length()))
{
if (anagramMap.get(outputString.length()).containsKey(hashCode))
{
for (String anagram : anagramMap.get(outputString.length()).get(hashCode))
{
System.out.println(anagram);
}
}
else
{
System.out.println("Anagrams doesnt exist for string :" + outputString);
}
}
else
{
System.out.println("Anagrams doesnt exist for string :" + outputString);
}
System.out.println("You want to continue(Y/N):");
if (sc.next().equalsIgnoreCase("N"))
{
break;
}
}
}
private static int getStringHashValue(String string)
{
char[] charArray = string.toCharArray();
int hashValue = 0;
for (char character : charArray)
{
hashValue += charHashValuemap.get(character);
}
return hashValue;
}
private static void populateCharHashValuemap()
{
charHashValuemap.put('a', 1);
charHashValuemap.put('b', 2);
charHashValuemap.put('c', 3);
charHashValuemap.put('d', 4);
charHashValuemap.put('e', 5);
charHashValuemap.put('f', 6);
charHashValuemap.put('g', 7);
charHashValuemap.put('h', 8);
charHashValuemap.put('i', 9);
charHashValuemap.put('j', 10);
charHashValuemap.put('k', 11);
charHashValuemap.put('l', 12);
charHashValuemap.put('m', 13);
charHashValuemap.put('n', 14);
charHashValuemap.put('o', 15);
charHashValuemap.put('p', 16);
charHashValuemap.put('q', 17);
charHashValuemap.put('r', 18);
charHashValuemap.put('s', 19);
charHashValuemap.put('t', 20);
charHashValuemap.put('u', 21);
charHashValuemap.put('v', 22);
charHashValuemap.put('w', 23);
charHashValuemap.put('x', 24);
charHashValuemap.put('y', 25);
charHashValuemap.put('z', 26);
}
}
Roughly (you can optimize, tweak this in obvious ways).
A string-key based hash table that "keeps" duplicates (doesn't overwrite).
One can think of how to design this, but since anagram equivalence classes are rarely large... who cares.
Simply hash these: < key=linearsort(string), data=string> into the hash table.
Then you are basically done.
How you design the details is boring....
- Ajeet October 17, 2013