Microsoft Interview Question
Software Engineer in TestsCountry: United States
Tries can be used to print all anagrams together.
a. Get all the anagrams related to a word under one leaf node.
b. Then again traverse the trie and print all anagrams together
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define bool int
#define NO_OF_CHARS 26
// Structure to represent list node for indexes of words in
// the given sequence. The list nodes are used to connect
// anagrams at leaf nodes of Trie
struct IndexNode
{
int index;
struct IndexNode* next;
};
// Structure to represent a Trie Node
struct TrieNode
{
bool isEnd; // indicates end of word
struct TrieNode* child[NO_OF_CHARS]; // 26 slots each for 'a' to 'z'
struct IndexNode* head; // head of the index list
};
// A utility function to create a new Trie node
struct TrieNode* newTrieNode()
{
struct TrieNode* temp = new TrieNode;
temp->isEnd = 0;
temp->head = NULL;
for (int i = 0; i < NO_OF_CHARS; ++i)
temp->child[i] = NULL;
return temp;
}
//For qsort
int compare(const void* a, const void* b)
{ return *(char*)a - *(char*)b; }
/* A utility function to create a new linked list node */
struct IndexNode* newIndexNode(int index)
{
struct IndexNode* temp = new IndexNode;
temp->index = index;
temp->next = NULL;
return temp;
}
// A utility function to insert a word to Trie
void insert(struct TrieNode** root, char* word, int index)
{
// Base case
if (*root == NULL)
*root = newTrieNode();
if (*word != '\0')
insert( &( (*root)->child[tolower(*word) - 'a'] ), word+1, index );
else // If end of the word reached
{
// Insert index of this word to end of index linked list
if ((*root)->isEnd)
{
IndexNode* pCrawl = (*root)->head;
while( pCrawl->next )
pCrawl = pCrawl->next;
pCrawl->next = newIndexNode(index);
}
else // If Index list is empty
{
(*root)->isEnd = 1;
(*root)->head = newIndexNode(index);
}
}
}
// This function traverses the built trie. When a leaf node is reached,
// all words connected at that leaf node are anagrams. So it traverses
// the list at leaf node and uses stored index to print original words
void printAnagramsUtil(struct TrieNode* root, char *wordArr[])
{
if (root == NULL)
return;
// If a lead node is reached, print all anagrams using the indexes
// stored in index linked list
if (root->isEnd)
{
// traverse the list
IndexNode* pCrawl = root->head;
while (pCrawl != NULL)
{
printf( "%s \n", wordArr[ pCrawl->index ] );
pCrawl = pCrawl->next;
}
}
for (int i = 0; i < NO_OF_CHARS; ++i)
printAnagramsUtil(root->child[i], wordArr);
}
// The main function that prints all anagrams together. wordArr[] is input
// sequence of words.
void printAnagramsTogether(char* wordArr[], int size)
{
// Create an empty Trie
struct TrieNode* root = NULL;
// Iterate through all input words
for (int i = 0; i < size; ++i)
{
// Create a buffer for this word and copy the word to buffer
int len = strlen(wordArr[i]);
char *buffer = new char[len+1];
strcpy(buffer, wordArr[i]);
// Sort the buffer
qsort( (void*)buffer, strlen(buffer), sizeof(char), compare );
// Insert the sorted buffer and its original index to Trie
insert(&root, buffer, i);
}
// Traverse the built Trie and print all anagrms together
printAnagramsUtil(root, wordArr);
}
// Driver program to test above functions
int main()
{
char* wordArr[] = {"tar","rat","banana","atr"};
int size = sizeof(wordArr) / sizeof(wordArr[0]);
printAnagramsTogether(wordArr, size);
return 0;
}
Insert the sorted order of each word in the trie. Since all the anagrams will end at the same leaf node. We can start a linked list at the leaf nodes where each node represents the index of the original array of words. Finally, traverse the Trie. While traversing the Trie, traverse each linked list one line at a time.
Not sure if this is an efficient way to do it. Just used a HashMap. Please suggest if there are better ways.
import java.util.*;
public class GroupAnagrams {
public static void main(String[] args){
String[] arrwords = {"art","tar","top","pot","neat","rat","tape","nate","peat","random"};
String[] srtdarrwords = sortAlphabetsInEachWord(arrwords);
HashMap<String, String> wordmap = new HashMap<String,String>();
int i=0;
for(String s : srtdarrwords){
System.out.println(s);
if(wordmap.containsKey(s)){
wordmap.put(s,wordmap.get(s)+","+arrwords[i++]);
System.out.println("already contains, so adding");
System.out.println(wordmap.get(s));
}
else{
wordmap.put(s,arrwords[i++]);
}
}
for(String s : wordmap.keySet()){
System.out.println("["+wordmap.get(s)+"]");
}
}
private static String[] sortAlphabetsInEachWord(String[] inparr){
String[] retstr = new String[inparr.length];
int i=0;
for (String s : inparr){
retstr[i++] = s;
}
i=0;
for (String s : retstr){
char[] alphword = s.toCharArray();
Arrays.sort(alphword);
s = new String(alphword);
retstr[i++] = s;
}
return retstr;
}
}
This is optimal solution. You can ignore time complexity for sorting words which are bounded small and assume hashing function will gives in O(1).
The total time complexity will be O(n) for n words.
1. Create a node with two values for each word
Struct node {
char *word
char *signature;
}
where signature will be the sorted form of word, eg., tar,rat,atr (anagrams) will all have the same signature, "art"
2. Now sort the nodes based on their signature values, which will put all the anagram nodes together.
- keep a structure
- the struct contains -
the string - char *str;
the length - length of string - int length;
so, go thru the array one by one
- measure the length of the string
- based on the length start storing the string (node for every string) in the linked list such that the string with the lesser array length always comes as a first node in the list
How about taking the first one and get all the permutations of the string and store all into hashmaps? And then take the next set of elements and search into hashMap. If found then add into the list otherwise neglect.
C# style:
var groups = from word in words
group word by GetWordAnagramHash(word);
where GetWordAnagramHash is:
private static int GetWordAnagramHash(string word)
{
var key = 0;
foreach (var letter in word)
{
key += (int)letter;
}
return key;
}
it t'll work if the words are not too long as the key is Int32.
actually we need to add the length of the word to the hash:
var key = word.length;
foreach (var letter in word)
{
key += (int)letter;
}
return key;
public static string[,] SortAnagrams(string[] inputAnagrams)
{
Dictionary<string, List<string>> container = new Dictionary<string, List<string>>();
string signature = "";
foreach (string word in inputAnagrams)
{
signature = String.Concat((String.Copy(word).ToUpper()).OrderBy(c => c));
if (container.ContainsKey(signature)) { container[signature].Add(word); }
else { container.Add(signature, new List<string> { word }); }
}
string[,] anagramList = new string[,] { };
int ith = 0, jth = 0;
foreach (string word in container.Keys)
{
List<string> anagrams = (container[word]);
foreach (string anagram in anagrams)
{
anagramList[ith, jth] = anagram;
jth++;
}
ith++;
}
return anagramList;
}
}
static void AnagramGroups(string[] strs)
{
Hashtable h = new Hashtable();
foreach (string s in strs)
{
int sum=0;
for (int j=0;j<s.Length;j++)
{
sum+=s[j];
}
if (h.ContainsKey(sum))
h[sum] = h[sum].ToString()+","+s;
else
h.Add(sum,s);
}
Console.WriteLine("Anagram Groups:");
foreach(string str in h.Values)
Console.WriteLine(str);
}
Did it using hash table. Trick is to use custom comparer for the hash table for your own anagram comparison:
public class AnagramEqualityComparer<T> : IEqualityComparer<T[]>
{
public bool Equals(T[] x, T[] y)
{
// we shouldn't be concerned about equality of x and y lengthes, as that
// will be taken care of by GetHashCode() function
bool isEqual = false;
int i = 0;
Hashtable hashTable = new Hashtable();
for (i = 0; i < x.Length; i++)
{
if (hashTable.ContainsKey(x[i]) == true)
{
int value = (int)hashTable[x[i]];
value++;
hashTable[x[i]] = value;
}
else
{
hashTable.Add(x[i], 1);
}
}
for (i = 0; i < y.Length; i++)
{
if (hashTable.ContainsKey(y[i]) == true)
{
int value = (int)hashTable[y[i]];
value--;
if (value == 0)
{
hashTable.Remove(y[i]);
}
else
{
hashTable[y[i]] = value;
}
}
else
{
break;
}
}
if (i == y.Length)
{
isEqual = true;
}
return isEqual;
}
public int GetHashCode(T[] obj)
{
return obj.Length;
}
}
static List<string>[] PopulateAnagramGroups(string[] wordsList)
{
List<string>[] groups = null;
if (wordsList != null && wordsList.Length != 0)
{
Dictionary<char[], List<string>> hashTable = new Dictionary<char[],
List<string>>(new AnagramEqualityComparer<char>());
List<string> list = null;
foreach (string word in wordsList)
{
char[] array = word.ToCharArray();
if (hashTable.ContainsKey(array) == true)
{
list = hashTable[array];
list.Add(word);
hashTable[array] = list;
}
else
{
list = new List<string>();
list.Add(word);
hashTable.Add(array, list);
}
}
// now we have lists as values of hash table
groups = hashTable.Values.ToArray();
}
return groups;
}
I have created a Similar Code:
Please Suggest if it is suitable for Time and space Complexity:
import java.util.*;
public class FindAnagrams {
public static void main(String...args){
String[] words = {"cat", "dog", "tac", "god", "act","mary","Mary","army"};
Map<String,Integer> map = new HashMap<String,Integer>();
for(String val:words){
int hash = getHash(val);
map.put(val, hash);
}
Map<Integer,List<String>> maplist = new HashMap<Integer,List<String>>();
for(Map.Entry<String, Integer> me:map.entrySet()){
if(!maplist.containsKey(me.getValue())){
List<String> ls = new ArrayList<String>();
maplist.put(me.getValue(), ls);
ls.add(me.getKey());
}
else{
List<String> ls = maplist.get(me.getValue());
ls.add(me.getKey());
}
}
for(Map.Entry<Integer, List<String>> me: maplist.entrySet()){
System.out.println("Key:"+me.getKey()+" Value:"+me.getValue());
}
}
static int getHash(String val){
char[] c = val.toCharArray();
int hash = 0;
for(char ch:c){
String sc = String.valueOf(ch);
hash=hash+sc.hashCode();
}
return hash;
}
}
in ruby
def group_anagrams arr = []
sort_arr, sort_hash, new_sort_arr, newer_sort_arr = [], {}, [], []
arr = arr.each {|w| sort_arr << w.split('').sort.join}
0.upto(arr.size-1) do |i|
sort_hash.store(arr[i], sort_arr[i])
end
new_sort_arr << sort_hash.sort_by {|k, v| k.length}.to_h.keys[0..arr.size-1]
end
One solution to this question using Hash Table. consider each word, sort it and add as key to hash table if not present. The value for the key would be a list of all anagrams with the same key. I wanted to know about the time complexities, To sort the characters in an array, suppose O(n log n) To store in the hash table it would be O(n), a total of O(n*nlogn). Is there a better algorithm? with lesser time complexity?
what about making one linear pass to set up hashmaps of chars for each word that map char > number of occurrences in that word. Then compare all elements in each hashmap to every other hash map. Should be O(n + n * n) or just O(n^2) should be a bit faster than the O(n^2 logn) no?
Good one. The length of any word in English is bounded above by some constant number. So sorting a word need not be expressed with an asymptotic complexity in terms of n. But yes, if you have n words, your total sorting can become of complexity O(n). To store in a hashtable, you need O(n) extra space. However, inserts, retrievals from a hash table (with a good hash function is assumed to be) is of O(1) time complexity.
It is not clear as to how you arrived at O(n^2lgn) time complexity.
- Anonymous July 30, 2013