Given array of words, group th

Microsoft Interview Question for Software Engineer in Tests

0

of 0 votes

28
Answers
Given array of words, group the anagrams
IP:{tar,rat,banana,atr}
OP:{[tar,rat,atr],[banana]}
- abhinav27rulez July 29, 2013 in United States | Report Duplicate | Flag | PURGE
Microsoft Software Engineer in Test Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 7 vote

map<string, vector<string> m;
for (int i=0; i < strs.size(); i++)
{
    string temp = strs[i];
    sort(temp.begin(), temp.end());
    m[temp].push_back(strs[i]);
}
int k = 0;
for (auto it = m.begin(); it != m.end(); it++)
{
       vector<string>& v = it->second;
       for (int i=0; i < v.size(); i++)
             strs[k++] = v[i]; 
}

- Anonymous July 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Nice and clean. +1.

- oOZz July 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

-2

of 2 votes

output of this code will be {[banana],[tar,rat,atr]} not the desire one.
I mean sorted with key value.

- Anonymous August 25, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

Tries can be used to print all anagrams together.
a. Get all the anagrams related to a word under one leaf node.
b. Then again traverse the trie and print all anagrams together

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define bool int
#define NO_OF_CHARS 26

// Structure to represent list node for indexes of words in
// the given sequence. The list nodes are used to connect
// anagrams at leaf nodes of Trie
struct IndexNode
{
    int index;
    struct IndexNode* next;
};

// Structure to represent a Trie Node
struct TrieNode
{
    bool isEnd;  // indicates end of word
    struct TrieNode* child[NO_OF_CHARS]; // 26 slots each for 'a' to 'z'
    struct IndexNode* head; // head of the index list
};


// A utility function to create a new Trie node
struct TrieNode* newTrieNode()
{
    struct TrieNode* temp = new TrieNode;
    temp->isEnd = 0;
    temp->head = NULL;
    for (int i = 0; i < NO_OF_CHARS; ++i)
        temp->child[i] = NULL;
    return temp;
}

//For qsort
int compare(const void* a, const void* b)
{  return *(char*)a - *(char*)b; }

/* A utility function to create a new linked list node */
struct IndexNode* newIndexNode(int index)
{
    struct IndexNode* temp = new IndexNode;
    temp->index = index;
    temp->next = NULL;
    return temp;
}

// A utility function to insert a word to Trie
void insert(struct TrieNode** root, char* word, int index)
{
    // Base case
    if (*root == NULL)
        *root = newTrieNode();

    if (*word != '\0')
        insert( &( (*root)->child[tolower(*word) - 'a'] ), word+1, index );
    else  // If end of the word reached
    {
        // Insert index of this word to end of index linked list
        if ((*root)->isEnd)
        {
            IndexNode* pCrawl = (*root)->head;
            while( pCrawl->next )
                pCrawl = pCrawl->next;
            pCrawl->next = newIndexNode(index);
        }
        else  // If Index list is empty
        {
            (*root)->isEnd = 1;
            (*root)->head = newIndexNode(index);
        }
    }
}

// This function traverses the built trie. When a leaf node is reached,
// all words connected at that leaf node are anagrams. So it traverses
// the list at leaf node and uses stored index to print original words
void printAnagramsUtil(struct TrieNode* root, char *wordArr[])
{
    if (root == NULL)
        return;

    // If a lead node is reached, print all anagrams using the indexes
    // stored in index linked list
    if (root->isEnd)
    {
        // traverse the list
        IndexNode* pCrawl = root->head;
        while (pCrawl != NULL)
        {
            printf( "%s \n", wordArr[ pCrawl->index ] );
            pCrawl = pCrawl->next;
        }
    }

    for (int i = 0; i < NO_OF_CHARS; ++i)
        printAnagramsUtil(root->child[i], wordArr);
}

// The main function that prints all anagrams together. wordArr[] is input
// sequence of words.
void printAnagramsTogether(char* wordArr[], int size)
{
    // Create an empty Trie
    struct TrieNode* root = NULL;

    // Iterate through all input words
    for (int i = 0; i < size; ++i)
    {
        // Create a buffer for this word and copy the word to buffer
        int len = strlen(wordArr[i]);
        char *buffer = new char[len+1];
        strcpy(buffer, wordArr[i]);

        // Sort the buffer
        qsort( (void*)buffer, strlen(buffer), sizeof(char), compare );

        // Insert the sorted buffer and its original index to Trie
        insert(&root,  buffer, i);
    }

    // Traverse the built Trie and print all anagrms together
    printAnagramsUtil(root, wordArr);
}


// Driver program to test above functions
int main()
{
    char* wordArr[] = {"tar","rat","banana","atr"};
    int size = sizeof(wordArr) / sizeof(wordArr[0]);
    printAnagramsTogether(wordArr, size);
    return 0;
}

- vgeek July 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Here how you identified anagrams from given list ?

- pirate July 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Insert the sorted order of each word in the trie. Since all the anagrams will end at the same leaf node. We can start a linked list at the leaf nodes where each node represents the index of the original array of words. Finally, traverse the Trie. While traversing the Trie, traverse each linked list one line at a time.

- vgeek July 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

can you please mention what would the time complexity for this method be?

- abhinav27rulez July 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

It would be nlogn as the quick sort algorithm is used and then trie is being traversed which takes O(n). So the worst case time complexity is O(nlogn).

- vgeek July 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Not sure if this is an efficient way to do it. Just used a HashMap. Please suggest if there are better ways.

import java.util.*;

public class GroupAnagrams {
	public static void main(String[] args){
		String[] arrwords = {"art","tar","top","pot","neat","rat","tape","nate","peat","random"};
		String[] srtdarrwords = sortAlphabetsInEachWord(arrwords);
		HashMap<String, String> wordmap = new HashMap<String,String>();
		int i=0;
		for(String s : srtdarrwords){
			System.out.println(s);
			if(wordmap.containsKey(s)){
				wordmap.put(s,wordmap.get(s)+","+arrwords[i++]);
				System.out.println("already contains, so adding");
				System.out.println(wordmap.get(s));
			}
			else{
				wordmap.put(s,arrwords[i++]);
			}
		}
		for(String s : wordmap.keySet()){
			System.out.println("["+wordmap.get(s)+"]");
		}
		
	}
	private static String[] sortAlphabetsInEachWord(String[] inparr){
		String[] retstr = new String[inparr.length];
		int i=0;
		for (String s : inparr){
			retstr[i++] = s;
		}
		i=0;
		for (String s : retstr){
			char[] alphword = s.toCharArray();
			Arrays.sort(alphword);
			s = new String(alphword);
			retstr[i++] = s;
		}
		return retstr;
	}

}

- sk July 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This is optimal solution. You can ignore time complexity for sorting words which are bounded small and assume hashing function will gives in O(1).

The total time complexity will be O(n) for n words.

- pirate July 30, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Is this the optimal way to do this? what is the space complexity for this algorithm.

- Anonymous August 21, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

1. Create a node with two values for each word

Struct node {
		char *word
		char *signature;
	   }

where signature will be the sorted form of word, eg., tar,rat,atr (anagrams) will all have the same signature, "art"
2. Now sort the nodes based on their signature values, which will put all the anagram nodes together.

- Codac July 30, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Time complexity for this solution will be O(nlogn), where n is the total number of words. For simplicity I have ignored the time complexity to sort each word.

- as09 August 23, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

- keep a structure
- the struct contains -
the string - char *str;
the length - length of string - int length;

so, go thru the array one by one
- measure the length of the string
- based on the length start storing the string (node for every string) in the linked list such that the string with the lesser array length always comes as a first node in the list

- s July 31, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Please check this and correct me if this affects the inefficiency.

- Anonymous July 31, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

How about taking the first one and get all the permutations of the string and store all into hashmaps? And then take the next set of elements and search into hashMap. If found then add into the list otherwise neglect.

- Alice7 August 03, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Sorry I didn't realize that we need to store the same anagrams into same array element.

- Alice7 August 03, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

C# style:

var groups = from word in words
                       group word by GetWordAnagramHash(word);

where GetWordAnagramHash is:

private static int GetWordAnagramHash(string word)
{
       var key = 0;
       foreach (var letter in word)
       {
                key += (int)letter;
       }
       return key;
}

it t'll work if the words are not too long as the key is Int32.

- andrey.ilnitsky August 08, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

actually we need to add the length of the word to the hash:

var key = word.length;
       foreach (var letter in word)
       {
                key += (int)letter;
       }
       return key;

- andrey.ilnitsky August 08, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

better way:

private static Tuple<int, int> GetWordAnagramHash(string word)
        {
            var hash = 0;
            foreach (var letter in word)
            {
                hash += (int)letter;
            }
            return new Tuple<int,int>(word.Length, hash);
        }

- andrey.ilnitsky August 08, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

public static string[,] SortAnagrams(string[] inputAnagrams)
        {
            Dictionary<string, List<string>> container = new Dictionary<string, List<string>>();

            string signature = "";
            foreach (string word in inputAnagrams) 
            {
                signature = String.Concat((String.Copy(word).ToUpper()).OrderBy(c => c));

                if (container.ContainsKey(signature)) { container[signature].Add(word); }
                else { container.Add(signature, new List<string> { word }); }     
            }

            string[,] anagramList = new string[,] { };
            int ith = 0, jth = 0;

            foreach (string word in container.Keys)
            {
                List<string> anagrams = (container[word]);
                foreach (string anagram in anagrams)
                {
                    anagramList[ith, jth] = anagram;
                    jth++;
                }
                ith++;
            }
            return anagramList;
        }

}

- Anonymous August 09, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

static void AnagramGroups(string[] strs)
        {
            Hashtable h = new Hashtable();
            foreach (string s in strs)
            {
                int sum=0;
                for (int j=0;j<s.Length;j++)
                {
                    sum+=s[j];
                }
                if (h.ContainsKey(sum))
                    h[sum] = h[sum].ToString()+","+s;
                else
                    h.Add(sum,s);
            }
            Console.WriteLine("Anagram Groups:");

            foreach(string str in h.Values)
                Console.WriteLine(str);
        }

- vamsi September 17, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Did it using hash table. Trick is to use custom comparer for the hash table for your own anagram comparison:

public class AnagramEqualityComparer<T> : IEqualityComparer<T[]>
        {
            public bool Equals(T[] x, T[] y)
            {
                // we shouldn't be concerned about equality of x and y lengthes, as that
                // will be taken care of by GetHashCode() function
                bool isEqual = false;
                int i = 0;

                Hashtable hashTable = new Hashtable();
                for (i = 0; i < x.Length; i++)
                {
                    if (hashTable.ContainsKey(x[i]) == true)
                    {
                        int value = (int)hashTable[x[i]];
                        value++;
                        hashTable[x[i]] = value;
                    }
                    else
                    {
                        hashTable.Add(x[i], 1);
                    }
                }

                for (i = 0; i < y.Length; i++)
                {
                    if (hashTable.ContainsKey(y[i]) == true)
                    {
                        int value = (int)hashTable[y[i]];
                        value--;
                        if (value == 0)
                        {
                            hashTable.Remove(y[i]);
                        }
                        else
                        {
                            hashTable[y[i]] = value;
                        }
                    }
                    else
                    {
                        break;
                    }
                }

                if (i == y.Length)
                {
                    isEqual = true;
                }

                return isEqual;
            }

            public int GetHashCode(T[] obj)
            {
                return obj.Length;
            }
        }

        static List<string>[] PopulateAnagramGroups(string[] wordsList)
        {
            List<string>[] groups = null;

            if (wordsList != null && wordsList.Length != 0)
            {
                Dictionary<char[], List<string>> hashTable = new Dictionary<char[],
                    List<string>>(new AnagramEqualityComparer<char>());

                List<string> list = null;
                foreach (string word in wordsList)
                {
                    char[] array = word.ToCharArray();
                    if (hashTable.ContainsKey(array) == true)
                    {
                        list = hashTable[array];
                        list.Add(word);
                        hashTable[array] = list;
                    }
                    else
                    {
                        list = new List<string>();
                        list.Add(word);
                        hashTable.Add(array, list);
                    }
                }

                // now we have lists as values of hash table
                groups = hashTable.Values.ToArray();
            }

            return groups;
        }

- AK November 05, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I have created a Similar Code:
Please Suggest if it is suitable for Time and space Complexity:

import java.util.*;
public class FindAnagrams {
	
	public static void main(String...args){
	String[] words = {"cat", "dog", "tac", "god", "act","mary","Mary","army"};	
	Map<String,Integer> map = new HashMap<String,Integer>();
		for(String val:words){
			int hash = getHash(val);
			map.put(val, hash);
		}
		Map<Integer,List<String>> maplist = new HashMap<Integer,List<String>>();
		
		for(Map.Entry<String, Integer> me:map.entrySet()){
			if(!maplist.containsKey(me.getValue())){
				List<String> ls = new ArrayList<String>();
				maplist.put(me.getValue(), ls);
				ls.add(me.getKey());				
			}
			else{
				List<String> ls = maplist.get(me.getValue());
				ls.add(me.getKey());
			}			
		}
		
		for(Map.Entry<Integer, List<String>> me: maplist.entrySet()){
			System.out.println("Key:"+me.getKey()+" Value:"+me.getValue());
		}
		
	}
	
	static int getHash(String val){
		char[] c = val.toCharArray();
		int hash = 0;
		for(char ch:c){
			String sc = String.valueOf(ch);
			hash=hash+sc.hashCode();
		}
		return hash;
	}	
}

- Pankaj Jaiswal June 26, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

check for input {"zaz", "ABCDK"} your code will break.

- vikas July 16, 2023 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

in ruby

def group_anagrams arr = []
  sort_arr, sort_hash, new_sort_arr, newer_sort_arr = [], {}, [], []
  arr = arr.each {|w| sort_arr << w.split('').sort.join}
  0.upto(arr.size-1) do |i|
    sort_hash.store(arr[i], sort_arr[i])
  end
  new_sort_arr << sort_hash.sort_by {|k, v| k.length}.to_h.keys[0..arr.size-1]
end

- VinceBarresi August 27, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 3 vote

One solution to this question using Hash Table. consider each word, sort it and add as key to hash table if not present. The value for the key would be a list of all anagrams with the same key. I wanted to know about the time complexities, To sort the characters in an array, suppose O(n log n) To store in the hash table it would be O(n), a total of O(n*nlogn). Is there a better algorithm? with lesser time complexity?

- abhinav27rulez July 29, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

what about making one linear pass to set up hashmaps of chars for each word that map char > number of occurrences in that word. Then compare all elements in each hashmap to every other hash map. Should be O(n + n * n) or just O(n^2) should be a bit faster than the O(n^2 logn) no?

- z.d.donnell July 29, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Good one. The length of any word in English is bounded above by some constant number. So sorting a word need not be expressed with an asymptotic complexity in terms of n. But yes, if you have n words, your total sorting can become of complexity O(n). To store in a hashtable, you need O(n) extra space. However, inserts, retrievals from a hash table (with a good hash function is assumed to be) is of O(1) time complexity.

It is not clear as to how you arrived at O(n^2lgn) time complexity.

- Murali Mohan July 30, 2013 | Flag

CareerCup

Microsoft Interview Question for Software Engineer in Tests

Books

Videos

Resume Review

Mock Interviews