Given a dictionary of words, a

Google Interview Question for Software Engineer / Developers

-2

of 6 votes

33
Answers
Given a dictionary of words, and a set of characters, judge if all the characters can form the words from the dictionary, without any characters left.
For example, given the dictionary {hello, world, is, my, first, program},
if the characters set is "iiifrssst", you should return 'true' because you can form {is, is, first} from the set;
if the character set is "eiifrsst", you should return 'false' because you cannot use all the characters from the set.

P.S. there may be tens of thousands of words in the dictionary, and the chars set length could be up to hundreds, so I really need some efficient algorithm.
- DoZerg March 20, 2014 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 5 vote

this is np-hard since subset sum can be easily reduced to this problem.

- Anonymous March 21, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 3 vote

Express each word in a[26], with a[i] = number of (c - 'a') in that word.
The input string is also processed in the above way.

Now search:

#include <vector>                          
#include <string>                          
#include <boost/foreach.hpp>               
#include <stdio.h>                         
#include <iostream>                        

struct Dictionary
{                
  Dictionary(std::vector<std::string> strs)
  {                                        
    BOOST_FOREACH(std::string s, strs)     
    {                                      
      int* p = new int[26];                
      memset(p, 0, sizeof(int) * 26);      

      BOOST_FOREACH(char c, s)
      {                       
        p[c - 'a']++;         
      }                       

      words.push_back(p);
    }                    
  }                      

  int* getWord(int i)
  {                  
    return words[i]; 
  }                  

  int size()
  {         
    return words.size();
  }                     

  std::vector<int*> words;
};                        

int* subtract(int* str, int* word)
{                                 
  int* result = new int[26];      
  memcpy(result, str, 26 * sizeof(int));

  for (int i = 0; i < 26; ++i)
  {                           
    result[i] -= word[i];     
    if (result[i] < 0)        
      return NULL;            
  }                           

  return result;
}               

bool check_all_match(int* r)
{                           
  for (int i = 0; i < 26; ++i)
  {                           
    if (r[i] != 0)            
      return false;           
  }                           
  return true;                
}                             

bool search_words(int* str, Dictionary& dict)
{
  int s = dict.size();

  for (int i = 0; i < s; ++i)
  {
    int* word = dict.getWord(i);
    int* r = subtract(str, word);

    if (r != NULL)
    {
      if (check_all_match(r))
        return true;
      else
      {
        bool ret = search_words(r, dict);
        if (ret) return true;
      }
    }
  }

  return false;
}

bool can_extract(char* str, Dictionary& dict)
{
  int* pstr = new int[26];
  memset(pstr, 0, sizeof(int) * 26);

  int len = strlen(str);

  for (int i = 0; i < len; ++i)
  {
    pstr[str[i] - 'a']++;
  }

  return search_words(pstr, dict);
}

int main(int argc, char* argv[])
{
  std::vector<std::string> v;
  v.push_back("hello");
  v.push_back("world");
  v.push_back("is");
  v.push_back("my");
  v.push_back("first");
  v.push_back("program");

  Dictionary dict(v);

  if (can_extract(argv[1], dict))
    std::cout << "found.";
  else
    std::cout << "not found.";

  return 0;
}

- guest.guest March 21, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 votes

I can see your algorithm is based on recursion, and it's right for small test cases. But it consumes too much memory by allocating a lot of int[26], which will dramatically slow down the process. And there is no booking strategy during the process, so I don't think it's efficient enough to pass the interview.

- DoZerg March 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 4 vote

I think we can compare it with subset sum ,
in subset sum array and a key value is given,

now in this problem,
let us consider array as dictionary,and each word represents the count of each letter present in the word,

and the given set of characters will be the key we have to find,

we can do t through backtracking(similar to subset sum).

- Anonymous March 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Very Good!

- Anonymous March 20, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This is right, and in fact proves NP-Hardness of this problem. People who downvote are utterly clueless.

- Anonymous March 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I agree with you about the NP-hardness, and I don't like people who down-votes something without thinking.
But as I know, subset sum problem is one dimensional, and this problem could be at least a 26-D subset sum problem. And I cannot see the similar way to solve it by comparing to the normal subset sum problem.

- DoZerg March 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Yes, if you had an algorithm for this problem, then you could solve subset sum, and that is why this is NP-Hard.

But given that it is NP-Hard, just brute force it? (Backtracking solutions etc mentioned earlier).

- Anonymous March 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

- Anonymous March 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 4 vote

What is the question?
Someone explain it more clearly please?

- RedHat Hacker March 21, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Given a multiset of letters, and a set of words (each word being a multiset itself), can you split the given multiset into multisets, each of which is a word.

Got it?

- La MultiPass. March 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

First process the dictionary according to the length of the words and convert it into an array of list such that array[1] contains words of length 1 , array[2] contains words of length 2 and so on.Store the length of the longest word is a variable say max_length. Going by the example value of max_length will be 7. Now get the length of the character set. Let's say the character set is {iiifrssst} , it's length is 9. It's more than max_lenght so we go to array[7] , which is program , we start by first character which is 'p' which is not present in the character set so we discard it , next we go to array[6] , nothing is there so we go to array[5] , and start with 'hello' again 'h' is not present in character set so we discard it , next we go to 'world' , discard it , next we go to 'first' , this fits so remove the characters constituting 'first' from the given character set {iiifrssst} , now we are left with {iiss} whose lenght is 4. So we go to array[4] , nothing is there , we go to array[3] , nothing there , we go to array[2] ,and 'is' is a match. We remove the 'i' and 's' characters from character set which is now reduced to {is} . Again following the same procedure makes the character set empty. Thus we return true.

- praveenkcs28 March 21, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I think you would miss some combinations without backtracing. Say the dictionary is {is, so, iso}, and given char set "isso", what would happen if you remove the 'iso' from the char set?

- DoZerg March 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

This is the semi-naive first solution that just sorts the chars set and each word before processing. This give a complexity of dic_size * set_size.

bool is_valid_dictionary( vector<string>& dictionary, string char_set )
{
	sort( char_set.begin( ), char_set.end( ) );
	vector<int> counters( char_ser.size( ), 0 );

	for ( string word: dictionary )
	{
		sort( word.begin( ), word.end( ) );
	
		int j = 0;
		for( char ch: word )
		{
			while( char_set[j] < ch)
			{
				++j;
				if ( j == char_set.size( ) ) return false;
			}
	
			if( ch < char_set[j] ) return false;

			counters[ j ]++;
		}		
	}

	for ( int counter: counters )
	{
		if ( counter == 0 )
		{
			return false;
		}
	}
	
	return true;
}

- Riccardo March 22, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

This is a better solution in case of huge dictionaries and character set.
Uses counters on arrays of MAX_CHAR size (256).

bool is_dictionary_valid( const vector<string>& dictionary, const string& charset )
{
const int NUM_CHARS = CHAR_MAX + 1;

int charset_counters[ NUM_CHARS ];
for ( int& val: charset_counters )
{
val = 0; // Resets all counters.
}
for ( char ch: charset )
{
charset_counters[ch]++;
}

bool charset_matches[ NUM_CHARS ];
for ( bool& val: charset_matches )
{
val = false; // Resets all match flags.
}

// For each word into the dictionary:
for ( const string& word : dictionary )
{
// Checks is the word is made only of available characters:
for ( char ch: word )
{
if ( charset_counters[ch] == 0 )
{
return false;
}

charset_matches[ch] = true;
charset_counters[ch]--;
}

for ( char ch: word )
{
charset_counters[ch]++; // Restores the counters.
}
}

// Tests if there was one character of the charset that have never been used inside
// the dictionary:
for ( int i = 0; i < NUM_CHARS; ++i )
{
if ( charset_counters[i] > 0 && !charset_matches[i] )
{
return false;
}
}

return true;
}

- Riccardo March 22, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

This is a better solution in case of huge dictionaries and character set.
Uses counters on arrays of MAX_CHAR size (256).

bool is_dictionary_valid( const vector<string>& dictionary, const string& charset ) 
{ 
	const int NUM_CHARS = CHAR_MAX + 1; 

	int charset_counters[ NUM_CHARS ]; 
	for ( int& val: charset_counters ) 
	{ 
		val = 0; // Resets all counters. 
	} 
	for ( char ch: charset ) 
	{ 
		charset_counters[ch]++; 
	} 

	bool charset_matches[ NUM_CHARS ]; 
	for ( bool& val: charset_matches ) 
	{ 
		val = false; // Resets all match flags. 
	} 

	// For each word into the dictionary: 
	for ( const string& word : dictionary ) 
	{ 
		// Checks if the word is made only of available characters: 
		for ( char ch: word ) 
		{ 
			if ( charset_counters[ch] == 0 ) 
			{ 
				return false; 
			} 

			charset_matches[ch] = true; 
			charset_counters[ch]--; 
		} 

		for ( char ch: word ) 
		{ 
			charset_counters[ch]++; // Restores the counters. 
		} 
	} 

	// Tests if there was one character of the charset that have never been used inside 
	// the dictionary: 
	for ( int i = 0; i < NUM_CHARS; ++i ) 
	{ 
		if ( charset_counters[i] > 0 && !charset_matches[i] ) 
		{ 
			return false; 
		} 
	} 

	return true; 
}

- Riccardo March 22, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Your greedy algorithm is not correct.
See this:
dictionary = {abcde, abc, cde};
charset = {abccde};

Your code will choose the word "abcde" and left one 'c' and return false.
But one possible solution is choosing "abc" and "cde" then return true.

- hzhua May 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

take all the permutations of the pattern,and do it similar to word break problem,for all the combination of the pattern

- Anonymous April 12, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

Looks like some of the provided algo do not consider the case that the first matched word should be skipped

Dictionary: { "abc", "ab", cde" }
Character set: "abcde"

My try with simple recursion, reduce the Dict or CharSet at a time.

boolean matchDictionary (Dictionary dict, CharSet  charset)
{
   if(charset is empty) return true;
   if(dict is empty) return false;

   String firstWord = getFirstWord(dict);
   Dictionary reducedDict = dict.remove(firstWord);

  if (chars in charset can make firstWord)
     // if firstWord is a candidate, may either try to use it (reduce charset) or may try to skip
     // it (reduce dictionary)
      return matchDictionary(dict, charset.remove(firstWord.getChars()) or
                  matchDictionary(reducedDict, charset);

   // otherwise, just remove it from dict 
   return matchDictionary(reducedDict, charset);
}

- wallace May 07, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public class TestDictionary {

    public static void main(String[] s) {
        String[] dictionary = {"first", "is", "program", "ftt", "if"};
        String seq = "iiifrssst";
        HashSet<Character> uniqSeq = new HashSet();
        char[] cseq = seq.toCharArray();
        for (int i = 0; i < cseq.length; i++) {
            uniqSeq.add(cseq[i]);
        }
        String sSeq = "";
        for (Iterator<Character> it = uniqSeq.iterator(); it.hasNext();) {
            sSeq += it.next().toString();
        }
        boolean noWords = true;
        for (String w : dictionary) {
            int cover = 0;
            char[] cW = w.toCharArray();
            for (int k = 0; k < cW.length; k++) {
                if (sSeq.contains("" + cW[k])) {
                    cover++;
                }
            }
            //cover all word
            if (cW.length == cover) {
                noWords = false;
                System.out.println(w);
            }
        }
        if (noWords) {
            System.out.println("false");
        } else {
            System.out.println("true");
        }

    }
}

- Sabino May 19, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public class TestDictionary {

    public static void main(String[] s) {
        String[] dictionary = {"first", "is", "program", "ft", "if"};
        String seq = "iiifrssst";
        HashSet<Character> uniqSeq = new HashSet();
        char[] cseq = seq.toCharArray();
        for (int i = 0; i < cseq.length; i++) {
            uniqSeq.add(cseq[i]);
        }
        String sSeq = "";
        for (Iterator<Character> it = uniqSeq.iterator(); it.hasNext();) {
            sSeq += it.next().toString();
        }
        boolean noWords = true;
        for (String w : dictionary) {
            int cover = 0;
            char[] cW = w.toCharArray();
            for (int k = 0; k < cW.length; k++) {
                if (sSeq.contains("" + cW[k])) {
                    cover++;
                }
                else{
                    break;
                }
            }
            //cover all word
            if (cW.length == cover) {
                noWords = false;
                System.out.println(w);
            }
        }
        if (noWords) {
            System.out.println("false");
        } else {
            System.out.println("true");
        }

    }
//
}
//

- Sabino May 19, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public class TestDictionary {

    public static void main(String[] s) {
        String[] dictionary = {"first", "is", "program", "ft", "if"};
        String seq = "iiifrssst";
        HashSet<Character> uniqSeq = new HashSet();
        char[] cseq = seq.toCharArray();
        for (int i = 0; i < cseq.length; i++) {
            uniqSeq.add(cseq[i]);
        }
        String sSeq = "";
        for (Iterator<Character> it = uniqSeq.iterator(); it.hasNext();) {
            sSeq += it.next().toString();
        }
        boolean noWords = true;
        for (String w : dictionary) {
            int cover = 0;
            char[] cW = w.toCharArray();
            for (int k = 0; k < cW.length; k++) {
                if (sSeq.contains("" + cW[k])) {
                    cover++;
                }
                else{
                    break;
                }
            }
            //cover all word
            if (cW.length == cover) {
                noWords = false;
                System.out.println(w);
            }
        }
        if (noWords) {
            System.out.println("false");
        } else {
            System.out.println("true");
        }

    }
}
//

- Sabino May 19, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Variation of the knapsack problem?

Convert the character string and all the words in the dictionary to int a[26] where a[i] = the number of times a certain character appears in the set/ dictionary words. The converted set is your 'knapsack';

Difference here is that you can choose each word multiple times. So the knapsack algorithm needs to be extended to use a 3d array instead of 2d one to cache the sub problem results. And at each step you try to choose the word 0 .. n times until it doesn't fit in the 'knapsack' anymore then recursively solve for the next word and the remainder of the set?

- Rincer July 11, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Variation of the knapsack problem?

Convert the character string and all the words in the dictionary to int a[26] where a[i] = the number of times a certain character appears in the set/ dictionary words. The converted set is your 'knapsack';

Base cases are if all elements of a[26] are 0, set is empty return true, otherwise if then word index is -1 return false (tried all the words and there are still letters left).

if not a base case then in a loop try to choose the current word 0..m times and recursively solve for the word current-1 with the remainder of the set/knapsack. If any of the solutions return true then return true. otherwise if you can't choose the current word anymore because the set doesnt have enough letters but is not empty return false.

- Rincer July 11, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Please see this link.

thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/

- aviundefined August 09, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Basically idea consist of two major things:
1 apply dynamic programming approach to recursively narrow down appropriate words in a set W that fulfill character set C by taking word w from W and iterating over C - w
2. obtain constant time check if word w could be constructed by characters from array C. For that we might take a look to an array of bloom filter alike structures that represent number of occurrences of a specific character as a bit array (since there up 26 possible characters in English alphabet then this array be represented as an int[], where length of this array should be agreed and assumed to, say, 5 - max number of occurrences of the same letter in a any word)

- Dmitry November 22, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

form trie out of words in the dictionary and do backtracking using characters in the given set

- Anonymous March 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

You might want to sort the individual words first, before inserting/looking up in trie.

- Le snob. March 20, 2014 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

dynamic programming approach : transform all words in dictionary and chars in the set to dictionaries of the form dict<char : count> . at each stage, consider each word against char dict (subtract count of intersecting chars in the char dict); if char dict reduces to all zeroes, return true. return false when total count in chars dict is below count in all word dicts

- Anonymous March 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 3 vote

Approach:
Count the number of each character in the character set and store in a char array. Look for the start of every dictionary word in char array. If not present ignore the dictionary word. If present traverse the through the char array and subtracting the count for every character in the dictionary word in the char array.

If end of dictionary word is reached then switch to the next dictionary word, if not then traverse the dictionary word from the last not-found character and increment the character in the char array for every character in the dictionary word.

public static void dictionary (String [] dictionary, char [] charset){
	int len=charset.length;
	int [] arr = new int [256];
	int count=len,j,k,rep=len;

for(int i=0;i<len;i++){
		arr[charset[i]]++;
}

do{
		rep=count;
for(int i=0;i<dictionary.length;i++){
		if (arr[dictionary[i].charAt(0)] != 0){
			for(j=0;j<dictionary[i].length();j++){
				if (arr[dictionary[i].charAt(j)] != 0){
					arr[dictionary[i].charAt(j)]--;
					count--;
			}else
break;	
}
if (j != dictionary[i].length()){
		for(k=j-1;k>=0;k--){
			arr[dictionary[i].charAt(k)]++;
		}
	count=len;
}
if (count==0)
		break;
}
}
}while(rep != count);

if (count==0)
	System.out.println(“true”);
else
	System.out.println(“false”);

}

- Bhaskar March 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Down voter: please explain the reason to down vote. If there was a test case that failed, please mention that.

- Bhaskar March 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Approach is good but complexity is going to depend upon dictionary size.

- Anonymous March 27, 2014 | Flag

Comment hidden because of low score. Click to expand.

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews