Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
Express each word in a[26], with a[i] = number of (c - 'a') in that word.
The input string is also processed in the above way.
Now search:
#include <vector>
#include <string>
#include <boost/foreach.hpp>
#include <stdio.h>
#include <iostream>
struct Dictionary
{
Dictionary(std::vector<std::string> strs)
{
BOOST_FOREACH(std::string s, strs)
{
int* p = new int[26];
memset(p, 0, sizeof(int) * 26);
BOOST_FOREACH(char c, s)
{
p[c - 'a']++;
}
words.push_back(p);
}
}
int* getWord(int i)
{
return words[i];
}
int size()
{
return words.size();
}
std::vector<int*> words;
};
int* subtract(int* str, int* word)
{
int* result = new int[26];
memcpy(result, str, 26 * sizeof(int));
for (int i = 0; i < 26; ++i)
{
result[i] -= word[i];
if (result[i] < 0)
return NULL;
}
return result;
}
bool check_all_match(int* r)
{
for (int i = 0; i < 26; ++i)
{
if (r[i] != 0)
return false;
}
return true;
}
bool search_words(int* str, Dictionary& dict)
{
int s = dict.size();
for (int i = 0; i < s; ++i)
{
int* word = dict.getWord(i);
int* r = subtract(str, word);
if (r != NULL)
{
if (check_all_match(r))
return true;
else
{
bool ret = search_words(r, dict);
if (ret) return true;
}
}
}
return false;
}
bool can_extract(char* str, Dictionary& dict)
{
int* pstr = new int[26];
memset(pstr, 0, sizeof(int) * 26);
int len = strlen(str);
for (int i = 0; i < len; ++i)
{
pstr[str[i] - 'a']++;
}
return search_words(pstr, dict);
}
int main(int argc, char* argv[])
{
std::vector<std::string> v;
v.push_back("hello");
v.push_back("world");
v.push_back("is");
v.push_back("my");
v.push_back("first");
v.push_back("program");
Dictionary dict(v);
if (can_extract(argv[1], dict))
std::cout << "found.";
else
std::cout << "not found.";
return 0;
}
I can see your algorithm is based on recursion, and it's right for small test cases. But it consumes too much memory by allocating a lot of int[26], which will dramatically slow down the process. And there is no booking strategy during the process, so I don't think it's efficient enough to pass the interview.
I think we can compare it with subset sum ,
in subset sum array and a key value is given,
now in this problem,
let us consider array as dictionary,and each word represents the count of each letter present in the word,
and the given set of characters will be the key we have to find,
we can do t through backtracking(similar to subset sum).
This is right, and in fact proves NP-Hardness of this problem. People who downvote are utterly clueless.
I agree with you about the NP-hardness, and I don't like people who down-votes something without thinking.
But as I know, subset sum problem is one dimensional, and this problem could be at least a 26-D subset sum problem. And I cannot see the similar way to solve it by comparing to the normal subset sum problem.
I think we can compare it with subset sum ,
in subset sum array and a key value is given,
now in this problem,
let us consider array as dictionary,and each word represents the count of each letter present in the word,
and the given set of characters will be the key we have to find,
we can do t through backtracking(similar to subset sum).
First process the dictionary according to the length of the words and convert it into an array of list such that array[1] contains words of length 1 , array[2] contains words of length 2 and so on.Store the length of the longest word is a variable say max_length. Going by the example value of max_length will be 7. Now get the length of the character set. Let's say the character set is {iiifrssst} , it's length is 9. It's more than max_lenght so we go to array[7] , which is program , we start by first character which is 'p' which is not present in the character set so we discard it , next we go to array[6] , nothing is there so we go to array[5] , and start with 'hello' again 'h' is not present in character set so we discard it , next we go to 'world' , discard it , next we go to 'first' , this fits so remove the characters constituting 'first' from the given character set {iiifrssst} , now we are left with {iiss} whose lenght is 4. So we go to array[4] , nothing is there , we go to array[3] , nothing there , we go to array[2] ,and 'is' is a match. We remove the 'i' and 's' characters from character set which is now reduced to {is} . Again following the same procedure makes the character set empty. Thus we return true.
This is the semi-naive first solution that just sorts the chars set and each word before processing. This give a complexity of dic_size * set_size.
bool is_valid_dictionary( vector<string>& dictionary, string char_set )
{
sort( char_set.begin( ), char_set.end( ) );
vector<int> counters( char_ser.size( ), 0 );
for ( string word: dictionary )
{
sort( word.begin( ), word.end( ) );
int j = 0;
for( char ch: word )
{
while( char_set[j] < ch)
{
++j;
if ( j == char_set.size( ) ) return false;
}
if( ch < char_set[j] ) return false;
counters[ j ]++;
}
}
for ( int counter: counters )
{
if ( counter == 0 )
{
return false;
}
}
return true;
}
This is a better solution in case of huge dictionaries and character set.
Uses counters on arrays of MAX_CHAR size (256).
bool is_dictionary_valid( const vector<string>& dictionary, const string& charset )
{
const int NUM_CHARS = CHAR_MAX + 1;
int charset_counters[ NUM_CHARS ];
for ( int& val: charset_counters )
{
val = 0; // Resets all counters.
}
for ( char ch: charset )
{
charset_counters[ch]++;
}
bool charset_matches[ NUM_CHARS ];
for ( bool& val: charset_matches )
{
val = false; // Resets all match flags.
}
// For each word into the dictionary:
for ( const string& word : dictionary )
{
// Checks is the word is made only of available characters:
for ( char ch: word )
{
if ( charset_counters[ch] == 0 )
{
return false;
}
charset_matches[ch] = true;
charset_counters[ch]--;
}
for ( char ch: word )
{
charset_counters[ch]++; // Restores the counters.
}
}
// Tests if there was one character of the charset that have never been used inside
// the dictionary:
for ( int i = 0; i < NUM_CHARS; ++i )
{
if ( charset_counters[i] > 0 && !charset_matches[i] )
{
return false;
}
}
return true;
}
This is a better solution in case of huge dictionaries and character set.
Uses counters on arrays of MAX_CHAR size (256).
bool is_dictionary_valid( const vector<string>& dictionary, const string& charset )
{
const int NUM_CHARS = CHAR_MAX + 1;
int charset_counters[ NUM_CHARS ];
for ( int& val: charset_counters )
{
val = 0; // Resets all counters.
}
for ( char ch: charset )
{
charset_counters[ch]++;
}
bool charset_matches[ NUM_CHARS ];
for ( bool& val: charset_matches )
{
val = false; // Resets all match flags.
}
// For each word into the dictionary:
for ( const string& word : dictionary )
{
// Checks if the word is made only of available characters:
for ( char ch: word )
{
if ( charset_counters[ch] == 0 )
{
return false;
}
charset_matches[ch] = true;
charset_counters[ch]--;
}
for ( char ch: word )
{
charset_counters[ch]++; // Restores the counters.
}
}
// Tests if there was one character of the charset that have never been used inside
// the dictionary:
for ( int i = 0; i < NUM_CHARS; ++i )
{
if ( charset_counters[i] > 0 && !charset_matches[i] )
{
return false;
}
}
return true;
}
Looks like some of the provided algo do not consider the case that the first matched word should be skipped
Dictionary: { "abc", "ab", cde" }
Character set: "abcde"
My try with simple recursion, reduce the Dict or CharSet at a time.
boolean matchDictionary (Dictionary dict, CharSet charset)
{
if(charset is empty) return true;
if(dict is empty) return false;
String firstWord = getFirstWord(dict);
Dictionary reducedDict = dict.remove(firstWord);
if (chars in charset can make firstWord)
// if firstWord is a candidate, may either try to use it (reduce charset) or may try to skip
// it (reduce dictionary)
return matchDictionary(dict, charset.remove(firstWord.getChars()) or
matchDictionary(reducedDict, charset);
// otherwise, just remove it from dict
return matchDictionary(reducedDict, charset);
}
public class TestDictionary {
public static void main(String[] s) {
String[] dictionary = {"first", "is", "program", "ftt", "if"};
String seq = "iiifrssst";
HashSet<Character> uniqSeq = new HashSet();
char[] cseq = seq.toCharArray();
for (int i = 0; i < cseq.length; i++) {
uniqSeq.add(cseq[i]);
}
String sSeq = "";
for (Iterator<Character> it = uniqSeq.iterator(); it.hasNext();) {
sSeq += it.next().toString();
}
boolean noWords = true;
for (String w : dictionary) {
int cover = 0;
char[] cW = w.toCharArray();
for (int k = 0; k < cW.length; k++) {
if (sSeq.contains("" + cW[k])) {
cover++;
}
}
//cover all word
if (cW.length == cover) {
noWords = false;
System.out.println(w);
}
}
if (noWords) {
System.out.println("false");
} else {
System.out.println("true");
}
}
}
public class TestDictionary {
public static void main(String[] s) {
String[] dictionary = {"first", "is", "program", "ft", "if"};
String seq = "iiifrssst";
HashSet<Character> uniqSeq = new HashSet();
char[] cseq = seq.toCharArray();
for (int i = 0; i < cseq.length; i++) {
uniqSeq.add(cseq[i]);
}
String sSeq = "";
for (Iterator<Character> it = uniqSeq.iterator(); it.hasNext();) {
sSeq += it.next().toString();
}
boolean noWords = true;
for (String w : dictionary) {
int cover = 0;
char[] cW = w.toCharArray();
for (int k = 0; k < cW.length; k++) {
if (sSeq.contains("" + cW[k])) {
cover++;
}
else{
break;
}
}
//cover all word
if (cW.length == cover) {
noWords = false;
System.out.println(w);
}
}
if (noWords) {
System.out.println("false");
} else {
System.out.println("true");
}
}
//
}
//
public class TestDictionary {
public static void main(String[] s) {
String[] dictionary = {"first", "is", "program", "ft", "if"};
String seq = "iiifrssst";
HashSet<Character> uniqSeq = new HashSet();
char[] cseq = seq.toCharArray();
for (int i = 0; i < cseq.length; i++) {
uniqSeq.add(cseq[i]);
}
String sSeq = "";
for (Iterator<Character> it = uniqSeq.iterator(); it.hasNext();) {
sSeq += it.next().toString();
}
boolean noWords = true;
for (String w : dictionary) {
int cover = 0;
char[] cW = w.toCharArray();
for (int k = 0; k < cW.length; k++) {
if (sSeq.contains("" + cW[k])) {
cover++;
}
else{
break;
}
}
//cover all word
if (cW.length == cover) {
noWords = false;
System.out.println(w);
}
}
if (noWords) {
System.out.println("false");
} else {
System.out.println("true");
}
}
}
//
Variation of the knapsack problem?
Convert the character string and all the words in the dictionary to int a[26] where a[i] = the number of times a certain character appears in the set/ dictionary words. The converted set is your 'knapsack';
Difference here is that you can choose each word multiple times. So the knapsack algorithm needs to be extended to use a 3d array instead of 2d one to cache the sub problem results. And at each step you try to choose the word 0 .. n times until it doesn't fit in the 'knapsack' anymore then recursively solve for the next word and the remainder of the set?
Variation of the knapsack problem?
Convert the character string and all the words in the dictionary to int a[26] where a[i] = the number of times a certain character appears in the set/ dictionary words. The converted set is your 'knapsack';
Base cases are if all elements of a[26] are 0, set is empty return true, otherwise if then word index is -1 return false (tried all the words and there are still letters left).
if not a base case then in a loop try to choose the current word 0..m times and recursively solve for the word current-1 with the remainder of the set/knapsack. If any of the solutions return true then return true. otherwise if you can't choose the current word anymore because the set doesnt have enough letters but is not empty return false.
Basically idea consist of two major things:
1 apply dynamic programming approach to recursively narrow down appropriate words in a set W that fulfill character set C by taking word w from W and iterating over C - w
2. obtain constant time check if word w could be constructed by characters from array C. For that we might take a look to an array of bloom filter alike structures that represent number of occurrences of a specific character as a bit array (since there up 26 possible characters in English alphabet then this array be represented as an int[], where length of this array should be agreed and assumed to, say, 5 - max number of occurrences of the same letter in a any word)
form trie out of words in the dictionary and do backtracking using characters in the given set
dynamic programming approach : transform all words in dictionary and chars in the set to dictionaries of the form dict<char : count> . at each stage, consider each word against char dict (subtract count of intersecting chars in the char dict); if char dict reduces to all zeroes, return true. return false when total count in chars dict is below count in all word dicts
Approach:
Count the number of each character in the character set and store in a char array. Look for the start of every dictionary word in char array. If not present ignore the dictionary word. If present traverse the through the char array and subtracting the count for every character in the dictionary word in the char array.
If end of dictionary word is reached then switch to the next dictionary word, if not then traverse the dictionary word from the last not-found character and increment the character in the char array for every character in the dictionary word.
public static void dictionary (String [] dictionary, char [] charset){
int len=charset.length;
int [] arr = new int [256];
int count=len,j,k,rep=len;
for(int i=0;i<len;i++){
arr[charset[i]]++;
}
do{
rep=count;
for(int i=0;i<dictionary.length;i++){
if (arr[dictionary[i].charAt(0)] != 0){
for(j=0;j<dictionary[i].length();j++){
if (arr[dictionary[i].charAt(j)] != 0){
arr[dictionary[i].charAt(j)]--;
count--;
}else
break;
}
if (j != dictionary[i].length()){
for(k=j-1;k>=0;k--){
arr[dictionary[i].charAt(k)]++;
}
count=len;
}
if (count==0)
break;
}
}
}while(rep != count);
if (count==0)
System.out.println(“true”);
else
System.out.println(“false”);
}
Down voter: please explain the reason to down vote. If there was a test case that failed, please mention that.
this is np-hard since subset sum can be easily reduced to this problem.
- Anonymous March 21, 2014