Google Interview Question
Country: United States
Interview Type: Phone Interview
O(M*E), where M is the sum of the sizes of all words and E is the number of different letters. In general E is constant (for English, E = 26), so the complexity is O(M).
I believe the complexity of your solution is O(M + k) (where k is the number of letters in the original letter list), because k is not bounded by 26 since the list may contain unlimited duplicates. The +k is the cost of building V[] at the beginning.
@thiago.xth1 I don't understand why Ws[e] contains the number of repetitions of a letter. Can you explain your logic with an example?
For example 'tree' has 2 e's.
If the set is {a,b,c,d,e}, then V[e] is 1, but 'tree' has Ws[e] of 2.
The word 'tree' won't qualify.
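The count comparison being discussed can be checked directly; a minimal sketch using `collections.Counter`:

```python
from collections import Counter

V = Counter('abcde')   # V[e]: occurrences of each letter in the given set
Ws = Counter('tree')   # Ws[e]: repetitions of each letter in the word

# 'tree' fails because it needs two e's but the set supplies only one
print(all(Ws[e] <= V[e] for e in Ws))  # False
```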
Adding to your solution.
You might want to restrict this operation to words in the dictionary that start with a letter from the collection.
I think the time complexity is O(M * avg_len(word)), where M is the number of words, and the space complexity is a constant, O(26) or O(52). I also solved this problem myself using a hash map. The idea: first count the occurrences of every letter in the set via the hash map, then keep a temp length (temp_len) and a temp pointer recording the longest word found so far. Process every word the same way, and at the end we have the answer:
#include <iostream>
#include <cstring>   // strlen, memset
using namespace std;

char *get_longest_str(char *set, char *str[], int n) {
    int set_len = strlen(set);
    int hash[26];
    int i, j;
    int final_len = 0;
    int temp_len = 0;
    char *ret = NULL;
    for (i = 0; i < n; i++) {
        // rebuild the letter counts of the set for each word
        memset(hash, 0, sizeof(int) * 26);
        for (j = 0; j < set_len; j++)
            hash[set[j] - 'a']++;
        temp_len = strlen(str[i]);
        for (j = 0; j < temp_len; j++)
            hash[str[i][j] - 'a']--;
        // a negative count means the word needs more of that letter than available
        for (j = 0; j < 26; j++) {
            if (hash[j] < 0)
                break;
        }
        if (j >= 26 && temp_len > final_len) {
            final_len = temp_len;
            ret = str[i];
        }
    }
    return ret;
}

int main(int argc, char *argv[]) {
    char *str[] = {"abacus", "deltoid", "gaff", "giraffe", "microphone", "reef", "qar"};
    char set[] = "aeffgirq";   // note: needs the null terminator for strlen
    int n = sizeof(str) / sizeof(char *);
    char *ret = get_longest_str(set, str, n);
    cout << ret << endl;
    cin.get();
    return 0;
}
The words in the sample dictionary seem ordered... it's not specified explicitly, but it's worth clarifying. If they are indeed ordered, the ordering can be used to skip ranges of words that share a non-matching prefix. For example, if the 3rd letter of the last checked word is not allowed, then this entire 3-letter prefix can be skipped.
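If the dictionary really is sorted, the prefix-skipping idea might be sketched like this (a sketch, assuming lowercase ASCII words; the `'\x7f'` sentinel, which sorts after `'z'`, is this sketch's trick for jumping past every word sharing the bad prefix):

```python
import bisect
from collections import Counter

def longest_spellable(words_sorted, letters):
    avail = Counter(letters)
    best = ''
    i = 0
    while i < len(words_sorted):
        w = words_sorted[i]
        used = Counter()
        bad_at = None
        for k, ch in enumerate(w):
            used[ch] += 1
            if used[ch] > avail[ch]:
                bad_at = k          # prefix w[:k+1] is already unspellable
                break
        if bad_at is None:
            if len(w) > len(best):
                best = w
            i += 1
        else:
            # skip every later word sharing the unspellable prefix
            prefix = w[:bad_at + 1]
            i = bisect.bisect_right(words_sorted, prefix + '\x7f', lo=i)
    return best

words = sorted(['abacus', 'deltoid', 'gaff', 'giraffe', 'microphone', 'reef', 'qar'])
print(longest_spellable(words, 'aeffgirq'))  # giraffe
```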
We take an array of size 26 and initialize it with the first 26 primes, e.g. 2, 3, 5, 7, ...
These primes represent the alphabet, i.e. a=2, b=3, c=5, ...
pro = a * e * f * f * g * i * r * q = 2 * 11 * 13 * 13 * ...
Now we can check every dictionary word:
1) p = product of the primes for every letter of the dictionary word, e.g. reef = 61 * 11 * 11 * 13
2) if pro % p == 0, store l = length(word) and the index // if l is greater than the previous value
3) finally, the index gives the location of the resulting word
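In Python the scheme can be sketched directly, since Python's arbitrary-precision integers sidestep the overflow concern raised below (a sketch, using the standard a=2, b=3, ... prime assignment described above):

```python
# first 26 primes, one per letter: a=2, b=3, c=5, ...
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
          59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def prime_code(s):
    code = 1
    for ch in s:
        code *= PRIMES[ord(ch) - ord('a')]
    return code

def longest_by_primes(words, letters):
    pro = prime_code(letters)   # product for the letter collection
    best = ''
    for w in words:
        # divisibility means every letter of w, with multiplicity, is available
        if pro % prime_code(w) == 0 and len(w) > len(best):
            best = w
    return best

words = ['abacus', 'deltoid', 'gaff', 'giraffe', 'microphone', 'reef', 'qar']
print(longest_by_primes(words, 'aeffgirq'))  # giraffe
```

Note that 'reef' is rejected here: the collection's product contains the prime for 'e' only once, so it is not divisible by reef's product, which contains it twice.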
I liked your idea, but integer overflow can easily happen.
For example: the number for [a]^70 (letter 'a' concatenated 70 times) is 2^70, which is greater than a 64-bit integer can store (2^64).
Good idea. Just one thing to consider: if the string is too long (in the worst-case scenario it contains all 26 letters), are we going to suffer from the integer overflow problem?
So time complexity will be: O(n)
where n is the number of words in the dictionary.
Is that correct?
This is not really better than the vector based solution.
Computing the product itself will be worse than the complete solution of computing the vector _and_ check the count inequalities (owing to the multiplication of large integers).
In effect, the vector solution is implicitly storing [a1,a2,...ak] where the product you are storing is (p1)^(a1) * (p2)^(a2) * ... * (pk)^(ak).
Checking for divisibility is not O(1) because of overflow issues. Trying to avoid that will lead you to do something like the vector based solution, as a further optimization!
Of course for small words this is a reasonable solution, but perhaps still not better than the vector (multiplication vs. addition, etc.).
I do not think this solution works, because the count of each letter in the collection matters.
For example: string = "ee", collection = {e}; e*e % e = 0, yet "ee" is not a candidate for the question.
This solution is appealing mathematically, but I agree that it is no faster than the first solution posted, and probably slower. In this approach you must process every letter of every word in the dictionary in order to compute all of the products. On the other hand, in the first solution you can stop processing a dictionary word as soon as you detect that it has letters not covered by given collection of letters. For example, as soon as we see the 'm' in 'microphone' we are done with that word because 'm' is not in the given set of letters.
If we can pre-process the data, I would store all the dictionary words in a TRIE. While searching for the longest possible word that can be formed out of the given characters, we keep refining the available options and check all of them.
If we are not allowed to pre-process the dictionary data, then we can maintain two arrays.
First array : Total number of a character in the given set of characters.
Then we will have to check for the presence of each dictionary word.
Second array : Initialized as a copy of first array for each new word. For each character in the word, its count in second array should be greater than zero. If yes, decrease the count by one and move to second character else move to next word.
Keep the record of longest word found.
Nice solution, but re-initializing the second array after every word results in O(N * k) complexity, where N is the length of the dictionary and k is the length of the letter list.
However, if you reinitialize by working backward through the word you just examined and incrementing (essentially reversing the decrements you just did), the complexity is only O(N).
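The undo-the-decrements trick might look like this (a sketch, assuming lowercase words):

```python
def longest_word(words, letters):
    counts = [0] * 26
    for ch in letters:
        counts[ord(ch) - ord('a')] += 1
    best = ''
    for w in words:
        consumed = 0    # how many characters we successfully decremented
        ok = True
        for ch in w:
            i = ord(ch) - ord('a')
            if counts[i] == 0:
                ok = False
                break
            counts[i] -= 1
            consumed += 1
        # reverse the decrements instead of recopying the whole array
        for ch in w[:consumed]:
            counts[ord(ch) - ord('a')] += 1
        if ok and len(w) > len(best):
            best = w
    return best

words = ['abacus', 'deltoid', 'gaff', 'giraffe', 'microphone', 'reef', 'qar']
print(longest_word(words, 'aeffgirq'))  # giraffe
```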
Preprocessing is a good idea, but like you said, "we will keep refining the options available and check all the options" is a very expensive operation depending on the size of the available character set. Say we have n*26 characters: the option space becomes exponential, and checking every option against a tree is the longest-running function I could ever imagine.
May not be the best algorithm, but something that I came up with quickly.
#!/usr/bin/python3
dictionary =['abacus','deltoid','gaff','giraffe','microphone','reef','qar']
letters = ['a','e','f','f','g','i','r','q']
def isContained(source_word, destination_word):
    j = 0
    for i in range(len(source_word)):
        while j < len(destination_word):
            if source_word[i] == destination_word[j]:
                j = j + 1   # consume the matched letter so duplicates are not reused
                break
            j = j + 1
        else:
            return False
    return True

if __name__ == '__main__':
    sorted_dictionary = {word: ''.join(sorted(word)) for word in dictionary} #change to dict(), if you are using python2.
    sorted_letters = ''.join(sorted(letters))
    longest_word = None
    for word, sorted_word in sorted_dictionary.items():
        if isContained(sorted_word, sorted_letters) and (not longest_word or len(sorted_word) > len(longest_word)):
            longest_word = word
    print(longest_word)
Why did you create sorted_dictionary in advance instead of just sorting each word in your for loop or in your isContained() function? Seems like a lot of wasted storage.
Nice use of while/else. If you want to get really idiomatic with Python, create a generator that yields all valid words and consume it with the built-in max() function. Here's a sketch:

def yo():
    yield 1
    yield 3
    yield 2

print(max(yo()))
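A fuller version of that generator-plus-max idea, using the dictionary and letters from the question (a sketch; `Counter(w) - avail` is empty exactly when every letter of w is covered by the available letters):

```python
from collections import Counter

def longest_word(words, letters):
    avail = Counter(letters)
    # generator of words that can be spelled from the available letters
    spellable = (w for w in words if not (Counter(w) - avail))
    return max(spellable, key=len, default=None)

words = ['abacus', 'deltoid', 'gaff', 'giraffe', 'microphone', 'reef', 'qar']
print(longest_word(words, 'aeffgirq'))  # giraffe
```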
Simple solution :
Let A[1..26] be an array where every entry is zero except those for the given input letters; if some char repeats, the corresponding entry holds the count. For example, here f's entry will be 2 but a's entry will be 1.
Read words one by one:

ans = 0
for each word w:
    B = copy of A            // reset the counts for every word
    flag = false
    for each character c in w:
        if B[c] > 0:
            B[c]--
        else:
            flag = true      // w needs a letter that is not available
            break
    if flag:
        go on reading
    else:
        ans = max(ans, length of w)
This requires O(n) in-memory operations + O(n/B) file I/O, where B is the block size (one block is read into memory at a time).
Now if we do preprocessing, we form a trie where each node except the leaves contains a single element, and we need to try every possible search path: at the root node we can go to at most 7 children (in the above example), at the second level to at most 6 or 7 (in the f case), and so on...
So the complexity becomes O(2 * 7!); although this is just a number for this example, in actual computation it is not small.
Most of the algorithms posted here are useless. Here's mine:
1) Sort the letters of every word in the dictionary
2) Build a trie out of them
3) Sort the letters in the query set
4) int canbeskipped = 0;
5) Descend the trie, skipping at most canbeskipped letters
6) If a word is found, return it; otherwise increment the int and jump to step 5
They might be, but the top-voted vector one is reasonable. By the way, your algorithm is not without problems: what would you do when canbeskipped is, say, 10?
Essentially, you will be enumerating all the subsets... which can get exponential.
The problem does not state that dictionary can be preprocessed (which makes sense, but does not state that).
Query time gets extremely low compared to the vector approach, and you don't enumerate every subset. If the query set has length n, time is O(n^2) in the worst case. Bear in mind that any query time that depends on the length of the dictionary is realistically unreasonable.
I want to point out one more thing: by "skip at most" I mean that if the current letter in the set is not present in the current trie node, we can skip it (keeping track of how many we have skipped so far). If the current letter is present in the current node, the algorithm MUST keep walking down the tree.
If all we are going to do is one query, then the vector approach is good (and it is O(n) for each word), as in your case we have to pay the cost of constructing the trie anyway.
If the dictionary is fixed, but we have multiple queries, then yes, avoiding reading the whole dictionary is good and some kind of preprocessing will do (and I am guessing will be the followup/clarifying statement from the interviewer).
Calling the vector approach "useless" is a bit too much.
Brute-force approach, C code:
#include <stdio.h>
#include <string.h>
const char* maxword(const char* words[], int len, const char* letters)
{
    int maxsize = 0;
    const char* maxw = NULL;
    int i;
    for ( i = 0; i < len; ++i )
    {
        const char* word = words[i];
        const char* w;
        char pool[64];   /* working copy of the letters; consumed as we match */
        int ok = 1;
        strncpy(pool, letters, sizeof(pool) - 1);
        pool[sizeof(pool) - 1] = '\0';
        for ( w = word; *w; ++w )
        {
            char* p = strchr(pool, *w);
            if ( p == NULL )    /* letter not available (or already used up) */
            {
                ok = 0;
                break;
            }
            *p = ' ';           /* consume it so duplicates are not reused */
        }
        if ( ok && (int)strlen(word) > maxsize )
        {
            maxsize = (int)strlen(word);
            maxw = word;
        }
    }
    return maxw;
}
int main()
{
const char* words[] = { "abacus",
"deltoid",
"gaff",
"giraffe",
"microphone",
"reef",
"qar" };
const char* word = maxword(words, 7, "aeffgirq");
printf("%s\n", word);
}
More straight C, but with additional features:
- ordering of the input words so we can skip words that are too long and end the search on the first hit
- simple data structures and memory operations for caching
- thorough input validation
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* qsort comparison function */
int cmp_strlen_desc(const void *a, const void *b)
{
size_t la = strlen(*(char * const *)a);
size_t lb = strlen(*(char * const *)b);
return (la < lb) - (la > lb); /* longer strings first; qsort needs -1/0/1, not a bool */
}
/* solution function */
const char* maxword(const char* words[], int len, const char* letters)
{
const int cacheSize = 255; // full character range 1-255 (254 would be one short)
int availableLettersLen;
int availableLettersCache[cacheSize];
int availableLetters[cacheSize];
char **sortedWords;
char *curChar;
int *curCount;
char *result;
// Input validation
if (words == NULL || len < 1 || letters == NULL) return NULL;
// Letters length
availableLettersLen = strlen(letters);
// More input validation
if (availableLettersLen == 0) return NULL;
// Sort words by length descending
sortedWords = (char **)malloc(len * sizeof(char *));
if (sortedWords == NULL) return NULL;
for (int i=0; i<len; i++) {
curChar = (char *)words[i];
// More input validation
if (curChar == NULL) {
free(sortedWords);
return NULL;
}
sortedWords[i] = curChar; // copy pointers
}
qsort(sortedWords, len, sizeof(char *), cmp_strlen_desc);
// Preprocess letters by counting them into a cache
memset((void *)availableLettersCache, 0, cacheSize * sizeof(int));
curChar = (char *)letters;
while (*curChar) {
availableLettersCache[*curChar - 1]++;
curChar++;
}
// Scan words
for (int i = 0; i < len; i++) {
curChar = (char *)sortedWords[i];
// Word too long, skip
if (strlen(curChar) > availableLettersLen) continue;
// Reset availability tracker from cache
memcpy(availableLetters, availableLettersCache, cacheSize * sizeof(int));
// Scan letters of word
while (*curChar) {
curCount = availableLetters + *curChar - 1;
if (!*curCount) break; // letter unavailable, end search
(*curCount)--; // letter available, decrease count (*curCount++ would move the pointer)
curChar++;
}
// If word was fully scanned, we found our answer
if (!*curChar) {
result = sortedWords[i];
free(sortedWords);
return result;
}
}
free(sortedWords);
return NULL;
}
/* test driver */
int main()
{
const char* words[] = { "abacus",
"deltoid",
"gaff",
"giraffe",
"microphone",
"reef",
"qar" };
const char* word = maxword(words, 7, "aeffgirq");
printf("%s\n", word);
}
A basic c++ implementation using containers
#include<iostream>
#include<stdlib.h>
#include<vector>
#include<fstream>
#include<map>
using namespace std;
// Use quick sort to sort the strings.
void quicksort(string& str, int low, int high) {
char pivot = str[(low+high)/2];
int i = low;
int j = high;
while (i <= j) {
while (str[j] > pivot)
j--;
while(str[i] < pivot)
i++;
if(i <= j) {
char temp;
temp = str[i];
str[i] = str[j];
str[j] = temp;
i++;
j--;
}
}
if (j > low)
quicksort(str, low, j);
if (i < high)
quicksort(str,i,high);
}
// Longest match algorithm.
void longestmatch(map<string,string>& mymap) {
string matcher("aeffgirq");
vector<string> final_list;
map<string,string>::iterator it;
for (it = mymap.begin(); it != mymap.end(); ++it) {
string str = it->second;
int i = 0;
int j = 0;
bool is_match = false;
while(true) {
if((i == matcher.length()) && (j < str.length()))
break;
if (matcher[i] == str[j]) {
i++;
j++;
} else {
i++;
}
if (j == str.length()) {
is_match = true;
break;
}
}
if (is_match) {
string first = it->first;
final_list.push_back(first);
}
}
// Iterator over the final list of all the words that
// match our criterion and then print the one with
// highest length
vector<string>::iterator it1;
int maxlength = 0;
string final_string;
for(it1 = final_list.begin();it1 != final_list.end();++it1) {
string tmp = *it1;
int length = tmp.length();
if (length > maxlength) {
maxlength = length; // was missing: without this, every word overwrote the result
final_string = tmp;
}
}
cout << "The longest string:" << final_string << '\n';
}
int main() {
vector<string> list;
ifstream in_stream;
string line;
in_stream.open("file.txt");
map<string,string> my_map;
// Read the words into a vector
while(in_stream.is_open())
{
while (in_stream.good()) {
getline(in_stream, line);
list.push_back(line);
}
in_stream.close();
}
// Sort the words in the vector and put the sorted
// words in a map with their keys as original words.
vector<string>::iterator it;
for (it=list.begin();it!=list.end(); ++it) {
string str = *it;
string sorted_str = *it;
int length = str.length();
quicksort(sorted_str,0,length-1);
my_map[str] = sorted_str;
}
// Clear this memory. No use now.
list.clear();
// Get the longest match.
longestmatch(my_map);
// clear the map
my_map.clear();
}
import java.util.List;
import java.util.Arrays;
import java.util.ArrayList;
public class LargestWord {
    public static void main(String[] args) {
        String maxStr = "";
        int maxLength = Integer.MIN_VALUE;
        List<String> words = Arrays.asList("abacus", "deltoid", "gaff", "giraffe", "microphone", "reef", "qar");
        for (String w : words) {
            List<Character> check = new ArrayList<Character>(Arrays.asList('a', 'e', 'f', 'f', 'g', 'i', 'r', 'q'));
            boolean flag = true;
            for (char c : w.toCharArray()) {
                if (check.indexOf(c) != -1) {
                    check.remove(Character.valueOf(c));
                } else {
                    flag = false;
                    break;
                }
            }
            if (flag && w.length() > maxLength) {
                maxStr = w;
                maxLength = w.length();
            }
        }
        System.out.println(maxStr);
    }
}
My solution in python, which is very close to thiago's vector answer:
def find_longest_word(words, letters):
    # Maintain a global map of letter counts to check against
    table = {}
    for letter in letters:
        table[letter] = table.get(letter, 0) + 1
    longest = ''
    for word in words:
        # Construct a letter map for this word and short circuit if the word is not valid
        wordtable = {}
        valid = True
        for letter in word:
            wordtable[letter] = wordtable.get(letter, 0) + 1
            if wordtable[letter] > table.get(letter, 0):
                valid = False
                break
        if valid and len(word) > len(longest):
            longest = word
    return longest
Sweet Solution:
#include <iostream>
#include <string>
#include <fstream>
int main(){
    char letters[] = {'a', 'e', 'f', 'f', 'g', 'i', 'r', 'q', '\0'};
    char orig_arr[26] = {0};
    char copy_arr[26] = {0};
    int counter = 0;
    char *ptr = letters;
    while(*ptr++)
        counter++;
    for(int i = 0; i < counter; i++){
        char t = letters[i] - 'a';
        orig_arr[t]++;
        copy_arr[t]++;
    }
    std::string longestword;
    int maxlength = 0;
    std::ifstream infile("temp.txt");
    std::string word;
    while(infile >> word){
        const char *c_str = word.c_str();
        bool flag = true;
        int i = 0;
        for(i = 0; i < (int)word.length(); i++){
            char t = c_str[i] - 'a';
            if(copy_arr[t])
                --copy_arr[t];
            else{
                flag = false;
                break;
            }
        }
        // reset the copy_arr, touching only the entries we decremented
        for(int c = 0; c < i; c++){
            char t = c_str[c] - 'a';
            copy_arr[t] = orig_arr[t];
        }
        if(flag && (int)word.length() > maxlength){
            longestword = word;
            maxlength = word.length();
        }
    }
    std::cout << "Longest word: " << longestword << std::endl;
}
-----------------------------------------------------------------------------------------------------------------------------
The following is a program snippet for the above problem.
First it reads each line from the file and keeps each word in an array of letter counts; later it reads the characters to be matched, stores their counts in another array, and matches them against each dictionary word. The sorting is done by the numeric value of each character.
This program assumes the dictionary and input contain only lowercase alphabetic characters. To extend it to all characters and numerals, 26 should be replaced with 256.
-----------------------------------------------------------------------------------------------------------------------------
#include<stdio.h>
#include<string.h>
#define DICT_SIZE 100
int main()
{
char dict[DICT_SIZE][27]= {0};
char input_string[32],search_string[32]={0},letter;
int i,j,count,size = 0,max_length = 0,dict_position=-1;
FILE *fp;
fp = fopen("dictionary.txt","r");
while( fscanf(fp,"%s",input_string) != EOF)
{
dict[size][26] = strlen(input_string);
for(i=0;i<dict[size][26];i++)
{
dict[size][input_string[i]-'a']++;
}
size++;
}
fclose(fp);
#if 0
do
{
scanf("%s",input_string);
if(!strcmp("end",input_string))
break;
dict[size][26] = strlen(input_string);
for(i=0;i<dict[size][26];i++)
{
dict[size][input_string[i]-'a']++;
}
size++;
//printf("\n%s\n",dict[size-1]);
}while(1);
#endif
printf("Enter the letters to be matched\n");
i=0;
do
{
scanf("%c",&letter);
if((int)(letter-'a') >= 26 || (int)(letter-'a') < 0)
break;
search_string[letter-'a']++;
i++;
}while(1);
search_string[26] = i;
for(i=0;i<size;i++)
{
if(dict[i][26] <= search_string[26] && dict[i][26] > max_length)
{
count = 0;
for(j=0;j<26;j++)
{
if(dict[i][j] == 0)
{
continue;
}
else if(dict[i][j] <= search_string[j])
{
count+=dict[i][j];
if(count == dict[i][26] )
{
max_length = count;
dict_position = i;
break;
}
}
else
{
break;
}
}
}
if(max_length == search_string[26])
{
break;
}
}
if(dict_position == -1)
{
printf("\nNo dictionary word\n");
}
else
{
printf("\nDictionary position =%d and match length = %d\n", dict_position,max_length);
}
return 0;
}
This is like building a reverse index for each character; however, for duplicated characters in a word we have to do some extra work.
For example, to index the fourth word "giraffe", we store each character once, e.g. "g": 4, "i": 4, "r": 4, "a": 4, "f": 4, "e": 4, and the trick is an extra index entry "ff": 4.
When searching, e.g. for "affr", we change it to "a", "ff", "r", take the intersection of the indexes found for each key, and then pick the maximum-length word.
How about constructing a trie of the dictionary and, for every letter in the set, following the trie structure of the words? If you find in a trie path a letter that doesn't belong to the set, stop and start over with the next letter of the set, and so on.
This solution is O(k^2) and is not a function of the size of the dictionary.
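A possible sketch of that trie walk (dict-based trie; the `'$'` end-of-word marker is an assumption of this sketch, not part of the comment above):

```python
from collections import Counter

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = w   # mark a complete word at this node
    return root

def longest_in_trie(root, letters):
    avail = Counter(letters)
    best = ['']
    def dfs(node):
        w = node.get('$')
        if w is not None and len(w) > len(best[0]):
            best[0] = w
        for ch, child in node.items():
            if ch != '$' and avail[ch] > 0:
                avail[ch] -= 1   # consume the letter on the way down
                dfs(child)
                avail[ch] += 1   # restore it on the way back up
    dfs(root)
    return best[0]

words = ['abacus', 'deltoid', 'gaff', 'giraffe', 'microphone', 'reef', 'qar']
trie = build_trie(words)
print(longest_in_trie(trie, 'aeffgirq'))  # giraffe
```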
def longestword(src, lst):
    maxlen = 0
    maxword = None   # initialize, so a miss does not raise NameError
    for elem in lst:
        word = list(elem)
        word.sort()
        if issublist(word, src):
            if len(word) > maxlen:
                maxlen = len(word)
                maxword = elem
    return maxword

def issublist(sub, lst):
    if not sub:
        return True
    try:
        idx = lst.index(sub[0])
        return issublist(sub[1:], lst[idx + 1:])
    except ValueError:
        return False

src = ['a', 'e', 'f', 'f', 'g', 'i', 'r', 'q']
lst = ['abacus', 'deltoid', 'gaff', 'giraffe', 'microphone', 'reef', 'qar']
print longestword(src, lst)
// just check the letter counts of each word against the input letters
string ans("");
unordered_map<char, int> ma;
for (int i = 0; i < input.size(); i++)
    ma[input[i]]++;
for (int i = 0; i < strs.size(); i++)
{
    unordered_map<char, int> mb;
    for (int j = 0; j < strs[i].size(); j++)
        mb[strs[i][j]]++;   // count the j-th character, not the whole string
    bool found(true);
    for (auto it = mb.begin(); it != mb.end(); it++)
        if (it->second > ma[it->first]) { found = false; break; }
    if (found && strs[i].size() > ans.size()) { ans = strs[i]; }
}
return ans;
My solution in C#
char[] LongestInDictionary(List<char[]> entries, char[] stocks) {
    if(entries == null || entries.Count == 0 ||
       stocks == null || stocks.Length == 0)
        throw new Exception("wrong input");
    var dict = new Dictionary<char, int>();
    foreach(var stock in stocks) {
        if(!dict.TryAdd(stock, 1))
            dict[stock]++;
    }
    char[] longest = null;
    foreach(var entry in entries) {
        var copy = new Dictionary<char, int>(dict); // clone, so counts reset per word
        bool ok = true;
        foreach(var chr in entry) {
            if(!copy.ContainsKey(chr) || copy[chr] == 0) {
                ok = false;
                break;
            }
            copy[chr]--; // consume from the copy, not the original counts
        }
        if(ok) {
            if(longest == null || entry.Length > longest.Length)
                longest = entry;
        }
    }
    return longest;
}
Could use various optimizations:
- terminate outer loop once longest was found
- sort entries by length, and trim to list where entry[i].Length<=stocks.Length
- use Dictionary<string,byte> for minor memory optimizations (if english words are used, a byte per char should be sufficient)
- replace Dictionary with byte[256] array if entries are ASCII/bytes only
Perf-optimized solution. Time is less than O(n), where n is the total number of characters in the dictionary: before checking whether a word can be spelled from the given letters, we bypass the expensive check if the word is not long enough to beat the current best.
Implementation in Java
public static String findLongestWord(String[] words, char[] letters) {
    String longestWord = null;
    Map<Character, Integer> map = new HashMap<Character, Integer>();
    for (char c : letters) {
        if (map.containsKey(c)) map.put(c, map.get(c)+1);
        else map.put(c, 1);
    }
    for (String word : words) {
        if (longestWord == null && canSpell(word, new HashMap<Character, Integer>(map))) longestWord = word;
        else if (longestWord != null && word.length() > longestWord.length() && canSpell(word, new HashMap<Character, Integer>(map))) longestWord = word;
    }
    return longestWord;
}

public static boolean canSpell(String word, Map<Character, Integer> map) {
    for (int i = 0; i < word.length(); i++) {
        char c = word.charAt(i);
        if (!map.containsKey(c) || map.get(c) == 0) return false;
        map.put(c, map.get(c)-1);
    }
    return true;
}
1. Associate a key with each word of the dictionary. The key is the alphabetically sorted word, i.e. abacus becomes aabcsu, microphone becomes cehimnoopr, and so on.
2. Sort the dictionary based on this new key.
3. Sort the letter array alphabetically.
4. Search this letter array in the dictionary.
- Preprocessing complexity is O(nlogn).
- Searching is done in O(logn).
- If you need to search only once, then don't sort the initial array and just search, for a total complexity of O(n).
Step 1 - Get the sum of the ASCII values of the collection {a, e, f, f, g, i, r, q}.
Step 2 - For each word in the dictionary, get the sum of the ASCII values of its characters and compare it with the sum of the collection.

int i = 0;
while (file.readline() != -1) {
    do {
        if (sum of collection's ascii != sum of word ascii)
            remove 1 char from collection -> get the sum of ascii and compare again;
    } while (i < (collection.size / 2));
}
So given the set { a, z } and the word "by", will your algorithm think there is a complete match? ('a' + 'z' == 'b' + 'y' == 219)
Create a vector V[], where V[e] contains the number of occurrences of letter e in the given set.
For each word s, compute a vector Ws[], where Ws[e] contains the number of repetitions of letter e in the word (compute the size of the word too).
Return the max-size word s such that Ws[e] <= V[e] for all letters e.
- thiago.xth1, March 14, 2013
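A direct Python rendering of this vector check, for reference (a sketch, not the original poster's code):

```python
from collections import Counter

def longest_word(words, letters):
    V = Counter(letters)              # V[e]: occurrences of e in the given set
    best = ''
    for s in words:
        Ws = Counter(s)               # Ws[e]: repetitions of e in the word
        if all(Ws[e] <= V[e] for e in Ws) and len(s) > len(best):
            best = s
    return best

words = ['abacus', 'deltoid', 'gaff', 'giraffe', 'microphone', 'reef', 'qar']
print(longest_word(words, ['a', 'e', 'f', 'f', 'g', 'i', 'r', 'q']))  # giraffe
```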