Google Interview Question
Software Engineer / Developer

Very clever solution, but unfortunately incorrect -- it only supports making a new word by adding a letter to the beginning or end, but the problem specifies that the letter can be inserted anywhere (e.g. 'cat' -> 'chat').
The high-level idea is to process the words in nondecreasing order of length, saving in a hash table only the words that can be made by iterated deletion. For each new word, we try all possible choices for the letter to delete and check the hash for each one.
No, it's nondecreasing order. At each step i, the hash table already holds all makeable words of length i-1.
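The approach described above can be sketched roughly like this (class and method names are mine, not from the thread): process words shortest-first, and call a word "buildable" if deleting some single letter yields a shorter buildable word already seen.

```java
import java.util.*;

public class LongestBuildable {
    public static String longest(List<String> dict) {
        List<String> words = new ArrayList<>(dict);
        words.sort(Comparator.comparingInt(String::length)); // nondecreasing length
        Set<String> buildable = new HashSet<>();
        String best = "";
        for (String w : words) {
            boolean ok = (w.length() == 1); // single letters are trivially buildable
            for (int i = 0; !ok && i < w.length(); i++) {
                String v = w.substring(0, i) + w.substring(i + 1); // delete letter i
                ok = buildable.contains(v); // was the shorter word makeable?
            }
            if (ok) {
                buildable.add(w);
                if (w.length() > best.length()) best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longest(Arrays.asList("a", "at", "cat", "chat", "dog"))); // chat
    }
}
```

Each word of length k tries all k deletions against the hash, giving the interviewer's O(N * K) bound.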
This was asked in a phone interview, and I was asked to write code in a Google Doc shared by the interviewer.
As the interviewer said at the end of the interview, he has a solution that is linear in the number of words in the dictionary and linear in the length of the longest word.
I came up with a solution quickly, but I knew it wasn't the efficient one, so I asked him whether I should find a better solution. He told me to just code it. I'm not comfortable with this kind of communication. I spent some time coding my solution, but before I could finish it he started pointing out the errors in my code. Damn, there were just 20 minutes for the whole thing. I never had time to verify my code.
K: max length of words;
N: number of words;
w[0..N-1]: the N words;
Step 1: use K linked lists to store the N words, with words of length k in the k-th linked list;
Step 2: check whether a word in linked list k is part of a word in linked list (k+1). Use height to indicate the length of the chain: e.g. for a->an->can, "can".height = "an".height + 1 = "a".height + 2 = 3;
...
void PrintWord(char *w[]) {
    int i, len, maxheight = 0;
    MyNode *mynode[K] = {NULL}, *p1, *p2, *maxword = NULL;
    /* Step 1: bucket words by length; insert() sets each new node's height to 1 */
    for (i = 0; i < N; i++) {
        len = strlen(w[i]);
        insert(&mynode[len-1], w[i]);
    }
    /* Step 2: propagate chain heights from list k to list k+1 */
    for (i = 0; i < K-1; i++) {
        p1 = mynode[i];
        while (p1 != NULL) {
            p2 = mynode[i+1];
            while (p2 != NULL) {
                if (p2->height <= p1->height && PartOf(p1, p2)) {
                    p2->height = p1->height + 1;
                    p2->preword = p1;
                    if (p2->height > maxheight) {
                        maxheight = p2->height;
                        maxword = p2;
                    }
                }
                p2 = p2->next;
            }
            p1 = p1->next;
        }
    }
    if (maxword != NULL)
        printf("%s\n", maxword->word);
}
...
Any chance that the answer to this one is to just use a trie?
The trie solution is to walk the tree level by level, visiting only nodes that end a valid word. At each level, check whether the key represented by such a node is itself a word in the dictionary; if it is, it becomes the longest word so far after adding that one letter. Keep doing this until the entire subtree is traversed.
* Put the words in a heap, ordered longest to shortest.
* Iterate until the heap is empty.
o Take the next word w from the heap, and perform search(w)
The procedure search(w) works as follows. It returns true if it finds a solution.
* If w is the empty string, then return true.
* For each letter l of w, do the following.
o Drop l from w, generating the candidate word v.
o If v is in the dictionary, then execute r = search(v).
+ If r is true, then write out w and return true.
* The word w is a dead end; drop it from the dictionary and return false.
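The heap-plus-search(w) procedure above can be sketched in Java like this (helper names are mine; I return the winning word rather than printing the chain, and treat the empty string as always valid, per the base case). Dead ends are removed from the dictionary, so each word is fully searched at most once.

```java
import java.util.*;

public class HeapSearch {
    public static String longestChain(Collection<String> dict) {
        Set<String> dictionary = new HashSet<>(dict);
        // heap ordered longest word first
        PriorityQueue<String> heap = new PriorityQueue<>((a, b) -> b.length() - a.length());
        heap.addAll(dictionary);
        while (!heap.isEmpty()) {
            String w = heap.poll();
            // skip words already dropped as dead ends
            if (dictionary.contains(w) && search(w, dictionary)) return w;
        }
        return null;
    }

    private static boolean search(String w, Set<String> dictionary) {
        if (w.isEmpty()) return true; // base case: empty string
        for (int i = 0; i < w.length(); i++) {
            String v = w.substring(0, i) + w.substring(i + 1); // drop letter i
            // the empty string counts as valid, matching the base case
            if ((v.isEmpty() || dictionary.contains(v)) && search(v, dictionary))
                return true;
        }
        dictionary.remove(w); // dead end: never search this word again
        return false;
    }
}
```

Because the first successful word popped from the heap is the longest, the loop can stop at the first hit.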
Assumption: isDict returns 1/0 depending on whether a word does/does not exist in the dictionary, and is O(1).
remCharAtPos(str, i) removes the character at position i in the string and returns a new string, and is O(1).
Complexity = O((len(word))!) in the worst case.
int GetWordHierarchy(word)
{
    if strlen(word)==1 && isDict(word) then print word; return 1;
    for i=1 to len(word)
    {
        new_word = remCharAtPos(word, i);
        if isDict(new_word) then
            if GetWordHierarchy(new_word) then
            {
                print "->", new_word
                return 1;
            }
    }
    return 0;
}
The for loop can be made more efficient by ensuring that no consecutive same letters are removed.
An even more optimized solution, combined with the one above, is to pre-process the dictionary: for every word of size n, generate all possible words of size n-1 beforehand. Retrieving these takes O(1) time using a hash. Now repeat the process for each of these new words. (Idea inspired by xiaogold's suggestion; in his case, the cost of searching the linked list would be very high.)
Time complexity would now be log(Dn) * log(Dn-1) * ... * log(D2) * 1, where Dn is the number of words of length n-1 generated from a word of length n. In the worst case Dn-1 = log(Dn). This would be near-constant time for any language.
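The pre-processing idea (generate every single-letter deletion of each word ahead of time and hash it) can be sketched like this; names are mine. The map answers "which longer words extend v?" with one hash lookup.

```java
import java.util.*;

public class DeletionIndex {
    // Map each single-letter deletion of each word back to the words it came from.
    public static Map<String, List<String>> build(Collection<String> dict) {
        Map<String, List<String>> parents = new HashMap<>();
        for (String w : dict) {
            for (int i = 0; i < w.length(); i++) {
                String v = w.substring(0, i) + w.substring(i + 1); // drop letter i
                parents.computeIfAbsent(v, k -> new ArrayList<>()).add(w);
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        Map<String, List<String>> idx = build(Arrays.asList("cat", "chat"));
        System.out.println(idx.get("cat")); // [chat]  ("chat" minus 'h')
    }
}
```

Building the index touches each word of length k exactly k times, so pre-processing is O(N * K) overall.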
Why is D_(n-1) = log(D_n)?
Shouldn't it be D_n <= n, since removing each letter from a word of length n yields a word of length (n-1)?
Then D_(n-1) = n-1, so how can D_(n-1) = log(D_n)?
If searching for a word in the dictionary ("isDict(word)") is O(1), and accessing a word in the dictionary is also O(1), then in XiaoGold's solution each word in linklist(i+1) has length (i+1). Checking whether each of the i+1 words obtained by removing one letter is in the dictionary, and then updating (accessing) the word's height accordingly, is O(K) per word.
O(N*K) ::: The total complexity then becomes length(linklist(1))*K + length(linklist(2))*K + ... + length(linklist(K))*K = O(N*K).
The question becomes how to design O(1) searching and accessing (modifying the height value of each word)?
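One way to get the O(1) search and O(1) height access asked about here is to keep the heights in a hash map keyed by the word itself. A sketch (names are mine, not from the thread):

```java
import java.util.*;

public class HeightTable {
    // Returns the length of the longest deletion chain (a -> an -> can = 3).
    public static int longestChainLength(Collection<String> dict) {
        List<String> words = new ArrayList<>(dict);
        words.sort(Comparator.comparingInt(String::length)); // shortest first
        Map<String, Integer> height = new HashMap<>(); // O(1) lookup and update
        int best = 0;
        for (String w : words) {
            int h = 1; // a word on its own is a chain of length 1
            for (int i = 0; i < w.length(); i++) {
                String v = w.substring(0, i) + w.substring(i + 1); // drop letter i
                Integer hv = height.get(v);
                if (hv != null) h = Math.max(h, hv + 1); // extend the best sub-chain
            }
            height.put(w, h);
            best = Math.max(best, h);
        }
        return best;
    }
}
```

Each word of length k costs k hash lookups, so the whole pass is the O(N*K) derived above.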
What about this?
1. Insert all the dictionary words into a trie, so the space is the total length of all the words. Now do a depth-first search to find the longest word along a path where every node carries the end-of-word marker.
This is an O(N) solution in the number of words, because you visit each word only once.
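A sketch of that trie idea (names are mine), with the caveat from the top of the thread: a plain trie walk only finds chains that append letters at the end (a -> at -> ate), not insertions in the middle.

```java
import java.util.*;

public class TrieChain {
    static class Node {
        Map<Character, Node> child = new HashMap<>();
        boolean isWord; // end-of-word marker
    }

    final Node root = new Node();

    void insert(String w) {
        Node cur = root;
        for (char c : w.toCharArray())
            cur = cur.child.computeIfAbsent(c, k -> new Node());
        cur.isWord = true;
    }

    // DFS that only descends through end-of-word nodes, so every prefix
    // along the path is itself in the dictionary.
    String longest() { return dfs(root, ""); }

    private String dfs(Node n, String path) {
        String best = path;
        for (Map.Entry<Character, Node> e : n.child.entrySet()) {
            if (!e.getValue().isWord) continue; // chain breaks here
            String cand = dfs(e.getValue(), path + e.getKey());
            if (cand.length() > best.length()) best = cand;
        }
        return best;
    }

    public static void main(String[] args) {
        TrieChain t = new TrieChain();
        for (String w : new String[] { "a", "at", "ate", "cat" }) t.insert(w);
        System.out.println(t.longest()); // ate ("cat" is skipped: "c" is not a word)
    }
}
```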
import java.util.ArrayList;
public class ValidWord {
    public static void main(String[] args) {
        String[] dictionary = { "a", "at", "ap", "cat", "chat", "chatt", "chate", "chatte" };
        int MAX_LEN = 6;
        ArrayList<String>[] lengthWord = new ArrayList[MAX_LEN];
        for (int i = 0; i < dictionary.length; i++) {
            int currLength = dictionary[i].length() - 1;
            if (lengthWord[currLength] == null)
                lengthWord[currLength] = new ArrayList<String>();
            lengthWord[currLength].add(dictionary[i]);
        }
        // copy, so clearing possibleWord later does not also clear lengthWord[0]
        ArrayList<String> possibleWord = new ArrayList<String>(lengthWord[0]);
        for (int i = 0; i < lengthWord.length - 1; i++) {
            if (lengthWord[i] == null || possibleWord.isEmpty())
                break;
            ArrayList<String> currWords = new ArrayList<String>(possibleWord); // words of length i+1
            ArrayList<String> nextWords = lengthWord[i + 1]; // words of length i+2
            possibleWord.clear();
            for (int j = 0; j < currWords.size(); j++) {
                ArrayList<String> possible = Possible(currWords.get(j), nextWords);
                if (possible != null && !possible.isEmpty()) {
                    possibleWord.addAll(possible);
                }
            }
        }
        System.out.println(possibleWord);
    }

    private static ArrayList<String> Possible(String currString, ArrayList<String> nextWords) {
        if (nextWords == null || nextWords.isEmpty())
            return null;
        char[] alphabets = { 'a', 't', 'p', 'c', 'h', 'e' }; // letters used by this dictionary
        ArrayList<String> possibleWords = new ArrayList<String>();
        for (int i = 0; i < alphabets.length; i++) {
            // try inserting the letter at every position, including both ends
            for (int j = 0; j <= currString.length(); j++) {
                String constructedStr = currString.substring(0, j)
                        + alphabets[i]
                        + currString.substring(j);
                if (nextWords.contains(constructedStr)) {
                    possibleWords.add(constructedStr);
                }
            }
        }
        return possibleWords;
    }
}
Bad code, but it will at least give the idea. (Assumes addBefore, addAfter, and isDictionaryWord helpers exist.)
String getLongestWord(String input, int maxLength) {
    LinkedList<String> Q = new LinkedList<String>();
    Q.add(input);
    String maxWord = input;
    while (!Q.isEmpty()) {
        String u = Q.removeFirst();
        if (u.length() > maxWord.length()) {
            maxWord = u;
        }
        if (u.length() == maxLength)
            break;
        for (int i = 0; i < u.length(); i++) {
            for (int j = 0; j < 26; j++) {
                String word1 = addBefore((char) ('a' + j), i, u);
                String word2 = addAfter((char) ('a' + j), i, u);
                if (isDictionaryWord(word1))
                    Q.addLast(word1);
                if (isDictionaryWord(word2))
                    Q.addLast(word2);
            }
        }
    }
    return maxWord;
}
Let
N be the total number of words
K be the size of the alphabet
Preprocessing:
Create a hash table of (Key, Value) pairs, where Value is the actual word and Key is that word's letters in sorted order.
I am assuming Get complexity to be O(1).
Now,
Let Q be a queue, seeded with the starting word.
while Q is not empty {
    i = Q.dequeue()
    P = sortLetters(i) // get the key for the hash map
    for j in the alphabet {
        Q.enqueue(GetWords(sortLetters(P + j))) // all words one letter longer than i
    }
}
The last value of i dequeued will be the longest word.
It's a BFS question (one-letter changes at each step).
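The sorted-letters hash above can be sketched like this (names are mine). One caveat: keying on sorted letters treats anagrams alike, so looking up sortLetters(w) + c can return words that are not literal single-letter insertions into w; a real implementation needs an extra check per candidate.

```java
import java.util.*;

public class SortedKeyIndex {
    // Group dictionary words under their sorted-letters key.
    public static Map<String, List<String>> build(Collection<String> dict) {
        Map<String, List<String>> byKey = new HashMap<>();
        for (String w : dict)
            byKey.computeIfAbsent(sortLetters(w), k -> new ArrayList<>()).add(w);
        return byKey;
    }

    static String sortLetters(String w) {
        char[] a = w.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }

    public static void main(String[] args) {
        Map<String, List<String>> idx = build(Arrays.asList("cat", "act", "at"));
        // one BFS expansion step from "at": append 'c' to its key, then re-sort
        String key = sortLetters(sortLetters("at") + 'c'); // "act"
        System.out.println(idx.get(key)); // [cat, act]
    }
}
```

Note that the key must be re-sorted after appending the new letter, otherwise "at" + 'c' = "atc" would miss the bucket stored under "act".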
Create 26 "26-ary" trees. The only difference among the trees is the root node; everything else is the same.
Root = one letter ('a' for the first tree, 'b' for the second, and so on -- hence 26 trees).
The root, like every other tree node, has each of the 26 letters as children.
Run 26 BFS traversals. At each step, check against the dictionary hash to ensure you have a meaningful word.
Damn, I think this is a blow up...
Second try: Recursion...
for all alphabets: char1 {
    alphaVector.push_back(char1);
    for all alphabets: char2 {
        alphaVector.push_back(char2);
        if ( dict_lookup(alphaVector) == true ) {
            print_as_word(alphaVector);
        } else {
            // drop char2 from the vector
            alphaVector.resize(alphaVector.size() - 1);
        }
    }
}
The best solution for this problem is O(N*k*k + N*k). First sort the dictionary by length in O(N*k); then, starting from the longest word down to the shortest, make a recursive call for each string, removing one character at a time. When the string size reaches 1, check for the maximum length. Each time a character is removed, check for the existence of the new string in a trie/hash table.
In Python.
- Anonymous March 25, 2010