Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
I think there can be words of size 1 to n. So big Oh of this is polynomial time.
Algorithm/sudo code,
for i in 1 to 10
{
//search all words of size i in the entire string
find_words(int size,String inputString)
}
Followup question could be how would you optimize it.
Ans : We can run find_words across n threads OR n map-reduce jobs. I am not saying map reduce just because its google, but this is perfect candidate for map reduce.
Let say A[k] is a boolean that tells us if string{0..k} can be broken into a set of valid words. Here A[k] would be true iff there is any j < k exists such that A[j] is true and string{j+1, k} is a valid word(validity can be check by keeping all the valid words in a hashtable).
A[n] will just give us if we can break the string into valid words but to know the words we can keep the starting point of the string along with the boolean in each A[k] and by back tracing we can find all the words. However this will give only one possible sequence of words, we can keep record of all the valid starting points of the string along with boolean in each A[k].
A[k] = OR { A[i] AND IsValid(String {i+1...k}) } for all i < k & i > 0
python code
def break_recursive(s):
for i in range(len(s)):
str1 = s[i:]
str2 = s[:i]
if str1 in words and str2 in words:
print str1, str2
bool is_word(char* s, int len)
{
//assume provided.
return true;
}
void split_str_to_words(char* s, int len)
{
int i;
if (s == NULL || len <= 0) {
return;
}
for(i = 1; i <= len; i++) {
if (is_word(s, i)) {
//printstring(s, i);
}
if (i < len && is_word(s + i, len - i)) {
//printstring(s + i, len - i);
}
}
split_str_to_words(s + 1, len - 2);
}
solution using dynamic programming
boolean wordBreak(String s) //check if word can be broken into words
{
int size = s.length();
if (size==0) return true;
boolean []dp = new boolean[size+1];
for (int i=1; i<=size; i++)
{
for (int j=0; j<i; j++)
{
if (dp[j]==true && dictionary.contains(s.substring(j,i)))
{
dp[i]=true;
break;
}
}
}
return dp[size];
}
this can be done in n^2 time...
ex: googlesearchengine
1. for i=0 to n, look for all words till i to n i.e..
start, end indexes
g 0, 0
go 0, 1
goo 0, 2
goog 0, 3
googl 0, 4
google 0, 5
googles 0, 6 ... do this till end of sentence
.
.
o 1, 1
oo 1, 2
oog 1, 3
oogle 1, 4
oogle 1, 5 ... do this till end of sentence
.
.
s 6, 6
se 6, 7
sea 6, 8
sear 6, 9
searc 6, 10
search 6, 11
searche 6, 12 ... do this till end of sentence
.
.
.
e 12, 12
en 12, 13
eng 12, 14
engi 12, 15
engin 12, 16
engine 12, 17
---------------
here existing words in dictionary are
go 0, 1
google 0, 5
sea 6, 8
search 6, 11
ear 7, 9
a 8, 8
arc 8, 10
hen 11, 13
engine 12, 17
here for existing words in dictionary, hash all the starting indexes as keys and ending indexes as data in hash table.
1. now start from 0 key in hash table
hash key hash data
0 1
so there exists word from 0-1 index, if this is valid word then next word should start from index 2 that is previous hash data+1
now look for key 2 in hash table, this is not exist so previous word is not valid
2. now try for hash key 0 and hash data 5, there exists word (0-5),
now look for 6 key in hash, this exists and data is 11, there exists word (6-11)
now look for 12 key in hash, this exists and data is 17, there exists word (12-17) and end of sentence
-----------------------
this also can be done with 2D array of nxn... n is sentence length
Thanks very much for your post! However, I am still lost. for example, why start from index 2 if this is valid word in 0-1 index.
Could you post the code?
index 0-1 is valid word then next valid word should start from index 2.
ex..
higoogle
here first valid word is "hi" index from 0-1
next valid word should start from index 2 and end at index 3(go) / 7(google)
question is to break the sentence into meaningful words, means to insert spaces in sentence that should form words exists in dictionary
So how about "hinttt" which hi can be a valid word and in can also be valid word. Hi is 0-1and in is 1-2 which we cannot start from index 2.
can it be done with DP? maybe a[i][j] keeps all lists of possible breaking points?
- adam2008 February 22, 2013