Google Interview Question
Software Engineer / DevelopersCountry: United States
Python Lazy Dictionary Approach.
We lazily build a dictionary of words from s1 as we try to find matches in s2.
# Find the longest substring present in two different strings.
# This is kind of an overkill approach.
def test():
s1 = "cbalaskdfjthe_answerlkounot_quitevo_lajslkj"
s2 = "alnot_quitesoiuthe_answerjoiunoiuhkbiwi"
assert "the_answer" == match(s1, s2)
def match(s1, s2):
# This helper function takes a list of indexes
# in a string and creates a dictionary whose
# keys are letters and whose values are the
# list of indexes immediately to the right
# of occurrences of the letter in s1.
def build_dict(subt):
dct = {}
for i in subt:
if i < len(s1):
c = s1[i]
if c not in dct:
dct[c] = []
dct[c].append(i+1)
return dct
# We lazily build a tree of all words in s1.
# We only go one letter deep until we encounter
# the letter in s2
tree = build_dict(range(len(s1)))
max_word = ''
for i in range(len(s2)):
word = ''
t = tree
for j in range(i, len(s2)):
c = s2[j]
if c not in t:
break
word += c
subt = t[c]
if type(subt) == list:
dct = build_dict(subt)
t[c] = dct
t = t[c]
if len(word) > len(max_word):
max_word = word
return max_word
test()
string longestCommonSubsequence(string s1, string s2) {
multimap<char, string> suffixTree;
int longest = 0;
string lstr = “”:
for (int i=0; i<s1.length(); i++) {
suffixTree.insert(pair<char, string>(s1[i], s1.substr(i)));
}
multimap<char, string>::iterator it;
for (int i=0; i<s2.length(); i++) {
it = suffixTree.find(s2[i]);
if (it == suffixTree.end()) continue;
pair<multimap<char, string>::iterator, multimap<char, string>::iterator> range = suffixTree.equal_range(s2[i]));
for (it = range.first; it != range.second; it++) {
int p=0, q=i;
while(p<it->second.length() && q<s2.length()) {
if (it->second[p] != s2[q]) break;
p++; q++;
}
if (longest < p) { longest = p; lstr = it->second.substr(0, p);}
}
}
return lstr;
}
Once I got this done, I found that it was possible to get using dynamic programming with the following formula.
M[i, j] = max(M[i-1, j-1], M[i+1, j], M[i, j+1]) if (S1[i] == S2[j])
= max(M[i+1, j], M[i, j+1]) if (S1[i] != S2[j])
M[i, j] refers to the ith position of S1 and jth position of S2
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main() {
string str1("abcdefg"), str2("defghig");
vector<vector<int> > arr;
int maxLen = 0;
string comStr;
for (int i = 0; i < str1.size(); ++i) {
vector<int> row(str2.size(), 0);
arr.push_back(row);
for (int j = 0; j < str2.size(); ++j) {
if (str1[i] == str2[j]) {
if (i == 0 || j == 0) {
arr[i][j] = 1;
}
else {
arr[i][j] = arr[i - 1][j - 1] + 1;
}
if (arr[i][j] > maxLen) {
maxLen = arr[i][j];
comStr = str1.substr(i - maxLen + 1, maxLen);
}
}
else {
arr[i][j] = 0;
}
}
}
cout << "common string: " << comStr << ", with max lengh of " << maxLen << endl;
cin.get();
return 0;
}
Longest common subsequence:
S(i,j) = longest common subsequence of strings 0..i, 0..j
S(0,j) =0, S(i,0) =0
S(i,j) : if(i==j) S(i-1,j-1)
else max(S(i-1,j),S(i,j-1))
Longest common consecutive subsequence:
STRING: It is best done by creating common suffix tree for both Strings and then finding the deepest node in the tree that has a path to both ends.
INT: I think you can do the same for Strings. Otherwise, the complexity will be O(n^2) using dynamic programming by counting every possible pair (I am not aware of any other algorithm)
This is a sub-sequence problem, instead of sub-string, I think the right algorithm to use should be "building a prefix table" using dynamic programming.
- Joanna8848 February 28, 2013