## Google Interview Question Software Engineer / Developers

- -4of 6 votes
Given two aligned sequences `a` and `b`. Write a function "findCommon", that finds the longest substring of the longer sequence that align to the smaller sequence in such a way that the alignment length (matching length) can be maximized. Sequences initially were of different lengths but the smaller one is padded with hyphen ('-') after alignment to make it equal to the longer sequence. The length of longer sequence is known in advance (m, which is same for the smaller padded sequence). The output is always the subsequence of the longer string.

The total number of such operations to be done is in billions.

findCommon(a, b, m)

Example 1:

a = ------bixg--

b = xxxxxxbi-gzz

m = 12

output: big

Example 2:

a = xxxxxxbxigxx

b = ------b-ig--

m = 12

output = bxig

Example 3:

a = bigxxxxxxxxx

b = bi-x--------

m = 12

output = bigx

**Country:**United States

**Interview Type:**In-Person

public void findCommon(char[] a, char[] b, int m) {

int startCount = -1;

int endCount =-1;

char[] c = null;

for (int i = 0; i < m; i++) {

if (a[i] == b[i]) {

if (startCount < 0)

startCount = i;

} else if (startCount >= 0) {

if ((a[i] == '-') && ((int)(b[i]) >=97 && (int)(b[i])<=122) ) {

if(!(i+1 < m && a[i+1] == b[i+1]))

{

endCount= i;

break;

}

} else if ((b[i] == '-') && ((int)(a[i]) >=97 && (int)(a[i])<=122) ) {

if(!(i+1 < m && a[i+1] == b[i+1]))

{

endCount= i;

break;

}

} else

startCount = -1;

}

}

System.out.println("start count = "+startCount +" end count = "+endCount);

if(startCount != -1)

{

if(startCount != 0)

c=a[0]== '-'?b:a;

else

c=a[endCount+1]== '-'?b:a;

if(endCount ==-1)

endCount=m-1;

int i= startCount;

while(i <=endCount-1 )

{

if(c[i] != '-')

System.out.print(" "+c[i]);

i++;

}

}

}

Find starting and end index of Longest common subsequence of lcs(a,b) in a.

result=a.substring(start,end);

yes

```
string longest_common_subsequence(string a, string b){ // string a is longer than string b
int m = a.length();
int n = b.length();
int** mat = new int*[m];
for(int i = 0; i < m; i++){
mat[i] = new int[n];
for(int j = 0; j < n; j++) mat[i][j] = 0;
}
int** start_mat = new int*[m];
for(int i = 0; i < m; i++){
start_mat[i] = new int[n];
for(int j = 0; j < n; j++) start_mat[i][j] = 0;
}
for(int i = 0; i < m; i++){
if(a[i] == b[0]){
mat[i][0] = 1;
start_mat[i][0] = i;
}
}
for(int j = 0; j < n; j++){
if(a[0] == b[j]){
mat[0][j] = 1;
start_mat[0][j] = 0;
}
}
for(int i = 1; i < m; i++){
for(int j = 1; j < n; j++){
if(a[i] == b[j]){
mat[i][j] = mat[i-1][j-1] + 1;
start_mat[i][j] = start_mat[i-1][j-1];
}
else{
if(mat[i-1][j] > mat[i][j-1]){
mat[i][j] = mat[i-1][j];
start_mat[i][j] = start_mat[i-1][j];
}else{
mat[i][j] = mat[i][j-1];
start_mat[i][j] = start_mat[i][j-1];
}
}
}
}
int longest = mat[m-1][n-1];
int start = -1, end = -1;
int maxv = -1;
for(int i = m-1; i >= 0; i--){
for(int j = n-1; j >= 0; j--){
if(mat[i][j] == longest && a[i] == b[j]){
if(i - start_mat[i][j] > maxv){
maxv = i-start_mat[i][j];
start = start_mat[i][j]; end = i;
}
}
}
}
int i = 0;
for(i = 0; i < start; i++){
if(a[i] == a[start]) break;
}
start = i;
string res = a.substr(start, end-start+1);
for(int i = 0; i < m; i++) delete[] mat[i];
delete[] mat;
for(int i = 0; i < m; i++) delete[] start_mat[i];
delete[] start_mat;
return res;
}
```

To cope with the 'billions' part you could make a Levenshtein Automaton of the pattern string, with as wide of edit distance as you're willing, convert that to an NFA, run the possible matches in parallel (not OS parallel), add logic (somehow) to keep count of the longest. It's possible that the branches in the NFA would make it exponential, but I'm sure that with good coding you could make it max out at something less (there should be only so many states). On its best day, this would be O(m = length of input string). Given that we're starting with O(n x m) we've got room to play with. The NFA 'threads' would keep track of their history - the longest you'd replay, with the right characters, '-' and so on.

i am not clear about the output,

ok so b has 'b' 'i' and 'g' characters common in order with a.

- Anonymous on November 19, 2013 Edit | Flag ReplyBut why examples 2 and 3 have output bxig (x doesn't appear in b) and bigx (g doesnt appear in b)