Google Interview Question Software Engineer / Developers

  • google-interview-questions
    -4
    of 6 votes
    10
    Answers

    Given two aligned sequences `a` and `b`. Write a function "findCommon", that finds the longest substring of the longer sequence that align to the smaller sequence in such a way that the alignment length (matching length) can be maximized. Sequences initially were of different lengths but the smaller one is padded with hyphen ('-') after alignment to make it equal to the longer sequence. The length of longer sequence is known in advance (m, which is same for the smaller padded sequence). The output is always the subsequence of the longer string.

    The total number of such operations to be done is in billions.



    findCommon(a, b, m)


    Example 1:

    a = ------bixg--
    b = xxxxxxbi-gzz
    m = 12

    output: big


    Example 2:
    a = xxxxxxbxigxx
    b = ------b-ig--
    m = 12

    output = bxig


    Example 3:
    a = bigxxxxxxxxx
    b = bi-x--------
    m = 12

    output = bigx

    - mary.kindall on November 19, 2013 in United States Report Duplicate | Flag
    Google Software Engineer / Developer Algorithm

Country: United States
Interview Type: In-Person


Comment hidden because of low score. Click to expand.
0
of 0 vote

i am not clear about the output,

a = ------bixg--
b = xxxxxxbi-gzz
m = 12

ok so b has 'b' 'i' and 'g' characters common in order with a.

But why examples 2 and 3 have output bxig (x doesn't appear in b) and bigx (g doesnt appear in b)

- Anonymous on November 19, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

alright the output is a sequence from longer string that match with subsequence in shorter string

- Anonymous on November 19, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

public void findCommon(char[] a, char[] b, int m) {
int startCount = -1;
int endCount =-1;
char[] c = null;
for (int i = 0; i < m; i++) {

if (a[i] == b[i]) {
if (startCount < 0)
startCount = i;

} else if (startCount >= 0) {

if ((a[i] == '-') && ((int)(b[i]) >=97 && (int)(b[i])<=122) ) {

if(!(i+1 < m && a[i+1] == b[i+1]))
{
endCount= i;
break;
}
} else if ((b[i] == '-') && ((int)(a[i]) >=97 && (int)(a[i])<=122) ) {

if(!(i+1 < m && a[i+1] == b[i+1]))
{
endCount= i;
break;
}
} else
startCount = -1;
}
}
System.out.println("start count = "+startCount +" end count = "+endCount);
if(startCount != -1)
{
if(startCount != 0)
c=a[0]== '-'?b:a;
else
c=a[endCount+1]== '-'?b:a;
if(endCount ==-1)
endCount=m-1;
int i= startCount;
while(i <=endCount-1 )
{
if(c[i] != '-')
System.out.print(" "+c[i]);
i++;
}
}
}

- hotcoder24 on November 19, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Whats the logic?

- confused_banda on November 19, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Find starting and end index of Longest common subsequence of lcs(a,b) in a.
result=a.substring(start,end);

- Raven on November 21, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

yes

string longest_common_subsequence(string a, string b){ // string a is longer than string b
	int m = a.length();
	int n = b.length();
	int** mat = new int*[m];
	for(int i = 0; i < m; i++){
		mat[i] = new int[n];
		for(int j = 0; j < n; j++) mat[i][j] = 0;
	}
	int** start_mat = new int*[m];
	for(int i = 0; i < m; i++){
		start_mat[i] = new int[n];
		for(int j = 0; j < n; j++) start_mat[i][j] = 0;
	}

	for(int i = 0; i < m; i++){
		if(a[i] == b[0]){
			mat[i][0] = 1;
			start_mat[i][0] = i;
		}
	}
	for(int j = 0; j < n; j++){
		if(a[0] == b[j]){
			mat[0][j] = 1;
			start_mat[0][j] = 0;
		}
	}
	for(int i = 1; i < m; i++){
		for(int j = 1; j < n; j++){
			if(a[i] == b[j]){
				mat[i][j] = mat[i-1][j-1] + 1;
				start_mat[i][j] = start_mat[i-1][j-1];
			}
			else{
				if(mat[i-1][j] > mat[i][j-1]){
					mat[i][j] = mat[i-1][j];
					start_mat[i][j] = start_mat[i-1][j];
				}else{
					mat[i][j] = mat[i][j-1];
					start_mat[i][j] = start_mat[i][j-1];
				}
			}
		}
	}
	int longest = mat[m-1][n-1];
	int start = -1, end = -1;
	int maxv = -1;
	for(int i = m-1; i >= 0; i--){
		for(int j = n-1; j >= 0; j--){
			if(mat[i][j] == longest && a[i] == b[j]){
				if(i - start_mat[i][j] > maxv){
					maxv = i-start_mat[i][j];
					start = start_mat[i][j]; end = i;
				}
			}
		}
	}
	int i = 0;
	for(i = 0; i < start; i++){
		if(a[i] == a[start]) break;
	}
	start = i;
	string res = a.substr(start, end-start+1);
	for(int i = 0; i < m; i++) delete[] mat[i];
	delete[] mat;
	for(int i = 0; i < m; i++) delete[] start_mat[i];
	delete[] start_mat;
	return res;
}

- Anonymous on November 22, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

yeh, this would work. the last loop for finding "start" and "end" can be avoided by keeping a track in the main DP loop of the maximal substring seen so far (a complete alignment is detected when the smaller string is fully scanned, j == n).

- Anonymous on January 25, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

not clear about the input:
1. shorter string can be padded at will or it has been padded fixedly in input?
2. why in example 1, longer string contains a "-" ? it's padded or a trival character?

- henryadam on November 23, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

m = 12 is the length of the two strings.

- Ravio on November 25, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

To cope with the 'billions' part you could make a Levenshtein Automaton of the pattern string, with as wide of edit distance as you're willing, convert that to an NFA, run the possible matches in parallel (not OS parallel), add logic (somehow) to keep count of the longest. It's possible that the branches in the NFA would make it exponential, but I'm sure that with good coding you could make it max out at something less (there should be only so many states). On its best day, this would be O(m = length of input string). Given that we're starting with O(n x m) we've got room to play with. The NFA 'threads' would keep track of their history - the longest you'd replay, with the right characters, '-' and so on.

- JeffD on March 03, 2014 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book walking you through getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More