## Interview Question for Software Engineer / Developers

Country: United States
Interview Type: Phone Interview

Comment hidden because of low score. Click to expand.
2
of 2 vote

Preprocess T so that we can answer queries of the following form:

- Given a character c and a position p, what is the smallest position j > p, such that c appears at position j in T (we allow p to be -1, assuming positions start at 0).

This can easily be done in O(n) preprocessing time (assuming 255 possible characters, linear time for each character) so that the queries can be answered in O(1) time.

Now for a given S we can use the above data structure (called DS below) as follows

``````p = -1
for char c in S:
j = DS.query(p,c)
return False
p = j
return True``````

Thus, for each string S, we only require O(|S|) time.

Comment hidden because of low score. Click to expand.
1
of 1 vote

Yeah!

Comment hidden because of low score. Click to expand.
0

DS.query has lower bound lglgT (like if DS is van emde boas at each character bucket) or lgT if you use balanced BST at each character bucket correct?

Comment hidden because of low score. Click to expand.
0

or balanced BST at each character bucket can be sorted array (which do binary search on) to improve cache hits

lasthope, this was your idea right?

but I like subbu do it without specifying the actual DS/algorithm exactly so it give clean look from above
nice one subbu

Comment hidden because of low score. Click to expand.
1
of 1 vote

It is O(1), by just using a table, say Lookup[p,c].

This table, of size O(|T|) can be filled in O(|T|) time.

Basically a matrix of 256*|T| integers.

For example if string is ab

Then the matrix will look like (hope formatting goes through)

``````a  | b  | c   | ....  | z
------------------------
-1|  0 | 1  | -1 | ....  | -1
0 | -1 | 1  | -1 | ....  | -1
1 | -1 | -1 | -1 | ..... | -1``````

You can start by filling the row corresponding to |T| and work right to left.

Comment hidden because of low score. Click to expand.
0

-=, yes, nice clean implementation. Although I'm not sure how table access for query can be O(1).

Comment hidden because of low score. Click to expand.
0

Yeah I think the successor query (which is "DS.query" by subbu) has lower bound lglgT. Not sure if we can beat van embde boas (which match the very lower bound).

But the way lasthope thinking about it (which in bigO is lgT) is better for cache hits because he does binary search over static array instead of pointer based van emde boas (lglgT).

But I like subbu presenting it is as abstract DS.query, but maybe we rename it as DS.successor :)

Subbu how to do actual successor query in O(1) on your table of mostly -1 (sparse) ?

Comment hidden because of low score. Click to expand.
0

@lasthope.

Constructing the table is O(|T|) time and query is O(1).

For example just for a single character, say 'h', you can fill its column in O(|T|) time by scanning through the string right to left.

Do this for each character (256) and you have the table.

Comment hidden because of low score. Click to expand.
0

@subbu, I understand that the table construction is O(T). But, if you want to find the lowest non -1 entry of 'h' (which is greater than p) in the table, you have a problem. You may have to pass through many values less than equal to p or simply -1 before you meet your criteria. Hope I'm making sense.

Comment hidden because of low score. Click to expand.
0

In the table above column 'c' has all -1. Mistake there.

@lasthope/ -=. I hope you got why it is O(1).

Comment hidden because of low score. Click to expand.
0

@lasthope.

Here is the code for a single character, which should explain what I am trying to say better.

``````int *GetColumn (char * T, char c) {
int L = strlen(T);
int *column = new column[L];
int position = -1;
for (j = L-1 ;  j>= 0; j--) {
column[j] = position;
if (T[j] == c) {
// found c, update the new next for the c that are to the left of current position.
position = j;
}
}
}``````

Comment hidden because of low score. Click to expand.
0

in GetRow, in the end there is a

``return column;``

which is missing.

Comment hidden because of low score. Click to expand.
0

Yeah it would seem lasthope's binary search is the divide and conquery improvement of the table idea.

We can discuss "per character buckets" (we put some DS at each character bucket - this part is same for all solutions we are discussing).

At a character bucket if we hang a sparse vector of length T with mostly -1, then worst case we might have to travel paths linear in T to find query. It seems O(T).

lasthope suggest converting the very same -1 sparse vector into an array of indices (sorted) to binary search on O(lgT).

or we can hang a van emde boas off each character bucket to do successor queries in O(lglgT).

Since Tmax is only 10M , lgTmax is 23, lglgTmax is 4

So the internal constants being smaller for binary search probably wins here.

I read somewhere the successor problem (for reasonable requirements for preprocessing times and space) has a lower bound O(lglgT) optimal query.

Comment hidden because of low score. Click to expand.
0

@subbu, You are absolutely right. A table constructed this way will immediately point to where the next position is. Perfect.

Comment hidden because of low score. Click to expand.
0

subbu , lasthope

i am still blocked , help!

subbu can you post all code again in a new one because it seem you got the optimal solution so we can upvote that one and understand better?

Comment hidden because of low score. Click to expand.
0

@-=. I believe you are thinking of the case where the n integers come from a universe of size N, with N >> n, with Theta(N) space usage being prohibitively expensive.

That is not the case here. N = |T| here.

btw, the Omega(log log N) bound was just a conjecture that has been disproved.

Comment hidden because of low score. Click to expand.
0

@-=, you have to understand how subbu is updating his table. table['c'][pos] is giving you the value

``min {i | i > p, T[i] = 'c' }``

, which is exactly what you need to query.

Comment hidden because of low score. Click to expand.
0

p = pos :(

Comment hidden because of low score. Click to expand.
0

Oh!
So extra preprocessing to O(max_char*T) to reduce to O(1) query.
Makes sense.

In Java this would be 64k*10M preprocessing worst case.

I understand all your amazing solutions. Unblocked thx lasthop subbu

Comment hidden because of low score. Click to expand.
1
of 1 vote

Here is some quick C++, fully working (but not production quality), commentless code. (For explanation, there are a bunch of comments preceding).

``````#include <string.h>
#include <iostream>
#include <vector>

using std::cout;

class Successor {
public:
int Query(char c, int position) {
return _table[c-1][position+1];
}

Successor(const char *T) {
for (int c = 1; c <= 256; c++) {
_table[c-1] = GetRow(T,c);
}
}

~Successor() {
for (int i = 0; i < 256; i++) {
delete [] _table[i];
}
}

private:
int *GetRow(const char *T, char c) {
int L = strlen(T);
int *column = new int[L+1];
int position = -1;
for (int j = L-1; j >= 0; j--) {
column[j+1] = position;
if (T[j] == c) {
position = j;
}
}
column[0] = position;
return column;
}
int *_table[256];
};

class SubSequenceChecker {
public:
bool Check(const char *S) {
int position = -1;
while (*S != '\0') {
position = _DS.Query(*S, position);
if (position == -1) {
return false;
}
S++;
}
return true;
}

SubSequenceChecker(const char *T) : _DS(T) {
}
private:
Successor _DS;
};

int main() {
SubSequenceChecker s("aabcddef");
cout << s.Check("a") << "\n";
cout << s.Check("abcf") << "\n";
cout << s.Check("abcdef") << "\n";
cout << s.Check("fed") << "\n";
cout << s.Check("aaa") << "\n";
cout << s.Check("aafd") << "\n";
}``````

Comment hidden because of low score. Click to expand.
1
of 1 vote

1. Create a sorted array of the positions where char X is present in T. Do it for all possible chars X (assuming 256).
2. For each char Y in S, do a binary search over the sorted array created for char Y in the first step. Note the position where it is found. Next time we look for Y again, we have look beyond this position.
3. If we do not find a viable position for Y in the sorted array, return false.

Runs in O( |S| log |T| ).

Comment hidden because of low score. Click to expand.
0

On a second thought, using two indexes, one pointing to the current char in S another one pointing to the last matched char plus one in T, it can be done in O( |S| + |T| ) time.

Comment hidden because of low score. Click to expand.
0

O(S + T) vs O(S logT)
usually S is tiny compared to T ~ 10M

Comment hidden because of low score. Click to expand.
0

in that case O( S log T ) seems to be the better option.

Comment hidden because of low score. Click to expand.
0

good sol.

Comment hidden because of low score. Click to expand.
0

So we can do this:
An array or hash of all characters where each bucket is a van Embde Boa tree (keys are the indices where the character for that bucket shows up).
Then as required we can do successor to find next index.

This should make it to O(S lglg T) which is nearly O(S) even with T as 10M

Comment hidden because of low score. Click to expand.
0

lasthope, preprocessing???

Comment hidden because of low score. Click to expand.
0

This won't work.

Comment hidden because of low score. Click to expand.
1
of 1 vote

Oh no, van emde boas is for making tree access cache oblivious. It is not required here.

In step 1 we "pre-process" T so that array[X] holds the positions (sorted if you go from left to right) where char X appears in T. O( T )

Then for each char S_i = 'd' in S (processing chars one by one from left to right), do a binary search in array['d']. Lets say we find it in location l of array['d']. O( log T )

The trick is to remember the location l and next time when we look for 'd' again we have to search array['d'][ l+1: END]. If found we update l. There should be an last found position similar to 'l' for each char processed from S so far.

If the process doesn't fail we return true at the end. O( S log T )

Comment hidden because of low score. Click to expand.
0

can Anonymous explain why lasthope sol not good?

Comment hidden because of low score. Click to expand.
0

The position that is the lower bound is common to all characters. So Step 2 in the description lasthope's algorithm can lead to wrong answers. No?

Comment hidden because of low score. Click to expand.
0

oops, sorry. there should be only one last matched position variable l. we have to search starting from l+1. please see the implementation by subhu below.

Comment hidden because of low score. Click to expand.
0

off by one error :) no worries

Comment hidden because of low score. Click to expand.
0

@-=: No, it is not off-by-one. Step 2 talks about maintaining positions for each character, which actually gives wrong answers.

Comment hidden because of low score. Click to expand.
0

There should have been a delete/edit option :-P. One last match position is all you need. btw, I think the O(1) lookup is very clever, subbu :)

Comment hidden because of low score. Click to expand.
0

c'mon subbu , lasthope idea is good for large max-char also (the correct idea was hidden with some bugs that's all)

Comment hidden because of low score. Click to expand.
0

@-=. I am pretty sure lasthope had the correct solution in mind and it does indeed work for larger character sets. In fact, this is a good discussion to have with the interviewer! Even for smaller character set, it would be a good discussion. Classic tradeoff between time and space (256*T space + O(|S|) runtime vs |T| space + O(|S|log|T|) rutime).

But as written it is wrong, and it could actually be a common mistake made my many of the interview candidates, don't you think? No harm in calling it out.

Comment hidden because of low score. Click to expand.
0

Yes yes.
I made mistake because I had similar solution (using vanEmde) so when I read lasthope's I just skimmed it assumed he was doing similar (successor at each bucket with lgT).

All these discussions are correct, I think can overwhelm the interviewer no matter how interviewer complains. that is key to beating the interviewer, don't let him/her complain anything (always come back and say .. ok in that case if you don't care about preprocessing we can... ok in that case you complain about that so do it this way...)

Good job subbu I really like your original code without any DS.query implementation that way we can drop what we need for that part based on how interviewer complain

Comment hidden because of low score. Click to expand.
0
of 0 vote

Ignore my other code this one is better.

``````HashMap<String, Boolean> memo = new HashMap<String, Boolean>();
Boolean result;
for(int i=i_min; i<=i_max; i++)  {
if( memo.containsKey( s[i] ) result=memo.get( s[i] );

if( T.subSequence(s[i]) ) System.out.println(s[i] + "is a subsequence");
}``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

do following check for every Si....
input - T,
pattern - Si

``````public static boolean match(String input, String pattern){

char[] in = input.toCharArray();
char[] pat = pattern.toCharArray();

int i=0, j=0;
int count=0;
while(i<in.length && j<pat.length){
if(in[i]==pat[j]){
i++;j++;
count++;
if(count==pat.length)
return true;
}else{
i++;
}
}

return false;
}``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

Sorry the above code with HashMap is incomplete. Here is full code
We can use memo dp to improve the algorithm.

``````// use memorization to improve the algorithm from brute force to dp
HashMap<String, Boolean> memo = new HashMap<String, Boolean>();
Boolean result;

for(int i=i_min; i<=i_max; i++)  {
if( memo.containsKey( s[i] ) {
result=memo.get( s[i] );
}
else {
result=T.subSequence(s[i]);
memo.put(s[i], result);
}
if( result) System.out.println(s[i] + "is a subsequence");
}``````

I will post visualization later

Comment hidden because of low score. Click to expand.
0
of 0 vote

``````def subsequence(T, S):
i = 0
j = 0

if len(S) > len(T):
return False

while(i< len(T) and j< len(S)):
if T[i] == S[j]:
print "Matched : {0}".format(T[i])
if j == len(S)-1:
return True
i += 1
j += 1
else:
print "Skip character".format(T[i])
i += 1

return False

subsequence('abbebcd','bbcd')``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

Just use a BST to store T.
Then you can modify inorder search on the BST using each Si to check for subsequence matching.

Comment hidden because of low score. Click to expand.
0
of 0 vote

Great approach to have a character location map. Implemented the code with an IEqualityComparer parameter, just in case the interviewer wants to do this case insensitively.

``````using System;
using System.Collections;
using System.Collections.Generic;

namespace CareerCup
{
internal class Subsequence
{
private const int IndexNotFound = -1;

#region Public Methods

/// <summary>
/// Finds if the each of the <paramref name="patterns"/> string is a subsequence of the <paramref name="source"/> string.
/// </summary>
/// <param name="source">Source string.</param>
/// <param name="patterns">Pattern strings.</param>
/// <param name="comparer">Equality comparer.</param>
/// <remarks>
/// The <paramref name="source"/> string can be upto 10 million characters long.
/// </remarks>
internal static void IsSubsequence(string source, string[] patterns, IEqualityComparer<char> comparer = null)
{
// Let's pre-process the source string.
// Dictionary - This will store the count of each character in the source string.
//              This will help as the first line of defense against verifying if the pattern can exist in the source string.
//
// Dictionary - This will store the location of each character in the source string.
//              The concept is that if we will be able to do binary search to find the characters' existence and the order.

Dictionary<char, int> countMap = new Dictionary<char, int>(comparer);
Dictionary<char, List<int>> locationMap = new Dictionary<char, List<int>>(comparer);

for (int index = 0; index < source.Length; ++index)
{
char currentChar = source[index];

if (countMap.ContainsKey(currentChar))
{
countMap[currentChar] += 1;
}
else
{
countMap[currentChar] = 1;
List<int> locations = new List<int>();
locationMap[currentChar] = locations;
}
}

// Let's process all the patterns.
foreach (string pattern in patterns)
{
bool result = IsPatternASubsequence(pattern, source.Length, new Dictionary<char, int>(countMap, comparer), locationMap);
Console.WriteLine("Pattern: {0} IsSubsequence: {1}", pattern, result);
}
}

#endregion // Public Methods

#region Private Methods

/// <summary>
/// Checks if the <paramref name="pattern"/> string is a subsequence of the source string.
/// </summary>
/// <param name="pattern">Pattern string.</param>
/// <param name="sourceLength">Source string length.</param>
/// <param name="countMap">Character count map of the source string.</param>
/// <param name="locationMap">Character location map of the source string.</param>
/// <returns>Flag indicating whether the <paramref name="pattern"/> string is a subsequence of the source string.</returns>
private static bool IsPatternASubsequence(
string pattern,
int sourceLength,
Dictionary<char, int> countMap,
Dictionary<char, List<int>> locationMap)
{
if (string.IsNullOrWhiteSpace(pattern)
|| pattern.Length > sourceLength
|| !DoesSourceContainPattern(pattern, countMap))
{
return false;
}

// If we have reached here, then the pattern characters are present in the source string.
// Now, we need to make sure that all the characters are in the right order.

int indexInSource = IndexNotFound;
for (int index = 0; index < pattern.Length; ++index)
{
char currentChar = pattern[index];

// Check if the number of characters left in the pattern can be found in the source.
// Will this check improve on our efficiency?
if (!GetNextAvailableIndex(ref indexInSource, locationMap[currentChar])
|| ((sourceLength - indexInSource) < (pattern.Length - index - 1)))
{
return false;
}
}

return true;
}

/// <summary>
/// Gets the next available index of the desired character by looking up the character location.
/// The search is performed using binary search.
/// </summary>
/// <param name="currentIndexInSource">Current index in source string that holds the previous charcter.</param>
/// <param name="locationMaps">Character location map.</param>
/// <returns>Flag indicating whether the desired character exists in the source string after the current index.</returns>
private static bool GetNextAvailableIndex(
ref int currentIndexInSource,
List<int> locationMaps)
{
int start = 0;
int end = locationMaps.Count - 1;
int nextSoFar = IndexNotFound;

while (end >= start)
{
int middle = start + ((end - start) >> 1);

if (locationMaps[middle] == currentIndexInSource)
{
if (middle < locationMaps.Count)
{
nextSoFar = locationMaps[middle + 1];
}

break;
}
else if (locationMaps[middle] > currentIndexInSource)
{
nextSoFar = locationMaps[middle];
end = middle - 1;
}
else
{
start = middle + 1;
}
}

if (nextSoFar == IndexNotFound)
{
return false;
}

currentIndexInSource = nextSoFar;
return true;
}

/// <summary>
/// Verifies that all the characters in the <paramref name="pattern"/> exists in the source string.
/// </summary>
/// <param name="pattern">Pattern string.</param>
/// <param name="countMap">Character count map.</param>
/// <returns>Flag indicating whether the pattern characters exist in the source string.</returns>
private static bool DoesSourceContainPattern(string pattern, Dictionary<char, int> countMap)
{
for (int index = 0; index < pattern.Length; ++index)
{
char currentChar = pattern[index];
int count;
if (!countMap.TryGetValue(currentChar, out count)
|| count == 0)
{
return false;
}

countMap[currentChar] -= 1;
}

return true;
}

#endregion // Private Methods
}

public class CharComparer : IEqualityComparer<char>
{
public bool Equals(char x, char y)
{
return char.ToLowerInvariant(x).Equals(char.ToLowerInvariant(y));
}

public int GetHashCode(char obj)
{
return char.ToLowerInvariant(obj).GetHashCode();
}
}
}``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

It is standard algorithm, it is called Aho-Corasick tree.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

Represent each string as a vector with counts of characters and use that to determine "sub-sequencesness".

Eg: "abbc" will be the vector [1,2,1,0,...,0] with the first entry corresponding to number of a's, second to number of b's etc.

Given abc, which will be [1,1,1,0,...] you check if each corresponding entry of target is <= the entry at the source vector.

Comment hidden because of low score. Click to expand.
0

And vector can be represented as hashtable to reduce the number of entries to check.

Comment hidden because of low score. Click to expand.
0

this is doing a subset check

Comment hidden because of low score. Click to expand.
0

-= is right. Order matters. Good catch -=.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

Here is my solution in Java

``````for(int i=i_min; i<=i_max; i++)  {
if( T.subSequence(s[i]) ) System.out.println(s[i] + "is a subsequence");
}``````

I will post the visualization for this in another reply.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

i am assuming strings are of ASCII character set. since ASCII has 256 charecters, declare an integer array of with 256. count each character presence. do the same by decrementing charecter occurance. if any position got -1, then its not the subset of the first string.

Comment hidden because of low score. Click to expand.
0

subsequence still has order involved

you are doing "subset" checking
you can use this as a first level check though

Comment hidden because of low score. Click to expand.
0

yes it required to be in order. we can use this for the initial check, weather we go further checking for this or not. once its successful we should proceed for sequence check.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

Build a suffix tree of T then do subsequence queries of each S_i

Comment hidden because of low score. Click to expand.
0

Suffix Tree cannot be used here

Comment hidden because of low score. Click to expand.
0

Suffix tree is useful for finding sub-strings not sub-sequences. Won't work here.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

A greedy method should be OK, I think. The C++ code:

``````bool myCmp(string S, string T) {
int lenS = S.length();
int lenT = T.length();
for (int i = 0, j = 0; i < lenS, j < lenT; j++) {
if (S[i] == T[j]) i++;
}
return i == lenS ? true : false;
}``````

The space complexity is O(1); The time complexity is O(n), where n = length of T.
Please correct me, if it's wrong.

Comment hidden because of low score. Click to expand.
-1
of 1 vote

ignore my BST idea pls
subbu and lasthope solutions are best depending on different situations

rock on subbu!!!!!!!!1

Comment hidden because of low score. Click to expand.
-1
of 1 vote

Algorithms on Strings, Trees and Sequences by Gusfield has Anonymous/subbu/= solutions all variations.

all same people win the upvote

-Guest DS

Comment hidden because of low score. Click to expand.
0

Oh, fuck off. Troll.

Comment hidden because of low score. Click to expand.
-2
of 2 vote

{fake ajeet, fake Varun, Anonymous that failed with first solution using hashing, =, fake subbu } all same person 100%

{lasthope} different person

Comment hidden because of low score. Click to expand.
0

you are a fool
= is not part of any other set except {=}

Comment hidden because of low score. Click to expand.
0

newbie here, but wth is this???

Comment hidden because of low score. Click to expand.
0

@lasthope. You can safely ignore this/these idiot/idiots.

Comment hidden because of low score. Click to expand.
-2
of 2 vote

{fake ajeet, fake Varun, Anonymous that failed with first solution using hashing, =, fake subbu } all same person 100%

{lasthope} different person

Comment hidden because of low score. Click to expand.
-2
of 2 vote

{fake ajeet, fake Varun, Anonymous that failed with first solution using hashing, =, fake subbu } all same person 100%

{lasthope} different person who admit defeat in the end

Comment hidden because of low score. Click to expand.
0

Stop EXPOSING yourself algos.

Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

### Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

### Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

### Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

### Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.