Amazon Interview Question
Country: United States
Could u elaborate? I could only think of a O(n^2) solution with suffix tree. But this is easily achievable with DP in O(n^2).
Suffix tree is a good data structure for pattern matching. Look it up for thorough concept but in a nut shell, suffix tree stores all suffixes of a given string. for example : Algorithm has suffix : Algorithm, lgorithm, gorithm,...,hm,m. To build this tree takes O(n) time, n being length of string.
Now for checking if string(m) is substring of string(n), we do a string search on the suffix tree built earlier. this takes O(m) time. total O(n+m)
Suffix tree is an ok'ish (though more memory-consuming than necessary) algorithm in a general substring match, but a horrible algorithm for this problem since wildcards are allowed. How do you handle the wild cards in O(n) time?
Oh wait sorry I thought the problem statement said the string might contains wild cards in the sense that any character can replace it. In that case suffix tree is a good algorithm, though I'd prefer boyer-moore because it takes less space and, on average, is faster.
Couldn't you simply remove not alphabetic characters in a linear pass and then do a KMP substring search? O(N) time with O(P) extra space (Technically O(N+P)where N is the string to be searched and P is the substring with wild cards removed but since P must be strictly less than N it is O(N) overall
static boolean isSubString(String s1,String s2)
{
int start=0;
while(start<s1.length())
{
int cur=start;
int star=-1;
int mark=-1;
for(int i=0;i<s2.length();)
{
if(cur>=s1.length()) break;
if(s1.charAt(cur)==s2.charAt( i )||s2.charAt( i )=='.')
{
if(i==s2.length()-1)//last element
{
return true;
}
else
{
cur++;
i++;
}
}
else if(s2.charAt( i )!='*')
{
if(star!=-1)
{
i=start+1;
cur=mark+1;
mark++;
}
else break;
}
else//s2.charAt( i )=='*'
{
if(i==s2.length()-1)//last element
{
return true;
}
else//keep cur and advance i
{
star=i;
mark=cur;
i++;
}
}
}
start++;
}
return false;
}
Here's a generalized regex substring matcher. Supports things like a+ (1 or more 'a'), ?* and .* (zero or more wildcards), as well as .+ and ?+.
bool is_regex_substr(const char* s, const char* p)
{
if (0 == s || 0 == p) return false;
if (0 == *p) return true;
if (0 == *s && 0 != *p && '*' != *(p + 1)) return false;
if ('*' == *(p + 1))
{
if (is_regex_substr(s, p + 2)) return true;
while (*s != 0 && (*s == *p || *p == '.' || *p == '?'))
{
if (is_regex_substr(s + 1, p + 2)) return true;
++s;
}
return false;
}
if ('+' == *(p + 1))
{
while (*s != 0 && (*s == *p || *p == '.' || *p == '?'))
{
if (is_regex_substr(s + 1, p + 2)) return true;
++s;
}
return false;
}
if ('?' == *p || *s == *p) return is_regex_substr(s + 1, p + 1);
return false;
}
// usage
auto irs = is_regex_substr("careercup", "c?re+.*r+?+d*p");
Suffix tree
- AlgoAlgae December 02, 2014