Google Interview Question
SDE-2sCountry: United States
Can you write the code in 20 min interval?
It is easier to write Robin-Karp algo that has the same average complexity. But you should mention that it could be O(nm) in some bad cases.
Even better if you ask whether the characters of the text is unique. If they are you can easily do O(n) algo just traversing.
You need to use a Trie suffix tree
It can be done with O(n) when n is the length of the long string
Then we just check if the short string exists in the suffix tree O(m) where m is the length of the short string
C++ Rabin-Karp implementation. O(n+m) time.
#include <iostream>
#include <iterator>
static const int prime_mod = 2147483647;
static const int prime_factor = 257;
template <typename T>
long long rabinkarp_hash(long long value, T elem) {
return (value + elem) * prime_factor % prime_mod;
}
template <typename RandIt1, typename RandIt2>
RandIt2 rabinkarp_find(RandIt1 bneedle, RandIt1 eneedle,
RandIt2 bhaystack, RandIt2 ehaystack) {
if (bneedle >= eneedle || bhaystack >= ehaystack)
return ehaystack;
RandIt1 needle = bneedle;
typename std::iterator_traits<RandIt1>::difference_type needle_len =
std::distance(bneedle, eneedle);
// Calculate initial hash for needle, haystack and power
long long power = 1;
long long needle_hash = 0;
long long haystack_hash = 0;
while (bneedle < eneedle && bhaystack < ehaystack) {
power = rabinkarp_hash(power, 0);
needle_hash = rabinkarp_hash(needle_hash, *bneedle++);
haystack_hash = rabinkarp_hash(haystack_hash, *bhaystack++);
}
while (bhaystack < ehaystack) {
if (haystack_hash != needle_hash) {
// Remove first element from haystack hash
haystack_hash -= power * *(bhaystack - needle_len) % prime_mod;
haystack_hash = rabinkarp_hash(haystack_hash, *bhaystack++);
}
if (haystack_hash == needle_hash) {
// Hashes are equal, make sure the contents are equal
if (std::equal(needle, eneedle, bhaystack - needle_len)) {
bhaystack -= needle_len;
break;
}
}
}
return bhaystack;
}
size_t rabinkarp_find(const std::string& needle, const std::string& haystack) {
auto it = rabinkarp_find(needle.begin(), needle.end(), haystack.begin(), haystack.end());
if (it != haystack.end())
return std::distance(haystack.begin(), it);
return std::string::npos;
}
int main() {
size_t pos = rabinkarp_find("tl", "bottle");
std::cout << pos << std::endl;
return 0;
}
@one of candidate this is one of the easiest linear-time searching algorithms to write on a whiteboard in 20 minutes. It looks a bit complicated on the code above because I used C++ templates, but it can be much simpler without that. If you are asked to write a linear-time string searching algorithm in 20 minutes, I suggest you to go with Rabin Karp.
public class SubString {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
String complete="botytytle";
String check="tl";
int flag=0;int j=0;
for(int i=0;i<complete.length();i++)
{
if(complete.charAt(i)==check.charAt(j))
{
flag=1;
j++;
}
if(j==check.length())
{
System.out.println("YES");
System.exit(0);
}
else if(flag==0&&j!=0)
{
i--;
j=0;
}
flag=0;
}
}
}
What is the significance of below given else if block , please help to understand.
else if(flag==0&&j!=0)
{
i--;
j=0;
}
@sumanjoshi It is there to reset j (in the event that there is a detection of a letter but subsequent letters do not match). And i is reduced so that in the next iteration the first letter of the 'check' string can be compared with the letter with which the second letter of 'check' string was compared in the previous iteration.
Note: try dry-running to the part where 't' of 'check' string is getting compared to the t of 'complete' string.
This looks like a typical language detection problem for a Turing Machine.
Code:
public class SubStringFinder {
public static int GetSubStrLog(String main_str, String sub_str) {
int pos_main = 0;
while (pos_main < main_str.length()) {
int pos_sub = 0;
while(sub_str.charAt(pos_sub++) == main_str.charAt(pos_main++))
if (pos_sub == sub_str.length())
return pos_main - sub_str.length();
pos_main -= (pos_sub - 1);
}
return -1;
}
}
Sample:
public static void main(String[] args) {
// TODO code application logic here
String main_str = "This is thethethe maithe mai the main string.";
String sub_str = "the main";
int pos = SubStringFinder.GetSubStrLog(main_str, sub_str);
if (pos > -1)
System.out.println("Found the sub-string at position " + Integer.toString(pos) + ".");
else
System.out.println("Not found.");
}
Output:
I added a line to fix the error. It works now.
I had forgotten to reset main_pos to the begining of where we lock on a substring.
public void checkSubString(String full, String part){
int pos = 0;
for(int i=0,l=full.length(); i<l; ++i){
if(full.charAt(i) == part.charAt(pos)){
pos++;
if(pos == part.length()){
System.out.println("Found!");
System.exit(0);
}
}else if(pos > 0){
i -= pos;
pos = 0;
}
}
System.out.println("Not Found!");
}
This algorithm fails to find when the inputs are 'aaabaabaazb', 'aabaab'.
Very good attempt though.
public static void main(String[] args) {
String str="Writing Code? Surround your code with";
String s="de?";
boolean res=false;
char[] cStr=str.toCharArray();
int len=cStr.length;
int slen=s.length();
int h=slen-1;
int l=0;
int j=0;
String st="";
while(h<len){
if(l<=h){
st+=""+cStr[l];
l++;
}else{
l=++j;
h=l+slen;
st="";
}
if(st.equals(s)){
res=true;
break;
}
}
System.out.println("Is in String : "+res);
}
Sorry About that..
public static void main(String[] args) {
String str="aaabaabaazb";
String s="aabaab";
boolean res=false;
int len=str.length();
int slen=s.length();
int h=slen-1;
int l=0;
int j=0;
int ctr=0;
String st="";
while(h<len){
if(l<h){
if(str.charAt(l)==s.charAt(l-j)){
ctr++;
}else{
ctr=0;
}
l++;
}else{
l=++j;
h=l+slen;
st="";
ctr=0;
}
if(slen==ctr){
res=true;
break;
}
}
System.out.println("Is in String : "+res);
}
Here we build the hash with all the suffixes of the longer string. Only the keys are truncated to the size of smaller string. Time Complexity O(N) with a hash ofcourse. Written in Python.
def isSubString(mainString,subString):
subStrLen = len(subString)
#hash
suffixHash = {}
for i in range(0,len(mainString)):
key = mainString[i:i+subStrLen]
suffixHash[key] = True
print suffixHash.keys()
if subString in suffixHash:
print "True"
else:
print "False"
isSubString("hellothisiscool", "isi")
def FindString(String, subString):
pLen = len(subString)
pString = ''
found = False
if pLen > 0:
for i in range(len(String)):
pString = String[i:i+pLen]
if pString == subString:
return True
return found
def PrintMatch(String, subString):
if FindString(String, subString):
print "Found %s inside %s" %(subString, String)
else:
print "No match Found. "
PrintMatch('bottle', 'tl')
PrintMatch("dceababac", "abac")
Found tl inside bottle
Found abac inside dceababac
I have a similar implementation as yours in the same thread.(in python as well). The thing is you have a string "==" inside the loop. "==" is an operation that takes O(M), where m is the size of the smaller string. The program takes O(N*M). To avoid that "==" I had to use a dictionary.
Changed the algorithm to Python Dictionary as per your suggestion
def FindString(String, subString):
pLen = len(subString)
pString = ''
found = False
pHashTable={}
pHashTable[subString]=subString
if pLen > 0:
for i in range(len(String)):
pString = String[i:i+pLen]
if pHashTable.has_key(pString):
return True
return found
def PrintMatch(String, subString):
if FindString(String, subString):
print "Found %s inside %s" %(subString, String)
else:
print "No match Found. "
PrintMatch('bottle', 'tl')
PrintMatch("dceababac", "abac")
Found tl inside bottle
Found abac inside dceababac
Sorry guys, my previous code had a little bug in that ....
Here is the updated version ...
public static void checkSubString(String full, String part){
int pos = 0;
for(int i=0,l=full.length(); i<l; ++i){
if(full.charAt(i) == part.charAt(pos)){
pos++;
if(pos == part.length()){
System.out.println("Found!");
System.exit(0);
}
}else if(pos > 0){
i -= pos;
pos = 0;
}
}
System.out.println("Not Found!");
}
public static void main(String[] args) {
String str="aaabaabaazb";
String s="aabaab";
boolean res=false;
int len=str.length();
int slen=s.length();
int h=slen-1;
int l=0;
int j=0;
int ctr=0;
String st="";
while(h<len){
if(l<h){
if(str.charAt(l)==s.charAt(l-j)){
ctr++;
}else{
ctr=0;
}
l++;
}else{
l=++j;
h=l+slen;
st="";
ctr=0;
}
if(slen==ctr){
res=true;
break;
}
}
System.out.println("Is in String : "+res);
}
There is a different version of KMP that is a little space inefficient, but does the trick in O(n) time anyways. It involves creating a DFA which can be run on the input text in linear time to identify if the given pattern in a substring. I could code it in 5 min...so hopefully doing so in an interview in 20 min should not be a problem.
I have not checked the code very exhaustively, so please let me know if there are any bugs.
#include<iostream>
#include<vector>
using namespace std;
int main()
{
string text,pat;
cin>>text>>pat;
vector<vector<int> > dfa(pat.size(),vector<int>(1<<8 , 0));
dfa[0][pat[0]-0]=1;
int i=0,j=1;
while(j<pat.size()){
for(int q=0;q<dfa[j].size();++q){
if(pat[j]-0==q)dfa[j][q]=j+1;
else dfa[j][q]=dfa[i][q];
}
i=dfa[i][pat[j]-0];
++j;
}
for(i=0,j=0;i<text.size();++i){
j=dfa[j][text[i]-0];
if(j==pat.size()){
cout<<"found"<<endl;
break;
}
}
return 0;
}
input format :
[text]
[pattern]
eg:
bottle
tl
Ai Diegao!!!!
C++ Implementation of Knuth Morris Pratt algorithm might do the trick.
By not going back to the begining of the main sentence whenever we have a partial match that ends up becoming a failure.
This should be O(n+m) -> O(m) to pre-process(failureFunction) the pattern and O(n) to go through the sentence once.
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
void failureFunction(const string& pattern, vector<int>& vet)
{
size_t pos = 2;
size_t cnd = 0; // the zero-based index in the pattern of the next character of the current candidate substring
vet[0] = -1;
vet[1] = 0;
while (pos < pattern.length() )
{
if (pattern[pos - 1] == pattern[cnd])
{
cnd =cnd + 1;
vet[pos] = cnd;
pos = pos + 1;
}
else if (cnd > 0)
cnd = vet[cnd];
else
{
vet[pos] = 0;
pos = pos + 1;
}
}
}
int KnuthMorrisPratt(const string& text, const string& pattern)
{
int i = 0;
int j = 0;
if( pattern.length() > text.length() )
return -1;
vector<int> matchPattern;
matchPattern.resize(pattern.length());
// O(m) -> Lets pre-process the pattern and find out what are the failure points.
failureFunction(pattern, matchPattern);
for ( ; ; ) {
if (j == text.length()) break; // we reached the end of the text
if (text[j] == pattern[i]) //searching for a match...
{
j++;
i++;
if (i == pattern.length()) // match found
return j-pattern.length(); // This is the position we found the pattern.
}
else if(i > 0) // Match part of the pattern.
i = matchPattern[i]; // matchPattern[i] will be 0 in case there is no "submatch" in the match pattern array.
// Else, trying to scape the O(nxm), we simply go back to the last matching position from the pattern where
// there still might be a chance of matching
else
j++; // go to the next character from the text.
}
return -1;
}
int main() {
int pos = KnuthMorrisPratt("aaaabbbbaaabbbaabbababababc", "aabbab");
cout << pos << endl;
pos = KnuthMorrisPratt("aaaabbbbaaabbbabbbababababc", "abbbab");
cout << pos << endl;
pos = KnuthMorrisPratt("aaaabbbbaaabbbabbbababababc", "abab");
cout << pos << endl;
pos = KnuthMorrisPratt("aaaabbbbaaabbbabbbababababc", "abbbabbb");
cout << pos << endl;
pos = KnuthMorrisPratt("aaaabbbbaaabbbabbbababababc", "aaabbbbaaabbbabbbababababc");
cout << pos << endl;
pos = KnuthMorrisPratt("hellothisiscool", "isi");
cout << pos << endl;
return 0;
}
Simple Rabin-Karp implementation in Python with a (fairly naive) running hash function...
class RollingHash:
s = None
start = 0
end = 0
hash = 0
def __init__(self, s, length):
self.s = s
self.end = length - 1
for i in range(length):
self.hash += ord(s[i])
def update(self):
if self.s is None:
return
self.hash -= ord(self.s[self.start])
self.start += 1
self.end += 1
if self.end <= len(self.s) - 1:
self.hash += ord(self.s[self.end])
def search(s, ss):
if len(ss) > len(s):
return False
sshash = RollingHash(ss, len(ss))
shash = RollingHash(s, len(ss))
while shash.end <= len(s) - 1:
if sshash.hash == shash.hash:
if ss == s[shash.start:(shash.end + 1)]:
return True
shash.update()
return False
print search("Hello, world", "o, w")
The main improvements here would be a more sophisticated hash function which is less likely to have collisions, but I think this suffices to code it up in 10-20 minutes.
how about using recursion?
bool my_strstr(char * needle, char * heystack) {
if (!needle || !heystack) // error
return false;
if (!*needle) // reached end of needle, means found
return true;
if (!*heystack) // reached end of heystack, means not found
return false;
if (*needle == *heystack) // first chars match, so try rest of the chars
return my_strstr(needle+1, heystack+1);
// first chars don't match, so try in rest of heystack
return my_strstr(needle, heystack+1);
}
how about using recursion?
bool my_strstr(char * needle, char * heystack) {
if (!needle || !heystack) // error
return false;
if (!*needle) // reached end of needle, means found
return true;
if (!*heystack) // reached end of heystack, means not found
return false;
if (*needle == *heystack) // first chars match, so try rest of the chars
return my_strstr(needle+1, heystack+1);
// first chars don't match, so try in rest of heystack
return my_strstr(needle, heystack+1);
}
how about using recursion?
bool my_strstr(char * needle, char * heystack) {
if (!needle || !heystack) // error
return false;
if (!*needle) // reached end of needle, means found
return true;
if (!*heystack) // reached end of heystack, means not found
return false;
if (*needle == *heystack) // first chars match, so try rest of the chars
return my_strstr(needle+1, heystack+1);
// first chars don't match, so try in rest of heystack
return my_strstr(needle, heystack+1);
}
how about using recursion?
bool my_strstr(char * needle, char * heystack) {
if (!needle || !heystack) // error
return false;
if (!*needle) // reached end of needle, means found
return true;
if (!*heystack) // reached end of heystack, means not found
return false;
if (*needle == *heystack) // first chars match, so try rest of the chars
return my_strstr(needle+1, heystack+1);
// first chars don't match, so try in rest of heystack
return my_strstr(needle, heystack+1);
}
how about using recursion?
bool my_strstr(char * needle, char * heystack) {
if (!needle || !heystack) // error
return false;
if (!*needle) // reached end of needle, means found
return true;
if (!*heystack) // reached end of heystack, means not found
return false;
if (*needle == *heystack) // first chars match, so try rest of the chars
return my_strstr(needle+1, heystack+1);
// first chars don't match, so try in rest of heystack
return my_strstr(needle, heystack+1);
}
how about using recursion?
bool my_strstr(char * needle, char * heystack) {
if (!needle || !heystack) // error
return false;
if (!*needle) // reached end of needle, means found
return true;
if (!*heystack) // reached end of heystack, means not found
return false;
if (*needle == *heystack) // first chars match, so try rest of the chars
return my_strstr(needle+1, heystack+1);
// first chars don't match, so try in rest of heystack
return my_strstr(needle, heystack+1);
}
how about using recursion?
bool my_strstr(char * needle, char * heystack) {
if (!needle || !heystack) // error
return false;
if (!*needle) // reached end of needle, means found
return true;
if (!*heystack) // reached end of heystack, means not found
return false;
if (*needle == *heystack) // first chars match, so try rest of the chars
return my_strstr(needle+1, heystack+1);
// first chars don't match, so try in rest of heystack
return my_strstr(needle, heystack+1);
}
String pattern = "abac";
String text = "dceababac";
int patternLength = pattern.length();
for (int textIndex = 0; textIndex < text.length(); textIndex++) {
if (textIndex + patternLength <= text.length()) {
String subString = text.substring(textIndex, textIndex
+ patternLength);
if (pattern.equalsIgnoreCase(subString)) {
System.out.println(textIndex);
}
}
}
I think it ensures O(n) time complexity:
public static boolean isSubString(String str, String subStr){
int index=0;
int maxIndex = subStr.length();
for(int i=0;i<str.length();i++){
if(str.charAt(i)==subStr.charAt(index)){
index++;
}else{
if( index>0 && (str.charAt(i) == subStr.charAt(index-1)) )
continue;
else
index = 0;
}
if(index == maxIndex)
return true;
}
return false;
}
Correction:
if( index==1 && (str.charAt(i) == subStr.charAt(index-1)) )
Now it works for all cases.
your first code fails for str = "aaabbc" and subStr = "aabc"
your updated code fails most often
Let us assume that we want to know if the "ref" is substring of "andrreefjtjeeft".
If we use the method that I provided without the correction, the method returns "true". But that is not correct.
With the correction, it works fine. I ran a set of tests and I have not detected any problem.
Could you post the example that you use?
This will run in O(n) time complexity, please comment if something wrong.
private static boolean checkIfSub(String str, String sub)
{
if(str==null)
return false;
if(sub==null)
return true;
if(sub.trim().length()>str.trim().length())
return false;
int j=0;
boolean isSub = true;
for(int i=0;i<str.length() && j<sub.length();i++)
{
if(str.charAt(i) == sub.charAt(j))
{
isSub = true;
j++;
}
else if(j==sub.length()-1)
{
isSub = true;
break;
}
else
{
isSub = false;
j=0;
}
}
return isSub;
}
}
what is the need to do i--, as if character at i the pos and jth pos does not match we just need to move further with making j to 0.
DO KMP.
- Dalek November 26, 2013