Microsoft Interview Question
Software Engineer in TestsI am just curious why not to use TRIE .. just because example gives us two small strings.
Do we need to probe further to know the size of the text (hay) and search text (needle) to come up with a decision between TRIE and KMP/BM.
KMP/BM process the needle where as TRIE process the hay.. once intialized TRIE is depending upon size of the pattern rather than text..
How should we start tackling this kind of question ?
You must find all the occurrences of the pattern in the text.This means that you must create a trie of suffixes of the text to use it right.And the complexity in this case is O(N^2), with N the length of the text.KMP is much more simpler to implement and with a better complexity.
Thanks, so i believe that the given input/example will be the key. So we should always probe with interviewer how the function will be used, how much will be the data size etc etc.
Since in this example we are given sample strings so we can safely take KMP/BM.
Actually, a trie could work here if we put the pattern in it. But it isn't a true trie, it's more like a finite automata, to be useful.And this idea is very good for multiple patterns.For details, search for Aho-Corasick algorithm. For a single pattern, it's the same with KMP, but it's power comes with multiple patterns.So, with single patterns KMP it is the best, for multiple patterns, a trie constructed with Aho-Corasick algorithm will do the job.
public class FindSubstring {
String inputString;
String stringToMatch;
FindSubstring(String input, String toMatch) throws Exception{
if(input == null || toMatch==null)
throw new java.lang.Exception();
this.inputString = input;
this.stringToMatch = toMatch;
}
public int FindOccurances(){
return this.FindOccurances(this.inputString, this.stringToMatch);
}
public int FindOccurances(String input, String tomatch){
int occurances = 0;
char[] inputArray = input.toCharArray();
char[] subArray = tomatch.toCharArray();
for(int i=0 ; i< inputArray.length; i++){
int temp = i;
boolean match = true;
for(int j=0; j<subArray.length; j++){
if(temp< inputArray.length){
if( inputArray[temp] == subArray[j])
temp++;
else{
match = false;
break;
}
}
else
break;
}
if(match){
i = temp-1;
occurances++;
}
}
return occurances;
}
}
public class FindSubstring {
String inputString;
String stringToMatch;
FindSubstring(String input, String toMatch) throws Exception{
if(input == null || toMatch==null)
throw new java.lang.Exception();
this.inputString = input;
this.stringToMatch = toMatch;
}
public int FindOccurances(){
return this.FindOccurances(this.inputString, this.stringToMatch);
}
public int FindOccurances(String input, String tomatch){
int occurances = 0;
char[] inputArray = input.toCharArray();
char[] subArray = tomatch.toCharArray();
for(int i=0 ; i< inputArray.length; i++){
int temp = i;
boolean match = true;
for(int j=0; j<subArray.length; j++){
if(temp< inputArray.length){
if( inputArray[temp] == subArray[j])
temp++;
else{
match = false;
break;
}
}
else
break;
}
if(match){
i = temp-1;
occurances++;
}
}
return occurances;
}
}
Yeah, KMP is the best applicable algorithm here.
Also, If I were Jwalant, I would restrain myself from giving a O(mn) [n - length of string, m - length of pattern] algorithm when, clearly, KMP gives a O(n) Solution. I've been immediately asked in the past by interviewers to give a better algorithm [Knuth-Morris-Pratt or Robin-Karp].
I think doing it by creating a Suffix tree for the input text will be a better approach. The preprocessing time is O(n), but once we have the suffix tree, all the queries are linear in the size of Pattern.
The problem with KMP is that all queries take O(n + m) time. This can very inefficient.
I am wondering something..
KMP or Boyer Moore can be used to solve this problem efficiently.
However, can anybody in this forum write code ot implement these algorithms?
What if the interviewer asks to write some code for it?
They are complicated algos and I doubt whether it will be enough to say "We should use KMP".
Please respond
public static int StringOccurance(string strArr, string str)
{
int count = 0;
int len = strArr.Length - str.Length;
for (int i = 0; i <= len; i++)
{
if (strArr.Substring(i, str.Length).Equals(str))
count += 1;
}
return count;
}
Use KMP with a bit modification. "Have a look at the following code.
Please let me know the case in which it fails.
Thanks
#include<stdio.h>
#include<string.h>
int f[100];
void fail(char *pat)
{
int i, j, l;
l = strlen(pat);
for(j=1;j<l;j++)
{
if(pat[j] = pat[i+1] && i>=0)
i = f[i];
else if (pat[j] = pat[i+1])
f[j] = i+1;
else
f[j] = -1;
}
return ;
}
int pm(char *str, char *pat)
{
int i,j,l1,l2;
l1 = strlen(str);
l2= strlen(pat);
i = j = 0;
while(i<l1 && j < l2)
{
if(str[i] == pat[j] )
{i++;j++;}
else if (j == 0)i++;
else
{
j = f[j-1]+1;
i++;
}
if(j == l2)
{
printf("match at %d\n", i-l2);
j = 0;i = i-l2+1 ;
}
}
if(j == l2)
return i-l2;
else
return -1;
}
int main()
{
char *str="CCCC";
char *pat = "CC";
int index = -1;
index = pm(str, pat);
printf("%d", index);
return 0;
}
public class KMP {
public static void main(String[] args) {
String text = "CCCC";
String pattern = "CC";
int[] failureFunction = new int[pattern.length()];
preKMP(pattern.toCharArray(), failureFunction);
kmp(text.toCharArray(), pattern.toCharArray(), failureFunction);
}
/**
*
* @param t
* @param p
* @param f
*/
private static void kmp(char[] t, char[] p, int[] f) {
int i = 0;
int j = 0;
int count = 0;
while (i < t.length) {
if (t[i] == p[j]) {
i++;
j++;
} else if (j > 0) {
j = f[j - 1];
} else {
i = i + 1;
}
//here come the modification for original algo as we want to keep searching once
// we found the first match , so we shift the pattern again by using preKMP function
if (j >= p.length) {
count++;
j = f[j - 1];
}
}
if (count > 0) System.out.println(count);
}
static void preKMP(char[] pattern, int f[]) {
int i = 1;
int j = 0;
f[0] = 0;
while (i < pattern.length) {
if (pattern[i] == pattern[j]) {
j++;
f[i] = j;
i++;
} else if (j > 0) {
j = f[j - 1];
} else {
f[i] = 0;
i++;
}
}
}
}
Knuth-Morris-Pratt algorithm.
- bogdan.cebere October 24, 2010