Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
changed a little bit to make it more concise:
public static boolean find(String s){
if(s.length() < 2) return false;
for(int i = 1; i <= s.length() / 2; i++){
int count = 0;
for(int ptr1 = 0, ptr2 = i; ptr2 < s.length(); ptr1++, ptr2++){
if(s.charAt(ptr1) == s.charAt(ptr2)) count++;
else count = 0;
if(count == i) return true;
}
}
return false;
}
Here aren't you trying to find all substrings starting from 0th character and then matching for consecutive substring.
But if you have say, abcbcde , you'll check for cases
1. a&b
2.ab&cb
3.abc&bcd
Not at all consider bc and bc.
So there should be one more loop to check for the substrings starting from other than 0th character. Also, in that case, you should also consider the case for the string abcbdeefef- where ef and ef appear in bottom half of string.
The way I see this is, it is a DP problem. But for each character you have to build a matrix, which checks for strings starting from it to it's adjacent ones. It is not a case where you need to fill in every entry of DPmatrix, however you need to do somewhat equivalent work. 0(n^3) in worst case.
public boolean isRepeatedSubstring(String s){
int i,j,k,m;
for(i=1;i<s.length();i++){
int count=0;
for(j=0,k=i;j<s.length()-i;j++,k++){
if(s.charAt(k)==s.charAt(j)){
count++;
}
else{
count=0;
}
if(count>=2){
//check if the a substring of length=count from k exist
if (k-j==count)
return true;
}
}
}
return false;
}
If trying to find consecutive duplicate string, it means there must be duplicate character. So iterate each character one by one till string matched.
if found any character matched then fetched the string from current character till matched character index. and fetch second string from matched character index till difference between current index and matched index.
If both string are equal, then return true.
public static boolean isConsecutiveString(String source) {
for(int index = 0 ; index < source.length() ; index++) {
char tempChar = source.charAt(index);
int indexOf = source.indexOf(tempChar, index+1);
if(indexOf != -1) {
int distance = indexOf - index;
if((indexOf+distance) > source.length())
continue;
String firstString = source.substring(index, indexOf);
String secondString = source.substring(indexOf, indexOf+distance);
if(firstString.equals(secondString))
return true;
}else {
continue;
}
}
return false;
}
Aren't you only searching for strings of a single character in length? The problem asks to locate consecutive substrings of any length.
Sure, I see that it searches the whole string for the *start* of a pattern, but that pattern apparently can only be one character *in length*. Have you run your function on the example above: "adabcabcd"?
Good Logic . Liked the bit about the starting character. What is the complexity ? n == stringlength ?
Let there be two suffixes A and B of the given string.
If ( length of longest common prefix of A and B == distance between starting point of A and B in the original string )
=> Consecutive substrings found
You can count matching characters in shifted strings.
public static boolean find2(String str)
{
byte minWordLen = 2;
for (byte j = 1; j < str.length() - minWordLen; ++j)
{
byte count = 0;
for (byte i = 0; i < str.length() - j; ++i)
{
boolean same = (str.charAt(i) ^ str.charAt(i + j)) == 0;
if (!same)
count = 0;
else
if (count++ == minWordLen)
return true;
}
}
return false;
}
char str[20]="abcababceccced";
int len = strlen(str);
int i, j, counter;
for(i = 1; i <= len / 2; ++i)
{
for(j = i, counter = 0; j < len; ++j)
{
if (str[j] == str[j - i])
counter++;
else
counter = 0;
if (counter == i)
{
return true;
}
}
}
The question is ambiguous.
Do you mean repeat substring? Like athethelete : the is repeated.
Or consecutive substring? like youabcde. abcde: characters are consecutive?
Or youbadec? badec: characters are still consecutive when sorted.
Explain.
Assumption: 'ab' is consecutive but not 'ba'
algo:
int previous_char = character[0];
for i=1 to n-1
{
int current = character[i];
if(current = previous_char + 1)
return true;
previous_char = current;
}
return false;
we can use a suffix array to find repeat pattern:
e.g. abcabc# (# only for end of string)
suffices will be
c#
bc#
abc#
cabc#
bcabc#
abcabc#
first we have c#, then we see how long are the common string between bc# and c# when both strings are aligned to their left sides; similarly, how long are the common string between abc# and any of bc# or c#; as long as we have found a common string longer than 1 char, then we consider the input string is a valid string.
In this case:
bcabc# will have a common string 'bc' with bc# (always align strings to their left sides when in comparison), so the input string is valid.
Finally, using tries will make comparison very fast, below the solution codes:
#include<iostream>
#include<vector>
#include<string.h>
using std::vector;
using std::cout;
struct suftree{
char key;
vector<struct suftree> child;
};
typedef struct suftree Stree;
typedef std::vector<Stree>::iterator iterator;
int findMem(Stree& root, char *s)
{
if (!s[0]) return 0;
int nmatch=0;
bool found=false;
for (iterator it=root.child.begin();
it!=root.child.end()&&!found; it++)
if ((*it).key==s[0])
{
nmatch=1+ findMem ((*it), s+1);
found=true;
}
if (!found)
{
Stree temp;
temp.key=s[0];
root.child.push_back(temp);
int n=root.child.size();
findMem(root.child[n-1], s+1);
}
return nmatch;
}
int main()
{
Stree root;
root.key=0;
char s[10000];
char *t_s=NULL;
std::cin>>s;
int slen=strlen(s);
for (t_s = s+ slen-1; t_s>=s; t_s--)
if (findMem(root, t_s)>=2)
{cout<<"The string has repeated continous substring.\n";
return 0;}
return 0;
}
BTW, we even impose some extra requirement like the substring are not shorter than M chars, here you only have to change 2 to M.
the question is looking for consecutive repeated substring, in your case, "bc" has repeated, but not consecutive
A slight modification of main() can also consider consecutive substring:
Once we found any substring with M chars, we just check if M chars immediately after the substring are the same to the substring. Here is the modified main()
int main()
{
Stree root;
root.key=0;
char s[10000];
char *t_s=NULL;
int i;
std::cin>>s;
int slen=strlen(s);
char *s_end=s+slen;
for (i=slen,t_s = s_end-1; t_s>=s; t_s--,i--)
{
int M=findMem(root, t_s);
if (M<2) continue;
bool valid=true;
for (int j=0;j<M;j++)
{
if (t_s+M+j>=s_end) {valid=false;break;}
if (*(t_s+j)!=*(t_s+M+j)) {valid=false;break;}
}
if (valid)
{
t_s[M]=0;
cout<<"The string has consecutive repeated substring: "<<t_s<<"\n";
return 0;}
}
return 0;
}
public static boolean consecutiveSubString(String input, String searchKey)
{
boolean res = false;
for( int i = 0, j = 0, count = 0; i < input.length(); i++, j = 0, count = 0 )
{
while( j < searchKey.length() && i < input.length() && input.charAt( i ) == searchKey.charAt( j ) )
{
j++;
i++;
if( j == searchKey.length() && count == 0 )
{
j = 0;
count++;
}
}
if( j == searchKey.length() )
{
res = true;
break;
}
}
return res;
}
If I interpreted the question right:
abab = true
abcab = false
aa = false
baba = false
aaabcabcaaa = true
etc
bool has_repeated_consecutive_substr(string str) {
int last_length = -1, curr_length = 0;
char curr_start_char, last_start_char;
for (int i = 1; i < str.length(); i++) {
if (str[i]-str[i-1] == 1) {
if (curr_length == 0) curr_start_char = str[i-1];
curr_length++;
}
else {
if (curr_length == last_length &&
curr_start_char == last_start_char &&
curr_length > 1) return true;
last_length = curr_length;
last_start_char = curr_start_char;
curr_length = 0;
}
}
return false;
}
public static String getLongestCRS(String test){
if(test==""){
return "";
}
if(test.length()==1){
return "";
}
String max="";
for(int i=0;i!=test.length();i++){
String currentMax="";
for(int j=test.length()-1;j!=i;j--){
if(test.charAt(i)==test.charAt(j)){
if(test.substring(j, test.length()).indexOf(test.substring(i,j))==0){
currentMax=test.substring(i, j);
break;
}
}
}
if(currentMax.length()>max.length()){
max=currentMax;
}
}
return max;
}
Hey everyone, I think this problem needs a dynamic programming approach.
This is what I coded up in Java,
public boolean isConsecutiveSubStringPresent(String inputStr,String checkStr) {
int checkStrLen = 0;
int inputStrLen = 0;
if (inputStr == null) {
System.out.println("Input String cannot be null, please supply a valid value");
return false;
}
// Checking if the checkStr is null, then we assign the first character
// in that String
if (checkStr == null) {
checkStr = inputStr.substring(0, 1);
inputStr = inputStr.substring(1, inputStr.length());
}
// At this point we know that both of the strings are not null
checkStrLen = checkStr.length();
inputStrLen = inputStr.length();
// If the length of the inputString is less than the length of
// checkString then we return false
if (inputStrLen < checkStrLen) {
return false;
}
// It will return 'true' only if the checkStr is equal to the same
// substring in the inputStr.
if (checkStr.equals(inputStr.substring(0, checkStrLen))) {
return true;
} else {
// We need to divide the inputString
return isConsecutiveSubStringPresent(inputStr.substring(1, inputStrLen),checkStr + inputStr.substring(0, 1))
|| isConsecutiveSubStringPresent(inputStr.substring(1, inputStrLen),inputStr.substring(0, 1))
|| isConsecutiveSubStringPresent(inputStr, null);
}
}
I hope this helps...
Jeez, the indentation looked weird I'll try again here.
public boolean isConsecutiveSubStringPresent(String inputStr,String checkStr) {
int checkStrLen = 0;
int inputStrLen = 0;
if (inputStr == null) {
System.out.println("Input String cannot be null, please supply a valid value");
return false;
}
if (checkStr == null) {
checkStr = inputStr.substring(0, 1);
inputStr = inputStr.substring(1, inputStr.length());
}
checkStrLen = checkStr.length();
inputStrLen = inputStr.length();
if (inputStrLen < checkStrLen) {
return false;
}
if (checkStr.equals(inputStr.substring(0, checkStrLen))) {
return true;
} else {
return isConsecutiveSubStringPresent(inputStr.substring(1, inputStrLen),checkStr + inputStr.substring(0, 1))
|| isConsecutiveSubStringPresent(inputStr.substring(1,inputStrLen),inputStr.substring(0, 1))
|| isConsecutiveSubStringPresent(inputStr, null);
}
}
A usual call to this method can be
isConsecutiveSubStringPresent("adabcabcd", null)
Of course we can have a wrapper around this method which takes in only the input string and we make a call to the recursive method with null.
#include <stdio.h>
#include <string.h>
int removeDuplicate(char *str, int len, int size)
{
char str1[len],str2[len];
int i=0,j=0,k=0;
for(i=0; i<len-size; i++){
strcpy(str1, str+i);
strcpy(str2, str+size+i);
str1[size]= str2[size] = '\0';
if(strcmp(str1, str2) == 0)
return 1;
str1[0] = str2[0] = '\0';
}
return 0;
}
int main()
{
char str[]="abcdabcd";
//char str[]="aabbabc";
int i;
int len = strlen(str)/2;
for(i=0; i<len; i++)
if(removeDuplicate(str, strlen(str)-1, i+1))
break;
if(i<len)
printf("found\n");
else
printf("not found");
return 0;
}
Maybe not the most efficient solution, but seems to work. I am counting single character strings, so "aa" for example or any repeated character would return True.
The idea is to walk through each character of the string. In the second for loop, divide every substring in half and see if the two halves are the same.
For example, for the string "cbaba", once we get to the last character 'a', the second for loop looks at substrings:
"cbaba"
"baba"
"aba"
"ba"
"b"
If you divide "baba" in half you get "ba" and "ba" which are the same and therefore the function returns true. It's a brute force n-squared algo, so not the best..
def main():
inputString = list(raw_input('Input string: '))
subString = checkRepeatedSubstring(inputString)
if subString != None:
print "Found substring: " + "". join(subString)
else:
print "No substring found"
def checkRepeatedSubstring(inputString):
for i in range(len(inputString)):
for j in range(i+1):
l = (i+1)-j # length of substring
m = j+l/2 # middle of substring offset
if inputString[j:m] == inputString[m:(i+1)]:
return inputString[j:(i+1)]
return None
Good job. A brute force algorithm that actually works is certainly better than the submissions above here that don't even work.
But, I think a brute force algorithm for this problem is actually O(N cubed) == O(N^3), including yours, since there are three nested loops iterating over the length of the input, in the worst-case: one loop for start position, one loop for end position, and one loop to check the two sub-strings for equality.
its simple..
#include<stdio.h>
#include<string.h>
#define MAX 100
int consecutive(char *);
void main()
{
char str[MAX];
int x;
scanf("%s", str);
x=consecutive(str);
if(x==1)
printf("\nconsecutive");
else
printf("Non-consecutive");
}
int consecutive(char *s)
{
char *p, *q;
int len=strlen(s);
for(int i=0; i<len-1; i++)
{
p=q=s; q++;
for(int j=1; j<len; j++)
{
if(*p!=*q)
q++;
if(*p==*q)
{
char *s=q; int i=0;
while(*(p+i)==*(s+i))
{
i++;
if((p+i)==q)
return 1;
}
q++;
}
} s++;
}
return 0;
}
// main function
bool mainfunc(char* s)
{
return f(s, 0);
}
// recursive function
bool f(char* s, int cur)
{
int target = (strlen(s)+cur)/2;
for(int i=cur+1; i<target; ++i)
{
if(match(s, cur, i))
return true;
}
if(cur<strlen(s)-1 && f(s, cur+1))
return true;
return false;
}
// helper function to match substring from cur to tar (exclusive)
// with that from tar to tar + (tar-cur)
bool match(char* s, int cur, int tar)
{
for(int i=0;i<tar-cur;++i)
{
if(s[cur+i]!=s[tar+i])
return false;
}
return true;
}
Why does the comment say that function f is recursive? It doesn't seem to be calling itself, either directly or indirectly.
And why does the loop inside function f only go half way through the input string?
It looks almost like the brute force approach, except that it has one and half nested loops instead of three. :-) So I guess that makes it half of a brute force approach.
// this is the C# version to solve the problem
public static bool isSubString(string input, string sub)
{
string firstSubstringChar = sub[0].ToString();
for (int i = 0; i < input.Length; i++)
{
if (firstSubstringChar == input[i].ToString())
{
if(sub == input.Substring(i, sub.Length))
return true;
}
}
return false;
}
if(sub == input.Substring(i, sub.Length)) will throw the error when sub.length gives index out of range for input string.
I am giving my version which will check the repetition of all the strings starting from length-1. If user wants to check for substrings of size 2 or greater then he/she need to change the initial value of int "j" in below code.
public static bool isRepeating(string input)
{
string substring;
for (int j = 1; j <= input.Length / 2; j++)
{
for (int i = 0; i < input.Length - j-1; i++)
{
substring = input.Substring(i, j);
if(input.IndexOf(substring) != input.LastIndexOf(substring))
{
return true;
}
}
}
return false;
}
// C#
// use hashtable to get the occurrences of chars.
// only search repeating strings using the chars having >1 occurrences.
bool FindRepeatingString(string s)
{
if(string.IsEmptyOrNull(s) || s.length < 2)
return false;
HashTable<char, List<int>> ocTable = new HashTable<char, List<int>>();
// construct occurrence hashtable.
for(int i=0;i<s.length;++i)
{
if(!ocTable.ContainKey(s[i]))
{
List<int> ocList = new List<int>();
ocList.Add(i);
ocTable.Add(s[i],ocList);
}
else
{
ocTable[s[i]].Add(i);
}
}
// for each char having >1 occurrences, check whether they are the starting points of repeating strings.
foreach(char c in ocTable.Keys)
{
if(ocTable[c].Count>1)
{
if(FindRecur(s, c, ocTable[c],0) == true)
return true;
}
}
return false;
}
// recursion function to check whether the occurrences in ocList point to the repeating strings.
bool FindRecur(string s, char c, List<int> ocList, int cur)
{
if(cur>ocList.Count-2)
return false;
for(int i=cur+1; i<ocList.Count; ++i)
{
bool match = true;
int st = ocList[cur];
int end = ocList[i];
int n = end-st;
for(int j=0; j<n; ++j)
{
if(s[st+j] != s[end+j])
{
match = false;
break;
}
}
if(match)
return true;
}
return FindRur(s, c, ocList, cur+1);
}
#include "stdafx.h"
void rep(char *str)
{
int i,j,k,count,len;
int match=0;
i=0;
len=strlen(str);
//printf("%d",len);
while(i<len)
{
count=0;
j=i+1;
while(str[j]!=str[i] && j<len)
{
count++;
j++;
}
count++;
while(count>0 && j<len)
{
k=i;
if(str[k++]==str[j++])
count--;
}
if(count==0)
{
match++;
i=j;
}
else
i++;
k=0;
}
printf("total matches = %d",match);
}
int main()
{
char str[]="adabcabc";
rep(str);
getchar();
}
static boolean consecutiveSubstring(String string) {
boolean result = false;
for (int i = 2; i <= string.length() / 2; i++) { // the range of
// substring's
// length is from 2
// to n/2;
for (int j = 0; j <= string.length() - i * 2; j++) {
if (string.substring(j, j + i).equals(
string.substring(j + i, j + i * 2))) {
/*There maybe more satisfied substrings, so we can store them in a list.*/
// subs.add(string.substring(j, j+i));
result = true;
}
}
}
return result;
}
This is a good problem for dynamic programming, which can reduce a brute-force algorithm of O(N^3) down one notch to an algorithm of O(N^2). Look up the Levenshtein algorithm as a related dynamic programming algorithm.
Explanation: We build an N+1 by N matrix and fill in the upper half in this way: the value of an element m[i][j] is the number of characters ending at position i that match the same number of characters ending at position j. In other words, m[i][j] = 1 + m[i-1][j-1] if the ith character matches the jth character, it is 0 otherwise. We only need to fill the upper half of the matrix because of symmetry: the rows and columns both iterate over the same input string. We make an extra row at the top to initialize with zeros which helps us as we start so we have a zero to add our first count onto when iterating over the top row of the matrix.
To know when a match is consecutive, we just compare the count at m[i][j] with it's position in the matrix. The count must be equal to the distance from the diagonal. For example, a single repeating character "a" in the string "baac" is one character long and one character from the diagonal. Hence the check for m[i][j] == j-i+1.
bool findConsecutiveReps(char[] s)
{
int[][] m = new int[s.length + 1][s.length];
// Initialize first row of table
// (This row does not correspond to a character in the string.
for (int j = 0; j < len; j++)
{
m[0][j] = 0;
}
// Row = candidate start position of first substring plus 1.
for (int i = 1; i <= s.length; i++)
{
// Col = candidate start position of second substring.
// Note that when i == j, j is actually the next character after i
// because of the extra row.
for (int j = i; j < s.length; j++)
{
if (s[i-1] == s[j])
{
m[i][j] = 1 + m[i-1][j-1];
if (m[i][j] == j - i + 1)
{
return true;
}
}
else
{
m[i][j] = 0;
}
}
}
return false;
}
main()
{
char s[20];
int i,j,k1,k,count=0,count1=0,c=1;
clrscr();
printf("\n\n\n");
printf("enter the string");
gets(s);
for(i=0;s[i];i++)
{
k=i;
for(j=i+1;s[j];j++)
{
k1=j;
if(s[i]!=s[k1])
{
count++;
}
else
{
count++;
count1=count;
k=i;
while(count1>0)
{
k++;
k1++;
if(s[k]==s[k1])
{
count1--;
if(count1==1 && c==1)
{
printf("true");
c=2;
break;
}
}
else
break;
}
}
}
count=0;
}
if(c==1)
printf("no string");
getch();
return 0;
}
main()
{
char s[20];
int i,j,k1,k,count=0,count1=0,c=1;
clrscr();
printf("\n\n\n");
printf("enter the string");
gets(s);
for(i=0;s[i];i++)
{
k=i;
for(j=i+1;s[j];j++)
{
k1=j;
if(s[i]!=s[k1])
{
count++;
}
else
{
count++;
count1=count;
k=i;
while(count1>0)
{
k++;
k1++;
if(s[k]==s[k1])
{
count1--;
if(count1==1 && c==1)
{
printf("true");
c=2;
break;
}
}
else
break;
}
}
}
count=0;
}
if(c==1)
printf("no string");
getch();
return 0;
}
This Java code works for me. Let me know if you spot an error or have a comment. The program spots any consecutive substrings. Starts with 0th character and checks all the consecutive substrings possible starting here. Then increment one character and repeat the process.
import java.util.Scanner;
public class StringCompare {
public static void main(String[] args){
Scanner myinput = new Scanner( System.in );
String inputstring;
System.out.print("Enter your string(no spaces and no capital letters): ");
inputstring = myinput.next();
for (int i=0;i<inputstring.length()-1;i++)
{
for(int s=1;s<=(inputstring.length()-i)/2;s++)
{
if(inputstring.substring(i,i+s).equals(inputstring.substring(i+s,i+s+s)))
{
System.out.println("Match Found: "+inputstring.substring(i,i+s)+" "+inputstring.substring(i+s,i+s+s));
}
}
}
System.out.print("Finished :-)");
}
}
This seems like brute force search. However, there are only two loops. It seems to me that complexity will be O(N^2), assuming we do constant O(1) work for each if statements. Can someone explain what I am doing wrong here?
I think the if statement is O(N) because in the worst case you have to iterate over N characters to compare each one.
My algorithm:
1) build a map of array indexes for each occurred character.
2) if there exists a array of indexes in map for current character, then check if any substring from indexes in array to current character is equal to substring of same length from current character.
3) if any found equal, return true.
4) return true if all the characters are exhausted.
Following is code for stated algorithm.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
class Interview
{
private HashMap<Character, ArrayList<Integer>> hs;
public Interview()
{
this.hs = new HashMap<Character, ArrayList<Integer>>();
}
//required function
public boolean cons_substring(String s)
{
if(s == null || s.length() < 2)
{
return false;
}
Integer[] check;
char curr;
String temp;
for(int index=0; index < s.length();index++)
{
curr = s.charAt(index);
check = check_prev(curr, index);
for(int j= 0; j<check.length; j++) // for all previous indexes
{
int i = check[j];
if(index-i <= s.length()-index) // enough characters to left after current index
{
temp = s.substring(i, index);
if(temp.equals(s.substring(index, index+(index-i))))
{
return true;
}
}
}
}
return false;
}
/*
* return indexes of previous occuring character "curr"
* */
private Integer[] check_prev(char curr, int index) {
Object[] temp_arr;
Integer[] to_return = new Integer[]{};
ArrayList<Integer> temp;
if(hs.containsKey(curr))
{
temp = hs.get(curr);
temp_arr = temp.toArray();
to_return = Arrays.copyOf(temp_arr, temp_arr.length, Integer[].class);
temp.add(index);
hs.put(curr, temp);
}
else
{
temp = new ArrayList<Integer>();
temp.add(index);
hs.put(curr, temp);
}
return to_return;
}
}
See this Java Code
public class SubString{
public static void main(String args[]){
if(sub("abc","adabcabcd"))
System.out.println("True");
else
System.out.println("False");
}
public static boolean sub(String sub,String full){
int j=0;
boolean flag=false;
for(int i=0;i<full.length();){
flag=false;
for(j=0;j<sub.length();j++){
if(full.charAt(i)==sub.charAt(j)){
i++;
flag=true;
}
else{
if(!flag)
i++;
break;
}}
System.out.println(j);
if(j==sub.length())
return true;
}
return false;
}}
public class findConsSubString{
- coderFromGothamCity August 19, 2012public static boolean find(String s){
int ptr1;
int ptr2;
int i = 1;
int count;
if(s.length() < 2)
return false;
while(i <= s.length() / 2){
ptr1 = 0;
ptr2 = i;
count = 0;
while(ptr2 < s.length()){
if(s.charAt(ptr1) == s.charAt(ptr2))
count++;
else
count = 0;
if(count == i){
return true;
}
ptr1++;
ptr2++;
}
i++;
}
return false;
}
public static void main(String[] args){
boolean found = find(args[0]);
System.out.println("Found Cons SubStrings:" + found);
}
}