Amazon Interview Question
Software Engineer / DevelopersCountry: India
Interview Type: In-Person
fragment is not given. it can be any 3 consecutive words. which is there in the all files.
i guess we might need to use the longest common substring algorithm to extract the common fragments between multiple strings. doing it for multiple files might hurt the efficiency though because it has quadratic runtime..
couple of doubts though:
1. Will the fragments be of size 3 or 4 everytime? or can they be longer?
2. i dont think i understood this example 'a a a a a b c b c b c b c' very well. is this just one string in one file or does it mean that we have abt strlen("a a a a a b c b c b c b c") files each having 1 character in them? Kindly explain this input again.
ans to yr queries :
1: fragment is a 3 or more consecutive words. means it can be 4 5 6 7 .. any consecutive words which occurred in all files.
2. file name and number of file will be given at run time.
and each file have many word (lots of text written on them)...
3. In example : lets file number 5 has this input "a a a a a b c b c b c b c" here a, b, c are words (just wrote a b c for simplicity)
before coming to this file lets assume we already know that fragment "a b c" is available in all file.
and now we are trying to remove it from this file. so when we remove one fragment new fragment will be done by remaining word so we have to remove that too.
Add all the consequitive 3 word strings of the first line into a hash table. Calcluate the hash code for consequitive words of remaining lines for a match.
(hash function could be something like add the ascii chars and modulus on hash table size)
If u r reaching end of file, it means a match is found. In second pass remove the strings and look for a repeated match.
then four the last case which i wrote "a a a a a b c b c b c b c'
we have to read file four times. as four "a b c" will be removed .
i told this one but he was not expecting it.
Once you have determined the pattern to be eliminated in first pass, In the second pass, you will read the file, 1 line at a time into memory. You will recalc the hash codes for repeated patterns in this line and compare with hash code of target string and delete those patterns as well (like abc pattern in your case).
There are no multiple reads from the file, everything will happen in-memory.
Only the case where passes will increase is when you have multiple patterns matching all the lines.
Am I correct?
i guess u r right . but have to modify as fragment should be in same file.
steps should be .
1 . read first file and insert fragment in hashtable using some hashfunction
2 . now read all files at a time and maintain which fragment occured in all files.
3. now read all files incuding first one and remove the fragment occuring in all files.
while processing a line, after finding the pattern just go a step back and start from previour character.. that does it for one line in O(N)..
hey CSPrasad,
You said "all the consequitive 3 word strings", what about repeating fragments with 3+ words ?
So i think it should be "all the consequitive 3+ word strings of the first line into a hash table"
correct me if i misunderstood your solution :)
that's the crux of the problem. it say's that we have to consider 3 or more word . But as you can see if there is a 4 word fragment then it's should be a part of two 3 word fragment.
ex: if "a b c d " fragment is occurred in all the files then we will solve this as two fragment "a b c" and "b c d" .
@CSPrasad
Repeating the search from the previous character won't work.
What if there was something like
aababcc
after removing abc you are left with
aabc
and according to your scheme of starting with the previous character, we will start with b and never detect the remaining abc.
Starting with previous 2 characters might just work out.
I never said to start from the previous characters. As you never know how patterns are formed, You should start looking from the beginning of line once you delete a pattern,
When we match for fragments, Once we find a match for 3 consecutive words in any given line based on the hash value and comparison, We can proceed and try to match more words and keep track of the shortest fragment larger 3 word fragment that was identified earlier among the ones seen so far.
for instance if the lines are like:
1: a b c d e f g
2: a b c d k p f
3: a b c d e f l
In the above strings smallest fragment larger than 3 word fragment is a b c d.
Will this change support fragments of 3+ words ?
Can we create a suffix tree of all three strings into one to find out the common fragment.
Then we can remove that common fragment from all three strings ?
Will it work?
to remove the fragments we need to first find the fragment which could be done by first going through the array of string and then in the second pass remove that fragment from every string.
pseudo code for first pass using a trie T and a global string F, which stores the fragment(assuming only one in all string in input array of string A).
step-1 make trie T using the first string in the given array of strings A.
step-2 initialize Global string of common words F by finding the fragment in the second string s2 in the array of strings A.The fragments could be found out by lookup in the trie which would be faster.
F = find_fragment(T, s2)
The sample code be something as follows:
string
found_fragment(T, S)
{
int i, prevLen=0;
string prev=null, f;
for each word w in S
{
if(lookup(T, w)) //retruns true if found in Trie
{
if(prev=null)
start=i++;
}
else
{ //found a fragment
end=i++;
prev = null;
len = end -start;
if(len > 3) //chk that the fragement is a proper fragment
{
f = S.substring(start, len); //assuming only one fragment in a given string
break;
}
}
}
return f;
}
From the given ex it should assign F = "It is raining and I want to"
step-3 Now foreach string in A(from 3rd onwards) find fragments f' as in step 2
and check against the global string F so that F could be subset(in the sense that the smaller string would be in the larger string with order maintained) of f' or if f' is subset of F then do f' = F to maintain min len of fragment.
from the ex- f' = "I want to" but since len of f' is smaller than F so we assign
F=f' ie now F="I want to"
in this way at the end of step 3 we have the common string in F.
This method would have low memory footprint since we are creating a trie only using the first string and would be faster as we find the fragments in each string using a lookup in trie.
Use a Suffix Tree:
http: //en.wikipedia.org/wiki/Suffix_tree
There is a problem called "Longest common substring" which runs in O(|n| + |m|) (n = size of first string, m = size of 2nd). This is a special case of that problem = you have to specialize it so that length is at least 3 words and you have to run it for 3 strings
/******************************************************************************
Welcome to GDB Online.
GDB online is an online compiler and debugger tool for C, C++, Python, Java, PHP, Ruby, Perl,
C#, VB, Swift, Pascal, Fortran, Haskell, Objective-C, Assembly, HTML, CSS, JS, SQLite, Prolog.
Code, Compile, Run and Debug online from anywhere in world.
*******************************************************************************/
public class Main
{
public static String substring(String input , int start,int end)
{
StringBuilder sb = new StringBuilder();
for(int i=start;i<end;i++)
{
sb.append(input.charAt(i));
}
return sb.toString();
}
public static String [] split(String s)
{
int c=0;
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)==' ')
c++;
}
// System.out.println("Count : "+c);
String result[] = new String[c+1];
int j=0;
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)==' ')
{
result[j++] = substring(s,0,i);
s = substring(s,i+1,s.length());
// result[j++] = s.substring(0,i);
// s = s.substring(i+1,s.length());
i=0;
}
}
result[j++] = s;
// for(String w : result)
// {
// System.out.println(w);
// }
return result;
}
public static void main(String[] args) {
String s1 = "Every morning I want to do exercise regularly";
String s2 = "Every morning I want to do meditation without fail";
String s3 = "It is important that I want to be happy always";
String[] words1=split(s1);
String[] words2=split(s2);
String[] words3=split(s3);
StringBuilder sb = new StringBuilder();
for(int i=0;i<words1.length;i++){
for(int j=0;j<words2.length;j++){
for(int k=0;k<words3.length;k++){
if(words1[i].equals(words2[j]) && words2[j].equals(words3[k])
&&words3[k].equals(words1[i])){
//Concatenating the returned Strings
sb.append(words1[i]+" ");
}
}
}
}
System.out.println(sb.toString());
System.out.println(s1.replaceAll(sb.toString(), ""));
System.out.println(s2.replaceAll(sb.toString(), ""));
System.out.println(s3.replaceAll(sb.toString(), ""));
}
}
/******************************************************************************
Welcome to GDB Online.
GDB online is an online compiler and debugger tool for C, C++, Python, Java, PHP, Ruby, Perl,
C#, VB, Swift, Pascal, Fortran, Haskell, Objective-C, Assembly, HTML, CSS, JS, SQLite, Prolog.
Code, Compile, Run and Debug online from anywhere in world.
*******************************************************************************/
public class Main
{
public static String substring(String input , int start,int end)
{
StringBuilder sb = new StringBuilder();
for(int i=start;i<end;i++)
{
sb.append(input.charAt(i));
}
return sb.toString();
}
public static String [] split(String s)
{
int c=0;
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)==' ')
c++;
}
// System.out.println("Count : "+c);
String result[] = new String[c+1];
int j=0;
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)==' ')
{
result[j++] = substring(s,0,i);
s = substring(s,i+1,s.length());
// result[j++] = s.substring(0,i);
// s = s.substring(i+1,s.length());
i=0;
}
}
result[j++] = s;
// for(String w : result)
// {
// System.out.println(w);
// }
return result;
}
public static void main(String[] args) {
String s1 = "Every morning I want to do exercise regularly";
String s2 = "Every morning I want to do meditation without fail";
String s3 = "It is important that I want to be happy always";
String[] words1=split(s1);
String[] words2=split(s2);
String[] words3=split(s3);
StringBuilder sb = new StringBuilder();
for(int i=0;i<words1.length;i++){
for(int j=0;j<words2.length;j++){
for(int k=0;k<words3.length;k++){
if(words1[i].equals(words2[j]) && words2[j].equals(words3[k])
&&words3[k].equals(words1[i])){
//Concatenating the returned Strings
sb.append(words1[i]+" ");
}
}
}
}
System.out.println(sb.toString());
System.out.println(s1.replaceAll(sb.toString(), ""));
System.out.println(s2.replaceAll(sb.toString(), ""));
System.out.println(s3.replaceAll(sb.toString(), ""));
}
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine(RemoveStringFragment("aaaaabcbcbcbc", "abc"));
Console.ReadLine();
}
public static string RemoveStringFragment(string actual, string fragment)
{
string temp = actual;
while(true){
temp=temp.Replace(fragment, "");
if (!temp.Contains(fragment))
return temp;
}
}
}
I think this is one function can be used..
# include <conio.h>
# include <stdio.h>
#include <string.h>
int main ()
{
char s1[]="sfufuckckanfuckjayfufuckck kumarffuckuck fuck", s2[]="fuck";
char *s3; int i,len;
s3=strstr(s1,s2);
while (1)
{
if(s3==NULL)
{
break;
}
strncpy(s3,s3+strlen(s2),strlen(s3+strlen(s2)));
len=strlen(s1)-strlen(s2);
for(i=strlen(s1);i>=len;i--)
s1[i]='\0';
s3=strstr(s1,s2);
}
printf("%s",s1);
getch();
return 0;
}
Fine this is a program which replace all the fragment from a string.
so this can be used as function also which accept the original string
and fragment string and return the string without the fragment .
Rest you can find if you execute the code.
Is the fragment to be removed given or is to be programmatically identified?
- topgun.in January 02, 2012Also, the question doesn't say anything about nested fragments because only one of it would be consecutive in the original string as per the description?