Write a program to remove frag

Amazon Interview Question for Software Engineer / Developers

0

of 0 votes

30
Answers
Write a program to remove fragment that occur in all strings,where a fragment
is 3 or more consecutive word.

example :

input::
s1 = "It is raining and I want to drive home.";
s2 = "It is raining and I want to go skiing.";
s3 = "It is hot and I want to go swimming.";

output::

s1 = "It is raining drive home.";
s2 = "It is raining go skiing.";
s3 = "It is hot go swimming.";

removed fragment = "and i want to"

i was told that this program will be tested again large files( in example above string are given as a sample)
and then we have to remove fragment from files.so efficiency will be taken into consideration.

Assumptions:Ignore capitalization ,punctuation. but preserve in output.

I told him that i will map the fragment from first string and then search the others, and we actually need to consider 3 length ones only as a 4 length one will implicitly contain two 3 word phrases and the phrases can overlap.

i was stuck when he gave me this kind of example
a a a a a b c b c b c b c

if we are removing fragment "a b c" ,then after removing middle one we got another "a b c" fragment and so on...
- Rahul January 02, 2012 in India | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Algorithm Application / UI Design

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

Is the fragment to be removed given or is to be programmatically identified?

Also, the question doesn't say anything about nested fragments because only one of it would be consecutive in the original string as per the description?

- topgun.in January 02, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

fragment is not given. it can be any 3 consecutive words. which is there in the all files.

- Rahul January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

i guess we might need to use the longest common substring algorithm to extract the common fragments between multiple strings. doing it for multiple files might hurt the efficiency though because it has quadratic runtime..

couple of doubts though:
1. Will the fragments be of size 3 or 4 everytime? or can they be longer?

2. i dont think i understood this example 'a a a a a b c b c b c b c' very well. is this just one string in one file or does it mean that we have abt strlen("a a a a a b c b c b c b c") files each having 1 character in them? Kindly explain this input again.

- Jester January 03, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

ans to yr queries :

1: fragment is a 3 or more consecutive words. means it can be 4 5 6 7 .. any consecutive words which occurred in all files.

2. file name and number of file will be given at run time.
and each file have many word (lots of text written on them)...

3. In example : lets file number 5 has this input "a a a a a b c b c b c b c" here a, b, c are words (just wrote a b c for simplicity)

before coming to this file lets assume we already know that fragment "a b c" is available in all file.

and now we are trying to remove it from this file. so when we remove one fragment new fragment will be done by remaining word so we have to remove that too.

- Rahul January 03, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

"a a a a a b c b c b c b c" - this issue possible to solve by stack
1. put current char to the stack
2. if stack size > 2 and if [top - 2, top - 1, top] forms valid sequence, just remove them from the stack
3. repeate from 1

- V.Krestyannikov January 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Add all the consequitive 3 word strings of the first line into a hash table. Calcluate the hash code for consequitive words of remaining lines for a match.
(hash function could be something like add the ascii chars and modulus on hash table size)

If u r reaching end of file, it means a match is found. In second pass remove the strings and look for a repeated match.

- CSPrasad January 02, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

then four the last case which i wrote "a a a a a b c b c b c b c'

we have to read file four times. as four "a b c" will be removed .

i told this one but he was not expecting it.

- Rahul January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Once you have determined the pattern to be eliminated in first pass, In the second pass, you will read the file, 1 line at a time into memory. You will recalc the hash codes for repeated patterns in this line and compare with hash code of target string and delete those patterns as well (like abc pattern in your case).

There are no multiple reads from the file, everything will happen in-memory.

Only the case where passes will increase is when you have multiple patterns matching all the lines.

Am I correct?

- CSPrasad January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

i guess u r right . but have to modify as fragment should be in same file.

steps should be .
1 . read first file and insert fragment in hashtable using some hashfunction
2 . now read all files at a time and maintain which fragment occured in all files.
3. now read all files incuding first one and remove the fragment occuring in all files.

- Rahul January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

while processing a line, after finding the pattern just go a step back and start from previour character.. that does it for one line in O(N)..

- raj January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

hey CSPrasad,

You said "all the consequitive 3 word strings", what about repeating fragments with 3+ words ?

So i think it should be "all the consequitive 3+ word strings of the first line into a hash table"

correct me if i misunderstood your solution :)

- wakeup January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

that's the crux of the problem. it say's that we have to consider 3 or more word . But as you can see if there is a 4 word fragment then it's should be a part of two 3 word fragment.

ex: if "a b c d " fragment is occurred in all the files then we will solve this as two fragment "a b c" and "b c d" .

- Dad January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@CSPrasad
Repeating the search from the previous character won't work.
What if there was something like

aababcc

after removing abc you are left with

aabc

and according to your scheme of starting with the previous character, we will start with b and never detect the remaining abc.

Starting with previous 2 characters might just work out.

- Sushant January 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I never said to start from the previous characters. As you never know how patterns are formed, You should start looking from the beginning of line once you delete a pattern,

- CSPrasad January 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

When we match for fragments, Once we find a match for 3 consecutive words in any given line based on the hash value and comparison, We can proceed and try to match more words and keep track of the shortest fragment larger 3 word fragment that was identified earlier among the ones seen so far.
for instance if the lines are like:
1: a b c d e f g
2: a b c d k p f
3: a b c d e f l
In the above strings smallest fragment larger than 3 word fragment is a b c d.
Will this change support fragments of 3+ words ?

- Harish January 07, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

What is the expected output of your last sequence

- Anonymous January 02, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

answer would be 'a' as it will remove the other four "a b c"

- Rahul January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Can we create a suffix tree of all three strings into one to find out the common fragment.
Then we can remove that common fragment from all three strings ?

Will it work?

- Anonymous January 02, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I think we have to build the n tries if n strings are given and each trie contains the words contain by the respective strings and take any one string and start searching in the other tries i think it will work but the spce complexity will be very high.,

- geeks January 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

to remove the fragments we need to first find the fragment which could be done by first going through the array of string and then in the second pass remove that fragment from every string.

pseudo code for first pass using a trie T and a global string F, which stores the fragment(assuming only one in all string in input array of string A).

step-1 make trie T using the first string in the given array of strings A.

step-2 initialize Global string of common words F by finding the fragment in the second string s2 in the array of strings A.The fragments could be found out by lookup in the trie which would be faster.
F = find_fragment(T, s2)
The sample code be something as follows:

string
found_fragment(T, S)
{
   int i, prevLen=0;
   string prev=null, f;
   
   for each word w in S
   {   
      if(lookup(T, w)) //retruns true if found in Trie
      {
         if(prev=null)
         start=i++;
      }
    
      else
      {  //found a fragment
         end=i++;
         prev = null;
         len = end -start;   
         if(len > 3) //chk that the fragement is a proper fragment
         {
            f = S.substring(start, len); //assuming only one fragment in a given string
            break;
         }
      }  
   }
   
   return f;
}

From the given ex it should assign F = "It is raining and I want to"

step-3 Now foreach string in A(from 3rd onwards) find fragments f' as in step 2
and check against the global string F so that F could be subset(in the sense that the smaller string would be in the larger string with order maintained) of f' or if f' is subset of F then do f' = F to maintain min len of fragment.
from the ex- f' = "I want to" but since len of f' is smaller than F so we assign
F=f' ie now F="I want to"

in this way at the end of step 3 we have the common string in F.
This method would have low memory footprint since we are creating a trie only using the first string and would be faster as we find the fragments in each string using a lookup in trie.

- raja roy January 09, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Use a Suffix Tree:

http: //en.wikipedia.org/wiki/Suffix_tree

There is a problem called "Longest common substring" which runs in O(|n| + |m|) (n = size of first string, m = size of 2nd). This is a special case of that problem = you have to specialize it so that length is at least 3 words and you have to run it for 3 strings

- PixelPerfect3 January 13, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Generalised suffix tree:

en.wikipedia.org/wiki/Generalised_suffix_tree

- PixelPerfect3 January 13, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

/******************************************************************************

Welcome to GDB Online.
GDB online is an online compiler and debugger tool for C, C++, Python, Java, PHP, Ruby, Perl,
C#, VB, Swift, Pascal, Fortran, Haskell, Objective-C, Assembly, HTML, CSS, JS, SQLite, Prolog.
Code, Compile, Run and Debug online from anywhere in world.

*******************************************************************************/

public class Main
{
public static String substring(String input , int start,int end)
{

StringBuilder sb = new StringBuilder();
for(int i=start;i<end;i++)
{
sb.append(input.charAt(i));

}

return sb.toString();

}

public static String [] split(String s)
{
int c=0;

for(int i=0;i<s.length();i++)
{
if(s.charAt(i)==' ')
c++;

}

// System.out.println("Count : "+c);

String result[] = new String[c+1];

int j=0;
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)==' ')
{
result[j++] = substring(s,0,i);
s = substring(s,i+1,s.length());

// result[j++] = s.substring(0,i);
// s = s.substring(i+1,s.length());
i=0;
}

}
result[j++] = s;

// for(String w : result)
// {
// System.out.println(w);
// }

return result;
}
public static void main(String[] args) {
String s1 = "Every morning I want to do exercise regularly";
String s2 = "Every morning I want to do meditation without fail";
String s3 = "It is important that I want to be happy always";

String[] words1=split(s1);
String[] words2=split(s2);
String[] words3=split(s3);
StringBuilder sb = new StringBuilder();
for(int i=0;i<words1.length;i++){
for(int j=0;j<words2.length;j++){
for(int k=0;k<words3.length;k++){
if(words1[i].equals(words2[j]) && words2[j].equals(words3[k])
&&words3[k].equals(words1[i])){
//Concatenating the returned Strings
sb.append(words1[i]+" ");
}
}
}
}

System.out.println(sb.toString());

System.out.println(s1.replaceAll(sb.toString(), ""));
System.out.println(s2.replaceAll(sb.toString(), ""));
System.out.println(s3.replaceAll(sb.toString(), ""));

}
}

- Manojkumar K October 29, 2021 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Manojkumar K October 29, 2021 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(RemoveStringFragment("aaaaabcbcbcbc", "abc"));
            Console.ReadLine();
        }

        public static string RemoveStringFragment(string actual, string fragment)
        {
            string temp = actual;
            while(true){
                temp=temp.Replace(fragment, "");
                if (!temp.Contains(fragment))
                    return temp;
            }

        }
    }

- Another solution in C#. January 03, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

read the question carefully. fragment is not given it's any consecutive 3 or more words which is present in the all files

- Dad January 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

-2

of 2 vote

I think this is one function can be used..

# include <conio.h>
# include <stdio.h>
#include <string.h>

int main ()
{
    char s1[]="sfufuckckanfuckjayfufuckck kumarffuckuck fuck", s2[]="fuck";
    char *s3;  int i,len;
    
    s3=strstr(s1,s2);
    while (1)
    {
    if(s3==NULL)
    {
                    break;
    }                
    strncpy(s3,s3+strlen(s2),strlen(s3+strlen(s2)));
    len=strlen(s1)-strlen(s2);
    for(i=strlen(s1);i>=len;i--)
          s1[i]='\0';
    s3=strstr(s1,s2);
    }
    printf("%s",s1);
    getch();
    return 0;
}

- Sanjay Kumar January 02, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Please don't mind the example

- Sanjay Kumar January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

would u mind writing in English what u wrote above

- Dad January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Fine this is a program which replace all the fragment from a string.
so this can be used as function also which accept the original string
and fragment string and return the string without the fragment .
Rest you can find if you execute the code.

- Sanjay Kumar January 02, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

pretty useless answer.

- Anonymous January 05, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

What you expect..? Spoon feeding ..
Apply your mind to use the concept.. I need only 5 min to change this code to desired program...

- Sanjay Kumar January 05, 2012 | Flag

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews