Given a paragraph of text, wri

Amazon Interview Question for Software Engineer / Developers

0

of 0 votes

42
Answers
Given a paragraph of text, write a program to find the first shortest sub-segment that contains each of the given k words at least once. A segment is said to be shorter than other if it contains less number of words.

Ignore characters other than [a-z][A-Z] in the text. Comparison between the strings should be case-insensitive.

If no sub-segment is found then the program should output “NO SUBSEGMENT FOUND”.

Input format :

First line of the input contains the text.
Next line contains k , the number of words given to be searched.
Each of the next k lines contains a word.

Output format :

Print first shortest sub-segment that contains given k words , ignore special characters, numbers.If no sub-segment is found it should return “NO SUBSEGMENT FOUND”

Sample Input :

This is a test. This is a programming test. This is a programming test in any language.
4
this
a
test
programming

Sample Output :

a programming test This

Explanation :
In this test case segment "a programming test. This" contains given four words. You have to print without special characters, numbers so output is "a programming test This". Another segment "This is a programming test." also contains given four words but have more number of words.

Constraint :

Total number of character in a paragraph will not be more than 200,000.
0 < k <= no. of words in paragraph.
0 < Each word length < 15
- manishs747 June 24, 2012 in India | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: India

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

A non DP solution.

Omit all the punctuation.

This is a test This is a programming test This is a programming test in any language

for(int i=0; i< TotalWordsInSentence; i++)

{

for(int j=i; j<TotalWordsInSentence; j++)

{

if(WordInSentence[j] is givenWord)

{

MarkFound(givenWOrd)

}

if(AllGivenWOrdFound())
{
Length = j - i;
start = i
end = j
}

if(Length < finalLen)
{
finalLen = Length
finalStart = start
finalEnd = end
}
}

}

print finalStart, finalEnd, finalLen

O(N^2) Complexity. Where N is Number of words in sentence.

- nerd June 24, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This is the naive solution. Input sizes have been chosen to ensure that O(N^2) solutions time out. There's a linear time solution, as well as an O(NlogN) solution.

- eugene.yarovoi June 25, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Do we need to consider consecutive spaces in the result or what?
for the test case.
A B A A A B A
2
A
B
what is the answer?

- Anonymous June 26, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

1. Make a TRIE of all the words to be searched.
2. Now, find the first segment which contains all the words.Mark its length
3. Take two pointers, one pointing to the start of the above segment & other pointing to the end.
4. Increment the end pointer to point to the next word. Check whether both start & end pointer point to the same word. If yes, increment the start pointer to point to next word.[We will also perform the same while findin the first segment]
Now, remove all the extra words from the starting that are not in the TRIE.
5. Proceed above steps until EOF is not encountered, meanwhile also keep updating the minlength if length of the newly found segment is less.

If there are any issues, let me know.

- Aashish June 24, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Sure, there are issues. So in the end of step 4), you stop incrementing the start pointer once you hit something that's in the trie? The problem is that what if it's a searchword, but there are still occurrences of the searchword after the occurrence that you found? For example, consider x x a x a x b x b x c x a b x where x are not searchwords and a, b, and c are. You find a valid subarray at backpointer = 2, frontpointer = 10. Now you scan forward and you see a at index 12.

What would your algo do here? It would scan until it finds the b at index 6, right? But it should keep scanning until it finds the b at index 8. That's what I mean when I say there can still be occurrences of a searchword after the one you found.

- eugene.yarovoi June 25, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I have updated my post. Please read it again.

- Aashish June 25, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I don't understand what [We will also perform the same while findin the first segment] means.

Still seems prone to the same objection. BTW, in the example I gave, I meant c to be a searchword as well. Only x's are not searchwords.

I'm still seeing no sign of something I think is probably necessary to solve this problem - some sort of storage indicating where the most recent occurrence of a word was seen.

- eugene.yarovoi June 26, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

x x a x a x b x b x c x a b x
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
lets say we need to search for segment containing a,b,&c.
1.start pointer is at 0, last pointer is at 0.
2.start pointer is at 2, last pointer is at 2.
3.start pointer remains at 2, last pointer is at 4.
4.Both words[start & last pointer ] are same, update start pointer to 4[since word at index 3 has no purpose].
5.start pointer is at 4, last pointer is at 12.
6.Both words are same, update start pointer to 8, last pointer remains at 12.
7.start pointer is at 8, last pointer is at 13.
8. Found both words equal,update start pointer to 10,last pointer being at 13[this is the smallest segment.]
9. start pointer is at 10,last pointer is at 14.
10. The file is finished. So, the smallest segment is from 10 to 13.

I don't know what issues are you finding in this approach.
To me, its working perfectly.

- Aashish June 26, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Should be 6 instead of 8 in step 6 I think, but actually this would still work in that case. I still think it won't work on some cases, but for me to say what those cases are, I'd like to know whether you agree that in step 6 it should be 6 instead of 8. If not, how does the algo get 8?

- Anonymous June 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

i used the same code.
it's giving right answer for every test case i'm trying
but it fails 4 cases on the site.
Any ideas?

- nishant June 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I dont care about Amazon
and I didn't cheat
and most of all "It's NONE OF YOUR BUSINESS"

- nishant June 29, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Dude you are not supposed to paste questions, as contest is still going on.
Please remove this post asap

- rishikantku June 24, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

@rishikantku and eugene
Sorry if I am a noob but what/which contest are we talking about here?

- Omkar June 25, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Amazon India Programming contest on InterviewStreet[dot]com

- Nitin Gupta June 26, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

@manishs747
If this was an interview question then what kind of interview: phone or in-person? And was this question for fresh college grads?

- Omkar June 25, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I solved it partially (6/10). and getting Segmentation fault (core dumped) in 1 case. and wrong answer in 3 case.

Can you suggest me want may test case I should consider?

- Niraj June 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I think I can say this, since it's not specific to any problem: If you had a segfault, something's wrong and it's not just a wrong algorithm. Find what that something is. Whatever's wrong could explain the Wrong Answer as well.

- eugene.yarovoi June 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

hey can anyone tell algorithm ...
how it can be soved using dp.

- newhere June 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

here is my implementation in java

import java.util.*;
import java.io.*;
public class Solution
{
    public static void main(String[] args)
    {
        Map<String, Integer> needstofind = new TreeMap<String, Integer>();
        Map<String, Integer> hasfound = new TreeMap<String, Integer>();
        String final_result="";
        String result="";
        int begin=0;
        int end=0;
        String words[];
        String text_words[];
        String final_words[];
        String str_final;
        int begin_tmp=0;
        int end_tmp=0;
        int flag=0;
        int z=0;
        int len;
        int txt_len;
        int i;
        int count=0;
        int needstofind_size;
        int min=0;
        int final_length=0;
        Scanner in = new Scanner(System.in);
        String string = in.nextLine();
        string = string.replaceAll("[^a-zA-Z| |]", " ");
        string = string.replaceAll("  ", " ");
        str_final=string;
        string=string.toLowerCase();
        z = in.nextInt();
        words = new String[z+1];
        in.nextLine();
        for(i=0; i<z; i++)
        {
            words[i] = in.nextLine();
            needstofind.put(words[i],1);
        }
        text_words = string.split(" ");
        final_words=str_final.split(" ");
        len=hasfound.size();
        txt_len=text_words.length;
        i=0;
        needstofind_size=needstofind.size();
        begin=0;
        count=0;
        for(end=0; end<txt_len; end++)
        {
            if(needstofind.containsKey(text_words[end])==false)continue;
            hasfound.put(text_words[end],hasfound.get(text_words[end])==null?1:hasfound.get(text_words[end])+1);
            if(hasfound.get(text_words[end])<=needstofind.get(text_words[end]) )count++;
            if(count==needstofind_size)
            {
                flag++;
                while(needstofind.get(text_words[begin])==null || hasfound.get(text_words[begin])>1)
                {
                    if(needstofind.containsKey(text_words[begin]))
                    {
                        if(hasfound.get(text_words[begin])>1)
                            hasfound.put(text_words[begin],hasfound.get(text_words[begin])-1);
                    }
                    begin++;

                }
                if(flag>1 && end-begin+1>min)continue;
                result="";
                min=end-begin+1;
                for(int j=begin; j<=end; j++)
                    result=result+final_words[j]+" ";
                if(flag==1)
                {
                    final_result=result;
                    final_length=min;
                }
                else if(min<final_length && flag>1)
                {
                    final_result=result;
                    final_length=min;
                }
            }
        }
        if(flag>0)System.out.println(""+final_result);
        else System.out.println("NO SUBSEGMENT FOUND");
    }
}

- monu July 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Duhh.. too bad this does not even pass the sample testcases...

- Shark July 14, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

A solution using hashmap and building a dp tree
This is a test This is a programming test This is a programming test in any language.
Explanation with ex:
Note the occurrence index in hashmap

This - 0,4,9
a - 2,6,11
test - 3,8,13
programming - 7,12

Start with the word with the least instance and move towards one greater instance.
Move to the next word and associate a instance
i)within the range
ii) if none within range associate closest greater than and closest lesser than it
for 7 in 'programming' to 'test'
7 - 3,7 and 7,8
for 12 in 'programming' to 'test'
12- 8,12 and 12,13
now add 'a' instance
3,7 -> 3,6,7 [ 6 is within range ]
7,8 ->6,7,8 and 7,8,11
8,12 -> 8,11,12 [ 11 is within range]
12,13 ->11,12,13
now add 'this' instance
3,6,7 ->3,4,6,7
6,7,8 ->6,7,8,9 and 4,6,7,8
7,8,11 -> 7,8,9,11
8,11,12 ->8,9,11,12
11,12,13 ->9,11,12,13
Now the leaf nodes are all the smaller segments containing the given words , the smallest of them[ smallest range] is the answer
3,4,6,7 - range 4
6,7,8,9 - range 3
4,6,7,8 - range 4
7,8,9,11 - range 4
8,9,11,12 - range 4
9,11,12,13 - range 4

Ans is 6,7,8,9

- Anonymous August 07, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Use divider and conquer.
Collect all the words in the input string into a list inlist[left:right].
Start at the middle of the input list.

{{
subsegment(inlist[left:right], words)
from the mid move in steps of 1 to the left and right and check if the word is the search list. if it is there mark the positions
at the end of this step you should have mid segment which contains all the words.
If some words were not found return []
ls=subsegment(inlist[left:mid], words)
rs=subsegment(inlist[mid+1:right], words)
Now it is possible that all ls,ms,rs can contain the words or some of them or none of them.
return the subsegment with the min length
}}

I have written the code and it is working in Python. If you have any queries contact me

- Rajesh November 25, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

So I have some doubts about this solution: how exactly will we discover minimum subsegments that pass through the center? It seems we're assuming that any minimal subsegment that passes through the middle of the array will have an equal number of elements on both sides, and hence we will be able to discover it efficiently. There's no basis for such an assumption, if in fact you make it. I might just be misunderstanding you though.

- eugene.yarovoi December 30, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

can someone provide good test cases . I'm getting only 6 cleared but i need test cases

- siva February 15, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

#include<stdio.h>
#include<string.h>
#include<malloc.h>

bool isequals( char * a, char * b)
{
int sz = sizeof(a)/sizeof(char);
if ((sizeof(b)/sizeof(char)) != sz)
return false;

if (strcmp(tolower(a),tolower(b))!=0)
return false;
return true;
}
int main()
{
char str[]="This is a test. This is a programming test. This is a programming test in any language";

int test;
char **input;
scanf("%d",&test);
input=(char **)malloc(sizeof(char *)*test);

for(int i=0;i<test;i++)
{scanf("%s",input[i]);}
char * pch;
char **words;
words=(char **)malloc(sizeof(char *)*10000);
int i=0;

pch = strtok (str," ,.-");
while (pch != NULL)
{ words[i]=pch;
pch = strtok (NULL, " ,.-");
i++;
}

int count=0,k;
bool flag;

for( k=0;k<i;k++)
{

flag=false;
for(int j=0;j<test;j++)
{
if(isequals(input[j],words[k]))
{flag=true;}
}
if(flag)
{count++;}
else
{count=0;}
if(count==test)
{break;}

}

for(int j=test;j>0;j--)
{printf("%s ",words[j]);}

return 0;
}

- Aditya March 21, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

import java.util.HashMap;
import java.util.Scanner;

public class SubSegment {
	public static void main(String[] args) {
		Scanner s = new Scanner(System.in);
		int strt=0, end=0, x=0, y=0;
		int len = 100000;
		String text = s.nextLine();
		int count = s.nextInt();
		HashMap<String, Integer> h = new HashMap<String, Integer>();
		for(int i=0; i<count; i++){
			h.put(s.next().toLowerCase(), 0);
		}
		text = text.replaceAll("[^a-zA-Z ]", "");
		String t = text;
		text = text.toLowerCase();
		String[] token = text.split(" ");
		for(y=0; y<=token.length; y++){
			if(!h.containsValue(0)){
				y--;
			}
			else{
				if(y< token.length && h.containsKey(token[y])){
					h.put(token[y], h.get(token[y])+1);
				}
			}			
			if(!h.containsValue(0)){
				if(len > y-x){
					len = y-x;
					strt = x;
					end = y;
				}
				if(h.containsKey(token[x])){
					h.put(token[x], h.get(token[x])-1);
				}
				x++;
			}
		}
		
		String[] token2 = t.split(" ");
		
		if(end > strt){
			for(int i=strt; i<=end; i++){
				System.out.print(token2[i]+" ");
			}
		}
		else
			System.out.println("NO SUBSEGMENT FOUND");
	}

}

- CK October 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

import java.util.HashMap;
import java.util.Scanner;

public class SubSegment {

	public static void main(String[] args) {
		Scanner s = new Scanner(System.in);
		int strt=0, end=0, x=0, y=0;
		int len = 100000;
		String text = s.nextLine();
		int count = s.nextInt();
		HashMap<String, Integer> h = new HashMap<String, Integer>();
		for(int i=0; i<count; i++){
			h.put(s.next().toLowerCase(), 0);
		}
		text = text.replaceAll("[^a-zA-Z ]", "");
		String t = text;
		text = text.toLowerCase();
		String[] token = text.split(" ");
		for(y=0; y<=token.length; y++){
			if(!h.containsValue(0)){
				y--;
			}
			else{
				if(y< token.length && h.containsKey(token[y])){
					h.put(token[y], h.get(token[y])+1);
				}
			}			
			if(!h.containsValue(0)){
				if(len > y-x){
					len = y-x;
					strt = x;
					end = y;
				}
				if(h.containsKey(token[x])){
					h.put(token[x], h.get(token[x])-1);
				}
				x++;
			}
		}
		
		String[] token2 = t.split(" ");
		
		if(end > strt){
			for(int i=strt; i<=end; i++){
				System.out.print(token2[i]+" ");
			}
		}
		else
			System.out.println("NO SUBSEGMENT FOUND");
	}

}

- CK October 10, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Following code is passing 7 test cases but dont know what is wrong with other 3 test cases:

<?php
// read inputs
$line = trim(fgets(STDIN)); 						// reads paragraph
//$line = "This is a test. This is a programming test. a This is a  programming test in any language.";
//$line = file_get_contents('./input.txt', true);
if(strlen($line)> 200000) {
	die("Input paragraph is too long. It should be less than 200000");
}
$para = preg_split("/[\s]+/", $line);
foreach($para as $w) {
	if(strlen($w) >=15) {
		die("Word length should be in range 0 < word length < 15");
	}
}
$line1 = preg_replace("/[^a-zA-Z\s]/", "", $line);
$paragraph = preg_split("/[\s]+/", $line1);			// split paragraph in words

fscanf(STDIN, "%d\n", $number); 					// read number of words
//$number = 2;
if($number > count($paragraph)) {
	die("Number should be in range of 0 < Number <= " . count($paragraph));
}
for($i = 0; $i < $number; $i++) {
	$w = strtolower(trim(fgets(STDIN)));
	if(count($w) < 15) {
		$w = preg_replace("/[^a-zA-Z\s]/", "", $w);//strtolower(trim(fgets(STDIN)));
		$words[] = $w;
		$wordsAssoc[$w] = $w;
	} else {
		die("Word length should be in range 0 < word length < 15");
	}
}

// end reading inputs
$totalWords = count($words);
$totalPara = count($paragraph);
// create index map of required words
$alphaWords = array();
for($i = 0; $i < $totalPara; $i++) {
	$word = strtolower($paragraph[$i]);
	$alphaWords[] = $word;
}

$diff = array_diff($words, $alphaWords);

if(count($diff) > 0) {
	echo "NO SUBSEGMENT FOUND";
	exit;
}

$subStrLen = count($paragraph);
$min = 0;
$max = 0;
$found = array();
$result = array();
for($i = 0; $i < count($paragraph); $i++) {
	$word = strtolower($paragraph[$i]);
	if(isset($wordsAssoc[$word])) {
		if($totalWords == count($found)) {
			$found[$word] = $i;
			$values = array_values($found);
			$min = min($values);
			$max = max($values);
			//echo "$result[min] $result[max] $min $max \n";
			if($result['max'] - $result['min'] > $max - $min) {
				$result['min'] = $min;
				$result['max'] = $max;
			}
		} else {
			$found[$word] = $i;
			$values = array_values($found);
			$min = min($values);
			$max = max($values);
			$result['min'] = $min;
			$result['max'] = $max;
		}
	}
	if($totalWords == count($found) && $max - $min + 1 <= $totalWords) {
		// we have already found smallest substring
		break;
	}
}

for($i = $result['min']; $i <= $result['max']; $i++) {
	echo $paragraph[$i] . " ";
}

- warLog February 22, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Can you explain your logic and where this question was asked.

- agrawal.arpit35 August 28, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

I think we can have a HashTable where keys will be the 4 words and values will be by default false. Parse the text of string and check against the Hashtable to see if those values exists in the table. If yes, update value to True.... Now The words needs to be continuous. If not clear the HashTable and mark all False. Keep updating the table as per you parse the text,...and when you get all the values of the Table with TRUE at the same time words are continuous in the string, you get your answer right there !! Time complexity O(N), space will be O(1)..... give me ur thoughts on this... thanks !!

- vran.freelancer June 26, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Marking all on hash table will be O(k). O(n) if k == n (which would be the worst case) so it's still O(n^2)

- Herman June 26, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Yes, you are right, but you can base your algorithm on substring searching however. I solved on O(n) using something similar to string searching, thinking strings as characters and "string in set" as O(1) if set is HashSet.

- Herman June 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Herman: doesn't explain how you deal with the fact that the words can appear in any order, and that there can be non-words in between.

- eugene.yarovoi June 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

can any one tell me what will be the output of this case?

This is a test. This is a programming test. T!his is a programming test in any language.
4
this
a
test
programming

can we ignore "!" in "T!his" for the match?

- Anonymous June 30, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

i have the same doubt... but if u see in every sentence.. test word ends with a period... but "test." is not considered as a word.. the period is skipped so... i guess T!his=This... all we can do is to skip all the special characters...

Also as far as in the k input words... i dont know whether we can consider T!his as This or T!his....

Amazon being unclear in a way wants to test how we consider no. of test cases and our patience.. thought it's basically a lotta waste of time for us...

- bhargava30 February 13, 2013 | Flag

Comment hidden because of low score. Click to expand.

-2

of 2 vote

after contest will over think lot of solution will be here...............till then wait.............!!!

- satya_patel June 27, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

@eugene : its two weeks long contest...see the link Amazon India Programmuing Contest
www[dot]interviewstreet[dot]com

- Nitin Gupta June 28, 2012 | Flag

Comment hidden because of low score. Click to expand.

-2

of 2 vote

guyz add extra test cases ... please !!!!!

- gowtham September 05, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

DP is too vague. Can you tell the algorithm?

- nerd June 24, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

You definitely can't. Rabin Karp is for finding substrings in strings. This is a different problem. We're being asked to find a "substring" that contains at least one of each of a set of strings, in any order. Completely different.

- eugene.yarovoi June 28, 2012 | Flag

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews