Amazon Interview Question
Software Engineer / DevelopersCountry: India
This is the naive solution. Input sizes have been chosen to ensure that O(N^2) solutions time out. There's a linear time solution, as well as an O(NlogN) solution.
1. Make a TRIE of all the words to be searched.
2. Now, find the first segment which contains all the words.Mark its length
3. Take two pointers, one pointing to the start of the above segment & other pointing to the end.
4. Increment the end pointer to point to the next word. Check whether both start & end pointer point to the same word. If yes, increment the start pointer to point to next word.[We will also perform the same while findin the first segment]
Now, remove all the extra words from the starting that are not in the TRIE.
5. Proceed above steps until EOF is not encountered, meanwhile also keep updating the minlength if length of the newly found segment is less.
If there are any issues, let me know.
Sure, there are issues. So in the end of step 4), you stop incrementing the start pointer once you hit something that's in the trie? The problem is that what if it's a searchword, but there are still occurrences of the searchword after the occurrence that you found? For example, consider x x a x a x b x b x c x a b x where x are not searchwords and a, b, and c are. You find a valid subarray at backpointer = 2, frontpointer = 10. Now you scan forward and you see a at index 12.
What would your algo do here? It would scan until it finds the b at index 6, right? But it should keep scanning until it finds the b at index 8. That's what I mean when I say there can still be occurrences of a searchword after the one you found.
I don't understand what [We will also perform the same while findin the first segment] means.
Still seems prone to the same objection. BTW, in the example I gave, I meant c to be a searchword as well. Only x's are not searchwords.
I'm still seeing no sign of something I think is probably necessary to solve this problem - some sort of storage indicating where the most recent occurrence of a word was seen.
x x a x a x b x b x c x a b x
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
lets say we need to search for segment containing a,b,&c.
1.start pointer is at 0, last pointer is at 0.
2.start pointer is at 2, last pointer is at 2.
3.start pointer remains at 2, last pointer is at 4.
4.Both words[start & last pointer ] are same, update start pointer to 4[since word at index 3 has no purpose].
5.start pointer is at 4, last pointer is at 12.
6.Both words are same, update start pointer to 8, last pointer remains at 12.
7.start pointer is at 8, last pointer is at 13.
8. Found both words equal,update start pointer to 10,last pointer being at 13[this is the smallest segment.]
9. start pointer is at 10,last pointer is at 14.
10. The file is finished. So, the smallest segment is from 10 to 13.
I don't know what issues are you finding in this approach.
To me, its working perfectly.
Should be 6 instead of 8 in step 6 I think, but actually this would still work in that case. I still think it won't work on some cases, but for me to say what those cases are, I'd like to know whether you agree that in step 6 it should be 6 instead of 8. If not, how does the algo get 8?
i used the same code.
it's giving right answer for every test case i'm trying
but it fails 4 cases on the site.
Any ideas?
Dude you are not supposed to paste questions, as contest is still going on.
Please remove this post asap
@rishikantku and eugene
Sorry if I am a noob but what/which contest are we talking about here?
@manishs747
If this was an interview question then what kind of interview: phone or in-person? And was this question for fresh college grads?
I solved it partially (6/10). and getting Segmentation fault (core dumped) in 1 case. and wrong answer in 3 case.
Can you suggest me want may test case I should consider?
I think I can say this, since it's not specific to any problem: If you had a segfault, something's wrong and it's not just a wrong algorithm. Find what that something is. Whatever's wrong could explain the Wrong Answer as well.
here is my implementation in java
import java.util.*;
import java.io.*;
public class Solution
{
public static void main(String[] args)
{
Map<String, Integer> needstofind = new TreeMap<String, Integer>();
Map<String, Integer> hasfound = new TreeMap<String, Integer>();
String final_result="";
String result="";
int begin=0;
int end=0;
String words[];
String text_words[];
String final_words[];
String str_final;
int begin_tmp=0;
int end_tmp=0;
int flag=0;
int z=0;
int len;
int txt_len;
int i;
int count=0;
int needstofind_size;
int min=0;
int final_length=0;
Scanner in = new Scanner(System.in);
String string = in.nextLine();
string = string.replaceAll("[^a-zA-Z| |]", " ");
string = string.replaceAll(" ", " ");
str_final=string;
string=string.toLowerCase();
z = in.nextInt();
words = new String[z+1];
in.nextLine();
for(i=0; i<z; i++)
{
words[i] = in.nextLine();
needstofind.put(words[i],1);
}
text_words = string.split(" ");
final_words=str_final.split(" ");
len=hasfound.size();
txt_len=text_words.length;
i=0;
needstofind_size=needstofind.size();
begin=0;
count=0;
for(end=0; end<txt_len; end++)
{
if(needstofind.containsKey(text_words[end])==false)continue;
hasfound.put(text_words[end],hasfound.get(text_words[end])==null?1:hasfound.get(text_words[end])+1);
if(hasfound.get(text_words[end])<=needstofind.get(text_words[end]) )count++;
if(count==needstofind_size)
{
flag++;
while(needstofind.get(text_words[begin])==null || hasfound.get(text_words[begin])>1)
{
if(needstofind.containsKey(text_words[begin]))
{
if(hasfound.get(text_words[begin])>1)
hasfound.put(text_words[begin],hasfound.get(text_words[begin])-1);
}
begin++;
}
if(flag>1 && end-begin+1>min)continue;
result="";
min=end-begin+1;
for(int j=begin; j<=end; j++)
result=result+final_words[j]+" ";
if(flag==1)
{
final_result=result;
final_length=min;
}
else if(min<final_length && flag>1)
{
final_result=result;
final_length=min;
}
}
}
if(flag>0)System.out.println(""+final_result);
else System.out.println("NO SUBSEGMENT FOUND");
}
}
A solution using hashmap and building a dp tree
This is a test This is a programming test This is a programming test in any language.
Explanation with ex:
Note the occurrence index in hashmap
This - 0,4,9
a - 2,6,11
test - 3,8,13
programming - 7,12
Start with the word with the least instance and move towards one greater instance.
Move to the next word and associate a instance
i)within the range
ii) if none within range associate closest greater than and closest lesser than it
for 7 in 'programming' to 'test'
7 - 3,7 and 7,8
for 12 in 'programming' to 'test'
12- 8,12 and 12,13
now add 'a' instance
3,7 -> 3,6,7 [ 6 is within range ]
7,8 ->6,7,8 and 7,8,11
8,12 -> 8,11,12 [ 11 is within range]
12,13 ->11,12,13
now add 'this' instance
3,6,7 ->3,4,6,7
6,7,8 ->6,7,8,9 and 4,6,7,8
7,8,11 -> 7,8,9,11
8,11,12 ->8,9,11,12
11,12,13 ->9,11,12,13
Now the leaf nodes are all the smaller segments containing the given words , the smallest of them[ smallest range] is the answer
3,4,6,7 - range 4
6,7,8,9 - range 3
4,6,7,8 - range 4
7,8,9,11 - range 4
8,9,11,12 - range 4
9,11,12,13 - range 4
Ans is 6,7,8,9
Use divider and conquer.
Collect all the words in the input string into a list inlist[left:right].
Start at the middle of the input list.
{{
subsegment(inlist[left:right], words)
from the mid move in steps of 1 to the left and right and check if the word is the search list. if it is there mark the positions
at the end of this step you should have mid segment which contains all the words.
If some words were not found return []
ls=subsegment(inlist[left:mid], words)
rs=subsegment(inlist[mid+1:right], words)
Now it is possible that all ls,ms,rs can contain the words or some of them or none of them.
return the subsegment with the min length
}}
I have written the code and it is working in Python. If you have any queries contact me
So I have some doubts about this solution: how exactly will we discover minimum subsegments that pass through the center? It seems we're assuming that any minimal subsegment that passes through the middle of the array will have an equal number of elements on both sides, and hence we will be able to discover it efficiently. There's no basis for such an assumption, if in fact you make it. I might just be misunderstanding you though.
#include<stdio.h>
#include<string.h>
#include<malloc.h>
bool isequals( char * a, char * b)
{
int sz = sizeof(a)/sizeof(char);
if ((sizeof(b)/sizeof(char)) != sz)
return false;
if (strcmp(tolower(a),tolower(b))!=0)
return false;
return true;
}
int main()
{
char str[]="This is a test. This is a programming test. This is a programming test in any language";
int test;
char **input;
scanf("%d",&test);
input=(char **)malloc(sizeof(char *)*test);
for(int i=0;i<test;i++)
{scanf("%s",input[i]);}
char * pch;
char **words;
words=(char **)malloc(sizeof(char *)*10000);
int i=0;
pch = strtok (str," ,.-");
while (pch != NULL)
{ words[i]=pch;
pch = strtok (NULL, " ,.-");
i++;
}
int count=0,k;
bool flag;
for( k=0;k<i;k++)
{
flag=false;
for(int j=0;j<test;j++)
{
if(isequals(input[j],words[k]))
{flag=true;}
}
if(flag)
{count++;}
else
{count=0;}
if(count==test)
{break;}
}
for(int j=test;j>0;j--)
{printf("%s ",words[j]);}
return 0;
}
import java.util.HashMap;
import java.util.Scanner;
public class SubSegment {
public static void main(String[] args) {
Scanner s = new Scanner(System.in);
int strt=0, end=0, x=0, y=0;
int len = 100000;
String text = s.nextLine();
int count = s.nextInt();
HashMap<String, Integer> h = new HashMap<String, Integer>();
for(int i=0; i<count; i++){
h.put(s.next().toLowerCase(), 0);
}
text = text.replaceAll("[^a-zA-Z ]", "");
String t = text;
text = text.toLowerCase();
String[] token = text.split(" ");
for(y=0; y<=token.length; y++){
if(!h.containsValue(0)){
y--;
}
else{
if(y< token.length && h.containsKey(token[y])){
h.put(token[y], h.get(token[y])+1);
}
}
if(!h.containsValue(0)){
if(len > y-x){
len = y-x;
strt = x;
end = y;
}
if(h.containsKey(token[x])){
h.put(token[x], h.get(token[x])-1);
}
x++;
}
}
String[] token2 = t.split(" ");
if(end > strt){
for(int i=strt; i<=end; i++){
System.out.print(token2[i]+" ");
}
}
else
System.out.println("NO SUBSEGMENT FOUND");
}
}
import java.util.HashMap;
import java.util.Scanner;
public class SubSegment {
public static void main(String[] args) {
Scanner s = new Scanner(System.in);
int strt=0, end=0, x=0, y=0;
int len = 100000;
String text = s.nextLine();
int count = s.nextInt();
HashMap<String, Integer> h = new HashMap<String, Integer>();
for(int i=0; i<count; i++){
h.put(s.next().toLowerCase(), 0);
}
text = text.replaceAll("[^a-zA-Z ]", "");
String t = text;
text = text.toLowerCase();
String[] token = text.split(" ");
for(y=0; y<=token.length; y++){
if(!h.containsValue(0)){
y--;
}
else{
if(y< token.length && h.containsKey(token[y])){
h.put(token[y], h.get(token[y])+1);
}
}
if(!h.containsValue(0)){
if(len > y-x){
len = y-x;
strt = x;
end = y;
}
if(h.containsKey(token[x])){
h.put(token[x], h.get(token[x])-1);
}
x++;
}
}
String[] token2 = t.split(" ");
if(end > strt){
for(int i=strt; i<=end; i++){
System.out.print(token2[i]+" ");
}
}
else
System.out.println("NO SUBSEGMENT FOUND");
}
}
Following code is passing 7 test cases but dont know what is wrong with other 3 test cases:
<?php
// read inputs
$line = trim(fgets(STDIN)); // reads paragraph
//$line = "This is a test. This is a programming test. a This is a programming test in any language.";
//$line = file_get_contents('./input.txt', true);
if(strlen($line)> 200000) {
die("Input paragraph is too long. It should be less than 200000");
}
$para = preg_split("/[\s]+/", $line);
foreach($para as $w) {
if(strlen($w) >=15) {
die("Word length should be in range 0 < word length < 15");
}
}
$line1 = preg_replace("/[^a-zA-Z\s]/", "", $line);
$paragraph = preg_split("/[\s]+/", $line1); // split paragraph in words
fscanf(STDIN, "%d\n", $number); // read number of words
//$number = 2;
if($number > count($paragraph)) {
die("Number should be in range of 0 < Number <= " . count($paragraph));
}
for($i = 0; $i < $number; $i++) {
$w = strtolower(trim(fgets(STDIN)));
if(count($w) < 15) {
$w = preg_replace("/[^a-zA-Z\s]/", "", $w);//strtolower(trim(fgets(STDIN)));
$words[] = $w;
$wordsAssoc[$w] = $w;
} else {
die("Word length should be in range 0 < word length < 15");
}
}
// end reading inputs
$totalWords = count($words);
$totalPara = count($paragraph);
// create index map of required words
$alphaWords = array();
for($i = 0; $i < $totalPara; $i++) {
$word = strtolower($paragraph[$i]);
$alphaWords[] = $word;
}
$diff = array_diff($words, $alphaWords);
if(count($diff) > 0) {
echo "NO SUBSEGMENT FOUND";
exit;
}
$subStrLen = count($paragraph);
$min = 0;
$max = 0;
$found = array();
$result = array();
for($i = 0; $i < count($paragraph); $i++) {
$word = strtolower($paragraph[$i]);
if(isset($wordsAssoc[$word])) {
if($totalWords == count($found)) {
$found[$word] = $i;
$values = array_values($found);
$min = min($values);
$max = max($values);
//echo "$result[min] $result[max] $min $max \n";
if($result['max'] - $result['min'] > $max - $min) {
$result['min'] = $min;
$result['max'] = $max;
}
} else {
$found[$word] = $i;
$values = array_values($found);
$min = min($values);
$max = max($values);
$result['min'] = $min;
$result['max'] = $max;
}
}
if($totalWords == count($found) && $max - $min + 1 <= $totalWords) {
// we have already found smallest substring
break;
}
}
for($i = $result['min']; $i <= $result['max']; $i++) {
echo $paragraph[$i] . " ";
}
I think we can have a HashTable where keys will be the 4 words and values will be by default false. Parse the text of string and check against the Hashtable to see if those values exists in the table. If yes, update value to True.... Now The words needs to be continuous. If not clear the HashTable and mark all False. Keep updating the table as per you parse the text,...and when you get all the values of the Table with TRUE at the same time words are continuous in the string, you get your answer right there !! Time complexity O(N), space will be O(1)..... give me ur thoughts on this... thanks !!
Marking all on hash table will be O(k). O(n) if k == n (which would be the worst case) so it's still O(n^2)
Yes, you are right, but you can base your algorithm on substring searching however. I solved on O(n) using something similar to string searching, thinking strings as characters and "string in set" as O(1) if set is HashSet.
can any one tell me what will be the output of this case?
This is a test. This is a programming test. T!his is a programming test in any language.
4
this
a
test
programming
can we ignore "!" in "T!his" for the match?
i have the same doubt... but if u see in every sentence.. test word ends with a period... but "test." is not considered as a word.. the period is skipped so... i guess T!his=This... all we can do is to skip all the special characters...
Also as far as in the k input words... i dont know whether we can consider T!his as This or T!his....
Amazon being unclear in a way wants to test how we consider no. of test cases and our patience.. thought it's basically a lotta waste of time for us...
after contest will over think lot of solution will be here...............till then wait.............!!!
A non DP solution.
- nerd June 24, 2012Omit all the punctuation.
This is a test This is a programming test This is a programming test in any language
for(int i=0; i< TotalWordsInSentence; i++)
{
for(int j=i; j<TotalWordsInSentence; j++)
{
if(WordInSentence[j] is givenWord)
{
MarkFound(givenWOrd)
}
if(AllGivenWOrdFound())
{
Length = j - i;
start = i
end = j
}
if(Length < finalLen)
{
finalLen = Length
finalStart = start
finalEnd = end
}
}
}
print finalStart, finalEnd, finalLen
O(N^2) Complexity. Where N is Number of words in sentence.