Amazon Interview Question
SDE-2sCountry: India
Interview Type: In-Person
small modification is needed ..
if we can't able to find a word at some point of time, we have to discard the latest word which we committed ....and go with other alternatives ... recursion is the best way to do this...
And for searching whether the word is present with some letter or not, trie is the best suitable data structure(searching can be done in amortizedO(1))....
If someone gives best psuedo code .. that will be appreciated :)
Question is bit ambiguous coz we can have a questions like how many times we can reuse the given words? or we need to merge at-most 2 words to make the new word?
Good point. So if we have to use all the words - then result is either the longest word (if it includes all other words) or nothing. If we have to use just some words - it is still the longest word because it includes at least itself.
public String FindLongestCompositeString(java.io.File StringFile) throws FileNotFoundException, IOException {
java.io.BufferedReader fin = new java.io.BufferedReader(new java.io.FileReader(StringFile));
java.util.List<String> inputStrings = new java.util.ArrayList<String>();
while (fin.ready()) {
inputStrings.add(fin.readLine().trim());
}
return FindLongestCompositeString(inputStrings);
}
public String FindLongestCompositeString(java.util.List<String> inWords) {
java.util.List<String> tempWords = new java.util.ArrayList<String>(inWords);
java.util.Collections.sort(tempWords, new java.util.Comparator<String>() {
@Override
public int compare(String o1, String o2) {
if (o1.length() == o2.length()) {
return o1.compareTo(o2);
} else if (o1.length() > o2.length()) {
return -1;
} else {
return 1;
}
}
});
while (tempWords.size() > 1) {
String currentpossible = tempWords.get(0);
tempWords.remove(0);
if (findParts(currentpossible, tempWords)) {
return currentpossible;
}
}
return null;
}
public boolean findParts(String inPossible, java.util.List<String> inputStrings) {
if (inPossible.length() == 0) {
return true;
}
for (String curPart : inputStrings) {
if (inPossible.startsWith(curPart)) {
if (findParts(inPossible.substring(curPart.length(), inPossible.length()), inputStrings)) {
return true;
}
}
}
return false;
}
sort the strings based on length - O(nlogn) descending order
take first string, build Suffix tree and check remaining all remaining strings exists or not -O(n)
consider building suffix tree as constant
now leave first string and build suffix tree for second if all the strings are not exists in first string and repeat the same till the end of the all the strings -O(npow2)
char[] bigWordArr = bigWord.toCharArray();
booean[] visited = new boolean[bigWordArr.length];
for( int i =0; i <bigWordArr.size; i++){
boolean found = false; int temp;
for(int j =0; j <bigWordArr.size; j++){
found = lookUpInTries(bigWord.subString(i,j)); temp =j;
}
if(found) {
for(int j =i; j <=temp; j++){
if(visited[j]) continue;
else visited[j] = true;
}
}
}
for( int i =0; i <visited.size; i++){
if(!visited[i]){ syso("Not a real big word"); break;
}
// build a tries if not availble
1. Sort the strings in the file, based on their length.
2. Build a common suffix tree for all these words, in the above sorted order.
3. If a word can be inserted to the suffix tree without adding any direct child to the root node,
then it means that it is a combination of two words that have been already added to the suffix tree.
For eg: if we have inserted 'there' and 'after' to the suffix tree, then for adding 'thereafter' to the suffix tree we don't need to add a direct child to the root node of the suffix tree. So it is a combination of two other words.
Complexity : O(nlogn) considering that building suffix tree would take O(n) time.
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
main()
{
char a[50],b[20],c[20];
int i,j=0,l=0;
printf("Enter a string:\n");
gets(a);
for(i=0;i<=strlen(a);i++)
{
if(a[i]!=32 && a[i]!='\0')
b[j++]=a[i];
else
{
b[j]='\0';
if(strlen(b)>l)
{
strcpy(c,b);
l=strlen(b);
}
j=0;
}
}
printf("The longest word is:");
puts(c);
}
can we create a hash table and put all the strings in that(calculate the index by summing up the chars of the strings). Start with the longest string (sorting can be done) or simply with the first string. suppose i have "there", i will first check whether t is present or not, then i will check whether "th" is present or not and so on. complexity will be O(n*m) n=no.of strings m=total no. of chars in a string
I think a sorting and trie should give the solution
1.Sort the dictionary in ascending order by string size
2.Starting from the smallest word, start inserting into the TRIE
3.If while inserting a word, we have not added any nodes and reach a dead end, take a pointer to the root, and as we insert the new nodes for the longer, move down the TRIE as long as we keep getting a match
4.When we insert the last node of the trie, we still have a matching character from the search, save this word and the length
5. do this till the end of the dictionary
6.Print the longest of the saved word
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.PriorityQueue;
import java.util.Scanner;
public class LongestCombinationString {
public static void main(String[] args) throws FileNotFoundException {
Scanner scanner = new Scanner(new FileInputStream("src/main/resources/words.txt"));
SuffixTree suffixTree = new SuffixTree();
PriorityQueue<String> queue = new PriorityQueue<>((s, t) -> t.length() - s.length());
while (scanner.hasNext()) {
String input = scanner.nextLine();
suffixTree.add(input);
queue.add(input);
}
while (!queue.isEmpty()) {
String word = queue.poll();
if(suffixTree.compositionSearch(word)) {
System.out.println(word.length());
break;
}
}
}
}
import java.util.HashMap;
import java.util.Map;
public class SuffixTree {
SuffixTreeNode root;
public SuffixTree() {
root = new SuffixTreeNode(true, null, false);
}
private boolean compositionSearch(final String reversedInput, int startIndex, boolean compositionSearch) {
SuffixTreeNode current = this.root;
while(startIndex < reversedInput.length()) {
Character ch = reversedInput.charAt(startIndex);
if (current.getChildren() == null) {
return false;
}
if (current.getChildren().containsKey(ch)) {
SuffixTreeNode treeNode = current.getChildren().get(ch);
if (treeNode.isLeaf()) {
if (startIndex == reversedInput.length() -1) {
return compositionSearch;
}
boolean compositionFound = compositionSearch(reversedInput, startIndex + 1, true);
if (compositionFound) {
return compositionFound;
}
}
startIndex = startIndex + 1;
current = current.getChildren().get(ch);
} else {
return false;
}
}
return false;
}
public boolean compositionSearch(String input) {
return compositionSearch(new StringBuilder(input).reverse().toString(), 0, false);
}
public boolean search(final String input) {
SuffixTreeNode current = this.root;
for(Character ch: new StringBuilder(input).reverse().toString().toCharArray()) {
if (current.getChildren() == null) {
return false;
}
if (current.getChildren().containsKey(ch)) {
current = current.getChildren().get(ch);
} else {
return false;
}
}
return current.isLeaf();
}
public void add(final String input) {
if (input.isEmpty()) {
this.root.setLeaf(true);
}
SuffixTreeNode current = this.root;
for(Character ch: new StringBuilder(input).reverse().toString().toCharArray()) {
if (current.getChildren() == null) {
current.setChildren(new HashMap<>());
}
if (current.getChildren().containsKey(ch)) {
current = current.getChildren().get(ch);
} else {
SuffixTreeNode temp = new SuffixTreeNode(false, null, false);
current.getChildren().put(ch, temp);
current = temp;
}
}
current.setLeaf(true);
}
}
class SuffixTreeNode {
private boolean isRoot;
private Map<Character, SuffixTreeNode> children;
private boolean isLeaf;
public void setRoot(boolean root) {
isRoot = root;
}
public Map<Character, SuffixTreeNode> getChildren() {
return children;
}
public void setLeaf(boolean leaf) {
isLeaf = leaf;
}
public boolean isRoot() {
return isRoot;
}
public void setChildren(Map<Character, SuffixTreeNode> children) {
this.children = children;
}
public boolean isLeaf() {
return isLeaf;
}
public SuffixTreeNode(boolean isRoot, Map<Character, SuffixTreeNode> children, boolean isLeaf) {
this.isRoot = isRoot;
this.children = children;
this.isLeaf = isLeaf;
}
}
I am thinking of a directed graph.
Given 2 strings, A and B, there exists an edge A -> B iff A \subset B. We do this for every pair of words in the file. Edge weights are always 1, and edges are directed.
This gives an adjacency matrix which is not symmetric.
Now,
Calculate sum of all columns of this matrix, and pick the word corresponding to the maximum sum. This is the word which has the most substrings from within the file.
Eg:
the -> there
the -> thereafter
there -> thereafter
after ->thereafter
thereafter
reaf ->thereafter
Now, if we form a 6x6 adjacency matrix corresponding to the above graph (directed), the sum of the column corresponding to thereafter will have the maximum value.
The columns need to store the length of the substring that actually form the string.
If we go with the notation that if the -> thereafter =1 else 0, we wont get the exact solution.
Although the string may contain maximum substring, but it may not be the longest word.
ex.
the
there
after
thereafter
reaf
a
b
c
abc
Your solution will give abc as the answer because it has maximum number of substring(a,b,c), but the correct answer is "thereafter" because it has the maximum length.
Well if I understand it correctly then I think it is asked to find the longest string "in the file" that could be made from the other strings in that same file.
- MM April 30, 2013Going with that interpretation, one way to solve this problem would be to first create a sorted (based on string lengths) array of the strings in the file. Then start with the largest string in that array. In this case 'Thereafter'. Get the first character in that string i.e. 'T' and see if there are any other strings that start with it. We find: 'the' and 'there'.
Now we need to evaluate these two separately and see if we can form the entire word. If we go with 'the' the next word we should search for should start with 'R'. And we find 'reaf'. Now 'the' and 'reaf' give us just part of 'thereafter'. But now there is no string that matches the remainder of 'thereafter' (i.e. 'ter'). So we discard 'The'.
Now if we go with 'there', the next string that we should search for should start with 'A'. And we find that we have 'after' that matches with the remainder of the 'thereafter' (i.e. 'after').
A better solution can be worked out for this but this is what came to my mind immediately after reading the problem.