Given a dictionary and a strin

Google Interview Question for Software Developers

0

of 0 votes

18
Answers
Given a dictionary and a string, find all the substrings that are valid words in dictionary.

I was thinking of a Trie solution but I'm not sure a Trie will work easily to match sub words that begin in the middle of the string.
- Pedro July 28, 2016 in United States | Report Duplicate | Flag | PURGE
Google Software Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

if the string is short enough, we could just iterate over all substrings (n^2 substrings) and check the membership in the dictionary.

if the string is too long, we should build the trie like you mentioned

- zd July 28, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

package com.db.code.example;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ValidWords {	
	public static void main(String []args){
		List<String> dictionary = new ArrayList<String>();
		dictionary.add("abc");
		dictionary.add("bc");
		dictionary.add("xc");
		dictionary.add("ac");
		dictionary.add("a");
		
		String str= "abcde";
		Set<String> strSet1= new HashSet<>();
		int length=str.length();
		String [] strArr = new String[length];
		
	    Set<String> strSet2= new HashSet<>();
	    long  startTime = System.currentTimeMillis();	
		for (int i = 0; i < str.length(); ++i) {
		      for (int j = 0; j < (str.length()-i); ++j) {
		    	  String subStr=str.substring(j, i + j + 1);
		    	  if(dictionary.contains(subStr)){
		    		  strSet2.add(subStr);
		    	  }
		      }
		    }	
		
		
	    for(String strValue: strSet2){
	    	System.out.println(strValue);
	    }
	}

}

- Ramkrishna July 28, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

You could use Ukkonen's Algorithm to build the suffix tree in linear time, but I would start with this O(n log n) approach (and maybe fail the interview, but what the heck...)

package main

import "fmt"

var (
	ctr  int
	dict map[string]bool = map[string]bool{
		"anti":                  true,
		"dis":                   true,
		"disestablish":          true,
		"establish":             true,
		"establishment":         true,
		"establishmentarian":    true,
		"establishmentarianism": true,
		"ment":                  true,
		"ism":                   true,
	}
)

func main() {
	m := substrings("antidisestablishmentarianism")
	for k, _ := range m {
		if dict[k] {
			fmt.Println(k)
		}
	}
	fmt.Printf("%d operations\n", ctr)
}

func substrings(s string) map[string]struct{} {
	m := map[string]struct{}{}
	substringsR(s, m)
	return m
}

func substringsR(s string, m map[string]struct{}) {
	if _, ok := m[s]; ok {
		return
	}
	ctr++
	m[s] = struct{}{}
	if len(s) > 1 {
		substringsR(s[1:], m)
		substringsR(s[0:len(s)-1], m)
	}
}

- agm July 28, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

The standard algo for this is aho corasick. However, I have a hard time believing this is an actual interview question since getting aho corasick right in an interview setting is one-in-a-thousand type of feat.

- Anonymous July 28, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Assuming that it is not a null string, and is single word not a sentence. we can convert it in the given string to character array and later have two for loops and pick up each letter and combine with other letters from character array ans check whether they exist in dictonary and if yes store them and later print the stored values.

- Sai July 28, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

void findsubString(int char[] arr, set Dictionary)
{
	int temp=0;
	int n=arr.length();
	for(int i=0;i<n-2;i++)
	{
		for(j=i+1;j<n-1;j++)
		{
			if(dictionary.contains(arr.substring(i,j)))   temp++;
		}
	}
	System.out.printLn(temp);
}

- Mk July 28, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

create a trie with the dictionary

for start in range(0, len(string)):
	for end in range(i+1, len(string)):
		yield tri.is_a_word(string[start:end]

- chromehearts July 29, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

do we just need to find all sub strings from string that are present in dictionary, if yes then it is really simple. Am i missing something here ?

public List<string> FindAllSubstrings(List<string> dict, string str)
        {
            List<string> output=new List<string>();

            foreach (var currStr in dict)
            {
                if (str.IndexOf(currStr) > -1)
                    output.Add(currStr);
            }

            return output;
        }

- iwbr August 16, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Brute force: represent the dictionary as a hash set. Search every substring (i,j). Time complexity O(n^2) where n is the string length.

You can't do any better than the brute force in worst case complexity, because each substring can potentially be in the dictionary, so you have to at least print every one of them and that's O(n^2).

However, representing the dictionary as a trie tree improves the run time much more in common situations. You search the trie starting at each position i in the string, and it lets you terminate early. Run time is O(nm) where m is the length of longest word in dictionary. This is better than brute force if m << n.

- Anonymous September 08, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Can't do better worst case than brute force O(n^2) because you need to potentially print every substring.

Using a trie tree will make the run time O(nm) where m is the length of the longest word in dictionary, and it's much better than brute force if m << n.

- rpzw September 08, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Given a string and a dictionary, returns all substrings of the string that exist in the dictionary. Time complexity is O(n^2)

public static List<String> getSubStringsFromDictionary(final String str, final List<String> dictionary) {
		List<String> result = new ArrayList<String>();
		for(int i=0; i<str.length(); i++) {
			String s = String.valueOf(str.charAt(i));
			// check for each characters, for example, a is substring of abc
			if(dictionary.contains(s)) {
				result.add(s);
			}
			// check for subsequent chars, for example, ab, abc, abcd etc.
			for(int j=i+1; j<str.length(); j++) {
				s = s + String.valueOf(str.charAt(j));
				if(dictionary.contains(s)) {
					result.add(s);
				}
			}
		}
		return result;
}

// test
public static void main(String[] args) {
    List<String> dictionary = Arrays.asList("abc", "kx",    
       "acd", "bc", "abj", "ab", "bcd");
  
System.out.println(getSubStringsFromDictionary("abcd", dictionary));
}
// output: [ab, abc, bc, bcd]

- code.runner September 20, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Build a suffix array on the dictionary entries.
Insert all the suffixes of the input string into the array and look at longest common prefixes they make with suffix array entries.
Complexity is linear in total length of dictionary entries.

- Dant3 October 29, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Build a suffix trie
DFS the suffix trie one character at a time and check of the path from root to current node (prefix of the current suffix) is present in the dictionary

- sanjogj43 November 17, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

C++ solution

This solution assumes substrings only consists of letters
next to each other. The strategy is to start with the full
string, and instead of a standard Trie search, when a string
is found, keep searching. Then keep chopping off the first
letter and restarting the search.

void getStrings(TrieNode* trie,
                const char* string, std::list<std::string>& strings)
{
    strings.clear();
    if (!trie)   return;
    if (!string) return;
    
    // Loop through the full string
    while (string[0])
    {
        // Get all sub-strings
        getAllStrings(trie, string, strings);

        // Chop off the left-most letter
        ++string;
    }
}

void getAllStrings(TrieNode* trie,
                   const char* word, std::list<std::string>& strings)
{
    if (!trie) return;
    if (!word) return;
    
    TrieNode* crawl = trie;
    
    // Search the entire word
    while (word[0])
    {
        int index = charToIndex(word[0]);
        
        // No more valid words possible
        if (!crawl->children[index]) return;
        
        crawl = crawl->children[index];
        
        // Add the sub-string if found
        if (crawl->leaf) strings.push_back(word);
        
        // Keep searching for more sub-strings
        ++word;
    }
}

Extra stuff needed for the solution above

#include <list>
#include <string>

const int NUM_LETTERS = (int)'z' - (int)'a' + 1;

struct TrieNode
{
    TrieNode* children[NUM_LETTERS];
    bool leaf;
    
    TrieNode() : leaf(false)
    {
      for (size_t i = 0; i < NUM_LETTERS; ++i)
        children[i] = NULL;
    }
};

int charToIndex(const char& ch)
{
    if ((int)ch >= (int)'a') return (int)ch - (int)'a';
    return (int)ch - (int)'A';
}

- Josh May 25, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Humm in wich way would a Trie would help to solve the problem?? As the writer mentions in the question what happens with the substrings starting at the middle of the string?? The only answer I can think of is to build a trie for each letter of the string but in that case it would be better to build all the substrings. Also what kind of information can you exploit using a prefix??. Using the string "cara" c and ca and cara might not be on the dictionary but car might be. You could try to do something smart like trying to get common prefixes on the words on the dictionary to try to remove unsuccessful tries as fast as possible but that doesn't change the complexity
Python code:

def find_all_valid_substrings(s, d):
	solutions = set()
	for x in xrange(len(s)):
		for y in xrange(x+1, len(s)):
			if s[x:y] in d:
				solutions.append(s[x:y])
    return solutions

- Fernando May 25, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

{package com.db.code.example;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ValidWords {
public static void main(String []args){
List<String> dictionary = new ArrayList<String>();
dictionary.add("abc");
dictionary.add("bc");
dictionary.add("xc");
dictionary.add("ac");
dictionary.add("a");

String str= "abcde";
Set<String> strSet1= new HashSet<>();
int length=str.length();
String [] strArr = new String[length];

Set<String> strSet2= new HashSet<>();
long startTime = System.currentTimeMillis();
for (int i = 0; i < str.length(); ++i) {
for (int j = 0; j < (str.length()-i); ++j) {
String subStr=str.substring(j, i + j + 1);
if(dictionary.contains(subStr)){
strSet2.add(subStr);
}
}
}

for(String strValue: strSet2){
System.out.println(strValue);
}
}

}
}

- Ramkrishna July 28, 2016 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

void findsubString(int char[] arr, set Dictionary)
{
int temp=0;
int n=arr.length();
for(int i=0;i<n-2;i++)
{
for(j=i+1;j<n-1;j++)
{
if(dictionary.contains(arr.substring(i,j))) temp++;
}
}
System.out.printLn(temp);
}

CareerCup

Google Interview Question for Software Developers

Books

Videos

Resume Review

Mock Interviews