Google Interview Question
Software DevelopersCountry: United States
Interview Type: In-Person
package com.db.code.example;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class ValidWords {
public static void main(String []args){
List<String> dictionary = new ArrayList<String>();
dictionary.add("abc");
dictionary.add("bc");
dictionary.add("xc");
dictionary.add("ac");
dictionary.add("a");
String str= "abcde";
Set<String> strSet1= new HashSet<>();
int length=str.length();
String [] strArr = new String[length];
Set<String> strSet2= new HashSet<>();
long startTime = System.currentTimeMillis();
for (int i = 0; i < str.length(); ++i) {
for (int j = 0; j < (str.length()-i); ++j) {
String subStr=str.substring(j, i + j + 1);
if(dictionary.contains(subStr)){
strSet2.add(subStr);
}
}
}
for(String strValue: strSet2){
System.out.println(strValue);
}
}
}
You could use Ukkonen's Algorithm to build the suffix tree in linear time, but I would start with this O(n log n) approach (and maybe fail the interview, but what the heck...)
package main
import "fmt"
var (
ctr int
dict map[string]bool = map[string]bool{
"anti": true,
"dis": true,
"disestablish": true,
"establish": true,
"establishment": true,
"establishmentarian": true,
"establishmentarianism": true,
"ment": true,
"ism": true,
}
)
func main() {
m := substrings("antidisestablishmentarianism")
for k, _ := range m {
if dict[k] {
fmt.Println(k)
}
}
fmt.Printf("%d operations\n", ctr)
}
func substrings(s string) map[string]struct{} {
m := map[string]struct{}{}
substringsR(s, m)
return m
}
func substringsR(s string, m map[string]struct{}) {
if _, ok := m[s]; ok {
return
}
ctr++
m[s] = struct{}{}
if len(s) > 1 {
substringsR(s[1:], m)
substringsR(s[0:len(s)-1], m)
}
}
Assuming that it is not a null string, and is single word not a sentence. we can convert it in the given string to character array and later have two for loops and pick up each letter and combine with other letters from character array ans check whether they exist in dictonary and if yes store them and later print the stored values.
do we just need to find all sub strings from string that are present in dictionary, if yes then it is really simple. Am i missing something here ?
public List<string> FindAllSubstrings(List<string> dict, string str)
{
List<string> output=new List<string>();
foreach (var currStr in dict)
{
if (str.IndexOf(currStr) > -1)
output.Add(currStr);
}
return output;
}
Brute force: represent the dictionary as a hash set. Search every substring (i,j). Time complexity O(n^2) where n is the string length.
You can't do any better than the brute force in worst case complexity, because each substring can potentially be in the dictionary, so you have to at least print every one of them and that's O(n^2).
However, representing the dictionary as a trie tree improves the run time much more in common situations. You search the trie starting at each position i in the string, and it lets you terminate early. Run time is O(nm) where m is the length of longest word in dictionary. This is better than brute force if m << n.
Given a string and a dictionary, returns all substrings of the string that exist in the dictionary. Time complexity is O(n^2)
public static List<String> getSubStringsFromDictionary(final String str, final List<String> dictionary) {
List<String> result = new ArrayList<String>();
for(int i=0; i<str.length(); i++) {
String s = String.valueOf(str.charAt(i));
// check for each characters, for example, a is substring of abc
if(dictionary.contains(s)) {
result.add(s);
}
// check for subsequent chars, for example, ab, abc, abcd etc.
for(int j=i+1; j<str.length(); j++) {
s = s + String.valueOf(str.charAt(j));
if(dictionary.contains(s)) {
result.add(s);
}
}
}
return result;
}
// test
public static void main(String[] args) {
List<String> dictionary = Arrays.asList("abc", "kx",
"acd", "bc", "abj", "ab", "bcd");
System.out.println(getSubStringsFromDictionary("abcd", dictionary));
}
// output: [ab, abc, bc, bcd]
C++ solution
This solution assumes substrings only consists of letters
next to each other. The strategy is to start with the full
string, and instead of a standard Trie search, when a string
is found, keep searching. Then keep chopping off the first
letter and restarting the search.
void getStrings(TrieNode* trie,
const char* string, std::list<std::string>& strings)
{
strings.clear();
if (!trie) return;
if (!string) return;
// Loop through the full string
while (string[0])
{
// Get all sub-strings
getAllStrings(trie, string, strings);
// Chop off the left-most letter
++string;
}
}
void getAllStrings(TrieNode* trie,
const char* word, std::list<std::string>& strings)
{
if (!trie) return;
if (!word) return;
TrieNode* crawl = trie;
// Search the entire word
while (word[0])
{
int index = charToIndex(word[0]);
// No more valid words possible
if (!crawl->children[index]) return;
crawl = crawl->children[index];
// Add the sub-string if found
if (crawl->leaf) strings.push_back(word);
// Keep searching for more sub-strings
++word;
}
}
Extra stuff needed for the solution above
#include <list>
#include <string>
const int NUM_LETTERS = (int)'z' - (int)'a' + 1;
struct TrieNode
{
TrieNode* children[NUM_LETTERS];
bool leaf;
TrieNode() : leaf(false)
{
for (size_t i = 0; i < NUM_LETTERS; ++i)
children[i] = NULL;
}
};
int charToIndex(const char& ch)
{
if ((int)ch >= (int)'a') return (int)ch - (int)'a';
return (int)ch - (int)'A';
}
Humm in wich way would a Trie would help to solve the problem?? As the writer mentions in the question what happens with the substrings starting at the middle of the string?? The only answer I can think of is to build a trie for each letter of the string but in that case it would be better to build all the substrings. Also what kind of information can you exploit using a prefix??. Using the string "cara" c and ca and cara might not be on the dictionary but car might be. You could try to do something smart like trying to get common prefixes on the words on the dictionary to try to remove unsuccessful tries as fast as possible but that doesn't change the complexity
Python code:
def find_all_valid_substrings(s, d):
solutions = set()
for x in xrange(len(s)):
for y in xrange(x+1, len(s)):
if s[x:y] in d:
solutions.append(s[x:y])
return solutions
{package com.db.code.example;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class ValidWords {
public static void main(String []args){
List<String> dictionary = new ArrayList<String>();
dictionary.add("abc");
dictionary.add("bc");
dictionary.add("xc");
dictionary.add("ac");
dictionary.add("a");
String str= "abcde";
Set<String> strSet1= new HashSet<>();
int length=str.length();
String [] strArr = new String[length];
Set<String> strSet2= new HashSet<>();
long startTime = System.currentTimeMillis();
for (int i = 0; i < str.length(); ++i) {
for (int j = 0; j < (str.length()-i); ++j) {
String subStr=str.substring(j, i + j + 1);
if(dictionary.contains(subStr)){
strSet2.add(subStr);
}
}
}
for(String strValue: strSet2){
System.out.println(strValue);
}
}
}
}
if the string is short enough, we could just iterate over all substrings (n^2 substrings) and check the membership in the dictionary.
- zd July 28, 2016if the string is too long, we should build the trie like you mentioned