Facebook Interview Question
Software EngineersCountry: United States
You mention that you're looking for all possible strings. Are you actually interested in a solution that recovers all possibilities or in a solution that isn't exhaustive but that covers many possibilities? My first thought is to use frequency analysis associated with a given language (such as english) to determine which characters are infrequently used and to use those to determine words in the dictionary that contain them. Once those words are located one can take note of all of the other characters in those words and check to see if those characters are present in the string input thereby eliminating many common characters without the need to check through all possibilities using brute-force. There are issues with this approach, though, because infrequently used characters can be located anywhere in a word (albeit there are patterns that can be used to reduce the word checking necessary). But this type of approach may result in leaving some sentence possibilities out. Coupled with frequency analysis on letters that start words, etc. One can narrow down the possibilities quite a bit. However, some word combinations may not surface. Also, is grammar a consideration here? Would you like the produced strings to be grammatically correct?
I noticed you mention the task is to return all possible strings in your post. However, do you actually mean that all possibilities should be printed? At first glance, what comes to mind when looking at this problem is frequency analysis. Keeping track of characters in the language and the frequency with which they occur. When one receives the scrambled string they could store all of the characters and their associated count in a dictionary. Using a list-like structure, they could then iterate from the least frequently used to the most frequently used characters in the language. If those characters are present in the string, the language dictionary can be referenced for words that start with or contain those infrequently used characters. Using this approach one can decrease the total number of words that need to be checked because those infrequently used characters will, obviously, be surrounded by additional characters that will be more frequent in the language eliminating the need to brute-force frequently used characters which would require analyzing huge numbers of words and combinations. To increase the probability of correctness, one could incorporate additional frequency tables such as one that indicates characters that start words, frequently used words (such as "the," "and,"), etc. Also, does grammatical correctness factor in here? If so, that can be further used to eliminate word combinations that don't make sense and, therefore, to further reduce the total complexity of the solution.
There are 2 problems here:
1) Given a string and dictionary of words, add spaces to create sentence with words.
2) Create all combinations of string and check if a sentence can be formed using dictionary as explained in #1.
#1 is straightforward can be implemented in O(N).
#2 is n^n, how to optimize it.
One solution is to create suffix tree and discard all strings which can not be formed sentence.
For ex:
if Dictionary has words "java", "code", "here"
All possibilities which do not start with "j", "c" or "h" can be discarded.
Same logic cabe recursively applied to discard further possibilities and pass string to #1 only which can be converted to valid sentence.
Hope writing code should not be difficult.
I still think that we have some string X and should find all words from dictionaries that formed the same X where they make sentence S and and S is transformed to X ( transformation means - remove whitespace and shuffle random letters in each word) . So if there are word's anagrams they also could form some new S.
For example - one case for X = "Ajavdoeceehr" is java, code , here is hava, code here are part of dictionary or could be some word dictionary for example Javdao ( shuffle to Ajavdo) and heeceer ( shuffle to eccehr)
pseudo code
{{
Hashtable<int, Vector<string>> hashTable;
//NOT base encoded vn * 2^32 + v(n-1) * 2^31 ..
int computeWeight(string word) {
int sum = 0;
for(char c:word) {
sum += (int) c;
}
return sum;
}
//prepare data set
for (string word in dictionary) {
int weight = computeWeight(word)
if (weight not in hashTable {
hashTable[weight] = new Vector<string>();
}
hashTable[weight].add(word);
}
string givenMessedString = "decovajapleexam"; //code java example
//pick individual letters, then combination of letters to make words
//d, e, c, .., de, co, dec, deco, decova
for (generatedstring from givenMessedString) {
int weight = computeWeight(generatedstring)
if (weight in hashTable) {
for (each word in hashTable[weight]) {
print "possible word is ", word
}
}
}
}}
My approach (brute force):
1. Find all possible string combinations [ie: get a List<List<string>>]
2. For each possible combination check whether every word in that combination is a valid dictionary word.
2. is trivial so I'll only talk about 1.
Let's have an example to motivate our work. Given an initial sentence of "a hi" this sentence is transformed to "aih". Since we have no idea where the word break(s) are the best we can do is generate all possible words.
aih
a ih
a hi
a i h
Note in writing this out we already can see a kind of recursive structure. Similar to generating all permutations, but enforcing an ordering upon permutations as we care about word breaks.
To better illuminate this take the (very!) related problem of trying to count the number of ways to count up to a number N. This problem is related because counting the number of ways to count up to a number N is an identical problem to finding the number of ways to slice a string of length N. To count up to 3 we do:
3
12
21
111
So roughly speaking we have:
count(3) = [[3], [2, count(1)], [1, count(2)]]
count(n) = [[n], [i, count(n-i)] for i in range(1,n)]
I say roughly because we see our count function is returning a list of lists and so the types don't match up in this recursion. [The correct solution is to iterate over the list of lists returned from count and append them separately]
Anyways I leave it as an exercise to the reader to do the actual implementation with the string form.
This can be solved with dynamic programming.
Given X of length 'K', A dictionary of 'N' words of average size 'M'.
Let Set S = {s1, s2, ... sn} be a set of all possible lengths seen in the dictionary. (of max size N, since there are N words).
Ways(i) = For every length 's' in S, for anagram of a dictionary word that X.substr(i-s, i) is , ways(i -s, i) + matching dict word.
Solve for i = 0 to i = K, with base cases being empty strings i < 0.
The steps to solve would be
1. Create a sorted dictionary map from SortedString -> List<Dictionary Word> O(N. M. LogM)
2. For i = 0 to K
a. Calculate and store ways[i] based on above formula.
b. This can be done in N. M LogM time. (For each length, max N, sort the substring and compare to dictionary)
c. Overall complexity of this loop would be K. N. M Log M
Space complexity = K.(No. of possible solutions). I'm not sure what an accurate on no. of possible solutions would be. A very conservative estimate is 2^K+1 but that doesn't take into account that most of those possible solutions would be invalidated by dict check.
public IEnumerable<string> RestoreText(string s, HashSet<string> words)
{
var dict = new Dictionary<string, string>();
int maxLength = 0;
var lengths = new HashSet<int>();
var sb = new StringBuilder();
foreach (var word in words)
{
var key = GetKey(word);
dict.Add(key, word);
maxLength = Math.Max(maxLength, word.Length);
if (!lengths.Contains(word.Length))
lengths.Add(word.Length);
}
var result = new List<string>();
RestoreText(s, dict, 0, maxLength, lengths, sb, result);
return result;
}
private void RestoreText(string s, Dictionary<string, string> dict, int start, int maxLength, HashSet<int> lengths, StringBuilder sb, List<string> result)
{
if (start == s.Length)
{
if (sb.Length > 0)
sb.Remove(sb.Length - 1, 1);
result.Add(sb.ToString());
sb.Append(' ');
return;
}
var list = new List<char>();
for (int i= 0; i < maxLength; i++)
{
if (start + i >= s.Length)
return;
list.Add(s[start + i]);
if (!lengths.Contains(i + 1))
continue;
var key = GetKey(list);
if (dict.ContainsKey(key))
{
var word = dict[key];
sb.Append(word);
sb.Append(' ');
RestoreText(s, dict, start + i + 1, maxLength, lengths, sb, result);
sb.Remove(sb.Length - word.Length - 1, word.Length);
sb.Remove(sb.Length - 1, 1);
}
}
}
private string GetKey(List<char> list)
{
list.Sort();
var sb = new StringBuilder();
foreach (var item in list)
sb.Append(item);
return sb.ToString();
}
private string GetKey(string s)
{
var arr = s.ToArray();
Array.Sort(arr);
return new string(arr);
}
Steps:
1. Iterate over the given String dictionary, sort each word, and add the sorted word as key and original word as List of values . Since sorted sequence of many words(car,arc) can be same , all those values are added to the same key(sorted string , acr) and values will be list of string(car,arc).
2. Iterate over the given string and keep building string of lengh 0,1 , 2.... for each such substring , sort it and check if it is in the map we constructed
3. If yes, the check for the left over String and result=no of elements(list size) in the values for the current substring as key *(multiply) no of elements(list size) in the values for the result obtained from left over String
4. If no, continue to build the string and repeat step 3
import java.util.*;
public class StringSegmentationShuffle {
static String[] dict = {"car","arc","hello","hi"};
public static int findPossibleCombinations2(String str, HashMap<String,List<String>> map){
if(str.length()==0){
return 1;
}
for(int i=1;i<=str.length();i++){
String subString=str.substring(0,i);
char[] subStringCharArray = subString.toCharArray();
Arrays.sort(subStringCharArray);
String subStringSorted = new String(subStringCharArray);
if(map.containsKey(subStringSorted)){
String leftOver = str.substring(i,str.length());
int result = findPossibleCombinations2(leftOver,map);
if(result != -1){
return map.get(subStringSorted).size() * result;
}
}
}
return -1;
}
public static void main(String[] args) {
String str ="ymcra"; // my car , my arc : each word is shuffled first individually ( my=>ym and car=> cta) and then added to string
HashMap<String,List<String>> map = new HashMap<String,List<String>>();
for(int i=0;i<str.length()-1;i++){
String s = dict[i];
char[] sCharArray = s.toCharArray();
Arrays.sort(sCharArray);
String sortedString = new String(sCharArray);
if(map.containsKey(sortedString)){
map.get(sortedString).add(s);
}else{
List<String> list = new ArrayList<String>();
list.add(s);
map.put(sortedString,list);
}
}
System.out.println(findPossibleCombinations2(str,map));
}
}
First, iterate over the given dictionary, sort the letters in each word, and add the sorted word and original word to a multi-map as key and value, respectively. This multi-map has an entry for each group of anagrams (with the sorted letters as the key for the group).
- gen-y-s February 21, 2016Then, iterate on the given string, and insert each letter into a sorted array of letters. After adding a letter into the sorted array, check if the multimap contains the a key equal to the current value of the sorted array. If so, then we have found matching word(s), so we call the function recursively on the rest of the given string, and add to the count of possible strings the returned value multiplied by the number of words matching the key. Finally, we continue the main loop iteration to add the next letter to the sorted array.
Example: "ymacr". We start with y, check multi-map, no match, then add m to sorted array, so we have "my" - this key exists in multimap with one word "my". We call recursively, with "acr" - add "a" no match, add "c" -> "ac" no match, add "r" -> "acr" this key exists in multimap with values "car", "arc" so we return 2. We mutiply the 1 for "my" with 2 returned in recursive call, so we add 2 to the count. Then we continue, by adding "a" -> "amy" no match, add "c" -> "acmy" no match, add "r" -> "acmry" no match. So final value of count is 2.