Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
This is the first thing I thought off and I am pretty sure this is the simplest and the most efficient way.
Let's think about it's complexity:
Let length of input string = m
Then 1. is O(m) time and O(m) space
Let the number of words in the dictionary be = n
Let the length of the longest word in the dictionary be l
2. is O(n * l)
Total O(m + n*l) = O(n) Linear time
Let me know if you need more clarification or find any problems with this approach.
Initially we need to preprocess the dictionary. For every string in dictionary we need to sort it and store it as a key and value as original string in Hash Table.
Sort the given characters.
Let's assume that you have a function isWord(w), which checks if w is a word using a dictionary. Let's for simplicity also assume for now that you only want to know whether for some word w such a splitting is possible. This can be easily done with dynamic programming.
Let S[1..length(w)] be a table with Boolean entries. S[i] is true if the word w[1..i] can be split. Then set S[1] = isWord(w[1]) and for i=2 to length(w) calculate
S[i] = (isWord[w[1..i] or for any j in {2..i}: S[j-1] and isWord[j..i]).
Finally return longest string.
Time complexity nlogn
Correct me If I am wrong.
Find set of the input alphabet.
Find set of dictionary word.If both are same then check length of the word. Doesn't not solve the problem.
I assume that all characters from input must be in dictionary word for this to work.
If the dictionary is given as a set of Strings, then we can just iterate over the dictionary and check whether that particular word can be formed with the given letters. If it is longer than the currently longest such word, then we store the word. At the end we can then return the longest such word.
Runtime is O(n + m*l) where n is the length of the whole dictionary (as if it were one string), m is the number of words in the dictionary and l is the length of the letters string.
Space complexity is O(w) where w is the length of the longest word in the dictionary.
public String getLongest(Set<String> dict, String letters) {
String result = "";
for (String s : dict)
if (s.length() > result.length() && isWordOfLetters(s, letters))
result = s;
return result.length() == 0 ? null : result;
}
private boolean isWordOfLetters(String s, String letters) {
Map<Character, Integer> occ = new HashMap<Character, Integer>();
for (char c : s.toCharArray())
if (!occ.containsKey(c))
occ.put(c, 1);
else
occ.put(c, occ.get(c) + 1);
for (char c : letters.toCharArray())
if (occ.containsKey(c))
occ.put(c, occ.get(c) - 1);
for (int i : occ.values())
if (i > 0)
return false;
return true;
}
You sure your code works ?
You might want to loop over letters before looping over s.
Or better yet (optimization), you should do the loop over letters once in the outter function, and reuse it in the inner function:
public String getLongest(Set<String> dict, String letters) {
// get letter counts for the input/letters string
Map<Character, Integer> letocc = new HashMap<Character, Integer>();
for (char c : letters.toCharArray())
if (!letocc.containsKey(c))
letocc.put(c, 1);
else
letocc.put(c, letocc.get(c) + 1);
// check if each word is longer than previous longest valid word
String result = "";
for (String s : dict)
if (s.length() > result.length() && isWordOfLetters(s, letters, letocc))
result = s;
return result.length() == 0 ? null : result;
}
private boolean isWordOfLetters(String s, String letters, Map<Character, Integer> letocc) {
Map<Character, Integer> occ = new HashMap<Character, Integer>();
//count occurrences of letters in word 's' from dictionary
for (char c : s.toCharArray())
if (!occ.containsKey(c))
occ.put(c, 1);
else
occ.put(c, occ.get(c) + 1);
//if any letter in input string doesn't exist or exists in less number
//in the word s, then the word doesn't pass our test
for (char c : letters.toCharArray())
if (!occ.containsKey(c) || occ.get(c)<letocc.get(c))
return false;
return true; //word has all the characters of input string (letters)
}
S O U N D W A V E, thanks for the comment! Yes, I'm sure my code works, I've tried it with several inputs and it always returned the correct result.
You are right, the optimization is worth doing. Although the problem with your approach is that you actually lose all of the optimization you're doing. If you look at your code, you precompute the occurrences of the letters, yet you still iterate over all the letters in the isWordOfLetters function. If you want the optimization to actually take effect, you should only iterate over the EntrySet of letocc.
Also, I think your code would not work this way, because if there is a letter in letters, which does not appear in the given word, then you return false... In this case you shouldn't return false, because the word we are looking at does not necessarily contain all the letters of the given letters. In fact, you even return false if a letter occurs in the word but less times than in the given letters. This would also be a valid case, yet you return false.
The question asks: "find the longest word that only uses letters from the string", but it does not say that it has to use all letters.
@ Zoli,
I think u may make a mistake in your code. When you check the hashtable value:
for (int i : occ.values())
if (i > 0)
return false;
return true;
I think you should return false when the hashtable value is smaller than 0. That means the word in dictionary contain the char which is not in the string letters
Please let me know if this approach is possible:
Let's say the dictionary is a hashmap.
Permute all the combinations of words possible using the characters in the string.
Sort by length and start checking if the word exists in the dictionary.
Permutating through all combinations of letters of different length will be a very high number. I am sure there will be a better way to do it. Lets wait.
Assuming #letters in the given string is n
Approach 1:
1. Find and put all sustrings of string in array A. Run time : O(n^2)
2. Sort this array. Run time O(n^2 * log(n^2)
3. Starting from longest entry in this array, check if present in dictionary. A match will give the ans. Run time O(n^2) assuming lookup time is O(1)
so total run time = O(n^2 * log n^2)
Approach 2:
Assuming dictionary is implemented as trie.
For each char in the string S, walk its trie entry in dictionary, checking to see if the path we're walking in trie has characters present from the string.
E.g. Say input string = "abzd"
Dictionary looks something like this:
b
a b c ... z
so we know 'b' is present in string and in dictionary, we follow this entry
'ba' is present in dictionary, as seen from above trie, its also present in string and so on.
Run time: O(n^2)
@puneet - the letters in the substring can be in any order (since given is a list of letters, and we need to find words that uses only those letters irrespective of the order). So for you approach 1, finding all possible substrings would take more that O(n^2) in this case, I think.
we take 2-stage method:
1) break the input long string into substring by removing all letters not in the given letters. O(n) time complexity
2) now the task become: for each one of substrings, what's the longest valid word inside it.
this step should o(n^2) time complexity, any better way?
Actually, for a given string, and a dictionary, we could find all words inside the string very efficiently using Aho–Corasick string matching algorithm. It's O(n) algorithm with enhanced prefix tree, but, if I didn't know that in advance and can not figure it out my own during an interview, nobody will complain me, right:)
Initially we need to preprocess the dictionary. For every string in dictionary we need to sort it and store it as a key and value as original string in Hash Table.
Sort the given characters. And generate power set of given characters(We get 2 power n strings). Check each string in power set is available in hash table. if it available note it down string and it's length.
Finally return longest string.
Time complexity 2 power n
Its a good method. Only caveat I see is, with a large # of entries in dictionary, your hash table would take up a lot of space..
You should have asked for dictionary API.
1) If it is a HashMap<String, String>, then there is not much to do except for checking all pairs.
2) If it is a Binary Search Tree, and it supports ceil and floor, OR, it is a more sophisticated structure that can tell you given a string "abc" if there are any words having "abc" as prefix, then you could do a backtracking and search through the words which exactly contain your letters.
Rough idea:
static int max_length;
static String longest_word;
BacktrackingFunc(String prefix) {
if (!dictionary.containsPrefix(prefix))
return;
if (prefix.length() > max_length) {
longest_word = prefix;
max_length = prefix.length();
}
for (int i = 0; i < given_str.length(); i++) {
BackTrackingFunct(prefix + given_str.charAt(i));
}
}
By taking the second approach, you will at worst search for all the words in the dictionary. But that would be the case where the dictionary only contains words made of given letters.
To get an idea, consider the case where you have letters "a, b, c" and a very large dictionary.
Then, to find a word made of "a,b,c" and length "N" you will at worst call the dictionary 3^N times, but that is the case where all such combinations exists in the dictionary.
I have a linear algorithm.
Let's say we have a set of letter, S, and a list of dictionary, dict.
step
1 define a type freq, which is a array of 26 letters; counting the frequency of each letters in S and store them into S_freq; counting the frequency of each letter of each word in dict(i.e. store in dicct_Freq);
2 compare function: bool compare(freq a, freq b); return true if b is in a; this algorithm is constant with worse case 26 comparisons.
3 compare every freq in dict_freq with S_freq and stop when compare(S_freq, dict_freq[i]) returns true; then max = dict[i].size(); store the index in max_index;
4 go over the rest of list and compare when dict.size() > max; track the index of max;
5 return dict[max_index]
This can be done in O(nlogn) where n is the length of the search string ( or the characters to use)
The key is to store the dictionary that makes it amenable to do O(1) lookups. Dictionary should be stores as a HashMap where key is sorted unique set of characters and the value is a linked list of all words in the dictionary that uses all those characters. The linked list must always be in descending order of word lengths.
Lookup: Just sort input characters, lookup in the HashMap and return the 1st word in the linkedlist - O( time to sort chars )
Inserting a new word in the dictionary: Sort chars of the word, remove redundant chars, lookup the linked list in the HashMap and insert the new word at the right spot in the LinkedList (so that we preserve the descending world length criteria) - O (time to sort chars) + O(number of words in that bucket)
Code
public class SortedDictionary {
private final Map<String, List<String>> dictionary;
public SortedDictionary() {
dictionary = new HashMap<String, List<String>>();
}
public static SortedDictionary create(String[] words) {
SortedDictionary dictionary = new SortedDictionary();
for (String word : words) {
dictionary.add(word);
}
return dictionary;
}
public void add(String word) {
String sortedString = getSortedString(word);
List<String> words = dictionary.get(sortedString);
if (words == null) {
words = new LinkedList<String>();
words.add(word);
} else {
insertAtSortedLocation(word, words);
}
dictionary.put(sortedString, words);
}
private void insertAtSortedLocation(String word, List<String> words) {
int index = 0;
for (String str : words) {
if (str.length() <= word.length()) {
words.add(index, word);
return;
}
index++;
}
words.add(word);
}
private String getSortedString(String str) {
char[] sorted = str.toLowerCase().toCharArray();
Arrays.sort(sorted);
StringBuffer result = new StringBuffer();
boolean firstChar = true;
char lastChar = ' ';
for (char c : sorted) {
if (firstChar || c != lastChar) {
lastChar = c;
firstChar = false;
result.append(c);
}
}
return result.toString();
}
public String search(String letters) {
String sortedString = getSortedString(letters);
List<String> list = dictionary.get(sortedString);
if (list == null) return null;
return list.get(0);
}
}
Saving some space from a normal hash map implementation, but pretty much the same idea:
# assume some dictionary and string such as:
# dict = ["abbad", "dandelion", "z", "daabbadd", "qqqqqqqqqqqqq"]
# str = "dab"
longest = ""
key = longest_size = 0
str.split("").each {|s| key |= (1<<(s.ord - "a".ord))}
dict.each do |r|
v = size = 0
r.split("").each {|s| (v |= (1<<(s.ord - "a".ord))) && size+=1}
((v ^ key) == 0 && longest_size < size) ? ((longest = r) && (longest_size = size)) : next
end
puts longest
O(n) complexity without making a hash map. Exploits the fact that characters incrementally increase and the given characters in a string can therefore map to a 26 bit bit-mask.
Why is everybody talking about inserting things into hash tables? Maybe I don't understand the problem. Hash tables will give you a false negative when your target string has extra characters. Also, testing all of the permutations of the target string will be very expensive even at a cost of O(1) for testing them against the dictionary.
Maybe the key is that a dictionary usually doesn't have that many words in it compared to the number of permutations of a character set. 12! is about 479 million, but there probably aren't more than thousands of 12-letter words in the English language. Therefore, you probably want to test the dictionary against the target string, NOT permutations of the target string against the dictionary.
In fact, given today's L1/L2 cache architectures, I wouldn't be surprised if the brute force method is a very efficient way to do this - iterate through the dictionary, checking letters against the target string. On paper, sorting both dictionary and target strings will reduce comparison time from polynomial to linearithmic (where n is the size in characters of the dictionary). In reality, I'm not sure how much it helps with small strings.
To get to linear time, we could just construct a bit vector with a bit for each code point in the character set, set to true if the character exists in the target string. Things get even more interesting if you can assume a small alphabet - that bit vector fits in a register. Then a bitwise AND operation will tell you if the dictionary word can get its characters from the target string. This is still technically a linear solution, but I bet it's orders of magnitude faster. If you want to run comparisons against many target strings, you could preprocess the dictionary into a bit vector per word as well.
What about the following idea:
Use a HashMap that contains an Integer as a key and a Heap of Strings as the value. The key represents the used letters of the word, e.g.:
a = 1
b = 2
c = 4
...
abcaabcabc = 7
public class LongestWordOfString {
private static final Map<Character, Integer> _idMap = new HashMap<Character, Integer>();
private final Map<Integer, PriorityQueue<String>> _dict = new HashMap<Integer, PriorityQueue<String>>();
Comparator<String> stringLenComparator = new Comparator<String>() {
@Override
public int compare(final String o1, final String o2) {
if (o1 != null && o2 != null) {
// if the length is equal it doesn't matter which word will be returned
return o1.length() > o2.length() ? -1 : 1;
} else if (o1 != null) {
return -1;
} else if (o2 != null) {
return 1;
} else {
return 0;
}
}
};
static {
// support only letters a...z
final String tmp = "abcdefghijklmnopqrstuvwxyz";
for (int i = 0; i < tmp.length(); ++i){
_idMap.put(tmp.charAt(i), (int)Math.pow(2, i));
}
}
public LongestWordOfString(final String... dictInput) {
if (dictInput != null) {
for (int i = 0; i < dictInput.length; ++i) {
int id = determineWordId(dictInput[i]);
if (_dict.containsKey(id)) {
_dict.get(id).add(dictInput[i]);
} else {
final PriorityQueue<String> queue = new PriorityQueue<String>(10, stringLenComparator);
queue.add(dictInput[i]);
_dict.put(id, queue);
}
}
}
}
private final int determineWordId(String word) {
int retVal = -1;
if (word != null && word.length() > 0) {
word = word.toLowerCase(); // to support also upper letters for the determination of the ID
retVal = 0;
for (int i = 0; i < word.length(); ++i) {
// assuming only aA...zZ exists
retVal |= _idMap.get(word.charAt(i));
}
}
return retVal;
}
public String determineLongestWordOfStringLetters(final String inputString) {
String retVal = null;
int id = determineWordId(inputString);
if (_dict.containsKey(id)) {
retVal = _dict.get(id).peek();
}
return retVal;
}
}
...
@Test
public void test1() {
LongestWordOfString lwos = new LongestWordOfString("abcabc", "abc", "AAAAAAAAAAAAAAABC", "xzasdf");
System.out.println(lwos.determineLongestWordOfStringLetters("abc"));
// AAAAAAAAAAAAAAABC
}
The Heap is also removable if you only have to keep track of the longest String to a certain ID.
With this solution you can determine the word in constant time.
static List getLongest(String [] arr, String dictionary){
int [] dicCount = new int[256];
HashMap<Integer,List<String>> res = new HashMap<>();
int max = Integer.MIN_VALUE;
for(char c: dictionary.toCharArray()){
dicCount[c]++;
}
for(String s: arr){
if(s.length()<=dictionary.length()&&matches(dicCount,s)){
if(s.length()>max){
if(max!=Integer.MIN_VALUE) res.remove(max);
max=s.length();
res.put(s.length(),new ArrayList<String>());
res.get(max).add(s);
}
else if(s.length()==max){
res.get(max).add(s);
}
}
}
return res.get(max);
}
static boolean matches(int [] a,String target ){
int [] dicCount = Arrays.copyOfRange(a, 0, a.length);
for(int i =0 ;i<target.length(); i++){
--dicCount[target.charAt(i)];
if(dicCount[target.charAt(i)]<0) return false;
}
System.out.println(target);
return true;
}
static List getLongest(String [] arr, String dictionary){
int [] dicCount = new int[256];
HashMap<Integer,List<String>> res = new HashMap<>();
int max = Integer.MIN_VALUE;
for(char c: dictionary.toCharArray()){
dicCount[c]++;
}
for(String s: arr){
if(s.length()<=dictionary.length()&&matches(dicCount,s)){
if(s.length()>max){
if(max!=Integer.MIN_VALUE) res.remove(max);
max=s.length();
res.put(s.length(),new ArrayList<String>());
res.get(max).add(s);
}
else if(s.length()==max){
res.get(max).add(s);
}
}
}
return res.get(max);
}
static boolean matches(int [] a,String target ){
int [] dicCount = Arrays.copyOfRange(a, 0, a.length);
for(int i =0 ;i<target.length(); i++){
--dicCount[target.charAt(i)];
if(dicCount[target.charAt(i)]<0) return false;
}
System.out.println(target);
return true;
}
The issue with many of the explanations in this thread is that they assume you have access to the dictionary. For this question, the dictionary (list of strings) is wrapped in a class that contains only one public method, "Contains(string s)" to check if the string made up of some combination of letters exists in the dictionary. You can't preprocess the dictionary because you don't have access to it.
Algo:
- Amit. March 12, 20141. Put all chars of string in a hashmap
2. traverse word by word from dictionary and for each word check all letter presents in hastable.
3. keep track of longest string traversed.