Google Interview Question
Software Engineer / DevelopersCountry: china
Interview Type: In-Person
Some optimizations depending on how the dictionary is stored:
- Keep track of the max word length on a page of dictionary (internal node in a trie like structure)
- Bloom filter for the intersection of letters that occur on a page of dictionary (internal node in a trie). If the given word and bloom filter have commonality, then skip the subtree.
excellent way, per word we'll however need uint64_t which should be fine i guess but what about point 2 of the question?
For point 2, I was thinking of sorting the dictionary by length (n*n). Then traverse it in a nested loop (n*n) in decreasing order of length to find the max length pair...
the 'overlap' detection using bitmask is nice.
however, for finding max product length, it is incorrect to traverse the dictionary by decreasing order of length and return the first pair that has no overlap.
By starting out with the longest words, you are assuming that both the final 2 words have long length. But it could be the case that your 'found' pair has one long word and one short word. If that is the case, a pair of 2 average-length words can beat a pair of a long word and a short word. For example, 5 * 5 > 10 * 2
my attempt:
since a lot of English words have common characters between them (e.g., a majority of English words have letter 'e'), it is too expensive to loop through the entire dictionary every time you try to find a non-overlapping pair.
Rather, we could pre-compute the dictionary and divide words into 'buckets'. The English alphabet has 26 characters, so each word can belong to one or more buckets (eg: 'red' belongs to bucket R, bucket E and bucket D).
When searching for non-overlapping pair, the outer for() loop can be all words in the entire dictionary, so O(N).
For each word in the dictionary, we run a second inner for() loop to try to pair it with the longest non-overlapping word. For this inner for() loop, we only have to consider words that do not belong to any of the buckets of the first word. For example, if first word is 'red', we look at words that do not belong to buckets R, E and D. That will reduce significantly time spent in the inner for() loop from O(n) to O(k), where k is the number of words that have no overlap with first word.
Repeat for all words in dictionary. At all time, keep track of max multiple and update it every time we find a pair's multiple that beats current max.
Assuming you have an Dictionary technically a map
For 1st case :
a. Take out all the keys.
b. for each key get the value from map. Recursively iterate through the key and value to find the pairs with no common letter.
For 2nd Case:
a. Compute totalLength = key.length * value.length;
b. Maintain one more map with key as totalLength and value as List of keys.
c. at the end dump the values of largest key.
For the love of god guys, big O notation is often a funny topic. It's cool to talk about, but it sometimes means nothing. For example, if you have a container of size 100 and you need some answer with it that can be done in O(n), it is sometimes fast enough (or even FASTER at times) to do a O(n^2).
This is a dictionary... 200k^2 isn't that big. I coded up a disgusting piece of code. It found an answer in a real dictionary in 0.9 seconds. Here is the answer:
Dictionary size: 235882
Words: chromochalcography uniquisitiveness
Product: 306
The code (of course):
bool valid_words(string const &w1, string const &w2)
{
if(w1.find_first_of(w2) == string::npos)
return true;
return false;
}
pair<string, string> find_two_words(vector<string> dict)
{
// sort copy of dictionary in order of descending size (biggest first)
sort(begin(dict), end(dict), [](string const &w1, string const &w2)
{
return w1.size() > w2.size();
});
// Find an answer
pair<unsigned, unsigned> ans; // indices into dict
int maxProduct = 0;
for(int i = 0; i < dict.size(); ++i)
{
if(dict[i].size()*dict[0].size() <= maxProduct)
return make_pair(dict[ans.first], dict[ans.second]);
for(int j = 0; j < dict.size(); ++j)
{
unsigned product = dict[i].size()*dict[j].size();
if(product <= maxProduct)
j = dict.size();
else if(valid_words(dict[i], dict[j]))
{
ans = make_pair(i, j);
maxProduct = product;
}
}
}
return make_pair(dict[ans.first], dict[ans.second]);
}
int main()
{
ifstream fin("dict.txt");
string word;
vector<string> dict;
while(fin >> word)
dict.push_back(word);
cout << "Dictionary size: " << dict.size() << endl;
auto biggest = find_two_words(dict);
cout << "Words: " << biggest.first << " " << biggest.second << endl;
cout << "Product: " << biggest.first.size()*biggest.second.size() << endl;
}
This is the most optimal solution at O(n * k). The assumption I take into account is that only one word is the longest value without specific characters (If there are two words then technically more than a single pair can be returned)
private static String calculate(List<String> dictionary) {
//a little bit more efficient to make this ahead of time
Character[] alphabet = new Character[26];
for (int i = 0; i < 26; i++) {
alphabet[i] = (char)('a' + i);
}
String[] longestWord = new String[26];
//Build the longest word records
for (String word: dictionary) {
HashSet<Character> characters = new HashSet<Character>();
for (char c: word.toCharArray())
characters.add(Character.toLowerCase(c));
int length = word.length();
for (int i = 0; i < alphabet.length; i++) {
if (!characters.contains(alphabet[i])) {
int longest = longestWord[i].length();
//Assumption there is one 1 word that is the longest value with out that character
if (length > longest) {
longestWord[i] = word;
}
}
}
}
String pair = "";
int maxProduct = 0;
//Find the pair:
for (int i = 0; i < longestWord.length; i++) {
String word = longestWord[i];
int length = word.length();
HashSet<Character> characters = new HashSet<Character>();
for (char c: word.toCharArray())
characters.add(Character.toLowerCase(c));
for (int j = i + 1; j < alphabet.length; j++) {
if (!characters.contains(alphabet[i])) {
String wordTwo = longestWord[j];
int product = length * wordTwo.length();
if (product > maxProduct) {
maxProduct = product;
pair = word + ":" + wordTwo;
}
}
}
}
return pair;
}
I am sorry. I made a mistake in my original code. I guess I should've verified it before submitting it. The issue I did was I looped through the values I calculated as the longest and then took the product together.
THIS IS INCORRECT.
What I should've done was loop through the dictionary again and take the product from the letters not present and calculate this out.
This still results in O(n), but it is really O(2 * n * k) instead of O(n*k).
Sorry next time I'll think that through.
Also since I didn't test this (just throwing it together for people who want a coding example) I should point out the above method would throw a null pointer exception because I did not initialize the indexes for the longestWord.
1. Sort words
2. Build a boolean 2D table where row corresponds to a letter and column to a word in the dictrionary. Mark an element[c][w] true if the word contains the word w contain letter c.
3. Iterate over each row of the 2D table to contruct, for each letter, a set of words that do not contain the letter.
4. Iterate over the disctionary words in decreasing order. For each word, find the intersections of the sets computed in step 3 that correspond to the letters of this word. Then, multiply the length of the current word with the largest word in the intersection. Remember the result if is greater than previous maximum.
5. Repeat step 4 until the length of the current word is smaller than the square root of the previous maximum.
The overall complexity is roughly O(nlogn + m * S), where m is the overall number of letters in the dictionary and S is the average size of a set computed in step 3.
import java.util.Arrays;
import java.util.HashSet;
public class GetBiggestNonSharedWordMultiple {
private final static int NUM_LETTERS = 26;
private static int charToIndex(char ch){
return Character.toLowerCase(ch)-'a';
}
public static int getBiggestMultiple (String [] words){
int n = words.length;
Arrays.sort(words); // O(n log n) ascending
boolean [][] table = new boolean[NUM_LETTERS][];
for(int i = 0; i < NUM_LETTERS; i ++) table[i] = new boolean[n];
// O(m); m overall num of letters in input
for(int w = 0; w < n; w++){
String word = words[w];
for(int i = word.length()-1; i >=0 ; i--){
table[charToIndex(word.charAt(i))][w] = true;
}
}
// O(k*n)
HashSet<String> [] absenceSets = new HashSet[NUM_LETTERS];
for(int i = 0; i < NUM_LETTERS; i ++) {
HashSet<String> abSet = new HashSet<>();
for(int w = 0; w < n; w++) {
if(!table[i][w]) abSet.add(words[w]);
}
absenceSets[i] = abSet;
}
// O(m * AvgSetSize)
int max = -1;
for(int w = n-1; w >=0 ; w--){
String word = words[w];
int wLen = word.length();
if(wLen*wLen < max) break;
if(wLen == 0){
if(max < 0) max = 0;
continue;
}
HashSet<String> currentSet = new HashSet<>();
currentSet.addAll(absenceSets[charToIndex(word.charAt(0))]);
for(int i =1 ; i < wLen ; i++){
currentSet.retainAll(absenceSets[charToIndex(word.charAt(i)) ]);
}
int largestSize = -1;
for(String otherWord: currentSet){
if(otherWord.length() > largestSize) largestSize = otherWord.length();
}
int mult = wLen * largestSize;
if(mult > max) max = mult;
}
return max;
}
public static void main(String[] args) {
String [] arr = { "hello", "world" ,"superb", "my", "mercury" };
System.out.println(getBiggestMultiple(arr));
}
}
This is my O(n*k) where n is number of words and k is longest length possible in the dictionary.
1. Make an array of 26 strings for each character with name longestWord[ ] .
2. Iterate through the entire dictionary and check for each character in that word if the string corresponding to that character(stored in array) is actually null or length is smaller than the current string. If yes, then update it to the current String.
3. Keep a variable max and Iterate through the dictionary again:
3a. Add all the characters of the current word to a HashSet <Character> restricted. Also, keep a variable currentMax.
3b. Iterate from char c='a' to 'z' and check if it that c is not in restricted. If it is not then get the String from longestWord corresponding to that char.
3c. Check if it is the biggest non-overlapping then update our currentMax.
3d. Now, check if the max<currentMax*currentWord.length() and if it is update our max and save the pair of max product.
4. Return this pair.
This works because for current string all the non-overlapping strings would not have the same character as the current and the product for the current string would be maximized only when the other string length is maximized. O(n*K+n*26)
And by the way, that .9 seconds includes reading in the dictionary. I can only imagine that is the majority of the time used. My first return statement (as I predicted) executes rather quickly. The theory doesn't matter. Even if this were an exponential complexity, you pretty much know from domain knowledge that it won't do more than a few hundred thousand loops.
Assuming the word is A-Z/a-z only, use a bitmap to set which letters it contains.
- Anonymous November 19, 2013e.g. ca => 000....101
bb => 000...010
Iterate over the words in decreasing order of length
for each pair of words, AND the bitmaps. Return the first pair that gives a 0 result.
This should be n*k + n*n