Yahoo Interview Question
Software Engineer / Developersi will start from finding the longest word in this dictionary. if it is n, then i start:
i=n;
j=n;
loop till found {
loop till found
try to find rectangle (i,j);
if cannot find then j--;
i--;
j=i;
}
how to find the rectangle size (i,j) is:
find all words with length i
find all words with length j
try to do some depth first search on this
I created a suffix tree ST for all words, then built a graph TG of all allowable transitions between 2 words.
[ wx] - word 1
[ wy] - word 2
would have an edge in TG for 1->2 if having word 1 and word 2 set out horizontally would have all legal suffixes for all vertical columns according to our ST.
Now we're done with preprocessing.
Maintain a queue of word candidate lists, seeded with all words
process queue element e = [wx, wy, ...] till done
if all cols of e are terminating suffixes according to ST, we have a candidate solution
now find all nodes n that could legally prepend our element (this is easy with our TG)
and queue them [n1 wx wy wz], [n5 wx wy wz]
worst case this would be very bad, in practice our constraints should prune solution space rather quickly for just the cost of our ST & TG.
This could be run of a machine with a good bit of ram.
1. Select a word from dictionary Say W[0...m]
2. Check if all W[0]...W[m] are valid prefixes in dictionary
3. Select another word from dictionary Say W2[0....m],
4. Check if all 2 letter prefixes are present in dictionary , if not, go back to step 3, if exhausted, go back to step 1
5. Repeat 3 and 4 with incremental prefix checks
6. output largest rectangle formed
It's late and I can't think all that clearly, but I think I'd tackle it as a graph problem. Set up the graph with a set of nodes, where each letter is a node. Then model the words as paths through the letters/nodes. This allows you to model all the words at once: starting with any given node, all paths that visit the node are crossings, where two words overlap and the. So each a potential for those words to intersect in the solution rectangle. The answers are all in this graph, you just have to get rid of the wrong ones. ;)
- wazzucougarneedsajob April 03, 2006From there you need to see which permutations of paths produce word-rectangles. I'm too tired to actually work it out any further, but I do believe this is the solution. It's a little like De Novo protein sequencing except that instead of looking for the "best superstring" of the given input strings, the solution is the "largest rectangle." You can read more on De Novo in the textbook for my bioinformatics class, "An Introduction to Bioinformatics Algorithms" by Jones and Pevzner. It's pretty good, but it's also good to have a teacher.