Amazon Interview Question
Software Engineer / DevelopersStore all the words in a trie or hashtable so that you can easily find out whether a word is valid.
Then it's just a shortest path problem on an unweighted graph. BFS will do.
If word a can be transformed to word b with only one operation, then there's an edge between a and b.
It's not necessary to test every pair of words.
There're only O(3*len) words which can be formed by changing a word with length len.
can u pls give some code for making this graph......
i always end up in inefficient codes in such type of graphs.
If u can give some code..then it would be very helpful for me for other questions also......
The Algorithm
Steps
Step Description
1 Set n to be the length of s.
Set m to be the length of t.
If n = 0, return m and exit.
If m = 0, return n and exit.
Construct a matrix containing 0..m rows and 0..n columns.
2 Initialize the first row to 0..n.
Initialize the first column to 0..m.
3 Examine each character of s (i from 1 to n).
4 Examine each character of t (j from 1 to m).
5 If s[i] equals t[j], the cost is 0.
If s[i] doesn't equal t[j], the cost is 1.
6 Set cell d[i,j] of the matrix equal to the minimum of:
a. The cell immediately above plus 1: d[i-1,j] + 1.
b. The cell immediately to the left plus 1: d[i,j-1] + 1.
c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.
7 After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].
Example
This section shows how the Levenshtein distance is computed when the source string is "GUMBO" and the target string is "GAMBOL".
Steps 1 and 2
G U M B O
0 1 2 3 4 5
G 1
A 2
M 3
B 4
O 5
L 6
Steps 3 to 6 When i = 1
G U M B O
0 1 2 3 4 5
G 1 0
A 2 1
M 3 2
B 4 3
O 5 4
L 6 5
Steps 3 to 6 When i = 2
G U M B O
0 1 2 3 4 5
G 1 0 1
A 2 1 1
M 3 2 2
B 4 3 3
O 5 4 4
L 6 5 5
Steps 3 to 6 When i = 3
G U M B O
0 1 2 3 4 5
G 1 0 1 2
A 2 1 1 2
M 3 2 2 1
B 4 3 3 2
O 5 4 4 3
L 6 5 5 4
Steps 3 to 6 When i = 4
G U M B O
0 1 2 3 4 5
G 1 0 1 2 3
A 2 1 1 2 3
M 3 2 2 1 2
B 4 3 3 2 1
O 5 4 4 3 2
L 6 5 5 4 3
Steps 3 to 6 When i = 5
G U M B O
0 1 2 3 4 5
G 1 0 1 2 3 4
A 2 1 1 2 3 4
M 3 2 2 1 2 3
B 4 3 3 2 1 2
O 5 4 4 3 2 1
L 6 5 5 4 3 2
Step 7
The distance is in the lower right hand corner of the matrix, i.e. 2. This corresponds to our intuitive realization that "GUMBO" can be transformed into "GAMBOL" by substituting "A" for "U" and adding "L" (one substitution and 1 insertion = 2 changes).
int LevenshteinDistance(char s[1..m], char t[1..n])
{
// d is a table with m+1 rows and n+1 columns
declare int d[0..m, 0..n]
for i from 0 to m
d[i, 0] := i // deletion
for j from 0 to n
d[0, j] := j // insertion
for j from 1 to n
{
for i from 1 to m
{
if s[i] = t[j] then
d[i, j] := d[i-1, j-1]
else
d[i, j] := minimum
(
d[i-1, j] + 1, // deletion
d[i, j-1] + 1, // insertion
d[i-1, j-1] + 1 // substitution
)
}
}
return d[m,n]
}
if "valid words" restriction is not allowed then its simply a DP problem.
Otherwise, we need to keep track of the intermediate words and check them for validity. I think we can include this check in the dp itself...if the last word was a dictionary word then in all three operations, we just need to check if adding the next letter would make it valid or invalid.
So keep the dictionary in a trie...keep traversing the dictionary as u keep transforming the original string into the new string...at each point of adding a new letter, check if the next child in the trie is a valid word or not....
running time...same as edit distance
space complexity...trie+2D matrix
Levenshtein distance.
- S July 17, 2010