Capgemini Interview Question
Senior Software Development Engineers
Team: Chetu
Country: United States
Interview Type: Phone Interview
The edit distance algorithm is the perfect solution for this.
static int distance(String a, String b) {
    // Create a matrix to hold the distances between prefixes of a and b.
    int m = a.length();
    int n = b.length();
    int[][] dist = new int[m + 1][n + 1];
    // Fill the matrix with base values: converting a prefix to or from
    // the empty string takes one edit per character.
    for (int i = 0; i <= m; i++) {
        dist[i][0] = i;
    }
    for (int j = 0; j <= n; j++) {
        dist[0][j] = j;
    }
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            int cost;
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                // Both have the same character, so no modification is needed.
                cost = 0;
            } else {
                // The characters differ, so a substitution is required.
                // Each edit operation has cost 1, because each operation changes only one character.
                cost = 1;
            }
            // Update the dist matrix with the cheapest modification.
            dist[i][j] = Math.min(cost + dist[i - 1][j - 1], // Substitution (or match)
                         Math.min(1 + dist[i - 1][j],        // Deletion
                                  1 + dist[i][j - 1]));      // Insertion
        }
    }
    return dist[m][n];
}
Both substrings always need to have the same length, so it's the simpler Hamming distance -- substitution only, no insertion or deletion. "Dissimilarity for each set S is measured by the number of index positions where characters of both strings do not match"
Also note that any substring of one string that is shorter than D forms a valid set with every same-length substring of the other, since it cannot have more mismatches than it has characters. So the larger D is compared to the string lengths, the simpler the problem becomes.
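A minimal sketch of the Hamming distance described above, in Java. The class name HammingSketch and the helper withinLimit are hypothetical names chosen for illustration, and treating D as an upper bound on mismatches ("at most d") is an assumption, since the original problem statement is not quoted in full here.

```java
class HammingSketch {
    // Number of index positions at which two equal-length strings differ.
    static int hamming(String s, String t) {
        if (s.length() != t.length()) {
            throw new IllegalArgumentException("strings must have the same length");
        }
        int mismatches = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != t.charAt(i)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // True when the two strings differ in at most d positions
    // (assumption: D is an inclusive upper bound on mismatches).
    static boolean withinLimit(String s, String t, int d) {
        if (s.length() == t.length() && s.length() <= d) {
            // Shortcut: a pair cannot have more mismatches than characters.
            return true;
        }
        return hamming(s, t) <= d;
    }
}
```

For example, HammingSketch.hamming("karolin", "kathrin") counts the three positions where the two strings differ, and withinLimit("ab", "xy", 3) takes the shortcut because the strings are shorter than the limit.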
Edit distance is a good solution. However, it needs both strings to be available. This problem receives a single string along with a number (the measure of permissible dissimilarity). When that number is 1, we can expect a maximum of str.length() sets, assuming every new string generated by replacing a character in str is a valid word. Hence, we also need a structure to check the validity of each newly generated word. A trie is a good choice for building the dictionary, because checking the validity of a string takes time on the order of its length. In general, the number of trials is on the order of nCk, where n is the length of the string and k is the measure of dissimilarity. For instance, if the string length is 4 and the measure of dissimilarity is 2, then there are 4!/(2! * (4-2)!) = 6 possibilities that must be checked against the dictionary for validity.
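The two ingredients of this comment, a trie-backed dictionary and the nCk count, could be sketched in Java roughly as follows. The class name DictionarySketch and its methods are hypothetical, and the word list is assumed to be supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

class DictionarySketch {
    // One trie node: child links keyed by character, plus an end-of-word flag.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Adds a word to the dictionary, creating nodes as needed.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Checks validity in time proportional to the length of the word.
    boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }

    // nCk: the number of ways to pick k positions out of n.
    static long choose(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= k; i++) {
            // After step i the value is C(n - k + i, i), so the division is exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

With this sketch, DictionarySketch.choose(4, 2) reproduces the 6 position choices from the example, and contains rejects a prefix such as "car" when only "cart" was inserted.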
edit distance algorithm
- Prakash October 29, 2013
en.wikipedia.org/wiki/Edit_distance