You have a list of 1 million d

Amazon Interview Question for Software Engineer / Developers

0

of 0 votes

24
Answers
You have a list of 1 million distinct English words. Each word is between 1 to 40 letters long and contains only alphabets, no space or special characters. The list is already sorted alphabetically.

Given a word shorter than 40 letters, find all words in this list that are only 1 letter different from this word, spelling order is not important.

There are 3 types of matches: 1. Swap one letter with another and you have an exact match 2. Remove one letter and you have an exact match and 3 Add one letter and you'll have an exact match. For example: Given the word "coverage", these are valid matches:

1. "converge". Swap 'n' with 'a'
2. "coverages". Remove 's'.
3. "overage". Add a 'c'.

What's the time complexity of your algorithm?

Can your algorithm handle the request to find words that are 2 letters different from the given word?
- chriscareercup November 12, 2012 in United States | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 11 vote

here's a O(n*m) (n=no of words in the dictionary, m = max length of the word)complex solution with space O(52). Perform the following steps:
1. Take all the words from the lis with length same as the given word or +/- 1
2. Create a letter count array/map of the given word (for eg if the word is "coverage" then a=1, c=1, e=2, g=1, r=1, v=1) (say a[26] contains the letter count from a to z)
3. take each word from step 1. create its letter count array (say b[26]
4 for i = 1 to 26 sum += abs(a[i]-b[i]). each word with sum value = 1 or 2 will be the required word.

- The Artist November 12, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This looks like a cool solution. I was trying to work something out using 3 regular expressions, but I think your method is the way to go.

However, it will fail for terms which are anagrams. As an example, if we search for "Dictionary", the word 'Indicators' will be counted as a word which needs only 1 letter changed to get an exact match.

How can we change your technique to account for this?

- Michael Howard November 12, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Michael - it is supposed to find "indicators" if input is "dictionary"

In the question, it is clearly mentioned that "spelling order is not important."

see my C# implementation below

- siva November 12, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I see now. In that case, the method which you and The Artist have posted is definitely the correct solution.

- Michael Howard November 12, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

There is a small issue in the solution. If you are accepting words with sum = 2, you might end up taking words which are different by 2 letters also.

- Anjana November 13, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I think your algorithm does not consider "The list is already sorted alphabetically."
Sorted list must benefit your algorithm.

- Eric November 13, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Anjana: sum=2 should be considered when the size of the dictionary word is same as the given string

@Eric: that could be a decoy. I did not find much help from the sorted list.

- The Artist November 14, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Couldn't get the part where sum=2...i mean if we find exact match, sum=0, differ by 1 then sum=1. How does sum=2 fit in. Please explain.

This might be a little dumb question but nonetheless, would be great if you would explain.

- Jas November 14, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Jas: consider the example given in the question where you convert 'coverage' to 'converge'. In that case the sum will be two. Basically here you are taking one character of the given string and replacing it with any other alphabet.

- The Artist November 15, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Step 4 is wrong.
Suppose we have words "a" and "b".
The arrays would be a[][]=[1][0] and b[][]=[0][1].
The result would be sum=0 which means that "a" and "b" are completely equals.

- sm.pavlenko January 29, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I am not sure why this question would be asked(as you have to traverse the whole dictionary of million words). The better question to ask would be how would you design the dictionary so that you can print the words efficiently.
The answer to that would be :
Have your dictionary stored in key , value pair : key : count[26]array, value list of all strings that match (basically all anagrams will be in this string)
Now given word with count array G[]
There are the following three cases :
1) anagram of Given(G) + An extra alphabet
2) anagram - 1 alpha
3) change one alphabet in G
Now solution for above 3 cases :

1) for i = 0 to 25 : {G[i]++ ;  printlist(hash(G)); G[i]--; }
2) Take old G, not from 1) . for i = 0 to 25 : {if(G[i]) {G[i]-- ;  printlist(hash(G)); G[i]++; }
3) for i=0 to 25
{
	G[]= given;
	if(G[i])
	{
	G[i]--;
	for j=0 to 25 -> skip j=i;
	{
		G[j]++;
		printlist(hash(G))
		G[j]--;
	}
	G[i]++;
}

- Hill Billy March 09, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

and i think my above solution will work for any difference just the sum value need to be modified

- The Artist November 12, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public List<string> FindWord(List<string> dictionary, string word)
{
List<string> output = new List<string>();

int[] charCount = new int[26];
char[] input = word.ToCharArray();

for (int i = 0; i < input.Length; i++)
{
charCount[input[i] - 'a']++;
}

int[] temp = new int[26];

foreach (string _word in dictionary)
{
if (Math.Abs(_word.Length - input.Length) == 1 || Math.Abs(_word.Length - input.Length) == 0)
{
charCount.CopyTo(temp, 0);

char[] _input = _word.ToCharArray();

for (int i = 0; i < _input.Length; i++)
{
temp[_input[i] - 'a']--;
}

int sum = 0;

for (int i = 0; i < temp.Length; i++)
{
sum += Math.Abs(temp[i]);
}

if (sum <= 2)
{
output.Add(_word);
}
}
}

return output;
}

- siva November 12, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

How about a backtracking solution? Lets say we build a trie of the one million words. lets say input word is of length n. We check all the branches and keep track of number of mismatches and if mismatch count is less than permitted we allow traversing the trie else we exit the branch and check another branch using backtracking.

This approach however checks for words with only same length. I am checking how this can be modified to words with different length.

backtracking(root, "coverage", 1);

backtracking(trie node,string input_word, int mismatch_allowed)
{
    if (mismatch_allowed < 0)
        return;
    if (input_word.length == 0)
    {
        print current node details (i.e. complete word till this node);
        return;
    }

    for character_branch in (all_branches_at_root)
    {
        if (character_branch == input_word[0])
        {
             // first char matches, check next chars
             backtracking(character_branch_node, "overage", mismatch_allowed);
         }
         else {
             // character mismatch reduce the count
             backtracking(character_branch_node, "overage", --mismatch_allowed);
         }
    }
}

- Anonymous November 12, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think the key to solving this problem is to use the information that the list of words is sorted (i.e.: you already have the word list preprocessed to help you with the query).

Consider (for complexity computation):
N - number of words in the word list (max. 1 million)
L - size of a word (max. 40)
A - size of the alphabet (= 26)

For 1 letter distance you can use the following algorithm:
1. Generate all the possible words that are 1 letter distance away from the query word
- this has the complexity O(L*A)
2. Look up each of the generated words in the word list using a binary search:
- the look up of an individual word is O(L*logN)
- we have O(L*A) lookups => final complexity is O(L^2*A*logN) which is roughly (not considering the hidden constant) about 832.000 operations which is better than O(L*N) which is roughly 40 million operations.

For distance = 2, this algorithm performs worse than the O(L*N) version.

- cristian.botau November 13, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

i think you missed the point in the question where it says that the order does not match which can result upto a max of (40!)/(2^14) possible combinations which is humungous.

- The Artist November 14, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Yeah, my bad :)

Although, if the order doesn't matter, I don't see how the fact that the list is sorted may be helpful.

- cristian.botau November 14, 2012 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

The question is more interesting and of practical use if the ordering of letters is important.

- Vikas November 19, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@anjana

In the algo presented by The Artist

a minor modification is needed such that it doesnt take into account of the words such as (sumit with sumitas)

what we can do is make the changes in that space i.e O(26)
for example the word sumit has to be searched
so s-1,u-1,m-1,i-1,t-1

when the word sumita is encountered all the fields in O(26) is zero except a which is -1,
so this is our word.

now if we take sumi , then only value corresponding to field t is -1.

but if we take a word such as samit then field corresponding to u would be 1 and a would be -1,

in a iteration corresponding to the word length there can be only a pair of 1 & -1;
we wont consider if there is 1&1 or -1 & -1 which might arise in the example of (umi or sumitaf) which would be the case for difference in length corresponding to 2.

- kumsaha November 21, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

We can solve the problem by the approach artist suggested. But we can reduce the time complexity from O(mn) to O((number of words with size+number of words with size-1+number of words with+1))m.

Simply apply the binary search on the list given to find the first occurance of the word with size-1, It would take O(logn)(i.e. log one million= approximately 20). From that index onwards you can traverse the list to until you reach a word with size+2. Since the list is already sorted, We don't need to traverse the entire list.

- Eswar November 13, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Sorry for my ignorance. It doesn't work. I just confused of alphabetical sorting

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews