Hi All, I attended MS intervi

Microsoft Interview Question for Software Engineer in Tests

0

of 0 votes

22
Answers
Hi All,
I attended MS interview loop this week, I was unable to get through, pissed off after seeing the beautifully madeup regret mail from the staffing consultant, I wanted to share a question and my answer...Beware of the answer as I am not selected:)

here it goes..

Given a text document as input with a set of strings, assume '\n' is the delimiter, print to the console strings grouped by anagrams.

vinay
naviy
inavy
tes
set
...

My answer:
1. sort all the strings and compare...interviewer was not happy with the time complexity of sorting all strings.
2. create a hashtable where the hashcode is generated using the characters in the string..I got the hash of each string by doing an Exclusive OR of each charcater in the string although i had the dbt if the Ex-OR of non anagrams can be same, this is the only other solution that came to my mind.

Any efficient ways of solving this??
- c# September 04, 2010 | Report Duplicate | Flag | PURGE
Microsoft Software Engineer in Test

Email me when people comment.

An error occurred in subscribing you.

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

This question reduce to check for 2 strings whether they are anagrams of not

For Anagrams we can count the occurrence of each letter in the first string and check this with the other

- DashDash September 04, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

". create a hashtable where the hashcode is generated using the characters in the string..I got the hash of each string by doing an Exclusive OR of each charcater in the string although i had the dbt if the Ex-OR of non anagrams can be same"

Imagine that the alphabet only has two letters: 1 and 0,
Char representation: a bit for each character
Two words given:
10
100

Hash value:
1 XOR 0 = 1
1 XOR 0 XOR 0 = 1

The two words has the same hash value, but are not anagrams.

- lupuslupus September 04, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

in that case, just incorporate the lenght of the word into you hash code to avoid collision.

- Anonymous September 26, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Num. input words: N
Maximum word length: M

OP's soultion:
1. Sort each word: O(M*logM*N).
2. Compare with each anagram group: O(M*N^2)
Total complexity, assuming that logM < N: O(M*N^2)

ibnipun10's solution:
1. Count occurance of each character: O(M*N)
2. Compare with each anagram group: O(N^2)
Total complexity, assuming that M < N: O(N^2)

However, if we assume that word lengths will be "reasonable" (say, always less than C characters)? If so, the complexity of the first solution becomes O(N^2). So I don't think it is clear which solution is the better. Thoughts, anyone?

- lupuslupus September 04, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

here u can use counting sort which run in O(n + k) where k = number of key which are 256 i.e. no. of ASCII values

string CountingSort(string str){
	int a[256];
	int size = 255;
	for(int i = 0; i < 256; i++){
			  a[i] = 0;
	}
	//count number of characters in string
	for(string::iterator it = str.begin() ; it != str.end(); it++)
		a[*it] += 1;
	
	int len = str.length() - 1;
	
	//rebuild string from back to keep sort stable 
	while(size--){
	
		if(a[size]){
			while(a[size]--){
			
				str[len--] = char(size);
			}
		}
	}
	 return str;
}

- sachin323 October 10, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Problem apears in the book Programming Pearls. The solution there is pretty straight-forward:

1. "sign" the words, i.e. for each word the signature is itself sorted
2. order by signature
3. "squash" the structure to produce the anagrams.

The author suggests a pipeline like solution, so you could potentially paralelize this.

- b September 04, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

it sound good. we can maintain a hashtable, with the key as the word "sign" and value as the the anagrams.

Then for each string S
Find the sign of this string S_Sign
Insert it into the hashtable with (S_Sign, S)

Then we only need to iterator each hashtable pair
For each key
Print all the entries with same key

- Anonymous September 08, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How is this solution different from the solution in the question, 'sign' being XOR of the characters of the string?

- Mahesh September 09, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

check my solution:
give to each letter in alphabet a unique prime number('a'->2,'b'->3,'c'->5,...)
then give to each word from input its code=product of prime numbers of this word's letters. then if 2 words are anagrams, their codes are equal, if not anagrams, the codes must be different too.
then sort by codes and output. O(N*logN).
am i right?

- maksdamir September 04, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

In theory it looks right, but will the anagram code fit into a long int?

- lupuslupus September 04, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

i was wrong. if we use english alphabet and have not case sensetive task, we need max prime number=101, it means 9 letter-words in worst case. not good, but we can use mathematics using string. i think it shouldn't increase the time-complexity

- maksdamir September 04, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

if we allow to use hash, we could reach time complexity of O(NMlgM), where N is the number of words and M is the number of characters in the longest word. Following is the abstract step.
First, define the signature of a word to be the word with it's characters being sorted, e.g. the signature of Microsoft is cfiMorst. Anagrams should have the same signature

1. Create a hash which the key is the "signature" of a word, the value is a list of words having the same signature. (We have to becareful of the case here, we can convert the word to lower/upper cases before generating the hash key)
2. Go through each word in the file, generate the signature and add to the hash table, if the key already exist, add to the list
3. Go through the hash table, print the words from the list of each hash key ( which should be grouped by anagrams)

The time complexity is bounded by O(NMlgM), memory complexity would be O(N)

- Natural Lam September 06, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Rather than sorting each string , we can compute the hashcode as , each character in string occupies 8 bits (or in some case 16 bits) , so we consider the string as a number in base 2^8 , and compute the value of string , say for e.g. for vinay ,the value is calculated as :- ('v'-'a')*2^(8*4) + ('i'-'a')*2^(8*4) + ('n'-'a')*2^(8*4) + ('a'-'a')*2^(8*1) + ('y'-'a')*2^(8*0) ;
and let this sum is represented by k
then k mod (2^8 - 1) gives unique code for all anagrams

- M.K. Aasawat September 25, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Rather than sorting each string to calculate its signature , we can compute the hashcode as , each character in string occupies 8 bits (or in some case 16 bits) , so we consider the string as a number in base 2^8 , and compute the value of string , say for e.g. for vinay ,the value is calculated as :- ('v'-'a')*2^(8*4) + ('i'-'a')*2^(8*4) + ('n'-'a')*2^(8*4) + ('a'-'a')*2^(8*1) + ('y'-'a')*2^(8*0) ;
and let this sum is represented by k
then k mod (2^8 - 1) gives unique code for all anagrams

- M.K. Aasawat September 25, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

- M.K. Aasawat September 25, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

read more about using trees to capture strings..this is a variety of the suffix tree..

you put each string in a tree, but before hand we'll sort each string, meaning, if you wanna insert bda, what you'll insert will be abd, o/c we'll cramp everything in.
this way, when you go down the tree, if you find an anagram it'll mean you're down the tree and found a \n or a null of some sort.
should be O(n*m) as each step means you have to reorder the current part of the string and go down the tree, thing is going down the tree..if we use count-sort then we don't even have to re-sort each time, just keep an array of size 26, each time incrementing the right cell.
going down the tree again and again makes it O(n*m), n being length of file, m being the max length of the largest anagram

- dark September 06, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I'm loving how much bullshit your reply contains.

- Mahesh September 09, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Anagram checking can be done in O(n). From the question I guess the anagrams in the same input file are supposed to be printed.

This be done by bucketing all strings per their length. i.e. for length 5, the linked list could contain vinay, naviy, inavy, there, using , ...

We'll compute the hash of each strings such that the hash function will be of O(n) where n is the length of the string. The has function can be as simple as sorting the characters. Yes we can sort the strings in O(n) by using a bitmap counter.

For ascii characterset we can take an array of 128 int and initialize to zero and increment the array[i]++ where i is the character encountered. once the pass is done, we can reconstruct the sorted string by passing the array again "there" becomes "eehrt". this can be compared to the other string's sorted values.

In Worst case however if the text file contains all words of the same length, then we'd need an extra space again of the same size of the file to store the sorted strings and then do a string compare using KMP algo O(n) again.

- Neo September 07, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

nlogn for smaller n's can still be faster than o(n) where n = 255

- J September 10, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

My bit: position insensitive on the fly hash code computation. This can be done by associating first 26 prime numbers(>2) to each char. Keep on multiplying the associated prime numbers from start position till delimitor. Using hashmap lookup for this hashcode value. If null is returned create a list(each element holding start and end index of same hashcode) and add the element(start position and end position) corresponding to current hashcode. If not null return value add the element to the returned list.
This will take time-o(n) and space-0(k)
Next step, iterate over the hashmap and print the elements of the list(value of key,value pair) together.
This will take time-o(k)
Overall: time-o(n) and space o(k).

Please correct me if i am wrong.

- Hinax September 17, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

My 2 cents

- Divide the problem into sub problems by using multiple hashing
a) By Length
b) By XOR or any hash function which is irrespective of the order of it.

- From each of minimal set, Prepare a list to display strings group by anagrams
For instance , S1, S2, S3, S4 ,S5 , S6 is a linear list containing three strings which would be appeared as
S1->S2->S6
|
S3
|
S4 ->S5
If S7 get hashed into this list, Compare the string and add it to the above list accordingly.

- Checking whether two words are anagrams or not.
a) Prepare a hash table of 26 character
b) Fill the hash table for both string with one +1 and other as -1
c) Sum of hash table as non-zero implies - Not an anagram else anagram

- Ankush Bindlish September 30, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think maybe part of the point is to come up with a good hashing scheme. If 256 characters or more are being used, then the hashing scheme should allow for "sparse" computation so there isn't an automatic constant like 256. Assigning a large (prime?) weight to each character and then letting a word's hash key be the sparse linear combination of characters seems reasonable.

- Peter May 20, 2011 | Flag Reply

CareerCup

Microsoft Interview Question for Software Engineer in Tests

Books

Videos

Resume Review

Mock Interviews