Google Interview Question
Software Engineer in Test

The second approach sounds good. But can't we add the unique prime number generated for each letter, rather than multiplying? Addition will never produce the same key value for different words, as we are given that the size of each word is n. It can also handle strings of much larger length than multiplication can. So the complexity will be O(n), since k is some constant and not a function of n. Correct me if I am wrong, guys.
Nope, addition of primes won't work. Try a simple example:
a -> 2
b -> 3
c -> 5
d -> 7
e -> 11
f -> 13
What do "bf" and "ce" compute to under addition?
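The collision is easy to verify; a minimal sketch in Python using the prime assignment above:

```python
# With a -> 2, b -> 3, c -> 5, d -> 7, e -> 11, f -> 13,
# addition assigns the same key to words that are not anagrams.
primes = {'a': 2, 'b': 3, 'c': 5, 'd': 7, 'e': 11, 'f': 13}

def additive_key(word):
    """Sum of the primes assigned to each letter -- NOT collision-free."""
    return sum(primes[ch] for ch in word)

print(additive_key("bf"))  # 3 + 13 = 16
print(additive_key("ce"))  # 5 + 11 = 16 -- same key, but not anagrams
```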
Multiplication works because of Unique Factorization. But I don't like it because it does more work than needed, and for long enough strings you might require big-number operations.

If you think about it, you are really just using primes as placeholders; the exponents of those primes actually give you the count for each letter encountered. So why not simplify things and just keep an array of letter counts? Then you can push each array into a set and count the set at the end.

Or if you are more comfortable dealing with big integers, just think of the array as a big integer, with each count being multiplied by a power of 2 depending on the bit width of a count variable. Heck, you can even do the original prime thing (but in a faster way) by raising each prime to the associated count and multiplying the results together. That way, you'll be doing mostly additions.
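The letter-count idea can be sketched directly; a minimal version, assuming lowercase a-z input:

```python
from collections import defaultdict

def count_signature(word):
    """26-entry letter-count tuple -- anagrams share exactly this signature."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return tuple(counts)  # tuples are hashable, so they can key a dict or set

def group_anagrams(words):
    """Bucket words by their count signature."""
    groups = defaultdict(list)
    for w in words:
        groups[count_signature(w)].append(w)
    return list(groups.values())

print(group_anagrams(["never", "nerve", "venre", "abc"]))
# [['never', 'nerve', 'venre'], ['abc']]
```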
Yaa.. I get you. Taking the prime-number multiplication approach into account: to avoid a product so large that it can't be handled, we could probably add a preprocessing step where the prime number associated with each letter is based on that letter's frequency of occurrence in the English language, rather than assigning them from a-z sequentially. The most frequently occurring letter gets the minimum prime, "2", and so forth. In this way we can contain the product-value explosion to a certain level and also make sure to handle longer strings than in the previous case. The complexity will be maintained at O(n).
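A sketch of that preprocessing step; the frequency ordering below is the classic "etaoin shrdlu" approximation, which is an assumption on my part and not part of the original suggestion:

```python
# Assign smaller primes to more frequent English letters so that
# common words get smaller product keys.
FREQ_ORDER = "etaoinshrdlcumwfgypbvkjxqz"  # assumed frequency ranking
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
PRIME_FOR = dict(zip(FREQ_ORDER, PRIMES))

def product_key(word):
    """Multiply each letter's prime; anagrams get identical keys."""
    key = 1
    for ch in word:
        key *= PRIME_FOR[ch]
    return key

# "eat" and "tea" share the same small key: 2 * 5 * 3 = 30
print(product_key("eat"), product_key("tea"))
```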
It looks like map-reduce code:

// Map
for each set
    emit <anagram, {set_of_words}>
end

// Reduce
take the global map GMap
for each <anagram, {set_of_words}>
    if exists GMap[anagram]
        merge GMap[anagram] with {set_of_words}
    else
        insert <anagram, {set_of_words}> into GMap
end

GMap will hold the answer.
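Map-reduce aside, the same merge-or-insert step can be sketched as ordinary code; here the sorted word stands in as the anagram signature, which is my choice -- any valid signature works:

```python
from collections import defaultdict

def build_gmap(words):
    """Group words under a shared anagram signature (sorted letters)."""
    gmap = defaultdict(set)
    for w in words:
        signature = "".join(sorted(w))
        gmap[signature].add(w)  # merge-or-insert in one step
    return dict(gmap)

print(build_gmap(["listen", "silent", "enlist", "google"]))
```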
Just because this is a Google interview question doesn't mean you have to pander to the interviewer with a meaningless map-reduce blurb. In fact, the above code will probably get you nowhere, and the interviewer will make you write the proper low-level code, meaning that you will have wasted precious time from your 45 minutes. When you get into Google, then you can talk all you want about map-reduce. But in the meantime, stick to the basics and answer the intent of the question, which is essentially: what signature to use for identifying anagrams.
I think the prime-number multiplication approach is just a myth, as one can't treat multiplication as an O(1) op. It's O(m^1.8), where m = log(n) and n is the largest product that needs to be computed.
To use prime numbers, the upper bound on the number will be 26 * log 26 ~ 32 bits (5 bits in the best case), since there are 26 primes and you need to go up to 26 * log 26.
You can take any operation you want as the basic unit for evaluating complexity, as long as it makes sense for the situation you are trying to shed light on. These days people don't usually distinguish a mult-op from the other arithmetic operations performed by the CPU; there just isn't a whole lot of difference. In the past it made sense to do so (e.g., on Intel 286s), when division and multiplication took up many more cycles. But what still matters is distinguishing between big-number multiplication and big-number addition: the former is way more expensive, so you'd want to count those two separately.
And just a reminder that 64-bit CPUs are common today, so your 32-bit estimate is a bit conservative.
Generation of key:

Given that the size of the alphabet is 26, take an int and use only 26 of its bits.

For example, if the word is "aba", set bits as follows:
'a' -> set the first bit of the key
'b' -> set the second bit of the key
...

It would generate a unique key, and in the case of anagrams the keys would collide.
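A minimal sketch of that bitmask key; note that as written it only records which letters appear, not how many times, so non-anagrams with the same letter set (e.g. "aab" and "abb") also collide -- the key is only unique when all letters in a word are distinct:

```python
def bitmask_key(word):
    """OR one bit per distinct letter into a 26-bit integer key."""
    key = 0
    for ch in word:
        key |= 1 << (ord(ch) - ord('a'))
    return key

print(bin(bitmask_key("aba")))  # 0b11: bits for 'a' and 'b' are set
print(bitmask_key("aab") == bitmask_key("abb"))  # True: a false collision
```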
How about sorting it twice?

Initially sort all the words; in k * n lg n time you have sorted everything. Then sort the resulting list, so the words which are anagrams automatically come together.

For the second part, sorting just requires k pointers which can be moved forward. For each set of pointers that match, we then recursively minimize the set and keep on partitioning.

Any comments, flaws?
First of all, the question is still not clear to me, I am guessing that it's this:
Given a set of k words, each of size n, find sets of anagrams in that set.
The algorithm for this seems straightforward:
(1) Rearrange the characters in each word alphabetically; e.g. "never" --> "eenrv". Running time O(k * n * lg n) which for small n is really just O(k). You should keep the original words so that later you can reproduce them.
(2) Sort the rearranged words -- using radix sort, this can be done in O(n*k) = O(k) for small n.
(3) Find sets of identical (rearranged) words. O(k * n) = O(k).
(4) Output the original words for each identical set O(k * n) = O(k).
Total running time is O(k) (assuming that n <= c for some small constant c).
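The four steps above can be sketched end to end; a minimal version, where Python's built-in sort stands in for the radix-sort step of (2):

```python
def anagram_sets(words):
    """Steps (1)-(4): signature each word, sort, scan for equal runs."""
    # (1) Rearrange characters alphabetically, keeping the original word.
    tagged = [("".join(sorted(w)), w) for w in words]
    # (2) Sort by the rearranged form (sorted() stands in for radix sort).
    tagged.sort()
    # (3) + (4) Scan for runs of identical signatures, output the originals.
    sets, i = [], 0
    while i < len(tagged):
        j = i
        while j < len(tagged) and tagged[j][0] == tagged[i][0]:
            j += 1
        if j - i > 1:  # only report actual anagram groups
            sets.append([orig for _, orig in tagged[i:j]])
        i = j
    return sets

print(anagram_sets(["never", "nerve", "stop", "pots", "tops", "cat"]))
```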
There are two solutions that I can think of:

i) O(n log n) - Sort each word alphabetically and store the words in a hash table, with the key being the sorted word and the value being the frequency.

ii) O(n) - Tag each letter with one of the first 26 primes. For each word, take the product of its letters' corresponding primes. Store the final product and the frequency as the key and value, respectively, in a hash table.

- Anon September 07, 2010
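Solution (ii) can be sketched quickly; assigning the first 26 primes in a-z order is my assumption here:

```python
from collections import Counter

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def prime_product(word):
    """Product of each letter's prime; equal exactly for anagrams."""
    product = 1
    for ch in word:
        product *= PRIMES[ord(ch) - ord('a')]
    return product

def anagram_frequencies(words):
    """Hash table: product key -> how many words fall in that anagram class."""
    return Counter(prime_product(w) for w in words)

freqs = anagram_frequencies(["tab", "bat", "cat"])
print(freqs[prime_product("bat")])  # 2: "tab" and "bat" are anagrams
```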