Bazaarvoice Interview Question
Software Engineer / Developers
I think this is the right answer. Many of the BV questions seem to be incompletely specified, like they are checking whether you will ask questions to further define the end goal, instead of just going to town with the partial requirement spec.
I haven't done a complete analysis of this exact problem, but I have for a very similar one, and the best data structure for string-based sorting/searching was a 'trie' (retrieval tree). For de-duping, which isn't the usual use for tries, the insert function would be the variant that only inserts, resolving partial collisions down to the minimum depth needed for uniqueness. This ends up O(n), with a worst-case multiplier of the length of the longest string/email, and an average-case multiplier of the average number of comparisons per insert (and average depth per retrieval).
A hash table has the least complexity.
Compute the hash of each email address. If two addresses produce the same hash, compare the addresses themselves: if they are equal, keep only one of them; otherwise include both.
Complexity: O(n).
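A minimal sketch of this approach in Python. The built-in set is a hash table, so the "duplicate hash, then compare the actual emails" step happens internally on every membership test:

```python
def dedupe_emails(emails):
    # A Python set is a hash table: membership tests are O(1) on average,
    # and entries with equal hashes are compared for true equality
    # internally, which is exactly the hash-then-compare step above.
    seen = set()
    unique = []
    for email in emails:
        if email not in seen:
            seen.add(email)
            unique.append(email)   # keep first occurrence, in input order
    return unique

print(dedupe_emails(["a@x.com", "b@x.com", "a@x.com"]))
# prints ['a@x.com', 'b@x.com']
```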
I think the complexity is not O(n) but O(n log(n)).
You have to iterate over the list of email addresses (O(n)) and check each address against some data structure. With a tree-based set each lookup is O(log n), so the overall complexity would be O(n log n). (A hash set gives O(1) average-case lookups, which is where the O(n) claim comes from.)
(Reposted this, as the first one went to the wrong comment)
Additionally, the trie requires much less processing. The question involves strings, which can be operation-intensive to compare and hash. The trie only does comparisons as far into each string as necessary to distinguish it, whereas hashing must touch every character of every string. In effect, the trie behaves like a self-optimizing hash, using only as many string elements as needed to minimize processing, and it keeps the collision chain sorted for free.
It should be obvious that the dupe count for a node can be incremented on insert and then decremented on retrieval, so that the first traversal yields the superset of unique emails, and each subsequent traversal generates a batch from the emails with successively higher dupe counts.
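A rough sketch of the insert/retrieve scheme described above, in Python. This is a plain character-by-character trie, not the compressed "minimum depth for uniqueness" variant the comment describes, and the traversal order within a batch is arbitrary:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.count = 0      # how many times this exact email was inserted

def insert(root, email):
    """Walk/extend the trie one character at a time; bump the dupe count."""
    node = root
    for ch in email:
        node = node.children.setdefault(ch, TrieNode())
    node.count += 1

def next_batch(root):
    """Collect one copy of every address whose count is still positive,
    decrementing as we go. The first call returns the unique superset;
    later calls return only the addresses that still have duplicates left."""
    batch = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if node.count > 0:
            node.count -= 1
            batch.append(prefix)
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    return batch
```

Calling `next_batch` repeatedly until it returns an empty list yields one duplicate-free batch per traversal, matching the increment-on-insert / decrement-on-retrieval idea.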
Whether this is the answer they were looking for can't really be known, since there isn't enough info about the problem to be solved.
But wouldn't this defeat the purpose of the question, where you are asked to send out the emails using the minimum number of batches? The question has no restriction on the number of emails per batch, so if you remove the duplicate entries you can send all the emails in one batch. So I think the question is meant to be interpreted differently.
It's kind of like bin packing. There will be multiple hash tables, each corresponding to a bin. Scan the email addresses one by one and look each one up in the bins in order: if it already occurs in the first bin, check the second, and so on. If the address is present in every existing bin (say there are only 2), create another bin (hash table) and put it there.
What you are essentially describing, in a more complicated way, is already part of a hash table: the collision chain. After one complete scan of the emails, the table just needs to emit as many batches as the length of the longest collision chain (the highest duplicate count). There are many markup strategies for identifying the hash members along each collision chain.
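The batching idea in both comments above can be sketched without explicit per-bin hash tables: send the k-th occurrence of each address to batch k, so no batch ever contains a duplicate and the batch count equals the highest multiplicity. A minimal Python sketch:

```python
from collections import Counter, defaultdict

def make_batches(emails):
    """Assign the k-th occurrence of each address to batch k.
    No batch contains a duplicate, and the number of batches equals
    the highest multiplicity (the longest 'collision chain')."""
    seen = Counter()              # occurrences of each address so far
    batches = defaultdict(list)   # batch index -> list of addresses
    for email in emails:
        batches[seen[email]].append(email)
        seen[email] += 1
    return [batches[i] for i in range(len(batches))]

print(make_batches(["a@x.com", "b@x.com", "a@x.com", "c@x.com", "a@x.com"]))
# prints [['a@x.com', 'b@x.com', 'c@x.com'], ['a@x.com'], ['a@x.com']]
```

This is one pass over the input, O(n) with average-case hash lookups, and the first batch is exactly the de-duplicated set.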
If the list of email addresses is not sorted, then put the addresses into a balanced binary search tree (this removes the duplicates from the list). Now serialize the tree and send it to the server; on the server end it can be de-serialized to recover the original tree with no duplicates.
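A rough sketch of the BST idea in Python. For brevity this uses an unbalanced BST (a self-balancing variant such as a red-black or AVL tree would be needed to guarantee O(log n) inserts), and it serializes the in-order traversal, a sorted, duplicate-free list, rather than the tree shape, since the server can rebuild a balanced tree from sorted input:

```python
import json

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Unbalanced BST insert that silently drops duplicate keys."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # key == root.key: duplicate, do nothing
    return root

def serialize(root, out):
    """In-order traversal -> sorted, duplicate-free list for the wire."""
    if root is not None:
        serialize(root.left, out)
        out.append(root.key)
        serialize(root.right, out)
    return out

root = None
for e in ["b@x.com", "a@x.com", "b@x.com", "c@x.com"]:
    root = insert(root, e)
print(json.dumps(serialize(root, [])))
# prints ["a@x.com", "b@x.com", "c@x.com"]
```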
- Addy September 30, 2009