## Amazon Interview Question

Software Engineer / DevelopersAight....read this in an algos book sometime back.....

Each set of elements is thought of logically as a set with the children pointing 2wards their respective parents by means of an array, named p (for parent!!). So, the value of p[i] for an element index i is the index of the parent of that element in the tree representing the set to which this element belongs...whoosh i hope i'm not makin it too complicated...cuz it really isn't!!

So, the union operation is reduced to simply setting the value, p[x] of the root element x of a set equal to the root element of the other set.....

This can be further optimized in terms of TIME (guess tiz pretty optimal in terms of space....hell, all i'm usin is a single array of the size of the no. of elements in all the sets!!), but well...like the good samaritan tht i am....guess i'd leave the TIME optimization to be done by yall.....

The scope for optimizaition lies in the fact tht one can ensure that the height of the union of two sets (i.e. the two tress representing the 2 sets), must never be allowed to exceed the order of log n....

So, chew on tht all u guys...n' may the source b with yall!

frivolosDude@gmail.com

I think bit arrays are a great idea. Bit 1 represents element 1, bit 2 element 2 and so on... Then for union all we need to do is OR them together. If the universal set is small enough it could be a constant time operation. It might be tedious if the number of elements is very large but then there are approximation algorithms available in that case.

If duplicates are allowed, then just add 1 array to the other.

If duplicates are not allowed, then sort one of the two arrays and then use an insertionsort type algorithm to combine the other array into the first. For each element in the second array, check the first array if it exists. If it does, then don't add it, else add it.

1.

Append one array at the end of the other

2.

Sort the resulting array

3.

Remove Duplicates

how abt this solution.. plz feedback needed on how good it is?

assume m and n are the lengths of the arrays. Combined lenth would be m+n and sorting of which takes O((m+n)log(m+n)) time and removing duplicates needs one scan which is O(m+n). so total m+n+(m+n)log(m+n). not so good.

sorted the smaller array...for example first array..it takes O(mlogm) time and for each element in second array do a binary search in first one. If the element is already there discard it otherwise add to the array of output array. This takes

O(nlongm). So total is O((m+n)logm) which is quite better than your solution

Start with set A.

Hash each element and put a 1 in the hash location and output

Start with set B.

Hash each element,

if collision, dont output value

else, output value

Complexity: O(n+m) where n and m are the 2 set sizes

Saurabh Sharma, your solution does not deal with duplicates. What you have described is a solution to another problem.

Divyesh, What if there is a collision between two different elements of SET A itself.(The two elements are different but their HASH values are the same)

Great follow up question. The answer lies in the fact that the only guarantee you ever have in resolving this is by standard hash conflicts (like chaining, double hashing and so on). The only way you can minimize conflicts is by choosing the hash function properly. Most books indicate the modulo function to take the Hash Table size relatively prime to all the elements in the list - while implementing this might be a little cumbersome, the interviewer at this point will probably let you be. Trust a stranger.

Yes using a Hash would be optimal o(n+m)

since it wud not allow 2 similar key values to be stored.

At the end you wud end up with a union of two sets :)

The HashTable(instead of HashSet) can be implemented using perfect hashing(worse-case time lookup is O(1) since the set is constant.

what is the difference between hashtable, hashset and hashmap? I am bit confused with the jargon.

Reply to Shm on 6/11/2006 :- Lets say set A has n and set B has m elements. After combining the arrays the size is m+n. Now depends on sorting algorithm it can take

1> O(pLogp) - comparison sort where p=m+n (extra space may/may not reqd)

2> O(p) - radix sort or so where p=m+n (extra space must require)

Note the extra space mentioned above is in addition to the appended array containing m+n spaces.

Thus previously suggested Hashtable solution has advantage in runtime O(m+n) and in space O(m+n)compared to your solution.

Take any array (I would prefer smaller one), Insert its elements into Hash Table (Do not insert duplicates) and also insert these elements into resulting array. Next, take the second array, check if each element is not already hashed and insert into resulting array (if hashed, do not insert). Complexity depends on a good hash function. Ideal calse is O(m+n) where m and n are sizes of the arrays. Worst case is ofcourse O(mn) assuming a very very bad hash function

1.insert all elements of set A and B in binary search tree for elements which are only less and greater than the element in the

binary tree.

then traverse it for the Union of A and B.

INSERT(node *ptr,int num){

if(ptr==NULL){

ptr=getnode(num);

return;

}

if(ptr->data <num){ //only smaller

if(ptr->left!=NULL)

INSERT(ptr->left,num);

else

ptr->left=getnode(num);

}

else if(ptr->data >num){ //only larger

if(ptr->right!=NULL)

INSERT(ptr->right,num);

else

ptr->right=getnode(num);

}

}

Now traverse the element and get the union of set A & B

2.if it is known that the elements are between 0 and k [0,k]

then take an array COUNT[k] of size k and initialize the array with 0

then traverse both sets A and B

for every element e in A or B increment COUNT[e] by 1.

now union of A & B is the set of index i such that

COUNT[i]>0;

This solution is not efficient in case k is very large.

3.Suppose A and B are arrays with n1 and n2 elements.

use an array C of size n1+n2.

populate C with the elements of A and B

sort C.

then remove dumplicates and return the index of array C with unique element ie the union of A & B.

int UNION(int C[]){

insert=1; //pointing to second element

current=1; //pointing to second element

while(current<n1+n2){

if(C[current]!=C[insert-1]){

C[insert]=C[current];

insert++;

}

current++;

}

return (insert-1);

}

//I'll propose another approach with some assumptions and the idea can be extended as required

//Assumptions:

1> sets will have non-negative integers (0,1,2...)

2> elemnts of the set will be unique

3> the values of the elements ranges between 0 to 31 , both inclusive

With these assumtions each set can be represented by a single interger (32 bit) where

ith LO bit is set to ON if element i exists in the set.

Example: a set S={1,5,7} will be represented by integer s=(0000 0000 0000 0000 0000 0000 1010 0010)binary

Now union of two sets = S1 U S2 = s1 OR s2 in O(1) time

Now intersection of two sets = S1 intersection S2 = s1 AND s2 in O(1) time

In certain situations such representation can give you amazing runtime and memory saving

I have problem with this code for union, intersection and difference of sets can someone help. Thanks in advance.

#include <iostream>

using namespace std;

int main ()

{

int setA[] = {1,3,8,5,9,13,20,65,81,45,50,43,12};

int setB[] = {4,5,6,7,50,13,24,89,55,12,8,99,25};

cout<<"A intersection B=";

cout<<"{";

for( int i = 0; i <13; i++ )

{

for( int j = 0; j <14; j++ )

if( setA[i]== setB[j] )

cout<<" "<<setA[i]||setB[j];

}

{

cout<<"}"<<endl;

}

cout<<"A union B=";

cout<<"{";

for( int x = 0; x <setA[13]; x++ )

{

for( int k = 0; k <setB[14]; k++ )

{

if( setB[k]!=setA[x] )

cout<<" "<< (setA[x]&&setB[k]);

{

cout<<"}"<<endl;

}

}

}

cout<<"A minus B=";

cout<<"{";

for( int s = 0; s <13; s++ )

{

for( int p = 0; p <13; p++ )

{

if( (setB[p]-setA[s])&& (setA[s]||setB[p]) )

cout<<" "<<setB[p] && setA[s];

}

{

cout<<"}"<<endl;

}

}

}

Although STL is not allowed to use...I think to traverse thru Set elements, we need to have iterators....Use iterator on each of the set...compare currently pointing value...if same...copy either of them in resulting array otherwise copy both of them (compare if need to produce sorted output)....

- Pratik January 13, 2006Assumption: both sets' size is same. check for boundary condition.