Amazon Interview Question
Software Engineer / Developer
Country: United States
Interview Type: Phone Interview
Since the integers are not larger than 2^32, a bit map of size 2^32 bits suffices; the total size is only 2^29 bytes, i.e. 512 MB. A modern computer can certainly hold that.
Can you please elaborate a bit? In an array we can store the occurrences as described in the post above, but how would you implement that with one bitmap? I can see how to do it with two bitmaps.
Hashing the integers will not work if memory is limited. Maybe a bit vector would be acceptable.
That actually needs clarification. For instance, if the input arrays are {1,1,1} and {1,1,2}, is the output supposed to be 1,1 or just 1?
By tweaking the external merge sort:
First sort both arrays using external merge sort.
Now you have two sorted arrays.
Then merge the two sorted arrays as external merge sort would, but tweak the merge step: whenever the front elements of both arrays are equal, write that element to a separate output array.
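The tweaked merge step can be sketched as follows (a minimal in-memory sketch in Java; with a real external merge sort the two inputs would be sorted runs streamed from disk, but the merge logic is the same — class and method names here are mine, not from the post):

```java
import java.util.ArrayList;
import java.util.List;

class MergeIntersect {
    // Merge step of the tweaked external merge sort: both inputs are
    // already sorted; equal front elements go to the output array.
    static List<Integer> common(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out.add(a[i]); i++; j++; }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(common(new int[]{1, 1, 2, 5}, new int[]{1, 2, 3})); // prints [1, 2]
    }
}
```

Note that on {1,1,1} and {1,1,2} this variant advances both sides on a match, so it emits one copy per matching pair (here 1,1) — exactly the ambiguity raised above.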
import java.util.HashSet;

HashSet<Integer> a = new HashSet<Integer>();
a.add(4);
a.add(2);
a.add(5);
a.add(6);
a.add(1);
a.add(3);
a.add(7);
a.add(9);
HashSet<Integer> b = new HashSet<Integer>();
b.add(4);
b.add(2);
b.add(6);
a.retainAll(b);
System.out.println(a);
This way each lookup is O(1) on average, so the whole intersection via retainAll runs in O(n) expected time.
There is no restriction placed by the interviewer on the secondary storage. So we can use it I suppose.
1. Read the integers in file1 one by one and create a file named after each.
2. Open the second file and read its integers one by one. For every integer you read, check whether a file with that name exists.
3. If the file exists, print the integer as common.
It is an O(n+m) solution, i.e. O(n).
That's a truly horrifying solution, in every way worthy of the Daily WTF. So you're saying that you would create as many files as there are integers? If the integers don't fit in memory, isn't it because there are probably, say, > 1B of them? So you're going to create over 1B files on your system?
Worse perhaps is the person who thinks this solution represents refreshing thinking. An Einstein quote comes to mind here: "Creativity is no substitute for knowing what you're doing."
A bit map covering all 32-bit integers requires only 2^32 bits = 512 MB (it would be 4 GB only if a whole byte were spent per integer).
First step: prepare a bitmap for each array and store it on disk.
Suppose the total memory available to the program is 2 GB.
Second step: load a chunk of the first array's bitmap and the corresponding chunk of the second array's bitmap, AND them, and append the result to a third file in bitmap format. Repeat for the rest of the bitmaps.
Third step: convert the result bitmap back into an integer array.
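The three bitmap steps can be sketched in Java with java.util.BitSet (an in-memory sketch assuming non-negative values; in the disk-based version the AND in step two would be done chunk by chunk, e.g. 1 GB of each bitmap at a time — the class name and sample values are mine):

```java
import java.util.BitSet;

class BitmapIntersect {
    // Step one: build one bitmap per array (assuming non-negative ints).
    // Step two: AND the two bitmaps.
    // Step three: the set bits of the result are the common values.
    static BitSet intersect(int[] a, int[] b) {
        BitSet ba = new BitSet();
        BitSet bb = new BitSet();
        for (int x : a) ba.set(x);
        for (int x : b) bb.set(x);
        ba.and(bb);      // AND; duplicates within one array collapse to one bit
        return ba;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[]{4, 2, 5, 9}, new int[]{2, 9, 7})); // prints {2, 9}
    }
}
```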
Since these are signed integers, we can have positive and negative numbers, so we build two bitmaps: one for non-negative values and one for negative ones. We simply negate a negative number to get its position in the negative bitmap. Build the bitmaps for both arrays and then AND them together; whichever bit is set to 1 marks a common number.
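The two-bitmap trick for signed values might look like this (a minimal sketch, names mine; note that Integer.MIN_VALUE cannot be negated in 32 bits and would need special handling, omitted here):

```java
import java.util.BitSet;

class SignedBitmap {
    final BitSet pos = new BitSet(); // bitmap for values >= 0
    final BitSet neg = new BitSet(); // bitmap for values < 0, indexed by -value

    void add(int x) {
        if (x >= 0) pos.set(x);
        else neg.set(-x); // negate to get the position in the negative bitmap
    }

    boolean contains(int x) {
        return x >= 0 ? pos.get(x) : neg.get(-x);
    }
}
```

To intersect two arrays, build a SignedBitmap for each, then AND the two pos bitmaps together and the two neg bitmaps together; the surviving set bits are the common values.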
Any integer repeated within a single array can be rejected, so this can be viewed as a more compact hash map. We store one array's integers in a hash map; since the map keeps only unique values and ignores duplicates, at most the number of distinct integers needs to fit in RAM.
Here is an implementation in Perl.
#!/usr/bin/perl
use strict;
use warnings;
my (@arr1, @arr2); # array elements assumed to be filled
my %hash;          # hash map to store the values of one array
# Preparing the hash map takes O(n).
foreach my $ele (@arr1) {
    # Duplicate elements simply overwrite the same key, so the hash
    # holds only unique values (at most 2^32 distinct keys for 32-bit
    # integers), which can be stored in RAM.
    $hash{$ele} = 1;
}
# O(m) to traverse the second array and find the common elements.
foreach my $ele2 (@arr2) {
    # A hash lookup is O(1) on average; even if every element of the
    # first array is identical, the hash has one entry and lookup stays O(1).
    if (exists $hash{$ele2}) {
        print "$ele2 is common to both arrays\n";
    }
}
I hope it helps.
The basic idea behind limiting the RAM size is that you have to solve the problem with disk, or within a limited amount of RAM; for clarity, ask the interviewer how much RAM is allowed.
The problem will then involve sorting the files on disk (be ready for follow-up questions about external sorting). If the interviewer says that takes too long, you can instead build a bit vector of the files' integers, which will fit in the allowed memory, and then just print the common bits.
We can build min-heaps for both arrays in memory, each holding up to some HEAP_SIZE elements read from disk. Compare the two heap tops: on a match, output the element and pop it from both heaps; otherwise pop only the smaller top. Whenever a heap drains, refill it from its file on disk until it reaches HEAP_SIZE again.
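A caveat on this heap idea: refilling from disk only recovers all matches if each file feeds its heap in sorted runs, so that no element smaller than ones already discarded arrives late. A minimal in-memory sketch of the comparison loop (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class HeapIntersect {
    // Keep a min-heap per array and compare the tops: equal tops are
    // common; otherwise drop the smaller one. In the disk version each
    // heap would hold only HEAP_SIZE elements and be refilled as it drains.
    static List<Integer> common(int[] a, int[] b) {
        PriorityQueue<Integer> ha = new PriorityQueue<>();
        PriorityQueue<Integer> hb = new PriorityQueue<>();
        for (int x : a) ha.add(x);
        for (int x : b) hb.add(x);
        List<Integer> out = new ArrayList<>();
        while (!ha.isEmpty() && !hb.isEmpty()) {
            int x = ha.peek(), y = hb.peek();
            if (x < y) ha.poll();
            else if (x > y) hb.poll();
            else { out.add(x); ha.poll(); hb.poll(); }
        }
        return out;
    }
}
```

Effectively the heaps deliver both inputs in sorted order, so this reduces to the sorted-merge intersection described earlier.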
Having RAM swap to and from disk is still O(N). It'd be N / some constant related to memory size, times a big constant for swap time, but still, O(N). I vote for some sort of hash table that can hold all the encountered ints; at each, keep a flag of whether it's been seen in A or B or both, and swap to/from disk as necessary. Use a memory-mapped file for ease-of-use and speed.
Of course, the intended answer is probably something tricky with XOR or summing and subtraction :)
I answered the same. Interviewer did not give any specific reaction. He moved to another question.
Take an array A1 of size equal to the maximum integer value. If a number occurs any number of times in the first array, its count in A1 should stay at 1; if the number is also found in the second array, its count in A1 is increased to 2. At the end, iterate through A1 and collect all numbers whose count is 2.
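The counting scheme above can be sketched like this (an in-memory Java sketch; the `range` parameter, which must exceed the largest value, and the names are mine — it also assumes non-negative values, since the value is used as an index):

```java
class CountIntersect {
    // Counts are capped at 1 for the first array and raised to 2 only
    // when the value also occurs in the second array.
    static byte[] mark(int[] first, int[] second, int range) {
        byte[] count = new byte[range];
        for (int x : first) count[x] = 1;                     // duplicates stay at 1
        for (int x : second) if (count[x] == 1) count[x] = 2; // common -> 2
        return count;
    }

    public static void main(String[] args) {
        byte[] c = mark(new int[]{1, 1, 4}, new int[]{1, 4, 5}, 10);
        for (int v = 0; v < c.length; v++)
            if (c[v] == 2) System.out.print(v + " "); // the common values
    }
}
```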
1. Create a boolean array whose length covers the range of values in the files (the value itself is the index, so the length must be the largest value across both files plus one, not the number of integers in a file).
2. For every integer x you read from file1, set array[x] = true. Packed one bit per value, the array is up to 32 times smaller than the file of 32-bit integers. This loop is O(n).
3. Read file2 the same way: every y for which array[y] is already true is common to both files, so record or print it.
4. After step 3 you have printed exactly the intersection of file 1 and file 2.
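The mark-then-check steps can be sketched as follows (an in-memory Java sketch where the arrays stand in for the two files; `range` must exceed the largest value, and all names are mine — it assumes non-negative values):

```java
import java.util.ArrayList;
import java.util.List;

class BooleanMark {
    // Mark every value seen in file1, then scan file2 and emit each
    // marked value once (the mark is cleared to suppress duplicates).
    static List<Integer> common(int[] file1, int[] file2, int range) {
        boolean[] seen = new boolean[range];
        for (int x : file1) seen[x] = true;           // step 2: mark file1's values
        List<Integer> out = new ArrayList<>();
        for (int y : file2)
            if (seen[y]) { out.add(y); seen[y] = false; } // step 3: check file2
        return out;
    }
}
```

Example: common(new int[]{4, 2, 6, 2}, new int[]{2, 6, 9, 2}, 10) yields [2, 6].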
void printCommonNumbers(FILE* fp1, FILE* fp2){
    int x, y;
    int ok1 = fscanf(fp1, "%d", &x);
    int ok2 = fscanf(fp2, "%d", &y);
    /* Checking the fscanf return values (rather than feof) stops the
       loop as soon as either file runs out of numbers, avoiding an
       endless loop on unchanged x or y. */
    while (ok1 == 1 && ok2 == 1) {
        if (x < y)
            ok1 = fscanf(fp1, "%d", &x);
        else if (x > y)
            ok2 = fscanf(fp2, "%d", &y);
        else {
            printf("%d ", x);
            ok1 = fscanf(fp1, "%d", &x);
            ok2 = fscanf(fp2, "%d", &y);
        }
    }
}
Assuming the input files contain sorted numbers.
I can't see a way to do this if the arrays are unsorted; a linear-time common-elements algorithm for unsorted arrays does not come to mind.
- Anonymous January 20, 2012