Amazon Interview Question
I had a similar question from Expedia:
How would you find the position of the duplicate element(1 duplicate at most) w/ O(1) space. This means you can use static memory but no dynamic allocation like Hash Tables. I didn't solve this problem but mentioned bit manipulation using XOR. Anyone have a clever idea on this?
I think, if you don't want to use dynamic memory, then first sort the array. Then compare two adjacent elements, using xoring function. Means when two elements are equal, then xor result will be zero otherwise 1.
aXORa == 0.
So aXORbXORa = b. XOR is commutative.
So XOR all elements of the array in turn till it becomes zero will give u the repeated number. But i think this can fail sometimes. nybody has a better solution?
The solution spontaneously came to me...
1. Calculate the sum from 1 to N, using the series formula.
2. Since there is at most 1 duplicate, you can find the position by searching in either direction, depending on which position you want to find.
The running time is then O(n) and O(1) storage(only original array).
Your solution works only when numbers are consecutive 1...n,if the array contains something like 14 13 5 6 7 7 8 1 then sorting would give you 1 5 6 7 7 8 13 14 and as you traverse the array and reach the element 7 & 7 you could either use the XOR 7^7 and check or use == operator.Correct me if I have misunderstood your approach.
From the above example
2,3,1,4,5,6,7,7 the sum is 35
but the sum is 35 even for 2,3,1,4,5,6,6,8.. and the repeated
number is 6!
How does just getting the sum help in finding the repeated number?
So, I think Hash which takes O(1) and extra space is the best solution
Second best is to sort and scan the sorted array O(nlogn)
A kind of bit vector like solution, in O(n).
Traverse the array, setting corresponding bit to 1. If the bit already one, then we found our duplicate, if not just keep traversing. Worst case would be if the last element is the repeated one.
But a CATCH is wasted space in creating the bitvector of that size. If range is known, then its great else, we have to make some guesses or give the range of Int.MAX_Value as the size of bit vector
Bit vector sounds expensive since a bit vector supports sequential access, it takes O(n) time to scan.
Mergesort & comparing adjacent duplicates will be O(nlogn). This is expensive as well. O(nlogn) is expensive than O(n). But, this method doesn't need any extra space.
Hash table is the fastest with O(1) retrieval & access times. But a hash table takes extra O(n) space.
So, we need to decide if we want space(mergesort & comparing adjacent duplicates) or time(the lovely hash table).
A simple solution is in O(1).
The sum of first n natural numbers i.e., 1+2+3+...+n if the numbers are in sequence is given by: n*(n+1)/2. So u can apply the above formula i.e., For example:
The below is the given sequence:
2,3,1,4,5,6,7,7.
Now as there are 8 numbers:
8*(8+1)/2 = 36.
Now the sum of the above elements is:35.
So u can easily say that 7 is repeated as sum is 1 less than it should be.
Just to add upon q's statement:
-------------------------------
from Jack's statement N is known.
1) so sum of numbers = N(N+1)/2.... which does not take much time. This can be O(1).
2) Now finding the duplicate element is direct. If N is 5 and 5 is repeated, new sum will be 20. So 20 - 15 (original sum) = 5.
3) Traverse the array and find the first position of '5'.
This is O(n).
Since the question doesn't talk about any sorting, i think, [keeping the data along with a counter in a structure] linked list would be a good solution.
- Niti December 05, 2006