Amazon Interview Question
Applications Developer | Country: India
Interview Type: In-Person
Usually hashing implementations do something smarter than just taking value % numBuckets, and so you usually don't have to be very fearful that all your numbers will wind up in the same bucket. In practice, hashing tends to be a good solution.
Your point that an O(n^2) worst case is possible is correct, though. Some people do use sorting for solving this problem for that reason (and also because sorting has low constant factors, is simple, and can have good cache performance).
Solution with O(n log n) time complexity and no extra space.
import java.util.Arrays;

public class RemoveDuplicates {
    public static void main(String[] args) {
        Integer[] inputArr = {3, 2, 1, 3, 2, 5, 5, 7, 6, 1};
        // Arrays.sort on an Object[] uses a stable merge sort (TimSort),
        // not quicksort; it runs in O(n log n) time but needs O(n) working
        // space, so sort a primitive int[] if truly in-place is required.
        Arrays.sort(inputArr);
        for (int data : inputArr)
            System.out.print(data + " ");
        System.out.println();
        // After sorting, duplicates are adjacent; null out the earlier copy.
        // Use equals() rather than == to compare Integer objects.
        for (int i = 1; i < inputArr.length; i++) {
            if (inputArr[i].equals(inputArr[i - 1]))
                inputArr[i - 1] = null;
        }
        for (int i = 0; i < inputArr.length; i++)
            System.out.print(inputArr[i] + " ");
    }
}
If we are using the Java APIs HashMap or HashSet, they have their own internal algorithms to keep functioning, so we can't simply claim the complexity is O(1). Using these APIs is fine if we are not too concerned about complexity, or if we are confident the best algorithm is being used inside them.
If we are not using any pre-built APIs, then yes, a BST or a sorted array looks good: O(n log n) total is enough to insert the n elements, delete duplicates, and maintain the unique data.
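The BST approach above can be sketched with the standard library's TreeSet (which is backed by a red-black tree, so each insertion is O(log n) and duplicates are rejected automatically); the class and method names here are illustrative, not from the original post:

```java
import java.util.Set;
import java.util.TreeSet;

public class TreeSetDedup {
    // Each add() is O(log n) on the red-black tree, so deduplicating
    // n elements costs O(n log n) total; duplicates are simply rejected.
    static Set<Integer> dedup(int[] input) {
        Set<Integer> seen = new TreeSet<>();
        for (int x : input) {
            seen.add(x); // returns false for duplicates, no-op
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(dedup(new int[]{3, 2, 1, 3, 2, 5, 5, 7, 6, 1}));
        // prints the unique values in sorted order: [1, 2, 3, 5, 6, 7]
    }
}
```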
Don't you need a hash entry for each of the N elements (in the worst case, when there are no duplicates)? So the space complexity should be O(N), just like the time complexity, right?
When the N elements are given as an array and you're allowed to modify it, you could use a modified BuildHeap function to build a heap in place (and not insert duplicates into that heap) in O(N) with no additional memory.
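For reference, the standard bottom-up BuildHeap (Floyd's method) runs in O(N) in place, unlike repeated insertion, which is O(n log n); this is a plain sketch of the base construction only, without the duplicate-skipping modification the comment describes:

```java
public class BuildHeap {
    // Floyd's bottom-up heap construction: sift down each internal node,
    // starting from the last one. Total work is O(n), done in place.
    static void buildMaxHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)
            siftDown(a, i, a.length);
    }

    static void siftDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;               // left child
            if (child + 1 < n && a[child + 1] > a[child])
                child++;                          // pick the larger child
            if (a[i] >= a[child])
                break;                            // heap property holds
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 2, 1, 3, 2, 5, 5, 7, 6, 1};
        buildMaxHeap(a);
        System.out.println(a[0]); // root now holds the maximum: 7
    }
}
```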
It is still O(n lg n), since you need to traverse the entire array: O(lg n) per insertion, for a total of n elements.
so what?
I gave it as the second-best option.
HashMap takes O(n), and if BST is second best, that means it takes more than HashMap: O(n log n).
A BST has the property of identifying duplicates in O(log n), by which I meant searching for and finding duplicates; I did not claim that building the BST takes O(log n).
The question is more interested in knowing the data structures that have the property of identifying duplicate elements.
which I have correctly pointed out:
1. Hashmap
2. BST
3. SkipList.
Suppose you have 1000 elements, you want to eliminate duplicates, and you don't have a built-in hash or BST implementation. Which data structure will you implement?
If you say Hash
A hash will require implementing a good hash function that distributes keys evenly, which is a complex task. It will also require collision resolution, by chaining or some other scheme.
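To give a sense of what "collision resolution by chaining" entails, here is a minimal sketch of a chained hash set (illustrative only, not production code; the class and method names are mine):

```java
import java.util.LinkedList;

public class ChainedHashSet {
    // Each bucket is a linked list of the keys that hash to it.
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int numBuckets) {
        buckets = new LinkedList[numBuckets];
        for (int i = 0; i < numBuckets; i++)
            buckets[i] = new LinkedList<>();
    }

    // Returns true if the key was newly added, false if it was a duplicate.
    // floorMod keeps the bucket index non-negative for negative keys.
    boolean add(int key) {
        LinkedList<Integer> bucket = buckets[Math.floorMod(key, buckets.length)];
        if (bucket.contains(key))   // scan the chain: O(chain length)
            return false;
        bucket.add(key);
        return true;
    }
}
```

Deduplication is then just calling add() on every element and keeping those that return true; if the hash function is poor, every chain scan can degenerate to O(n).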
In that case I would rather opt for a BST.
Now let's assume the data is partially sorted. In this case the BST degenerates, which increases the complexity, so I would opt for a skip list instead. It takes O(n log n) space, but for 10000 elements that is manageable, given the O(log n) search time it provides each time.
So, based on the size of the data, the required search time, and the space available, we have different options to choose from.
Hashing might give an average complexity of O(N), but the worst case is O(N*N). Why?
- sourabh July 18, 2013
Imagine this:
array[] = 7, 14, 21, 28, 35, 42, 49
If you mod by the array length (7), every value collides into bucket 0. Thus inserting the elements (or checking for duplicates) takes 0 + 1 + 2 + 3 + 4 + 5 + 6 = 21 steps.
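That collision pattern is easy to reproduce with the naive value % numBuckets scheme the example assumes:

```java
public class CollisionDemo {
    public static void main(String[] args) {
        int[] array = {7, 14, 21, 28, 35, 42, 49};
        int numBuckets = array.length; // naive scheme: bucket = value % numBuckets
        int steps = 0;
        for (int i = 0; i < array.length; i++) {
            // Every value is a multiple of 7, so every value maps to bucket 0,
            // and the i-th insert must scan the i entries already chained there.
            System.out.println(array[i] + " -> bucket " + (array[i] % numBuckets));
            steps += i;
        }
        System.out.println("total comparisons: " + steps); // 0+1+...+6 = 21
    }
}
```

With all N keys in one bucket, each insert scans the whole chain, giving the quadratic 0 + 1 + ... + (N-1) = N(N-1)/2 worst case.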
I think sorting is the best general purpose option.