Amazon Interview Question
Applications Developer | Country: India
Interview Type: In-Person
Usually hashing implementations do something smarter than just taking value % numBuckets, and so you usually don't have to be very fearful that all your numbers will wind up in the same bucket. In practice, hashing tends to be a good solution.
Your point that an O(n^2) worst case is possible is correct, though. Some people do use sorting for solving this problem for that reason (and also because sorting has low constant factors, is simple, and can have good cache performance).
Solution with O(n log n) time complexity and no extra space.
import java.util.Arrays;

public class RemoveDuplicates {
    public static void main(String[] args) {
        Integer[] inputArr = {3, 2, 1, 3, 2, 5, 5, 7, 6, 1};
        // Arrays.sort on an Object[] uses a stable merge sort (TimSort),
        // not quicksort; it runs in O(n log n) time but needs O(n) working
        // space, so sort a primitive int[] if truly in-place is required.
        Arrays.sort(inputArr);
        for (int data : inputArr)
            System.out.print(data + " ");
        System.out.println();
        // After sorting, duplicates are adjacent; null out the earlier copy.
        // Use equals() rather than == to compare Integer objects.
        for (int i = 1; i < inputArr.length; i++) {
            if (inputArr[i].equals(inputArr[i - 1]))
                inputArr[i - 1] = null;
        }
        for (int i = 0; i < inputArr.length; i++)
            System.out.print(inputArr[i] + " ");
    }
}
If we are using the Java APIs HashMap or HashSet, they have their own internal algorithms to keep functioning, so we can't simply claim the complexity is O(1). Using these APIs is fine if we are not too concerned about complexity, or if we are confident the best algorithm is being used inside them.
If we are not using any pre-built APIs, then yes, a BST or a sorted array looks good: O(n log n) total is enough to insert the n elements, delete duplicates, and maintain the unique data.
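The BST approach above can be sketched with the standard library's TreeSet (which is backed by a red-black tree, so each insertion is O(log n) and duplicates are rejected automatically); the class and method names here are illustrative, not from the original post:

```java
import java.util.Set;
import java.util.TreeSet;

public class TreeSetDedup {
    // Each add() is O(log n) on the red-black tree, so deduplicating
    // n elements costs O(n log n) total; duplicates are simply rejected.
    static Set<Integer> dedup(int[] input) {
        Set<Integer> seen = new TreeSet<>();
        for (int x : input) {
            seen.add(x); // returns false for duplicates, no-op
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(dedup(new int[]{3, 2, 1, 3, 2, 5, 5, 7, 6, 1}));
        // prints the unique values in sorted order: [1, 2, 3, 5, 6, 7]
    }
}
```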
Don't you need a hash entry for each of the N elements (in the worst case, when there are no duplicates)? So the space complexity should be O(N), just like the time complexity, right?
When the N elements are given as an array and you're allowed to modify it, you could use a modified BuildHeap function to build a heap in place (and not insert duplicates into that heap) in O(N) with no additional memory.
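For reference, the standard bottom-up BuildHeap (Floyd's method) runs in O(N) in place, unlike repeated insertion, which is O(n log n); this is a plain sketch of the base construction only, without the duplicate-skipping modification the comment describes:

```java
public class BuildHeap {
    // Floyd's bottom-up heap construction: sift down each internal node,
    // starting from the last one. Total work is O(n), done in place.
    static void buildMaxHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)
            siftDown(a, i, a.length);
    }

    static void siftDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;               // left child
            if (child + 1 < n && a[child + 1] > a[child])
                child++;                          // pick the larger child
            if (a[i] >= a[child])
                break;                            // heap property holds
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 2, 1, 3, 2, 5, 5, 7, 6, 1};
        buildMaxHeap(a);
        System.out.println(a[0]); // root now holds the maximum: 7
    }
}
```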
It is still O(n lg n), since you need to traverse the entire array: O(lg n) per insertion, for a total of n elements.
so what?
I gave it as the second-best option.
HashMap takes O(n), and if BST is second best, that means it takes more than HashMap: O(n log n).
A BST has the property of identifying duplicates in O(log n), by which I meant searching for and finding duplicates; I did not claim that building the BST takes O(log n).
The question is more interested in knowing the data structures that have the property of identifying duplicate elements.
which I have correctly pointed out:
1. Hashmap
2. BST
3. SkipList.
Suppose you have 1000 elements, you want to eliminate duplicates, and you don't have a built-in hash or BST implementation. Which data structure will you implement?
If you say Hash
A hash will require implementing a good hash function that distributes keys evenly, which is a complex task. It will also require collision resolution, by chaining or some other scheme.
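To give a sense of what "collision resolution by chaining" entails, here is a minimal sketch of a chained hash set (illustrative only, not production code; the class and method names are mine):

```java
import java.util.LinkedList;

public class ChainedHashSet {
    // Each bucket is a linked list of the keys that hash to it.
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int numBuckets) {
        buckets = new LinkedList[numBuckets];
        for (int i = 0; i < numBuckets; i++)
            buckets[i] = new LinkedList<>();
    }

    // Returns true if the key was newly added, false if it was a duplicate.
    // floorMod keeps the bucket index non-negative for negative keys.
    boolean add(int key) {
        LinkedList<Integer> bucket = buckets[Math.floorMod(key, buckets.length)];
        if (bucket.contains(key))   // scan the chain: O(chain length)
            return false;
        bucket.add(key);
        return true;
    }
}
```

Deduplication is then just calling add() on every element and keeping those that return true; if the hash function is poor, every chain scan can degenerate to O(n).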
In that case I would rather opt for a BST.
Now let's assume the data is partially sorted. In this case the BST degenerates, which increases the complexity, so I would opt for a skip list instead. It takes O(n log n) space, but for 10000 elements that is manageable, given the O(log n) search time it provides each time.
So, based on the size of the data, the required search time, and the space available, we have different options to choose from.
Hashing might give an average complexity of O(N), but the worst case is O(N*N). Why?
- sourabh July 18, 2013
Imagine this:
array[] = 7, 14, 21, 28, 35, 42, 49
If you mod by the array length (7), every value collides into bucket 0. Thus inserting the elements (or checking for duplicates) takes 0 + 1 + 2 + 3 + 4 + 5 + 6 = 21 steps.
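That collision pattern is easy to reproduce with the naive value % numBuckets scheme the example assumes:

```java
public class CollisionDemo {
    public static void main(String[] args) {
        int[] array = {7, 14, 21, 28, 35, 42, 49};
        int numBuckets = array.length; // naive scheme: bucket = value % numBuckets
        int steps = 0;
        for (int i = 0; i < array.length; i++) {
            // Every value is a multiple of 7, so every value maps to bucket 0,
            // and the i-th insert must scan the i entries already chained there.
            System.out.println(array[i] + " -> bucket " + (array[i] % numBuckets));
            steps += i;
        }
        System.out.println("total comparisons: " + steps); // 0+1+...+6 = 21
    }
}
```

With all N keys in one bucket, each insert scans the whole chain, giving the quadratic 0 + 1 + ... + (N-1) = N(N-1)/2 worst case.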
I think sorting is the best general purpose option.