You are given a large set of i

Amazon Interview Question for Interns

4

of 4 votes

66
Answers
You are given a large set of integers, which are not sorted. Figure out a method to retrieve the largest 1000 elements, in O(n) run time
- varunumesh77 March 12, 2014 in United States | Report Duplicate | Flag | PURGE
Amazon Intern Algorithm Data Structures

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Phone Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 4 vote

Use the selection rank algorithm.

Pick a random element in the array and use it as a ‘pivot’. Move all elements smaller than that element to one side of the array, and all elements larger to the other side.
»»If there are exactly i elements on the right, then you just find the smallest element on that side.
»»Otherwise, if the right side is bigger than i, repeat the algorithm on the right. If the right side is smaller than i, repeat the algorithm on the left for i – right.size().

cc book problem

- Nitin March 13, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Could you please explain it with set of 10 numbers. Solution seems to be interesting but not getting it.
What is i here ?

- Abhijeet Kandalkar March 15, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

How is it O(n) ?

- Anon March 18, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

He's using quicksort.
You do need another condition if, elements to the right of the pivot are less than 1000 then you have to sort the left side of that particular pivot and add, right side of the new pivot + right side of the particular pivot. If you have more than 1000, you only need the first 1000, if it is less than 1000 then you have to do the left right again.
I hope you get to your destination after all these left and rights. Which you should!

- kruz March 18, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Its average case is O(n). Worst case in O(n2). Being random worst case is seldom possible but for full proof Selection in worst case linear time check my comment below.

- dhruba.work March 26, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Technically, you could go through the array 1000 times and each time get the max value which is smaller than the previous max value you had already found. This is exactly This will almost require 2000(n) comparisons.

Another approach, with an extra O(n) space is this:
1) Build a Max-Heap in O(N) time and space (I guess will take around 3n swaps and compares).
2) Extract-Max 1000 time which takes at worst 2000 log(N) time.

- Ehsan March 12, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Min-heap is better for top-1000. Think about it.

- Anonymous March 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I don't get it. If you use a Min-Heap, then the top 1000 numbers will be the leaves of the tree. There is almost n / 2 leaves in the tree. So you know the search range is "n / 2". How will you use this then? Or is there another trick I am missing?

- Ehsan March 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 6 votes

Maintain a min-heap of 1000 elements.

As you get new elements, check against the min. If smaller throw away. Else, extract min, and insert new element.

Space usage is constant, runtime is linear.

Voila!

- Le snob March 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

You are right. This should take around (10-20 ) n at most.

- Ehsan March 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Ehsan: Comparisons are not the only cost. Maintaining/mucking around with a huge array can have a huge cost (cache missed/paging etc).

Talking about 3n or (10-20)n is pointless.

- Le snob March 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

That is one factor, but when it comes to speed, talking about 1.5 n and 1.0 n makes a lot of sense. Let alone 3n, 10-20n , and 1000n.

- Ehsan March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Best is to profile it and figure out. Yes, knowing the approximate number etc is good, but talking about only that is misleading.

- Le snob March 14, 2014 | Flag

Comment hidden because of low score. Click to expand.

-2

of 2 votes

This would give us O(n log n) in the worst case. Say the numbers are in decreasing order. You keep on putting the next element into the min-heap (O(log n)for all elements (O(n))

You might just as well sort the numbers in O(n log(n)) time and return the largest 1000 numbers.

The answer to this question is quick find with an initial shuffle

- gokayhuz March 17, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

@gokay. It is clear you don't understand the min-heap solution.

- Anonymous March 17, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How does the MinHeap solution work in O(N) time?
- Building the heap, i.e., inserting and deleting takes O(log N) time.
- When you are done building the heap which now maintains the largest 1000 elements, you need to extract these elements from the heap. To extract them, you need to call 'delete' 1000 times. That makes the complexity O(1000 log N) which is linearithmic.

- phantomlord November 27, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

use median of medians, O(n), to get 1000th largest
then scan array picking out all numbers >= the one you got from above
Runtime: O(n) worst case
------------------------
or you can use QuickSelect for 1000th largest (expected linear, but worst case with low probability to be quadratic)

- S O U N D W A V E March 12, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Or just maintain the 1000 largest (in an array/linked list/min-heap etc) while you run through the elements. I vote for min-heap.

- Anonymous March 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Worst case analysis:

All three solutions you mention above would be supralinear in time.
Array/linkedList would be O(nm)=...= O(n^2).
min-heap would be O(n*lgm) =...= O(n*lgn)

the =...= is used to denote "if n=theta(m)"

Another method is to use a max-heap which would be:
O(n + m*lgn) = ...= O(n*lgn)

- S O U N D W A V E March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

n is array size
m is the "1000"

[Forgot to define the variables above]

- S O U N D W A V E March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

min heap is O(n log m).

Since m = 1000, this is equa-linear. Not supra or sub.

- Le snob March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

-2

of 2 votes

Who cares about the case when m = theta(n).

- Anonymous March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Ok.
So if you tell me to find the biggest K elements of an array of size N, I can just say well K doen't count because it is given, and I'll give a O(K^K * N) algorithm, then I'll make K disappear into O(N). Equi-linear.

QuickSelect is blazingly fast in practice (works well with cache), and will drop the result in the first 1000 elements of your original array ready for your bidding.

Or you can build a min heap of size 1000, then do all that other stuff if you like...

- S O U N D W A V E March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Did you even try the min-heap solution (implemented using an array)?

It is blazingly blazingly fast, much better than Quickselect.

Your comments are irrelevant.

- Anonymous March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Ok if you say so.
Creating and playing with another array (for the min-heap) without locality of reference with the original array is much faster :)

- S O U N D W A V E March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Stop making comments like this, and you might get a better position in programming contests :-)

Really, try it out, first.

- Anonymous March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I like min-heap solution (especially with streams , a.k.a., getting elements one by one from a socket or some external source).

But not sure about the linkedlist/array/loop-1000-times solutions.

minheap vs. maxheap vs. linear_selection(quickSelect or MoM) really depends on the situation. We can't even "try it out" because so many other variables need to be known.

Also, if the elements are integers in a tight range, there are solutions that destroy all above.

:) Healthy debate is good for the brain, agree?

- S O U N D W A V E March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Yes, agree that workload needs to be known blah blah. Applies to comments made by you too :-)

Agree, debate is good.

- Anonymous March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

LOL. Healthy discussion, but silent downvoting?

- Anonymous March 14, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Loop over the numbers and put them into an array of size 1000. Keep track of the index and value of the lowest number, when you need to replace it, loop over the array for the new lowest.

Should be 1001n or O(n) unless I'm missing something.

- Anonymous March 12, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Ok, so you keep track of the index and value of the lowest number. Fine.
Say you replaced it, fine.
Now... how do you get the index of the "new" lowest number?
This is what makes it quadratic.

- S O U N D W A V E March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

His approach is correct, not the fastests, but correct, and needs almost 2000N comparisons.

For each new number, you make a comparison with min of 1000 numbers. Then, if you replace, you do another 1000 comparisons to find the next min. This adds up to 2000. In total, it will be 2000N since you repeat for any of the N numbers.

- Ehsan March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I didn't say it was incorrect :'(

- S O U N D W A V E March 17, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

#include <iostream>     
#include <algorithm>
#include <vector>

using namespace std;

const int K = 1000;
const int N = 100000;

int main () {
    vector<int> v;

    for (int i = 0; i < K; i++) {
        int d = rand();
        v.push_back(d);
    }

    // create a min-heap with K elements
    make_heap (v.begin(),v.end(), greater<int>());

    for (int i = 0; i < N; i++) {
        int d = rand();

        // if the new number is greater than the min value of the heap
        // remove the min value from the heap, and add the new value in 
        // the heap
        if (d > v.front()) {
            pop_heap (v.begin(),v.end(), greater<int>()); 
            v.pop_back();

            v.push_back(d);
            push_heap (v.begin(),v.end(), greater<int>());
        }
    }

    sort_heap (v.begin(),v.end(), greater<int>());

    for (int i : v) cout << i << " "; 
    cout << endl;

    return 0;
}

- Westlake March 12, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

You don't really need the top 1000 to be sorted. +1 though.

- Anonymous March 12, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

In deed the last sorting is not needed. I used it solely for verifying my solution by comparing the heap with a reference result.

- Westlake March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Use Selection Rank algoritm

Pick a random element in the array and use it as a ‘pivot’. Move all elements smaller than that element to one side of the array, and all elements larger to the other side.
»»If there are exactly i elements on the right, then you just find the smallest element on that side.
»»Otherwise, if the right side is bigger than i, repeat the algorithm on the right. If the right side is smaller than i, repeat the algorithm on the left for i – right.size().

cc book problem

- Nitin March 13, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

Define a special data structure backed by doubly linked list with support of following features
(1) Add(Element) which can add up to 1000 elements in the list
(2) An internal var SIZE to record correct element size and each Add function check SIZE
(3) An internal var currentMIN to store current smallest item
(4) An internal pointer points to Node storing the var for currentMin
(5) When doing add(Element) if element > currentMin, then we append this new element to head of the list and remove the node storing currentMin

After insertion to this list from item 0 to n, then we will get a list storing only the largest 1000 elements

- Jerry Fan March 13, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

When you remove the currentMin, what will be the new currentMin?

- Vingard March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Strange. Did someone just upvote themselves?

- Anonymous March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

dude it is just a heap,

when we change the min,it takes lon=g n time for finding the min again,
hence it is nlogn solution.

- Anonymous March 14, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

If it is heap. Say it is heap. Backed by doubly linked etc does not give it the same characteristics as a heap (O(log n) insert etc).

- Anonymous March 15, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Find 1001th smallest element in O(n) time using the Kth smallest logic. All the elements on less than this element is the answer.

- Delta March 13, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Item with same value may exist. So K(th) smallest number may contain larger than K items

- kiumiu.ca March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

max heap of 1000 size can work.as it take time complexity of n* log(1000) here log(1000) is constant,so time complexity is 0(n)

- go4gold March 13, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Not max! min! min! How many times do I have to repeat?

- Le snob March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

go4gold thinks it is 1000 smallest elements to be found
be ncie SNOB

- Le snob is Subbu my captain March 13, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

We have to use augmented data structure.
We have to create a balanced binary search tree.
Node structure will be like

Node
{
	Node Left;
	Node Right;
	int value;
	int NoOfChildNodes;
}

While adding each node we keep on updating "NoOfChildNodes" for all nodes who all will
be parent of new node.

After creating that tree we can find any ith Min/Max Node.

Now we have to find 1000th min node
we will start from root node and start traversing downwards.

If (node.NoOfChildNodes > 1000)
	// traverse on right child
else 
	// traverse on right child. We have to update the i value also.

Let node.NoOfChildNodes = 1200 and i = 1000
if node.Left.NoOfChildNodes = 700 and Node.right.NoOfChildNodes = 500

then we have to find 300th Min element in right sub tree. (1000-700).
Keep on traversing downward till we get desired node.

Complexity computation
Making Balanced Binary search tree(ex AVL tree) for n nodes is: O(nlog(n))
For searching element in BST : O(log n)

For very large n vale , log(n) is negligible
that is nlog(n)+log(n) ~= nlog(n)

- Mohit Goel March 14, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Heap is always the best way to find 'k' min or max numbers.....

- Alok March 14, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

before you answer this question, you should ask
1) how big is the dataset, if it's huge and cannot fit in memory, you can use a minheap
2) otherwise, use selection algorithm.
3) can you change the data, since sort will change the array
4) how much extra space you can use
Ask these questions before you give a solution

- samuel March 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

How about this Divide the array into 1000 chunks and then run findMaxNumber in each parallely ?.

- Gurukanth March 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

using partition, till we pivot index is equal to 1000
public static int partition(int arr[], int left, int right) {
		int pivot = arr[(left+right)/2];
		int j = 0;
		int y = 0;
		int pivotIndex = (left+right)/2;

		for (j = left, y = right; j < y;) {
			if (arr[j] >= pivot && arr[y] <= pivot) {
				if (j == pivotIndex) {
					pivotIndex = y;

				} else if (y == pivotIndex) {
					pivotIndex = j;
				}
				int temp = arr[j];
				arr[j] = arr[y];
				arr[y] = temp;

			}

			if (arr[j] < pivot) {
				j++;
			}
			if (arr[y] > pivot) {
				y--;
			}

		}
		return pivotIndex;
	}
	
	
	public static void finKthIndex(int[] arr,int k)
	{
		 int pivotIndex =-1 ;
		int left =0;
		int right = arr.length-1;
		do
		{
			
			 pivotIndex = partition(arr, left, right);
			if(pivotIndex==k)
			{
				System.out.println("Got it");
			} else if(pivotIndex > k)
			{
				right = pivotIndex;
				
			} else
			{
				left = pivotIndex;
			}
			 
			 
		}
		while(pivotIndex!=k);
		
	}

- martin.mathew.2k11 March 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

i think one pass bubble sort will give us the max in O(n) time.

- TusharP March 15, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Use a min heap of size 1000 and compare every element of array with top of heap. If the element is bigger than insert it in the heap. When all the elements in the array are processed , you get the result in the heap.

- aj_ March 18, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Well use Median of Medians to find the 1000th Number. In theory MOM is fast but in practice it works better if we choose a random number as the pivot. The algorithm to find the 1000th largest number is O(n). Once we find the 1000th number in once scan(O(n)) we can find the 1000 largest numbers.

- Madhur Mehta March 18, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Use Bubble sort ,instead of running it n times run it 1000 times only.

- praveenkcs28 March 21, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Build a max heap from the unsorted array it will take O(n) time. Then extract 1000 elements from it which will be order of 1000log(n) . The overall complexity will be O(n) as 1000<n.

- khunima March 25, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

It can be solved using medians of medians algorithm. It only gurantees O(n) worst case time complexity in ranking an element in a unsorted array. Normal Ranking algo using random pivot takes O(n2) in worst case, average case is O(n).
It can be found in Introduction to Algorithms-Cormen book in Chapter medians and Order Statistics.

- dhruba.work March 26, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

median of medians algorithm is the way to go. As dhruba.work rightly pointed CLRS book has a chapter on this.

- rkalyankumar March 27, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Use a max-heap (array implementation). Insertion complexity is O(log N).
The 1st 1000 elements in this array are the 1000 largest numbers.

- uspriya2006 April 11, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

In questions like these, it is helpful to ask more about the nature of the dataset. For e.g. If it contains unique integers within a certain range, we can just use a bit-vector to represent those integers and read the bit vector from the beginning to give a sorted representation. This is O(N) time.

- Anonymous April 12, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public class Largest1000NumberFromArrayUnknownLength {

	public static void main(String[] args) {

		Stack<Integer> s1 = new Stack<Integer>();
		Stack<Integer> s2 = new Stack<Integer>();

		int[] a = { 20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};

		for (int i = 0; i < a.length; i++) {
			s1.push(a[i]);
		}
		
		for (int i = 0; i < a.length; i++) {

				if (s1.peek() > a[i]) {
					s2.push(s1.pop());
				}
			 else {
				s1.push(a[i]);
			}
		}
		for (int i = 0; i < 5; i++) {
			System.out.println(s1.pop());
		}

	}
}

- usage of 2 stacks makes it possible in O(n) April 20, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

#define Swap(a,b)(a=b+a-(b=a))
#include<iostream>
using namespace std;
int a[1000000],n,k;
void place(int x,int y)//o(n) approx. by T.P TRIPATHY
{
    int pvt = (x+y)>>1;
    while(1){//O(n/k)
            bool l = false,r = false;
            for(int i = x;i< pvt;++i)
                if(a[i] > a[pvt]) {Swap(a[i],a[pvt]);pvt = i;l = true;break;}
            for(int i = y;i >pvt;--i)
                if(a[i] < a[pvt]) {Swap(a[i],a[pvt]);pvt = i;r = true;break;}
            if(!l && !r) break;
            }
    if(n-pvt+1 >= k) {for(int i=0;i<n;++i) if(i >= n-k) cout<<a[i]<<" ";return;}
    else  place(x,pvt-1);
}
int main()
{
    cin>>n>>k;//n = no of elements and k = no of bigger elements we want
    for(int i =0;i<n;++i) cin>>a[i];//given array
    place(0,n-1);//worst case O(n+ n/2 + n/4 + n/8..............) < O(2*n)
    return 0;
}

- Anonymous May 17, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

hash it using x%n function........initialize the hashing array above the size limit......then check whether a no. is present in the hashtable from end...i.e n-1....if present print and increment count..when count reaches 1000 break...
if wrong plz comment ur suggestions

- Anonymous June 10, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I am thinking for using bitwise hash, suppose its integer then there is need of 2 ^ 27 integer array, if we use double hashing then this required size of continguous space can be reduced in range of 2^14 to 2 ^ 27. For each number first hash will decide right array and 2nd hash will place in that specific array bit as per its place in array.
Then scan numbers from maximum, first 1000 numbers will be maximum. To reduce complexity we can put number count with array so that don't need to scan array those are blank.

CareerCup

Amazon Interview Question for Interns

Books

Videos

Resume Review

Mock Interviews