## Amazon Interview Question

Interns**Country:**United States

**Interview Type:**Phone Interview

Could you please explain it with set of 10 numbers. Solution seems to be interesting but not getting it.

What is i here ?

He's using quicksort.

You do need another condition if, elements to the right of the pivot are less than 1000 then you have to sort the left side of that particular pivot and add, right side of the new pivot + right side of the particular pivot. If you have more than 1000, you only need the first 1000, if it is less than 1000 then you have to do the left right again.

I hope you get to your destination after all these left and rights. Which you should!

Technically, you could go through the array 1000 times and each time get the max value which is smaller than the previous max value you had already found. This is exactly This will almost require 2000(n) comparisons.

Another approach, with an extra O(n) space is this:

1) Build a Max-Heap in O(N) time and space (I guess will take around 3n swaps and compares).

2) Extract-Max 1000 time which takes at worst 2000 log(N) time.

I don't get it. If you use a Min-Heap, then the top 1000 numbers will be the leaves of the tree. There is almost n / 2 leaves in the tree. So you know the search range is "n / 2". How will you use this then? Or is there another trick I am missing?

Maintain a min-heap of 1000 elements.

As you get new elements, check against the min. If smaller throw away. Else, extract min, and insert new element.

Space usage is constant, runtime is linear.

Voila!

@Ehsan: Comparisons are not the only cost. Maintaining/mucking around with a huge array can have a huge cost (cache missed/paging etc).

Talking about 3n or (10-20)n is pointless.

That is one factor, but when it comes to speed, talking about 1.5 n and 1.0 n makes a lot of sense. Let alone 3n, 10-20n , and 1000n.

Best is to profile it and figure out. Yes, knowing the approximate number etc is good, but talking about only that is misleading.

This would give us O(n log n) in the worst case. Say the numbers are in decreasing order. You keep on putting the next element into the min-heap (O(log n)for all elements (O(n))

You might just as well sort the numbers in O(n log(n)) time and return the largest 1000 numbers.

The answer to this question is quick find with an initial shuffle

How does the MinHeap solution work in O(N) time?

- Building the heap, i.e., inserting and deleting takes O(log N) time.

- When you are done building the heap which now maintains the largest 1000 elements, you need to extract these elements from the heap. To extract them, you need to call 'delete' 1000 times. That makes the complexity O(1000 log N) which is linearithmic.

use median of medians, O(n), to get 1000th largest

then scan array picking out all numbers >= the one you got from above

Runtime: O(n) worst case

------------------------

or you can use QuickSelect for 1000th largest (expected linear, but worst case with low probability to be quadratic)

Or just maintain the 1000 largest (in an array/linked list/min-heap etc) while you run through the elements. I vote for min-heap.

Worst case analysis:

All three solutions you mention above would be supralinear in time.

Array/linkedList would be O(nm)=...= O(n^2).

min-heap would be O(n*lgm) =...= O(n*lgn)

the =...= is used to denote "if n=theta(m)"

Another method is to use a max-heap which would be:

O(n + m*lgn) = ...= O(n*lgn)

Ok.

So if you tell me to find the biggest K elements of an array of size N, I can just say well K doen't count because it is given, and I'll give a O(K^K * N) algorithm, then I'll make K disappear into O(N). Equi-linear.

QuickSelect is blazingly fast in practice (works well with cache), and will drop the result in the first 1000 elements of your original array ready for your bidding.

Or you can build a min heap of size 1000, then do all that other stuff if you like...

Did you even try the min-heap solution (implemented using an array)?

It is blazingly blazingly fast, much better than Quickselect.

Your comments are irrelevant.

Ok if you say so.

Creating and playing with another array (for the min-heap) without locality of reference with the original array is much faster :)

Stop making comments like this, and you might get a better position in programming contests :-)

Really, try it out, first.

I like min-heap solution (especially with streams , a.k.a., getting elements one by one from a socket or some external source).

But not sure about the linkedlist/array/loop-1000-times solutions.

minheap vs. maxheap vs. linear_selection(quickSelect or MoM) really depends on the situation. We can't even "try it out" because so many other variables need to be known.

Also, if the elements are integers in a tight range, there are solutions that destroy all above.

:) Healthy debate is good for the brain, agree?

Yes, agree that workload needs to be known blah blah. Applies to comments made by you too :-)

Agree, debate is good.

Loop over the numbers and put them into an array of size 1000. Keep track of the index and value of the lowest number, when you need to replace it, loop over the array for the new lowest.

Should be 1001n or O(n) unless I'm missing something.

Ok, so you keep track of the index and value of the lowest number. Fine.

Say you replaced it, fine.

Now... how do you get the index of the "new" lowest number?

This is what makes it quadratic.

His approach is correct, not the fastests, but correct, and needs almost 2000N comparisons.

For each new number, you make a comparison with min of 1000 numbers. Then, if you replace, you do another 1000 comparisons to find the next min. This adds up to 2000. In total, it will be 2000N since you repeat for any of the N numbers.

```
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;
const int K = 1000;
const int N = 100000;
int main () {
vector<int> v;
for (int i = 0; i < K; i++) {
int d = rand();
v.push_back(d);
}
// create a min-heap with K elements
make_heap (v.begin(),v.end(), greater<int>());
for (int i = 0; i < N; i++) {
int d = rand();
// if the new number is greater than the min value of the heap
// remove the min value from the heap, and add the new value in
// the heap
if (d > v.front()) {
pop_heap (v.begin(),v.end(), greater<int>());
v.pop_back();
v.push_back(d);
push_heap (v.begin(),v.end(), greater<int>());
}
}
sort_heap (v.begin(),v.end(), greater<int>());
for (int i : v) cout << i << " ";
cout << endl;
return 0;
}
```

Use Selection Rank algoritm

Pick a random element in the array and use it as a ‘pivot’. Move all elements smaller than that element to one side of the array, and all elements larger to the other side.

»»If there are exactly i elements on the right, then you just find the smallest element on that side.

»»Otherwise, if the right side is bigger than i, repeat the algorithm on the right. If the right side is smaller than i, repeat the algorithm on the left for i – right.size().

cc book problem

Define a special data structure backed by doubly linked list with support of following features

(1) Add(Element) which can add up to 1000 elements in the list

(2) An internal var SIZE to record correct element size and each Add function check SIZE

(3) An internal var currentMIN to store current smallest item

(4) An internal pointer points to Node storing the var for currentMin

(5) When doing add(Element) if element > currentMin, then we append this new element to head of the list and remove the node storing currentMin

After insertion to this list from item 0 to n, then we will get a list storing only the largest 1000 elements

dude it is just a heap,

when we change the min,it takes lon=g n time for finding the min again,

hence it is nlogn solution.

Find 1001th smallest element in O(n) time using the Kth smallest logic. All the elements on less than this element is the answer.

max heap of 1000 size can work.as it take time complexity of n* log(1000) here log(1000) is constant,so time complexity is 0(n)

We have to use augmented data structure.

We have to create a balanced binary search tree.

Node structure will be like

```
Node
{
Node Left;
Node Right;
int value;
int NoOfChildNodes;
}
```

While adding each node we keep on updating "NoOfChildNodes" for all nodes who all will

be parent of new node.

After creating that tree we can find any ith Min/Max Node.

Now we have to find 1000th min node

we will start from root node and start traversing downwards.

```
If (node.NoOfChildNodes > 1000)
// traverse on right child
else
// traverse on right child. We have to update the i value also.
```

Let node.NoOfChildNodes = 1200 and i = 1000

if node.Left.NoOfChildNodes = 700 and Node.right.NoOfChildNodes = 500

then we have to find 300th Min element in right sub tree. (1000-700).

Keep on traversing downward till we get desired node.

Complexity computation

Making Balanced Binary search tree(ex AVL tree) for n nodes is: O(nlog(n))

For searching element in BST : O(log n)

For very large n vale , log(n) is negligible

that is nlog(n)+log(n) ~= nlog(n)

before you answer this question, you should ask

1) how big is the dataset, if it's huge and cannot fit in memory, you can use a minheap

2) otherwise, use selection algorithm.

3) can you change the data, since sort will change the array

4) how much extra space you can use

Ask these questions before you give a solution

```
using partition, till we pivot index is equal to 1000
public static int partition(int arr[], int left, int right) {
int pivot = arr[(left+right)/2];
int j = 0;
int y = 0;
int pivotIndex = (left+right)/2;
for (j = left, y = right; j < y;) {
if (arr[j] >= pivot && arr[y] <= pivot) {
if (j == pivotIndex) {
pivotIndex = y;
} else if (y == pivotIndex) {
pivotIndex = j;
}
int temp = arr[j];
arr[j] = arr[y];
arr[y] = temp;
}
if (arr[j] < pivot) {
j++;
}
if (arr[y] > pivot) {
y--;
}
}
return pivotIndex;
}
public static void finKthIndex(int[] arr,int k)
{
int pivotIndex =-1 ;
int left =0;
int right = arr.length-1;
do
{
pivotIndex = partition(arr, left, right);
if(pivotIndex==k)
{
System.out.println("Got it");
} else if(pivotIndex > k)
{
right = pivotIndex;
} else
{
left = pivotIndex;
}
}
while(pivotIndex!=k);
}
```

Well use Median of Medians to find the 1000th Number. In theory MOM is fast but in practice it works better if we choose a random number as the pivot. The algorithm to find the 1000th largest number is O(n). Once we find the 1000th number in once scan(O(n)) we can find the 1000 largest numbers.

It can be solved using medians of medians algorithm. It only gurantees O(n) worst case time complexity in ranking an element in a unsorted array. Normal Ranking algo using random pivot takes O(n2) in worst case, average case is O(n).

It can be found in Introduction to Algorithms-Cormen book in Chapter medians and Order Statistics.

In questions like these, it is helpful to ask more about the nature of the dataset. For e.g. If it contains unique integers within a certain range, we can just use a bit-vector to represent those integers and read the bit vector from the beginning to give a sorted representation. This is O(N) time.

```
public class Largest1000NumberFromArrayUnknownLength {
public static void main(String[] args) {
Stack<Integer> s1 = new Stack<Integer>();
Stack<Integer> s2 = new Stack<Integer>();
int[] a = { 20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
for (int i = 0; i < a.length; i++) {
s1.push(a[i]);
}
for (int i = 0; i < a.length; i++) {
if (s1.peek() > a[i]) {
s2.push(s1.pop());
}
else {
s1.push(a[i]);
}
}
for (int i = 0; i < 5; i++) {
System.out.println(s1.pop());
}
}
}
```

```
#define Swap(a,b)(a=b+a-(b=a))
#include<iostream>
using namespace std;
int a[1000000],n,k;
void place(int x,int y)//o(n) approx. by T.P TRIPATHY
{
int pvt = (x+y)>>1;
while(1){//O(n/k)
bool l = false,r = false;
for(int i = x;i< pvt;++i)
if(a[i] > a[pvt]) {Swap(a[i],a[pvt]);pvt = i;l = true;break;}
for(int i = y;i >pvt;--i)
if(a[i] < a[pvt]) {Swap(a[i],a[pvt]);pvt = i;r = true;break;}
if(!l && !r) break;
}
if(n-pvt+1 >= k) {for(int i=0;i<n;++i) if(i >= n-k) cout<<a[i]<<" ";return;}
else place(x,pvt-1);
}
int main()
{
cin>>n>>k;//n = no of elements and k = no of bigger elements we want
for(int i =0;i<n;++i) cin>>a[i];//given array
place(0,n-1);//worst case O(n+ n/2 + n/4 + n/8..............) < O(2*n)
return 0;
}
```

I am thinking for using bitwise hash, suppose its integer then there is need of 2 ^ 27 integer array, if we use double hashing then this required size of continguous space can be reduced in range of 2^14 to 2 ^ 27. For each number first hash will decide right array and 2nd hash will place in that specific array bit as per its place in array.

Then scan numbers from maximum, first 1000 numbers will be maximum. To reduce complexity we can put number count with array so that don't need to scan array those are blank.

Use the selection rank algorithm.

- Nitin March 13, 2014Pick a random element in the array and use it as a ‘pivot’. Move all elements smaller than that element to one side of the array, and all elements larger to the other side.

»»If there are exactly i elements on the right, then you just find the smallest element on that side.

»»Otherwise, if the right side is bigger than i, repeat the algorithm on the right. If the right side is smaller than i, repeat the algorithm on the left for i – right.size().

cc book problem