Google Interview Question
Software Engineer / Developer
Country: United States
Interview Type: Phone Interview
The efficiency is O(NlogK), where N is the number of input elements and K is the size of the heap.
Nice one. Here's my implementation in Java:
---------------------------------------------------------------------
import java.util.PriorityQueue;

public class GetTopK {
    public static PriorityQueue<Integer> getTopK(int k, int[] array) {
        // Construct an empty min-queue
        PriorityQueue<Integer> minQueue = new PriorityQueue<Integer>();
        // Add the first k elements into the queue
        for (int i = 0; i < k; i++) {
            minQueue.add(array[i]);
        }
        // Go over the rest of the array; on finding an element larger than the
        // minimum of minQueue, remove the minimum and add the new one
        for (int i = k; i < array.length; i++) {
            if (array[i] > minQueue.peek()) {
                minQueue.poll(); // remove the min from the queue
                minQueue.add(array[i]);
            }
        }
        // The min-queue now contains the top k elements.
        return minQueue;
    }
}
Keep an ordered data structure. Insert every element as you go through the list. If the ordered data structure is larger than 1000, remove the smallest element (so it always stays at 1000). Every insert is log(1000) which is a constant factor. You do that for every element in the list, so you have O(N * log(1000)) = O(N * log(C)) = O(N).
Of course this only works as long as the number of 'largest' elements is in fact constant. If you wanted the largest half of elements for example then you'd be back to O(N*log(N)) runtime.
What data structure are you using which allows you to find the element AND insert it in order in log(1000) time? You can't find the element in O(log n) time if you use a linked list, and you can't insert into an array in O(log n) time.
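One structure that fits is a binary min-heap over the kept elements: the minimum (the eviction candidate) is at the root and readable in O(1), and insert/evict cost O(log k), so a full pass is O(n log k). A minimal sketch using Python's standard `heapq` module (the function name is illustrative):

```python
import heapq

def keep_k_largest(values, k):
    # Min-heap of at most k elements: heap[0] is always the smallest kept
    # value, so checking whether a new element belongs is O(1) and the
    # replace is O(log k).
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)  # pop the min and push v in one step
    return heap
```

A balanced BST or skip list would also give O(log k) insert and find-min, but the heap is the simplest structure with exactly the operations this problem needs.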
Use Quick sort partitioning method, to partition around the (n-k)th element. All elements to right of (n-k)th element will be the result.
In this case, n = 1 million, k = 1000.
Why can't we use a kth-select (quickselect) algorithm to find the 1000th largest element? It runs in O(n) expected time. Then apply one quicksort partition pass around that element; total complexity O(n).
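The selection-then-partition idea above can be sketched with a randomized Lomuto partition; `top_k_by_partition` is a hypothetical helper, and the runtime is O(n) in expectation:

```python
import random

def top_k_by_partition(a, k):
    # Quickselect toward the index of the k-th largest; once the pivot
    # lands exactly there, everything from that index on is the answer.
    a = list(a)
    target = len(a) - k          # index of the k-th largest in sorted order
    lo, hi = 0, len(a) - 1
    while True:
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]            # move pivot to the end
        store = lo
        for i in range(lo, hi):              # Lomuto partition
            if a[i] < a[hi]:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]    # pivot to its final slot
        if store == target:
            return a[target:]                # the k largest (unsorted)
        elif store < target:
            lo = store + 1
        else:
            hi = store - 1
```

Unlike the heap approach this needs the whole array in memory and mutates it, but it avoids the log factor entirely.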
Here is the code to find the N largest elements using a min heap. For simplicity I used an array as the source of input elements and find the 10 largest.
import java.util.Arrays;

public class FindNlargestElement {
    public static void main(String[] args) {
        Integer arr[] = {2,45,34,67,133,55,32,88,99,36,78,66,56,90,3,45,16,7,63};
        BinaryMinHeap heap = new BinaryMinHeap(10);
        for (int i = 0; i < arr.length; i++) {
            if (i < 10) {
                heap.insert(arr[i]);
            } else {
                heap.insertAtRoot(arr[i]);
            }
        }
        System.out.println(Arrays.asList(heap.getData()));
    }
}
class BinaryMinHeap {
    private Integer[] data;
    private int heapSize;

    public BinaryMinHeap(int size) {
        data = new Integer[size];
        heapSize = 0;
    }

    public int getMinimum() {
        if (isEmpty())
            throw new HeapException("Heap is empty");
        else
            return data[0];
    }

    public Integer[] getData() {
        return data;
    }

    public boolean isEmpty() {
        return (heapSize == 0);
    }

    private int getLeftChildIndex(int nodeIndex) {
        return 2 * nodeIndex + 1;
    }

    private int getRightChildIndex(int nodeIndex) {
        return 2 * nodeIndex + 2;
    }

    private int getParentIndex(int nodeIndex) {
        return (nodeIndex - 1) / 2;
    }

    class HeapException extends RuntimeException {
        private static final long serialVersionUID = 1L;

        public HeapException(String message) {
            super(message);
        }
    }

    public void insert(int value) {
        if (heapSize == data.length)
            throw new HeapException("Heap's underlying storage has overflowed");
        else {
            heapSize++;
            data[heapSize - 1] = value;
            siftUp(heapSize - 1);
        }
    }

    private void siftUp(int nodeIndex) {
        int parentIndex, tmp;
        if (nodeIndex != 0) {
            parentIndex = getParentIndex(nodeIndex);
            if (data[parentIndex] > data[nodeIndex]) {
                tmp = data[parentIndex];
                data[parentIndex] = data[nodeIndex];
                data[nodeIndex] = tmp;
                siftUp(parentIndex);
            }
        }
    }

    public void removeMin() {
        if (isEmpty())
            throw new HeapException("Heap is empty");
        else {
            data[0] = data[heapSize - 1];
            data[heapSize - 1] = null;
            heapSize--;
            if (heapSize > 0)
                siftDown(0);
        }
    }

    public void insertAtRoot(int item) {
        if (isEmpty())
            throw new HeapException("Heap is empty");
        else if (data[0] < item) {
            data[0] = item;
            siftDown(0);
        }
    }

    private void siftDown(int nodeIndex) {
        int leftChildIndex, rightChildIndex, minIndex, tmp;
        leftChildIndex = getLeftChildIndex(nodeIndex);
        rightChildIndex = getRightChildIndex(nodeIndex);
        if (rightChildIndex >= heapSize) {
            if (leftChildIndex >= heapSize)
                return;
            else
                minIndex = leftChildIndex;
        } else {
            if (data[leftChildIndex] <= data[rightChildIndex])
                minIndex = leftChildIndex;
            else
                minIndex = rightChildIndex;
        }
        if (data[nodeIndex] > data[minIndex]) {
            tmp = data[minIndex];
            data[minIndex] = data[nodeIndex];
            data[nodeIndex] = tmp;
            siftDown(minIndex);
        }
    }
}
There are probably faster ways, but I only aimed for something easy that still beats O(n log n):
Repeat 1000 times:
1. Loop through the array and search for a max element (remember the index)
2. Add that element to your list
3. Remove the selected element from the original list (remove by index)
Now the list will contain the greatest 1000 elements of the list of a million. It could have duplicates. The runtime is 1000 * O(n), which is O(n) since 1000 is a constant.
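The repeated-selection steps above can be sketched as follows (names are illustrative); each pass scans the remaining pool for its maximum and removes it:

```python
def top_k_by_selection(values, k):
    # k passes over the data, each extracting the current maximum: O(k * n).
    pool = list(values)
    result = []
    for _ in range(min(k, len(pool))):
        idx = max(range(len(pool)), key=pool.__getitem__)  # index of the max
        result.append(pool.pop(idx))                       # remove by index
    return result
```

Note that `pop(idx)` on a Python list is itself O(n), but that only changes the constant: the whole procedure stays O(k * n).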
Since the given list is already filled with a million elements (assuming a fixed-width type such as int), sort the elements in descending order. Use LSD radix sort (assuming fixed-length elements), which sorts the array in O(N) time for a constant number of digit passes. Then print/extract the top 1000 elements in O(1000) time. Together, this approach gives O(N) + O(1000) ~ O(N) time.
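As a sketch of the radix-sort idea, here is LSD radix sort for non-negative fixed-width integers (base 256 is an assumption, 8 bits per pass); the top 1000 would then simply be the tail of the sorted array:

```python
def lsd_radix_sort(nums):
    # LSD radix sort in base 256: stable bucketing by successive 8-bit
    # digits, least significant first. O(d * n) for d digit passes.
    nums = list(nums)
    if not nums:
        return nums
    max_val = max(nums)
    shift = 0
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(256)]
        for x in nums:
            buckets[(x >> shift) % 256].append(x)  # stable within a digit
        nums = [x for b in buckets for x in b]     # concatenate buckets
        shift += 8
    return nums
```

With the array sorted ascending, the largest k elements are `nums[-k:]`.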
One solution is to use a min heap of size 1000, i.e. the heap at any instant contains at most the 1000 largest elements seen so far.
Algo:
1. For each input element, compare it with the root element of the heap.
2. If value(element) > value(root), replace the root with the input element and heapify.
Time complexity: O(n log k), which is O(n) for constant k = 1000.
That's already covered by the condition above: if the value is larger than the root, it will replace the root and the heap will be restructured.
Using C#:

// This is a node of a binary search tree
class Node
{
    public int Val;
    public Node Left;
    public Node Right;
}

// Insert all the numbers into the tree, then use the function below
// (a reverse in-order traversal: right, node, left) to collect the
// 1000 largest values. Pass it the root of the tree and an empty list.
void FindElements(Node n, List<int> res)
{
    if (n == null || res.Count >= 1000) return;
    FindElements(n.Right, res);
    if (res.Count < 1000)
    {
        res.Add(n.Val);
        FindElements(n.Left, res);
    }
}
Since there is nothing mentioned about the space complexity, declare an array of booleans large enough for the value range, say bool A[1 million].
Parse the list of numbers once to find the minimum and maximum; the highest number maps to the top index of A and the lowest to index 0. Call the minimum number min.
Parse the list again: if x is a number present in the list of a million, set A[x - min] = 1.
Then traverse back from the highest index of the array, outputting each index set to 1 (offset by min), until you have displayed 1000 elements.
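A sketch of this presence-array approach (note one caveat: duplicates collapse into a single flag, so this returns distinct values only):

```python
def top_k_counting(values, k):
    # Boolean array indexed by (value - min); walk down from the top index.
    # O(n + range) time and O(range) extra space.
    lo, hi = min(values), max(values)
    present = [False] * (hi - lo + 1)
    for v in values:
        present[v - lo] = True
    result = []
    for i in range(hi - lo, -1, -1):   # highest index first
        if present[i]:
            result.append(i + lo)
            if len(result) == k:
                break
    return result
```

If duplicates must survive, the booleans can be replaced with counts, at the same asymptotic cost.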
Implement a min heap for 1000 elements.
void Heap::Insert(int input)
{
    if (store[0] < input)
    {
        store[0] = input;
        Heapify();
    }
}

void Heap::Heapify()
{
    int index = 0;
    while (true)
    {
        int left = 2 * index + 1;
        int right = 2 * index + 2;
        int smallest = index;
        if (left < MAX && store[smallest] > store[left])
            smallest = left;
        if (right < MAX && store[smallest] > store[right])
            smallest = right;
        if (smallest != index)
        {
            swap(&store[index], &store[smallest]);
            index = smallest; // continue sifting down from the swapped child
        }
        else
        {
            break;
        }
    }
}
You can use the concept of a B-tree. Build 10 heaps, each of size 100, so together they can hold the 1000 numbers, and keep those heaps in secondary storage. In the B-tree, keep the file location and the root value of each heap. You can also maintain a heap of size 10 that tracks which heap to consult while processing the data. At the end, traverse the B-tree to retrieve all 1000 numbers. :)
Tell me if you see any problem with this.
How about a convex optimization approach with gradient descent? The memory requirements are almost nil (maybe one additional array), but there will be many iterations.
minimize over y: ||x - y||_2^2
subject to
||y||_0 <= 1000
The above can be relaxed to a convex program by minimizing ||y-x||^2 + \lambda ||y||_{1}. Put a small weight on the sparsity constraint. The second term is a convex relaxation for sparsity or in bayesian terms its the laplace prior. It can be solved via coordinate gradient descent which has low memory requirements.
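For intuition, the relaxed problem min_y ||y - x||_2^2 + \lambda ||y||_1 has a coordinate-wise closed form, the soft-threshold: differentiating (y_i - x_i)^2 + \lambda |y_i| gives y_i = sign(x_i) * max(|x_i| - \lambda/2, 0). Entries smaller than \lambda/2 in magnitude shrink to zero, so the surviving coordinates are exactly the large ones. A sketch, assuming that objective scaling:

```python
def soft_threshold(x, lam):
    # Coordinate-wise minimizer of ||y - x||_2^2 + lam * ||y||_1.
    # Each entry is shrunk toward zero by lam/2; small entries vanish.
    return [
        (1 if xi > 0 else -1) * max(abs(xi) - lam / 2, 0.0)
        for xi in x
    ]
```

In practice lam would be tuned until roughly 1000 coordinates survive, which is where the many iterations come from.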
Find the 1000th largest element using quickselect, which is quicksort modified to recurse only into the half of each partition containing the kth largest element. Then iterate once more to find all elements greater than or equal to the 1000th largest.
import random

def swap(A, x, y):
    A[x], A[y] = A[y], A[x]

def qselect(A, k, left=None, right=None):
    left = left if left is not None else 0
    right = right if right is not None else len(A) - 1
    pivot = random.randint(left, right)
    pivotVal = A[pivot]
    # Move pivot out of the sorting range
    swap(A, pivot, right)
    swapIndex, i = left, left
    while i <= right - 1:
        if A[i] < pivotVal:
            swap(A, i, swapIndex)
            swapIndex += 1
        i += 1
    # Move pivot to its final position
    swap(A, right, swapIndex)
    # Check if the pivot's rank matches, else recurse on the correct half
    rank = len(A) - swapIndex
    if k == rank:
        return A[swapIndex]
    elif k < rank:
        return qselect(A, k, left=swapIndex + 1, right=right)
    else:
        return qselect(A, k, left=left, right=swapIndex - 1)

def find_largest(A, k):
    kth_largest = qselect(A, k)
    result = []
    for item in A:
        if item >= kth_largest:
            result.append(item)
    return result
Python code
import heapq

heap = []

def topK(i, k):
    if len(heap) < k:
        # simply insert
        heapq.heappush(heap, i)
    elif heap[0] < i:
        # replace the minimum and restore the heap invariant
        heapq.heapreplace(heap, i)
    print(heap)

if __name__ == "__main__":
    topK(-1, 3)
    topK(-100, 3)
    topK(100, 3)
    topK(200, 3)
    topK(-200, 3)
You can use a heap to implement it: build a max-root heap to get the greatest 1000 elements.
How do you identify which number has to be inserted and which one deleted in the case of a MAX heap?
I think the solution should use a min heap: insert the first 1000 elements into the heap; then, whenever a new element is greater than the minimum element, delete the minimum and insert the new element.
It can be solved using heap data structure. The idea is to maintain a heap of 1000 largest elements while scanning through the million elements.
- Vijay April 24, 2013
1. Construct a min heap using the first 1000 elements. Now the root holds the minimum of those first 1000 elements.
2. Iterate over the remaining elements one by one. Let 'item' be the element under consideration:
if the item is greater than the root of the heap, replace the root with the item and heapify the tree;
else continue with the next element.
At the end we will have a min heap of the 1000 largest elements of the list.
Time complexity: O(N log K), which is O(N) for constant K = 1000, where N is the number of elements in the list.