Google Interview Question
Software Engineer / Developer
Country: United States
Interview Type: Phone Interview
The efficiency is O(NlogK), where N is the number of input elements and K is the size of the heap.
Nice one. Here's my implementation in Java:
---------------------------------------------------------------------
import java.util.PriorityQueue;

public class GetTopK {
    public static PriorityQueue<Integer> getTopK(int k, int[] array) {
        // Construct an empty min-queue
        PriorityQueue<Integer> minQueue = new PriorityQueue<Integer>();
        // Add the first k elements into the queue
        for (int i = 0; i < k; i++) {
            minQueue.add(array[i]);
        }
        // Go over the rest of the array; on finding an element larger than the
        // minimum of minQueue, remove the minimum and add the new one
        for (int i = k; i < array.length; i++) {
            if (array[i] > minQueue.peek()) {
                minQueue.poll(); // remove the min from the queue
                minQueue.add(array[i]);
            }
        }
        // The min-queue now contains the top k elements.
        return minQueue;
    }
}
Keep an ordered data structure. Insert every element as you go through the list. If the ordered data structure is larger than 1000, remove the smallest element (so it always stays at 1000). Every insert is log(1000) which is a constant factor. You do that for every element in the list, so you have O(N * log(1000)) = O(N * log(C)) = O(N).
Of course this only works as long as the number of 'largest' elements is in fact constant. If you wanted the largest half of elements for example then you'd be back to O(N*log(N)) runtime.
What data structure are you using which allows you to find the element AND insert it in order in log(1000) time? You can't find the element in O(log n) time if you use a linked list, and you can't insert into an array in O(log n) time.
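One structure that fits is a binary min-heap over the kept elements: the minimum (the eviction candidate) is at the root and readable in O(1), and insert/evict cost O(log k), so a full pass is O(n log k). A minimal sketch using Python's standard `heapq` module (the function name is illustrative):

```python
import heapq

def keep_k_largest(values, k):
    # Min-heap of at most k elements: heap[0] is always the smallest kept
    # value, so checking whether a new element belongs is O(1) and the
    # replace is O(log k).
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)  # pop the min and push v in one step
    return heap
```

A balanced BST or skip list would also give O(log k) insert and find-min, but the heap is the simplest structure with exactly the operations this problem needs.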
Use Quick sort partitioning method, to partition around the (n-k)th element. All elements to right of (n-k)th element will be the result.
In this case, n = 1 million, k = 1000.
Why can't we use a kth-select (quickselect) algorithm to find the 1000th largest element? It runs in O(n) expected time. Then apply one quicksort partition pass around that element; total complexity O(n).
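The selection-then-partition idea above can be sketched with a randomized Lomuto partition; `top_k_by_partition` is a hypothetical helper, and the runtime is O(n) in expectation:

```python
import random

def top_k_by_partition(a, k):
    # Quickselect toward the index of the k-th largest; once the pivot
    # lands exactly there, everything from that index on is the answer.
    a = list(a)
    target = len(a) - k          # index of the k-th largest in sorted order
    lo, hi = 0, len(a) - 1
    while True:
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]            # move pivot to the end
        store = lo
        for i in range(lo, hi):              # Lomuto partition
            if a[i] < a[hi]:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]    # pivot to its final slot
        if store == target:
            return a[target:]                # the k largest (unsorted)
        elif store < target:
            lo = store + 1
        else:
            hi = store - 1
```

Unlike the heap approach this needs the whole array in memory and mutates it, but it avoids the log factor entirely.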
Here is the code to find the N largest elements using a min heap. For simplicity I used an array as the source of input elements and find the 10 largest.
import java.util.Arrays;

public class FindNlargestElement {
    public static void main(String[] args) {
        Integer arr[] = {2,45,34,67,133,55,32,88,99,36,78,66,56,90,3,45,16,7,63};
        BinaryMinHeap heap = new BinaryMinHeap(10);
        for (int i = 0; i < arr.length; i++) {
            if (i < 10) {
                heap.insert(arr[i]);
            } else {
                heap.insertAtRoot(arr[i]);
            }
        }
        System.out.println(Arrays.asList(heap.getData()));
    }
}
class BinaryMinHeap {
    private Integer[] data;
    private int heapSize;

    public BinaryMinHeap(int size) {
        data = new Integer[size];
        heapSize = 0;
    }

    public int getMinimum() {
        if (isEmpty())
            throw new HeapException("Heap is empty");
        else
            return data[0];
    }

    public Integer[] getData() {
        return data;
    }

    public boolean isEmpty() {
        return (heapSize == 0);
    }

    private int getLeftChildIndex(int nodeIndex) {
        return 2 * nodeIndex + 1;
    }

    private int getRightChildIndex(int nodeIndex) {
        return 2 * nodeIndex + 2;
    }

    private int getParentIndex(int nodeIndex) {
        return (nodeIndex - 1) / 2;
    }

    class HeapException extends RuntimeException {
        private static final long serialVersionUID = 1L;

        public HeapException(String message) {
            super(message);
        }
    }

    public void insert(int value) {
        if (heapSize == data.length)
            throw new HeapException("Heap's underlying storage has overflowed");
        else {
            heapSize++;
            data[heapSize - 1] = value;
            siftUp(heapSize - 1);
        }
    }

    private void siftUp(int nodeIndex) {
        int parentIndex, tmp;
        if (nodeIndex != 0) {
            parentIndex = getParentIndex(nodeIndex);
            if (data[parentIndex] > data[nodeIndex]) {
                tmp = data[parentIndex];
                data[parentIndex] = data[nodeIndex];
                data[nodeIndex] = tmp;
                siftUp(parentIndex);
            }
        }
    }

    public void removeMin() {
        if (isEmpty())
            throw new HeapException("Heap is empty");
        else {
            data[0] = data[heapSize - 1];
            data[heapSize - 1] = null;
            heapSize--;
            if (heapSize > 0)
                siftDown(0);
        }
    }

    public void insertAtRoot(int item) {
        if (isEmpty())
            throw new HeapException("Heap is empty");
        else if (data[0] < item) {
            data[0] = item;
            siftDown(0);
        }
    }

    private void siftDown(int nodeIndex) {
        int leftChildIndex, rightChildIndex, minIndex, tmp;
        leftChildIndex = getLeftChildIndex(nodeIndex);
        rightChildIndex = getRightChildIndex(nodeIndex);
        if (rightChildIndex >= heapSize) {
            if (leftChildIndex >= heapSize)
                return;
            else
                minIndex = leftChildIndex;
        } else {
            if (data[leftChildIndex] <= data[rightChildIndex])
                minIndex = leftChildIndex;
            else
                minIndex = rightChildIndex;
        }
        if (data[nodeIndex] > data[minIndex]) {
            tmp = data[minIndex];
            data[minIndex] = data[nodeIndex];
            data[nodeIndex] = tmp;
            siftDown(minIndex);
        }
    }
}
There are probably faster ways, but I only aimed for something easy that still beats O(n log n):
Repeat 1000 times:
1. Loop through the array and search for a max element (remember the index)
2. Add that element to your list
3. Remove the selected element from the original list (remove by index)
Now the list will contain the greatest 1000 elements of the list of a million. It could have duplicates. The runtime is 1000 * O(n), which is O(n) since 1000 is a constant.
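The repeated-selection steps above can be sketched as follows (names are illustrative); each pass scans the remaining pool for its maximum and removes it:

```python
def top_k_by_selection(values, k):
    # k passes over the data, each extracting the current maximum: O(k * n).
    pool = list(values)
    result = []
    for _ in range(min(k, len(pool))):
        idx = max(range(len(pool)), key=pool.__getitem__)  # index of the max
        result.append(pool.pop(idx))                       # remove by index
    return result
```

Note that `pop(idx)` on a Python list is itself O(n), but that only changes the constant: the whole procedure stays O(k * n).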
Since the given list is already filled with a million elements (assuming a fixed-width type such as int), sort the elements in descending order. Use LSD radix sort (assuming fixed-length elements), which sorts the array in O(N) time for a constant number of digit passes. Then print/extract the top 1000 elements in O(1000) time. Together, this approach gives O(N) + O(1000) ~ O(N) time.
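As a sketch of the radix-sort idea, here is LSD radix sort for non-negative fixed-width integers (base 256 is an assumption, 8 bits per pass); the top 1000 would then simply be the tail of the sorted array:

```python
def lsd_radix_sort(nums):
    # LSD radix sort in base 256: stable bucketing by successive 8-bit
    # digits, least significant first. O(d * n) for d digit passes.
    nums = list(nums)
    if not nums:
        return nums
    max_val = max(nums)
    shift = 0
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(256)]
        for x in nums:
            buckets[(x >> shift) % 256].append(x)  # stable within a digit
        nums = [x for b in buckets for x in b]     # concatenate buckets
        shift += 8
    return nums
```

With the array sorted ascending, the largest k elements are `nums[-k:]`.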
One solution is to use a min heap of size 1000, i.e. the heap at any instant contains at most the 1000 largest elements seen so far.
Algo:
1. For each input element, compare it with the root element of the heap.
2. If value(element) > value(root), replace the root with the input element and heapify.
Time complexity: O(n log k), which is O(n) for constant k = 1000.
That's already covered by the condition above: if the value is larger than the root, it will replace the root and the heap will be restructured.
Using C#:

// This is a node of a binary search tree
class Node
{
    public int Val;
    public Node Left;
    public Node Right;
}

// Insert all the numbers into the tree, then use the function below
// (a reverse in-order traversal: right, node, left) to collect the
// 1000 largest values. Pass it the root of the tree and an empty list.
void FindElements(Node n, List<int> res)
{
    if (n == null || res.Count >= 1000) return;
    FindElements(n.Right, res);
    if (res.Count < 1000)
    {
        res.Add(n.Val);
        FindElements(n.Left, res);
    }
}
Since there is nothing mentioned about the space complexity, declare an array of booleans large enough for the value range, say bool A[1 million].
Parse the list of numbers once to find the minimum and maximum; the highest number maps to the top index of A and the lowest to index 0. Call the minimum number min.
Parse the list again: if x is a number present in the list of a million, set A[x - min] = 1.
Then traverse back from the highest index of the array, outputting each index set to 1 (offset by min), until you have displayed 1000 elements.
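A sketch of this presence-array approach (note one caveat: duplicates collapse into a single flag, so this returns distinct values only):

```python
def top_k_counting(values, k):
    # Boolean array indexed by (value - min); walk down from the top index.
    # O(n + range) time and O(range) extra space.
    lo, hi = min(values), max(values)
    present = [False] * (hi - lo + 1)
    for v in values:
        present[v - lo] = True
    result = []
    for i in range(hi - lo, -1, -1):   # highest index first
        if present[i]:
            result.append(i + lo)
            if len(result) == k:
                break
    return result
```

If duplicates must survive, the booleans can be replaced with counts, at the same asymptotic cost.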
Implement a min heap for 1000 elements.
void Heap::Insert(int input)
{
    if (store[0] < input)
    {
        store[0] = input;
        Heapify();
    }
}

void Heap::Heapify()
{
    int index = 0;
    while (true)
    {
        int left = 2 * index + 1;
        int right = 2 * index + 2;
        int smallest = index;
        if (left < MAX && store[smallest] > store[left])
            smallest = left;
        if (right < MAX && store[smallest] > store[right])
            smallest = right;
        if (smallest != index)
        {
            swap(&store[index], &store[smallest]);
            index = smallest; // continue sifting down from the swapped child
        }
        else
        {
            break;
        }
    }
}
You can use the concept of a B-tree. Build 10 heaps, each of size 100, so together they can hold the 1000 numbers, and keep those heaps in secondary storage. In the B-tree, keep the file location and the root value of each heap. You can also maintain a heap of size 10 that tracks which heap to consult while processing the data. At the end, traverse the B-tree to retrieve all 1000 numbers. :)
Tell me if you see any problem with this.
How about a convex optimization approach with gradient descent? The memory requirements are almost nil (maybe one additional array), but there will be many iterations.
minimize over y: ||x - y||_2^2
subject to
||y||_0 <= 1000
The above can be relaxed to a convex program by minimizing ||y-x||^2 + \lambda ||y||_{1}. Put a small weight on the sparsity constraint. The second term is a convex relaxation for sparsity or in bayesian terms its the laplace prior. It can be solved via coordinate gradient descent which has low memory requirements.
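For intuition, the relaxed problem min_y ||y - x||_2^2 + \lambda ||y||_1 has a coordinate-wise closed form, the soft-threshold: differentiating (y_i - x_i)^2 + \lambda |y_i| gives y_i = sign(x_i) * max(|x_i| - \lambda/2, 0). Entries smaller than \lambda/2 in magnitude shrink to zero, so the surviving coordinates are exactly the large ones. A sketch, assuming that objective scaling:

```python
def soft_threshold(x, lam):
    # Coordinate-wise minimizer of ||y - x||_2^2 + lam * ||y||_1.
    # Each entry is shrunk toward zero by lam/2; small entries vanish.
    return [
        (1 if xi > 0 else -1) * max(abs(xi) - lam / 2, 0.0)
        for xi in x
    ]
```

In practice lam would be tuned until roughly 1000 coordinates survive, which is where the many iterations come from.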
Find the 1000th largest element using quickselect, which is quicksort modified to recurse only into the half of each partition containing the kth largest element. Then iterate once more to find all elements greater than or equal to the 1000th largest.
import random

def swap(A, x, y):
    A[x], A[y] = A[y], A[x]

def qselect(A, k, left=None, right=None):
    left = left if left is not None else 0
    right = right if right is not None else len(A) - 1
    pivot = random.randint(left, right)
    pivotVal = A[pivot]
    # Move pivot out of the sorting range
    swap(A, pivot, right)
    swapIndex, i = left, left
    while i <= right - 1:
        if A[i] < pivotVal:
            swap(A, i, swapIndex)
            swapIndex += 1
        i += 1
    # Move pivot to its final position
    swap(A, right, swapIndex)
    # Check if the pivot's rank matches, else recurse on the correct half
    rank = len(A) - swapIndex
    if k == rank:
        return A[swapIndex]
    elif k < rank:
        return qselect(A, k, left=swapIndex + 1, right=right)
    else:
        return qselect(A, k, left=left, right=swapIndex - 1)

def find_largest(A, k):
    kth_largest = qselect(A, k)
    result = []
    for item in A:
        if item >= kth_largest:
            result.append(item)
    return result
Python code
import heapq

heap = []

def topK(i, k):
    if len(heap) < k:
        # simply insert
        heapq.heappush(heap, i)
    elif heap[0] < i:
        # replace the minimum and restore the heap invariant
        heapq.heapreplace(heap, i)
    print(heap)

if __name__ == "__main__":
    topK(-1, 3)
    topK(-100, 3)
    topK(100, 3)
    topK(200, 3)
    topK(-200, 3)
You can use a heap to implement it: build a max-root heap to get the greatest 1000 elements.
How do you identify which number has to be inserted and which one deleted in the case of a MAX heap?
I think the solution should use a min heap: insert the first 1000 elements into the heap; then, whenever a new element is greater than the minimum element, delete the minimum and insert the new element.
It can be solved using heap data structure. The idea is to maintain a heap of 1000 largest elements while scanning through the million elements.
- Vijay April 24, 2013
1. Construct a min heap using the first 1000 elements. Now the root holds the minimum of those first 1000 elements.
2. Iterate over the remaining elements one by one. Let 'item' be the element under consideration:
if the item is greater than the root of the heap, replace the root with the item and heapify the tree;
else continue with the next element.
At the end we will have a min heap of the 1000 largest elements of the list.
Time complexity: O(N log K), which is O(N) for constant K = 1000, where N is the number of elements in the list.