Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
with explaination and implementation of quick select in java, blog.teamleadnet.com/2012/07/quick-select-algorithm-find-kth-element.html
Actually, you could do something like a bucket sort too. First, iterate over the entire array and find what the smallest and largest values are. Then, create an auxiliary array of size (largest - smallest + 1) and initiate all values in this array to zero. Now, at every i'th number, do: aux_array[i - smallest] = aux_array[i - smallest] + 1
In the end, iterate over the aux_array, with a variable seen_so_far = 0. At each position in aux_array, do
seen_so_far = seen_so_far + aux_array[i]
As soon as seen_so_far reaches floor(n/2), output the value (smallest + i). That would be your median.
yeh, the bucket sort idea works, and it's perfect for the case you have a huge stream of integers (like packets) which you can't fully store - space is bound by the integers range R and not by their total count N. It can even work if you have some outliers which are beyond the predefined "short range", as long as you assume that the median value is within that range. For handling outliers, count how many integers are smaller than RANGE_LOW (low_outliers) and how many integers are larger than RANGE_HIGH (not necessary if the total count N includes the outliers). In the end, start iterating the bucket sort array with seen_so_far = low_outliers.
#include <iostream>
#include <random>
using namespace std;
default_random_engine generator;
int randi(int low, int high){
uniform_int_distribution<int> distribution(low, high);
return distribution(generator);
}
int selectRand(int *array, int low, int high, int rank){
int randIndex = randi(low, high);
swap(array[low], array[randIndex]);
int pivot = array[low];
int lh = low+1;
int rh = high;
while(true){
while(lh<high && array[lh]<pivot) lh++;
while(rh>low && pivot<=array[rh]) rh--;
if(lh<rh)
swap(array[lh], array[rh]);
else
break;
}
swap(array[low], array[rh]);
int j=rh-low+1;
if(j < rank)
return selectRand(array, rh+1, high, rank-j);
if(j == rank)
return array[rh];
if(j > rank)
return selectRand(array, low, rh-1, rank);
}
//////////////////////////////////////////////////////////////////////////////
int main(){
int array[10] = {2,4,6,8,10,1,3,5,7,9};
cout << selectRand(array, 0, 9, 7) << endl;
for(int i=0; i<10; i++)
cout << array[i] << " " ;
return 0;
}
Use MaxHeap and MinHeap concept. The size of the maxHeap can be either equal or one greater than minheap. When a new element arrives, insert them into the maxHeap or minHeap as per follows:
If MaxHeap and minHeap size is equal then
1. If the number is greater than root of maxHeap (It has to be inserted into minHeap then.) But since the size of minheap cannot exceed MaxHeap, we will take the root of minheap and insert into MaxHeap. Then insert the new element into minHeap.
2. Otherwise, if the number is smaller that root of maxHeap, then insert the number in maxHeap. (Remember the size of maxheap can be 1 greater than Minheap).
If the size of maxHeap and minHeap is not the same then
1. If number is small than root of maxHeap, it has to be inserted in maxHeap. Since the size is not same, it means that the maxHeap is already 1 greater than minheap. Since the difference cannot be more than 1, we shift the root of MaxHeap into minHeap and insert the number in MaxHeap.
2. Otherwise, If number is larger than root of maxHeap, insert it into minHeap.
Use priority queue to implement MinHeap and MaxHeap behaior.
Code :
public class Practice52{
static class ComparatorMax implements Comparator<Integer>{
public int compare(Integer arg0, Integer arg1) {
// TODO Auto-generated method stub
return arg1 - arg0;
}
}
static class ComparatorMin implements Comparator<Integer>{
@Override
public int compare(Integer arg0, Integer arg1) {
// TODO Auto-generated method stub
return arg0 - arg1;
}
}
public static void main(String args[]) throws IOException{
ComparatorMax maxHeapComparator = new ComparatorMax();
ComparatorMin minHeapComparator = new ComparatorMin();
PriorityQueue<Integer> maxHeap = new PriorityQueue<Integer>(10,maxHeapComparator);
PriorityQueue<Integer> minHeap = new PriorityQueue<Integer>(10,minHeapComparator);
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
while(true){
System.out.println("\n Enter the number :: ");
String str = br.readLine();
int num = Integer.parseInt(str);
if(num == -1){
break;
}
insertNum(num,maxHeap,minHeap);
System.out.println(displayMedian(maxHeap,minHeap));
}
}
public static double displayMedian(PriorityQueue<Integer> maxHeap,PriorityQueue<Integer> minHeap){
if(maxHeap.isEmpty()) return minHeap.peek();
if(minHeap.isEmpty()) return maxHeap.peek();
if(minHeap.size() == maxHeap.size())
return ((double)(minHeap.peek() + maxHeap.peek())/2);
else
return maxHeap.peek();
}
public static void insertNum(int num,PriorityQueue<Integer> maxHeap,PriorityQueue<Integer> minHeap){
if(maxHeap.size() == minHeap.size()){
if(maxHeap.peek()!= null && num > maxHeap.peek()){
maxHeap.offer(minHeap.poll());
minHeap.offer(num);
}else{
maxHeap.offer(num);
}
}else{
if(num < maxHeap.peek()){
minHeap.offer(maxHeap.poll());
maxHeap.offer(num);
}else{
minHeap.offer(num);
}
}
}
}
Use MaxHeap and MinHeap concept. The size of the maxHeap can be either equal or one greater than minheap. When a new element arrives, insert them into the maxHeap or minHeap as per follows:
If MaxHeap and minHeap size is equal then
1. If the number is greater than root of maxHeap (It has to be inserted into minHeap then.) But since the size of minheap cannot exceed MaxHeap, we will take the root of minheap and insert into MaxHeap. Then insert the new element into minHeap.
2. Otherwise, if the number is smaller that root of maxHeap, then insert the number in maxHeap. (Remember the size of maxheap can be 1 greater than Minheap).
If the size of maxHeap and minHeap is not the same then
1. If number is small than root of maxHeap, it has to be inserted in maxHeap. Since the size is not same, it means that the maxHeap is already 1 greater than minheap. Since the difference cannot be more than 1, we shift the root of MaxHeap into minHeap and insert the number in MaxHeap.
2. Otherwise, If number is larger than root of maxHeap, insert it into minHeap.
Use priority queue to implement MinHeap and MaxHeap behaior.
Code :
public class Practice52{
static class ComparatorMax implements Comparator<Integer>{
public int compare(Integer arg0, Integer arg1) {
// TODO Auto-generated method stub
return arg1 - arg0;
}
}
static class ComparatorMin implements Comparator<Integer>{
@Override
public int compare(Integer arg0, Integer arg1) {
// TODO Auto-generated method stub
return arg0 - arg1;
}
}
public static void main(String args[]) throws IOException{
ComparatorMax maxHeapComparator = new ComparatorMax();
ComparatorMin minHeapComparator = new ComparatorMin();
PriorityQueue<Integer> maxHeap = new PriorityQueue<Integer>(10,maxHeapComparator);
PriorityQueue<Integer> minHeap = new PriorityQueue<Integer>(10,minHeapComparator);
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
while(true){
System.out.println("\n Enter the number :: ");
String str = br.readLine();
int num = Integer.parseInt(str);
if(num == -1){
break;
}
insertNum(num,maxHeap,minHeap);
System.out.println(displayMedian(maxHeap,minHeap));
}
}
public static double displayMedian(PriorityQueue<Integer> maxHeap,PriorityQueue<Integer> minHeap){
if(maxHeap.isEmpty()) return minHeap.peek();
if(minHeap.isEmpty()) return maxHeap.peek();
if(minHeap.size() == maxHeap.size())
return ((double)(minHeap.peek() + maxHeap.peek())/2);
else
return maxHeap.peek();
}
public static void insertNum(int num,PriorityQueue<Integer> maxHeap,PriorityQueue<Integer> minHeap){
if(maxHeap.size() == minHeap.size()){
if(maxHeap.peek()!= null && num > maxHeap.peek()){
maxHeap.offer(minHeap.poll());
minHeap.offer(num);
}else{
maxHeap.offer(num);
}
}else{
if(num < maxHeap.peek()){
minHeap.offer(maxHeap.poll());
maxHeap.offer(num);
}else{
minHeap.offer(num);
}
}
}
}
Sort the integers into an array using mod R, where R is the known short range. While sorting into the buckets record the position of the minimum element. This element is then the offset and the median is in bucket (R/2+offset) mod R.
If duplicate numbers are allowed need to take care of counting how many, and you have to actually walk along you're R buckets to find the median.
Complexity of this should be O(N) if N, number of integers, and O( N + R ) if duplicates are allowed.
public static int getMedian(int[] nums)
{
int min, max = nums[ 0 ];
for( int num : nums )
if( num > max )
max = num;
else if( num < min )
min = num;
int counts = new int[ max - min + 1 ];
for( num : nums )
counts[ num - min ] ++;
int sum = counts[ 0 ];
int med = 0;
while( sum < ( nums.length / 2 ))
{
sum += counts[ ++med ];
}
return med + min;
}
It's known short, ie small, right? I'd stick the /count of/ each integer into an array (or c++ vector, deque, whatever you want), with a global min and max for the numbers you've seen, and a total count (not sum) of the numbers seen. 'Min' acts as an offset telling you what to add to incoming numbers to make the index where their count is stored (eg, if the first number is '5' and you store it at index 0, the offset would be -5, ie -min). If the number is already present, bump the count (of eg the 5s you've seen). If the number is higher than your max seen, set the new max, set the array elements between your old max and your new one to 0, and bump the count at the new max. If you have a new min, well, painfully shift the array elements so as to accommodate the new min (maybe with some buffer space eh?), and set min and max accordingly. Now, this 'painful shift' is O(n), and you'll do it, what, 'size-of-known-short-range' / 2 times on average (right?) so that's actually constant. Everything else in maintaining this array is O(1). When you want to sample the median, walk from min to max, summing the array entry counts as you go, until the bin where you hit total count / 2. This number of this bin (taking your min offset into account) is the median.
Can be done in O(n+rlog(r)), with space requirement O(r), where r is the range of all elements.
Build a vector of buckets when iterating over the numbers. The first number is placed into first bucket. The subsequent numbers are placed into the buckets indexed by their distance from the first number. If numbers repeat, increment the count in the corresponding bucket.
When all numbers are done, sort the buckets in O(rlog(r)) time. Proces the counts to find the median in O(r) time.
The median is in the bucket where counts in the previous add up to half of the total count. The value of the median is the distance from the first element plus the value of the first element. In the case where the median falls in between two buckets, the median is the average of the distances of the two buckets from the first element plus the value of the first element.
This algo works on a stream of numbers. No need to store the numbers nor scan the numbers multiple times.
An improvement: Simply use a map of (key->count) pair to model the buckets. At the end the entries in the map is sorted by keys. For simplicity, a tree map is used.
double findMedian(Iterator<Integer> it) {
TreeMap<Integer, Integer> map = new TreeMap<Integer, Integer>();
int length = 0;
while (it.hasNext()) {
int d = it.next();
if (map.containsKey(d)) map.put(d, map.get(d)+1);
else map.put(d, 1);
length+=1;
}
int counts = 0;
for(int k:map.keySet()) {
counts+=map.get(k);
if (counts>length/2) {
Integer lowerKey = map.lowerKey(k); //in case there is no lower key
if (lowerKey==null) return k; //half elements in the same bucket
else if (counts-map.get(lowerKey)<length/2) { // in this bucket
return k;
}
else return (k+lowerKey)/2.0; //happen to fall in between two buckets
}
}
return -1; //won't happen, please compiler
}
Keep a max heap and a min heap. Whenever you insert in max heap pop an element and push it in min heap. Finally depending on the total number of elements, take the max and min and average them or take one from max heap.
This is a useful technique for keeping a running median of an incoming stream of integers. However, is it not really relevant when you have a fixed list of integers and you just want to calculate the median once. As others have mentioned, quickselect permits this to be done in linear time.
use quickselect with random pivot should achieve linear/O(n) time in practice
- Anonymous January 16, 2014