Facebook Interview Question
SDE1sCountry: United States
Working solution in Java. Array elements are added one after the other so can be fed from other machines as well.
import java.util.*;
class Main {
static PriorityQueue<Integer> minHeap = new PriorityQueue<>();
static PriorityQueue<Integer> maxHeap = new PriorityQueue<>(new Comparator<Integer>(){
public int compare(Integer a,Integer b){
return b.compareTo(a);
}
});
public static void main(String[] args) {
System.out.println("Hello, world!");
int[] arr = {4,3,1,8,4,7,6};
for(int i: arr){
findmedian(i);
}
}
public static void findmedian(int x){
if(maxHeap.isEmpty() || x<maxHeap.peek()){
maxHeap.add(x);
}else{
minHeap.add(x);
}
rebalance();
printmedian();
}
public static void rebalance(){
if(Math.abs(minHeap.size()-maxHeap.size())>1){
PriorityQueue<Integer> biggerHeap = (minHeap.size()>maxHeap.size())?minHeap:maxHeap;
PriorityQueue<Integer> smallerHeap = (minHeap.size()>maxHeap.size())?maxHeap:minHeap;
int element = biggerHeap.remove();
smallerHeap.add(element);
}
}
public static void printmedian(){
PriorityQueue<Integer> biggerHeap = (minHeap.size()>maxHeap.size())?minHeap:maxHeap;
PriorityQueue<Integer> smallerHeap = (minHeap.size()>maxHeap.size())?maxHeap:minHeap;
if(biggerHeap.size()==smallerHeap.size()){
float median = (float)(biggerHeap.peek()+smallerHeap.peek())/2;
System.out.println("ans:"+median);
}else{
System.out.println("ans:"+biggerHeap.peek());
}
}
}
if Range of the Keys is limited
eg if you are finding a median Age use Counting Sort to find the median
Credits : -Intro to Algo CLRS
{{
A-> Given Unsorted Array N-> number of Elements in Array; m - Range of Values
Create a New array getCount ---Set index of m 0 to m-1 to 0
for i=0 to N-1 //loop through the Array A
Index= A[i]-1
getCount [Index] ++; //increment the counting table
}}
After this run the following
{{
N-> number of Elements in Array A; m - Range of Values getCount > array produced above
Set Variable lessThan to 0
j=0
while j is less than m-1 or
if(getEqual[j] >0)
lessThan =getEqual[j] +lessThan
if(lessThan==N/2 and N%2==0) // Edge case where A is even and the median needs to be average of 2 Central elements
{
return (LastVariable+m+1)/2;
}
if(lessThan>N/2)
{
return m+1;
}
var LastVariable=m+1
}
}}
- the common approach is to use the quicksort partition method.
partition will pick a guess of a median and place smaller elements
on the left side of the array, equal elements in the middle part and
greater element on the right side and return the index of start and end
of middle part.
This is applied repeatedly until the array is separated
into two equal sized parts (considering the even length case). It has
the challenge of selecting a good pivot element...
One needs to consider as well odd and even length arrays, see code below.
- how to distribute:
one approach could be to guess the median, by just
determening the median on one machine. This would work if the
distribution accross machines is uniform and if a statistical
error is accepted
- alternatively one could use median of median method, where
every machine calculates a median and at the end median
of medians is calculated. This will be a pretty good aproximation,
but it's still an approximation.
- first machine does guess a good median, e.g. by using
median of medians, then it promotes this median to
other machines, each machine will then send back, how many
elements are left and right of that element, this information
is then used to repeat the process (look left or right)
the problem here is the communication among the machines, the slowest
or potentially a dead machine will slow the progress, further machines
don't work until the intermediate step has been completed by all others.
- a combination of those steps would be good, every machine
calculates the median +/- 10 elements and sends it to a coordinator.
the coordinater then determines median of medians and determines if it's
within the elements provided, if so, it can calculate the perfect
median, if not it will get back, with the best aproximation...
- what should be considered is that in a real life application
there is probably no point in time where the data won't change (e.g. grow).
so, if the perfect mean needs to be found and the algorithm is supposed
to terminate under all circumstances, some sort of snapshot mechanism
is required.
anyway here the median based on partition
def partition(data, begin, end):
pivot_idx = (begin + end) // 2 # midle, better randomize or median of median
pivot = data[pivot_idx]
data[end], data[pivot_idx] = data[pivot_idx], data[end] # move pivot to the end
piv_begin = 0 #[begin..piv_begin) < pivot
piv_end = 0 #[piv_begin..piv_end) = pivot
idx = 0 #[piv_end..idx) > pivot
while idx <= end:
if data[idx] < pivot:
data[piv_end], data[idx] = data[idx], data[piv_end] # swap
data[piv_begin], data[piv_end] = data[piv_end], data[piv_begin] # swap
piv_end += 1
piv_begin += 1
elif data[idx] == pivot:
data[piv_end], data[idx] = data[idx], data[piv_end] # swap
piv_end += 1
idx += 1
return (piv_begin, piv_end - 1)
def median(data):
n = len(data)
if n == 0: return None
if n == 1: return data[0]
begin = 0
end = n - 1
while begin < end:
m_beg, m_end = partition(data, begin, end)
if m_beg > (n - 1) // 2:
end = m_beg
elif m_end < (n - 1) // 2:
begin = m_end
else:
begin = (n - 1) // 2
end = (n - 1) // 2
if n % 2 == 1: return data[begin] #odd case, one element
return (data[begin] + min(data[begin + 1:])) / 2 #even case average of the two midle elements
print(median([1,2,3])) #2
print(median([3,2,1,4])) #2.5
print(median([2,1])) #1.5
print(median([3])) #3
print(median([9,9,9,1,1,1])) #5
This problem is almost same as "Running Median" problem. It can be found in book "Cracking the coding interview 6th ed: - Gayle Laakmann". You will need 1 max heap and 1 min heap. When inserting, keep both of them balanced. The median will be the average of top element of both the heaps if they both have same number of elements, otherwise it will be the top element of the heap with more elements.
@krupen.ghetiya
the question asks for the *median* of all elements from a dataset where you have random access to the elements. Your answer is for a *running median* problem, that is the median can be queried any point in time from the seen elements. Typically in running median data arrives from a stream, which means no random access.
As a result, the two heaps *running median* is not linear in time, it's O(lg(N)). ...
Distribution is a different beast, too because it would be safe to assume the data won't fit into the memory of a single machine and the two heaps solution would need some major tricks/assumptions/modifications to work in most cases
The Linear Median finding algorithm divides the input (A) into groups of size 5, finds the medians of medians, and use it as the pivot to partition the data; then according to the locations of pivot, it decides to proceed in the left or right of it.
This "Magical" algorithm (as explained in CLRS) is in O(n). Here is its implementation in c++. Please note that it works for ant k-th smallest value and for the median the input k should be A.size()/2
- a.asudeh July 22, 2017