Write the code to find the med

Facebook Interview Question for SDE1s

0

of 0 votes

6
Answers
Write the code to find the median of an unsorted array in average linear time.
followup the array is distributed across many machines.
- ajay.raj July 21, 2017 in United States | Report Duplicate | Flag | PURGE
Facebook SDE1

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 2 vote

The Linear Median finding algorithm divides the input (A) into groups of size 5, finds the medians of medians, and use it as the pivot to partition the data; then according to the locations of pivot, it decides to proceed in the left or right of it.
This "Magical" algorithm (as explained in CLRS) is in O(n). Here is its implementation in c++. Please note that it works for ant k-th smallest value and for the median the input k should be A.size()/2

int FastMedian(int *A, int begin, int end, int k); // finds the k-th smallest

int main()
{
	int A[] = { 14, 23, 13, 34, 15, 45, 65, 2};
	cout << FastMedian(A,0,8,4)<<endl;
	getchar();
	return 0;
}

int FastMedian(int *A, int begin, int end,int k)
{
	/*
	Analysis:
		T(n) = T(7n/10) + T(n/5) + O(n)
			 = O(n)
	*/

	// border condition
	int size = end - begin;
	if (end - begin <= 5)
	{
		sort(A + begin, A + end);
		return A[k];
	}

	// break down to groups of size 5, sort each and find the median of median
	int m = floor(size / 5), r = size % 5, *medians;
	int count = m, tmp;
	if (r) {
		medians = new int[m + 1]; count++;
	}
	else medians = new int[m];
	for (int i = 0; i < m; i++)
	{
		sort(A + begin + 5 * i, A + begin + 5 * i + 5);
		medians[i] = A[begin + 5 * i + 2];
	}
	if (r)
	{
		sort(A + begin + 5 * m, A + end);
		medians[m] = A[begin + (5 * m + end) / 2];
	}
	int mm = FastMedian(medians, 0, count, count / 2);
	//find the location of median in matrxi A
	tmp = -1;
	for (int i = 0; i < m; i++)
		if (A[begin + 5 * i + 2] == mm)
		{
			tmp = 5 * i + 2;
			break;
		}
	if (r && tmp == -1) tmp = begin + (5 * m + end) / 2;

	// apply the partition and divide and conqure
	Swap(A, begin, tmp);
	mm = Partition(A,begin,end);
	if (mm == k) return A[mm];
	if (mm<k) return FastMedian(A, mm+1, end, k);
	return FastMedian(A, begin, mm-1, k);
}

- a.asudeh July 22, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Working solution in Java. Array elements are added one after the other so can be fed from other machines as well.

import java.util.*;

class Main {
    static PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    static PriorityQueue<Integer> maxHeap = new PriorityQueue<>(new Comparator<Integer>(){
        public int compare(Integer a,Integer b){
            return b.compareTo(a);
        }
    });

    public static void main(String[] args) {
        System.out.println("Hello, world!");
        int[] arr = {4,3,1,8,4,7,6};
        for(int i: arr){
            findmedian(i);
        }
    }

    public static void findmedian(int x){
        if(maxHeap.isEmpty() || x<maxHeap.peek()){
            maxHeap.add(x);
        }else{
            minHeap.add(x);
        }
        rebalance();
        printmedian();
    }

    public static void rebalance(){
        if(Math.abs(minHeap.size()-maxHeap.size())>1){
            PriorityQueue<Integer> biggerHeap = (minHeap.size()>maxHeap.size())?minHeap:maxHeap;
            PriorityQueue<Integer> smallerHeap = (minHeap.size()>maxHeap.size())?maxHeap:minHeap;
            int element = biggerHeap.remove();
            smallerHeap.add(element);
        }
    }

    public static void printmedian(){
        PriorityQueue<Integer> biggerHeap = (minHeap.size()>maxHeap.size())?minHeap:maxHeap;
        PriorityQueue<Integer> smallerHeap = (minHeap.size()>maxHeap.size())?maxHeap:minHeap;
        if(biggerHeap.size()==smallerHeap.size()){
            float median = (float)(biggerHeap.peek()+smallerHeap.peek())/2;
            System.out.println("ans:"+median);
        }else{
            System.out.println("ans:"+biggerHeap.peek());
        }
    }
}

- krupen.ghetiya July 22, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

if Range of the Keys is limited
eg if you are finding a median Age use Counting Sort to find the median
Credits : -Intro to Algo CLRS

{{
A-> Given Unsorted Array N-> number of Elements in Array; m - Range of Values
Create a New array getCount ---Set index of m 0 to m-1 to 0
for i=0 to N-1 //loop through the Array A
Index= A[i]-1
getCount [Index] ++; //increment the counting table

}}

After this run the following
{{
N-> number of Elements in Array A; m - Range of Values getCount > array produced above
Set Variable lessThan to 0
j=0
while j is less than m-1 or
if(getEqual[j] >0)

lessThan =getEqual[j] +lessThan
if(lessThan==N/2 and N%2==0) // Edge case where A is even and the median needs to be average of 2 Central elements
{
return (LastVariable+m+1)/2;
}
if(lessThan>N/2)
{
return m+1;
}
var LastVariable=m+1
}

}}

- Ankit July 21, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- the common approach is to use the quicksort partition method.
partition will pick a guess of a median and place smaller elements
on the left side of the array, equal elements in the middle part and
greater element on the right side and return the index of start and end
of middle part.
This is applied repeatedly until the array is separated
into two equal sized parts (considering the even length case). It has
the challenge of selecting a good pivot element...
One needs to consider as well odd and even length arrays, see code below.

- how to distribute:
one approach could be to guess the median, by just
determening the median on one machine. This would work if the
distribution accross machines is uniform and if a statistical
error is accepted
- alternatively one could use median of median method, where
every machine calculates a median and at the end median
of medians is calculated. This will be a pretty good aproximation,
but it's still an approximation.
- first machine does guess a good median, e.g. by using
median of medians, then it promotes this median to
other machines, each machine will then send back, how many
elements are left and right of that element, this information
is then used to repeat the process (look left or right)
the problem here is the communication among the machines, the slowest
or potentially a dead machine will slow the progress, further machines
don't work until the intermediate step has been completed by all others.
- a combination of those steps would be good, every machine
calculates the median +/- 10 elements and sends it to a coordinator.
the coordinater then determines median of medians and determines if it's
within the elements provided, if so, it can calculate the perfect
median, if not it will get back, with the best aproximation...
- what should be considered is that in a real life application
there is probably no point in time where the data won't change (e.g. grow).
so, if the perfect mean needs to be found and the algorithm is supposed
to terminate under all circumstances, some sort of snapshot mechanism
is required.

anyway here the median based on partition

def partition(data, begin, end):
    pivot_idx = (begin + end) // 2 # midle, better randomize or median of median
    pivot = data[pivot_idx] 
    data[end], data[pivot_idx] = data[pivot_idx], data[end] # move pivot to the end
    piv_begin = 0 #[begin..piv_begin) < pivot
    piv_end = 0 #[piv_begin..piv_end) = pivot
    idx = 0 #[piv_end..idx) > pivot
    while idx <= end:
        if data[idx] < pivot:
            data[piv_end], data[idx] = data[idx], data[piv_end] # swap
            data[piv_begin], data[piv_end] = data[piv_end], data[piv_begin] # swap
            piv_end += 1
            piv_begin += 1
        elif data[idx] == pivot:
            data[piv_end], data[idx] = data[idx], data[piv_end] # swap
            piv_end += 1
        idx += 1
    return (piv_begin, piv_end - 1)

def median(data):
    n = len(data)
    if n == 0: return None
    if n == 1: return data[0]
    begin = 0
    end = n - 1
    while begin < end:
        m_beg, m_end = partition(data, begin, end)
        if m_beg > (n - 1) // 2:
            end = m_beg
        elif m_end < (n - 1) // 2:
            begin = m_end
        else:
            begin = (n - 1) // 2
            end = (n - 1) // 2
    
    if n % 2 == 1: return data[begin] #odd case, one element
    return (data[begin] + min(data[begin + 1:])) / 2 #even case average of the two midle elements

print(median([1,2,3])) #2
print(median([3,2,1,4])) #2.5
print(median([2,1])) #1.5
print(median([3])) #3
print(median([9,9,9,1,1,1])) #5

- Chris July 22, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

This problem is almost same as "Running Median" problem. It can be found in book "Cracking the coding interview 6th ed: - Gayle Laakmann". You will need 1 max heap and 1 min heap. When inserting, keep both of them balanced. The median will be the average of top element of both the heaps if they both have same number of elements, otherwise it will be the top element of the heap with more elements.

- krupen.ghetiya July 22, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@krupen.ghetiya
the question asks for the *median* of all elements from a dataset where you have random access to the elements. Your answer is for a *running median* problem, that is the median can be queried any point in time from the seen elements. Typically in running median data arrives from a stream, which means no random access.
As a result, the two heaps *running median* is not linear in time, it's O(lg(N)). ...

Distribution is a different beast, too because it would be safe to assume the data won't fit into the memory of a single machine and the two heaps solution would need some major tricks/assumptions/modifications to work in most cases

- Chris July 22, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Raju919191

- rajudube1919@gmail.com July 23, 2017 | Flag Reply

CareerCup

Facebook Interview Question for SDE1s

Books

Videos

Resume Review

Mock Interviews