Adobe Interview Question
MTSsCountry: India
Interview Type: In-Person
Looks like this would work. Seems like a pretty smart solution. The complexity seems like O(n) as it takes linear time to build a heap. So, to build 2 heaps of size n/2 will take will be O(n) as well.
sivibasava, in your post you state that "if both heaps are of equal size peep top element from both the heaps, sum them and divide the sum by 2 and return" . This wil not work if you have the dynamically generated sequence 5, 5, 5, 5, 7, 7, 7, 7. According to your approach the median should be 6 which is not the case.
How did you build your heap. After processing all of your input
your heap ( if you followed the algo) should be
4 elements (5) in on heap and other 4 elements (7) in other heap. And once median is requested you will peep 5 & 7 and return 6.
heap max: heap min ramaing input median
{} {} {5,5,5,5,7,7,7,7} -1
{5} {} {5,5,5,7,7,7,7} 5
{5} {5} {5,5,7,7,7,7} 5
{5,5} {5} {5,7,7,7,7} 5
{5,5} {5,5} {7,7,7,7} 5
{5,5} {5,5,7} {7,7,7,7} 5 ret top ele frm max size heap
{5,5,5} {5,7,7} {7,7} 5
{5,5,5} {5,7,7,7} {7,7} 5
{5,5,5,5} {7,7,7,7} {} 6
How did you build your heap. After processing all of your input
your heap ( if you followed the algo) should be
4 elements (5) in on heap and other 4 elements (7) in other heap. And once median is requested you will peep 5 & 7 and return 6.
heap max: heap min ramaing input median
{} {} {5,5,5,5,7,7,7,7} -1
{5} {} {5,5,5,7,7,7,7} 5
{5} {5} {5,5,7,7,7,7} 5
{5,5} {5} {5,7,7,7,7} 5
{5,5} {5,5} {7,7,7,7} 5
{5,5} {5,5,7} {7,7,7,7} 5 ret top ele frm max size heap
{5,5,5} {5,7,7} {7,7} 5
{5,5,5} {5,7,7,7} {7,7} 5
{5,5,5,5} {7,7,7,7} {} 6
According to wikipedia, if you have an even number of observations ,the median has several alternative definitions the choice being either 1) always the smallest one 2) always the largest one 3) randomly choose between the two. The advantages of 1) , 2) and 3) are that it is guaranteed that the the median is always a list element(e.g a list of integers will never have a fractional median) and it guarantees that the median exists for any ordinal valued data. sivabasava, how would you adapt your program to the alternative definitions of median?
In the even case: peep elements in both the heaps, let they be e1 and e2. In the solution I have given (e1+e2)/2 as my median, lets see how to attempt wikipedia cases.
Case 1: choose the smallest
simple return Math.min(e1,e2)
Case 2: always the largest one
simple return Math.max(e1,e2)
Case 3: randomly choose between the two
if random.randnextint(2) == 1 return e1 else e2
Hope this helps.
sivabasava, please let me know your professional opinion of the following SQLITE MEDIAN function. Thank you Cameron91
{{
CREATE TABLE BULBLIFE(HOURS INT)
INSERT INTO BULBLIFE VALUES(4)
INSERT INTO BULBLIFE VALUES(4)
INSERT INTO BULBLIFE VALUES(4)
INSERT INTO BULBLIFE VALUES(4)
INSERT INTO BULBLIFE VALUES(6)
INSERT INTO BULBLIFE VALUES(6)
INSERT INTO BULBLIFE VALUES(6)
INSERT INTO BULBLIFE VALUES(6)
Following is a SQLITE MEDIAN function solution credited to David Rozenshtein, Anatoly Abramovich, and Eugene Birger
SELECT x.hours median
FROM BULBLIFE x, BULBLIFE y
GROUP BY x.HOURS
HAVING
SUM(CASE WHEN y.HOURS <= x.HOURS
THEN 1 ELSE 0 END) >= (COUNT(*) + 1)/2 AND
SUM(CASE WHEN y.HOURS >= x.HOURS
THEN 1 ELSE 0 END) >=(COUNT(*) + 1)/2 + 1
SQLITE returns MEAN 4
}}
sivabasava, To handle data which is dynamically added at runtime. I modified the algorithm of the esteemed investment banking firm Goldman Sachs engineers --David Rozenshtein, Anatoly Abramovich , Eugene Birger in the following way
CREATE INDEX IDX_HOURS ON BULBLIFE(HOURS ASC)
Each time I insert data, INSERT INTO BULBLIFE VALUES(5 or 7). the SQLITE INDEX data structure gets modified
As a result when the Goldman Sachs' engineers' SQL statement is applied after each insert statement., the new median is accurately determined and changes if required. Please let me know if my modification is correct. Thank you Cameron Chang.
Hold on.
I guess the above code fails if there are odd distinct elements.
Example consider: x = { 1,2,3}
cond1: sum( y.hours<=x.hours) >= (count(*)+1)/2
cond2: sum( y.hours>=x.hours) >= (count(*)+1)/2+1
for element 1:
1 >= 2 -- false -no need to consider the other condition
for 2:
cond1: 2>= 2 true
cond2: 2 >= 3 -- false
for 3:
cond1: 3 >=2 true
cond2: 1>= 3 false
so, the median function fails. Let me know if i misunderstood.
sivabasava. I get the resut median = 2 for odd number of elements {1,2,3} if I run the following SQLITE statements:
CREATE TABLE BULBLIFE (HOURS INT);
CREATE INDEX IDX_HOURS ON BULBLIFE(HOURS ASC)
INSERT INTO BULBLIFE VALUES(1)
INSERT INTO BULBLFE VALUES(2)
INSERT INTO BULBLIFE VALUES(3)
SELECT X.HOURS median
FROM BULBLIFE X,BULBLIFE Y
GROUP BY X.HOURS
HAVING
SUM(CASE WHEN Y.HOURS <= X.HOURS
THEN 1 ELSE 0 END) >= (COUNT(*) + 1) / 2 AND
SUM(CASE WHEN Y.HOURS >= X.HOURS
THEN 1 ELSE 0 END) >=(COUNT(*)/2) + 1
sibabasava, please run the above sql statements and please let me know what you get. Thank you. Cameron.
Mean: keep track of sum and number of elements.
and print sum/no_of_element.
Median: Use linkedlist to store the items in sorted fashion.
keep track of number of elements stored in linked list
Whenever a new item comes, add it into list.
based on the count of elements find the median.
Any better idea is welcome...
Hope you know how to find mean ( sum of elements till now / number of elements ).
- sivabasava October 05, 2012For finding median:
Have two heaps :
one min heap and one max heap and keep their sizes ( number of element in its heap)
To find median:
1. if both heaps are of equal size peep top element from both the heaps, sum them and divide the sum by 2 and return.
2. else return the top element of the heap having max elements.
Insertion of x:
1 if x is less than median insert it into max heap else insert into min heap.
2. if heap size differ by > 1 adjust them ( pop from largest heap and insert into small heap).