## Google Interview Question Software Engineer / Developers

- 0of 0 votes
Which data structure will you use to store large number of integers? How will you find the median ? (Hint: B-tree , same as bst store number of nodes of subtrees)

Note the qualifier: "large number". If Google says large number, that should be massive. That amount of data can only be stored in hard drive or tape etc, where IO dominates the efficiency. So, to speed up searching, B-tree is optimal.

For median, I guess we can search in the leaf nodes. B-tree is ordered. So, you can find the median half way through keys of the leaf nodes.

Note the qualifier: "large number". If Google says large number, that should be massive. That amount of data can only be stored in hard drive or tape etc, where IO dominates the efficiency. So, to speed up searching, B-tree is optimal.

For median, I guess we can search in the leaf nodes. B-tree is ordered. So, you can find the median half way through keys of the leaf nodes.

<pre lang="" line="1" title="CodeMonkey45093" class="run-this">Use two heaps

Max heap and Min heap

( The constraint is that the size difference between minheap and maxheap should be 0 or 1)

Read the elements from source:

median is represented by the root of maxheap ( if number of elements is odd)

median is represented by the average of the root of maxheap and minheap (if the number of elements is even)

if the incoming element is less than the median

if the size of maxheap is greater than minheap by 1. Delete the element from maxheap and insert to minheap.

insert the element to maxheap

else

if the size of minheap is greater than maxheap by 1. Delete the element from minheap and insert to maxheap.

insert the element to minheap

comments Welcome :-)</pre><pre title="CodeMonkey45093" input="yes">

</pre>

Here is my solution (C). For the even tree_size, traverse_inorder may be optimized for returning two nodes instead of one (using a struct)

```
/**
* Traverse inorder
* left-root-right
* @param N Nth number in the ordered list
*/
int traverse_inorder(Node_t *root, int N)
{
static int counter = 0;
if(!root) return;
traverse_inorder(root->left, N);
counter++;
if(counter == N) /*Nth number in list is found */
{
return root->data;
}
traverse_inorder(root->right, N);
}
find_median(Node_t *root, int tree_size)
{
int N = 0;
int lower = 0;
int higher = 0;
if(tree_size & 1)
{
N = (tree_size/2) + 1;
return traverse_inorder(root, N);
}
else
{
N = tree_size/2;
lower = traverse_inorder(root, N);
N++;
higher = traverse_inorder(root, N);
return (lower + higher)/2;
}
}
```

Your function traverse_inorder uses a static counter. The function won't return an accurate result more than once because the counter is not reset during subsequent calls.

guys does it make sense to use self-balancing trees as root will roughly represent the median of all the numbers...

how does this sounds?

Why can't we use arrays to store large number of integers

- abhimanipal on February 02, 2010 Edit | Flag ReplyWe can use the Select Algorithm to find the median. This algorithm also has time complexity of O(n)