## Facebook Interview Question

Software Engineers**Country:**United States

What you are looking for : [ cs.stackexchange.com/questions/57542/find-pth-percentile-of-a-stream-of-numbers ]

However, there is a neat other way. If the numbers are within a range ( well, all integers supported by machine naively will be ), then you can have a sorted dictionary with integers as key, and count of the integers as value.

That will compress the data, and any time, you want to calculate a certain %, it can be easily doable ( there is a formula ).

assume n is the length of the sequence, potentially infinite, p is the percentile given

This question is quite interesting as NoOne pointed out it's not just selecting kth element due to:

- n is not known until the iterator finally terminates

- there is no random access to the elemnts

- the number of elements may not fit into memory

- k is not known, if n is not known, because k is given as percentile

Options:

1) convert to array and find k-th element: does not deserve further comments. But it's O(n) space and O(n) runtime with quickselect.

2) slect the kth element (with array, binary tree, etc.) takes O(n*p) space and O(n*p*lg(n*p)) time, since p < 1 O(n) and O(n*lg(n)) so, still not better with space if n is getting close to infinite

3) probabilistic approach 1:

model a PMF (probability mass function) that represents the data seen so far. If the values in the stream have a small range, I'd count the occurence of every value. Alternatively I'd create intervals which creates a histogram. I can calculate the "estimated" p*n th element, by going to the right interval. I can be more exact by interpolating the value using linear, spline, etc. interpolation within the interval of interest. Interpolation only works, if the PMF is nearly continous.

The size of the intervals will define the memory consumption and error.

If I know the data distribution in advance, e.g. that it's normal, poisson, ... I could even get along with only the mean and standard deviation, both of wich can be computed online

in O(1)

Before we go to code, few things to note:

1) We dont know the size hence a naive approach would be to store all elements in BST and then return the element at proper position

2) Or we can still use a BST but remove the elements from the tree which we know are before the position desired for eg: if the position is 50% then our tree will never be more than size of n/2 Or n * .5 since every time we add two elements we know the position will shift by one hence we can get rid of older elements at that point since we dont care about them.

In other words p(n, d) <= p(N+1, d) .. your index of desired position will only increase as size increases.

The below solution returns the desired number by using BST in O(log(N)) in best case and O(N) in worst case with memory of O(n*percent/100)

```
public class Solution {
public int findNumber(Interator<Integer> itr, double percent) {
if(percent <= 0) {
throw new IllegalArgumentException("percent has to be positive");
}
int size = 0;
Node root;
while(itr.hasNext()) {
if(root == null) {
root = new Node();
root.value = itr.next();
}
else {
root.add(itr.next());
}
size++;
if(size * percent >= 1 && root != null) {
root = root.removeMin();
size--;
}
}
return root != null ? root.removeMin().value : Integer.MIN_VALUE;
}
private static class Node {
public int value;
public Node leftChild;
public Node rightChild;
public void add(int v) {
if(this.value >= v) {
if(this.leftChild == null) {
Node n = new Node();
n.value = v;
this.leftChild = n;
}
else this.leftChild.add(v);
}
else {
if(this.rightChild == null) {
Node n = new Node();
n.value = v;
this.rightChild = n;
}
else this.rightChild.add(v);
}
}
public Node void removeMin() {
if(this.leftChild == null) {
return this.rightChild;
}
else {
Node n = this;
while(n.leftChild != null && n.leftChild.leftChild != null) {
n = n.leftChild;
}
n.leftChild = n.leftChild.rightChild;
return this;
}
}
}
}
```

There was a slight bug in my code above so fixing it:

```
public class Solution {
public int findNumber(Interator<Integer> itr, double percent) {
if(percent <= 0) {
throw new IllegalArgumentException("percent has to be positive");
}
int size = 0;
Node root;
while(itr.hasNext()) {
if(root == null) {
root = new Node();
root.value = itr.next();
}
else {
root.add(itr.next());
}
size++;
if(size * percent >= 1 && root != null) {
root = root.removeMin();
size--;
}
}
return root != null ? root.getMin().value : Integer.MIN_VALUE;
}
private static class Node {
public int value;
public Node leftChild;
public Node rightChild;
public Node getMin() {
Node n = this;
while(n.leftChild != null) {
n = n.leftChild;
}
return n;
}
public void add(int v) {
if(this.value >= v) {
if(this.leftChild == null) {
Node n = new Node();
n.value = v;
this.leftChild = n;
}
else this.leftChild.add(v);
}
else {
if(this.rightChild == null) {
Node n = new Node();
n.value = v;
this.rightChild = n;
}
else this.rightChild.add(v);
}
}
public Node void removeMin() {
if(this.leftChild == null) {
return this.rightChild;
}
else {
Node n = this;
while(n.leftChild != null && n.leftChild.leftChild != null) {
n = n.leftChild;
}
n.leftChild = n.leftChild.rightChild;
return this;
}
}
}
}
```

@Jerm, good point.

====

how this isn't as simple as sorting the array and then returning the (percent * array length)th element in the array? Maybe I'm not understanding the problem correctly.

=====

because, the problem is about getting it done from a stream.

======

Give you an unsorted integer ***iterator***

=====

What is one thing that is certain about iterator? They can be iterated upon. What is not certain about an iterator? That, they will end.

Surprised? Not so. For example, if we create an iterator, that returns you next prime number, where do you think the iterator ends? Well, nowhere.

That is an iterator, so... how you are going to find the % stuff right at any iteration? You would be out of memory very soon.

In fact, here is an trivial god knows what it would do iterator :

This generates in infinite stream of random numbers between [0,100). Now?

- NoOne June 20, 2017:-)