Google Interview Question
Software Engineer / Developer
Country: China
Interview Type: Phone Interview
What should the distribution be? If it is not specified, we can start at the root and at each node randomly decide whether to continue left or right. If it must be uniform, we need to maintain the weights of the left and right subtrees at each node. Then, start at the root and decide at every node in the following way: if the weight of the left subtree is L and of the right subtree is R, let D = random(1, L+R); if D < L then go left; else go right.
That approach will never select the root node of the tree. I would do an inorder traversal, store the values in an array, and select a random value from the array; that guarantees an equal selection probability for every node.
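The inorder-plus-array idea can be sketched as below. This is a minimal illustration, not code from the thread; the `Node` class and its field names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class InorderRandom {
    static class Node {
        int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    // Collect all values with an inorder traversal: O(n) time and O(n) space.
    static void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out);
        out.add(node.val);
        inorder(node.right, out);
    }

    // Every node (including the root) ends up in the list, so each value
    // is returned with probability exactly 1/n.
    static int randomNode(Node root, Random rnd) {
        List<Integer> values = new ArrayList<>();
        inorder(root, values);
        return values.get(rnd.nextInt(values.size()));
    }
}
```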
@vhmj1982: This is a reasonable solution. Not picking the root is a minor issue and can easily be corrected.
This way, for a balanced tree, insert/delete and getRandom all have O(log n) time.
Doing an inorder traversal will be O(n) time (by the way, if you do decide to do an inorder traversal, you can save the O(n) array space by using reservoir sampling).
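The reservoir-sampling variant mentioned here can be sketched as follows: one inorder pass, a reservoir of size one, and O(1) extra space besides the recursion stack. The `Node` class is an illustrative assumption.

```java
import java.util.Random;

public class ReservoirRandom {
    static class Node {
        int val;
        Node left, right;
        Node(int v) { val = v; }
    }

    // Visit nodes in order; when the i-th node (1-based) is seen, replace
    // the current pick with probability 1/i. After the full pass, every
    // node has been kept with probability exactly 1/n.
    static void visit(Node node, Random rnd, int[] seen, int[] pick) {
        if (node == null) return;
        visit(node.left, rnd, seen, pick);
        seen[0]++;
        if (rnd.nextInt(seen[0]) == 0) pick[0] = node.val;
        visit(node.right, rnd, seen, pick);
    }

    static int randomNode(Node root, Random rnd) {
        int[] seen = {0};   // nodes visited so far
        int[] pick = {0};   // current reservoir of size 1
        visit(root, rnd, seen, pick);
        return pick[0];
    }
}
```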
@ywdong8809: The problem seems to imply that the binary tree implementation is in your hands, so we expect the L/R sizes to be available as node fields. Did you clarify this with the interviewer?
@Loler: Yes, I clarified this with the interviewer. We can maintain some extra data in this BST, as long as we don't slow down the other operations.
But in this solution, when does it return? It decides randomly whether to continue left or right, so how do you know when to stop and return an element?
You don't need to store the array: first generate a random k, then do the traversal, stop when you've seen k nodes, and return the element you are at. No extra space needed.
@Anonymous: If you don't know the number of nodes in the tree, you cannot generate a random number and then do an inorder traversal stopping at that point.
@Anonymous: once again, it depends on what you are willing to trade. If you don't want extra space, you can count the nodes in one pass and then stop at the k-th node in a second pass. If you can use extra space, the array is perfectly valid.
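The two-pass idea (count first, then stop at the k-th node of a second inorder pass) might look like this. A hedged sketch: the `Node` class and helper names are assumptions, not from the thread.

```java
import java.util.Random;

public class TwoPassRandom {
    static class Node {
        int val;
        Node left, right;
        Node(int v) { val = v; }
    }

    // First pass: count the nodes.
    static int count(Node node) {
        if (node == null) return 0;
        return 1 + count(node.left) + count(node.right);
    }

    // Second pass: inorder traversal that stops at the k-th (0-based) node.
    // `remaining` is a one-element array so the counter survives recursion.
    static Integer kthHelper(Node node, int[] remaining) {
        if (node == null) return null;
        Integer left = kthHelper(node.left, remaining);
        if (left != null) return left;
        if (remaining[0]-- == 0) return node.val;
        return kthHelper(node.right, remaining);
    }

    static int kth(Node node, int k) {
        Integer result = kthHelper(node, new int[]{k});
        if (result == null) throw new IllegalArgumentException();
        return result;
    }

    // Two O(n) passes, no array: uniform over all n nodes.
    static int randomNode(Node root, Random rnd) {
        return kth(root, rnd.nextInt(count(root)));
    }
}
```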
Traverse the tree once and record the size (i.e. the number of nodes) of each subtree at each node. Then, starting from the root, do the following:
leftSize := size(left) or 0 if left == null
rightSize := size(right) or 0 if right == null
totalSize := leftSize + rightSize + 1
with probability 1 / totalSize, return current node
with probability leftSize / totalSize, go to the left subtree and repeat
with probability rightSize / totalSize, go to the right subtree and repeat
This should return each element with equal probability. However, I couldn't find a way to utilize the fact that the tree in question is a BST. This algorithm will work for any binary tree. Any ideas for that?
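The probabilistic descent described above can be sketched as below, assuming each node caches its subtree size in a `size` field (an assumption; the comment only says sizes are recorded in a first traversal). Drawing one integer in [0, totalSize) and comparing it against leftSize realizes exactly the three probabilities listed.

```java
import java.util.Random;

public class SizeDescentRandom {
    static class Node {
        int val, size;   // size = number of nodes in this subtree
        Node left, right;
        Node(int v) { val = v; size = 1; }
    }

    static int size(Node n) { return n == null ? 0 : n.size; }

    // One O(h) descent. At each node, draw d uniform in [0, total):
    //   d <  leftSize  -> descend left   (probability leftSize/total)
    //   d == leftSize  -> return current (probability 1/total)
    //   d >  leftSize  -> descend right  (probability rightSize/total)
    // so overall every node is returned with probability 1/n.
    static int randomNode(Node node, Random rnd) {
        while (true) {
            int left = size(node.left);
            int d = rnd.nextInt(node.size);
            if (d == left) return node.val;
            node = (d < left) ? node.left : node.right;
        }
    }
}
```

Note this is the same computation as selecting a uniformly random rank k and finding the k-th element, just folded into one draw per level.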
If we know the size of the subtree rooted at each node of the tree (and hence of the whole tree) we could get a random element in O(log n) time (assuming the tree is balanced and its height is log n, O(h) in the general case)
Check the root to see how many elements are in the tree, say n. Then generate a random int k between 1 and n. We just have to select the k-th element: we know how many elements there are in each subtree, so we can use a recursive function.
So if size(root->left) == k-1 we return the root, if size(root->left) < k - 1 we can return the (k - size(root->left) - 1)-th element of the right tree, otherwise we return the k-th element of the left subtree.
No more than O(h) calls are needed.
Warning: if k (the index of a node) starts from 0, then one must check that size(node->left) == k (and not k-1) at each step.
Here is the code:
public Value randomNode() throws IllegalStateException {
    if (root == null) {
        throw new IllegalStateException();
    }
    int k = generator.nextInt(root.N);
    return getKthNode(root, k);
}

private Value getKthNode(Node node, int k) throws IllegalArgumentException {
    if (node == null || k >= node.N) {
        throw new IllegalArgumentException();
    }
    // Size of the left subtree, or 0 if there is none.
    int lsize = (node.left == null) ? 0 : node.left.N;
    if (lsize == k) {
        return node.val;
    } else if (lsize > k) {
        return getKthNode(node.left, k);
    } else {
        // Invariant: node.right != null, because k <= node.N - 1 and
        // lsize <= k - 1 imply node.right.N = node.N - lsize - 1 >= 1.
        return getKthNode(node.right, k - lsize - 1);
    }
}
I tested it using this code:
public static void testRandomNodeDistribution() {
    int i, n = 10000, m = 100000000, s[] = randomSequence(n);
    BST tree = new BST();
    HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
    for (i = 0; i < n; i++) {
        tree.put(s[i]);
    }
    for (i = 0; i < m; i++) {
        int x = tree.randomNode();
        if (map.containsKey(x)) {
            map.put(x, map.get(x) + 1);
        } else {
            map.put(x, 1);
        }
    }
    double mean = 0, sdv = 0;
    for (i = 0; i < n; i++) {
        mean += map.get(i);
    }
    mean /= n;
    for (i = 0; i < n; i++) {
        double tmp = map.get(i) - mean;
        tmp *= tmp;
        sdv += tmp;
    }
    sdv = Math.sqrt(sdv / n);
    System.out.println(sdv / mean);
}
The resulting standard deviation is about 0.934% of the mean, so one can say the resulting distribution is actually pretty uniform.
How does this sound? Assume the height of the tree is always known to us. In that case we select a random int less than the height of the tree, then travel that deep, making a random decision at each level on whether to go left or right.
We might have to store the height of the subtree at each node if the tree is highly skewed, and take the subtree height into account whenever we make the left/right decision.
The issue with that approach is that the distribution would be skewed in favor of elements at depths that have fewer nodes. For example, at depth 0 we have only one element, the root; at depth 1 we have 2; and at depth N of a balanced BST we'd have 2^N elements. If we choose uniformly among depths [0, N], an individual element at depth N has a much smaller chance of being selected than anything above it.
This might or might not be a concern. But, if I was the interviewer, I'd definitely want to get a solution where there's an equal probability of selection for each element in the tree.
Here is my thought: all we need is an API that finds the k-th element in the tree, such as tree.find(int k). With a plain binary search tree we can find the k-th element in O(n) time. If we want better, we can store child counts in each node, so the delete and insert operations must update the counts of all ancestors. The performance of the search then depends on the shape of the tree at the time: if it is close to balanced, you get O(log n); if it is more like a list, it is O(n). Either way, it guarantees a uniform probability distribution over any snapshot of the tree.
- chenlc626 March 07, 2013