## Microsoft Interview Question

Software Engineer / Developers**Country:**India

**Interview Type:**In-Person

Construction of Optimal BST's using DP.

h t t p://en.wikipedia.org/wiki/Binary_search_tree#Optimal_binary_search_trees

this answer is pretty good. However, what if the probability is not distinct.

In extreme case, if the probability for all element is same, then it means the key for each element is same.After all, there is no way to search for an element.

Sort the nodes with according to the probabilities in descending order - O( n Log n)

Loop over them starting from the highest probability node. This becomes the root. Likewise keep inserting into the tree.

With this logic your highest probability node is the root. Generalizing, higher the node probability closer it is to the root while lesser the node probability farther it is from the root

How do you insert elements in a general tree, even if it is binary tree, or BST, you can't just use values or probabilities to insert ???, think carefully, if you are right, the prove that is gives me optimum tree

why cant we create a max heap tree??? its easy to insert & delete elements.. (Iogn time it will take) & building the heap tree will take O(n) time..

@prabhat do you think that in heap O(logn) time is required to delete an element ?

we don't want to delete, we just want to search elements, here one more thing is given which is probability. each node have certain probability of being searched. new we need to utilized that probability and create an efficient data structure that minimize the expected search time.

can anybody explain this question.... m not getting wot exactly "importance" is....plz somebody thro some light on this q.....

thanks

The question says that, you are given n nodes, each node have probability associated with it, lets say node *A* have probability *P*, here P is the probability that in one search element A is searched. for example i have 3 element, "a,b,c" with "2/4,1/4,1/4", this means if i do 4 searches then 2 times i will ask for a, one time b and one time c,

Now you need to give me a data structure where expected search time is minimum

Did the following analysis: (I am taking an example without any loss of generality)

Suppose the nodes given to you are 0.4, 0.3,0.2 and 0.1.

Sort it in descending order and save in an array( or any other linear data structure for that matter) the expected time is proportional to 0.4*1(memory access)+ 0.3*2 + 0.2*3 +0.1*4=2 Time Units. Now with little playing around you can find out that if you gor for a special hierarchical data structure which stores the highest element as the root, the second highest as the right element, and the the rest in the same order then we get a expected search time of 1.5 TU

The tree is:

0.4

0.2 0.3

0.1

The complexity is 0.4*1 + 0.2*2+0.3*2+0.1*3=1.7TU

This is the solution I came to after a bit of analysis. Let me know your thoughts.

Going by your analysis, MAX Heap does better. Try adding two more nodes say 0.6 and 0.5 and you will see the difference. Your approach creates a left heavy tree where as max heap will create a complete binary tree with more nodes near the root. But still it was an intresting data structure you came up with :)

your analysis is ok with the example you have given here, but you forget that the probabilities are assigned to some nodes, and the nodes have some values stored in it. so when you do the search, if you use heap, you need to search the whole data, then what is the use of these probabilities, the question is not to arrange the probabilities, but to arrange data items such that expected search time is as minimum as possible.

to make more sense look at your example : 7(0.4), 2(0.2), 5(0.3), 9(0.1)

and you have made following tree

....7

../...\

.2...5

/

9

now i do 10 searches in this tree, and then you please calculate time complexity of these searches and find out weather my structure is optimal or not.

searches are : 5, 7, 2 , 7, 5, 9, 5, 7, 2 ,7 call it array A

now you calculate the time required to search 2,5,7,9, once you have calculated that then we do following sum

sum from i = 1 to 10 (time required to search A[i])*probability of A[i]

and then tell us the value, i will tell you an optimal value for it.

We can use priority queue using balanced tree data structure. Insertion operation will take O(logn), Search operation will take O(logn), if you want to use importance information, we can design extra variable to maintain maximum probability node. This help us to have FindMaxProbability with O(1).

I think maintaining a HEAP or priority_queue is a great way to provide a constant -access time. Search will always be O(1). Insert and Delete are amortized O(logN)

Tell me the expected running time of your algorithm ?, please read question first, Is your solution satisfy the objective ??

How about sorting the nodes based on probabilities, and star merging 2 nodes with least probability, which results in elements having higher probability will be in smaller levels ?

@Guest : You are correct, But how do you create such a data structure, what kind of data structure do you use ??

Not sure if the interviewer himself gave the hint as a binary tree ... Here is an attempt ...What about using a linked list and maintaining the nodes in descending order of importance so that the most important node is at the beginning of the linked list ... if a node gets retrieved its importance can be updated and based on the updated value of the importance, a comparison with neighbouring nodes can be done and the node can be moved to a position based on the new value of importance ... using a binary tree may allow retrieval in log(N) time but updation of the tree may be time consuming

I think a Huffman tree should be an ideal choice if we know the priority of the nodes

- Rohit January 09, 2013