## Amazon Interview Question

Interns**Country:**India

Eh, we can use level traversal to put each node to a TreeNode array,

but I think the inner connection of each node doesn't changed even though we get an array,

so just return the first element of the array -- the root of the tree

But if we can only construct a int array(assume the value of each node is integer type) i cannot get it because even re-constructing the binary tree, we need 2 different traversal array.

i am not an English speaker, so if there is something weird, please forgive me.

> i am not an English speaker, so if there is something weird, please forgive me.

There is no need to apologize! :D No one's English is perfect and I am never down on people that try. I am dabbling in various languages, with mediocre success in most. I care that people try - nothing more. I think that's how we should engage with each other, so there is no apology needed IMHO. People complaining about other people's proficiency in a given language are usually the ones who speak the fewest languages :D (my experience in America)

As for the issue you raise with needing a second array, I think that is okay for this question (see my own answer below). It seems like the answer is more about something akin to a format for serialization/de-serialization (or hibernation, if you like that term better). I assume that auxiliary data structures are fine.

Not a memory efficient method, but here it goes.

You could turn the n-ary tree into an array similar to a heap is represented in array form.

The equation to find the index of a chile is as follows. If the number of children at each node is k, then at index i the cth child can be found using the equation k*i+1+c.

But this only works for a complete n-ary tree. Since it probably isn't a complete tree, what you could do is mark a child as -1 in the array to show that there is no node at that position.

When you are done you have an array that can easily used to reconstruct the tree again. However, if the tree has a small number of nodes and a large array of children, then you will be wasting a large about of space with this method. But without being able to use multiple traversal arrays this is the only thing that I could think of quickly.

Should be easy to extend the Serialization and Deserialization mentioned for Binary to Nary (N needs to be known).

leetcode.com/2010/09/serializationdeserialization-of-binary.html

Instead of the two explicit Recursive calls in the Deserialization you do n Recursive calls.

Note: As n increases the number of Null Markers (#) takes more and more space. You can compress it with say #3 or so, and have the File.GetNextToken() interpret it and return # three times.

Should be easy to extend the Serialization and Deserialization mentioned for Binary to Nary (N needs to be known).

leetcode.com/2010/09/serializationdeserialization-of-binary.html

Instead of the two explicit Recursive calls in the Deserialization you do n Recursive calls.

Note: As n increases the number of Null Markers (#) takes more and more space. You can compress it with say #3 or so, and have the File.GetNextToken() interpret it and return # three times.

Structure the array as follows:

First element is the number of nodes, K.

Next K elements are nodes in the trees, (say numbers associated with them).

Take remaining elements of array in consecutive pairs. i.e., if we have 4 and 5, 4 is parent of 5. Add parent relationships to this part of the array in left-right order, so tree reconstruction is unambiguous.

Easy to reconstruct tree in linear time and space. Might be able to get away with constant extra space.

Array is linear in the size of the tree, since we're storing the edges and nodes exactly one.

Take this, super easy, No need to read long post. Apply same logic we apply for binary tree.

```
**
* * 10
* *
* * 2 20 12
* *
* * 1 7 8 9 21 24 25
* *
* * Serialized: 10,$,2,20,12,$,1,7, $, 8, 9, $, 21, 24 , 25, $
* *
*
* @param root
* @return
*/
@Override
public String serialize(NArrayTreeNode root) {
if (root == null)
return "";
final String levelSep = NULL_INDICATOR + SEPARATOR;
Queue<NArrayTreeNode> queue = new LinkedList<>();
StringBuilder serialize = new StringBuilder();
queue.offer(root);
serialize.append(root.val);
serialize.append(SEPARATOR).append(levelSep); //Defines the Level end
while (!queue.isEmpty()) {
NArrayTreeNode treeNode = queue.poll();
if (treeNode.children != null) {
for (NArrayTreeNode child : treeNode.children) {
serialize.append(child.val);
serialize.append(SEPARATOR);
queue.offer(child);
}
serialize.append(levelSep); //Defines the Level end
}
}
serialize.setLength(serialize.length() - 1); //Remove last ,
return serialize.toString();
}
/**
* * 10
* *
* * 2 20 12
* *
* * 1 7 8 9 21 24 25
* *
* * Serialized: 10,$,2,20,12,$,1,7, $, 8, 9, $, 21, 24 , 25, $
* *
*
* @return
*/
@Override
public NArrayTreeNode deserialize(String data) {
if (data == null || data.isEmpty())
return null;
NArrayTreeNode root = new NArrayTreeNode();
int index = 0;
String[] content = data.split(SEPARATOR);
root.val = Integer.parseInt(content[index++]);
Queue<NArrayTreeNode> queue = new LinkedList<>();
queue.offer(root);
while (!queue.isEmpty() && index < content.length - 1) {
NArrayTreeNode node = queue.poll();
index++;
while (!content[index].equals(NULL_INDICATOR)) {
if (node.children == null)
node.children = new ArrayList<>();
NArrayTreeNode child = new NArrayTreeNode();
child.val = Integer.parseInt(content[index++]);
node.children.add(child);
queue.offer(child);
}
}
return root;
}
```

I think this answer should be up voted. I do not want to use my Google account credentials to log into this site and cannot do so myself.

My line of reasoning was very similar:

A linked list's representation as an array is trivial to most people and it could be viewed as a tree in which each node has just one child (an 1-ary tree). Slightly fewer people will probably be familiar with one possible array representation of binary trees (a 2-ary tree), which is to store the root of the tree into the array's first element and let the left and right children be at the positions (n*2) and (n*2+1) respectively. This representation is very convenient, because a traversal through the tree can be accomplished by these two simple formulas.

To reconstruct such a tree into objects linked by pointers, one could pass the index n into a method that has a recursive pattern similar to a method that would do a left/right depth-first traversal like the following:

I have changed "n" from above the explanation in the previous paragraph to "i" in the code. This solution also skips over only dealing with fully populated trees directly to dealing with any binary trees, even such that are missing branches, as is the case with the 11th node above. This will also work with nodes missing on tree levels other than the bottom one, such as 5. Per representation in the array exchanging "5" with "null" would make the values of "10" and "11" unused.

On to a tree with nodes having three children. The general arithmetic relationships between the index of a value in the array representation and its corresponding location in the equivalent 3-ary tree will still hold. In this case, multiplying the index of a node by three will yield its middle child. 3*n-1 will be its left child; 3*n+1 will be its right child. As for the sizing of the array, a similar relationship holds between the number of nodes in a fully populated 3-ary tree and the former's length. Such a 3-ary tree will have 1, 3, 9, 27, 81, ... nodes on consecutive levels and the total numbers of nodes in such a tree with k levels will be [3^(k+1) - 1] / (k-1). Thus, a 3-ary tree with 5 levels will have 3^6 / 2 = 121 nodes.

In the following code I am skipping ahead to the construction of an n-ary tree. I hope that the pattern is clear by now and the code makes sense from the previous explanations. The easiest way to convince oneself of the correctness of this (or incorrectness in case I have made a mistake) is to draw a few trees of a particular degree.

One detail that is not completely explained is the generalization of what child exactly we arrive at if we multiply the position of a node by the tree's degree. If we examine trees of various degrees, one will notice that we always appear to end up at the child that is second to last of the in-level sequence of a parent's child nodes. For a binary tree, we arrive at the first of two nodes; on a 3-ary tree, the second of three; on a 4-ary tree the third of four. I have delivered a proof of this earlier, in my private time, but will ask readers to take this on faith, convince themselves empirically, or do the proof themselves. Be it correct or not, the expression "i*k - (k-2)" is supposed to deliver the position of the first child, by going (k-2) positions from the second to last child backwards.

The above illustrates one direction of this question: how to transform an array of node values into a tree. It has not yet shown how to do the reverse and create an array of a tree. The recursion is very similar in that we repeatedly arithmetically calculate the position of the child node and pass that value into the method traversing in whatever manner preferable through the tree. I am not expanding this into source code.

Here comes the funny part. In all likelihood, I will find this a lot funnier than you do. :) None of what I have said so far is a very good solution to this problem. If the tree is very sparsely populated, there is a lot of space wasted in the array. The most extreme case is a tree with nodes that each only have a single, right-most (assuming the typical left to right fashion a graph is represented visually) child, up to its deepest leaf. Above, we have already calculated the number of elements we would need in an array, using the method explained so far. The space complexity of this solution would be O(k^n), k being the degree of the tree and n being the number of nodes.

Why did I put you through reading this and me typing all of it? Because I found it to be a good exercise for me to generalize the array representation many of us, I included, have learnt for binary trees for n-ary trees. If this generalization had not been clear to you, it was probably not that bad for you to read this (and possibly point out errors I have made). If it had been clear, I guess you have been able to read or skip through this fast enough and I have hopefully not wasted too much of your time.

Now how did we arrive at such a sub-optimal solution? I have personally recognized the following as the cause of any sub-optimal algorithm I have ever come across: We express something beyond what is necessary for solving the problem. Everything should be as simple as possible, but not any simpler.

What is the unnecessary part here that has lead to a sub-optimal solution? The positions of the node! Let us remember that a tree is simply a graph. What are the bare essentials to define an instance of a graph? Thinking back to basic computer science lectures and classes, a graph is represented as a set of nodes V and a set of edges E = U (v_x, v_y) where v_x and v_y are elements of V and (v_x, v_y) is a tuple of those nodes.

There are many ways this could be translated into an array. One could let the first element of the array mean the number of nodes in the tree. Simply start at the second element of the array and then traverse that many elements to create all nodes.

If that array would already contain Nodes we could simply link up their objects, but then there is probably not even a need for that, because we could simply leave them linked in the array and were done already. Since I feel it would render the question, at least partially, moot I assume that the array only store the nodes' payload contents, such as integers.

Therefor, I assume we only have the nodes' values in that array and we have to create each node. In that case, it is probably helpful to have another array of nodes in which we reference each node to not lose them to the vastness of the computer's memory. It is assumed that the indexes of the edges refer to the nodes by their positions in the array, not their position in the tree. Otherwise, we would probably end up where we started, because we would need an array of size k^n to keep track and reference the nodes, after we create them, by their positions in the tree.

Thus, the elements following the node values should be pairs of indexes into the former part of the array, referencing a pair of nodes each by their position in the array. We link up the nodes by first locating them in the array of nodes we created, and then setting their child attributes as specified by the connecting edge, i.e. the adding the node indexed by v_y into the child attribute of the node indexed by v_x.

At this point we should realize that we (potentially) violated our above maxim ("our" if you accepted it): we simplified the information we used to express the problem, but made it too simple. It may have only been implicit, but a node in the tree the way we worked with has more properties than just a set of child nodes, which would have truly made it a graph like any other. There was also an implicit ordering of child nodes. A binary search tree, for example, has the property that the left child's value is always smaller (or equal, depending on how you want to handle that) than the right child's value. If we do not and should not care about this property and we are really only interested in trees in their more narrow sense of a graph, there is no harm done. We should just be aware that a correct algorithm that satisfies the question's request and equivalently transforms between a tree and an array may not necessarily put the children of a node into the same order as before.

I find this important, because too often in introductory computer science lectures do we define a tree as a graph where its nodes have certain properties, such as having no loops, and then go on to say that there is a "left" and "right" sub-node. To be precise, at that point the tree is no longer a graph in the particular way we have defined it in terms of the sets E and V! Now, beyond those two sets, there is an ordering of nodes or edges or an implied relative spatial arrangement of the graph's nodes.

It also stresses another important factor in the code we write: a lot of properties of the algorithms and data structures we write are coincidental. It may offend people to hear this (although maybe those are exactly the people that this should offend), but too much of the code we write works through "dumb luck" and not because we adequately reflected on them. Once we change some of our underlying assumptions that we have left implicit, properties such as the ordering of the nodes may blow up into our faces, out of the blue, coming from code that "has always worked". The solution? Make your implicit assumptions explicit! Is that a lot of work? Well ... yes, but probably not more effort than fixing the bugs you will eventually have to deal with by not doing so. This idea is also generally referred to as "design by contract". Spell out what your assumptions of your input are, what a given method should do with it, and how exactly the output should look like. Things such as the ordering or passing a search tree vs. a regular tree should no longer be implicit, but spelled out in the contract of the method, or generally speaking, the interface. Granted, we cannot prove every single algorithm in our products through formal methods. Many engineers far better than myself have made the experience that the proof of even relatively simple algorithms through formal methods quickly becomes intractable. That is probably also the reason why formal methods have not taken off in the software industry as some people would have predicted (Tony Hoare et al, please don't smite me for writing this :D).

I've gone far off topic now, but I like those kind of reflections. They may not all be correct, you may have quit reading this a long time ago, but sometimes I get interesting feedback, food for iterated reflection, from those rants. I am also writing so much, because I think that the quality of a candidate and the answer to a question lies much more in the way a candidate thinks then whether they can come to the correct answer. We should not be machines specialized in coding interviews. Software engineering is much more than that and coding interviews test only a very small subset of what makes a good engineer. At Microsoft, I only wrote code about 50% of the time. In all those interview questions we churn through on this site, where is the part that tests the other half? How do they reveal the difference in experience between someone who has been an engineer for ten years and someone who has been for two? I do not think on these types of questions a well prepared candidate of either experience would do that much differently if we just dump code into a window ... Apologies for venting and the rant, but maybe there is someone out there who can relate.

Back to the topic, we have made it explicit and brought it into our engineering conscience that we are really dealing just with trees, in the most narrow sense of them being a subset of the domain of graphs. Our algorithm will deal with just those and any consistency in the ordering of nodes or edges is purely coincidental. The question at hand explicitly demands the use of an array, but we might as well have transformed a tree into two sets - one for nodes one for edges - and it would structurally not have been much different.

Here is the code finally. First to make a tree from an array:

Second, the code to transform a tree into an array:

Some of you may feel that I have cheated a little by passing in the degree of the nodes "k". I could have taken the root node and simply done a count on its child count, but the root's child count does not have to equal "k". There may be a clean way to determine this from the data structure, but the way the code that I posted is right now, the degree of the tree is something that is more externally imposed through the algorithm and not inherent in the data structure, so I thought it better to pass it in.

And here is a little bit of a test driver:

If I did not too badly and this implementation is correct (are they ever?), tree1 and tree2 should be equivalent. They do not need to be _identical_! That is what I emphasized above. I also violated my very own rally cry for contractual design and have neither implemented checks or added comments to that effect. As we often do, I will save that for another time.

- The Internet August 18, 2014