## xuyan.nus

BAN USERThere are two classes in this problem:

1) Species

2) Animal

Species should use Tree Structure

class Species {

//attributes

Species *belongs_to;

string type_name;

std::vector<Species *> sub_types;

//function

void behavior() {***};

};

Animal should belongs to one species.

Animal {

//attributes

Species the_type;

};

First, the question is not that clear, what does the interviewer want to see. But it seems that it requires to design a System like Git.

Second, the solution is like:

1) Keep a list of managed files (say, L) and their status.

2) Each file has a connection to all related commits. (commits are ranked based on time)

3) Each commit has a connection to all related file (including not-changed files).

For the latest version, scan the files and fetch the latest files.

For one particular commit, scan the related files and fetch the files.

Each time, when a new commit happens,

1) If the commit new a file, insert the file to L.

2) For all modified files, insert the commit into the beginning.

3) Insert the commit to the ranked commit list.

First, if the dictionary needs to support Range Query, like: "fetch all words starts from a*", the HashTable is not sufficient.

So, let's say the dictionary only needs to support Exist or Not.

Second,

Case 1: The index can be kept in main memory.

Trie: Average Time Cost: O(n), n is the average length of words. Space: O(N * n), N is the number of words.

HashTable: Average Time Cost: O(1). Space: O(N), N is the number of words.

(Both supports Parallel Query)

Case 2: The index cannot be kept in main memory, but can kept in 1 PC's Disc.

Then, Disc read is the major source of time cost.

Trie: O(n) - k, (k is the level of trie that can keep in memory)

HashTable: O(1)

Case 3: The index cannot be kept in main memory, but can kept in a distributed memory on K machines. (assume that there is 1 master machine)

Trie: O(n)

HashTable: O(1) + O(1)

Case 3: The index cannot be kept in main memory, but can kept in a distributed file system on K machines. (assume that there is 1 master machine)

Trie: O(n)

HashTable: O(1)

In summary, in all the cases, HashTable is more scalable than Trie, if only Exist() should be supported.

First, talk with the interviewer, "Why split a search query across multiple machines?". It is for high performance or completeness or both. Or "What is on the multiple machines?"

Second, Split Query Index and Document.

Third, if the index is slitted into K parts kept on K machines.

then the query must be split into K data-based sub-queries.

(It depends on the split method)

Forth, if the index is duplicated into M parts on M machines.

If the query is time costly, one way is to split the query into M logic-based sub-query.

What is data-based sub-query and logic-based sub-query?

data-based sub-query: each sub-query scans different parts of the Index using same keywords.

logic-based sub-query: each sub-query scans the same Index using different keywords.

logic-based sub-query (Divide and Merge) is more complex.

Finally, when the target index is complete. Get the target documents on Document Server.

The same method is used on duplication servers and partition servers.

Assume:

"input=987, return=1023" is a wrong example, which violates the rule 3.

A brute-force algorithms:

Define Two Helper Data Array:

MAX_Data_Helper = { 0, 9, 89, 789, 6789, 56789, 456789, 3456789, 23456789, 123456789 };

MIN_Data_Helper = { 0, 12, 123, 1234, 12345, 123456, 1234567, 12345678, 123456789, -1 };

Algorithm is shown below: Given a number: n

Case 1: If n is negative, return 0;

Case 2: If n is >= 123456789, then there is no solution;

Case 3: Otherwise, n is in [0, 123456789) [for example: n = 321]

3.1: Find the minimum upper bound in MAX_Data_Helper. [It is 789]

3.2: If n < 123, then return 123. [gets 123 from MAX_Data_Helper]

Otherwise, brute-check all numbers from n+1 to minimum upper bound. [In this case, check from 322 to 789]

3.3: the three rules in together means (lower sinificant degit > higher sinificant degit).

Performance Analysis:

Ignore negative numbers and very large numbers > 123456789.

Assume numbers are uniformly random in [0, 123456789).

Then, most numbers get Big-O: O(1). For example, for numbers in [56789, 123456), return 123456 immediately.

As n goes larger, the precentage of numbers with O(1) is larger. So, the amortized cost is O(1).

But the worst case performance is very bad. for example, if n = 23456789, it is costly to find the next.

The key in this problem is to clarify whether there are parent pointers?

Because parent pointers are not necessary for BST.

Case 1: Has parent pointers.

The algorithm proposed by Zol is a great answer.

1.1: The leftmost child of the right child, if your current node has a right child.

1.2: If not, find a parent whose left child is the node you're currently at.

1.3: If not, there is no successor.

Case 2: No parent pointers.

1.0: keep a global variable, say "find_the_node", and initialized as "false";

1.1: DO "in-order traversal method".

1.2: If find, set "find_the_node" to be true.

1.3: The next traversal node is the successor. Otherwise, no successor.

Complexity:

Case 1: Big-O worst case: O(lg(n))

Case 2: Big-O worst case: O(n)

There are two things not explained clearly in the problem:

1) Require stable sort or not?

2) Can overwrite the array A or not?

Assume stable sort not requried and A can be overwrote. The solution is:

Step 1: HeapSort(X); [Note: Use Maximum Heap]

Step 2: PermutationReorder(X,A);

Step 1:

HeapSort is satisfied in this problem.

1) Big-O: O(n*log(n)) even in the wrost case.

2) In-space: No extra space required.

3) But not stable. (Assume not required. if required, need to modify HeapSort)

[Note: it is time costly to code the HeapSort in a short time, so be careful]

Step 2:

PermutationReorder:

-------------------------

Step 2.1:

Loop A

A[i] = X[A[i]];

Step 2.2:

Loop X

X[i] = A[i];

------------------------

If A is not allowed to be overwrote. I do not have a solution. I see there are many talking about the "Swap Solution" proposed by Levitan. But I think that solution is wrong. The "Swap Solution" is basically "sorting A", which is not exactly what the problem want.

Hi, ninhnnsoc, Thanks, It is a lot fun to code here.

I append the Logic in Move_wR():

1) At least move +1 to the right.

2) If array[wR] < 0, (negative), keep going to the right.

("negative numbers in boundary" never be a solution, so no need to check)

3) If window_sum < 0, start a new window. wR + 1 and wL = wR;

(It means "The left side will never be part of a solution", so drop the numbers and strat over)

There are two things not explained in the problem:

1) Nodes have no X-Y locations. In Theory, there is no way to check it is a thriangle or not. In a Line or Not?

2) Is the memory a bottleneck?

Assume the two are not issues. Then the solution comes:

A: Using Matrix to store Graph in the Memory. (it is always faster).

B: The algorithm is:

1) New an empty std::unordered_set<std::string> to keep all triangles.

----------------

2) Loop each node (v1) in V. [Big-O: O(n)]

3) For each pair of connected nodes (v2 and v3). [Big-O: O(d^2), d is the average node degree]

4) Check whether v2 and v3 are connected. [Big-O: O(1)]

5) If 4) get 'YES', then sort the node IDs and push the new triangle into unordered_set. [Big-O: O(1)]

----------------

6) Finally, output triangles in the unordered_set. [Big-O: O(n)]

Thus, the total cost is Big-O: O(n * d * d), where d is the average node degree.

In a complete graph, it is n^3; In a tree, it is O(n); (even through tree does not make sense in this problem)

The "SLIDING WINDOW" proposed by ninhnnsoc is a great idea. But his answer is not the best.

Basically, "SLIDING WINDOW" has four key attributes:

1) window_sum: sum of all numbers in the window.

2) wL: left index of the window, initialized to be 0;

3) wR: right index of the window, initialized to be 0, also;

4) minimum_sequence_length: the final answer, initialized to be INT_MAX;

The algorithm is basically a while-loop.

while (wR < array_size) {

Move_wR(); // Find the next wR, window_sum > S;

Move_wL(); // Remove unnecessary left numbers;

UpdateMinimumSequenceLength();

}

//Note the window_sum should always larger than S, except:

1) in the initialization phase.

2) There is no solution to the question, (minimum_sequence_length == INT_MAX)

**CareerCup**is the world's biggest and best source for software engineering interview preparation. See all our resources.

Step 1: You need confirm what chat services to support? text, voice, group chatting, saving chat history, etc.

- xuyan.nus February 25, 2016Step 2: Calculate System Design Target. Active users per second?

Step 3: OOD Design. Data Storage and Performance.