Interview Question


Country: India
Interview Type: Written Test




Comment hidden because of low score. Click to expand.
0
of 0 vote

I think its a variation of longest common sub sequence problem.

Along with finding the longest common sub sequence we have to keep track the frequency which is the count of no of string this particular longest common sub sequence is present or simply the most frequent remaining length after removing longest common string would win....!

- Sai August 11, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Sorry didnt understand question completely, how can u remove fo, foo from fruit?

- Praveen August 12, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

its not removing that why the length of fruit is 5. In case if substring no found it is not removing anything. The word will be as it was earlier

- tarunverma August 12, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Hash these strings char-by-char, e.g.:
hash[f]=5
hash[d]=1
...
hash[o]=6
....
Now sweep through the hash table and pick the values that are :
1. most repeated
2. form a sub-string (i.e., they are sequential)
It is also important to keep a counter that indicates the number of strings that have a selected substring in common. This is to address cases such as "fruit" in above example. One approach is to select the substring if it is common to more than 50% of the entire strings.

- Anonymous August 14, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

The problem is not stated very clearly. I will try to reword it:

You're given an array of strings, for example {fog, fool, food, fruit, for}. You pick a string S, and for each string x in the array, if S is a prefix of x, you remove S from the beginning of x. In other words, if x could be written as S.concat(y), where y is some (possibly empty) string, you replace x with y. The goal is to find the choice of S that minimizes the sum of all string lengths in the array after this replacement + length(S).

This can be solved in optimal O(M) time, where M is the total number of characters present in the input, by using a prefix tree. Build a prefix tree that is augmented at each node with a counter of how many strings in the prefix tree begin with the prefix defined by the path to that node. For example, if the prefix tree contains the words "cow" and "cool", it would look like

root (2) -> c(2) -> o(2) -> o(1) -> l (1)
                     |
                     -> w(1)

I hope the above formats correctly when people look at it. The top c and o nodes have a count of 2 because there are 2 words that begin with "c" and "co" respectively. The next "o" node has a count of 1 because only 1 word begins with "coo".

Now, you want to choose S to be the string defined by some path from the root to a node in this prefix tree. If you don't, your choice cannot be optimal, since there are characters at the end of your chosen string that could be removed to improve the cost. The question now becomes which path we should choose.

We can do a traversal of the prefix tree. Each node corresponds to a possible choice of S. At each node, we have the count of how many strings in the array have that choice as a prefix. All we need to do is multiply the count by the depth of the node (length of the corresponding choice for S), which we maintain during the traversal, to determine how many total characters will be removed by that choice of S. We then subtract the depth of the node once to compensate for the length(S) term in the cost function. The result is the cost reduction that would be achieved by this choice of S. Minimizing the cost is equivalent to maximizing the cost reduction. We keep a reference to the node seen so far that has the largest cost reduction as we traverse. After the traversal, we do a depth-first search to find the path from the root to that node, and reconstruct the optimal choice of S from the path.

A prefix tree can be constructed in O(M) time, traversed in O(M) time, and DFS searched in O(M) time, where M is the total number of characters in the input. Adding an additional count to each node and maintaining the depth and the maximum so far during the traversal does not change the asymptotic bounds, so the total time is O(M). The total space will also be O(M), assuming storing a count takes O(1) space per node.

- eugene.yarovoi August 14, 2013 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More