Given a 2D array of strings :

Interview Question

0

of 2 votes

5
Answers
Given a 2D array of strings : A[][]={ fog, fool, food, fruit,for}.
Find a substring such that after removing this substring from beginning of all the strings in A , the length of remaining strings + length of substring is minimum.
example:
if substring is "f" then total length= 1+{2+3+3+4+2} = 15
if substring is "foo" total len is= 3+{3+1+1+5+3}=16
if substring is "fo" total len is= 2+{1+2+2+5+1}=13
since minimum is 13 so output should be "fo"
- shivii.1230 August 11, 2013 in India | Report Duplicate | Flag | PURGE

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: Written Test

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

I think its a variation of longest common sub sequence problem.

Along with finding the longest common sub sequence we have to keep track the frequency which is the count of no of string this particular longest common sub sequence is present or simply the most frequent remaining length after removing longest common string would win....!

- Sai August 11, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Sorry didnt understand question completely, how can u remove fo, foo from fruit?

- Praveen August 12, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

its not removing that why the length of fruit is 5. In case if substring no found it is not removing anything. The word will be as it was earlier

- tarunverma August 12, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Hash these strings char-by-char, e.g.:
hash[f]=5
hash[d]=1
...
hash[o]=6
....
Now sweep through the hash table and pick the values that are :
1. most repeated
2. form a sub-string (i.e., they are sequential)
It is also important to keep a counter that indicates the number of strings that have a selected substring in common. This is to address cases such as "fruit" in above example. One approach is to select the substring if it is common to more than 50% of the entire strings.

- Anonymous August 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

The problem is not stated very clearly. I will try to reword it:

You're given an array of strings, for example {fog, fool, food, fruit, for}. You pick a string S, and for each string x in the array, if S is a prefix of x, you remove S from the beginning of x. In other words, if x could be written as S.concat(y), where y is some (possibly empty) string, you replace x with y. The goal is to find the choice of S that minimizes the sum of all string lengths in the array after this replacement + length(S).

This can be solved in optimal O(M) time, where M is the total number of characters present in the input, by using a prefix tree. Build a prefix tree that is augmented at each node with a counter of how many strings in the prefix tree begin with the prefix defined by the path to that node. For example, if the prefix tree contains the words "cow" and "cool", it would look like

root (2) -> c(2) -> o(2) -> o(1) -> l (1)
                     |
                     -> w(1)

I hope the above formats correctly when people look at it. The top c and o nodes have a count of 2 because there are 2 words that begin with "c" and "co" respectively. The next "o" node has a count of 1 because only 1 word begins with "coo".

Now, you want to choose S to be the string defined by some path from the root to a node in this prefix tree. If you don't, your choice cannot be optimal, since there are characters at the end of your chosen string that could be removed to improve the cost. The question now becomes which path we should choose.

We can do a traversal of the prefix tree. Each node corresponds to a possible choice of S. At each node, we have the count of how many strings in the array have that choice as a prefix. All we need to do is multiply the count by the depth of the node (length of the corresponding choice for S), which we maintain during the traversal, to determine how many total characters will be removed by that choice of S. We then subtract the depth of the node once to compensate for the length(S) term in the cost function. The result is the cost reduction that would be achieved by this choice of S. Minimizing the cost is equivalent to maximizing the cost reduction. We keep a reference to the node seen so far that has the largest cost reduction as we traverse. After the traversal, we do a depth-first search to find the path from the root to that node, and reconstruct the optimal choice of S from the path.

A prefix tree can be constructed in O(M) time, traversed in O(M) time, and DFS searched in O(M) time, where M is the total number of characters in the input. Adding an additional count to each node and maintaining the depth and the maximum so far during the traversal does not change the asymptotic bounds, so the total time is O(M). The total space will also be O(M), assuming storing a count takes O(1) space per node.

- eugene.yarovoi August 14, 2013 | Flag Reply

CareerCup

Interview Question

Books

Videos

Resume Review

Mock Interviews