given a string find the number

Google Interview Question for Software Engineer / Developers

1

of 1 vote

27
Answers
given a string find the number of distinct substrings of the string.
ex:
input-> aaaa
output-> 4(a, aa, aaa, aaaa)
input->abcd
output->10(a, b, c, d, ab, bc, cd, abc, bcd, abcd)
- Anonymous July 09, 2010 | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 3 vote

The solution consists of constructing the suffix array and then finding the number of distinct substrings based on the Longest Common Prefixes.

One key observation here is that:

If you look through the prefixes of each suffix of a string, you have covered all substrings of that string.

Let us take an example: BANANA

Suffixes are:

0) BANANA
1) ANANA
2) NANA
3) ANA
4) NA
5) A

It would be a lot easier to go through the prefixes if we sort the above set of suffixes, as we can skip the repeated prefixes easily.

Sorted set of suffixes:

5) A
3) ANA
1) ANANA
0) BANANA
4) NA
2) NANA

From now on,

LCP = Longest Common Prefix of 2 strings.

Initialize

ans = length(first suffix) = length("A") = 1.

Now consider the consecutive pairs of suffixes, i.e, [A, ANA], [ANA, ANANA], [ANANA, BANANA], etc. from the above set of sorted suffixes.

We can see that,

LCP("A", "ANA") = "A".

All characters that are not part of the common prefix contribute to a distinct substring. In the above case, they are 'N' and 'A'. So they should be added to ans.

So we have,

ans += length("ANA") - LCP("A", "ANA") 
ans = ans + 3 - 1 = ans + 2 = 3

Do the same for the next pair of consecutive suffixes: ["ANA", "ANANA"]

LCP("ANA", "ANANA") = "ANA".
ans += length("ANANA") - length(LCP)
=> ans = ans + 5 - 3
=> ans = 3 + 2 = 5.

Similarly, we have:

LCP("ANANA", "BANANA") = 0
ans = ans + length("BANANA") - 0 = 11

LCP("BANANA", "NA") = 0
ans = ans + length("NA") - 0 = 13

LCP("NA", "NANA") = 2
ans = ans + length("NANA") - 2 = 15

Hence the number of distinct substrings for the string "BANANA" = 15.

- Psycho March 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

i tried the same thing your logic is good but there is a problem when string length is 75000 in my case.......pls help

#include<iostream>
#include<algorithm>
#include<string.h>

using namespace std;

long long LCP(string a,string  b)
{
    long long counter = 0;
    for(long i = 0 ; i < max(a.length(),b.length()) ; i++)
    {
        if(a[i] == b[i])
        counter++;
        else
        break;
    }
    return counter;
}

string a;
string b[75004];

int main(){
    long n,q;
    cin >> n >> q;
    cin >> a;
    while(q--){
        long index;
        char replace;
        cin >> index >> replace;
        a[index-1] = replace;
        for(long i = 0 ; i < n ; i++)
        {
            b[i] = a.substr(i,n-i);
        }
        sort(b,b+n);
        long long ans = b[0].length();
        for(long i = 1 ; i < n ; i++)
        {
            ans = ans + b[i].length()-LCP(b[i-1],b[i]);
        }
        cout << ans << "\n";
    }
}

- vaibhav raj June 09, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

The easiest way to do this is to insert all of suffixes of the string into a trie. The answer is then the number of nodes of the trie.
This will do the job in O(len^2) time.

If this would not satisfy you, do it with suffix tree. The main idea is that every substring of a string s is a prefix of a suffix of s.

- ftfish July 09, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Not sure that would work.

COnsider the string "abab"

For suffix tree, you make it "abab$"

When you insert you will get one internal node corresponding to the first "ab" and then the leaf node below that would be ab$. Even though they are different substrings in the string, they are the same - "ab".

So, for the actual answer, you should count all the nodes and then subtract the count of those leaf nodes that is a child of an internal node, on just the string "$". That will give you the answer.

- vinod July 11, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

If we use normal uncompressed tries, the answer should be the number of nodes.
For suffix trees it will be a little different.

- ftfish July 11, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

The complexity in case of trie will be O(N^3)

Following are the steps .. correct me if I am wrong
For each possible substring insert that substring into trie

each possible substring O(N^2)
inserting into trie O(N)

overall complexity O(N^3)

- Loony July 15, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

i am naive to coding
Can u please explain how generating all possible substrings o(n^2)

- ravikant0909 July 24, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

For example consider all words ending with the char at the position J. You will generate all the words from 0-J up to J-1 - J and then you will go to the next position j+1.

- mbriskar May 25, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

@ftfish: Cool :)

- buried.shopno July 10, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

use recursion..

- ranjeet July 13, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Check out my blog for the complete program
sudipta.posterous.com/finding-all-the-possible-substrings-of-a-give

- Sudipta July 18, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

There should be 15 distinct substrings not 10.
"d","c","cd","b","bd","bc","bcd","a","ad","ac","acd","ab","abd","abc","abcd"

Wrote recursive c++ program to do so.

vector<string> PrintDistintSubStrings(string inStr)
{
vector<string> result;
if(inStr.size() == 0)
return result;

if(inStr.size() == 1)
{
result.push_back(inStr);
}else
{
string first = inStr.substr(0,1);
string rest = inStr.substr(1, inStr.length());
vector<string> subResult = PrintDistintSubStrings(rest);
result = subResult;
result.push_back(first);
for(unsigned j=0;j<subResult.size();++j)
{
result.push_back(first + subResult[j]);
}
}
return result;
}

- krishna July 21, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Just realized my solution only works only if the characters in the input are unique.

it fails for the case1.

- krishna July 21, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@krishna,
strings consists of only adjacent character.. so only 10 possibilities.

- sri July 25, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

my solution

main()
{
int len, start, slen;
int count=0;

char str[] = "abcde";
slen = strlen(str);

for(len=1; len<=slen; len++)
{
for(start=0;start<slen; start++)
{
for(count=0; count<len && start+len<=slen; count++)
{
printf("%c",str[count+start]);
}
printf(" ");
}
printf("\n");
}
getch();
return;
}

let me know if can be done better

- srini July 27, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

1. Make a trie of all the suffixes.(Do not try to compact it)
2. Now output the first level strings, then second level and so on.
And that is it.

So for example abab the suffixes would be abab, bab, ab, b
Suffix tree would look like

a-b-a-b-$-1
|-$-3
b-a-b-$-2
|-$-4

Now First level - a,b [Go one child deep on each branch]
Second level - ab, ba [Go two child deep on each branch and so on]
Third level - aba, bab
Fourth Level - abab

- nishant30 July 29, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Good

- Arpit April 17, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Use Hashing.

for(i = 0; i <len; i++)
{
    for(j=i+1; j < len; j++)
    {
      if( str(i,j) not in hashtable)
           insert str(i,j) in hashtable;
     }
}

print hashtable;

2. Use suffix tree.

- gaurav August 02, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

A naive approach:

There would be n(n-1)/2 substrings. Put them into a set, and return the size of the set. O(n^2) time, O(n^2) space.

- Ryan September 15, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

there are two ways to do it in O(n)

(1) Use suffix arrays and lcp arrays. lcp arrays will help us remove redundancy. Do a cumulative sum on the lcp array and we get the number of distinct substring

(2) Use the first step of KMP.

For example, given string "ababab", the first step of KMP will get:

-1 -1 0 1 2 3

The number of distinct substrings will be (6 + 1) * 6 / 2 - (1 + 2 + 3 + 4) = 11

- yy October 22, 2012 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

can u explain the last line.
"The number of distinct substrings will be (6 + 1) * 6 / 2 - (1 + 2 + 3 + 4) = 11"

- Anonymous July 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Use suffix-array

SuffixArray sa = SuffixArray(str);
int ret += N - sa[0].p;
for (int i=1; i < N; i++)
      ret += (N-sa[i].p) + LCP(sa[i].p, sa[i-1].p); 
return ret

- Anonymous July 25, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

I think I have got a much simpler solution...
If the length is n, there are n one letter words,n-1 two letter successive words,(for eg:- if the word is abcde,the two letter words will be ab,bc,cd,de)...similarly n-2 three letter successive words...and so on..Therefore as we are asked the number of such words, should'nt it be just sum of numbers from 1 to n???? Am I missing something here???

- Sarthak July 20, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

why do have to post nonsensical comments everytime
dude atleast read the question properly

- Anonymous July 24, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

plz at least see the problem ex.
I guess you haven't cleared any int... you have this bad tendency of not listening

- Anonymous July 28, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

you are rude people.
Sarthak, your solution only works if all the letters are different

- Teodor September 15, 2010 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

Ben's father, Ralph, said the bear hit the shifter and the car rolled backward about 125 feet, off the driveway, down an embankment and into some trees on Eagle Road near Tenderfoot Drive. "So this bear opened the door on his own. Somehow the door closed behind him. He panicked and started thrashing around, hit the shifter and put the car, took it out of park," Ralph said. "It rolled back, down over the hill, and down into here, and stopped. The four way flashers were on. It's like he knew what was going on, and kept hitting the horn." Ben told 7NEWS he had left a sandwich in the car and that may have attracted the big bear.

- Anonymous July 25, 2010 | Flag Reply

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews