## Google Interview Question

Software Engineer / Developers**Country:**United States

**Interview Type:**Phone Interview

using trie is a good option, but it takes O(n * m) time where m is the length of the string.

If you use a hash, won't you still have an 'm' factor for computing the hash function?

Actually, I can't imagine any algorithm without a dependency on 'm'...

is it array of Strings??

we can solve it in many ways..can you give the complexity of the soln?

Copy each element of the array into std::set. Size of set = number of unique elements!

complexity = O(n)

Mmm... this would lead to the question: in terms of what data structure does the STL implement sets?

A std::set is implemented as a red-black tree, so this "algorithm" would be O(n log n).

@Anonymous std::set is not implemented as a red-black tree, rather as a "binary search tree" and insertion in set would take logarithmic time and we find the size of set in constant time.

TRIE IS good option. but only probelm it wastes so much of space.

step1: scan all strings and construct trie.

step2 : do Trie traversal, find unique string counts.

TC O(n)

SC O(n)

Hash option is good - with the caveat that a good hashing function is used with very low probability of collision. Also note that the strings themselves don't need to be inserted into the hash table, we only need to generate the keys.

can this problem be solved using operating on the given array?

for example:

/*Let 's' be the size of the array of strings and let 'a' be the array of strings*/

int counter(char a[][])

{

for(int i=0;i<s;i++)

{

if(!strcmp(s[i],s[i+1])

{

count++;/*This is the variable which is initialised to zero and counts no. of matching string*/

}

}

if(count>0)

{

count++;

}

return (s-count);

}

I'm thinking to use hashMap in java, each char element in the string as a key. Only put the item into the table when it's the first time to find the key. Then the size of the items will be the number of distinct element. Please correct me if anything is wrong. thanks!

```
public int distinct_element(String str){
HashMap<Character, Boolean> map=new HashMap<Character, Boolean>();
for(int i=0;i<str.length();i++){
char value=str.charAt(i);
if(map.containsKey(value)==false){
map.put(value, true);
}
}
return map.size();
}
```

So,

I could read the 2 possible solutions:

1) By Hash Table: Which is straight Forward. We just need to come up with a good Hash Function

2) Build TRIE.

Is there any other good solutions?

Thanks

Use the first step of KMP.

For example, given a string "abab", the number of substrings is 4 * 5 / 2 = 10

Compute the next array, we got A= [-1 , -1 , 0 , 1]. This array implies the redundancies of the string. For example, A[2] = 0, we know that s[2] = s[0], there are one redundant substring. A[3] = 1, we know that s[2, 3] = s[0,1] and s[3] = s[1], there are two redundant substrings.

So the result is: n(n + 1) / 2 - SUM(A[i] + 1) i from 0 to n

Complexity: O(n)

Sorry. The algorithm is wrong.

We should use suffix array and lcp array to solve this problem. The complexity is O(n)

Give questions "cabab", we got the suffix array as follows

ab 0 2

aba 2 3

abab 3 4

b 0 5

bab 1 7

cabab 0 12

There are 12 distinct substrings in total

Time complexity is still O(n)

Use a suffix array and calculate the longest common prefix for the adjacent strings in the array.

Total no. of substrings = n*(n+1)/2

Now find the sum of all LCP for the suffix array.

No. of distinct substrings = Total no. of substrings - sum of all LCP

```
#include<cstdio>
#include<cstring>
#include<algorithm>
#include<cstdlib>
using namespace std;
int pstrcmp(const void* p, const void* q)
{
return strcmp(*(const char**)p, *(const char**)q);
}
int lcp(char *p, char *q)
{
int len = 0;
while(*p!='\0' && *p == *q){
len++;
p++;
q++;
}
return len;
}
int main()
{
char s[100];
printf("Enter string:\n");
scanf("%s",&s);
int len;
len = strlen(s);
char *a[len];
//Assign the suffix
for(int i=0;i<len;i++)
a[i] = &s[i];
//Sort the suffix array
qsort(a,len,sizeof(char *),pstrcmp);
int sum = 0;
for(int i = 0;i<len-1;i++){
int l;
l = lcp(a[i],a[i+1]);
sum = sum + l;
}
int res;
res = (len * (len+1))/2;
res = res - sum;
printf("%d\n",res);
return 0;
}
```

Could anyone tell me if my solution is O(n)?

```
std::vector<std::string> strings;
strings.push_back("hola");
strings.push_back("adeu");
strings.push_back("heli");
strings.push_back("caca");
strings.push_back("caca");
strings.push_back("pedo");
strings.push_back("heli");
strings.push_back("adeu");
strings.push_back("hola");
std::unordered_set<std::string> set;
for (unsigned int k = 0; k < strings.size(); ++k)
{
std::string string = strings.at(k);
set.insert(string);
}
int count = set.size();
```

Yes, you are right. But, why in the world would you answer my question with a simple "Not O(n)" instead of giving me the run time of my code?

Below is my Java code for this problem:

I used a HashMap. I would ask the interview does Capitalization matter? (Ex. Is Cat the same as CaT?)

If Capitalization matters, use the code below. If not, add

eachString = eachString.toLowerCase(); to the for loop.

```
public static int distinctString ( String [] inputArray){
Map <String, Integer > hashMapOfString = new HashMap <String, Integer > ();
int distinctCount =0;
for ( String eachString : inputArray){
if ( ! hashMapOfString.containsKey(eachString) ){
hashMapOfString.put(eachString,1);
distinctCount++;
}
}
return distinctCount;
}
```

1) In java we can create a Hash<String, int> where int will be the count(though we do not need that) Later we can find out the length of the hash. It will work for java.

- Sourav April 21, 20122) In case of C we can use trie to implement this.

Please let me know if there is any better solution than the above.