Interview Question
Country: India
Interview Type: Written Test
Hi Ayahuasca ,
Is there any possibility to put the entire string in array and then we can write an algo.
@Anonymous,
Without using extra space, or by using a constant amount of extra space, I can think of a naive algorithm, that given a word, compares it with all other words to find out the one with maximum occurrence count. Time complexity for this would be O(n^2).
Alternatively, in order to be more space-efficient, we can use a trie to save the words and maintain a count variable at each node. This saves space in the sense, all the words that have a common prefix will have the prefix characters specified only once in the trie.
While building the trie, keep track of the word that has maximum occurrence count in the list of words seen so far.
Here is the code to solve this problem using array with O(n) time complexity:
a. First for all the words calculate the key value and make an array of the key value. Here key value is calculated by multiplying the integer value of character by 10. Also if the character is uppercase then it is converted to lower case as ARe and are are same words.
b. Now if the words are repeated then they will have the same key value. So now problem reduces to find the number in the array repeated maximum number of times.
c. This is done by taking the count array and initializing it to 0 and further incrementing the count value for every key value of the array taken as index.
d. Then take that key value and then using that key find the corresponding index in the string array. (here it is found by using the space count)
e. Then take that start index and then using that start index print the word till the space is not encountered.
#include <stdio.h>
#include <conio.h>
#include <string.h>
#include <math.h>
#include <limits.h>
#define MAX_SIZE 32768
void findRepeatedWord(char str[],int n)
{
int i,k=0;int num=0,numArray[MAX_SIZE];
for(i=0;str[i]!='\0';i++)
{
if(str[i]>=65&&str[i]<=90)
{
str[i]=str[i]+32;
}
if(str[i]!=' ')
{
num=num+10*str[i]-str[i+1];
}
else
{
numArray[k]=num;
k=k+1;
num=0;
}
}
numArray[k]=num;
k=k+1;
int count[MAX_SIZE];
int max_count=0;
for(i=0;i<MAX_SIZE;i++)
{
count[i]=0;
}
for(i=0;i<k;i++)
{
count[numArray[i]]++;
}
int max=count[0],max_index,idx;
for(i=0;i<MAX_SIZE;i++)
{
if(count[i]>max)
{
max=count[i];
max_index=i;
}
}
for(i=0;i<k;i++)
{
if(max_index==numArray[i])
{
idx=i;
break;
}
}
int start_index,count_space=0;
for(i=0;i<n;i++)
{
if(idx==count_space)
{
start_index=i;
break;
}
if(str[i]==' ')
{
count_space=count_space+1;
}
}
printf(" Maximum word repeated is ");
for(i=start_index;str[i]!=' ';i++)
printf("%c",str[i]);
printf(" with %d number of count ",max);
}
int main()
{
char str[]="Are are you are my boy are";
int n=strlen(str);
findRepeatedWord(str,n);
}
>> a. First for all the words calculate the key value and make an array of the key value. Here key value is calculated by multiplying the integer value of character by 10. Also if the character is uppercase then it is converted to lower case as ARe and are are same words.
Your method will not work if, along with "are", there are words like "ear" and "era" in the sentence, as they all compute to the same value. Even if you change the computation of key to something like multiplying each character by 10^x(or by using some radix(r), r^x) based on the position of the character in the word, the chances of collision are high.
For that matter, no matter what mapping function you use to map the strings to (a finite set of) numbers, there are going to collisions and to resolve them, you will have to go for a data-structure like a hashtable. A hashtable uses hash-functions(which do better distribution of numbers) and readily provides you a collision resolution mechanism.
Build a suffix tree in linear time with respect to length of string.
Check for words starting at the root beginning with a non alpha character. The word ends when you hit another alpha char. Check for special cases obviously. If you detect a word, count how many end chars there are further down the tree. The branch starting with a complete word and has the most end leaves is your winner.
In your example, the tree will have a branch that goes
<space> <d> <o> <end>
<space><y><o><u><space><d><o><end>
Use a hashtable with words as keys and occurrence word-count as values. Also make use of 2 running variables 1) max_repeated_word and 2)max_count.
- Murali Mohan July 22, 20130. Every time a word is read from the input, check if it is already present in the hash table. If not add to it with word-count =1.
1. If the word is already present, increment the word-count. Now check this word-count against max_count. If word-count > max_count, max_count := word-count and max_repeated_word := current word.
2. Repeat steps 1 and 2 until all the input is read.
3. print out max_repeated_word