Microsoft Interview Question
Software Engineer in TestsGroup all words of equal size and add the elements and determine the total. Words with equal total will have the same distribution of words. ASCII value of a=97, b=98, c=99, d=100, e=101
abc => a + b + c => 97 + 98 + 99 => 294
bbe => b + b + e => 98 + 98 + 101 => 298
bca => b + c + a => 98 + 99 + 97 => 294
eeb => e + e + b => 101 + 101 + 99 = 301
Since can store the above in a table and all those with the same total will have the same words.
1) Associate a Prime number with each letter.
2) Hash by product of the word letter.
3) Maintain a list of words for each key
4) Iterate and print all lists having more than one word
For each word, compute an array letter[26]={}.
For example abc 's array is { 1, 1, 1, 0, ..... 0}.
Then view each array as a binary number and compute the corresponding decimal number.
Sort all the decimal result and print out the words have the same decimal numbers.
Ze code:
#include <map>
#include <string>
#include <vector>
#include <iostream>
using namespace std;
void Tokenize(const string& str,
vector<string>& tokens,
const string& delimiters = " ")
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
void printSame(string sent) {
map<string, vector<string> > anagrams;
string hash_key;
vector<string> words;
Tokenize(sent, words, " ");
for (int i = 0; i < words.size(); i++) {
hash_key = words[i];
sort(hash_key.begin(), hash_key.end());
anagrams[hash_key].push_back(words[i]);
}
map<string, vector<string> >::iterator itr;
vector<string> curAnas;
for (itr = anagrams.begin(); itr != anagrams.end(); itr++) {
curAnas = itr->second;
if (curAnas.size() > 1) {
for (int i = 0; i < curAnas.size(); i++) {
cout << curAnas[i] << ' ';
}
}
}
}
int main(int argc, char** argv) {
string sentence(argv[1]);
printSame(sentence);
return 0;
}
A better solution
for each letter assign a digit > 0 e.g. a=1, b=2 ...
now compute the sum,multiplication and length triplet for each word
e.g. abc => (6, 6, 3)
cba => (6, 6, 3)
aad => (6, 8, 3)
use triplet as a key in Hashmap and let list of string be the value stored against it.
Traverse the Hashmap and print all the values.
Algorithm:
struct word
{
int len; // length of word;
int alpha[26]; // count a to z numbers
int iStart; // start char index in string
}
1. go through the whole string; find every word and fill the data of every word.
O(n) where n is the length of the string.
Space: sizeOf(word) x m, where m is the number of word.
2. Sort on m word by word.len, O(m);
Ignore those lens that do duplicates. Compare their alpha[i], i=0 to 25 for those have the same length. If same, print out.
This is the tough part.
void sortstr(char *str)
{
int i=0,j=0;
char ch;
int len=strlen(str);
for(i=0;i<len;i++)
{
for(j=0;j<len-1-i;j++)
if(str[j]<str[j+1])
{
ch=str[j];
str[j]=str[j+1];
str[j+1]=ch;
}
}
}
void anagram(char *str)
{
char *tmp;
char arr[100][100]={'\0'};
int i=0,j=0,k=0,len=0,spaces=0,flag=0;
while(str[i]!='\0')
{
if(str[i++]==' ')
spaces++;
}
len=strlen(str);
if(len>0)
{
tmp=(char *)malloc(len);
memset(tmp,'\0',len);
for(i=0,k=0;str[i];i++)
{
j=0;
while(str[i]!=' '&&str[i]!='\0')
{
tmp[j++]=str[i++];
}
sortstr(tmp);
strcpy(arr[k],tmp);
k++;
memset(tmp,'\0',len);
}
for(i=0;i<spaces+1;i++)
{
for(j=i+1;j<spaces+1;j++)
{
if(!strcmp(arr[i],arr[j]))
flag=1;
}
if(flag)
{
flag=0;
printf("%s\n",arr[i]);
}
}
}
}
int main()
{
char str[] = "abc hello bca elloh dell till hir";
anagram(str);
getch();
return 0;
}
O(n) algo:
=========use std::set<string>=====
1. traverse the sentence. For each word, reform the word as follows
: "aab"->"a2b1". "ccaa"->"a2c2".
For each char c in the every word, Use int alphabet[26]: alphabe[c-'a']++. Then convert alphabet[26] to string like "a2c2". Put the modified string into the set.
O(n) traversal in total.
==========
2. for the next word, if set.find() is true, then it is an anagram.
The set is usually implemented by binary tree, so O(logN) for find. You can use hashtable as well, it would be O(1).
How about this?
1. go through the sentence, for each word, sort it to the standard form. That is, abc->abc, bca->abc, eeb->bee, etc. But always keep a pointer to its original form. For each word, we need O(m) by using counting sort. Totally we need O(mn), n is # of words.
2. We let a->1, b->2, etc., then we convert each word to a decimal number. For example, abc->123. aad->114. It takes O(mn) in total.
3. Sort the array. For repeated numbers, print the original form. O(nlogn).
Can it be done in more simpler way: O(n * 2)
Read all words in array 'inputWords'
for i = 0 to inputWords.length {
for j = 0 to inputWords.length {
if(i == j || inputWords[i].length != inputWords[j].length)
continue;
String lhs = inputWordsp[i];
String rhs = inputWordsp[j];
// Fetch each character of lhs, search the first occurence
// and replace it with blank
// When all characters of lhs are exhausted, if rhs is blank
// then lhs and rhs are anagrams.
}
}
There are some improvements that can be made.. like conversion between stringstream to string once etc..
string finddups(string in){
string out="";
map<string,string> sign,matches;
if(in.size()==0)
return out;
stringstream ss(in);
string word;
while(ss >> word){
string sorted(word);
sort(sorted.begin(),sorted.end());
if(sign.find(sorted) != sign.end())
matches.insert(pair<string,string>(sorted,word));
else
sign.insert(pair<string,string>(sorted,word));
}
if(matches.size()==0)
return out;
word="";
stringstream ss1(in);
while(ss1 >> word){
string sorted(word);
sort(sorted.begin(),sorted.end());
if(matches.find(sorted) != matches.end())
out +=word +" ";
}
return out;
}
The idea is to use two hashes(kind of expensive but in interview this is straight forward to code) one- 'sign' to hold the sortedorder -> word. Another checks with the sortedorder hash and if present inserts into 'match' hash. Now again iterate over the string and if the sorted order of that string is present in match spit it out.
Idea is based on non-repeated chars from PIE
// Solved this way. Is there a better approach (without STL and other libraries)?
#define INPUT_STR "abc sjhd oahuf iad p pofdf ohdc pod oijsdf cba oidsf dai oif p iodf d oijsdf upls osofs psdf pjscp sfdo psj cpojd sdfop sjiof pncdis adi siojf k psfj o oisjdf e osdjf p ojsdf lkioodf"
template <class T>
void qsort(T *ap_arr, int a_start, int a_end) // Quick sorting algorithm
{
if (a_start < a_end)
{
int i, j, m;
T c;
for (j = a_end, i = a_start, m = 1; i < j; m > 0 ? j-- : i++)
{
if (ap_arr[i] > ap_arr[j])
{
c = ap_arr[i];
ap_arr[i] = ap_arr[j];
ap_arr[j] = c;
m = -m;
}
}
qsort(ap_arr, a_start, i - 1);
qsort(ap_arr, i + 1, a_end);
}
}
struct WordDesc // A struct to remember the found words
{
WordDesc *mp_nextDesc;
unsigned int m_checkSum; // A chack sum of a word
char *mp_start; // Points to the beginning of the word.
unsigned int m_len; // The lenght of the word (stored for the sake of speed).
};
int _tmain(int argc, _TCHAR* argv[])
{
char inputStr[] = INPUT_STR;
char *strBeg = inputStr;
char *strIdx = strBeg;
WordDesc *p_listHead = NULL, *p_listTail = NULL; // List pointers
while (' ' == *strIdx) strIdx++; // Skip spaces in the beginning
while (*strIdx != 0)
{
strBeg = strIdx;
unsigned int len = 0;
while ((' ' != *strIdx) && (0 != *strIdx)) // Get the length of the first word
{
strIdx++;
len++;
}
char *p_tmpStr = new char[len]; // Since we can not change words, we have to create a copy for sorting
if (p_tmpStr)
{
for (int i = 0; i < len; i++) p_tmpStr[i] = strBeg[i];
qsort(p_tmpStr, 0, len - 1); // Sort it
unsigned int checkSum = 0;
for (int i = 0; i < len; i++)
{
checkSum += p_tmpStr[i] * (i + 1); // "Unique" check sum
}
checkSum += len;
delete [] p_tmpStr;
WordDesc *p_newItm = new WordDesc; // Create a struct
if (p_newItm)
{
if (NULL == p_listTail)
{
p_listHead = p_newItm; // Init the head
}
else
{
p_listTail->mp_nextDesc = p_newItm; // Add to the list
}
p_listTail = p_newItm; // Init the struct
p_newItm->mp_nextDesc = NULL;
p_newItm->m_len = len;
p_newItm->m_checkSum = checkSum;
p_newItm->mp_start = strBeg;
}
else
{
break; // The result is likely to be wrong due to no memory for proper analysis
}
}
else
{
break; // The result is likely to be wrong due to no memory for proper analysis
}
while (' ' == *strIdx) strIdx++;
}
// Now we have to analyse the result and while we traverse it, we also clean it
while (p_listHead)
{
unsigned int doPrint = 0;
p_listTail = p_listHead;
while (p_listTail) // Is there a struct with the same check sum as the first one in the list?
{
p_listTail = p_listTail->mp_nextDesc;
if (p_listTail)
{
if (p_listHead->m_checkSum == p_listTail->m_checkSum)
{
doPrint = p_listHead->m_checkSum; // Yes, we found one.
break;
}
}
}
p_listTail = p_listHead;
if (doPrint) // There are structs with equal check summs, so we print those and delete
{
WordDesc *p_listPrev = NULL;
while (p_listTail)
{
if (p_listTail->m_checkSum == doPrint)
{
for (int i = 0; i < p_listTail->m_len; i++) printf("%c", p_listTail->mp_start[i]);
printf("\n");
if (p_listPrev)
{
p_listPrev->mp_nextDesc = p_listTail->mp_nextDesc;
delete p_listTail;
p_listTail = p_listPrev->mp_nextDesc;
}
else
{
p_listHead = p_listTail->mp_nextDesc;
delete p_listTail;
p_listTail = p_listHead;
}
}
else
{
p_listPrev = p_listTail;
p_listTail = p_listTail->mp_nextDesc;
}
}
}
else
{
p_listHead = p_listHead->mp_nextDesc; // The chack summ in the first structure of the list is unique, so we delete it.
delete p_listTail;
p_listTail = p_listHead;
}
}
return 0;
}
Basically, this question is to find all anagrams in a given sentence.
- acoader@gmail.com November 19, 2008I'll present only the idea here:
(1) for each word generate a key- for eg., for eeb, the key is "bee", i.e., sorted. The sorting takes o(n) time, where n is the length of the word. This can be achieved by using a count array of size 26 (assuming only a-z), where each slot gives how many times an alphabet occurs.
(2) After computing the key, add it to a hash map, where the key is the above key, and value is the list of words (or a pointer to it). Add the word to the value list for this key.
(3) After repeating (2) for each word in the sentence, iterate the hash map, and for each key such that the value list has more than 1 element, print the list.
SPACE: O(len) where len is the number of characters. Basically this space is needed for the hash map. But typically hash maps from library preallocate space, so the actual space used depends on the library characteristics.
TIME: O(len). this is because keys are generated on the fly while traversing the sentency from left to right. The count array is reset at the start of the next word, which is after a space.