Goldman Sachs Interview Question
Software Engineer / Developer
Country: United States
Based on Prince's algorithm:
public static String findMostCommonString(String longString) {
    if (longString.isEmpty()) return "";
    String[] words = longString.split(" ");
    Map<String, Integer> countMap = new HashMap<>();
    for (String word : words) {
        if (countMap.containsKey(word)) {
            countMap.put(word, countMap.get(word) + 1);
        } else {
            countMap.put(word, 1);
        }
    }
    int max = 0;
    String maxWord = "";
    for (Map.Entry<String, Integer> entry : countMap.entrySet()) {
        int count = entry.getValue();
        if (count > max) {
            maxWord = entry.getKey();
            max = count;
        }
    }
    return maxWord;
}
Yes, you can also use a hash-table-based approach, as Prince described above.
1. Hashtable<String, Integer> ht = new Hashtable<String, Integer>();
2. Tokenize each word and check whether it is an existing key in the table; if it is, increment the Integer value.
3. Finally, scan each hash slot and find the entry whose Integer value is the maximum; that is the word whose frequency of occurrence in the sentence is the highest.
What about this code?
private static void countMostCommonlyUsedWord(String line) {
    Map<String, Integer> map = new LinkedHashMap<String, Integer>();
    StringTokenizer token = new StringTokenizer(line);
    int maxFrequency = 0;
    String mostCommon = "";
    while (token.hasMoreTokens()) {
        String strToken = token.nextToken();
        int count = map.containsKey(strToken) ? map.get(strToken) + 1 : 1;
        map.put(strToken, count);
        // Check the running maximum on every token, so a word seen only
        // once is still a candidate (e.g. when all words are unique).
        if (count > maxFrequency) {
            maxFrequency = count;
            mostCommon = strToken;
        }
    }
    System.out.println(map);
    System.out.println(mostCommon);
}
The ideal solution would describe how search engines work. What you need is:
1. A tokenizer
2. A stemming library
3. A list of stop words
4. A hash map
Method:
1. Tokenize the input text and filter the tokens using the stop-word list
2. Use the stemmer to find the stem of each word
3. If the stemmed word is already in the hash map, increment its frequency count; otherwise add it with frequency 1
4. Along with each word, also store its position in the text
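The pipeline above can be sketched as follows. The stop-word set and the suffix-stripping stem() method here are placeholders of my own; a real system would load a full stop-word list and use a proper stemming library.

```java
import java.util.*;

public class StemmedCounter {
    // Hypothetical stop-word set; a real system would load a larger list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "to", "of"));

    // Placeholder stemmer: crudely strips a couple of common suffixes.
    private static String stem(String word) {
        if (word.endsWith("ing") && word.length() > 5) return word.substring(0, word.length() - 3);
        if (word.endsWith("s") && word.length() > 3) return word.substring(0, word.length() - 1);
        return word;
    }

    // Returns stem -> positions in the text; the list size is the frequency.
    public static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> index = new HashMap<>();
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String token = tokens[pos].replaceAll("\\W", "");
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            index.computeIfAbsent(stem(token), k -> new ArrayList<>()).add(pos);
        }
        return index;
    }
}
```

Storing positions rather than a bare counter covers step 4: the frequency is just the size of the position list.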
This is a solution to the initial question in code:
public static String mostCommonWord(String longString) throws InvalidDataException {
    if (longString == null) {
        throw new InvalidDataException();
    }
    HashMap<String, Integer> hashMap = new HashMap<String, Integer>();
    String mostCommon = null;
    int mostTimes = 0;
    StringTokenizer wholeString = new StringTokenizer(longString);
    while (wholeString.hasMoreTokens()) {
        String subString = wholeString.nextToken();
        Integer count = hashMap.get(subString);
        count = (count == null) ? 1 : count + 1;
        hashMap.put(subString, count);
        // Keep the running maximum up to date; mostTimes must be
        // updated together with mostCommon, and even a count of 1
        // must be considered in case every word is unique.
        if (count > mostTimes) {
            mostTimes = count;
            mostCommon = subString;
        }
    }
    return mostCommon;
}
Here is the algorithm:
- Prince, August 12, 2014
1. Create a map of <word, frequency>, i.e. <String, Integer>.
2. For every word, look it up in the map; if the lookup returns null, insert <word, 1>, otherwise read the value and increment it by 1.
3. After reading the string, iterate over the map and keep the word with the highest frequency.
Time complexity is O(n), where n is the number of words in the string.
Space complexity is O(m), where m is the number of unique words in the string.
Yes, having a dictionary will help, because we can remove invalid words by checking them against it. This reduces both space and time complexity. We can further normalize the data by collapsing words to their stems.
Instead of loading the input as one string, I would stream the data to avoid a memory spike. Note also that our map grows as new words are added. Additionally, we could use a map-reduce style job where we build a map<word, frequency> for each play in parallel and later join/collapse the maps; this reduces the time it takes to get a result.
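A minimal sketch of the streaming idea and the reduce step, assuming the input arrives as a Reader (class and method names are my own):

```java
import java.io.*;
import java.util.*;

public class StreamingCounter {
    // Count words one line at a time instead of holding the whole
    // text in memory as a single String.
    public static Map<String, Integer> count(Reader source) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("\\s+")) {
                    if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // The "reduce" step: collapse per-partition maps (e.g. one per play)
    // that were produced in parallel into a single total.
    public static Map<String, Integer> mergeMaps(List<Map<String, Integer>> parts) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> part : parts) {
            part.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }
}
```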
Finding the most common phrase should be no different from the algorithm above; however, we need to rebuild our index as <phrase, frequency>.
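The phrase variant can be sketched by sliding an n-word window over the text and counting each window as a key (a hypothetical helper, shown here for any phrase length n):

```java
import java.util.*;

public class PhraseCounter {
    // Count every n-word phrase and return the most frequent one.
    // For n = 2 this counts bigrams; for n = 1 it reduces to the
    // single-word algorithm above.
    public static String mostCommonPhrase(String text, int n) {
        String[] words = text.split("\\s+");
        Map<String, Integer> counts = new HashMap<>();
        String best = "";
        int bestCount = 0;
        for (int i = 0; i + n <= words.length; i++) {
            // Join the current window into one map key.
            String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            int c = counts.merge(phrase, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = phrase;
            }
        }
        return best;
    }
}
```

For example, mostCommonPhrase("to be or not to be", 2) returns "to be".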