Given a string and a number k,

Interview Question for Senior Software Development Engineers

0

of 0 votes

9
Answers
Given a string and a number k, break the string up into tokens and return the k most frequently occurring tokens. The algorithm should take O(n) time, where n is the length of the string.
- rangarajan.chari July 16, 2014 in United States | Report Duplicate | Flag | PURGE
Senior Software Development Engineer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: Written Test

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

If by tokens you mean words, keep a hash table to maintain count. If you mean characters, keep an array of size 256 to maintain count.

- alex July 16, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

To find k most frequent, use selection algorithm.

- alex July 16, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

1. Hash table: Because of the possibility of collisions, each insertion could take O(m) time, where m is the number of tokens or substrings, which is O(n). So complexity of inserting all tokens is worst-case O(m^2).
2. As for selection, selecting the kth element without sorting is O(n) in the worst case, but you have to do that O(m) times, m being the number of words/tokens.

I decided that an approach using suffix trees is the best for this problem. Suffix trees can be constructed in O(n) time, n being the size of the text string. One doesn't have to tokenize it beforehand - it is done for free when inserting the whole text into the tree. At the end node of each word, store pointers to the ends of the word in the text and a count of the current number of occurrences. At the end collect all these <word, # of occurrences> pairs, and do a radix sort. We thus end up with an O(n) algorithm.

Any comments are welcome.

- rangarajan.chari July 16, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I dont think Alex solution is so bad, he is asking to maintain count not insert in hash table, which should be constant theoretically, practically unless I insert same set of keys, collision will hardly occur so this works as O(n). Coming to selection you can use max heap of size k, to get max k elements in a constant time. nlog(k) is complexity, in a worst case of k=n(or approaching n), heap insertion becomes constant.

By the way, what is the interviewer response to your solution

- R July 21, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Use a hash method and count each duplicate tokens upon hashing. For each hashed item count the duplicate, and the item which has maximum number of hash slot encountered should be the frequently repeated item, producing complexity of O(n).

- Bhupendra July 16, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Code is as below-

import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

/**
 * @author kedar dixit
 *
 */
public class WordCount {

	public static void main(String[] args) {
		Scanner scanner = new Scanner(System.in);
		System.err.println("Enter A String");
		String string = scanner.nextLine();
		Map<String, Integer> map = new HashMap<>();
		String[] tokens = string.split(" ");
		int maxVal = 0;
		for (String token : tokens) {
			map.put(token, (map.get(token) == null ? 0 : map.get(token)) + 1);
			if (maxVal < map.get(token)) {
				maxVal = map.get(token);
			}
		}
		System.err.println(map);
		System.out.println(maxVal);
	}
}

Sample Input and Output-

Enter A String
Enter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A String
{A=28, String=1, Enter=1, StringEnter=27}
28

- kedarsdixit July 16, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

That's not what the question is asking. It needs to print out the top K most frequent tokens where K can be 1, 2, 3, 4... Your solution just prints out the maximum occurrrence of a token.

- wasabi July 16, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

need clarification:
input: 121221312
output: 1 , 2 , 12 with tokenization 1 2 12 2 1 3 12 ??

input: 1212213
output: 1 , 2 , 3 with tokenization 1 2 1 2 2 1 3 ??

- just_do_it July 18, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

The usual delimiters for tokens apply, so the string "121221312" is simply one token, whereas "1 2 1 2 2 1 3 12" would be 8 tokens, and the output would be "1 2 12 3".

I forgot to say that when two strings are equally frequent, they are reported in lexicographic order.

- rangarajan.chari July 19, 2014 | Flag

CareerCup

Interview Question for Senior Software Development Engineers

Books

Videos

Resume Review

Mock Interviews