Google Interview Question for Software Engineers


Country: United States
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
0
of 0 vote

creates a set out of ngrams for document1 and cross checks ngrams from document2

public Set<String> ngramIntersection(String d1, String d2, int n){
		//create n gram and add it to a set
		Set<String> ns1 = new HashSet<>();
		String words[] = d1.split(" ");
		
		for(int i=0;i<words.length-n;i++){
			StringBuilder sb = new StringBuilder();
			for(int j=0;j<n;j++){
				sb.append(words[i+j]+" ");
			}
			ns1.add(sb.toString());
		}
		words = d2.split(" ");
		Set<String> result = new HashSet<>();
		for(int i=0;i<words.length-n;i++){
			StringBuilder sb = new StringBuilder();
			for(int j=0;j<n;j++)
				sb.append(words[i+j] + " ");
			if(ns1.contains(sb.toString()))
				result.add(sb.toString());
		}
		return result;
		
	}

- Phoenix June 20, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Looks like order of the words matters in the question. Otherwise, Ngrams for doc1 would have been = 'This is a', 'is a dog', 'This is dog', 'This a dog'.
With this assumption the algo is incorrect.

I would propose a kind of modified version of KMP-string comparison, replacing characters with words. We scan a doc word by word and fill hash(doc_word[i]...doc_word[i+n]) so that the order matters (similar to KMP). Do the same for the next doc and check if the hash already contains an element with a particular key.

- Alex M. June 01, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

public static void compare(String a,String b){
StringBuilder n = new StringBuilder();
int i=0,j=0;
while(a.charAt(i)==b.charAt(i)){
System.out.print(a.charAt(i));
if(a.charAt(i)==' '){
j++;
}
i++;
}
System.out.println(j);
}

- Gayu June 21, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

public Set<String> getIntersection(Set<String> a, Set<String> b) {
		Set<String> intersection = new HashSet<String>();
		for (String ngram : a) {
			if (b.contains(ngram)) {
				intersection.add(ngram);
			}
		}

		return intersection;
	}

	public Set<String> getNGrams(String document, int n) {
		String[] words = document.split(" ");
		Set<String> ngrams = new HashSet<String>();
		for (int i = 0; i <= words.length - n; i++) {
			String ngram = "";
			for (int j = i; j < i + n; j++) {
				ngram += words[j];
				ngram += " ";
			}
			ngrams.add(ngram.trim());
		}

		return ngrams;
	}

- mo July 08, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

In N-Gram problems, n is often very small compare to M and N, where M and N are size of the documents in words.

Ignoring the case that words are separated with multiple space, I've written a simple code that implements a O((M+N)n) time complexity

public class Interview {
	private void process(String file, HashMap<String, Boolean> cmap, int n){
		String[] words = file.split(“ “);
		if (words.length < n)
			return;
		for (int i = 0; i<words.length-n; i++){
			StringBuilder sb = new StringBuilder();
			for (int j = i; j<n+i; j++)
				sb.append(words[j] + “ “);
			String key = sb.toString();
			if (cmap.containsKey(key))
				cmap.put(key, Boolean.True);
			else
				cmap.put(key, Boolean.False);			
		}
	}

	public List<String> intersectionNGram(String file1, String file2, int n){
		List<String> result = new ArrayList();
		HashMap<String, Boolean> cmap = new HashMap();
		process(file1, cmap, n);
		if (cmap.size()==0)
			return result;

		process(file2, cmap, n);
		for (String key : cmap.keySet()){
			if (cmap.get(key))
				result.add(key);
		}
		return result;
	}
}

- hieu.pham June 20, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Good one. I think you can replace HashMap with HashSet

- Phoenix June 20, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Nice answer but I think there might be a problem.

If I understand your code correctly, you would count a repeated Ngram in the first document as an intersection.

- Dyjor June 20, 2015 | Flag
Comment hidden because of low score. Click to expand.
1
of 1 vote

That's a nice solution,

However when I look at your code I think that there is a small mistake.
Indeed if I understand correctly, you would count a duplicated Ngram in the first document as an intersection.

- Dyjor June 20, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

That's a nice solution,

However when I look at your code I think that there is a small mistake.
Indeed if I understand correctly, you would count a duplicated Ngram in the first document as an intersection.

- Dyjor June 20, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

That's a nice solution,

However when I look at your code I think that there is a small mistake.
Indeed if I understand correctly, you would count a duplicated Ngram in the first document as an intersection.

- Dyjor June 20, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

That's a nice solution,

However when I look at your code I think that there is a small mistake.
Indeed if I understand correctly, you would count a duplicated Ngram in the first document as an intersection.

- jordy.domingos June 20, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

set<string> process(string& file, int n)
{
	ostringstream oss;
	vector<string> words;
	set<string> ngrams;
	split(file, ' ', words);
	for (size_t i = 0; i <= words.size() - n; i++)
	{
		oss << words[i] << ' ' << words[i+1] << ' ' << words[i+2];
		ngrams.emplace(oss.str());
		oss.str("");
	}
	return ngrams;
}

set<string> intersectionNgram(string& file1, string& file2, int n)
{
	set<string> result;
	set<string> ngrams1 = process(file1, n);
	if (!ngrams1.size())
		return result;
	set<string> ngrams2 = process(file2, n);
	set_intersection(ngrams1.begin(), ngrams1.end(), ngrams2.begin(), ngrams2.end(), std::inserter(result,result.begin()));
	return result;
}

- Teh Kok How June 28, 2015 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

How about "cat is cat" & "dog is dog" example? That's not gonna work..

- Alex M. June 01, 2016 | Flag
Comment hidden because of low score. Click to expand.


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More