Google Interview Question for Software Engineer / Developers


Country: India
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
2
of 2 vote

Seems to be a problem of information retrieval and text mining.
Cosine Similarity seems to be the trick .... We need to construct feature vectors involving tf-idfs and then carry out thier similarity....

- Sagar June 02, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
2
of 2 vote

Score each file's contents on several parameters:
1) no. of words
2) no. of letters
3) no. of spaces
4) count of each of the letters
5) count of each words
etc

Give weightage to each of the properties and arrive at a number (score). The closest of the scores is the answer.

- gi June 02, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

It has to be done in iterative method.
Pick randomly some lines from "hello" file. For example pick the lines in the following pattern.
The first line from the "hello" file, then move the cursor to the line at the 100th MB (in hello file), and then 200th MB and so on. Compare it against the each and every file and which ever the file has more number of "found" entries focus on that one and go to the next file.

- Raj May 30, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

if it is exact match without writing any code will be using chksum utility in linux/unix

- Moses May 30, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 4 vote

If we have known the right answer for this then we are founder of google

- Anonymous May 31, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

there are 2 approaches to solve this problem:
Solution 1: This is an expensive approach though..for comparing each word of the source to each word of the target file.
1. Compare each word from "Hello" file with each word from each of the "target files".
2. Set a match index based on the comparison. There is an option to not count the duplicate words found during comparison.
3. The max value of the ten match indexes corresponds to the file having closest match.

Solution 2:
1. Let's consider source file Hello is a Dictionary of words.
2. Compare the target files with the Dictionary of words and get target match indexes.
3. Based on the closest number we can decide which file is the closet one.
eg: Source Match Index = 45
Target 1 Match Index = 10
Target 2 Match Index = 20
Target 3 Match Index = 30
Target 4 Match Index = 40
Target 5 Match Index=50.... etc

if (Source file size < Target File Size)

then Target 5 would be a closest file because most of the source file words could be present in target.
else target 4 would be the closet option.

- Anonymous June 01, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Below is the code for the problem. I took an example of three files instead of 10 files..

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ClosestFileMatching
{
public class ClosestFileMatch
{

public string[] targetfilename = new string[] {"C:\\Test\\one.txt", "C:\\Test\\two.txt", "C:\\Test\\three.txt"};
public int[] MatchIndexArray = new int[3];
public string sourcefilename;
public string sourceword;
int matchindex;
Dictionary<string, string> sourceDict = new Dictionary<string, string>();


/// <summary>
/// This method compares words between target files and the source dictionary and returns a matchindex Array.
/// </summary>
/// <param name="word"></param>
/// <returns></returns>
public int[] CompareWords()
{

string word;
char[] buffer = new char[256];

for (int i = 0; i < 3; i++)
{
foreach (string filename in targetfilename)
{
StreamReader tr = new StreamReader(filename);
word = tr.ReadLine();
string[] split_targetwords = word.Split(' ');
foreach (string targetstr in split_targetwords)
{
if (sourceDict.ContainsKey(targetstr))
{
matchindex++;
}

}

MatchIndexArray[i] = matchindex;
matchindex = 0;
}
}
return MatchIndexArray;
}


/// <summary>
/// Split words in the file and create a Dictionary..
/// </summary>

public void SplitWordsAddDict()
{
StreamReader sr = new StreamReader(sourcefilename);
int i = 0;
while (sr.Peek() >= 0)
{
sourceword = sr.ReadLine();
string [] split_words = sourceword.Split(' ');
foreach (string str in split_words)
{
sourceDict.Add(sourceword.Split(' ')[i], sourceword.Split(' ')[i]);
i++;
}
}

}

static void Main(string[] args)
{

ClosestFileMatch filematch = new ClosestFileMatch();
filematch.sourcefilename = "C:\\Test\\SourceFile.txt";
filematch.SplitWordsAddDict();
filematch.CompareWords();
Console.WriteLine(filematch.MatchIndexArray.Max());
for (int i = 0; i < filematch.MatchIndexArray.Length - 1; i++)
{

if (filematch.MatchIndexArray[i] == filematch.MatchIndexArray.Max())
{

Console.WriteLine(filematch.targetfilename[i]);
}
}


}
}
}

- Anonymous June 02, 2014 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Minhash or SimHash

- fang-l10 June 12, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

There are many File Similarity Algorithms for these kind of problems, like - TTTD (Two Threshold Two Divisor) Algorithm, Rabin's Fingerprinting etc. File sizes here are very large, so the dead is to generate the fingerprints from a file to represent that using some hash functions like - simHash or minHash.

- suri July 30, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Closest one is with maximum longest common subsequences,

- Madiyar92 August 01, 2014 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More