Google Interview Question for Software Engineers / Developers

At a high level: sort the file's data, then print the unique lines.
Sorting is the real problem here, since the file can be too big to fit in main memory; we may overflow memory if we load the complete file and try to sort it.
I believe we need an external k-way merge sort here, the way Unix sort works.
It looks like the Unix "sort -u" implementation would be the right solution here.

- Anurag Singh February 03, 2011
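A compact sketch of that external k-way merge with dedup (chunk size, function name, and file handling are illustrative, not a production implementation): sort fixed-size chunks in memory, spill each sorted run to a temporary file, then stream-merge the runs with heapq while dropping repeated lines.

```python
import heapq
import itertools
import tempfile

def external_sort_unique(lines, chunk_size=100_000):
    """Yield the distinct lines of an iterable in sorted order,
    never holding more than chunk_size lines in memory at once."""
    runs = []
    it = iter(lines)
    while True:
        # Sort one chunk in memory; islice consumes the shared iterator.
        chunk = sorted(itertools.islice(it, chunk_size))
        if not chunk:
            break
        run = tempfile.TemporaryFile("w+")      # sorted run spilled to disk
        run.writelines(line + "\n" for line in chunk)
        run.seek(0)
        runs.append(line.rstrip("\n") for line in run)  # lazy run reader
    prev = None
    # heapq.merge streams the k sorted runs; equal lines arrive adjacent,
    # so a one-line lookback removes duplicates (this is what -u adds).
    for line in heapq.merge(*runs):
        if line != prev:
            yield line
        prev = line
```

Note that, like `sort -u`, this emits lines in sorted order rather than in their original order in the file.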

Or you could use a hash table?

- February 04, 2011

Wouldn't a hash set be more efficient?

- Apoorva March 07, 2011

Use getLine() to read the lines from the file and insert them into a hash table. The table will then contain only the unique lines; print its contents.

- Ajay Singh February 04, 2011
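A short sketch of this hash-based approach (function name is mine): keep every line seen so far in a set, which assumes the distinct lines fit in memory; unlike sorting, it preserves first-occurrence order.

```python
def unique_lines(lines):
    """Return the lines in first-occurrence order with duplicates dropped."""
    seen = set()          # hash table of lines seen so far; O(1) average lookup
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out
```

With a real file you would iterate over `open(path)` instead of a list and print each new line as soon as it is seen.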

^v and Ajay Singh: yes, hashing looks like a much better option here (instead of sorting and then printing the unique lines).

- Anurag Singh February 04, 2011

@v and Ajay: can you explain how you would do that? What are the key and value pairs here, and what is the hash function?

- Sathya February 04, 2011

key = the line itself, value = 1
If the key is already present with value 1, it's a duplicate.

- GekkoGordan February 04, 2011

I'm sorry, but I still don't get it. Let's say you have n lines.
Now you are about to print the kth line and have already printed p lines. Don't you need to string-compare this kth line with all p lines you printed to know whether it's a duplicate? Or can you elaborate your method a little more?

- Sathya February 04, 2011

With a hash, it doesn't look as simple. One thing that can be done:
Choose some DUPLICATE identifier that marks a duplicate (this identifier must not itself be a line in the file), then:
1. Read each line of the file and store it in the hash if it is not already there. If it is already there, put the DUPLICATE identifier in the hash for that line.
2. Once the complete file has been read, scan the hash and print the values (lines) that are not the DUPLICATE identifier.

But this doesn't seem to be a good solution, for two reasons:
1. The DUPLICATE identifier may itself be a line in the file.
2. The size of the file: if it is too big, there is a memory issue. External hashing?

- Anurag Singh February 04, 2011
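The sentinel problem in the scheme above can be avoided by storing a count per line instead of a DUPLICATE marker. An in-memory sketch (names are mine) that prints only the lines occurring exactly once, which is what the marker scheme computes:

```python
from collections import Counter

def lines_seen_once(lines):
    """Return the lines that occur exactly once, in file order."""
    counts = Counter(lines)                    # line -> number of occurrences
    return [line for line in lines if counts[line] == 1]
```

Since no count can collide with a real line, no reserved identifier is needed.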

If the file is too big, do multiple scans of it. In the first scan consider only lines starting with 'a', in the second scan lines starting with 'b', and so on; or use a variation of the scanning scheme that makes the partitioning more uniform.

- ajay.mishra February 06, 2011
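A sketch of a more uniform variation of this multi-scan idea (names and pass count are mine): partition by hash of the line modulo the number of passes, rather than by first letter, so each pass holds only one partition's set in memory. Note the output comes out grouped by partition, not in original file order.

```python
def unique_by_passes(lines, num_passes=4):
    """Deduplicate in num_passes full scans, one partition per scan."""
    out = []
    for p in range(num_passes):
        seen = set()                          # only this partition's lines
        for line in lines:                    # one full scan per partition
            if hash(line) % num_passes == p and line not in seen:
                seen.add(line)
                out.append(line)
    return out
```

For a real file, each scan would re-read the file from disk; memory use shrinks roughly by a factor of num_passes at the cost of num_passes scans.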

If the lines are too big, one idea is to hash each line and store the hashes in memory, then in a second pass verify that lines with equal hashes are actually equal.

- Anonymous February 06, 2011
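A sketch of the first pass (names are mine): store a fixed-size digest per line, so memory cost is independent of line length, and collect the lines whose digest was seen before. A second pass would string-compare only these candidates to rule out hash collisions.

```python
import hashlib

def duplicate_candidates(lines):
    """Return lines whose digest repeated (possible duplicates to verify)."""
    seen = set()
    candidates = []
    for line in lines:
        d = hashlib.sha256(line.encode()).digest()   # 32 bytes per line
        if d in seen:
            candidates.append(line)
        else:
            seen.add(d)
    return candidates
```

With a 256-bit digest a collision between unequal lines is astronomically unlikely, but the verification pass makes the answer exact rather than probabilistic.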

Or maybe use a prefix tree? When a new line comes in, traverse the prefix tree to check whether the corresponding path already exists in the tree; if not, it's a new line, so add its path to the tree.

Compared to a hash table, it will save you some space when lines share prefixes.

- Jianzhao Huang February 13, 2011
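A minimal dict-of-dicts trie sketch of this prefix-tree idea (names are mine): insert returns True only the first time a line is added, so printing on True yields the unique lines.

```python
def trie_insert(trie, line):
    """Insert line into a dict-of-dicts trie; True if it was not there yet."""
    node = trie
    for ch in line:
        node = node.setdefault(ch, {})    # walk the path, extending as needed
    is_new = "$" not in node              # "$" marks end-of-line
    node["$"] = True
    return is_new
```

In Python the per-node dict overhead usually outweighs the prefix sharing, so the space saving mostly materializes in compact trie implementations in lower-level languages.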

Use hashing. If the file size is too big, use a Bloom filter.

- mohan February 17, 2014
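A tiny Bloom-filter sketch (sizes, names, and the salted-SHA-256 hashing are mine): k hashed bit positions per line in a fixed bit array, so memory stays constant regardless of file size. It never misses a duplicate (no false negatives), but it can occasionally flag a genuinely new line as already seen (a false positive), so a unique line may be wrongly dropped; the error rate is tuned by the bit-array size and hash count.

```python
import hashlib

class BloomFilter:
    """Fixed-size set sketch: may report false positives, never false negatives."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one cryptographic hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

To print unique lines: for each line, if it is not in the filter, print it and add it.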