Google Interview Question
Software Engineer / Developer
@v and Ajay... can you explain more how you would do that? What are the key and value pairs here, and what is the hash function?
I'm sorry, but I still don't get it. Let's say you have n lines.
Now you are about to print the kth line and have already printed p lines. Don't you need to string-compare this kth line with all p lines you printed to know whether it's a duplicate? Or can you elaborate your method a little more?
With a hash, it doesn't look that simple. One thing that can be done is:
Have some identifier that marks a duplicate (this identifier should not itself be a line in the file), then:
1. Read the lines of the file and store each line in the hash if it is not already there. If a line is already in the hash, put the DUPLICATE identifier in the hash for it.
2. Once the complete file has been read, scan the hash and print the values (lines) that are not marked with the DUPLICATE identifier.
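The two steps above can be sketched in Python as follows (an illustrative sketch, assuming the file fits in memory; using a fresh `object()` as the DUPLICATE marker means it can never compare equal to any string read from the file):

```python
def print_unique_lines(path):
    # DUPLICATE marker: a fresh object() can never equal a string,
    # so it cannot collide with a real line from the file.
    DUPLICATE = object()

    seen = {}    # line -> the line itself, or DUPLICATE if repeated
    order = []   # first-seen order of distinct lines
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line in seen:
                seen[line] = DUPLICATE   # mark repeats
            else:
                seen[line] = line
                order.append(line)

    # Step 2: scan the hash, print entries not marked DUPLICATE.
    for line in order:
        if seen[line] is not DUPLICATE:
            print(line)
```

Here the keys are the lines themselves and the values are either the line or the DUPLICATE marker, so lookup and insertion are O(1) expected per line.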
But this doesn't seem to be a good solution, for the following reasons:
1. The DUPLICATE identifier may itself be a line in the file.
2. Size of file: if it is too big, there is a memory issue. External hashing???
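One way the "external hashing" idea could work (a hedged sketch, not a full implementation; `num_buckets` and the bucket-file names are illustrative): partition the file into bucket files by a hash of each line, so that all copies of a given line land in the same bucket, then process each much smaller bucket in memory.

```python
import os
import tempfile
from collections import Counter

def print_unique_lines_external(path, num_buckets=16):
    # Phase 1: partition lines into bucket files by hash value.
    # All copies of a line hash to the same bucket, so duplicates
    # can be detected bucket-by-bucket.
    tmpdir = tempfile.mkdtemp()
    buckets = [open(os.path.join(tmpdir, f"bucket{i}"), "w")
               for i in range(num_buckets)]
    with open(path) as f:
        for line in f:
            buckets[hash(line) % num_buckets].write(line)
    for b in buckets:
        b.close()

    # Phase 2: each bucket is assumed small enough to count in memory.
    for i in range(num_buckets):
        bpath = os.path.join(tmpdir, f"bucket{i}")
        with open(bpath) as bf:
            counts = Counter(bf)
        for line, n in counts.items():
            if n == 1:
                print(line, end="")
        os.remove(bpath)
    os.rmdir(tmpdir)
```

Note that this prints unique lines in bucket order, not in the original file order; preserving order would take an extra pass.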
At a high level: sort the file data, then print the unique lines.
- Anurag Singh February 03, 2011
Sorting is the real problem here. Since the file can be too big, memory overflow may occur if we load the complete file into main memory and try to sort it.
I believe we need to use external R-way merge sort here, the way unix sort works.
It looks like the unix "sort -u" implementation would be the right solution here.
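The external merge-sort idea above can be sketched roughly like this (a simplified illustration, not unix sort itself; `max_lines_in_memory` is an assumed tuning knob, and every line is assumed to end with a newline): split the file into sorted runs that fit in memory, then R-way merge the runs with a heap, emitting each distinct line once, as "sort -u" does.

```python
import heapq
import os
import tempfile

def sort_unique(path, max_lines_in_memory=100_000):
    # Pass 1: split the file into sorted runs that each fit in memory.
    run_paths = []
    with open(path) as f:
        while True:
            chunk = [line for _, line in zip(range(max_lines_in_memory), f)]
            if not chunk:
                break
            chunk.sort()
            fd, rp = tempfile.mkstemp()
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(rp)

    # Pass 2: R-way merge of the sorted runs (heapq.merge uses a heap);
    # since the merged stream is sorted, duplicates are adjacent, so
    # printing a line only when it differs from the previous one
    # reproduces "sort -u" behavior.
    runs = [open(rp) for rp in run_paths]
    prev = None
    for line in heapq.merge(*runs):
        if line != prev:
            print(line, end="")
            prev = line
    for r in runs:
        r.close()
    for rp in run_paths:
        os.remove(rp)
```

Only one chunk plus one line per run ever needs to be in memory at a time, which is what makes this work for files larger than RAM.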