Google Interview Question
Software Engineer / Developer
@v and Ajay... can you explain more how you would do that? What are the key and value pairs here, and what is the hash function?
I'm sorry, but I still don't get it. Let's say you have n lines.
Now you are about to print the kth line and have already printed p lines. Don't you need to string-compare this kth line with all p lines you printed to know whether it's a duplicate? Or can you elaborate your method a little more?
With a hash, it doesn't look that simple. One thing that can be done is:
Have some identifier that marks a duplicate (this identifier should not itself be a line in the file), then:
1. Read the lines of the file and store each line in the hash if it is not already there. If a line is already in the hash, put the DUPLICATE identifier in the hash for it.
2. Once the complete file has been read, scan the hash and print the values (lines) that are not marked with the DUPLICATE identifier.
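The two steps above can be sketched in Python as follows (an illustrative sketch, assuming the file fits in memory; using a fresh `object()` as the DUPLICATE marker means it can never compare equal to any string read from the file):

```python
def print_unique_lines(path):
    # DUPLICATE marker: a fresh object() can never equal a string,
    # so it cannot collide with a real line from the file.
    DUPLICATE = object()

    seen = {}    # line -> the line itself, or DUPLICATE if repeated
    order = []   # first-seen order of distinct lines
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if line in seen:
                seen[line] = DUPLICATE   # mark repeats
            else:
                seen[line] = line
                order.append(line)

    # Step 2: scan the hash, print entries not marked DUPLICATE.
    for line in order:
        if seen[line] is not DUPLICATE:
            print(line)
```

Here the keys are the lines themselves and the values are either the line or the DUPLICATE marker, so lookup and insertion are O(1) expected per line.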
But this doesn't seem to be a good solution, for the following reasons:
1. The DUPLICATE identifier may itself be a line in the file.
2. Size of file: if it is too big, there is a memory issue. External hashing???
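One way the "external hashing" idea could work (a hedged sketch, not a full implementation; `num_buckets` and the bucket-file names are illustrative): partition the file into bucket files by a hash of each line, so that all copies of a given line land in the same bucket, then process each much smaller bucket in memory.

```python
import os
import tempfile
from collections import Counter

def print_unique_lines_external(path, num_buckets=16):
    # Phase 1: partition lines into bucket files by hash value.
    # All copies of a line hash to the same bucket, so duplicates
    # can be detected bucket-by-bucket.
    tmpdir = tempfile.mkdtemp()
    buckets = [open(os.path.join(tmpdir, f"bucket{i}"), "w")
               for i in range(num_buckets)]
    with open(path) as f:
        for line in f:
            buckets[hash(line) % num_buckets].write(line)
    for b in buckets:
        b.close()

    # Phase 2: each bucket is assumed small enough to count in memory.
    for i in range(num_buckets):
        bpath = os.path.join(tmpdir, f"bucket{i}")
        with open(bpath) as bf:
            counts = Counter(bf)
        for line, n in counts.items():
            if n == 1:
                print(line, end="")
        os.remove(bpath)
    os.rmdir(tmpdir)
```

Note that this prints unique lines in bucket order, not in the original file order; preserving order would take an extra pass.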
At a high level: sort the file data, then print the unique lines.
- Anurag Singh February 03, 2011
Sorting is the real problem here. Since the file can be too big, memory overflow may occur if we load the complete file into main memory and try to sort it.
I believe we need to use external R-way merge sort here, the way unix sort works.
It looks like the unix "sort -u" implementation would be the right solution here.
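The external merge-sort idea above can be sketched roughly like this (a simplified illustration, not unix sort itself; `max_lines_in_memory` is an assumed tuning knob, and every line is assumed to end with a newline): split the file into sorted runs that fit in memory, then R-way merge the runs with a heap, emitting each distinct line once, as "sort -u" does.

```python
import heapq
import os
import tempfile

def sort_unique(path, max_lines_in_memory=100_000):
    # Pass 1: split the file into sorted runs that each fit in memory.
    run_paths = []
    with open(path) as f:
        while True:
            chunk = [line for _, line in zip(range(max_lines_in_memory), f)]
            if not chunk:
                break
            chunk.sort()
            fd, rp = tempfile.mkstemp()
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(rp)

    # Pass 2: R-way merge of the sorted runs (heapq.merge uses a heap);
    # since the merged stream is sorted, duplicates are adjacent, so
    # printing a line only when it differs from the previous one
    # reproduces "sort -u" behavior.
    runs = [open(rp) for rp in run_paths]
    prev = None
    for line in heapq.merge(*runs):
        if line != prev:
            print(line, end="")
            prev = line
    for r in runs:
        r.close()
    for rp in run_paths:
        os.remove(rp)
```

Only one chunk plus one line per run ever needs to be in memory at a time, which is what makes this work for files larger than RAM.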