Interview Question


Country: United States
Interview Type: Written Test





When people say the file is "large", they usually mean that you can't load the whole thing into memory. If it does fit, all you need is a hashmap for each day's file; then compare the two maps carefully.
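A minimal sketch of the in-memory case. The class and method names are mine, and it assumes the email address is a unique key mapping to the "First Last" name:

```java
import java.util.*;

public class InMemoryDiff {
    // Diff two snapshots keyed by email (assumed unique); values are "First Last".
    // Returns a report with the keys present only today ("added") and only
    // yesterday ("removed").
    static Map<String, List<String>> diff(Map<String, String> yesterday,
                                          Map<String, String> today) {
        List<String> added = new ArrayList<>();
        List<String> removed = new ArrayList<>();
        for (String email : today.keySet())
            if (!yesterday.containsKey(email)) added.add(email);
        for (String email : yesterday.keySet())
            if (!today.containsKey(email)) removed.add(email);
        Map<String, List<String>> report = new HashMap<>();
        report.put("added", added);
        report.put("removed", removed);
        return report;
    }

    public static void main(String[] args) {
        Map<String, String> y = new HashMap<>(), t = new HashMap<>();
        y.put("a@x.com", "Ann Ames");
        y.put("b@x.com", "Bob Boon");
        t.put("b@x.com", "Bob Boon");
        t.put("c@x.com", "Cal Cole");
        Map<String, List<String>> d = diff(y, t);
        System.out.println("added:   " + d.get("added"));   // [c@x.com]
        System.out.println("removed: " + d.get("removed")); // [a@x.com]
    }
}
```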

So what if you can't fit it all into memory? My solution would be a bucket sort (in a loose sense).

You would probably use up some hard-disk space to accomplish this. Put all records whose last name starts with A into one file, all of those starting with B into another, and so on. Then you have smaller files to deal with, and you only need to compare the A files with each other, the B files with each other, etc.

Now, if the files are still not small enough, you can break them down further using the second letter. Keep splitting until both files of a pair fit into memory, with enough headroom left to do other work too.

Now, for each pair of files (yesterday's A and today's A, etc.), sort them using a normal O(n log n) method and compare line by line, logging differences to your desired log file. (Note that the log itself may not fit into memory, so stream it to disk as well.)
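The partitioning step above can be sketched as follows. This is an in-memory illustration using hypothetical names: a real external version would write each bucket to its own file on disk, for which the `Map` stands in here. Records are assumed to be CSV lines of the form `First,Last,Email`:

```java
import java.util.*;

public class BucketSplit {
    // Group CSV records ("First,Last,Email") by the first letter of the
    // last name. In the external version each bucket is a file on disk;
    // here a Map stands in for that directory of bucket files.
    static Map<Character, List<String>> partition(List<String> records) {
        Map<Character, List<String>> buckets = new TreeMap<>();
        for (String rec : records) {
            String lastName = rec.split(",")[1].trim();
            char key = Character.toUpperCase(lastName.charAt(0));
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(rec);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("Ann,Ames,a@x.com",
                                    "Bob,Boon,b@x.com",
                                    "Cal,Ames,c@x.com");
        System.out.println(partition(rows).keySet()); // [A, B]
    }
}
```

Splitting on the second letter when a bucket is still too big is the same idea applied recursively, keyed on `lastName.substring(0, 2)`.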

- Mo Lam May 16, 2013 | Flag Reply

Bucket sort the file based on last name (A... - Z...) every day. Now compute the hash for every bucket. Ignore buckets where hashes match. For every bucket where the hashes don't match, compare records using binary search.
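The per-bucket hash could be computed with a cryptographic digest, which makes accidental collisions between buckets astronomically unlikely. A sketch (method name is mine; a real version would stream the bucket file through the digest rather than hold it as one string):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class BucketHash {
    // SHA-256 digest of a bucket's (sorted, newline-joined) contents.
    // If two days' buckets produce the same digest they are almost
    // certainly identical and can be skipped; only mismatching buckets
    // need a record-level comparison.
    static String digest(String bucketContents) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(bucketContents.getBytes(StandardCharsets.UTF_8)))
            hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```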

- sumeet May 09, 2013 | Flag Reply

The same hash code does NOT necessarily mean the buckets are exactly the same.

A simple example:
String a = "Aa";
String b = "BB";
System.out.println(a.hashCode());
System.out.println(b.hashCode());

Both print the same hash code: 2112.

There are only 2^32 possible hash codes, so hash-code collisions are possible.

- Prithvi May 09, 2013 | Flag

@Prithvi: he may be intending to use a secure hash of some sort. Collisions would be possible but statistically unlikely.

- Anonymous May 09, 2013 | Flag

Not all hashes have to be 32-bit.

- Anonymous May 09, 2013 | Flag

Two people can have the same last name but different first names.

I would treat all three columns combined as the unique identifier (or ask the interviewer which columns make a row unique) and use those columns to build the key for the dictionary; the value would be the row ID or the row itself.

The dictionary's key is the concatenation of the unique columns.
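A small sketch of such a key builder (class name is mine). Using a separator character avoids a subtle bug: with plain concatenation, ("ab", "c") and ("a", "bc") would produce the same key.

```java
public class RowKey {
    // Build a dictionary key from the columns that together identify a row.
    // "\u0001" is a separator unlikely to occur in real name/email data;
    // without it, ("ab","c") and ("a","bc") would collide.
    static String key(String firstName, String lastName, String email) {
        return firstName + "\u0001" + lastName + "\u0001" + email;
    }
}
```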

- Anonymous May 09, 2013 | Flag

Agreed. I was just pointing out the possibility.

- Prithvi May 09, 2013 | Flag

If sorting is not an option, the only way to check for new records is to compare each row in the new file with the rows in the old file.

CollectionOfExcelObject oldSetOfPersons; // all records from the old file
CollectionOfExcelObject newSetOfPersons; // all records from the new file
for (Person person : newSetOfPersons) {
    if (oldSetOfPersons.has(person)) {
        // existing person; continue the loop
    } else {
        // new person
    }
}

CollectionOfExcelObject can be any collection implementation (a hash-based set makes has() constant time).

- Prithvi May 09, 2013 | Flag Reply

In your program, is it break or continue?

- temp May 09, 2013 | Flag

Thanks for pointing that out. I edited it.

- Prithvi May 09, 2013 | Flag

From what I could understand, there is no way to sort the data in the given file. I think the approach below would work, though I am not sure it scales to a very large data set.

1. Either sort the entries in the first file by email ID, or store the email IDs in a data structure such as a HashMap (if there are no duplicates; the key is the email ID and the value is first name + last name) or a trie (where each valid email-ID entry holds a list of first name + last name).
2. For each entry in the second file, look it up in that data structure. If it is not found, record it as a new entry. If it is found, remove it from the data structure. At the end of this stage we have all entries that appear only in the new file.
3. The data structure now contains only the entries that were present in the old file but absent from the new one. Iterate through it and print them all.
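The three steps above, taken with the HashMap option, can be sketched as a single pass (class name and sample data are mine; a real version would read the two files instead of hard-coding records):

```java
import java.util.*;

public class SnapshotDiff {
    public static void main(String[] args) {
        // Step 1: map email -> "First Last" for the old file.
        Map<String, String> old = new HashMap<>();
        old.put("a@x.com", "Ann Ames");
        old.put("b@x.com", "Bob Boon");

        // Step 2: walk the new file; matches are removed from the map,
        // non-matches are recorded as new entries.
        List<String> newEntries = new ArrayList<>();
        for (String email : List.of("b@x.com", "c@x.com")) {
            if (old.remove(email) == null) newEntries.add(email);
        }

        // Step 3: whatever is left in the map was deleted today.
        System.out.println("added:   " + newEntries);   // [c@x.com]
        System.out.println("removed: " + old.keySet()); // [a@x.com]
    }
}
```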

- temp May 09, 2013 | Flag Reply

How would there be no way to sort the data? If you can form a trie, that implies the data can be sorted.

- Anonymous May 09, 2013 | Flag

The name could be the same for two people, but we can use the email column as the unique key.

- Arvind May 10, 2013 | Flag Reply

There is no need to sort to find the difference between two sets of items. You can simply use a hash table for that. All that remains is to partition the files so they fit in memory.

- gen-y-s May 10, 2013 | Flag Reply

Assuming the email ID is unique; if it is not, consider FirstName_LastName_EmailId as the key.
Build a trie from the previous day's records.
For each record in the current day's file, check whether it is present in the previous day's trie.
If it is present, delete it from the trie; if it is not, add it to today's new_added list.

At the end we have the new_added list and the deleted list (the entries remaining in the trie).
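A minimal trie supporting this workflow might look like the following (class and method names are mine). Deletion is handled by unmarking the key's end node rather than pruning nodes, which is enough for this use:

```java
import java.util.*;

public class KeyTrie {
    // Minimal trie over key strings (e.g. "First_Last_Email"). insert()
    // adds a key, remove() unmarks it, keys() lists what is still present.
    // After loading yesterday's keys and calling remove() for every key
    // seen today, keys() is exactly the deleted list, and today's keys for
    // which remove() returned false are the newly added ones.
    private final Map<Character, KeyTrie> children = new TreeMap<>();
    private boolean end;

    void insert(String key) {
        KeyTrie node = this;
        for (char c : key.toCharArray())
            node = node.children.computeIfAbsent(c, k -> new KeyTrie());
        node.end = true;
    }

    // Returns true if the key was present (and is now removed).
    boolean remove(String key) {
        KeyTrie node = this;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        boolean wasPresent = node.end;
        node.end = false;
        return wasPresent;
    }

    List<String> keys() {
        List<String> out = new ArrayList<>();
        collect("", out);
        return out;
    }

    private void collect(String prefix, List<String> out) {
        if (end) out.add(prefix);
        for (Map.Entry<Character, KeyTrie> e : children.entrySet())
            e.getValue().collect(prefix + e.getKey(), out);
    }
}
```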

- jigs May 12, 2013 | Flag Reply

At the start of each new day, create a new report file; whenever you insert or update a record in the main file, append the change to the report file too.

By the end of the day you will have a report file containing all of the day's inserts and updates.

This is similar to maintaining a log of operations for each day.

- Anonymous April 28, 2014 | Flag Reply

