Vdopia Interview Question


Country: United States
Interview Type: Phone Interview




Comment hidden because of low score. Click to expand.
4
of 4 vote

Divide the file in 3 parts(3GB+3GB+4GB,(each part must be less than 4GB )) and sort each chunk using any O(nlogn) sorting algo(as it is less than 4GB we can apply any in-place sorting algo). once each chunk of file sorted, bring 1GB of each chunk in memory, so it'll occupy 3GB(1+1+1), now sort this 3GB data(by 3-way merge sort),select a number,add that in remaining 1GB, and while selecting this number, bring a number from corresponding chunk to replace it, finally write sorted 1GB to secondary storage.

- mk13 May 25, 2012 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

This is exactly what I replied, I modified merge sort to be done in place.

- Achilles May 25, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@mk13: I didn't understand the second part of your answer

"select a number,add that in remaining 1GB, and while selecting this number, bring a number from corresponding chunk to replace it, finally write sorted 1GB to secondary storage."

Can you explain it more clearly?

- Kiran Kumar May 27, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@kiran kumar: let say we divide file in three chunk A,B,C and after sorting individully, content of these parts is like: A{12,18,20,25,33,35,37},B{8,13,14,40,41,45,47},C{2,15,50,70,72,75,78}.
Now suppose in memory, we have {12,18,20,25} {8,13,14,40} {2,15,50,70} respectively from A,B,C. now we'll select 2 from C's part as its minimum, so put this in remaining 1GB and replace 2 in memory by an element of chunk C. i.e. insert 72 in C's part in memory.. next replace 8 by 41 and so on.. we are maintaining a min heap.. i think its clear now

- mk13 May 29, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@mk13: Thank you. I got it :)

- Kiran Kumar May 29, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Suppose we can use the whole 4G memory to save our data, and the integer is 32bit, actually we can use bitmap to count the numbers of each integer in the 10G or larger files.

- Anonymous June 11, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

yes, we can use bitmap technique also.. we should elaborate both method to interviewer.

- mk13 June 11, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I did not understand the bitmap technique. Can you please elaborate. What happens when a number exists more than one time? Should the numbers be written back and the bit map re-initialized?

- Achilles June 11, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

in bitmap method, we'll take a byte array where each bit of byte will represent unique number.. e.g. byte_[n], so this byte_[n] will represent the range  {8*(n-1)+1to 8*(n-1)+8}, and so on. To allocate a bit to number x, simply divide x by 8 and get quotient (q) and reminder(r) and make r'th bit of byte_[q] ON.
this is how bitmap works.
now your point is correct as what happen when number repeats, since here in this file sorting, we need to retain all numbers, bitmap won't work. if it is given that many no repeats n times, then we can take n different bitmaps.. but by that many bit may be waste.. we can't re-initialize bitmap as it'll lost some other no... to get over this no, we can take any other small array if only few number repeats and for repeating no, we can update that array..

if you have any idea, plz put that here..

- mk13 June 11, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

For reference to the bitmap sort, see Jon Bentley's book "Programming Pearls". It is the very first topic in the first chapter:)

It is not clear whether the resulted file is required to store the duplicate values or not. Bitmap works well if we are only required to store the distinct values. If not, the worst case scenario is when the whole file contains only one value. Given that the file size is 10GB, the occurence count still can't possibly get above 2^32. To handle this situation, we'd partition the search over smaller ranges, and use an array of 32-bit integers to store the occurence count for each number in range.

- Florian June 26, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Which company asked you this question?

- Gayle L McDowell May 25, 2012 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Vdopia

- Achilles May 25, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Thanks. I added that information back into the question.

Out of curiosity / so that I can improve the UI, why didn't you list the company when you added the questions? (There's a drop down to provide the company name.) Just trying to figure out what's happening when people don't list this.

- Gayle L McDowell May 25, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I don't think many people know of the company and hence just added the question, which according to me is more important than knowing when and where was it asked, unless it is from a very well known company.

- Achilles May 25, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I see... okay. So was it too much hassle to add the company then? I mean, I agree that the question is more important than when / where, but obviously it could only be a good thing to add the company name...

- Gayle L McDowell May 25, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

will try to take care of that and add the name instead of skipping in the future....it isn't too much of a hassle but shear laziness from my side...:)

- Achilles May 25, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Hi Achilles,
actually i have scheduled a interview with vdopia, i am interested to get in touch with you to know about the type of question they asked , if you have no problem, can we do chat in your free time.
Thanks in advance...!!

- Gupta June 20, 2012 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Actually this question seems less puzzling when we say 10GB and 4 GB memories... What if we say 100 GB of data and 2 GB of RAM??

Well, the Answer is less puzzling for all of this.. Its all External Sorting...

Using the Same Idea, we can achieve it ..

- hprem991 May 25, 2012 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More