Microsoft Interview Question for Software Engineers


Country: United States




Comment hidden because of low score. Click to expand.
4
of 4 vote

First split the 1 TB data into 1024 blocks and then sort these files/data individually.
Then open all files (1024 of them. just have a pointer to the beginning of the file, not load the entire file). read the first entry from all the files and then put them into a min heap. So this way, we get 1st 1000 integers which are sorted globally. Next again load the heap with next 1000 entries from the file, continue this until all the entries in the file are finished.

- rangacoder February 02, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

i think this won't work.
say i have a sorted already 1T data
and according to your way, the first 1024 been sorted out of the first batch is going to ruin the originally sorted 1T list

- bill February 18, 2016 | Flag
Comment hidden because of low score. Click to expand.
2
of 2 vote

external MergeSort

- zr.roman January 29, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

You just store the difference between the long integers, that way you can save up memory.

You can use bucket sort to sort the array/buffer where you stored the difference then since you know the maximum and the minimum number will be a long integer so the constraints are met and your time complexity would be O(N).

- Anonymous January 29, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

I really liked this one. But this one has one issue.

Difference of 2 longs can be a long as well.

Long.MAX_VALUE - 0 = Long.Max_VALUE

- RangaCoder February 02, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

First we could check with the interviewer whether the numbers are in any range. For example if the numbers represent human range. if yes ,
then we need to store numbers between 1 - 130 and counts of each one of them.

Its just about asking right questions.

If the numbers are really distributed and no correlation
we can bring in numbers in memory and use merge sort
That we we bring data 1000 + 500 + 250 + 125 + 63 + .... times

- Ananymouse February 02, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Assumptions: Size of integer is 4 bytes

Range of integers is -2Billion (Integer.minValue) to 2Billion (Integer.maxValue)

Space required to hold 4Billion numbers = 4B * 4bytes = 16B bytes = 16GB

But, it is given that we have 1TB of data which are numbers. Obviously, there are a lot of duplicates. We will need to find out, which of them is repeated how many times.
We can create a hash table of all numbers in the integer range (from -2Billion to 2Billion) as key and the count of their occurrence as value. Count is unsigned int.
Size of 1 entry of hash table (4 Bytes for number and 4 Bytes for count) = 8 Bytes
Total number of entries in the hash table = 4Billion
Total size of hash table = 4Billion * 8 Bytes = 32 Billion bytes = 32 GB

Available RAM is1GB= We will need 32 different hash tables to hold 32GB of data. Alternatively, we can use 64 hash tables, each with 500MB size.

Here is the algorithm:
1. Read 500MB of data from the 1TB file
2. Read numbers sequentially from step 1
3. Pick the hash table (1 to 64) depending on where the number read in step 2 fits. Add the entry and increment the count against the number. Save hashtable to disk if you have to pick another hashtable.
4. Repeat steps 2-3 until you exhaust the number
5. Goto step 1 to read the next 500MB worth of data from disk

Complexity of this algorithm is O(n)

Limitation: This will not work of any number(s) is repeated more than 4B times (range of int)

- RD February 02, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

The question is about long integers which is 8 bytes long

- Anonymous February 03, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

This requires a lot of copying from memory to disk picking out the right hash table for each number! So even though its O(n), it may be highly inefficient.

- fernando February 06, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

This will require a lot of read/write from memory to disk, in order to pick out the appropriate hash table for each number. So even though it is O(n), it may be highly inefficient.

- fernando February 06, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

This will require a lot of read/write from memory to disk, in order to pick out the appropriate hash table for each number. So even though it is O(n), it may be highly inefficient.

- fergow February 06, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Divide the array in blocks of 0.5Gb each. Then proceed in a Bubble Sort fashion by sorting adjacent blocks. This is equivalent to the swapping procedure used between scalars in Bubble Sort.

If B is the number of blocks

For i=1 to B
For j=1 to B-1
sort(Block_j merged to Block_{j+1})

The sorting can be performed in memory. The complexity is quadratic in B.

- Simo March 15, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Divide the array in blocks of 0.5Gb each. Then proceed in a Bubble Sort fashion by sorting adjacent blocks. This is equivalent to the swapping procedure used between scalars in Bubble Sort.

If B is the number of blocks

For i=1 to B
For j=1 to B-1
sort(Block_j merged to Block_{j+1})

The sorting can be performed in memory. The complexity is quadratic in B.

- Simo March 15, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

(sorry for the previous posts)

Divide the array in blocks of 0.5Gb each. Then proceed in a Bubble Sort fashion by sorting adjacent blocks. This is equivalent to the swapping procedure used between scalars in Bubble Sort.

If B is the number of blocks

For i=1 to B
For j=1 to B-i
sort(Block_j merged to Block_{j+1})

The sorting can be performed in memory. The complexity is quadratic in B.

- Simo March 15, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Consider 1 TB array has size N, and 1 GB array has size n
1. Partition N into block of size n
2. Inplace/using memory Sort all blocks
3. use memory as min heap and take first element of every sorted block
4.heapify and take out first element from min heap push to first position of N array
5. take second element from same array n add to min heap. and repeat (4,5) until we get sorted array

Complexity : (n+1)N * log(n)

- Akshay Dhule October 18, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

I think we can convert this question to merge K sorted list, right?

- Jason Guo November 09, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

I will suggest quicksort in this case
check quick-sort algorithm tutorial

Worst case performance O(n2)
Average case performance O(n log n)
Average space complexity O(log n)

- chovatiyabhavesh January 29, 2016 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More