Given 1 GB memory, input a fil

Microsoft Interview Question for Software Engineer / Developers

0

of 0 votes

35
Answers
Given 1 GB memory, input a file which contians 4 billion integers, output one integer that is not in the file. What if you have only 10 MB memory?
- AK47 November 08, 2007 | Report Duplicate | Flag | PURGE
Microsoft Software Engineer / Developer Coding Algorithm Math & Computation Data Structures

Email me when people comment.

An error occurred in subscribing you.

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

The number of distinct integers can be 2^32, which is 4billion. Since 1GB(8billion bits) memory is given, we can declare
bit map array of size 4billion and initialize it to zero. Scan through the file. As and when you get an integer, set the bit to 1. If you encounter the same number again, check the bitmap's value is set. If it is set to 1, don't change.

Once you finish reading the file, scan the bit map. If any bit is not set, it is the number that is missing from the 4billion integer list.

- Praveen April 06, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

what if memory is 10MB?

- missing answer for part of the question May 30, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Memory is 1GB
The number of distinct integers can be 2^32, which is 4billion.
Since 1GB(8billion bits) memory is given, we can declare bit map array of size 4billion and initialize it to zero. Scan through the file. As and when you get an integer, set the bit to 1. If you encounter the same number again, check the bitmap's value is set. If it is set to 1, don't change.

Once you finish reading the file, scan the bit map. If any bit is not set, it is the number that is missing from the 4billion integer list.

memory is 10MB
The number of distinct integers can be 2^32, which is 4billion.

Step 1: Check 1st 80million integers
Since 10MB(80 million bits) memory is given, we can declare bit map array of size 80 million and initialize it to zero. Scan through the file. As and when you get an integer BETWEEN 1-80 million, set the bit to 1. If you encounter the same number again, check the bitmap's value is set. If it is set to 1, don't change.
Once you finish reading the file, scan the bit map. If any bit is not set, it is the number that is missing from the 80million integer list.

Step 2: Check 2nd 80million integers
Since 10MB(80 million bits) memory is given, we can declare bit map array of size 80 million and initialize it to zero. Scan through the file. As and when you get an integer BETWEEN (80million+1) -160 million, set the bit to 1. If you encounter the same number again, check the bitmap's value is set. If it is set to 1, don't change.
Once you finish reading the file, scan the bit map. If any bit is not set, it is the (number+80million) that is missing from the integer list.
Step3: repeat above step until u cover the complete integer list.
Total steps = 4billion\80million = 5

Same logic is applicable to all memory constraints.

- Deepak Agarwal December 06, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Why not just scan through all the numbers looking for maximum. Return (1+max).

The question should have been worded better. It probably meant to ask this...

"One of the integer from 0 to 2^32 - 1 is missing and another number from the same range is repeated. Find the misisng number".

- DeRanged February 01, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Deepak 1billion/80 million = 5 ?

- mike March 10, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

- for first case. keep a 32e9 long bit vector. set bits corresponding bits as you find elements(basically some sort of directly mapped hashing). in the end traverse through the bit vector to find the first off bit.

- for second case, the max bit vector you can have is 32e7. first traverse the entire 4 billion integer set and set bits for 0-32e7. traverse this bit vector till you find the first off bit. if you dont find the off bit, traverse the entire 4 billion integer set and set bits for 32e7-64e7. repeat the process, till you find an off bit or reach the 4 billion limit on bit vector.

i think this should work. can anyone suggest a better solution?

- joe di maggio November 08, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How about this one?

1. With 1 GB memory: bit mask solution seems fine but the check at the end while looking for off bit, can be made faster by looking at 32-bit values rather than individual bits.
2. With 10 MB memory: create a hash table with 2 ^ 21 entries each of size 32 bits, i.e. 8 MB memory. Then hash each integer as following: find the hash bucket using the MSB 21 bits of the number. And keep a sum of LSB 11 bits of all 2048 (2^32/2^22) integers in that range in the bucket. After hashing is done, Now iterate thru' all buckets and check if any of those has a sum less than (2048*2049)/2 (let this constant value be x). If yes, the missing number's MSB 21 bits come from the bucket number and LSB 11 bits come from the difference (x-sum of that bucket).

- Aman December 31, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Nice solution for 10 MB memory
However, i doubt if you can deal with the case when the missing number has 11 bits LSB all 0s.

- Aaron February 26, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Aaron, you absolutely correct observation contains the answer to this problem:

just peek up all numbers with zero 11 LSB bits and repeat the whole procedure considering only 22 MSB bits ;)

10Mb is enough room for 2^20*8 = 2^23 bits, so you don't need to use buckets in the 2nd run

- asm November 24, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

The slution by Aman is good bu it has one more problem, what if a single bucket has multiple values missing ?
We need to find the first missing integer. This would in worst case require 2 runs over the complete sample set.

I think the 1 GB solution is correct while for 10MB, the solution is to search the first 2^22 integers in the sample set and store the integers not found in a map which can easily be made in rest of 2 MB. The complete sample set has to be iterated for the smallest integer in the map. This has a O(n) complexity with a (comparitively) small space overhead Can this be further optimized?

- Rahul August 23, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

will merge-sort do it? instead of bringing in all 4 billion integers, just bring in a chunk of it, sort it, kick it out, then bring it another chunk, sort it again, and so on. When combining, merge them a few chunks at a time as long as it fits the memory. Opinion?

- chao November 08, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

External Merge sort will work. If 1000 sorted files have been created, merge them using a heap (priority queue) of size 1000 and a buffer of size = available memory. Keep flushing the buffer to the merged results file until the duplicate is found.

Bit vector idea works as well - stop as soon you are setting a bit thats already set. However, the number of passes through the 4 billion integer file can be thousands, thereby slowing down things.

- Merger November 09, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

your solution finds a duplicate but not the missing element as asked in the question.
i think joe di maggio's solution would take about 13 passes in the worst 4e9/32e7 ~ 13 passes though the 4 billion integer file in the worst case and not thousands.

- santiago solari November 09, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Would this work ?
First case: We have 2 GB of memory. Create 2 files .. Go through the 4 billion numbers and put all numbers starting with Zero in one file and all numbers starting with one in the other. Check which file is smaller. Do the same thing over and over .. you finally arrive at the solution ...compexity log n
Second case: The bot vector thing could be used ..

- rathi November 25, 2007 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I am sorry, but what do you mean by 32e9 long bit?
Why are you using e? Sorry for the basic question.

- Nick December 30, 2007 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I did not understand how this number 32e9 was calculated. Assuming 4 byte integer, there are 2^32 possible unsigned integers. That comes to 4294967296. which is 4.3 billion approx. I was under the impression that we keep a bit vector of 4.3 * (10^9) bits and mark the bit one as we see that integer. Could some one throw light on the 32e9 number.. (Sorry if the question is very basic.)

- Nikhil Agashe January 09, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Keep intervals. If u see numbers like 1,2,3,4 (not necessary in any order) just remember the max and min element of the interval. Every time you see a new number check if u can use it to increase the size of an interval. If the number does increase the size of any interval check it there is a change of merging that interval with adjacent interval for e.g. Interval 1 is 110 to 250 and interval 2 is 252 to 572 and if u find the number 251 we can merge the two intervals to form a single interval of 110 to 572. For every interval we save only the min and max value of the interval. If at the end of reading through all the numbers we have more than one interval then the numbers between the intervals are missing in the data given

- Swapnil D. February 03, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think some details are missing in the original question. The input contains numbers from 1 to 4 billion in random order, no duplicates and 1 missing number. Find the missing number.

Answer :
Calculate n(n+1)/2 where n is 4 billion. Calculate sum of all the numbers in the file.

Missing number = n(n+1)/2 - Array Sum

- Snakebyte February 07, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Right solution Snakebyte... this is the way to do it.

- Anonymous March 02, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

n(n+1) = 18446744073709551616

Will that fit in a Float?

- DrFunFrock March 12, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Don't u worry about there will be a overflow?
Besides, it's not necessary sequential numbers

- J May 03, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Can't you take max integer and add one? Or min integer and subtract one? That would give you a value not in the original list.

- vanmeistro March 16, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

One byte should be 8 bits.

So 1GB contains 8 billion bits.

- Lindos March 22, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

1GB - almost one million
4 billion integers is almost 8 times 10^9
how can you input a file from the disk to the memory?
forget about 10MB

- what the heck May 30, 2008 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Allocate 32 signed long variables for taking a count of 0 and 1. Total memory taken so far is 8*32 = 256 bytes. Now input the numbers one by one and scan the bit pattern of the number. Highest MSB corresponds to the integer 32 and LSB corresponds to integer 1. Now if the MSB is 1 increment the value of integer 32. If it is 0 decrement it by 1. Do it for all the bits. After you finish scanning all the numbers in the file, go through integers 32 to 1 and that will give you the bit pattern of the number missing in the file. This solution is based on the assumption that the total number of numbers is a power of 2 and the fact that when we write bit pattern for all the numbers all the column in the bit pattern will have the same number of 0s and 1s.

Please correct me if I am doing something wrong above.

- gauravk.18 April 05, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Smart solution

- GA January 20, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Smart solution

- GA January 20, 2009 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

It can't be more wrong. can you give an example ?

- Anonymous October 02, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

hmm

- Mohan June 10, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Deepak Agarwal December 06, 2008 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

great solution except the part that it will need 50 iterations.
4billion = 4x10^9
80million= 8x10^7

so 4billion/80million = 4x10^9/8x10^7 = 0.5x10^2 = 50.
correct me if i am wrong.

- patel sapan October 15, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I really don't think we need so much space in calculation!
This method consumes several Bytes' space:
keep a marker integer, iterate through all the integers one by one, set the marker integer to be the largest(or the smallest) value ever meet. After all 4 billion integers are iterated, marker +　１(or marker - 1) is one of the missing integers.

If this works, what is the purpose of the memory limitation in the question?

- Liao January 07, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I like that solution. That's smart. ;-)
Perhaps the question is to find a "gap" between two numbers that exist in the file.

- Steve July 03, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

What if the max number is at the max-limit an integer variable can hold and the minimum is at the lowest an integer can hold?

- PsychoNaut April 06, 2009 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

here the best solution in my mind (i found it here: stackoverflow.com/questions/7153659/find-an-integer-not-among-four-billion-given-ones )

A detailed discussion on this problem has been discussed in Jon Bentley "Column 1. Cracking the Oyster" Programming Pearls Addison-Wesley pp.3-10

Bentley discusses several approaches, including external sort, Merge Sort using several external files etc., But the best method Bentley suggests is a single pass algorithm using bit fields, which he humorously calls "Wonder Sort" :) Coming to the problem, 4 billion numbers can be represented in :

4 billion bits = (4000000000 / 8) bytes = about 0.466 GB
The code to implement the bitset is simple: (taken from solutions page )

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];

void set(int i) { a[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i){ return a[i>>SHIFT] & (1<<(i & MASK)); }
Bentley's algorithm makes a single pass over the file, setting the appropriate bit in the array and then examines this array using test macro above to find the missing number.

If the available memory is less than 0.466 GB, Bentley suggests a k-pass algorithm, which divides the input into ranges depending on available memory. To take a very simple example, if only 1 byte (i.e memory to handle 8 numbers ) was available and the range was from 0 to 31, we divide this into ranges of 0 to 7, 8-15, 16-22 and so on and handle this range in each of 32/8 = 4 passes.

- Gerald December 07, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

The first part of the problem can be solved as explained in the above comments (have 1 bit assigned for each integer in the range 0-4billion). This is well below the 1GB memory limit.

When memory = 10MB.
In the previous case, we could store 1 number per bit, but here we have to store 400 numbers per bit. (4*10^9) / (10*10^6). This is not possible to do unless we iterate through the list twice.
In the first iteration we find the number of integers that are present in a particular range ie. 1-1000, 1001-2000, 2001-3000...etc.
Let's create an integer array that will help us count the number of integers in the ranges for this question.
Since it is an integer array and we need 4 bytes per array element, we can have a max of 2^21 elements in the array (10MB ~ 2^23 bytes, 4B =2^2 bytes).
Therefore we now need to map 2^31 (4 billion) integers to 2^21 elements.
Now this was the array size calculation for our array. We increase the count of the particular index of the array which maps to a particular range in the 4 billion numbers. Once we iterate through the entire list of 4 billion numbers we check for the array index whose value is less than the number of integers mapped to an element (in this case 2^21). We then perform bitwise setting/resetting for that range of numbers.

Source: Cracking the Coding Interview by Gayle Laakmann McDowell (6th edition)

- Navneet Arakony May 02, 2016 | Flag Reply

CareerCup

Microsoft Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews