Amazon Interview Question
Senior Software Development EngineersCountry: United States
1 pass - counts for suffixes mod 10000. This will fill the 10 K elements with counts
Look for all less than 10^5 counts. Store suffixes for them in one part of 10K array.
For each, do a pass and utilize other part of 10K for bitmap
There should be plenty for bitmap cause we need 10^5 bits but we have way more than 10 bits per element
Total complexity is O ( n m ) where m is number of misses.
Do you think taking 10k Long number and setting bits every time would work ?
1billion/10k*64bits(unsigned long bits) ~=1562 times algorithm would have to run and we can get the missing numbers.
Just look for numbers 1-64000 and set the bits accordingly in 1-10000 numbers.
keep on increasing range.
Not sure though if this is best solution, something on top of my head.
I am approaching it with file sort. This is a storage expensive in that it would create another file of the same type. The idea is to get a sorted list and and find the numbers afterwards. it has complexity of n^2.
There are ways to make it better with usage of the chunk hint given... at least it would be a quick and dirty solution :)
Another approach could be using linux to sort the file and utilize that, in that case the complexity would reduce to nlogn assuming the internal sort by linux would be nlogn
package algorithm;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
/**
* Assumption: the numbers are read from file and they are sequential
*
*
*/
public class MissingNumbers {
//source file
private static String FILE_PATH = "/tmp/random-thousands.txt";
//final sorted file
private static String FILE_SORTED_PATH = "/tmp/sorted-thousands.txt";
public static void main(String str[]) {
PrintWriter writer = null;
try {
MissingNumbers missingNumbers = new MissingNumbers();
writer = new PrintWriter(FILE_SORTED_PATH);
writer.write("");
writer.close();
writer = new PrintWriter(FILE_PATH);
writer.write("");
missingNumbers.createRecords();
missingNumbers.createSortedFileList();
List<Integer> numbersLost = missingNumbers.findMissing();
for (Integer numbers:numbersLost) {
System.out.println(numbers);
}
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
writer.close();
}
/**
* find the missing numbers from the billion numbers.
*
* The file would be provided as random populated file. First the file would be
* sorted on the file basis and then it would be read from top to bottom and checked for
* missing number.
* @return
*/
public void createSortedFileList() {
try {
FileReader randomReader = new FileReader(FILE_PATH);
BufferedReader bufferedRandomReader = new BufferedReader(randomReader);
RandomAccessFile sortedAccess = new RandomAccessFile(FILE_SORTED_PATH, "rw");
String currentRandom = null;
String currentSorted = null;
Long lastPointer = null;
while ((currentRandom = bufferedRandomReader.readLine()) != null) {
int currentSortedInt = 0;
//set the file pointer
lastPointer = sortedAccess.length();
sortedAccess.seek(lastPointer);
/*
* Write the numbers in such a way that the length of the written numbers
* is in equal size to make the calculation much easier since we are readin
* by size of the numbers.
*/
sortedAccess.writeBytes(String.format("%04d\n", Integer.parseInt(currentRandom))); //append it at the end
lastPointer = sortedAccess.length()-("0000\n".length());
sortedAccess.seek(lastPointer);
while ((currentSorted = sortedAccess.readLine()) != null && lastPointer >= 5) {
currentSortedInt = Integer.parseInt(currentSorted);
lastPointer -= "0000\n".length();
sortedAccess.seek(lastPointer);
String rawUpper = sortedAccess.readLine();
int upperSorted = Integer.parseInt(rawUpper);
if (currentSortedInt < upperSorted) {
sortedAccess.seek(lastPointer);
sortedAccess.writeBytes(currentSorted+"\n");
lastPointer += "0000\n".length();
sortedAccess.seek(lastPointer);
sortedAccess.writeBytes(rawUpper+"\n");
lastPointer -= "0000\n".length();
sortedAccess.seek(lastPointer);
} else {
break;
}
}
}
//release the resource
bufferedRandomReader.close();
sortedAccess.close();
} catch (FileNotFoundException foe) {
System.out.println(foe.getMessage());
} catch (IOException io) {
System.out.println(io.getMessage());
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
/**
* At this level there is a file on FILE_SORTED_PATH which is sorted in ascending.
* go through the file and search for the missing ones. The missing ones can be
* found when the continuity is broken
* @return
*/
public List<Integer> findMissing() {
List<Integer> missingNumbers = null;
try {
FileReader fileReader = new FileReader(FILE_SORTED_PATH);
BufferedReader bufferedReader = new BufferedReader(fileReader);
missingNumbers = new ArrayList<Integer>();
String line = null;
int prevNumber = Integer.parseInt(bufferedReader.readLine());
while ((line = bufferedReader.readLine()) != null) {
int currentNumber = Integer.parseInt(line);
if (currentNumber - prevNumber != 1) {
missingNumbers.add(currentNumber-1);//the missing number is less by 1
}
prevNumber = currentNumber;
}
fileReader.close();
} catch (IOException ioe) {
//log it
}
return missingNumbers;
}
/**
* Creates 1000 that are random and put those in /tmp/random-thousands.txt
*/
public void createRecords() {
//get 20 random numbers that would be considered as missing
Map<Integer, Integer> randomHolder = new HashMap<Integer, Integer>();
Random random = new Random();
while (randomHolder.size() <= 20) {
int rand = random.nextInt(1000)+1;
if (!randomHolder.containsKey(rand)) {
randomHolder.put(rand, rand);
}
}
//now we have random numbers get all the numbers
int[] temp = new int[980];//make it to hold 980 elements, assume 20 is missing
//fill the numbers
int size = 1;
int index = 0;
while (index < 980) {
if (!randomHolder.containsKey(size)) {
temp[index++] = size;
}
size++;
}
//finally shuffle the numbers.
for (int i=0; i<980 ; i++) {
int rand = random.nextInt(980);
int holder = temp[i];
temp[i] = temp[rand];
temp[rand] = holder;
}
//finally write those to the file
try (FileWriter fileWriter = new FileWriter(FILE_PATH, true);) {
for (int i=0;i<980;i++) {
fileWriter.append(String.format("%s\n", temp[i]));
}
} catch (IOException io) {
System.out.println(io.getMessage());
}
}
}
assuming the numbers are unsorted.
first we'll use the 10k space as a sort of hash table (HT).
the hash function will be a simple mod 10000.
for each number n in the list increment the appropriate cell in HT: HT[n%10000]++;
after going over the whole list, save the hash table cells that have less than 10000 in the solution array (i'm assuming we get an additional array or list for the solution).
then for each of those numbers N zero the 10K array, make another pass on the 10B list and save all the numbers where modulo 10000 equals N.
we need O(N) passes on the list where N is the number of missing numbers.
The idea to keep divide the 1 billion number into smaller range with missing numbers until the range is not greater than the given memory. Then go through the number stream to find the missing numbers in each range. Combine missing numbers of all ranges to form the result. The detail can be found: cpluspluslearning-petert.blogspot.co.uk/2016/10/data-structure-and-algorithm-detect.html
A loop and recursive solutions are implemented. The test is based on small range due to the time taken to complete.
Test
void Test_DetectMissingNumbers()
{
NumberGenerator ns(1000000, 100);
NumberGenerator::MissingNumberCollection results;
results = DetectMissingNumber(ns, 10000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber(ns, 100000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber(ns, 1000000);
assert(results == ns.GetMissingCollection());
}
void Test_DetectMissingNumbers_R()
{
NumberGenerator ns(1000000, 100);
NumberGenerator::MissingNumberCollection results;
results = DetectMissingNumber_R(ns, 10000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_R(ns, 100000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_R(ns, 1000000);
assert(results == ns.GetMissingCollection());
}
void Test_DetectMissingNumbers_Div()
{
NumberGenerator ns(1000000, 100);
NumberGenerator::MissingNumberCollection results;
results = DetectMissingNumber_Div(ns, 10000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_Div(ns, 100000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_Div(ns, 1000000);
assert(results == ns.GetMissingCollection());
}
The idea is to keep divide the large range into smaller range until the range is not greater than the given mem. Then go through each smaller ranges to find out missing number. Combine the missing numbers of all ranges to form the result. Details: cpluspluslearning-petert.blogspot.co.uk/2016/10/data-structure-and-algorithm-detect.html
A loop and recursive solutions are implemented. And the test is based on the range of 1 million numbers.
Test
void Test_DetectMissingNumbers()
{
NumberGenerator ns(1000000, 100);
NumberGenerator::MissingNumberCollection results;
results = DetectMissingNumber(ns, 10000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber(ns, 100000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber(ns, 1000000);
assert(results == ns.GetMissingCollection());
}
void Test_DetectMissingNumbers_R()
{
NumberGenerator ns(1000000, 100);
NumberGenerator::MissingNumberCollection results;
results = DetectMissingNumber_R(ns, 10000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_R(ns, 100000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_R(ns, 1000000);
assert(results == ns.GetMissingCollection());
}
void Test_DetectMissingNumbers_Div()
{
NumberGenerator ns(1000000, 100);
NumberGenerator::MissingNumberCollection results;
results = DetectMissingNumber_Div(ns, 10000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_Div(ns, 100000);
assert(results == ns.GetMissingCollection());
results = DetectMissingNumber_Div(ns, 1000000);
assert(results == ns.GetMissingCollection());
}
Agreed with Siva's answer except one case - Missing numbers of one chunk can be present in other chunk. So you have to put a check to verify it.
I will divide memory in 2 parts 9000 + 1000 (As mentioned few hundreds , and i am assuming it is not more than 1000 missing numbers)
First 9k
1- Take chunk with 9k , sort it
2- Traverse it and create a hashMap with missing number as key
For the remaining numbers
3- Clean the 9k memory and take another chunk if numbers are available
4- Push the chunk in memory and verify the hashMap with each number, If number is in hashMap then delete the entry from hashMap
5- Sort it and push missing number in the hashMap as key
After final iteration the hashMap keys will be the missing numbers.
Overall time complexity will be n + nlogn
Thanks
Divide array into 10k chunks, sort internally. and merge using external sorting. Then traverse and find missing elements in O(n). Although internal sorting takes O(nLogn)
- Siva October 08, 2016