Amazon Interview Question
Software Engineer / Developers. Country: United States
Interview Type: Phone Interview
"In most cases, sizeof is a compile-time operator, which means that during compilation sizeof expressions get replaced by constant result-values." - wiki
Read into a buffer; that reduces the number of round trips to the file on disk. Something like:

#define BUFF_SIZE 100

long readFromFile(const char* fileName) {
    int i[BUFF_SIZE];
    long sum = 0;
    ifstream file(fileName, ios::in | ios::binary);
    if (file.is_open()) {
        // read() may fail at EOF with a partial chunk still delivered;
        // gcount() reports how many bytes actually arrived.
        while (file.read(reinterpret_cast<char*>(i), sizeof(int) * BUFF_SIZE), file.gcount() > 0) {
            int count = file.gcount() / sizeof(int); // bytes read -> ints read
            for (int w = 0; w < count; w++)
                sum += i[w];
        }
        file.close();
    }
    return sum;
}
A major cost in reading the file comes from the time to read the disk. To optimize this, one can prefetch blocks ahead of using them. The 4K buffer read is a kind of prefetch, but I would prefetch more and also before computing the sum.
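As a sketch of that larger read-ahead buffer (sumFileChunked and CHUNK_INTS are my own names, and 1 MiB per read is an arbitrary choice, not something fixed by the question):

```cpp
#include <cstdint>
#include <fstream>
#include <vector>

// Sum 32-bit integers from a binary file using a large buffer:
// one read() call fetches many integers at once, cutting the
// number of trips to the kernel and disk.
long long sumFileChunked(const char* fileName) {
    constexpr std::size_t CHUNK_INTS = 1 << 18;   // ~1 MiB of ints per read
    std::vector<std::int32_t> buf(CHUNK_INTS);
    std::ifstream file(fileName, std::ios::in | std::ios::binary);
    long long sum = 0;
    for (;;) {
        file.read(reinterpret_cast<char*>(buf.data()),
                  buf.size() * sizeof(std::int32_t));
        std::size_t got = file.gcount() / sizeof(std::int32_t);
        if (got == 0) break;                      // EOF (or open failure)
        for (std::size_t k = 0; k < got; ++k)
            sum += buf[k];
    }
    return sum;
}
```

The buffer size is a trade-off: bigger buffers mean fewer system calls, but past a point the gains flatten out while memory use grows.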
A second possibility is to take advantage of concurrency. We can have multiple threads so that the sum computation can happen in parallel to the io read. We can have multiple threads for io and computation, if there is a possibility of parallelism on the box.
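A rough sketch of that I/O / compute overlap, assuming C++11 threads: one thread reads chunks into a queue while a second thread drains and sums them. All names here (sumFileThreaded, Chunk) are illustrative, not from the post.

```cpp
#include <condition_variable>
#include <cstdint>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

long long sumFileThreaded(const char* fileName) {
    using Chunk = std::vector<std::int32_t>;
    std::queue<Chunk> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    long long sum = 0;

    // Consumer: sums chunks as they arrive, so computation
    // overlaps with the next disk read.
    std::thread consumer([&] {
        std::unique_lock<std::mutex> lk(m);
        while (!done || !q.empty()) {
            cv.wait(lk, [&] { return done || !q.empty(); });
            while (!q.empty()) {
                Chunk c = std::move(q.front());
                q.pop();
                lk.unlock();                       // sum outside the lock
                for (std::int32_t v : c) sum += v;
                lk.lock();
            }
        }
    });

    // Producer: the main thread does the blocking reads.
    std::ifstream file(fileName, std::ios::in | std::ios::binary);
    constexpr std::size_t CHUNK_INTS = 4096;
    for (;;) {
        Chunk c(CHUNK_INTS);
        file.read(reinterpret_cast<char*>(c.data()),
                  c.size() * sizeof(std::int32_t));
        std::size_t got = file.gcount() / sizeof(std::int32_t);
        if (got == 0) break;
        c.resize(got);
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(c)); }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv.notify_one();
    consumer.join();
    return sum;
}
```

Whether this wins depends on the box: on a spinning disk the read dominates and the overlap mostly hides the addition; the gain is in not stalling the reader while sums are computed.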
A third possibility: do we need the "i=0" statement at all? Isn't i overwritten anyway? This may be a language aspect that I am not familiar with.
Without buffering: inefficient code

try {
    File f = new File("myFile.txt");
    FileInputStream fis = new FileInputStream(f);
    int count = 0;
    int b = 0;
    while ((b = fis.read()) != -1) {
        if (b == '\n') {
            count++;
        }
    }
    // fis should be closed in a finally block.
    fis.close();
}
catch (IOException io) {}
With buffering: yields better performance

try {
    File f = new File("myFile.txt");
    FileInputStream fis = new FileInputStream(f);
    BufferedInputStream bis = new BufferedInputStream(fis);
    int count = 0;
    int b = 0;
    while ((b = bis.read()) != -1) {
        if (b == '\n') {
            count++;
        }
    }
    // bis should be closed in a finally block.
    bis.close();
}
catch (IOException io) {}
Two notes on commonly suggested "optimizations" here:
1) sizeof(unsigned int) in the loop is not a function call. As the quote at the top says, sizeof is evaluated at compile time, so hoisting it into a variable changes nothing in the generated code.
2) Likewise, replacing reinterpret_cast<char*> with a C-style cast like (char*)&i saves no time: both are pure compile-time constructs and produce identical code.
If the file is binary (as opposed to ASCII) and contains nothing but integers, this could work. A vector is used here because once we know the file size, we know exactly how large to make it and need only one allocation (repeatedly growing a vector with push_back is expensive, so we avoid it).
long sum = 0;
ifstream file("binary.dat", ios::in | ios::binary);
if (file.is_open())
{
    file.seekg(0, file.end);                       // seekg takes (offset, direction)
    int numIntegers = file.tellg() / sizeof(int);
    file.seekg(0, file.beg);
    vector<int> intVec;
    intVec.resize(numIntegers);
    // The vector is already big enough and tellg gave us the file size,
    // so everything can be read in a single read() call.
    file.read(reinterpret_cast<char*>(intVec.data()), numIntegers * sizeof(int));
    for (size_t i = 0; i < intVec.size(); ++i)
        sum += intVec[i];
}
file.close();
Taking the sizeof(int) operator outside the while loop and putting it into a variable will not actually improve performance: sizeof is a compile-time operator (see the quote at the top), so nothing is recalculated on each iteration.
- Santosh, April 09, 2012. Comment if I am wrong.