Google Interview Question
Software Engineer / DevelopersCountry: -
Interview Type: In-Person
Yeah, Eugene, Bitmap is one of solutions in certain conditions. That's why we ask assumptions. Treat the integers as string and apply trie is another a brilliant idea
Bitmap is a good solution, but in this case the 64-bit integers had no lower or upper bounds, making it less feasible. Trie may be a good alternative which had not appeared to me.
there is also a variable-byte coding scheme where an integer
is represented in the following way:
7 least significant bits of each byte are taken from the original
integer, then the highest 8th bit is set to 1 if the number has
further non-zero bytes and is 0 otherwise
in such a way a 32-bit integer '200' would occupy just 2 bytes instead of 4.
This method gives a good compression if numbers represented have different magnitudes and allows us to store variable-length integers
If I am the interviewee, I will ask the question what is the integer? telephone number, IDs? temperature? The questions are intended to find out the following key issues:
1) if the numbers have duplicates
2) what's the lower and upper bound of the number
3) the format of the number, for example, for telephone number you expect to be 10 digit and never starts with 0 or 1 (in us)
If you know enough details, you can go bitmap(no duplicate, known uppbound). Better refers to "Programming Pearl Column 1 "
The problem is to compress the file and also have good search time. The integers had no special property.
I came up with sorting the file followed by differential encoding with offsets in various parts of the file which can be used to perform binary search to identify the chunk where the number is. Then use sequential search to find the number and add up the offsets with base to get the actual value.
The problem is to compress the file and also have good search time. The integers had no special property.
I came up with sorting the file followed by differential encoding with offsets in various parts of the file which can be used to perform binary search to identify the chunk where the number is. Then use sequential search to find the number and add up the offsets with base to get the actual value.
Yes, I think you has a point. When mentioned searching, BST and hashtable are two major method with O(logn) and O(1) search time. But for integers if we can represent it as bitmap, we can search it in O(1) time because it is randomly accessed, right?
here is the bitmap solution: SHIFT=5, MASK=0x1f
void inline set(int i,vector<int> &a){
a[i>>SHIFT]|=(1<<(i&MASK));
}
void inline clr(int i,vector<int> &a){
a[i>>SHIFT]&=~(1<<(i&MASK));
}
int inline test(int i,vector<int> &a){
return a[i>>SHIFT]&(1<<(i&MASK));
}
then
bool Pearl::Bitmap_search(vector<int> &v,int value){
int n=v.size();
int i;
vector<int> a(N,0);
for(i=0;i<=N;i++) clr(i,a);
for(i=0;i<n;i++) set(v[i],a);
return test(v,value);
}
The solution is O(max(N,K)) build time, K is the upper bound. The space is N/32. Search time is O(1)
The offset solution by lyra_vega, is a decent one.
- P November 10, 2011Just thinking, can TRIE help here ? Given, that no of integers are huge, and span a large range. Whenever a number ends, we can append a diffrent kind of node, say, "EndNode", then also tells the number of occurance of that number, in addition to indicating the end of a no, on the path.
For this to be worth, the nos should be large enough. As in, if there are mostly 2-3 digits nos present, then this dosnt make much sense, since space will be taken up by pointers as well.
If however, the no of digits are large, then this makes a worthy soln.