Flipkart Interview Question
Software Engineer / Developer

Well, that would depend on the number of orders. :)
If you are only getting orders on the order of hundreds, you don't need a Hadoop cluster for this job; a simple implementation that computes the intersection of the three log files would do.
If the log files are really big, then MapReduce would help.
Guys, you can get a linear-time algorithm by using a modified merge. With the modified merge, find the common elements of the first two arrays and store the result in a fourth array. Then repeat the same procedure on the third and fourth arrays, and you get the answer.
By modified merge, I mean that while merging you select only the common elements.
void modifiedMerge(int A[], int B[], int size1, int size2)
{
    int i = 0;
    int j = 0;
    while (i < size1 && j < size2)
    {
        if (A[i] == B[j])
        {
            printf("%d ", A[i]); /* or store it in the fourth array */
            i++;                 /* advance both past the match, or the loop never ends */
            j++;
        }
        else if (A[i] < B[j]) i++;
        else j++;
    }
}
Note that the files are in sorted order and are huge, so an in-memory sort is ruled out.
Keep one pointer into each of the three files. If the current values at all three pointers are the same, that order ID appears in all files; output it and advance all three pointers. Otherwise, advance the pointer whose value is smallest, i.e. the one lagging behind. Keep comparing until any pointer reaches the end of its file. This is a linear-complexity algorithm.
This problem can be solved using MapReduce.
Write three mapper classes, one per file, each of which reads its file and emits every OrderId as a key.
In the shuffle-and-sort phase, records with the same OrderId are grouped together.
The grouped records are passed to the reducer, which aggregates them: any OrderId with a count of three appears in all three files.
Correct me if any changes need to be done.
If the sizes were not huge, we could have merged all 3 arrays into one and then done a linear scan to find numbers repeated thrice.
- Metta September 04, 2010

Maintain 3 pointers, one for each list. If the values at all 3 are the same, increment commonCount and advance all three pointers. If not, advance the index of the list with the minimum value of the 3. Keep doing this until any pointer goes out of range, then return commonCount.