tarutrehan
If there is a possibility of overflow then we must do this step. It's correct to avoid overflow.
- tarutrehan June 12, 2012

The difference is in the way we store the information. In my solution you are storing the information in a tree-like data structure and representing that tree as an array in memory. So if a person is in the office for 5 hours, from 09:00:00 to 14:00:00, then in my solution you need to update only 5 hour-node values, whereas in @Manan's solution you need to update 5*60*60 = 18000 node values, one for every second in that time duration. So insertion is almost constant time in my solution.
- tarutrehan June 08, 2012

This can be optimized using the following logic. The operation num%8 has a value greater than 0 only if one of the least significant 3 bits of the number is set. This is because the least significant 3 bits represent 2^0=1, 2^1=2 and 2^2=4. Any other bit that is set contributes a multiple of 8 (8, 16, 32, ...) and is irrelevant to the num%8 operation, as its contribution mod 8 is always 0.
So we can just add the least significant 3 bits to the sum.
#include <stdio.h>

/* Returns 1 if the sum of all the numbers read from fp is divisible by 8. */
int sum_divisible_by_8(FILE *fp)
{
    unsigned num, sum = 0;
    while (fscanf(fp, "%u", &num) == 1) {
        sum += num & 0x7;  /* only the low 3 bits matter mod 8 */
        sum &= 0x7;        /* keep the running sum in [0,7] to avoid overflow */
    }
    return (sum & 0x7) == 0;  /* divisible by 8 iff the low 3 bits are zero */
}
So basically we have removed the % operator and substituted it with the much faster bitwise operator &. This works because sum % 8 = (sum of (num % 8)) % 8, so accumulating only the low 3 bits of each number preserves the result. Also, since we are using only the low-order 3 bits for the sum calculation, the possibility of overflow is small (unless we are told that we are dealing with thousands of numbers in each file); masking the running sum after each addition, as in the code above, removes that risk entirely.
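As a quick sanity check of the identity behind the trick (the sum mod 8 depends only on the low 3 bits of each addend), here is a tiny self-contained example; the sample numbers are made up for illustration:

#include <assert.h>

int main(void)
{
    unsigned nums[] = { 5, 11, 16 };  /* 5 + 11 + 16 = 32, divisible by 8 */
    unsigned sum = 0;
    for (int i = 0; i < 3; i++) {
        sum += nums[i] & 0x7;         /* adds 5, then 3, then 0 */
        sum &= 0x7;                   /* the overflow-avoidance step */
    }
    assert((sum & 0x7) == 0);         /* holds, since 32 % 8 == 0 */
    return 0;
}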
You are correct. It's a copy-paste error :). What you are saying is exactly what should happen here.
- tarutrehan June 07, 2012

One solution that I can think of is to keep 24 nodes (1 for each hour). Each of these 24 nodes will keep a list of 60 child nodes (1 node for each minute in that hour), and each of the minute nodes will in turn keep 60 nodes under it (1 for each second in that minute). So the internal representation has 3 levels, one each for hours, minutes and seconds. Each node has a counter variable indicating how many persons are in the office during that time.
Insertion:
========
The technique here is that we increment the counter of an hour node only if the person is in the office for the complete hour. Similarly, we increment a minute node of a particular hour only if the person is in the office for that complete minute but not for the complete hour. Finally, we increment a second counter of a minute only if the person is inside not for the complete minute but only for a few seconds within that minute.
Example: 09:55:50 - 15:05:05
In this case the data structure is updated as follows:
9-10: We will not increment the counter for hour 9, as we are not inside the office for the complete hour. We will increment the counters for minutes 56, 57, 58 and 59 under hour 9 (minute 55 is only partially covered), and the second counters for seconds 50-59 under minute 55. No other counter needs to be updated for this hour.
10-11: We will increment the hour 10 node only, as we are inside for the whole hour.
11-12: We will increment the hour 11 node only, as we are inside for the whole hour.
12-13: We will increment the hour 12 node only, as we are inside for the whole hour.
13-14: We will increment the hour 13 node only, as we are inside for the whole hour.
14-15: We will increment the hour 14 node only, as we are inside for the whole hour.
15-16: Again we are not inside for the whole hour, so we will not increment the hour node but only minute and second nodes, as in the case of 9-10: minute counters 0-4 under hour 15, and second counters 0-4 under minute 5.
So using this technique we need to update at most a constant number of nodes in the worst case for an employee (22 hour nodes, plus up to 59 minute nodes and up to 59 second nodes at each partial end), which is not bad considering that we need to build the data structure only once and it makes the search operation constant time.
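A minimal sketch of this insertion logic in C, assuming check-in/check-out times are given as seconds since midnight; the array and function names here are my own, not from the original discussion:

static unsigned hours[24];            /* present for the whole hour h */
static unsigned minutes[24][60];      /* present for whole minute m of a partial hour h */
static unsigned seconds[24][60][60];  /* present for second s of a partial minute m */

/* Record one presence interval [start, end), in seconds since midnight. */
void insert_interval(int start, int end)
{
    for (int t = start; t < end; ) {
        if (t % 3600 == 0 && t + 3600 <= end) {     /* a whole hour fits */
            hours[t / 3600]++;
            t += 3600;
        } else if (t % 60 == 0 && t + 60 <= end) {  /* a whole minute fits */
            minutes[t / 3600][(t / 60) % 60]++;
            t += 60;
        } else {                                    /* a single second */
            seconds[t / 3600][(t / 60) % 60][t % 60]++;
            t++;
        }
    }
}

Note that plain static arrays are used here, so no child pointers are needed, matching the array representation mentioned in the space-complexity section below.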
Search
=======
Now during searching, if we want to find the number of employees present at, say, 10:20:30, we go to node 10 of the first level and get the value of the counter there. It represents all the employees who were inside the office for the whole hour from 10 to 11. Next we go to minute node 20 of the second level, whose counter indicates the number of employees who were present for that whole minute during the hour. Then we go to second node 30 at the third level and get the counter value there, which indicates the employees who were inside only during that second of the minute. Adding these 3 counters gives our result.
So it just took us 3 lookups, i.e. constant time.
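Continuing the sketch above, the lookup is just three array reads; the function name is again my own:

/* Number of people present at time h:m:s, using the arrays defined above. */
unsigned query(int h, int m, int s)
{
    return hours[h] + minutes[h][m] + seconds[h][m][s];
}

For the worked example, calling insert_interval(9*3600 + 55*60 + 50, 15*3600 + 5*60 + 5) and then query(10, 20, 30) returns 1, since only hours[10] was incremented for that instant.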
Space complexity:
==============
We need to store 24*60*60 nodes. The size of each node is 4 bytes to store the counter (assuming an unsigned int on a 32-bit machine). We can avoid using a pointer to represent the next or child node by using some clever array arithmetic, as the number of nodes is constant and known.
So the total requirement is 24*60*60*4 bytes = 337.5 KB, which is not at all large considering the search complexity we are achieving with this solution.
If the difference between the highest price and the lowest price of the books is not large, I think we can simply keep an array of size [highest_price - lowest_price + 1] and increment the relevant index by 1 whenever we see a book with that price. This solution takes O(N) time to fill the array and O(array size) time to find the most popular book from the array.
But if the interviewer is looking for a more complex solution, with the constraint that the difference between the highest price and the lowest price is too large, then this is not a good solution, as the memory the array allocation takes is too much. In fact it depends on whether the price distribution is sparse or dense, and we can accordingly use hashing or the array method respectively.
If the requirement is to find it using O(1) space then I think an in-place sort is the best technique.
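A minimal sketch of the counting-array approach, assuming prices are integers in a known range [lo, hi]; the function name and parameters are illustrative:

#include <stdlib.h>

/* Returns the most frequent price among n books, assuming every price
   lies in the known range [lo, hi]. O(N) to count, O(hi - lo) to scan. */
int most_popular_price(const int *prices, int n, int lo, int hi)
{
    int range = hi - lo + 1;
    int *count = calloc(range, sizeof *count);  /* one slot per possible price */
    for (int i = 0; i < n; i++)
        count[prices[i] - lo]++;                /* O(N) fill */
    int best = 0;
    for (int i = 1; i < range; i++)             /* O(range) scan for the max */
        if (count[i] > count[best]) best = i;
    free(count);
    return lo + best;
}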
IMO option 4 is the best and correct solution to this problem. Allocating 512 MB of memory (a bitmap with one bit per possible 32-bit value) for such a large set of input should not be a problem. However, if the interviewer wants you to reduce the amount of memory required, the same logic still applies, with a little change and slightly more, but still constant, processing time.
What we will do is divide the processing into N batches, depending upon the memory restriction imposed by the interviewer. Consider N to be 2 for this example, so we divide the work into 2 passes. In the first pass over the 5 billion numbers we only consider numbers from 0 to 2^32/N = 2^31 and accordingly allocate 2^31/2^3 = 2^28 bytes = 256 MB of memory. In the second pass we consider elements in the integer range 2^31 to 2^32, again allocate 256 MB, and remove duplicates according to the technique mentioned in point 4.
Hence, by applying the same algorithm to different ranges of numbers, I have reduced my memory footprint by half. We can reduce it further by incorporating more passes.
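A minimal sketch of this multi-pass bitmap idea with N = 2, assuming the input is a binary file of 32-bit values and each unique value is written out the first time it is seen; the function name and I/O format are my own:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PASSES 2                      /* N = 2: two passes, 256 MB each */

void remove_duplicates(FILE *in, FILE *out)
{
    uint64_t span = (1ULL << 32) / PASSES;   /* 2^31 values per pass */
    uint8_t *bits = malloc(span / 8);        /* 2^28 bytes = 256 MB */
    for (int p = 0; p < PASSES; p++) {
        uint64_t lo = p * span;
        memset(bits, 0, span / 8);
        rewind(in);                          /* re-read the input each pass */
        uint32_t v;
        while (fread(&v, sizeof v, 1, in) == 1) {
            if (v < lo || v >= lo + span)
                continue;                    /* outside this pass's range */
            uint64_t off = v - lo;
            if (!(bits[off >> 3] & (1u << (off & 7)))) {
                bits[off >> 3] |= 1u << (off & 7);  /* mark as seen */
                fwrite(&v, sizeof v, 1, out);       /* emit first occurrence */
            }
        }
    }
    free(bits);
}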
How can we use this BST to find the nth highest-frequency element? We are inserting elements into the tree based on the value of the element, not its frequency, so after constructing the complete tree we can efficiently locate an element by its value, but not the element with the highest frequency. We would need to go through every element in the tree to find it, which is as good as scanning a linked list or any other data structure. A BST has no particular advantage in this case.
- tarutrehan June 13, 2012