VMWare Inc Interview Question
Senior Software Development EngineersCountry: United States
Interview Type: Phone Interview
Use Binary search and instead of sending data, only send sh1 or md5 of the data.
Slow network is the key so packet drop, retransmission, etc has to be avoided at any cost.
1) Send MD5 of first half of data. If it matches, send true else false.
2) if(ture) {MD5 of half of second half} else { MD5 of half of first half}
3) It should go on till the time it comes down to some bytes under or equal to MTU. Once length of data for which we are doing md5 reaches into MTU limit, then just send the data directly and compare the data in for loop.
An improvement on Flux's solution:
In each step only calculate the hash of half the sequence, not both halves. If the half isn't equal, then the is in that half, otherwise it is in the other one.
Another improvement would be to keep the values of the first half calculations on the half been calculated, and if the change is in this half, then you already have the calculation of one of the lesser halves.
I.e.:
We want to calculate on the first half of 1 billion, 500 mil.
so we can already stop the hashes for the first halves in the 250 mil, 125 mil, 64.5 mil etc, since we are already calculating the whole hash.
If the difference isn't in this half, no harm done, since we didn't add to the calculations.
I think mittalrishabh has the correct solution.
We need to minimize the communication between these serves as much as possible.
In the binary search case the solution is log(billion) transmissions in worst case.
In XOR case it is only 1 transmission in worst and best case.
mittalrishabh has a solution but not complete...
The number of characters are same on both sides, (and it is not missing on one side), so if P is the char on one side, other side of will have a P'.
And after XOR, you will get (P XOR P').
The complete solution will be -
(1) P XOR P' will have some bit set, which means P and P' differ on that bit.
(2) Divide every element in 2 sets... one with that bit as 0 & another with that bit as 1. (Note that both sets will span across both machines, but we will not merge them)
(3) P will be in set 1, P' will be in set2. XOR again within set-1 that will yield P & set-2 will yield P'.
@ Rajesh
Your solution is interesting. I am wringing some code to verify it.
However, what is wrong with the count each char's occurrence number and simply compare? Is the performance low or what?
(Update) We can find the two different elements but we could not tell which is from which file using your algorithm. Right?
Introduce some simple hashing of integer sequences (for example, polynomial hash or even simple xor) and locate mismatching byte with binary search.
- Flux February 15, 2017On first step, calculate hash of first and second halves of sequence, send them to other server, check which one doesnt match with other server's one. Repeat for mismatching part of array.
Total O(ln(n)) network interactions, obviously.
Hashes can be calculated in O(n) time and O(1) space, or in O(1) time and O(n) space using partial hashes.