LinhHA05
BAN USERThe idea is to use quaternary search tree so the complexity T(m*n) = 3 * T(m*n/4) + c it results in O((m*n)^log 3 base 4) complexity - it is not the best result available, see below discussion for a better result
#include <iostream>
#include <cassert>
bool searchHelper(int v, int* a, int n) {
if (a[0] > v || a[n-1] < v)
return false;
int cx = n / 2;
if (a[cx] == v)
return true;
return (a[cx] < v) ? searchHelper(v, a + cx + 1, n - cx - 1 ) : searchHelper(v, a, cx);
}
bool searchHelper(int v, int* a, int m, int sx) {
if (a[0] > v || a[(m-1) * sx] < v)
return false;
int cy = m / 2;
if (a[cy * sx] == v)
return true;
return (a[cy * sx] < v) ? searchHelper(v, a + (cy + 1) * sx, m - cy - 1, sx ) : searchHelper(v, a, cy, sx);
}
bool searchHelper(int v, int* a, int n, int m, int sx) {
if (n == 1 && m == 1)
return a[0] == v;
if (a[0] > v || a[n-1 + (m-1) * sx] < v)
return false;
if (m == 1) // binary search{
return searchHelper(v, a, n);
if (n == 1)
return searchHelper(v, a, m, sx);
int cx = n / 2;
int cy = m / 2;
int p = a[cx + cy * sx];
if (p == v)
return true;
if (p > v) {
return searchHelper(v, a, cx, cy, sx)
|| searchHelper(v, a + cx, n - cx, cy, sx)
|| searchHelper(v, a + cy * sx, cx, m - cy, sx);
}
else
return searchHelper(v, a + (cx + 1) + (cy + 1) * sx, n - cx - 1, m - cy - 1, sx)
|| searchHelper(v, a + cx + 1, n - cx - 1, cy + 1, sx)
|| searchHelper(v, a + (cy + 1) * sx, cx + 1, m - cy - 1, sx);
}
int main()
{
int a[4][4] = {{1,14,25,35},
{2,16,28,38},
{5,20,28,40},
{16,22,38,41}};
bool found = true;
for (int i=0; i< 4; ++i)
for (int j=0; j< 4; ++j) {
bool res = searchHelper(a[i][j], &a[0][0], 4, 4, 4);
found = found & res;
}
assert(found);
assert(!searchHelper(3, &a[0][0], 4, 4, 4));
return 0;
}
Call the string with wild card s1, string without wild card is s2
The idea is use branch and bound so that we can quickly define when we can not proceed matching process any futher. Observation to quickly stop matching as going further will not result in a match
- The length of s1 is longer than s2 then, it will not match since wildcard should have at least 0 length
- If we see wild card at the beginning then start checking for matching at the end where both are different from wildcard and immediately stop when mis-match is found
Futher optimization:
- Keep the length of string available all the time since we can update length in 0(1) while the length of a string might take 0(N) to determine.
- Keep the number of wild card available since trivial case when there is only one wild card *, it will be replace by l2 - l1 of '?' which can be equal to anything (we can do better than this by keeping the matching position)
The complexity is a litte bit hard to predict, it take lineartime for the matching test when there is only one wild card, time complexity depend on the number of wild card and length different (it is 0(1) if s1 is longer than s2 :)).
#include <cassert>
bool isEqual(const char* s1, size_t l1, const char* s2, size_t l2) {
for (int i=0, j = 0; i< l1; ++i, ++j) {
if (s1[i] == '*')
j = j + (l2 - l1);
else if (s1[i] != s2[j])
return false;
}
return true;
}
bool MatchingHelper(const char* s1, size_t l1, size_t w, const char* s2, size_t l2) {
if (l1 - w > l2)
return false;
if (w == 1)
return isEqual(s1, l1, s2, l2);
if (s1[0] == '*') {
for (;s1[l1-1] != '*'; --l1, --l2)
if (s1[l1-1] != s2[l2-1])
return false;
int maxw = l2 - l1 + w;
bool match = false;
for (int i=0; i<= maxw; ++i) {
match = MatchingHelper(s1+1, l1 - 1, w - 1, s2 + 1 + i, l2 - i - 1);
if (match)
return true;
}
return false;
} else {
if (s1[0] != s2[0])
return false;
return MatchingHelper(s1+1, l1 - 1, w, s2+1, l2 -1);
}
}
bool isMatch(std::string s1, std::string s2) {
size_t l1 = s1.length();
size_t l2 = s2.length();
size_t w = 0;
for (int i=0; i< l1; ++i)
w += (s1[i] == '*');
return MatchingHelper(&s1[0], l1, w, &s2[0], l2);
}
int main()
{
assert(isMatch("b*ba","bbba"));
assert(isMatch("b**a","bbba"));
assert(isMatch("*bba","bbba"));
assert(isMatch("bb*a","bbba"));
assert(isMatch("bbb*","bbba"));
assert(!isMatch("bb*c","bbba"));
}
The probability one person hit the stump twice is E(X) = P^2
Assume that n people (n= 5) are n independent variables (the result from one does not influence result of the others) then the expected number of people who hit both time is
E(X1) + E(X2) + .. E(Xn) = n E(X) = n * P^2
Quick validation if P = 1, always hit, and all the people hit twice then there are 5 people hit
P = 0, always failed, non-of-them hit, then we expect 0 as the result
There are several part of the question
1. Fastest sorting algorithm for string depend on the input
- if all the input string has equal size: radix sorting is the best choice with complexity of O(k*n) with k is the size of the string. Space complexity O(n) since it can not be done in-place
- if the input has very different size: and the longest string can be large > 20, quick sort is generally favored . The space complexity is O(1), it can be done in-place
2. Sorting when the number of string is large. When he mean "large", it should be inferred as large than memory can handle (for both in-place or out-of-place)
The answer for the case that we can sort in the memory I have mentioned above.
Now I assume we need to do external sorting.
One potential solution (that can combine with the other in-memory method that I mentioned above)
- Sort all string based on the length of the string
- you need only a linear scan to collect the length
- you need another linear scan to put string on to the group based on the length
- Based on the counting histogram, divide the pre-sorted string into group that fit onto the memory then sort
- Chose radix sorting if the length in the group uniform
- Chose quicksort if the length is non-uniform or the number of string is small log N is a small number
- At this point the only extra step is to merge between sorting string elements of the same length in consecutive groups.
Known best solution for longest palindrom 0(n) "akalin.cx/longest-palindrome-linear-time", I found the idea is quite similar to string matching in that we need to restart the search where it is most make sense (to give a potential better result). While DP solution is generally used, the solution is not any better than a simple solution that consider all position a potential center of the palindrome and then extend its two-ends to the maximal length.
- LinhHA05 July 24, 2013Use fix size queue (size = n), and push the data from one end pop the data from the other end when the queue is full. We want to maintain the total and update total as well so that it takes constant time to compute the average
if the average value is required everytime new element come (in the streamming applications)
template<typename T>
class AvgQueue {
queue q;
T total;
public:
AvgQueue(): total(0);
void push(T v) {
T old = 0;
if (q.size() == n) {
old = q.front();
q.pop();
}
q.push(v);
total += v - old;
}
T get() {
return (q.empty()) ? 0 : total / q.size();
}
}
My order of preference
Performance
* Reference
* Pointer
* Object
Maintability/Debug
* Reference
* Object
* Pointer
Reason:
- Performance: Reference and Pointer are equaly well since it introduce no overhead, object might require copy which is very expensive if the size of the object is large. Reference is better then pointer since it require no extra test to see the refered object exist.
- Maintainability: it is generally well-know that raw pointer cause more trouble so people prefer to use smart pointer, you have to be very careful to use the raw pointer to make sure the refered object is valid and clean up the memory after all. Reference can be used anywhere object can be used.
@As mentioned in the below comment. It would be better to use const reference (const object and pointer) if you don't want to change the value of refered objects, it also help compiler to produce more optimized code
Call the floor where the edge break is m.
The sub-optimal solution for the infinite problem for two eggs is to use the sequen 1, 2^2, 3^3, ,i^2, ... and start the second edge where the last value the first egg remains. So if the first edge remains at n^2, then the next one will do at most 2*n + 1 - 1 (minus 1 from the first) test to give the total of 3 * n in the worst case with n = sqrt(m). This is much better bound than the doubling strategy that result in log(m) + m/2 = 0(m) in the worst case.
The optimal solution if the number of floor is known is pointed out by PKT, but he is wrong in the first case
It is simply wrong, the function grows almost as fast as doubling function.
- LinhHA05 July 26, 2013