Ivan
BAN USERIs it possible to store additional data with tree node? If so, we can store "count" field in the node that holds the number of elements in the subtree including this element. Then the algorithm can be:
1. Find the node A where x <= A <= y.
2. Find the biggest node that is less than x in the A->left subtree. Lets call it LBorder.
3. Find the smallest node that is greater than y int the A->right subtree. Lets call it RBorder.
4. The result is: A->count - LBorder->count - RBorder->count.
The complexity is O(log N).
If it is not possible to store additional info I am afraid the complexity cannot be better than O(N), but it's quite easy and not interesting :).
Assumption:
1. We cannot load all files into memory (1 TB).
2. Word means separate word (in text "killer" theres is just one word; there is no word "kill").
1. Load several source files into memory (for example 100 files; it takes 1 GB).
2. Build tree (RB-Tree), where key is a word and value is number of occurences.
3. Save into new temp file: each line with format "word number".
4. Repeat 1-3 until process all files.
5. Open 2 temp files, use external merge sort by reading string from each file, if equal words appear then merge then into one record, write result into new temp file. Maintain the most frequent word (s).
6. Delete processed temp files.
7. Repeat until we have one file.
struct Point {
Point(int _x, int _y) : x(_x), y(_y) {}
int x;
int y;
};
vector<Point> GetPoints(const vector<vector<int>>& f)
{
//1 - west cost
//2 - east cost
size_t rows = f.size(), cols = f[0].size();
vector<vector<int>> m(rows, vector<int>(cols));
for (size_t i = 0; i < rows; ++i) {
m[i][0] |= 1; //left edge is a west
m[i][cols - 1] |= 2; //right edge is a east
}
for (size_t i = 0; i < cols; ++i) {
m[0][i] |=1; //top edge is a west
m[rows - 1][i] |= 2; //bottom edge is a east
}
for (size_t r = 1; r < rows; ++r) {
for (size_t c = 1; c < cols; ++c) {
//mark all points where I can reach west coast from
if ( (m[r][c - 1] & 1) && f[r][c - 1] <= f[r][c]) {
m[r][c] |= 1;
} else if ( (m[r - 1][c] & 1) && f[r - 1][c] <= f[r][c]) {
m[r][c] |= 1;
}
//mark all positions where I can reach east coast from
size_t curr = rows - r - 1, curc = cols - c - 1;
if ( (m[curr][curc + 1] & 2) && f[curr][curc + 1] <= f[curr][curc]) {
m[curr][curc] |= 2;
} else if ( (m[curr + 1][curc] & 2) && f[curr + 1][curc] <= f[curr][curc]) {
m[curr][curc] |= 2;
}
}
}
vector<Point> res;
for (size_t r = 0; r < rows; ++r) {
for (size_t c = 0; c < cols; ++c) {
if (m[r][c] == 3) {
res.push_back(Point(r, c));
}
}
}
return res;
}
It's very interesting solution. I didn't expect to see binomial coefficients here :).
But what about complexity.
As I can see time complexity is O(N^2) and memory consumption is O(N).
On the other hand the solution where we the number is represented as a container of digits (for example, getCountDigit() method posted by bhanu.ashwani above) has the same time and memory characteristic. IMHO, that solution is more straightforward and obvious than this.
Correct me about solution comparing if I am wrong.
rit, could you provide an example where the algorithm fails?
I provide my examples. In a BST bellow each node contains a number and count in braces.
- Ivan April 27, 2015