Tsuna00
BAN USERThe answer is a K-D Tree: Split on the m problems. Each leaf represents a combination of the problems, and stores a list of references to patients.
Assuming that m is significantly smaller than n:
Time to search is O(1) - basically the height of the tree, which is m
Space is O(n) since no patient objects have to be duplicated
Edit: This is assumed to be an OPTIMIZED tree. Meaning: the tree would ONLY store enough paths to divide all the patients.
A simplified example: There are 100 diseases, and only 3 of those diseases are seen in all the patients. Some patients have only disease A, and the rest have both disease B and C. Then, the tree will only have a single node, with 2 children (not 2^100 children). The left child stores all patients with disease A, and the right child stores all diseases WITHOUT disease A. In practice, trees like this will be very small, relative to your data set. They're used in machine learning and indexing for geographic related searches.
2 issues with your response:
- Tsuna00 November 13, 20141. The problem states that n is relatively large compared to m
2. I didn't state it, but the tree would ONLY store enough paths to divide all the patients.
A simplified example: If there are 100 diseases, but only 3 of those diseases are seen in all the patients. Some patients have disease A, and the rest have both disease B and C. Then the optimized tree will only have a single node, with 2 children (not 2^100 children). The left child is all patients with disease A, and the right child is all diseases WITHOUT disease A. In practice, trees like this will be very small, relative to your data set.