## Epic Systems Interview Question for Software Engineer / Developers

Country: United States
Interview Type: Phone Interview

Comment hidden because of low score. Click to expand.
2
of 4 vote

The answer is a K-D Tree: Split on the m problems. Each leaf represents a combination of the problems, and stores a list of references to patients.

Assuming that m is significantly smaller than n:
Time to search is O(1) - basically the height of the tree, which is m
Space is O(n) since no patient objects have to be duplicated

Edit: This is assumed to be an OPTIMIZED tree. Meaning: the tree would ONLY store enough paths to divide all the patients.

A simplified example: There are 100 diseases, and only 3 of those diseases are seen in all the patients. Some patients have only disease A, and the rest have both disease B and C. Then, the tree will only have a single node, with 2 children (not 2^100 children). The left child stores all patients with disease A, and the right child stores all diseases WITHOUT disease A. In practice, trees like this will be very small, relative to your data set. They're used in machine learning and indexing for geographic related searches.

Comment hidden because of low score. Click to expand.
0

I think this is the best solution. It's basically a balanced binary tree. Each path represents a combinations of problems and leads to the list of patients with these problems.

Comment hidden because of low score. Click to expand.
0

Remember the question is to find all the patients with at least the 3 particular diseases. It actually takes O(2^(m-3)) to find all the patients from the tree.

I think the best solution depends. If n >> 2^m, then the tree solution is good. If
n is comparable or even less than 2^m, a simple 2D array may be better.

Comment hidden because of low score. Click to expand.
0

1. The problem states that n is relatively large compared to m
2. I didn't state it, but the tree would ONLY store enough paths to divide all the patients.

A simplified example: If there are 100 diseases, but only 3 of those diseases are seen in all the patients. Some patients have disease A, and the rest have both disease B and C. Then the optimized tree will only have a single node, with 2 children (not 2^100 children). The left child is all patients with disease A, and the right child is all diseases WITHOUT disease A. In practice, trees like this will be very small, relative to your data set.

Comment hidden because of low score. Click to expand.
1
of 1 vote

^counter argument for above suggestion:
The problem demands solution for "find all the patients who have at least the 3 problems from the set."
So if you create a 2D array, you can surely access required disease in O(1). However, while searching for one patient who has at least 3 diseases, you have to access all the disease columns in worst case. So for one patient, the time complexity in you DS will be O(m). Even though m is small, this seems a trivial solution. The time complexity in your case will be O(n x m) for all the patients. Please correct me if I am wrong.

Comment hidden because of low score. Click to expand.
2

Maintain a extra column (Tot_diseases) containing total number of diseases per patient. Increment this attribute while inserting 'T' for a particular disease. This way time complexity will be O(1).

Comment hidden because of low score. Click to expand.
0

@khunima can you explain the additional column comment in some detail how would it reduce the time complexity to O(1)? (based on the argument to which it is commented)

Comment hidden because of low score. Click to expand.
0

Explanation for what @khunmia is trying to say: maintain a count of the number of diseases each patient has using a separate column. So now there are m+1 columns(m diseases + one Tot_diseases column). When returning the patients with atleast 3 diseases, check if the value of the Tot_diseases column is >=3. If yes, return that patient. This makes it O(1) to check if a patient has atleast 3 diseases and the over all time O(n)

Comment hidden because of low score. Click to expand.
0

searching diseases for each patient is not o(n), because we have know the index of every disease. Even we don't know that, we can use ArrayList<HashMap<String, Boolean>> to do that.

Comment hidden because of low score. Click to expand.
0
of 0 vote

I would use a hashtable where patients are mapped to the keys and the problems are values.

Comment hidden because of low score. Click to expand.
0
of 0 vote

Not sure how one can search based on values in a hash map?

Comment hidden because of low score. Click to expand.
0

That's impossible, you have to iterate all the values of the hash map to find it.

Comment hidden because of low score. Click to expand.
0
of 2 vote

Two dimensional array of size n x m work?
n - no of patients
m - no of diseases.

since n > m, one might want to go over the n less no of times compared to m. For every patient (i.e. for each n) , inspect the required diseases ( at specific columns in the row for the patient) in constant time, as array look is O(1).
time complexity would be O(n).
space complexity would be O(n x m)

Hi Jobseeker,
What was the answer you came up with?

Comment hidden because of low score. Click to expand.
0

The data structure I came up with was also the dynamic 2 dimensional array. For space and time its efficient. Space complexity is O(n+m) and the time complexity to find an element is constant.

Comment hidden because of low score. Click to expand.
0
of 0 vote

We could use integers represented by m-bits. Each bit could represent a disease(boolean values). space complexity is now O(n). But time complexity is still O(n*m):
Let x represent the set of given diseases as bits. Bitwise and it with all 'n' numbers. count no. of set bits in O(m) for each patient.

If we could store the no. of bits stored in all 2^m bits stored before hand:
space complexity: O(n)+O(2^m).
time complexity: O(n)

Comment hidden because of low score. Click to expand.
0
of 0 vote

I think in this case, using a database and create a table would be a good choice, the table stores all the patients and what kind of diseases they have.
Or for another thought, using XML to store this kind of information?

Comment hidden because of low score. Click to expand.
0

The interviewer was keen on knowing the specific datastructure I would use. I never asked him about the DB part, I thought any of our specific datastructures would be the answer.

Comment hidden because of low score. Click to expand.
0
of 0 vote

I would create a Map of problems, which would have two keys "T" and "F" with their values being a List of patients corresponding to that problem's key.

To find the patients with all three problems, I would create an empty list. I would then get the list of patients with one problem (eg: m_diabetes.get("T")) and iterate over that list and adding a patient to the empty list if the patient is in the other two lists (ie: m_liver.get("T") and m_kidney.get("T")).

``````Initialize empty list of Patients, m_all
for Patient p in list of patients with problem m_diabetes
if p in list of patients with problem m_kidney
if p in list of patients in problem m_liver

My rationale is this; using the problem possibilities as the keys and the patients as the values allows us not to need to iterate over every patient to see what problems they have. It also allows us to "short-circuit" our search if say the patient doesn't have one of the problems.

My algorithm complexity analysis is super-rusty, so I'm not sure how much better this is.

I'd appreciate any counter-arguments or proofs.

Comment hidden because of low score. Click to expand.
0
of 0 vote

As it is request for Datastructure...
We can use Ordered double Linked list.

struct FindPatientProbelm
{
long Patient_ID; //Unique...
string Patient_Name;
string Patient_Problem;
struct FindPatientProblem *Left;
sturct FindPatientProbem *Right;
};
typedef struct FindPatientProblem *Node;

Need to have three Pointers to traverse;
First --> Always pointing to first node.
Mid --> For implementing search pattern,
Last --> To point last node.

Applying binary search..

Comment hidden because of low score. Click to expand.
0
of 0 vote

Asked me the same question, but there was little variance. He asked how soon can you find out a disease which has maximum patients infected. Next, find maximum number of patients who have common maximum common diseases.

Comment hidden because of low score. Click to expand.
0
of 0 vote

We can have two hash table
1. key as 'problem' value as List of 'patients'.
2. key as 'patient' value as List of 'problems'

Basically, its type of indexing we have create for making search easy in all possible scenario.

Comment hidden because of low score. Click to expand.
0
of 0 vote

How about a Bipartite graph where we have two sets. One of patients and other of problems. An edge exists between a node in patient set and a node in problem set if that patient has that problem. That way picking a problem from the set we can easily find out any 3 nodes it is connected to in patient set.

Comment hidden because of low score. Click to expand.
0
of 0 vote

The answer to this question is using a maxheap. You can find the top indicators for the n patients and if more patients are added, then maxheap can fix itself and sort the tree

Comment hidden because of low score. Click to expand.
0
of 0 vote

Why not use HashMap<Person, HashSet<Disease>> map ?
Then we iterate through all the people, then it should be O(n).

if( map.get(i).contains(diabetes) && map.get(i).contains(..) && map.get(i).contains(..)){
}

return res;

Since the .contains() of HashSet should run in O(1), then the whole TIME complexity can be O(n) ?? Just asking. Not sure.

Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

### Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

### Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.