Google Interview Question
Developer Program EngineersHi guys,
Can anybody explain a little bit how to merge the middle node if conflict happens? Thanks a lot.
12-1
8-9 3-4
This binary tree contains 3 children. In the above appointment table, if 4th appointment request from 12.30 - 1.30, so there is conflict. Now this appointment will be added to middle node. Like below
12-1
8-9 12.30-1.30 3-4
Again if 5th appointment comes from 11.30-12-30, this will be linked to middle node of 12.30-1.30 node... and so on...
If another conflict appointment request comes like 8.30-9.30 then the tree will be like below
12-1
8-9 N 12.30-1.30 N 3-4
NULL 8.30-9.30 NULL N 12-12.30 N
What happens if the conflict spreads over multiple appointments
e.g. in above example if another appointment comes as 8.45 to 1.30, where will that node go?
You already have the appointments list and you have to find conflicts in the existing list of appointments.
Your solution determines whether a new appointment conflicts with the current appointments.
Moreover, creating a tree is > O(n).
The BST solution does not work if the conflicting appointment spans over to conflict with further appointments.
Example: (1,4), (2,8), (5,8)
Now you 1st process (1,4) and create a node for it.
Add (2,8) as its middle child
You add (5,8) as its right child
Clearly using your solution (5,8) will not be marked as conflicting even though it is. This BST only gives you conflicting appointments wrt the root node.
1. Sort All the appointments based on start time and check the following condition.
Consider App1 (S1,E1) App2 (S2,E2)
Non-Overlap Condition:
S1 < S2 && E1 < S2
Start and end of Appointment1 is before start of appointment2.
Sorting algos have a complexity of O(nlogn) while the above bucket solution has a overall complexity of O(n).
Also, how does `Consider` pick the two appointments to test if they collide? It needs to check all, which is quadratic.
Assuming the input is validated i.e. endTime > startTime
heapify(apntmnts) on start time (this is binary heap) -> complexity[log(n)]
foreach aptn in heapified array, linearly search
if (endTime of current aptn > startTime of immediate (two) decedent aptns) then set hasConflicts=true
... complexity[o(n)]
total is log(n)+o(n) = o(n)
correct me if wrong
If I get ur suggestion right, you are trying to create a heap using the arraylist of Appointments but add a node to a is of complexity O(nlog(n)) and resultant heap will be of height log(n).
I am firstly confused on how the time represented in long datatype. If we can figure out that, we can find a way to have a hashmap or lookup table and that will help to bring the overall complexity to O(n). Do you have any idea on how time is represented in long?
First: my analysis was wrong and I meant to say building of heap will take at most nlog(n) but this is upper bound and is not asymptotically tight, can be proven as O(n).
Second: As far as representation is concerned, I am assuming time as long double e.g.
startTime=1.156
endTime=1.205
For hashmap/lookup you will need additional space. also will you need double set/map based upon start and end time?
In practice, it would be better assuming the start and end time would not be double num, should be like "15:30". And if we are talking about setting up appointment on calendar, the gap would be the times of 5 mins (e.g. "15:30 - 15:40"). So we can think of using baskets on a time line to indicate whether there have already had an appointment. That would be definitely O(n).
Sounds good. With the assumption of 8 working hours starting from 9:00 to 17:00 and each hour having 12 buckets (each bucket representing 5 mins duration) the total number of buckets needed will be 12*8 = 96. If we create a byte array of size 96 this extra memory required will be 96 bytes. Its better especially in the asymptotic value of size of given Arraylist becoz no matter how big a Arraylist is given, we make use of this constant 96 bytes to detect the conflicts. Does this sound fine??
I think that sounds good, even if you go down to 1 min as min separation i.e. 12.13 to 13.45, you can still create a byte array of (24*60) bytes and mark 1 for endTime-startTime places, and quickly find for overlap.
The algorithm you described is basically "signal convolution" but the algorithm gets worse when the signal is sampled at a much lower rate i.e. when appointments are by seconds.
Consider a similar problem in stock market, given all the time values of when a stock was above its average over 1 yr. Find if the stock was above average in a specific period. (Here the time sampling is at a rate of micro seconds. the algorithm gets drastically worse in this case).
finally my point being the algorithm is sensitive to a different dimension i.e sampling rate.
@Sri,
Consider the following case:
(format is appointment name, start time, end time)
(a, 1, 5)
(b, 7, 10)
(c, 9, 12)
(d, 2, 5)
(e, 3, 6)
Now let us say the array comes in the following order:
[b, c, a, d, e]
After you do the bubble-down procedure on start-time (this is the procedure that can build heap in O(n)) - you will have the following:
[a, d, b, c, e]
Please draw a heap diagram, to help in understanding maybe.
Now, using your algo, we can detect a has conflict (by comparing with d), d has conflict (by comparing with e). But we can't conclude that b has conflict with c, since c is not b's child in the heap. b has no children. How do you handle this?
@wangbingold: because of first assumption that startTime<endTime. With this assumption we always make sure that we dont have to "look" back to parent item because that will already be processed. Above algo can further be optimized with current and forward processing i.e. if we find conflict, we mark current node as well as the node with wich the conflict occurs.
I guess Sri mean that process the heap top, remove the heap top and maintain the heap, process the next top, remove the next top and maintain the heap... We can only guarantee that the heap top is biggest, we cannot guarantee that the left child is smaller or bigger than the right child. Without removing top and maintaining the heap, I don't see how it can work. Please correct me if I miss anything. Thanks.
How about just marking the booked times while iterating through the appointments and reporting any conflicts as they are found. This booking can be done on a time unit array.
Use counting sort to sort appointments based on startTime.
First sort this based on millisecond, which takes O(n+1000)=O(n)
Then sort appointments based on second, which takes O(n+60)=O(n)
Then sort based on minute, which takes O(n+60)=O(n)
Then day,and month and finally year,each round takes O(n), total 6 rounds which takes 6*O(n)=O(n).
After sorting, check conflicts can be done in O(n) time.
This requires O(n) space.
Correct me if I'm wrong.
The only problem I see with using byte array is that we are assuming their size with the minute gap or 5 minute or second gap but the questions have not specified any duration, so I can set up an appointment for say after 10 days or 15 days.
How do we handle this. In this case the size would differ with each day added .
Could we assume approx number of days?
Use a BST approach
let the appointment list be as follows
1-2
11-1
4-6
12-1
5-7
8-10
1-2
11-1 4-6
i/p 12-1 it would first go to the left of root but then
clashesh with the left child
as it doesnot satisfy the condition
start1<start2 && end1<start2
start1>end2 && end1>end2
similar collision for 5-7
1-2
11-1 4-6
8-10
Actually this is simplified version of 'Exon Chaining algorithm'.
In first pass : Create nodes for all the times (start and end time separately) and add it to a hash such that given a time, we can retrieve all the nodes associated.
Also get the smallest and largest times. THis can be done in O(n).
In second pass:
Go through all the times (we can just go through intervals if this is calendar appt). For each interval, get the node (if available) at that time.
If this is start node, simply add it to a stack (stack of all open nodes)
else
If this is close node, then go through the stack of all open nodes till you hit the open node for the current close node. Mark all these node's conflict = true. Remove current node with closed node and retain the other open nodes in the stack (as they might have conflict with other nodes).
This is a line sweep algorithm.
sweep line are nlogn...removing nodes out of your stack will take max O(n) therefore your algo will be O(n^2)
My approach is a bit different and does have O(k*n) complexity, where k is a constant.
Scan the appointments to get the earliest start time - complexity = O(n)
Scan the appointments to get the latest end time - complexity = O(n)
The range is start time - end time (in minutes).
Outer loop for every minute, from start time to end time (constant)
- Inner loop through all the appointments to check whether the loop time falls between any of the appointments' start and end times.
Were these phone interview for in person questions?
Do you remember any of your phone questions?
First phone interview question:
- write a method to return the unique characters in it
- write junit test cases for above method
Second phone interview question:
Given a string and dictionary as a input, write a method to return all the anagrams of the input string in the dictionary.
This was asked to me in a phone interview.
- Anonymous January 09, 2011Try building a tree like bst. Each node is an appointment. But each node has 3 children
left - appts completely less than the node
middle - appts merging with the node
right - appts completely greater than the node.
Now traverse and mark all middle children and all nodes which has a middle child
Got through the phone interview, but screwed up onsite.