showell30@yahoo.com
Code up a simple Bloom Filter for four-letter words. Assume all words are lowercase, exactly four letters long, and only use the letters a to z. An example set of test words might be able, band, bend, card, darn, form, grab, harm, look, make, mate, peer, and team, but assume the actual list of words is much bigger.
- showell30@yahoo.com in United States
Amazon | Software Engineer / Developer | Data Structures
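No code was posted with the question, so here's a minimal sketch of one possible answer in Python. The bit-array size and the simple salted hash family are illustrative assumptions, not part of the problem statement:

class BloomFilter(object):
    def __init__(self, num_bits=1<<16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, word):
        # Treat the word as a base-26 number, then derive
        # num_hashes salted bit positions from it. (Any decent
        # hash family works; this one is just easy to show.)
        x = 0
        for ch in word:
            x = x*26 + ord(ch) - ord('a')
        for salt in range(self.num_hashes):
            yield (x*2654435761 + salt*40503) % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos] = True

    def might_contain(self, word):
        # False means definitely absent; True means probably present
        # (Bloom filters allow false positives, never false negatives).
        return all(self.bits[pos] for pos in self._positions(word))

bf = BloomFilter()
for w in ['able', 'band', 'bend', 'card', 'darn', 'form', 'grab',
          'harm', 'look', 'make', 'mate', 'peer', 'team']:
    bf.add(w)
assert bf.might_contain('band')
print bf.might_contain('zzzz')  # almost certainly False, but a false positive is possible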
Whenever you recurse on reverseIOT, you are essentially returning/mutating two values: the node and a new k. The node goes from NULL to an actual node (if found), or remains the same. The number k goes from its original value to a lesser value (if the right tree has nodes), or remains the same. So, in some sense k and node really are similarly mutable. When you call a function that mutates two values, you can tackle it three ways:
1) Have the function mutate both values.
2) Have the function mutate one value and return the other.
3) Have the function return new copies of both values.
You chose solution #2, and I have no problem with that decision, but I'm curious at how you arrived at it.
In my implementation I chose strategy #3. I don't believe I've seen anyone choose strategy #1, but it would be conceptually easy as well: you call a function that sets the node if it's found; otherwise, it decrements k as needed.
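For comparison, here's a minimal Python sketch of strategy #3 (the Node class and names are hypothetical, not the code under review). The recursion returns fresh copies of both values, so nothing is mutated:

class Node(object):
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def kth_biggest(node, k):
    # Reverse inorder traversal (right, root, left).
    # Returns (remaining_k, found_node); no mutation anywhere.
    if node is None:
        return (k, None)
    k, found = kth_biggest(node.right, k)
    if found is not None:
        return (k, found)
    k -= 1
    if k == 0:
        return (0, node)
    return kth_biggest(node.left, k)

root = Node(2, Node(1), Node(3))
assert kth_biggest(root, 2)[1].val == 2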
You're not the only person who chose #2, but it seems asymmetrical to me. The other thing that seems strange to me is that you have a class that does nothing but wrap an integer to mutate it.
I like the clear code and tests, so I am upvoting this, by the way.
Great answer, @sonesh.
One thing to clarify with the interviewer: it's taken as a hypothetical that the actual addition computation is expensive enough to justify the overhead of passing values between the threads. Even for arbitrarily big integers, it's probably gonna cost more to hand off a number to a thread than it is to simply add the same number to a single accumulated sum managed by one core.
Also, since adding numbers is O(N), using a binary tree breakdown is way overkill. For a really long list of numbers, you're eventually gonna do N-1 additions, and the only real engineering concern is keeping the cores busy doing the real work (arithmetic), instead of keeping the cores busy doing synchronization (pure overhead). So, maybe send 10,000 integers at a time to each worker, let them spin a while, and then have each worker update the overall sum after each iteration, using a mutex to manage contention.
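A rough Python sketch of that chunking pattern (worker count and chunk size are arbitrary; in CPython you'd really use multiprocessing for CPU-bound arithmetic, but the coordination looks the same):

import threading

def parallel_sum(numbers, num_workers=4, chunk_size=10000):
    total = [0]  # boxed so the workers can mutate it
    lock = threading.Lock()

    def worker(w):
        # Each worker sweeps its strided share of chunks, sums a chunk
        # locally, and only grabs the mutex once per chunk.
        for lo in range(w * chunk_size, len(numbers),
                        num_workers * chunk_size):
            local = sum(numbers[lo:lo + chunk_size])
            with lock:
                total[0] += local

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return total[0]

assert parallel_sum(range(100000)) == sum(range(100000))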
P.S. If you use the id approach, then you'll be tempted to maintain a second data structure for reverse lookups, but it's not worth it if you're only ever gonna do one reverse lookup. If the problem statement evolves and you're gonna want to do multiple reverse lookups, then you can maintain a second data structure for the eventual reverse lookups, and since ids are sequential, you'd prefer a vector to a hash.
- showell30@yahoo.com March 15, 2013

First solution:
Keep a hash of the most recent pages visited by each customer:
customer_name -> [page_name, page_name, page_name]
Keep a hash of page-tuples and their frequency:
(page_name, page_name, page_name) -> count
Refinement:
Build ids for customers and pages on the fly with two hashes and two integers:
customer_name -> customer_id (w/max_customer_id)
page_name -> page_id (w/max_page_id)
Since it's the same pattern for both, you can encapsulate the id generation with a class that puts a light wrapper around a hash, with an API of get_id_for_name(name) -> id and get_name_for_id(id) -> name.
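Here's a quick sketch of that wrapper (the class name is mine; the methods follow the API above):

class IdAllocator(object):
    def __init__(self):
        self.id_for_name = {}
        self.name_for_id = []

    def get_id_for_name(self, name):
        # Hand out sequential ids the first time a name is seen.
        if name not in self.id_for_name:
            self.id_for_name[name] = len(self.name_for_id)
            self.name_for_id.append(name)
        return self.id_for_name[name]

    def get_name_for_id(self, id):
        # Ids are sequential, so a plain list works as the reverse map.
        return self.name_for_id[id]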
Then your other data structures become more compact (and probably faster, depending on the hash implementation):
customer_id -> [page_id, page_id, page_id]
(page_id, page_id, page_id) -> count
For each line of the file, extract the customer name and page name, and then get their ids. Then update the customer_id's pages, truncating the array as needed. Then update the hash of counts.
At the end sort the values of the count hash to get the most popular page sequence, and then convert the ids back to full names for the final output.
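Putting it together, a sketch of the driver loop (the log file name and whitespace-separated format are hypothetical, and IdAllocator is the wrapper sketched above):

from collections import defaultdict

customers = IdAllocator()
pages = IdAllocator()
recent = defaultdict(list)   # customer_id -> up to 3 most recent page_ids
counts = defaultdict(int)    # (page_id, page_id, page_id) -> count

for line in open('access.log'):  # hypothetical file/format
    customer_name, page_name = line.split()
    cid = customers.get_id_for_name(customer_name)
    pid = pages.get_id_for_name(page_name)
    recent[cid] = (recent[cid] + [pid])[-3:]
    if len(recent[cid]) == 3:
        counts[tuple(recent[cid])] += 1

best = max(counts, key=counts.get)
print [pages.get_name_for_id(pid) for pid in best]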
You don't need it.
- showell30@yahoo.com March 14, 2013

Yep, the key observation is that you have the odd squares (1, 9, 25, 49, etc.) on the diagonal of the NE quadrant of the matrix. You essentially find the relevant odd square number and then compute an offset along an edge or two of the square. This allows you to compute the number in constant time.
- showell30@yahoo.com March 14, 2013

If you're allowed massive preprocessing, you might want to connect your cities in a graph, roughly analogous to a highway system. Start by taking each city, and find its closest three neighbors and make highways between them. This will create a forest of highways (sorry to mix analogies). You may have isolated foursomes of cities that are only connected to each other (e.g. four cities in Alaska). For isolated foursomes, connect each of the cities to its next closest neighbor. After this step, you may still have an isolated cluster of eight cities; if so, repeat the process. After your preprocessing, you'll eventually have a highway system that connects all cities, but your highways will have the property that a trip between two cities always takes one time unit, no matter the actual distance.
Now imagine you're the traveler trying to get closest to home on the highway system. Start at any city, and look at the map to find neighboring cities. There will be at least three cities reachable by highway. Go next to whichever neighboring city is closest to your destination. If you're already in the closest city, you're done.
There are edge cases where this approach won't produce the optimal solution, but it will always be pretty close. Consider continental U.S. geography and the top 50 US cities. Imagine creating a relatively minimal highway system between those 50 cities and think about your route.
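A sketch of the traveler phase in Python (the highway map and coordinate structures are illustrative assumptions):

import math

def greedy_route(highways, coords, start, home):
    # highways: city -> list of neighboring cities
    # coords: city -> (x, y)
    def dist(a, b):
        ax, ay = coords[a]
        bx, by = coords[b]
        return math.hypot(ax - bx, ay - by)

    route = [start]
    while True:
        here = route[-1]
        best = min(highways[here], key=lambda city: dist(city, home))
        if dist(best, home) >= dist(here, home):
            return route  # no neighbor gets us closer; settle here
        route.append(best)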
My numbers were off a bit in the prior comment, but here's the program:
def strategy(n):
    # A user draws a card from 1 to 10, and if
    # he doesn't like it, he can return the card
    # for up to n turns. He acts rationally.
    # Return a tuple of the cards he keeps and
    # his mean expected outcome.
    if n == 0:
        return ([1,2,3,4,5,6,7,8,9,10], 5.5)
    next_keepers, next_outcome = strategy(n-1)
    # This next bit of code could be generalized, but
    # I'm leaving it explicit for clarity.
    if next_outcome < 6:
        return ([6,7,8,9,10], 0.5*8 + 0.5*next_outcome)
    elif next_outcome < 7:
        return ([7,8,9,10], 0.4*8.5 + 0.6*next_outcome)
    elif next_outcome < 8:
        return ([8,9,10], 0.3*9 + 0.7*next_outcome)
    elif next_outcome < 9:
        return ([9,10], 0.2*9.5 + 0.8*next_outcome)
    else:
        return ([10], 0.1*10 + 0.9*next_outcome)

for n in range(10):
    keepers, outcome = strategy(n)
    print n, '%.2f' % outcome, keepers
Output:
0 5.50 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
1 6.75 [6, 7, 8, 9, 10]
2 7.45 [7, 8, 9, 10]
3 7.91 [8, 9, 10]
4 8.24 [8, 9, 10]
5 8.49 [9, 10]
6 8.69 [9, 10]
7 8.86 [9, 10]
8 8.98 [9, 10]
9 9.09 [9, 10]
A rational user draws a second card in exactly 50% of cases, i.e. only if the first card is 1/2/3/4/5. In the cases where he holds the first card, his expected return is 8, i.e. the average of 6/7/8/9/10. In the cases where he draws a second card, his expected return is 5.5. His overall expected return is the mean of 8 and 5.5, which is 6.75. If the user were allowed to return up to two cards to the hat (i.e. up to three turns), then on the first turn, he only holds the 7/8/9/10, knowing the result of the two-turn game. His expected outcome then becomes 0.4*8.5 + 0.6*6.5, or 7.3. In the four-turn game, he only settles with 8/9/10 on the first hand, and his expected return is 0.3*9 + 0.7*7.3, or 7.81. You can obviously write a program to determine the strategy for an arbitrary number of turns. For a high enough n, the user's strategy is eventually to return anything other than a 10, but for low values of n, he needs to be careful about throwing away 6s, 7s, 8s, etc., for fear of getting worse cards on the redraws.
- showell30@yahoo.com March 13, 2013

def best_strategy(prices):
    holding = False
    result = []
    for i in range(len(prices) - 1):
        if not holding and prices[i+1] > prices[i]:
            result.append(('buy', 'day %d' % i, prices[i]))
            holding = True
        elif holding and prices[i] > prices[i+1]:
            result.append(('sell', 'day %d' % i, prices[i]))
            holding = False
    if holding:
        i = len(prices) - 1
        result.append(('sell', 'day %d' % i, prices[i]))
    return result
A simple clustering solution:
def anagram_cluster(lst):
    anagrams = {}
    for elem in lst:
        sig = str(sorted(elem))
        if sig not in anagrams:
            anagrams[sig] = []
        anagrams[sig].append(elem)
    result = []
    for variants in anagrams.values():
        result.extend(variants)
    return result
This is simpler than my long reply suggests, since I don't try to update the list in place. Also, while the results will vary according to how the hash sorts its keys, it does indeed return a valid solution:
['man', 'god', 'dog', 'abc', 'cab']
If you're gonna sort the list, then Python makes it trivial:
def anagram_sort(lst):
    return sorted(lst, key=lambda elem: (sorted(elem), elem))
An interesting challenge is to treat this not as a sorting problem, but more as a clustering problem. The problem statement only specifies that anagrams need to be adjacent, but it doesn't specify any order otherwise. You can cluster the anagrams with a mostly linear pass through the list. For each element, sort the string to get its anagram signature. Check a hash of anagram signatures to see if it has any fellow anagrams so far. If it doesn't, then leave the element in its current index in the list, and mark that index as "free" (i.e. 0 elements reserved). If it does have a fellow anagram, then look up the signature hash to determine its new position. If that position is marked as free, then simply perform a swap. If that position is marked as reserved, then move it out of the way, using another data structure that says how far to jump to the end of its current cluster. You may have to perform multiple swaps to make room, but eventually there will be room. Finally, update your data structures to reflect the new clustering.
Your data structures:
original array: swaps will be in-place
hash #1: key=anagram signature, value=position to write next anagram to
hash #2: key=index into original array, value=num elements reserved
Read what I wrote again. You only sell when the next day is lower.
- showell30@yahoo.com March 11, 2013

Buy Low, Sell High:
Basically, you want to buy a share if you have no shares and you know the next day the stock will trade higher, because you're at a local minimum. You want to sell a share if you have a share and you know the next day the stock will trade lower, because you're at a local maximum. Iterate through the array in O(N) time to figure out when you buy and sell. Don't buy stock on the last day for obvious reasons.
The more common variation of this problem is actually harder. Suppose you're only ever allowed to make one purchase. It would possibly be a short-sighted strategy to buy at a local minimum, because you would be forgoing a future buying opportunity at an even lower price. But if you're allowed multiple buy/sell opportunities, this isn't an issue, because you can collect earnings after every increase.
Define a "travel companion subset" as a group of students who have the same history of standing and sitting. For any two members of a travel companion subset of size >= 2, it's clear that we can't know their individual names. The good news is that once you find a person without any travel companions, you can identify his name immediately--it will be the only name that was called for the appropriate sitting/standing positions. Proof: Let's say person 4 has no travel companions. If he had heard both the names "Al" and "Bob" during his roll calls, he clearly would have had a travel companion (since person 4 can only call his own name), so it's a proof by contradiction.
Anyway, divide and conquer, by splitting up people using a Fibonacci pattern:
5/8 standing/sitting
2/3 3/5 (stand/sit, stand/sit)
1/1 1/2 1/2 2/3 (stand/sit, stand/sit, stand/sit, stand/sit)
After three rounds, we have uniquely identified 4 people, and then we have travel companion subsets of 2, 2, 2, and 3 people. We can split those up in two more rounds:
1/1 1/1 1/1 1/2 # 7 people uniquely identified
1/1 # last 2 people identified
Ah, I didn't notice the "continuous" restriction. Simple modification:
def max_product(a):
    ans = pre_max = pre_min = a[0]
    for elem in a[1:]:
        new_min = pre_min*elem
        new_max = pre_max*elem
        pre_min = min([elem, new_max, new_min])
        pre_max = max([elem, new_max, new_min])
        ans = max([ans, pre_max])
    return ans
Python Solution Using Key Function:
def anagram_sort(lst):
    return sorted(lst, key=lambda elem: (sorted(elem), elem))

lst = ['god', 'dog', 'abc', 'cab', 'man']
assert anagram_sort(lst) == ['abc', 'cab', 'man', 'dog', 'god']
And, of course, to do this more efficiently, it might be worthwhile to compare elem to -1/0/1 so that you can special case the min([...]) and max([...]) statements.
- showell30@yahoo.com March 10, 2013

The max product of {-3,-4,0,1,2,3} should be 72, correct? I believe @duckling had a typo. Here is a Python translation of his approach:

def max_product(a):
    ans = pre_max = pre_min = a[0]
    for elem in a[1:]:
        new_min = pre_min*elem
        new_max = pre_max*elem
        pre_min = min([elem, pre_min, new_max, new_min])
        pre_max = max([elem, pre_max, new_max, new_min])
        ans = max([ans, pre_max])
    return ans
assert 0 == max_product([0,0,-2,0,0,0])
assert 8 == max_product([-2,1,1,8])
assert 18 == max_product([-2,-3,1,3])
assert 72 == max_product([-3,-4,0,1,2,3])
assert 720 == max_product([1,-1,10,-8, -9, 0])
assert 2 == max_product([-50, 0.5, 2])
assert 2 == max_product([-50, 2, 0.5])
@HauntedGhost, @bunnybare, If you constrain the problem to have only positive numbers, then you could indeed reuse the sum algorithm. You would create a new list with the logarithms of the numbers, find the max sum of the logs, then take the inverse log of the max sum to get the max product of the original list. Because of float imprecision, your number would have some rounding error. I throw this out as food for thought. For this particular problem, it's not the right approach, but there are other problems where changing to logarithm space can be a good technique.
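To make that concrete, here's a sketch under the all-positive assumption (the function name is mine): Kadane's max-sum over the logs, then exponentiate back.

import math

def max_product_positive(a):
    # Only for strictly positive numbers: maximize the sum of logs,
    # then exponentiate back. Expect some float rounding error.
    best = cur = math.log(a[0])
    for x in a[1:]:
        cur = max(math.log(x), cur + math.log(x))
        best = max(best, cur)
    return math.exp(best)

assert abs(max_product_positive([2, 3, 0.1, 5]) - 6) < 1e-9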
- showell30@yahoo.com March 10, 2013

Assume you have N bytes of storage and a function that can randomly pick an integer from a range 1 to x. Populate an array A with 1 to N. Use your random integer function to choose an index between 1 and N. Use A[<random int chosen>] as your next value. Replace A[<random int chosen>] with A[N] and decrement N. Repeat until N == 1, and then just return A[1] and you're done.
By virtue of setting A[<last chosen index>] to A[N] after each iteration, there's no need for collision/repeat detection, so you can pick each element in constant time.
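A Python sketch of this (0-indexed rather than 1-indexed, otherwise the same swap-with-last idea):

import random

def random_stream(n):
    # Yields 1..n in random order with no repeat checking, by
    # overwriting each picked slot with the last live element.
    a = list(range(1, n+1))
    while n > 1:
        i = random.randint(0, n-1)  # index into the live prefix
        yield a[i]
        a[i] = a[n-1]
        n -= 1
    yield a[0]

assert sorted(random_stream(10)) == list(range(1, 11))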
Actually, I think I'd just bubble the lead 1 down into the 2s after any instance where the 2 sorts ahead of the 1. The worst case runtime is N squared, but if the lists are roughly interlaced, it's pretty fast.
- showell30@yahoo.com March 08, 2013

Merging two sorted lists can be done in O(N) time. The challenge here is doing it in place, which can be done in close to O(N) time. Basically, you just need to buffer elements from the first list that aren't ready to be inserted into the final list. Your intermediate lists will look like this:
FFF1111BBB222
F = element in final position
1 = element in 1st list
B = element in buffered head portion of 1st list
2 = element in 2nd list
Keep track of the offsets of the first 1, the first B, and the first 2. When you do the normal merge step, if there are any Bs, grab them before you grab the 1s. You can think of this like a circular array for 1s, or it might make more sense to just think of it as a scratch area to save off 1s for later processing. Either interpretation leads to the same result.
One minor slowdown is that you can get a long run of 1s that are less than 2s, which builds up your buffer of Bs. Once you get to a B that is less than a 2, you'll need to shift over all the other Bs one to the left to buffer the 1 that is displaced by the new F.
When there are no Bs and the lead 1 sorts ahead of the 2, then you simply turn the 1 into an F.
When there are no Bs and the lead 2 sorts ahead of the 1, then you put the 2 in 1's old place, turning it into an F, then put the 1 in the 2's old place, turning it into a B.
When there ARE Bs and the lead 2 sorts ahead of the lead B, then you replace the lead 1 with the lead 2 (turned into an F) and put the lead 1 in the 2's old place (turned into a B).
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int foo = 1 + 2*256 + 3*(256*256) + 4*(256*256*256);
    int i;
    char *p = (char *)&foo;  /* view the int as raw bytes */
    for (i = 0; i < 4; ++i) {
        printf("%d: %c\n", i, '0' + p[i]);
    }
    return 0;
}
Here is working code in Python:
def find_min(f, lo_x, hi_x, max_error):
    x0 = lo_x
    x3 = hi_x
    x1 = (2*x0+1*x3) / 3.0
    x2 = (1*x0+2*x3) / 3.0
    while x2 - x1 > max_error:
        if f(x2) > f(x1):
            # The middle is increasing, so
            # prune the right interval.
            x3 = x2
            x2 = x1
            x1 = (x0+x1) / 2.0
        else:
            # The middle is decreasing, so
            # prune the left interval.
            x0 = x1
            x1 = x2
            x2 = (x2+x3) / 2.0
    return x1

f = lambda x: (x - 1.5) ** 2
guess = find_min(f, -10, 7, 0.0001)
assert abs(guess - 1.5) < 0.0001

f = lambda x: (x - 3.7) ** 4
guess = find_min(f, 0, 5, 0.00001)
assert abs(guess - 3.7) < 0.00001
Start with three intervals and determine whether they go net down or net up. If they're DDU or DDD (i.e. the middle interval is net down), then you know the min is in the 2nd or 3rd interval, so toss the first interval and split the third one. If they're DUU or UUU (i.e. the middle interval is net up), then you know the min is in the 1st or 2nd interval, so toss the third interval and split the first one. Keep applying the algorithm until your three intervals are suitably small.
- showell30@yahoo.com March 06, 2013

Here's a Python translation, which prints square brackets instead of squigglies, but otherwise similar logic:

def power_set(big_lst):
    n = len(big_lst)
    result = []
    for i in range(1 << n):
        lst = []
        for j in range(n):
            if i & (1 << j):
                lst.append(big_lst[j])
        result.append(lst)
    return result

lst = [10, 20, 30]
print power_set(lst)
Also easy to do recursively:
def power_set(lst):
    if not lst:
        return [[]]
    else:
        head = lst.pop()
        ps = power_set(lst)
        return ps + [[head] + s for s in ps]
Python Power Set.
This basically does binary counting to mutate a list of True/False values that determine which elements get included in the next subset.
def power_set(s):
    lst = list(s)
    bools = [False] * len(lst)
    while True:
        yield set([lst[i] for i, b in enumerate(bools) if b])
        i = 0
        while i < len(lst):
            if not bools[i]: break
            i += 1
        if i >= len(lst):
            break
        bools[i] = True
        for j in range(i):
            bools[j] = False

def test():
    s = set([10, 20, 30])
    for sublist in power_set(s):
        print sublist

test()
You actually don't need to worry about a data structure for the walls if you model the cells as vertexes and lack-of-walls as edges. Start from the starting cell, and then do a random walk to the ending cell, connecting adjacent cells with edges (which are essentially non-walls). Mark the cells that are already visited so that you don't have any cycles in your path. Then flesh out the tree by picking random vertexes on your tree and random walk a random length, again avoiding cells that have been already visited. Obviously, enforce the geometric constraint that you can only connect two cells that are horizontally or vertically adjacent. Stop once all cells have been visited (a simple count will suffice).
When it comes time to actually draw walls, look for adjacent cells, and if they don't have an edge between them, draw the wall.
But the underlying data structure is a tree. With the edges of the tree being explicit ways that you can get between two cells without crossing a wall and/or teleporting, it becomes very straightforward to solve the maze with a DFS.
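A condensed sketch of the tree-growing idea (it skips the dedicated start-to-end walk and just grows one spanning tree by random walks; the cell/edge representation is an illustrative choice):

import random

def generate_maze(rows, cols):
    # Cells are (row, col) vertexes; edges are the absent walls.
    def neighbors(cell):
        r, c = cell
        for rn, cn in [(r-1, c), (r+1, c), (r, c-1), (r, c+1)]:
            if 0 <= rn < rows and 0 <= cn < cols:
                yield (rn, cn)

    edges = set()
    visited = set([(0, 0)])
    while len(visited) < rows * cols:
        cell = random.choice(list(visited))
        while True:
            options = [nb for nb in neighbors(cell) if nb not in visited]
            if not options:
                break  # walk is stuck; restart from another tree vertex
            nxt = random.choice(options)
            edges.add(frozenset([cell, nxt]))
            visited.add(nxt)
            cell = nxt
    return edges

Any horizontally or vertically adjacent pair of cells with no edge between them gets a wall at drawing time.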
Iterative Version in Python:
This is essentially brute force, but it efficiently terminates paths as soon as a mismatched letter is found.
def test():
    matrix = [
        ['C', 'X', 'T', 'Z'],
        ['X', 'A', 'Y', 'Z'],
        ['W', 'X', 'Q', 'R'],
    ]
    assert [(0,0), (1,1), (0,2)] == find_string(matrix, 'CAT')

def find_string(matrix, s):
    substrings = []
    for row in range(len(matrix)):
        for col in range(len(matrix[0])):
            c = matrix[row][col]
            if c == s[0]:
                if len(s) == 1:
                    return [(row, col)]
                substrings.append([(row, col)])
    while substrings:
        substring = substrings.pop()
        row, col = substring[-1]
        deltas = [
            (-1, -1), (-1, 0), (-1, 1),
            ( 0, -1),          ( 0, 1),
            ( 1, -1), ( 1, 0), ( 1, 1),
        ]
        for rd, cd in deltas:
            rn = row + rd
            cn = col + cd
            if rn < 0 or cn < 0:
                # negative indexes would silently wrap around
                continue
            if (rn, cn) in substring:
                # don't allow repeats
                continue
            try:
                c = matrix[rn][cn]
            except IndexError:
                continue
            if c != s[len(substring)]:
                continue
            new_substring = substring + [(rn,cn)]
            if len(new_substring) == len(s):
                return new_substring
            substrings.append(new_substring)
    return None

test()
@Arjun, see also: "Efficient Solution in Python (with tests)".
- showell30@yahoo.com March 05, 2013

@Loler, I think you are correct to attack this with a depth first search. Usually, you only resort to a breadth first search when you know your problem might not need to traverse all the way down to deep leaves. But, here, we need to visit every leaf, even the even ones (sorry, no pun intended). We need to visit the even leaves to make sure they don't have any children. As you say, breadth first search introduces extra storage requirements, and it's the hardest traversal to get right.
- showell30@yahoo.com March 05, 2013

@Loler, What's your rationale for passing in an int by reference instead of simply having the function return an int? Is it to avoid having to compare the right subtree's depth with the left's depth? Seems like a lot of extra machinery, since you have to introduce a helper function and do the ++/-- dance.
- showell30@yahoo.com March 05, 2013

Sorry to downvote this, but this problem shouldn't require classes, global variables, or special modifications to any data structures. A simple function should suffice.
- showell30@yahoo.com March 05, 2013

As others have mentioned, you need to clarify with the interviewer what constitutes the depth of a leaf, but at the end of the day, it's a simple recursion:
def test():
    assert 0 == max_odd_leaf_depth(None)
    assert 1 == max_odd_leaf_depth((None, 1, None))
    assert 0 == max_odd_leaf_depth(((None, 2, None), 1, (None, 2, None)))
    tree3 = (None, 3, None)
    tree2 = (tree3, 2, tree3)
    tree1 = (tree2, 1, tree2)
    assert 3 == max_odd_leaf_depth(tree1)

def max_odd_leaf_depth(tree, depth=1):
    # return 0 if no leafs at an odd
    # depth
    if not tree:
        return 0
    right, val, left = tree
    if not right and not left:
        # overly fancy way of saying
        # return 0 for even, depth for odd
        return depth * (depth % 2)
    child_depths = [
        max_odd_leaf_depth(left, depth+1),
        max_odd_leaf_depth(right, depth+1)
    ]
    return max(child_depths)

test()
I'm not sure this captures every edge case, but it's a sketch of divide-and-conquer.
def nth_smallest(lst1, lo1, hi1, lst2, lo2, hi2, n):
    if lo1 == hi1:
        return lst2[lo2+n]
    if lo2 == hi2:
        return lst1[lo1+n]
    if n == 0:
        return min(lst1[lo1], lst2[lo2])
    offset1 = min([n/2, hi1 - lo1 - 1])
    offset2 = min([n/2, hi2 - lo2 - 1])
    if lst1[lo1+offset1] < lst2[lo2+offset2]:
        return nth_smallest(
            lst1, lo1+offset1+1, hi1,
            lst2, lo2, hi2,
            n-offset1-1)
    else:
        return nth_smallest(
            lst1, lo1, hi1,
            lst2, lo2+offset2+1, hi2,
            n-offset2-1)

def mutual_median(l1, l2):
    n = len(l1) + len(l2)
    mid = n / 2
    return nth_smallest(l1, 0, len(l1), l2, 0, len(l2), mid)
Efficient Solution in Python (with tests)
This is a DP solution that only computes one octant of the matrix, since all the other octants are just reflections. It also defers dividing by powers of four until the end.
def test():
    assert 1.0 == probability_of_alive(5, 0, 0, 0)
    assert 0.5 == probability_of_alive(5, 1, 0, 0)
    assert 0.75 == probability_of_alive(5, 1, 1, 4)
    assert 1.0 == probability_of_alive(5, 2, 2, 2)
    assert 0.9375 == probability_of_alive(5, 3, 2, 2)

def probability_of_alive(N, n, x, y):
    dct = get_survival_dct(N, n)
    return 1.0 * lookup(dct, N, x, y) / (4**n)

def get_survival_dct(N, n):
    # This returns a dictionary where keys
    # are (x,y) tuples and it only has values
    # for the WNW octant of the island, since
    # other values can be calculated via
    # horizontal, vertical, and diagonal
    # reflection. Values are probabilities
    # of survival * (4**n).
    dct = {}
    for i in range((N+1)/2):
        for j in range(i, (N+1)/2):
            dct[(i,j)] = 1
    for step in range(n):
        new_dct = {}
        for i in range((N+1)/2):
            for j in range(i, (N+1)/2):
                new_dct[(i,j)] = \
                    lookup(dct, N, i+1, j) + \
                    lookup(dct, N, i, j+1) + \
                    lookup(dct, N, i-1, j) + \
                    lookup(dct, N, i, j-1)
        dct = new_dct
    return dct

def lookup(dct, N, i, j):
    if N - i - 1 < i: i = N - i - 1
    if N - j - 1 < j: j = N - j - 1
    if j < i: i, j = j, i
    return dct.get((i,j), 0)

def print_probability_matrix(N, dct):
    for i in range(N):
        print [lookup(dct, N, i, j) for j in range(N)]

N = 5
for n in range(4):
    dct = get_survival_dct(N, n)
    print_probability_matrix(N, dct)
    print

test()
The program's output shows the sample matrices for N=5:
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]
[1, 1, 1, 1, 1]

[2, 3, 3, 3, 2]
[3, 4, 4, 4, 3]
[3, 4, 4, 4, 3]
[3, 4, 4, 4, 3]
[2, 3, 3, 3, 2]

[6, 9, 10, 9, 6]
[9, 14, 15, 14, 9]
[10, 15, 16, 15, 10]
[9, 14, 15, 14, 9]
[6, 9, 10, 9, 6]

[18, 30, 33, 30, 18]
[30, 48, 54, 48, 30]
[33, 54, 60, 54, 33]
[30, 48, 54, 48, 30]
[18, 30, 33, 30, 18]
The key insight here is that you need a helper function to manage the rank. Here's my take on it, with tests:
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

struct Node {
    int val;
    struct Node *left;
    struct Node *right;
};
typedef struct Node Node;

struct search_result {
    int need;
    struct Node *node;
};

struct search_result kth_biggest_result(Node *root, int k) {
    struct search_result sr;
    sr.need = k;
    sr.node = NULL;
    if (!root) {
        return sr;
    }
    if (root->right) {
        sr = kth_biggest_result(root->right, k);
        if (sr.need == 0) return sr;
    }
    sr.need -= 1; // root
    if (sr.need == 0) {
        sr.node = root;
        return sr;
    }
    return kth_biggest_result(root->left, sr.need);
}

Node *kth_biggest(Node *root, int k) {
    struct search_result sr;
    sr = kth_biggest_result(root, k);
    return sr.node;
}

Node *make_node(int n) {
    Node *node = malloc(sizeof(struct Node));
    node->left = NULL;
    node->right = NULL;
    node->val = n;
    return node;
}

int main(int argc, char **argv) {
    Node *node;
    Node *t1 = make_node(1);
    Node *t2 = make_node(2);
    Node *t3 = make_node(3);
    Node *t4 = make_node(4);
    Node *t5 = make_node(5);
    Node *t6 = make_node(6);
    t2->left = t1;
    t2->right = t3;
    t4->left = t2;
    t4->right = t5;
    t5->right = t6;
    node = kth_biggest(t4, 1);
    assert(node->val == 6);
    node = kth_biggest(t4, 2);
    assert(node->val == 5);
    node = kth_biggest(t4, 3);
    assert(node->val == 4);
    node = kth_biggest(t4, 4);
    assert(node->val == 3);
    node = kth_biggest(t4, 5);
    assert(node->val == 2);
    node = kth_biggest(t4, 6);
    assert(node->val == 1);
    node = kth_biggest(t4, 7);
    assert(!node);
    node = kth_biggest(t4, 8);
    assert(!node);
    return 0;
}
I would start my answer by trying to lay out some rules:
1) Once a passenger gets on the elevator, the elevator must deliver him to any floor in the direction he indicated while waiting for the elevator.
2) The elevator system must serve the extremes, so if an elevator is going down, and people are waiting on the bottom, it can't turn around to get more passengers on middle floors unless there's another elevator working toward the bottom.
The interviewer might challenge both rules. For rule #1, you would probably make an exception if you have just one passenger. For rule #2, an unethical elevator operator might decide overall throughput is worth temporarily stranding people for a while (especially if they can take the stairs).
Once an elevator is empty, it needs to decide whether to work up or down, based on the other elevator's current travel plans. A greedy plan has you going to the nearest extreme, i.e. if the elevator is close to the top, go pick up the topmost passenger. But you need to account for other elevators maybe already being en route to that passenger (especially empty elevators that are already above you).
First and foremost, I would try to find the Nth element among the lists, not the medians, and then I'd treat the median as a special case.
If in constant time you can determine that Amax < Bmin or Bmax < Amin, then you can find the Nth element in constant time as well.
If the sizes of both lists are small (say < 10), then simply merge them to find the Nth element.
If we have two large overlapping lists, then consider where the Nth element could be. Suppose size(A) = 13, size(B) = 30, and N=19. It's possible for any element of A to be the 19th element among the two lists, but it's clear that the first 6 elements of B have to be less than the 19th element, since at max 12 elements of A could be less than the 6th element of B. So we can recurse on finding the 13th element among all A and among 24 elements of B. For small values of N you can trim the larger list on the right using similar reasoning.
This problem is far from trivial to do efficiently, and it has lots of fiddly cases to deal with, but it's definitely amenable to some level of divide and conquer.
A divide-and-conquer solution for median has to find the Nth element among the partial lists after the first step of recursion. For example, if you have two lists of size 100, and you can eliminate the first 50 elements of the first list, then now you've reduced your problem to finding the 50th of 150 elements, but that's no longer a median.
- showell30@yahoo.com March 02, 2013

Looks good. Also, there are lots of symmetries here that allow you to reduce storage. A simple way to cut the processing in half is to only consider squares where x <= y, and if you need P(x,y,n) where x > y, then just find P(y,x,n). You can also take advantage of horizontal and vertical reflections.
- showell30@yahoo.com March 02, 2013

You have to be a little careful about considering the border. Once you step off the island, you're dead, whereas if you only consider the water at the end of N steps, you can count situations where folks stepped into the water and out of the water.
- showell30@yahoo.com March 02, 2013

Also, the recursive calls should use n-1 as a parameter. The probability of dying from a square in n steps when the first step is a neighbor involves the probability of dying in n-1 steps from the neighbor.
- showell30@yahoo.com March 02, 2013

Yep. Create a set of letters from the smaller string (e.g. 'ab'). Advance character by character through the larger string, checking to see if the current character is in the set. If it is, remove it from the set. If the set's empty, you're done. If not, keep advancing. If you find a character that's not in the set, advance to the next index, and repopulate the set from the smaller string.
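In Python, a sketch of that scan (quadratic in the worst case; it assumes the smaller string's letters are distinct, since we track them with a set, per the description above):

def contains_anagram(big, small):
    for start in range(len(big)):
        needed = set(small)
        for ch in big[start:]:
            if ch not in needed:
                break  # mismatch: restart from the next index
            needed.remove(ch)
            if not needed:
                return True
    return False

assert contains_anagram('xbay', 'ab')
assert not contains_anagram('acb', 'ab')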
- showell30@yahoo.com March 02, 2013

If you sort both strings, you will get a false positive for acb and ab.
- showell30@yahoo.com March 02, 2013Anonymous, think about it, if you only have an exists() function, then your dictionary is effectively infinite and this problem is unsolvable.
- showell30@yahoo.com February 28, 2013

If the maze paths are indeed constrained to only go right and down, then this is a classic DP problem. If not, check out the solution that starts "This is a non-recursive solution that uses backtracking." for a backtracking solution that allows for a more realistic maze with spirally paths around the obstacles.
- showell30@yahoo.com February 28, 2013
@Barry, thanks. If you scroll up in the comments, I respond to duckling about the optimization where you can eliminate A and B at the same time. As you point out, it doesn't reduce the overall complexity, which is why I'd be somewhat shy about trying to handle this case, unless I were really trying to squeeze out extra performance.
- showell30@yahoo.com March 16, 2013