Jurko Gospodnetić
BAN USERCheck out the test cases in my answer and you'll see failing ones with your logic - those that have multiple 'largest sequences of a'.
The main problem is in deciding which 'longest sequence' to use. :-)
You could also tweak the implementation to make it a bit faster on the average by using sentinel elements instead of constantly explicitly checking start/stop array boundaries. But that could require extra O(N) space if adding the extra sentinel values causes the start/stop array data storage to be reallocated.
- Jurko Gospodnetić November 16, 2014input2 can be easily supported by tracking how many medals each start/stop adds/removes to the current medal count and not just add +-1 to the current medal count when crossing an iteration's start/stop boundary.
You can do this by using extra O(N) space for storing this information or, if you are allowed to modify the original input data, you could even keep the O(1) space complexity by encoding the count inside the sorted start/stop arrays, e.g. start/stop[i] + 1000000 * count[i]; since 1 <= start/stop[i] <= 10^6 and 1 <= count[i] <= 100 this would fit inside a 32-bit integer.
Solution in Python with a bit adjusted interface:
input1 = size of start & stop
input2 = not used, assumes all elements are 1
input3 = start
input4 = stop
input5 = T
time complexity: O(N * log(N)) where N is the size of the start/stop array
space complexity: O(1)
def solution(start, stop, T):
N = len(start)
assert N == len(stop)
start.sort()
stop.sort()
start_i = 0
stop_i = 0
sum = 0
i = 1 # next officer to process (indices are 1-based)
while sum <= T:
# count how many medals the next officer has (+1 for each opened range;
# -1 for each closed range)
while start_i < N and start[start_i] <= i:
start_i += 1
while stop_i < N and stop[stop_i] < i:
stop_i += 1
medal_count = start_i - stop_i
# find the next officer that might have a different medal count (the
# first in or the first after some range); if none may change it, we
# have passed all the ranges, no further officers have any medals and
# we will never cross the total awarded medal count threshold
if stop_i == N:
return -1
next_i = stop[stop_i] + 1 # the first after some range
if start_i < N:
next_i = min(start[start_i], next_i) # or the first in some range
# skip the least number of officers needed to either cross the total
# awarded medal count threshold or we reach the next officer that might
# have a different medal count
officer_count = next_i - i
if medal_count:
officer_count = min(officer_count, (T - sum) // medal_count + 1)
i += officer_count
sum += medal_count * officer_count
return i - 1 # last processed officer
import pytest
@pytest.mark.parametrize("start, stop, T, expected", (
# index: 1 2 3 4 ...
# count: 0 0 0 0 ...
# sum : 0 0 0 0 ...
([], [], 0, -1),
([], [], 1, -1),
# index: 1 2 3 4 ...
# count: 1 0 0 0 ...
# sum : 1 1 1 1 ...
([1], [1], 0, 1),
([1], [1], 1, -1),
# index: 1 2 3 4 ...
# count: 2 0 0 0 ...
# sum : 2 2 2 2 ...
([1, 1], [1, 1], 0, 1),
([1, 1], [1, 1], 1, 1),
([1, 1], [1, 1], 2, -1),
([1, 1], [1, 1], 3, -1),
# index: 1 2 3 4 ...
# count: 0 1 1 0 ...
# sum : 0 1 2 2 ...
([2], [3], 0, 2),
([2], [3], 1, 3),
([2], [3], 2, -1),
([2], [3], 3, -1),
# index: 1 2 3 4 5 ...
# count: 0 1 0 1 0 ...
# sum : 0 1 1 2 2 ...
([2, 4], [2, 4], 0, 2),
([2, 4], [2, 4], 1, 4),
([2, 4], [2, 4], 2, -1),
([2, 4], [2, 4], 3, -1),
# index: 1 2 3 4 5 6 7 ...
# count: 0 1 1 0 1 1 0 ...
# sum : 0 1 2 2 3 4 4 ...
([2, 5], [3, 6], 0, 2),
([2, 5], [3, 6], 1, 3),
([2, 5], [3, 6], 2, 5),
([2, 5], [3, 6], 3, 6),
([2, 5], [3, 6], 4, -1),
([2, 5], [3, 6], 4, -1),
#index: 1 2 3 4 5 6 7 8 ...
#count: 0 1 2 3 3 2 1 0 ...
#sum : 0 1 3 6 9 11 12 12 ...
([2, 3, 4], [7, 6, 5], 0, 2),
([2, 3, 4], [7, 6, 5], 1, 3),
([2, 3, 4], [7, 6, 5], 2, 3),
([2, 3, 4], [7, 6, 5], 3, 4),
([2, 3, 4], [7, 6, 5], 4, 4),
([2, 3, 4], [7, 6, 5], 5, 4),
([2, 3, 4], [7, 6, 5], 6, 5),
([2, 3, 4], [7, 6, 5], 7, 5),
([2, 3, 4], [7, 6, 5], 8, 5),
([2, 3, 4], [7, 6, 5], 9, 6),
([2, 3, 4], [7, 6, 5], 10, 6),
([2, 3, 4], [7, 6, 5], 11, 7),
([2, 3, 4], [7, 6, 5], 12, -1),
([2, 3, 4], [7, 6, 5], 13, -1),
#index: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
#count: 0 0 1 1 2 2 3 3 3 3 2 2 1 1 0 ...
#sum : 0 0 1 2 4 6 9 12 15 18 20 22 23 24 24 ...
([3, 5, 7], [14, 12, 10], 0, 3),
([3, 5, 7], [14, 12, 10], 1, 4),
([3, 5, 7], [14, 12, 10], 2, 5),
([3, 5, 7], [14, 12, 10], 3, 5),
([3, 5, 7], [14, 12, 10], 4, 6),
([3, 5, 7], [14, 12, 10], 5, 6),
([3, 5, 7], [14, 12, 10], 6, 7),
([3, 5, 7], [14, 12, 10], 7, 7),
([3, 5, 7], [14, 12, 10], 8, 7),
([3, 5, 7], [14, 12, 10], 9, 8),
([3, 5, 7], [14, 12, 10], 10, 8),
([3, 5, 7], [14, 12, 10], 11, 8),
([3, 5, 7], [14, 12, 10], 12, 9),
([3, 5, 7], [14, 12, 10], 13, 9),
([3, 5, 7], [14, 12, 10], 14, 9),
([3, 5, 7], [14, 12, 10], 15, 10),
([3, 5, 7], [14, 12, 10], 16, 10),
([3, 5, 7], [14, 12, 10], 17, 10),
([3, 5, 7], [14, 12, 10], 18, 11),
([3, 5, 7], [14, 12, 10], 19, 11),
([3, 5, 7], [14, 12, 10], 20, 12),
([3, 5, 7], [14, 12, 10], 21, 12),
([3, 5, 7], [14, 12, 10], 22, 13),
([3, 5, 7], [14, 12, 10], 23, 14),
([3, 5, 7], [14, 12, 10], 24, -1),
([3, 5, 7], [14, 12, 10], 25, -1)))
def test_solution(start, stop, T, expected):
assert solution(start, stop, T) == expected
Time complexity: O(n)
Space complexity: O(1) not including input storage
First some reasoning to simplify the algorithm:
- if we have a word of a form 'a*b*' then the result is (0, 0)
- otherwise word is of a form 'a*b+a.*'
- let us mark the position of the first 'b' as b1 (0-base index)
- let us mark the position of the first 'a' after b1 as a2
- let (X, Y) be the best solution
- this best solution must not change the leading b1 'a' characters
or (0, 0) would result in a word with lower lexicographic ordering
- this best solution must also change the leading 'b' character
into 'a', or (b1, a2) would result in a word with lower lexicographic
ordering
==> X <= b1
- let us mark the position Y - (b1 - X) as Y1 - that is the position
that will be reversed over position b1, and so we know that
word[Y1] == 'a'
- since we know the leading 'a' character must not be altered by
the best solution's character reversal, we know that
word[Y1 + 1..Y] must all be 'a'
- if word[Y1 + 1] == 'a' then (b1, Y1 + 1) is a better solution than
(X, Y) as its reversal results in more leading 'a' characters
==> word[Y1 + 1] == 'b' or word[Y1 + 1] not defined
==>
Y1 == Y as all characters in word[Y1 + 1..Y] must be 'a' but
word[Y1 + 1] must be either 'b' or not defined
==> b1 == X
==>
X == position of the first 'b' character in the string
word[Y] == 'a'
word[Y] is the last 'a' in its group
Now we just need to find the 'a' character, after the first 'b' character for which reading backwards from it results in a word with the minimal lexicographic ordering.
Here's how the algorithm below achieves this:
- as a helper we split up the input into 'b...ba...a' blocks which, when
reversed, can be compared lexicographically as follows:
block1 < block2
<==> block1 has < #a or (== #a and > #b)
<==> (#a in block1, -#b in block1) > (#a in block2, -#b in block2)
- that means that if each block is assigned weight (#a, -#b) blocks with
higher weight are 'better', i.e. they come first in the lexicographic order
when reversed
- we start with a block ending with the last 'a' character and work our way
backwards, looking for a block with the highest weight
- if there is one such block - we are done and that is the last block that
needs to be reversed
- if there is more than one block with matching weights then we need to
compare blocks in front of them in order, i.e. those coming after them
since we are moving backwards
- if we find a difference - we discard the candidate with the less
weighing block
- if they match all the way to the end, we use the first encountered
candidate
- if we encounter three such max weight candidate blocks - the middle one
can be safely discarded, i.e. we need to track & check at most two
candidates at any time
Here's the actual solution code:
def scan_block(word, start, last_in_block):
"return the block's weight and the position in front of the block"
prev_b = word.rfind("b", start, last_in_block)
count_a = last_in_block - prev_b
prev_a = word.rfind("a", start, prev_b)
if prev_a < 0:
prev_a = start - 1
count_b = prev_b - prev_a
weight = count_a, -count_b
return weight, prev_a
def solution(word):
x = find(word, "b") # first 'b'
current = word.rfind("a", x + 1) # last 'a' after the first 'b'
candidate = []
best_weight = 0, 0
while current > x:
weight, next = scan_block(word, x, current)
if weight > best_weight:
# new best weight block - discard any previous candidates and start
# afresh from here
candidate = [(current, next)]
best_weight = weight
else:
if len(candidate) > 1:
# we already have 2 candidates - see if we can discard one
candidate0_weight, candidate0_current = scan_block(
word, x, candidate0_current)
if weight > candidate0_weight:
candidate[:1] = [] # new candidate is better
elif weight < candidate0_weight:
candidate[1:] = [] # old candidate is better
if weight == best_weight:
# new best weight block - use it as the second candidate,
# possibly discarding the previous second candidate in the
# process
candidate[1:] = [(current, next)]
candidate0_current = candidate[0][1]
current = next
return (x, candidate[0][0]) if candidate else (0, 0)
And here's some test code implemented using pytest:
import pytest
@pytest.mark.parametrize("word, expected", (
("a", (0, 0)),
("b", (0, 0)),
("ab", (0, 0)),
("ba", (0, 1)),
("aaaa", (0, 0)),
("bbbb", (0, 0)),
("abab", (1, 2)),
("abba", (1, 3)),
("bbaa", (0, 3)),
("abaa", (1, 3)),
("aaaabbbb", (0, 0)),
("babaabba", (0, 4)),
("aabaabaabaaba", (2, 10)),
("aabaabaaabaaba", (2, 8)),
("aabaaabaaabaaba", (2, 9)),
("aabaaaabaaabaaba", (2, 6)),
("aabaabaabbaaba", (2, 7)),
("aabaabbaabaaba", (2, 11)),
("aabbaabaabaaba", (2, 11)),
("aabbaabbaabaaba", (2, 12)),
("bbbaaabbbaabbbaaa", (0, 16)),
("abbbaaabbbaabbbaaa", (1, 17)),
("aaaabbbaaabbbaabbbaaa", (4, 20)),
("bbaabbabbaabbabbbaaba", (0, 10)),
("bbaabbabbaabbaabbabbbaaba", (0, 14)),
("baabbaaababbaaa", (0, 7))))
def test_solution(word, expected):
assert solution(word) == expected
Bah, found a test case where it fails... back to the drawing board :-(
for "baabbaaababbaaa" it should return (0, 7), but returns (0,14)
We can not detect which 'a' to use for the Y position based simply on the number of 'a' chars in its series and the number of 'b' chars in the series before :-(
[EDIT: Unfortunately I could not remove or edit the original comment as it was added as an anonymous user and is not connected to my current account, so I just give this comment a -1 and added the correct answer in a separate comment. Please give negative scores to this one so it sinks down the dung pile. :-)]
In your example "babaabba", applying (0,4) & (0,7) results in:
(0,4): aababbba
(0,7): abbaabab
as you can see (0,4) is the better solution of the two as its result comes lexicographically before the other (starts with 'aa', while the other starts with 'ab')
- Jurko Gospodnetić November 14, 2014
See the O(n) solution in my comment (be careful - there are two comments - one correct and one not so much that I unfortunately can not edit or remove :-)). It basically implements a 'smart way' to decide which of the 'longest a' groups is the best one. It depends on the data that comes before that group and it is possible to greedily decide which one is the best by scanning the string from the back.
- Jurko Gospodnetić November 25, 2014