Jurko Gospodnetić

Tech Lead

Comments (8)

Reputation 0

Page:

Comment hidden because of low score. Click to expand.

of 0 vote

See the O(n) solution in my comment (be careful - there are two comments - one correct and one not so much that I unfortunately can not edit or remove :-)). It basically implements a 'smart way' to decide which of the 'longest a' groups is the best one. It depends on the data that comes before that group and it is possible to greedily decide which one is the best by scanning the string from the back.

- Jurko Gospodnetić November 25, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Check out the test cases in my answer and you'll see failing ones with your logic - those that have multiple 'largest sequences of a'.

The main problem is in deciding which 'longest sequence' to use. :-)

- Jurko Gospodnetić November 23, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

You could also tweak the implementation to make it a bit faster on the average by using sentinel elements instead of constantly explicitly checking start/stop array boundaries. But that could require extra O(N) space if adding the extra sentinel values causes the start/stop array data storage to be reallocated.

- Jurko Gospodnetić November 16, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

input2 can be easily supported by tracking how many medals each start/stop adds/removes to the current medal count and not just add +-1 to the current medal count when crossing an iteration's start/stop boundary.

You can do this by using extra O(N) space for storing this information or, if you are allowed to modify the original input data, you could even keep the O(1) space complexity by encoding the count inside the sorted start/stop arrays, e.g. start/stop[i] + 1000000 * count[i]; since 1 <= start/stop[i] <= 10^6 and 1 <= count[i] <= 100 this would fit inside a 32-bit integer.

- Jurko Gospodnetić November 16, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Solution in Python with a bit adjusted interface:

input1 = size of start & stop
input2 = not used, assumes all elements are 1
input3 = start
input4 = stop
input5 = T

time complexity: O(N * log(N)) where N is the size of the start/stop array
space complexity: O(1)

def solution(start, stop, T):
    N = len(start)
    assert N == len(stop)
    start.sort()
    stop.sort()
    start_i = 0
    stop_i = 0
    sum = 0
    i = 1  # next officer to process (indices are 1-based)
    while sum <= T:
        # count how many medals the next officer has (+1 for each opened range;
        # -1 for each closed range)
        while start_i < N and start[start_i] <= i:
            start_i += 1
        while stop_i < N and stop[stop_i] < i:
            stop_i += 1
        medal_count = start_i - stop_i

        # find the next officer that might have a different medal count (the
        # first in or the first after some range); if none may change it, we
        # have passed all the ranges, no further officers have any medals and
        # we will never cross the total awarded medal count threshold
        if stop_i == N:
            return -1
        next_i = stop[stop_i] + 1  # the first after some range
        if start_i < N:
            next_i = min(start[start_i], next_i)  # or the first in some range

        # skip the least number of officers needed to either cross the total
        # awarded medal count threshold or we reach the next officer that might
        # have a different medal count
        officer_count = next_i - i
        if medal_count:
            officer_count = min(officer_count, (T - sum) // medal_count + 1)
        i += officer_count
        sum += medal_count * officer_count
    return i - 1  # last processed officer


import pytest

@pytest.mark.parametrize("start, stop, T, expected", (
    # index: 1 2 3 4 ...
    # count: 0 0 0 0 ...
    # sum  : 0 0 0 0 ...
    ([], [], 0, -1),
    ([], [], 1, -1),

    # index: 1 2 3 4 ...
    # count: 1 0 0 0 ...
    # sum  : 1 1 1 1 ...
    ([1], [1], 0, 1),
    ([1], [1], 1, -1),

    # index: 1 2 3 4 ...
    # count: 2 0 0 0 ...
    # sum  : 2 2 2 2 ...
    ([1, 1], [1, 1], 0, 1),
    ([1, 1], [1, 1], 1, 1),
    ([1, 1], [1, 1], 2, -1),
    ([1, 1], [1, 1], 3, -1),

    # index: 1 2 3 4 ...
    # count: 0 1 1 0 ...
    # sum  : 0 1 2 2 ...
    ([2], [3], 0, 2),
    ([2], [3], 1, 3),
    ([2], [3], 2, -1),
    ([2], [3], 3, -1),

    # index: 1 2 3 4 5 ...
    # count: 0 1 0 1 0 ...
    # sum  : 0 1 1 2 2 ...
    ([2, 4], [2, 4], 0, 2),
    ([2, 4], [2, 4], 1, 4),
    ([2, 4], [2, 4], 2, -1),
    ([2, 4], [2, 4], 3, -1),

    # index: 1 2 3 4 5 6 7 ...
    # count: 0 1 1 0 1 1 0 ...
    # sum  : 0 1 2 2 3 4 4 ...
    ([2, 5], [3, 6], 0, 2),
    ([2, 5], [3, 6], 1, 3),
    ([2, 5], [3, 6], 2, 5),
    ([2, 5], [3, 6], 3, 6),
    ([2, 5], [3, 6], 4, -1),
    ([2, 5], [3, 6], 4, -1),

    #index: 1  2  3  4  5  6  7  8 ...
    #count: 0  1  2  3  3  2  1  0 ...
    #sum  : 0  1  3  6  9 11 12 12 ...
    ([2, 3, 4], [7, 6, 5], 0, 2),
    ([2, 3, 4], [7, 6, 5], 1, 3),
    ([2, 3, 4], [7, 6, 5], 2, 3),
    ([2, 3, 4], [7, 6, 5], 3, 4),
    ([2, 3, 4], [7, 6, 5], 4, 4),
    ([2, 3, 4], [7, 6, 5], 5, 4),
    ([2, 3, 4], [7, 6, 5], 6, 5),
    ([2, 3, 4], [7, 6, 5], 7, 5),
    ([2, 3, 4], [7, 6, 5], 8, 5),
    ([2, 3, 4], [7, 6, 5], 9, 6),
    ([2, 3, 4], [7, 6, 5], 10, 6),
    ([2, 3, 4], [7, 6, 5], 11, 7),
    ([2, 3, 4], [7, 6, 5], 12, -1),
    ([2, 3, 4], [7, 6, 5], 13, -1),

    #index: 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ...
    #count: 0  0  1  1  2  2  3  3  3  3  2  2  1  1  0 ...
    #sum  : 0  0  1  2  4  6  9 12 15 18 20 22 23 24 24 ...
    ([3, 5, 7], [14, 12, 10], 0, 3),
    ([3, 5, 7], [14, 12, 10], 1, 4),
    ([3, 5, 7], [14, 12, 10], 2, 5),
    ([3, 5, 7], [14, 12, 10], 3, 5),
    ([3, 5, 7], [14, 12, 10], 4, 6),
    ([3, 5, 7], [14, 12, 10], 5, 6),
    ([3, 5, 7], [14, 12, 10], 6, 7),
    ([3, 5, 7], [14, 12, 10], 7, 7),
    ([3, 5, 7], [14, 12, 10], 8, 7),
    ([3, 5, 7], [14, 12, 10], 9, 8),
    ([3, 5, 7], [14, 12, 10], 10, 8),
    ([3, 5, 7], [14, 12, 10], 11, 8),
    ([3, 5, 7], [14, 12, 10], 12, 9),
    ([3, 5, 7], [14, 12, 10], 13, 9),
    ([3, 5, 7], [14, 12, 10], 14, 9),
    ([3, 5, 7], [14, 12, 10], 15, 10),
    ([3, 5, 7], [14, 12, 10], 16, 10),
    ([3, 5, 7], [14, 12, 10], 17, 10),
    ([3, 5, 7], [14, 12, 10], 18, 11),
    ([3, 5, 7], [14, 12, 10], 19, 11),
    ([3, 5, 7], [14, 12, 10], 20, 12),
    ([3, 5, 7], [14, 12, 10], 21, 12),
    ([3, 5, 7], [14, 12, 10], 22, 13),
    ([3, 5, 7], [14, 12, 10], 23, 14),
    ([3, 5, 7], [14, 12, 10], 24, -1),
    ([3, 5, 7], [14, 12, 10], 25, -1)))
def test_solution(start, stop, T, expected):
    assert solution(start, stop, T) == expected

- Jurko Gospodnetić November 16, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Time complexity: O(n)
Space complexity: O(1) not including input storage

First some reasoning to simplify the algorithm:

- if we have a word of a form 'a*b*' then the result is (0, 0)
- otherwise word is of a form 'a*b+a.*'
- let us mark the position of the first 'b' as b1 (0-base index)
- let us mark the position of the first 'a' after b1 as a2
- let (X, Y) be the best solution
- this best solution must not change the leading b1 'a' characters
  or (0, 0) would result in a word with lower lexicographic ordering
- this best solution must also change the leading 'b' character
  into 'a', or (b1, a2) would result in a word with lower lexicographic
  ordering
==> X <= b1
- let us mark the position Y - (b1 - X) as Y1 - that is the position
  that will be reversed over position b1, and so we know that
  word[Y1] == 'a'
- since we know the leading 'a' character must not be altered by
  the best solution's character reversal, we know that
  word[Y1 + 1..Y] must all be 'a'
- if word[Y1 + 1] == 'a' then (b1, Y1 + 1) is a better solution than
  (X, Y) as its reversal results in more leading 'a' characters
==> word[Y1 + 1] == 'b' or word[Y1 + 1] not defined
==>
	Y1 == Y as all characters in word[Y1 + 1..Y] must be 'a' but
	word[Y1 + 1] must be either 'b' or not defined
==> b1 == X
==>
	X == position of the first 'b' character in the string
	word[Y] == 'a'
	word[Y] is the last 'a' in its group

Now we just need to find the 'a' character, after the first 'b' character for which reading backwards from it results in a word with the minimal lexicographic ordering.

Here's how the algorithm below achieves this:

- as a helper we split up the input into 'b...ba...a' blocks which, when
  reversed, can be compared lexicographically as follows:
      block1 < block2
          <==> block1 has < #a or (== #a and > #b)
          <==> (#a in block1, -#b in block1) > (#a in block2, -#b in block2)
- that means that if each block is assigned weight (#a, -#b) blocks with
  higher weight are 'better', i.e. they come first in the lexicographic order
  when reversed
- we start with a block ending with the last 'a' character and work our way
  backwards, looking for a block with the highest weight
- if there is one such block - we are done and that is the last block that
  needs to be reversed
- if there is more than one block with matching weights then we need to
  compare blocks in front of them in order, i.e. those coming after them
  since we are moving backwards
   - if we find a difference - we discard the candidate with the less
     weighing block
   - if they match all the way to the end, we use the first encountered
     candidate
   - if we encounter three such max weight candidate blocks - the middle one
     can be safely discarded, i.e. we need to track & check at most two
     candidates at any time

Here's the actual solution code:

def scan_block(word, start, last_in_block):
    "return the block's weight and the position in front of the block"
    prev_b = word.rfind("b", start, last_in_block)
    count_a = last_in_block - prev_b
    prev_a = word.rfind("a", start, prev_b)
    if prev_a < 0:
        prev_a = start - 1
    count_b = prev_b - prev_a
    weight = count_a, -count_b
    return weight, prev_a

def solution(word):
    x = find(word, "b")  # first 'b'
    current = word.rfind("a", x + 1)  # last 'a' after the first 'b'
    candidate = []
    best_weight = 0, 0
    while current > x:
        weight, next = scan_block(word, x, current)
        if weight > best_weight:
            # new best weight block - discard any previous candidates and start
            # afresh from here
            candidate = [(current, next)]
            best_weight = weight
        else:
            if len(candidate) > 1:
                # we already have 2 candidates - see if we can discard one
                candidate0_weight, candidate0_current = scan_block(
                    word, x, candidate0_current)
                if weight > candidate0_weight:
                    candidate[:1] = []  # new candidate is better
                elif weight < candidate0_weight:
                    candidate[1:] = []  # old candidate is better
            if weight == best_weight:
                # new best weight block - use it as the second candidate,
                # possibly discarding the previous second candidate in the
                # process
                candidate[1:] = [(current, next)]
                candidate0_current = candidate[0][1]
        current = next
    return (x, candidate[0][0]) if candidate else (0, 0)

And here's some test code implemented using pytest:

import pytest

@pytest.mark.parametrize("word, expected", (
    ("a", (0, 0)),
    ("b", (0, 0)),
    ("ab", (0, 0)),
    ("ba", (0, 1)),
    ("aaaa", (0, 0)),
    ("bbbb", (0, 0)),
    ("abab", (1, 2)),
    ("abba", (1, 3)),
    ("bbaa", (0, 3)),
    ("abaa", (1, 3)),
    ("aaaabbbb", (0, 0)),
    ("babaabba", (0, 4)),
    ("aabaabaabaaba", (2, 10)),
    ("aabaabaaabaaba", (2, 8)),
    ("aabaaabaaabaaba", (2, 9)),
    ("aabaaaabaaabaaba", (2, 6)),
    ("aabaabaabbaaba", (2, 7)),
    ("aabaabbaabaaba", (2, 11)),
    ("aabbaabaabaaba", (2, 11)),
    ("aabbaabbaabaaba", (2, 12)),
    ("bbbaaabbbaabbbaaa", (0, 16)),
    ("abbbaaabbbaabbbaaa", (1, 17)),
    ("aaaabbbaaabbbaabbbaaa", (4, 20)),
    ("bbaabbabbaabbabbbaaba", (0, 10)),
    ("bbaabbabbaabbaabbabbbaaba", (0, 14)),
    ("baabbaaababbaaa", (0, 7))))
def test_solution(word, expected):
    assert solution(word) == expected

- Jurko Gospodnetić November 14, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Bah, found a test case where it fails... back to the drawing board :-(

for "baabbaaababbaaa" it should return (0, 7), but returns (0,14)

We can not detect which 'a' to use for the Y position based simply on the number of 'a' chars in its series and the number of 'b' chars in the series before :-(

[EDIT: Unfortunately I could not remove or edit the original comment as it was added as an anonymous user and is not connected to my current account, so I just give this comment a -1 and added the correct answer in a separate comment. Please give negative scores to this one so it sinks down the dung pile. :-)]

- Jurko Gospodnetić November 14, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

In your example "babaabba", applying (0,4) & (0,7) results in:

(0,4): aababbba
(0,7): abbaabab

as you can see (0,4) is the better solution of the two as its result comes lexicographically before the other (starts with 'aa', while the other starts with 'ab')

- Jurko Gospodnetić November 14, 2014 | Flag Reply

Page:

CareerCup

Jurko Gospodnetić

Books

Videos

Resume Review

Mock Interviews