## eugene.yarovoi

BAN USERHave you taken into account that a+b can overflow?

- eugene.yarovoi November 10, 2012So I suppose it would depend on what range of numbers rand() returns, and how many bits an int has (as this is also not guaranteed to be 32). When I said the probability was very near 1/2, I was assuming that rand() generates numbers uniformly at random from 0 to max value of unsigned int, inclusive.

- eugene.yarovoi November 10, 2012It depends on whether c has the potential to overflow.

- eugene.yarovoi November 10, 2012"We don't really care about the details of you 2 groups of 3, or 3 groups 2, since maybe both them work, or neither of them work."

I'd like to give you a direct counterexample to this.

Consider N= 8, K = 5, NUM = 11, and the input, after modulo by NUM, is { 1, 1, 1, 1, 4, 4, 4, 4}. So here we will have 5 groups, 8%5 = 3 of which will be size 2, and 5 - 8%5 = 2 of which will be size 1. Now, it's time to partition the input into groups. You seem to be saying that it doesn't matter whether I start placing 1s into groups of 1, or groups of 2, as it will be the same either way. I follow this advice and choose {1, 1}, {1, 1}. I now have one group of 2 and 2 groups of 1 left, so my remaining groups are {4, 4}, {4}, and {4}. I now compute 1 + 1 + 4 + 4 + 4, and it is 14. I check 14%11 == 0 -- it's false, so I return false.

Indeed, as you said, the array of size 8 is like this in terms of groups, where the number is the group index:

1 2 3 4 5 1 2 3

I had: group 1: {1, 1}

group 2: {1, 1}

group 3: {4, 4}

group 4: {4}

group5: {4}

So filling this in with the numbers from the group, I see that the groups I chose correspond to the arranged array 1 1 4 4 4 1 1 4. And true, consecutive arrays of size k=5 in this array are not divisible by NUM=11.

However, the answer of false is wrong. If I had arranged my groups containing 1s differently (by choosing 3 groups to place 1s in, namely {1, 1}, {1}, and {1}), my only choice to place the 4s would be {4, 4}, {4, 4}. I would then compute 1 + 1 + 1 + 4 + 4, which is divisible by 11! And lo' and behold the corresponding arrangement is 4 4 1 1 1 4 4 1, every subarray of size 5 of which is divisible by 11!

So here our groups were:

group 1: {4, 4}

group 2: {4, 4}

group 3: {1, 1}

group 4: {1}

group 5: {1}

As you can see, how elements are distributed within groups makes a difference in the answer.

My question now is: on what basis would your program have made the right choice between these two alternatives? It is not, as you say, that both of them are working or neither of them are working.

And though this is a very simple example because there's only 2 options here, I can show much more complex cases, which will have entire combinations of decisions to be made about how to distribute elements between groups.

I have good hopes that now that I've given such a specific example, one of two things can happen. Either:

1) If you are in fact mistaken, you can see the flaw in your approach;

or

2) If I still misunderstand, you will be able to pinpoint the exact way in which I misunderstand and say the right thing to make me understand.

A linked list that is standalone will not share its final node with any other list. So, traverse all the lists, and store all the final nodes as keys in a map, associating final nodes with (linked list head, count) pairs. In the end, iterate through all the key-value pairs, and when count == 1, include the linked list head in the output.

If there's a grand total of D elements distributed throuhout K lists, the runtime here will be O(D+K), which is optimal.

If you didn't want to use a map, you could get O(D + KlogK) with a sort.

No, because you didn't consider the possibility of overflows.

- eugene.yarovoi November 09, 2012Please justify the 3/4. I believe the probability should be ~1/2.

- eugene.yarovoi November 09, 2012Alright, I think I understand what your idea is now. I'm still skeptical, however, because I don't think you have an algorithm for efficiently partitioning the elements into these groups that you propose.

This idea seems still similar to what I was thinking about before, though framed a little differently, and I ran into the same problem. The question is this: how will you efficiently determine which numbers to put into which groups? Like for example, let's say you have 6 numbers that are 2 mod NUM. Let's say n=50 and k = 20. So we know that we will have 20 groups, of which 10 will be of size 3, and 10 will be of size 2. Now your 6 numbers that are 2 mod NUM -- how will you determine whether these are 2 groups of 3, or

3 groups of 2? That makes a difference on your final answer! You could try both possibilities, but that has to be tried in combination, or so it seems, with all other choices of this nature!

Maybe I either 1) still don't understand your approach; or 2) maybe you have some efficient way of dealing with this sort of issue?

I came up with something that sounded quite similar and dismissed it because it didn't work, but maybe your solution is different. Can you clarify point 2 in your answer? What precisely do you mean by "The sum of k groups' remainder divisible by NUM"?

Let me give you what I think is a counterexample to what you're proposing, and let you clarify why it's not if it is indeed not a counterexample. Suppose you have an array that is size 30 and consists of all 1s, and that further k=9 and num=3. So it is 1, 1, 1, ..., 1, 1. Your analysis seems to claim that there will be "b groups of a + 1 numbers and k-b groups of a numbers", where b = n%k = 30%9 = 3 and a = 30/9 = 3. So I expect to see 3 groups of 4 numbers and 6 groups of 3 numbers. However, this is blatantly false for a case like this, because there's only one group of numbers after modulo.

It looks to me that you didn't consider that multiple groups may all contribute to the same remainder. Like here, all your 3 groups of 4 and 6 groups of 3 are contributing to a single remainder, 1. So then you have to ask the question as to how, given just a count of remainders, you'd see if they can be partitioned into these groups.

What exactly is the question? Is it that you're given a large number of queries [i...j], and so you need to preprocess the input in some way so that you can answer queries efficiently? Or do you need to answer only one query per input? You should really specify these sorts of things.

- eugene.yarovoi November 08, 2012@tera_baap: well no, of course not. I was just answering that one specific point someone else made, not addressing the whole discussion

- eugene.yarovoi November 08, 2012If a language wanted to support that...yes, that would be how.

- eugene.yarovoi November 08, 2012The mention of a hashmap was just for conceptual explanation -- some SQL implementations use B-trees and similar data structures since those are much more space-efficient.

- eugene.yarovoi November 08, 2012Worth pointing out that we need to add a check that n!= 0 too, as this condition evaluates to true for n=0. If this function was only for positive integers, then it's entirely correct as is.

- eugene.yarovoi November 08, 2012@Last Anonymous: No, you can't use references in STL containers. See this stack overflow topic: stackoverflow .com/questions/2934021/stl-map-containing-references-does-not-compile

- eugene.yarovoi November 08, 2012Unfortunately it's not that simple. Let's say you have 8 of a certain element after modulo, and n/k is 2 and n%k is nonzero. It could be that there are 4 of these elements in each chunk of size k, or it could be that there are 3 and the first n%k elements contain the remaining 2. There thus become lots of different possibilities and combinations of possibilities, so it's not as simple as you would have it.

- eugene.yarovoi November 08, 2012@buffalo: Consider the input 1 1 1 1 1... 1 1 (all ones for n=150 elements let's say), k=5 and and Num=100. The answer is no, but you would conclude the answer's yes simply because 150 1s can be partitioned evenly into 150/5 sections. There are other flaws too. I think this approach needs major revisions to work and is fundamentally broken as it stands.

- eugene.yarovoi November 08, 2012You upvoted yourself.

- eugene.yarovoi November 07, 2012There are many possible solutions here. Supposing that we're only given a small amount of time, I would narrow it down to three candidate ideas.

1) Cookies

---------------

In other words, we store the data on the user's computer. This is the easiest solution and the cheapest for the website owner.

Some websites don't have logins and only track users through the fact that they use the same computer as they shop. Such websites use this solution by necessity. Even when you provide users with accounts that maintain cart information between sessions on different computers, you will probably want a cookie-based solution in addition to one of the other solutions below, because you wouldn't want people to be unable to add things to a cart until they log in. That's bad for business.

2) SQL Databases.

--------------------------

The advantages of databases is that they are well-understood by developers, offer powerful data retrieval capabilities, and offer many strong guarantees pertaining to the data.

At this scale you probably need multiple database servers. You could have some sort of rule-based system to determine which users reside on which server. For example, you could hash the username to obtain a number i from 0 to N-1, where you have N servers, and store the data for that username on server i. There are of course several issues to think through here: when you need more servers, user data will need to be moved between servers so that the new servers can have users for which they're responsible. A typical hash approach will mean that the server number for almost every username will change when you change the total number of servers. This means you will need to move most of the data between servers, probably meaning substantial downtime for most users as you copy terabytes of data.

You could use ideas such as "consistent hashing" to be able to move only small amounts of data at a time, or you could design a rule system for determining the server that doesn't use a typical hash, but (as a dumb example) is based on last name, for example. A goes to 0, B goes to 1...Then if you need to, you can change "A goes to 0" to "AA-AG goes to 0, AH-AZ goes to 232", and move over the small amount of data you need for that change, and do the changes in small amounts incrementally. It's important to think about how you will maintain and further scale these types of large systems.

3) Distributed Hash Tables

------------------------------------

A general distributed key-value store (usernames will be the key, values will be all relevant user data). Look this topic up if you're interested in learning more about it. Issues here include the strategy for distributing data among nodes, whether there will be data replication to ensure smooth failover, the performance of this system, and more.

Define what it means for a 2D array to be "sorted".

- eugene.yarovoi November 07, 2012Scofield: I'm not sure what you're referring to on that page. Are you arguing for or against duplicates? If against, I've given a rather ironclad argument above for why we can rest assured that we can make a design that accommodates duplicates.

- eugene.yarovoi November 07, 2012Can you explain how the answer is 8 for n = 3?

- eugene.yarovoi November 05, 2012Why couldn't a BST have duplicates? Even if your existing BST design did not allow for them, you could easily accommodate duplicates simply by giving each node an integer that would store a count.

- eugene.yarovoi November 05, 2012@CST: I'm not sure I understand your question. If you want to know how I'd do it up to a certain precision, the answer is that the precision setting would determine how many iterations of the binary search I'd do.

- eugene.yarovoi November 04, 2012@sonesh: The problem does not require O(n^2) time. This solution is correct.

Looking at the constraints, O(n^2) would potentially time out, as there are up to 100 test cases. You have to read 100 (number of test cases) * 1000 (size of each test case) lines from input...that's why they didn't go for bigger inputs here.

It seems that you have some questions about how this works, but I can't understand what specifically you want to know. Can you formulate your question more clearly?

The answer is m + n - gcd(m, n).

The GCD of two numbers can be computed efficiently using Euclid's Algorithm (Wikipedia it). This allows you to give a blazing fast algorithm here that can give the answer for any m, n in just a couple of modulo and other arithmetic operations.

Explanation: as you traverse from (0,0) to (m, n), you will cross m-1 horizontal + n-1 vertical cell boundaries. Each boundary you cross will split a cell in two, unless you cross into the new cell through a cell corner, in which case you'll only split one cell in two while crossing both a horizontal and vertical boundary simultaneously. Reaching the end splits one additional cell. Therefore, the answer is (m-1) + (n-1) + 1 - number of times the diagonal line segment passes through a point with integer (x, y) inside the rectangle (not including (0,0) and (m, n)). It takes a little bit of not-too-difficult reasoning about prime factors and divisibility to see that the latter quantity is gcd(m, n) - 1. This results in the answer simplifying to m + n - gcd(m, n).

Examples: in a square, you will split exactly n cells. This is because gcd(n, n) = n and n + n - n = n.

In a situation where m, n are coprime, the diagonal will not pass through any grid points except the two corners. gcd (m, n) = 1 and the answer will be m + n - 1. So for example, if m = 6 and n = 5, the answer is 6 + 5 - 1 = 10

@tej: We can argue about definitions, but it'll take 2N linked list pointer accesses. That's all I meant.

- eugene.yarovoi November 02, 2012Whether it's a sub-problem or a super-problem is a matter of perspective, I suppose. What counts is that if you draw a graph showing all the function calls, that graph is a DAG.

- eugene.yarovoi November 01, 2012Nope. I already had a testing framework for this, so I converted your code to Java (the test was in Java) and ran it. You can see some failed test cases here: ideone. com/4e7tLZ

- eugene.yarovoi November 01, 2012I don't see why the graph would need to be connected. Consider k=10 and 1 9 2 8 11 19. We can divide this into (1, 9), (11, 19), and (2, 8). However 2 and 8 is not connected to 1 or 9 or 11 or 19 in your approach.

- eugene.yarovoi November 01, 2012None of the above. Testing is hard.

You could have tried every path, but not with every set of input values.

@code_guru: No, there can't, by construction. The left side is defined as the teams that lost to the pivot team, and the right side is defined as the teams that won against the pivot team.

- eugene.yarovoi November 01, 2012@Blizzard: the expression you give is correct...I lost a term when I was doing the math for that particular example. I've corrected the example above to match what you gave. The algorithm was and remains correct.

- eugene.yarovoi November 01, 2012The JVM doesn't decide that, AFAIK. Your program just hangs if a deadlock occurs.

- eugene.yarovoi October 31, 2012@gnahzy: yes, I'm afraid it's wrong. For aabbaabbaabb, every subarray is valid, not just those that consist of first a's and then b's or b's and then a's. aabbbaab, for instance, is supposed to be counted too. You can see how I'm getting 78. For the first letter, there are 12 substrings starting with it, 11 starting with the second letter, and so on. 12 + 11 + 10 + ... + 1 = 12*(12+1)/2 = 78.

- eugene.yarovoi October 31, 2012Anonymous, you said: "(1) For an string like "caaabbbaaac", your method is to count(caaa) + count (aaabbbaaa) + count(aaac) - count(c) - count(aaa) - count(bbb) - count(aaa) - count (c), which is incorrect."

That's not the right assessment of what this method computes. This method would compute count (aaabbbaaa) [a&b pass] + count(caaa) + count(aaac) [a&c pass] + count(c) + count(bbb) + count(c) [b&c pass] - count(c) - count(aaa) - count(bbb) - count(aaa) - count (c) [single letter sequence subtraction pass]. Canceling out like terms, this is count (aaabbbaaa) + count(caaa) + count(aaac) - count(aaa) - count (aaa).

This almost coincides with what you stated the correct answer should be, except that for some reason you additionally want to subtract out count(c) twice -- I'm not quite sure why. I think this is related to your second point, but I don't see the logic of it. I mean, for a case like "cac" we definitely want the answer to be 6, right?

I think you misunderstand my method. You *do* need to subtract bbb in the example you gave, because it will be counted by both the b&c pass and the a&b pass.

I'm going to provide some code and you can test this to your satisfaction.

Did you read the entire part of my solution that deals with how I avoid this double counting (I subtract out a bunch of stuff to account for this)? Or did you read it and still have doubts about its correctness?

I edited the last sentence or two of the second paragraph to hopefully clarify my solution. I've now stated things in somewhat more precise terms.

Incorrect. Try aabbaabbaabb. The answer should be 78, your code gives 38.

- eugene.yarovoi October 31, 2012Number of sub-arrays having at most 2 different letters = number of arrays having nothing but a's and/or c's + number of arrays having nothing but b's and/or c's + number of arrays having nothing but a's and/or b's - number of arrays having only a's - number of arrays having only b's - number of arrays having only c's.

Why is that true? It's because we know arrays that have a and c are disjoint from those that have b and c and so on, if indeed they have both letters. But, some of the arrays we count in the "having nothing but a's and/or c's" category just have a's and do not have c's, for example. These are double-counted when we count all arrays consisting of nothing but a's and/or b's, in this example. Following similar reasoning for the other letters, we see that every subarray composed only of repetitions of the same letter is double-counted. We thus need to subtract the number of subarrays consisting of repetitions of a single letter, hence the formula above.

So how do we use this to make an O(n) algorithm? We scan the array for runs consisting of only a's and b's, and we note the length of such runs. A run of size k contributes k(k+1)/2 valid sub-arrays to the total. (We don't need to inspect all of them, we just increment a counter to know k and then calculate k(k+1)/2 once we hit the third letter.) We evaluate the sum of the number of subarrays for a&b, a&c, b&c, a alone, b alone, and c alone, and then apply the formula at the top of this answer. The runs of single letters can all be obtained in one pass, so we'll make a total of 4 passes over the data.

"the substrings are either formed by the characters within same group or adjacent groups"

Not true. Consider something like ababaaaabba.

This doesn't seem like it could be correct, therefore.

Well, I'd just store them as a cumulative total, like pop[0], pop[0]+pop[1], pop[0]+pop[1]+pop[2] , ...

Then after choosing an index I'd binary search. Technically this may be O(logN) for a lookup, but the number of people in a country is potentially large, so I would prioritize avoiding storing 1 entry per person.

Your worst case can't be what you claim it is. What if the longest subarray is much smaller than the total size of the array? You have to look at all, or at least most of, the elements. Your complexity has to be at least O(input size).

- eugene.yarovoi October 30, 2012You need some kind of memoization here to prevent the complexity from being super high.

- eugene.yarovoi October 29, 2012Let a = x2 - x1 and b = y2 - y1

The answer is then C(a+b, b) (or C(a+b, a) -- these are equal) where C is the binomial coefficient function (sometimes read as "a+b choose a"). The reason is that you need to take a+b steps to reach your goal, and any b of those can be upward steps (or alternatively, any a of those can be to the right). The answer is therefore the number of different ways to pick b items from a sequence of length a+b.

Using a formula for the binomial coefficient, C(a+b, a) = (a+b)! / (a! * b!)

Why use threads to do a task that's inherently sequential?

- eugene.yarovoi October 29, 2012Hashmaps store things by key, so what you're asking for doesn't make much sense. Every stored value must have an associated key.

- eugene.yarovoi October 29, 2012-1: The sum of two arrays could be the same without the arrays having the same elements. Consider {3, 5} and {4, 4}

- eugene.yarovoi October 27, 2012Consider a test input like 6 1 -7. Here the optimal choice is to take the whole array, at which point you get 0. You will instead first consider 6, then consider a choice of 6 1, 6, and 1 (and you'll pick 1), and then you'll consider a choice of 1, -7, and 1-7 (and you'll pick 1).So you'll end up picking just the subarray that includes the single number 1, which is the wrong answer.

- eugene.yarovoi October 27, 2012

Rep**Gayle L McDowell**, CEO at CareerCupGayle Laakmann McDowell is the founder / CEO of CareerCup, which provides programming interview prep for candidates interviewing with Microsoft, Google ...

Rep**Loler**, Employee at Google

Rep

**CareerCup**is the world's biggest and best source for software engineering interview preparation. See all our resources.

That depends on your machine. See some of the comments below.

- eugene.yarovoi November 10, 2012