## eugene.yarovoi

BAN USERThat doesn't quite qualify as a non-instantiable (abstract) class, though. It's possible that this is what the interviewer wanted, but it's worth noting that child classes will be able to construct instances of the base class in contexts other than the child class constructors. For example, it would be possible to make a child class have a method that does something like "return new Base()", without any sort of enforcement of the idea that Base should never be instantiated directly.

I don't really have a better approach, in any case.

Really not sure what you're asking. Can you clarify the question?

- eugene.yarovoi November 27, 2012@Artemis: we can in fact show that this is so. Consider a heap whose bottom most layer is full (about n/2 elements there). Suppose that all the elements in all levels except the bottom level are larger than the number we're looking for. Then, the element we're looking for can be anywhere in the bottom-most level. We have to search in the bottom-most level, but checking any position in the bottom-most level reveals no information about any other element in the bottom-most level. In other words, if there's any position in the bottom-most level that you don't check in this particular situation, the element you're looking for could be hiding there. So in the worst case here, you must check every one of n/2 positions in the bottom-most level, which takes O(n) time.

- eugene.yarovoi November 27, 2012@Shivku: that would work, except then you're using more than just a heap.

- eugene.yarovoi November 27, 2012Strings in Java are immutable objects. This means that they can't be modified once created, and so the only way for their contents to be removed from memory is when they are garbage collected, at which point the memory once used for them will be freed and can eventually be overwritten with other data.

The problem with garbage collection is that it doesn't happen at any kind of guaranteed interval. The strings may persist in memory for a long time, and if a process crashes during this time, the contents of the string may end up in a memory dump or the like. With a character array, you can read the password, finish working with it as soon as you can, and then immediately zero out the array, for example via Arrays.fill (passwordArray, (char)0).

Even this isn't fully secure, because it's just reducing the window of opportunity for someone's password to show up in a memory dump somewhere, not eliminating the core problem. You should try to always minimize the need to store any passwords at all and try to store cryptographically strong hashes of passwords instead when possible.

@lfenjoy9: you've described the normal heapify process. You haven't discussed the key aspect here of how you would ensure the element is not already in the heap.

- eugene.yarovoi November 27, 2012You need some sort of supplementary data structure. It can't be done if all you have is a heap.

- eugene.yarovoi November 26, 2012@but: No, if you did that, "xyz" would be placed on the stack.

- eugene.yarovoi November 26, 2012@ dr.house: I'm not claiming any sublinear algorithms.

My response was a response to the claim that order statistics will take O(nlogn). I'm saying no, we can do order statistics in O(n) with the right algorithm.

I do agree that 3-buffer will be a good solution. If we only need the third largest, using quickselect or the like is overkill unless we want our algorithm to be able to generalize efficiently to finding other order statistics.

I haven't looked at the code, but from your description, I see what approach you're trying to take. I just wanted to let you know that the approach is O(n^2), not O(n).

- eugene.yarovoi November 23, 2012That's right. There are algorithms to find a number with any desired rank in O(n), but it's unlikely that such methods would outperform the naive buffer approach for the case of finding the 3-rd largest element.

- eugene.yarovoi November 22, 2012No, there are order statistics algorithms for finding a number with any chosen rank in O(n).

- eugene.yarovoi November 22, 2012You'd need another question to actually determine who's guarding the gate to heaven. You've assumed that the one guarding heaven is the truthteller, but there was no such constraint.

- eugene.yarovoi November 22, 2012You're completely missing the point. Yes, this works for generating triplets. What it doesn't do is provide any particularly efficient way to find all triplets present within a given list of numbers.

- eugene.yarovoi November 22, 2012Not correct, I'm afraid. See the test case [9, 36, 2, 12, 16, 39, 8, 20, 6, 10, 15] here: ideone. com/0lJ8M3

I expect to find [12 16 20] and [15 36 39], but no such luck.

@Saurabh: I don't see how that makes sense. Consider the fact that 32! / 32^ 2 is greater than 1.

- eugene.yarovoi November 22, 2012@Trickster: yours is incorrect for y = 0. However, aside from that case, I actually prefer your solution because it's safer against overflows.

- eugene.yarovoi November 22, 2012This doesn't answer how you intend to check for Pythagorean triplets within a specified list of numbers, like in this question.

- eugene.yarovoi November 22, 2012Why would the first two numbers of the triplet be next to each other in the input? There's no such restriction.

I doubt you can beat O(n^2) for this problem.

@sonesh: That's not a correct calculation of the probabilities.

- eugene.yarovoi November 22, 2012@last Anonymous: the definition of "constant time" is O(1).

- eugene.yarovoi November 21, 2012@Anonymous: not sure what I would be proving. What symbols we use for a number system is arbitrary. Just imagine that 5 stands for 4, 6 stands for 5, ... , 9 stands for 8. We have 9 symbols in what is a fixed-base number system, so it's a base-9 system. We're given a value in this base-9 number system and asked to represent, in base 10, this number's rank among the positive integers. The rank of an integer X among the integers is X, and so we're being asked to represent the base-9 value in base-10.

- eugene.yarovoi November 19, 2012This post describes the reverse of the process the question asks for. What the question asks for is analogous. You need to do the reverse of the digit converting process described here, and then convert from base 9 to base 10 (whereas this solution does the opposite process of converting from base 10 to base 9 and then digit conversion).

It's the right idea. It's backwards just because the author probably didn't pay attention to which direction they were asked to go in.

50 -> 40 by reverse digit conversion

40 in base 9 -> 36 in base 10

@Karthik: I assume we already have a random number generator that generates random numbers between 0 and some number X (such as the C rand() function, for which X is RAND_MAX).

If you want to know how, given a random number generator that generates numbers uniformly at random between 0 and some number X, we can generate numbers uniformly at random between 0 and a given number Y, see this CareerCup question I answered a while back: www .careercup. com/question?id=12426697. It contains a special case of what can be made into a more general algorithm.

I think talking about this issue is probably out-of-scope for this question though. Many programming languages already provide convenience functions to do this sort of thing, implemented similarly to how I solve the problem in the link I gave.

@akashi: This algorithm isn't meant to have an equal chance of picking each distinct word, only each word. You won't be able to do it without O(n) space if you need an equal chance for each distinct word. I see no reason to assume that interpretation of the question, however.

@Karthik Vvs: what do you mean by "output it in programming?"

@Karthik: Are you asking for an algorithm to give you a bit whose value is 1 with probability 1/k and 0 otherwise?

- eugene.yarovoi November 17, 2012Andi, I think you misunderstand the question. I believe the question is to choose a single word uniformly at random from a file.

- eugene.yarovoi November 17, 2012@Barney: Sure, you can use a technique known as "Reservoir Sampling". You can read about it in detail on Wikipedia.

The basic approach is that at any point you have a current random choice (which in the beginning is initialized to the first word), and as you see each additional word, you replace the current random choice with the new word you just saw with 1/k probability, if k is the number of words you've seen so far.

After reading the first word, the random choice is the first word with 100% probability

After reading the second word, make it the new random choice with 50% probability. So now the probability distribution is 50% first word, 50% second word.

After reading the third word, make it the new random choice with 1/3 probability. This means the two previous words have 1 - 1/3 = 2/3 probability of remaining selected if they were selected already, which is true with a probability of 1/2 for each word, so the probability is now (2/3)*(1/2) = 1/3 for each of the first two words as well. So every word has probability 1/3 of being chosen.

The statement that after k words, every word will have a 1/k chance of being the selected random word can be formally proven by induction. k=1 is the trivial base case and is true by construction.

For k >= 2, we opt to select the k-th word with probability 1/k, and by induction, every previous word was selected with probability 1/(k-1). Any previously selected word now has probability 1 - 1/k = (k-1)/k probability of remaining selected, and had 1/(k-1) probability of being selected in the first place, so it now has 1/k probability of being the random word. So all words now have probability 1/k of being the random word.

Barney, I think you're focusing on the wrong issue here. rand() % n is very close to being distributed uniformly at random for small n. If you need the probability to be exactly equal, there are algorithms for selecting a number uniformly at random between 0 and X given a generator for generating numbers uniformly at random between 0 and Y. I think the point of the problem was to focus on the aspects of the problem mentioned above.

- eugene.yarovoi November 16, 2012Can you define "the union of two sorted arrays (with duplicates)"? I'm interested in what that means for duplicates.

- eugene.yarovoi November 16, 2012I like this presentation of it, since it makes the mathematical properties of this problem clear.

I would add that most languages have functions that can turn an int into a string for you using a specified base. So you could write this program with very little code.

That seems to be the right summary of what we're seeing. 2147483647 is the max value for a *signed* int.

- eugene.yarovoi November 13, 2012I think the number of partitions of a number is at least exponential in the size of the number. An O(n^2) runtime won't be possible, since you have an exponential amount of printouts to do.

- eugene.yarovoi November 13, 2012Segment trees seem like overkill, but how would you do it using segment trees? What would your segments be?

- eugene.yarovoi November 12, 2012"Please prove my code wrong because i haven't used either min heap or "

The burden of proof is on you to prove the correctness of your code, not on other people to prove it wrong.

@nanny: So you can only buy one share and then you have to sell it before buying the next one, and then you have a limit of N such transactions?

- eugene.yarovoi November 12, 2012That doesn't mean we can't use a hash table -- just means we have to make a good hash function.

"However, if you use something like B+/AVL/RB Tree instead of a linked list, then hashtable might work, however this is assuming that your hash function is uniform, otherwise, you wont get any improvement."

Really not sure I'm following you there. Are you saying that we should use B+ trees as a *collision resolution mechanism*?

No, no, it's a bitmap. It's just that you propose creating a bitmap only for the first 1 billion values. It looks like DarkKnight was proposing creating a bitmap for all possible values (2^32 for 32-bit ints). Then his technique would not be extensible to 64-bit ints.

- eugene.yarovoi November 12, 2012If the numbers are 32-bit numbers, since there are 2^32 possible 32-bit numbers, he would allocate 2^32 bits, regardless of the input size.

- eugene.yarovoi November 11, 2012@Nanny: As you were writing your response, I realized what you were talking about and edited my response above before I saw your reply. See my changes.

Your algo makes sense. DarkKnight was almost certainly proposing a different solution, as evidenced by:

"There are 2^32 possible values ( about 4 Billion )

Allocate a single bit for each value ( This would required 512 MB of memory )"

He is proposing allocating a bit for every possible value of a number and doesn't incorporate the pigeonhole principle.

@nanny: I see what you're saying. You're saying that by pigeonhole principle, we will conclude that if there's a billion numbers, at least one number between 1 and 1 billion + 1, inclusive, is unused. We will then make a bitmap with 1 billion + 1 bits and mark off occurrences of only numbers from 1 to 1 billion + 1. We will be able to do this with 128 MB. If the inputs were 64-bit integers, this would take no more memory, because we still only care about integers in the 1 to 1 billion + 1 range.

That's a decent solution. It's a different solution from what DarkKnight was referring to, though.

@Algos: No one said anything about quicksort, first of all. External sort is usually done via a mergesort phase followed by an N-way merge algorithm (which is either done by more mergesorts or by something that looks like a modified heapsort).

Second of all, there's no reason why quicksort couldn't exploit locality of reference, as it does operate on largely contiguous data. It arranges the array from the front and the back at the same time, but that's only 2 locations and writes are contiguous in both those places, so with a proper buffering strategy, things would be fine.

Third, you're right that binary search is not always a great algorithm to use on files, but we only need to use it once, so the cost of it relative to the cost of sorting is minimal.

@Miro: The output is written to a new file. Nothing is overwritten.

- eugene.yarovoi November 11, 2012and why does that mean you can't use a hash table?

- eugene.yarovoi November 11, 2012The question asks you to do it without "branching". Try-catch blocks definitely qualify as "branching", even if they don't contain if-else statements (internally exceptions use a mechanism similar to if statements).

- eugene.yarovoi November 11, 2012@nanny: I doubt that you and DarkKnight are talking about the same thing. Do elaborate on your idea though.

- eugene.yarovoi November 11, 2012That's all I can think of too. We have to use DP here, and the running time will be reasonable if NUM is small. We should be able to get something like O(NUM*k*n), which is bounded by O(NUM*n^2). So if NUM is small, we have a solution that is possibly very reasonable. If k and NUM are both small constants, then we even have a linear-time algo.

- eugene.yarovoi November 10, 2012I see. So they really did want a highly optimized solution. O(N) preprocessing and O(logN) query time should meet this easily.

- eugene.yarovoi November 10, 2012Kind of a cool problem. You can get O(N) preprocessing, O(logN) queries. Don't know if we can do even better. What were the constraints?

- eugene.yarovoi November 10, 2012

Rep**Gayle L McDowell**, CEO at CareerCupGayle Laakmann McDowell is the founder / CEO of CareerCup, which provides programming interview prep for candidates interviewing with Microsoft, Google ...

Rep**Loler**, Employee at Google

Rep

**CareerCup**is the world's biggest and best source for software engineering interview preparation. See all our resources.

I'm not saying you can't get an algorithm with O(n^2*m) complexity, but this solution isn't quite right. You can't just calculate the max 1D solution for each row and then try every contiguous block of rows, if I'm understanding your idea correctly. That's because the parts of the rows you select also need to line up and form a rectangle.

- eugene.yarovoi November 28, 2012