eugene.yarovoi
If you're going to go down this route, here's an idea that will guarantee that toggle and update both run in no worse than O(sqrt N).
Use your existing idea to keep a list of all toggles made. When k, the size of the list, exceeds sqrt(N), apply all the toggles to the array in a single batch -- this can be done in O(N) time for all the toggles, as described below. Since the list will never exceed sqrt(N) in size, the runtime of isOn is capped at O(sqrt(N)).
We analyze the runtime of toggle as follows: the act of recording a new toggle interval is O(1). As for applying the toggles to the array, we only do this every sqrt(N) toggles, and it takes O(N) time, giving an *amortized* time complexity of O(sqrt(N)) for toggle.
I now describe how to apply k toggles in a batch in O(N + k) time. Since we'll be applying about sqrt(N) toggles at a time, the batch-toggle step costs O(N) once per sqrt(N) toggles.
Create a new list of 2k integers: for each interval, put 2 values into the list, the start index and the end index of the interval. Then, sort the list using counting sort (the values are all in the range 0 to N, so counting sort runs in linear time).
Now, traverse the sorted list to determine which elements in the original array of size N were toggled. Suppose our sorted index list has elements a0, a1, a2, ... All the elements in the original array of size N having index between 0 and a0 should not be flipped, all the ones with index between a0 and a1 should be, the ones between index a1 and a2 should not be, the ones between index a2 and a3 should be, etc.
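Here is a rough Java sketch of the whole structure (illustrative names; intervals are assumed inclusive, and instead of literally counting-sorting the 2k endpoints I bucket them into a difference array, which is the same linear-time idea):

import java.util.*;

class ToggleStructure {
    private final boolean[] state;
    private final List<int[]> pending = new ArrayList<>(); // recorded toggles, inclusive [start, end]

    ToggleStructure(int n) { state = new boolean[n]; }

    void toggle(int start, int end) {                      // O(1) amortized
        pending.add(new int[]{start, end});
        if ((long) pending.size() * pending.size() > state.length) applyBatch();
    }

    boolean isOn(int i) {                                  // O(sqrt N): at most sqrt(N) pending toggles
        boolean on = state[i];
        for (int[] t : pending)
            if (t[0] <= i && i <= t[1]) on = !on;
        return on;
    }

    private void applyBatch() {                            // O(N + k) for k pending toggles
        int[] endpoints = new int[state.length + 1];       // bucketed interval endpoints
        for (int[] t : pending) { endpoints[t[0]]++; endpoints[t[1] + 1]--; }
        int active = 0;                                    // intervals covering index i
        for (int i = 0; i < state.length; i++) {
            active += endpoints[i];
            if (active % 2 == 1) state[i] = !state[i];     // odd coverage flips the bit
        }
        pending.clear();
    }
}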
Imagine the 4*n grid positioned with the 4 units going horizontally and n units going vertically. Without loss of generality, we start filling from the top left, since we will have to fill that space eventually anyway and the order of placing blocks doesn't matter. We have a choice: we can place a block vertically from the top left corner (4 down, 1 across) or horizontally (4 across, 1 down). If we place a block vertically, the three columns next to it must be filled in the same way, vertically, since no horizontally positioned block could ever fit there. After that's done, there will be a 4*(n-4) grid left to fill. If we instead place the block with the 4 side horizontal and the 1 side vertical, we leave behind a 4*(n-1) grid.
All of this means that if f(n) is the number of ways to arrange a 4*n grid, then f(n) = f(n-4) + f(n-1), using f(0) = 1 and f(any negative number) = 0 as base cases. You can then compute f(n) using a similar approach to how Fibonacci numbers are computed. You can choose among the 3 standard solutions that are used for Fibonacci:
1) The naive O(n) loop that calculates f(1), then f(2), then f(3), and so on (see the sketch after this list). Since f(n) depends only on smaller values of the function, you will always have all the values you need.
2) A recursive solution that caches previously computed values -- the caching (essentially, dynamic programming) is necessary to avoid recomputing a lot of cases, just like in Fibonacci.
3) A solution based on representing the linear recurrence as a matrix. This allows the answer to be computed in O(log n) time. The linear recurrence is f(n) = f(n-4) + f(n-1) instead of the f(n) = f(n-2) + f(n-1) seen with Fibonacci, but other than adjusting the matrix, this is the same solution as finding the n-th Fibonacci number in log time.
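To illustrate option 1 concretely, a minimal sketch (method name is illustrative):

static long countTilings(int n) {
    // f(i) = f(i-1) + f(i-4), with f(0) = 1 and f(negative) = 0
    long[] f = new long[Math.max(n + 1, 5)];
    f[0] = 1;
    for (int i = 1; i <= n; i++) {
        f[i] = f[i - 1];                  // horizontal block on top: 4*(i-1) remains
        if (i >= 4) f[i] += f[i - 4];     // four vertical blocks: 4*(i-4) remains
    }
    return f[n];
}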
It's because the crow doesn't know which pot is which. So, the crow may get lucky and try the 5 pot first, but in the worst case he may get unlucky and try the 58 pot first. He would only know it's the 58 pot after depositing 5 stones in it and seeing that it has not overflowed. The problem asks about the minimum number of stones in the worst case. Determining the strategy that minimizes the expected number of stones is a much more difficult problem.
- eugene.yarovoi May 13, 2015
Your solution, from what I was able to understand of it, is correct only if you are allowed to partially attempt a problem and receive partial credit based on the fraction of the problem completed. If you can only receive credit for a problem once it's fully completed, the greedy algorithm is not correct.
Consider the possibility of having three tasks specified as (points, time): (50, 50), (45, 30), (45, 30). Then suppose you have 50 minutes total. While the second and third tasks offer you 1.5 points / minute vs. the first task's 1 point / minute, it's better to take the first task, since you'll be able to use your entire time scoring points that way.
The solution in this case would need to use the normal methods for solving the knapsack problem.
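A sketch of the standard 0/1 knapsack DP for the all-or-nothing version (illustrative names; O(total time * number of tasks)):

static int maxPoints(int[] points, int[] minutes, int totalMinutes) {
    // best[t] = maximum points achievable using at most t minutes
    int[] best = new int[totalMinutes + 1];
    for (int i = 0; i < points.length; i++)
        for (int t = totalMinutes; t >= minutes[i]; t--)   // descending: each task used at most once
            best[t] = Math.max(best[t], best[t - minutes[i]] + points[i]);
    return best[totalMinutes];
}

On the example above -- points {50, 45, 45}, times {50, 30, 30}, 50 minutes total -- this returns 50, not the greedy answer of 45.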
My suggestion is to not tie how you see yourself as a developer to any one technology. Don't be a Python developer OR a Java developer. Be a developer. Focus on building the fundamentals and intuition needed to pick up any language quickly. Know the basics of a couple of languages -- a good starting set might be Java, Python, and C.
To be good enough to work at Google, you should be good enough to learn any language quickly.
public static bool ConvertToNumber(string str)
First, the name should be something like IsConvertibleToNumber, since the number isn't returned. Either that or the number should be returned, whichever is more useful. But if we want to return the number plus a boolean saying whether the conversion was successful, there are already standard library methods for this, such as Int16.TryParse.
Furthermore, we might be inclined to question whether this method is supposed to handle things that are not Int16. If yes, we're missing a lot of logic. If not, the method should be named IsConvertibleToInt16.
bool canConvert = false;
try
{
    int n = Int16.Parse(str);
    if (n != 0)
    {
        canConvert = true;
    }
}
catch (Exception ex)
{
}
Try-catch should not be used for ordinary control flow. Testing for n != 0 doesn't make sense because the parsed number could legitimately be zero.
bool retval = false;
if (canConvert == true)
{
    retval = true;
}
return retval;
Everything here can just be replaced with
return canConvert;
The best way to write this method would have been as a thin wrapper around TryParse. In that case, the method saves you from having to declare a temporary variable.
public static bool IsConvertibleToInt16(string str)
{
    Int16 val;
    return Int16.TryParse(str, out val);
}
The problem statement only allowed for positive numbers, so this approach is correct. If negative numbers were also allowed, here's a different approach that works for all numbers.
Build a cumulative array where the i-th element contains the sum of the first i numbers. Then, scan through the cumulative array, and for each value v, check whether v-T has been previously seen in the cumulative array (to do this efficiently, use a hash table and add the elements of the cumulative array to it as you go).
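A sketch of this idea (illustrative names; prefix sums kept as long to avoid overflow, and the cumulative values are added to the hash set as we scan):

import java.util.*;

static boolean hasSubarrayWithSum(int[] a, long target) {
    Set<Long> seen = new HashSet<>();
    seen.add(0L);                          // empty prefix: covers subarrays starting at index 0
    long prefix = 0;
    for (int x : a) {
        prefix += x;
        if (seen.contains(prefix - target)) return true;  // some earlier prefix v with prefix - v = target
        seen.add(prefix);
    }
    return false;
}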
Consider trying some questions from an online judge like SPOJ.
- eugene.yarovoi February 27, 2015
I don't know about your situation in particular, but generally speaking, if the email came from a hiring manager or recruiter directly associated with the company, it means your resume was selected for participation in this next round. Chances are some human did review your resume (perhaps briefly).
The online assessment will likely first be evaluated by an automated judge that runs test cases against your code. It is possible that a human will review your code later in the process, if they're interested in seeing how elegant your code is.
We have two kinds of allocation requests: requests for specific numbers ("custom request"), and requests for arbitrary numbers ("generic request"). We also have a need to be able to verify whether a given number is used or not. We can fulfill all of these operations in O(1) expected amortized time.
Create a hash table that will store all of the numbers allocated through custom requests. Additionally, store a counter, nextGenericNumber, that will start at 0. This counter will represent the next number that will be allocated for a generic request -- all lower numbers will have already been allocated.
When we receive a custom request for a requestedNumber, we check that requestedNumber is absent in the hash table, and also that nextGenericNumber <= requestedNumber. If so, we allocate the number, add it to the hash table, and return true. If not, we return false since the number has already been allocated either for a custom request or a generic request.
When we receive a generic request, we check whether nextGenericNumber is absent from the hash table. If yes, we return nextGenericNumber and increment it (for the next time this operation is done). If not, we keep incrementing nextGenericNumber and checking whether it's in the hash table. This is O(1) expected amortized time because a failed check and the need to try another lookup can occur only once for every number that was inserted in the hash table. We may add a small optimization to reduce the hash table's space usage: remove from the hash table any entries that become covered by nextGenericNumber.
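Putting these operations together, a rough sketch (illustrative names):

import java.util.*;

class NumberAllocator {
    private final Set<Long> custom = new HashSet<>(); // numbers taken by custom requests
    private long nextGeneric = 0;                     // everything below this is allocated

    // Returns true iff the specific number was free and is now allocated.
    boolean requestCustom(long n) {
        if (n < nextGeneric || custom.contains(n)) return false;
        custom.add(n);
        return true;
    }

    // Returns the next free number; O(1) expected amortized.
    long requestGeneric() {
        while (custom.contains(nextGeneric)) {
            custom.remove(nextGeneric); // optional cleanup: now covered by the counter
            nextGeneric++;
        }
        return nextGeneric++;
    }

    boolean isUsed(long n) {
        return n < nextGeneric || custom.contains(n);
    }
}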
Another approach
---------------
Other approaches are possible. For example, we could store all the allocated numbers (whether allocated through custom or generic requests) in a hash table. We serve custom requests simply by checking against the hash table. When serving generic requests, we pick a random number, see if it's taken, and repeat until we pick a random number that's not taken. As long as a good fraction of the values remains unused, this has good expected performance -- O(1) if there exists a fixed K such that K percent of the values are unused at any given time. One advantage is that because the numbers are allocated randomly, they may interfere less with custom requests. In our previous approach, many custom requests could get declined if everyone was requesting small numbers that would have already been taken by generic requests. The disadvantage is that now we need to store all the allocated numbers in the hash table, not just the ones allocated through custom requests.
Optimization
---------------
One other possible optimization, as suggested by some others on this thread, is to switch to a bitset once the data structure is using more than 1.25 GB of space. Since there are 10^10 possible numbers, the bitset will use 1.25 GB. That would be, in most cases, a very large amount of space to consume up-front, but it makes sense to switch to it if the hash table grows beyond that size.
Save the data as a byte array. Reserve the first X bytes for the length of the first string (X depends on how many bytes may be needed to encode the largest possible string you can have). After that, place the bytes corresponding to the first string, and then the bytes corresponding to the second string.
To read back the strings, you would first read those X bytes, which tell you both how many more bytes you need to read to get the first string, and at what offset you should start reading if you want to access the second string (str1.length + X, relative to the start of your byte array).
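A minimal sketch, taking X = 4 bytes and UTF-8 encoding (both of these are assumptions; names are illustrative):

import java.io.*;

static byte[] pack(String s1, String s2) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream data = new DataOutputStream(out);
    byte[] b1 = s1.getBytes("UTF-8");
    data.writeInt(b1.length);                    // 4-byte length prefix for the first string
    data.write(b1);
    data.write(s2.getBytes("UTF-8"));
    return out.toByteArray();
}

static String[] unpack(byte[] bytes) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
    byte[] b1 = new byte[in.readInt()];          // read the prefix, then the first string
    in.readFully(b1);
    byte[] b2 = new byte[bytes.length - 4 - b1.length];
    in.readFully(b2);                            // the rest is the second string
    return new String[]{ new String(b1, "UTF-8"), new String(b2, "UTF-8") };
}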
We can show that it's not possible to deal with the cases where X or Y start out negative or 0, so the solution given here is complete.
In the first problem, since there's no available instruction that increases the number, you cannot reach 0 if you start with a negative number. If you start with 0, you don't know that unless you first use the DBNZ instruction, and if you do, you now have a negative number.
In the second problem, we argue that if any variable starts out <= 0, it stays <= 0 forever, since there's no instruction that can increase the value of the variable. Then, that variable's DBNZ statements all have the same, *unconditional* action: they always branch and decrement by 1. We can think of it as replacing each DBNZ on that variable with an unconditional decrement followed by an unconditional jump to the label in the DBNZ instruction. This means that all conditions tested in the program pertain only to the other variable. We reach the conclusion that either the number of times Y is decremented depends only on X, only on Y, or on neither. Such a program could not be correct because the program is required to decrement Y precisely value(X) + value(Y) times.
You can do this in linear time. Obtain the in-order traversal of both trees -- these are sorted sequences. Then, merge the two sequences (the same way the merge step of mergesort works). Since the resulting sequence will itself be sorted, all the duplicates can then be easily removed because they will be adjacent to each other.
I use the term "sequence" because the exact implementation can vary. You can use arrays or linked lists to hold the intermediate sequences. You can also avoid the use of intermediate structures altogether -- just traverse the trees and keep track of where you are in the traversal (that's what using an iterator does). Then merge as you go, and don't output an element to the output if the previous element placed there is equal to the current one.
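A sketch of the variant with intermediate lists (illustrative names; the iterator variant avoids the extra storage but is more code):

import java.util.*;

class Node { int val; Node left, right; }

static List<Integer> mergedDistinct(Node a, Node b) {
    List<Integer> s1 = new ArrayList<>(), s2 = new ArrayList<>(), out = new ArrayList<>();
    inorder(a, s1);
    inorder(b, s2);
    int i = 0, j = 0;
    while (i < s1.size() || j < s2.size()) {     // standard merge step
        int next = (j >= s2.size() || (i < s1.size() && s1.get(i) <= s2.get(j)))
                 ? s1.get(i++) : s2.get(j++);
        if (out.isEmpty() || out.get(out.size() - 1) != next) out.add(next); // drop duplicates
    }
    return out;
}

static void inorder(Node t, List<Integer> acc) {
    if (t == null) return;
    inorder(t.left, acc);
    acc.add(t.val);
    inorder(t.right, acc);
}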
The interviewer's method is preferable because it involves flipping an expected O(log n) coins if there are n days. Your method is also correct, but involves flipping an expected O(n) coins.
- eugene.yarovoi January 31, 2015
You could also manipulate the tree pointers to flatten the tree nodes into a sorted linked list, and then convert the linked list into a perfectly balanced tree.
- eugene.yarovoi January 17, 2015
No, threads don't share program counters. How could they? A program counter represents which instruction you're currently executing, which is related to what line of code you're currently executing. Since the whole point of threads is that they allow you to execute different parts of the program at the same time, each thread must have its own program counter.
- eugene.yarovoi January 03, 2015
There is a method by which you can extract the top X values in a heap in order in O(X log X) time, improving the worst-case bound given by @Iasthope to O(N + X log X).
That said, there is a deterministic version of Quickselect that solves this problem in O(N).
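For reference, here is one way to achieve the O(X log X) extraction (a sketch with illustrative names): keep a side heap of candidate positions into the original array-based min-heap, where each extracted position contributes its two children as new candidates.

import java.util.*;

static int[] topX(int[] heap, int x) {     // heap: array min-heap, children of i at 2i+1, 2i+2
    int[] out = new int[Math.min(x, heap.length)];
    PriorityQueue<Integer> side = new PriorityQueue<>(Comparator.comparingInt(i -> heap[i]));
    side.add(0);                           // the root is the smallest element
    for (int k = 0; k < out.length; k++) {
        int i = side.poll();
        out[k] = heap[i];
        if (2 * i + 1 < heap.length) side.add(2 * i + 1);  // children become candidates
        if (2 * i + 2 < heap.length) side.add(2 * i + 2);
    }
    return out;
}

The side heap never holds more than about 2X entries, so each of the X rounds costs O(log X).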
I think you meant "else water stored is (C - height of building i)". I would just express it as:
water stored = max (0, C - height of building i)
An interesting follow-up problem to this question is to solve it in 3 dimensions.
Ω(n) is a lower bound for the worst case. Consider that in the example
2,2,2,2,2,2,2,2,2,2,2,2,2,2,3 the value of 3 could have been anywhere and the input would still have been legal. Any location you inspect that has a 2 reveals no information about the location of the 3. As long as you haven't found the 3 yet and there's some entry you haven't inspected, the 3 could be there. Any correct algorithm will therefore have to read all of the input in the worst case.
The best we can hope for is a heuristic that requires fewer than N array reads on some (but not all) inputs, which is exactly what this answer provides.
@RG: This problem is far more restricted than the general subset-sum problem, so there's no reason to believe it's NP-complete. In fact, it's not. For example, Sunny gives a worst-case O(n^2) solution.
- eugene.yarovoi November 19, 2014
@mee: The inner loop doesn't iterate over all the subsets. It iterates over each distinct subset sum mod M, which means the inner loop executes O(MN) times. If M = N or M = 2N, the inner loop executes O(N^2) times.
At first it may appear that the solution is actually O(N^3), because the line LinkedList<Integer> list = (LinkedList<Integer>)subsets.get(i).clone(); takes O(N) time to execute. However, that line is only hit when the current subset sum mod M is distinct from every subset sum mod M seen so far. Since there can only be M different subset sums mod M, the line can only be executed M times, for a total of O(MN) time spent on that line.
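For reference, a sketch of the underlying O(MN) residue DP, reduced to a boolean existence check (illustrative names; this assumes the goal is a nonempty subset whose sum is divisible by M -- the quoted solution additionally tracks the subsets themselves):

static boolean hasSubsetDivisibleBy(int[] a, int m) {
    boolean[] reach = new boolean[m];      // reach[r]: some nonempty subset sums to r (mod m)
    for (int x : a) {
        boolean[] next = reach.clone();
        int xr = ((x % m) + m) % m;
        next[xr] = true;                   // the subset {x} by itself
        for (int r = 0; r < m; r++)
            if (reach[r]) next[(r + xr) % m] = true;
        reach = next;
    }
    return reach[0];
}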
@gs: Not so. Please provide a justification for your analysis. The justification for the O(m*n) runtime is that it's no different from a regular BFS, with an initial O(m*n) pass to find all the doors with which to initialize the queue.
- eugene.yarovoi July 23, 2014
Note that I said the algorithm is "linear in time and space with the size of the input", not linear in terms of N. The input size is N^2, so the time and space complexities are O(N^2). I chose to talk about the performance relative to the size of the input because that makes it clear that this algorithm is asymptotically optimal, since any algorithm must look at all or most of the input.
- eugene.yarovoi July 16, 2014
I wouldn't say you should "stop worrying" about coding or application development, but you also should know math and algorithms. If that's where you're weakest, it may make sense for you to devote the most time to studying that.
- eugene.yarovoi July 16, 2014
Yes, the algorithm is only valid if all the integers are distinct. We can still solve the problem in linear time and space with respect to the size of the input if they're not distinct, but you would need a different algorithm, using dynamic programming. Let F(x, y) be the length of the longest sequence starting from coordinates (x, y). Then F(x, y) = 1 if there are no neighbors with value arr[x][y]+1, or 1 + the max of F over all such neighbors if any exist.
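A sketch of that DP with memoization (illustrative names):

static int longestFrom(int[][] grid, int[][] memo, int x, int y) {
    if (memo[x][y] != 0) return memo[x][y];
    int best = 1;                                       // no suitable neighbor: F(x, y) = 1
    int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    for (int[] d : dirs) {
        int nx = x + d[0], ny = y + d[1];
        if (nx >= 0 && nx < grid.length && ny >= 0 && ny < grid[0].length
                && grid[nx][ny] == grid[x][y] + 1)
            best = Math.max(best, 1 + longestFrom(grid, memo, nx, ny));
    }
    return memo[x][y] = best;
}
// The answer is the max of longestFrom(grid, memo, x, y) over all cells,
// with a single memo table shared across all the calls.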
- eugene.yarovoi July 15, 2014
What would you make private? If it's the class, then the class would not be visible. If the constructor, then you wouldn't be able to build an instance of the object through the constructor. If all the methods and properties, then you could still have a subclass that adds new methods or properties. Either way, sealed does something different: it allows you to see and instantiate a class without the ability to create any subclasses of it.
- eugene.yarovoi July 15, 2014
You have the length of the string. After a certain number of attempts all coming up "aaa", you really could conclude that, with high probability, the string is "aaaaaaaaaaaaaaa". I'm guessing the question only requires you to return an answer that is correct with high probability, because clearly there is always some chance that you will never even learn about the existence of a character. That chance can become vanishingly small with enough uses of the random triplet method, however.
- eugene.yarovoi July 15, 2014
Here's a very simple way to do it. Make an array of booleans (or bits) of size N^2, where arr[i-1] indicates whether the value i+1 is adjacent to the value i in the matrix. Then, iterate over the matrix, checking for each cell the four neighbors and populating the relevant entry in the boolean array. Then just look for the longest run of "true" values in the boolean array, which can be done with one pass.
This approach is linear in time and space with the size of the input. The space needed is only 1 bit per element, so the constant factor is really low for space. The constant factor for time should be reasonably good as well because we only do a few array accesses and comparisons for each element in the input.
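A sketch of this approach, assuming the matrix holds each value 1..N^2 exactly once (illustrative names; checking only the down and right neighbors covers every adjacent pair once):

static int longestRun(int[][] m) {
    int n = m.length, total = n * n;
    boolean[] adj = new boolean[total];      // adj[i-1]: values i and i+1 are neighbors in the matrix
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++) {
            int v = m[r][c];
            if (r + 1 < n && Math.abs(m[r + 1][c] - v) == 1) adj[Math.min(v, m[r + 1][c]) - 1] = true;
            if (c + 1 < n && Math.abs(m[r][c + 1] - v) == 1) adj[Math.min(v, m[r][c + 1]) - 1] = true;
        }
    int best = 1, run = 1;                   // longest chain, counted in elements
    for (int i = 0; i < total - 1; i++) {
        run = adj[i] ? run + 1 : 1;
        best = Math.max(best, run);
    }
    return best;
}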
In many languages, a double has more precision than a float (and occupies more space for that reason). Therefore, calculations done with doubles will suffer from less roundoff error and be more accurate.
There is no problem to be remedied here -- it's best to think of all floating-point (float or double) calculations as approximate answers, with the double type offering greater precision.
Other methods may be faster or may offer more convenient APIs for reading / writing data.
- eugene.yarovoi July 13, 2014
The median of all the medians is not necessarily the median of the entire combined dataset. Consider an example with just two machines:
Machine 1: 2, 3, 4, 5, 6. Median is 4.
Machine 2: 6, 6, 6, 6, 6. Median is 6.
The correct answer for the median of the combined data would be 6, but here you would get 5.
The two heaps never differ by more than 1 in the number of elements they have. You can therefore derive an O(log n) runtime for adding a new element. The insert takes O(log n), and you have to move at most 1 element from one heap to the other, which also takes O(log n). Querying the median is always just a matter of looking at one or both of the roots, and is O(1). There are no O(N) operations here.
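A sketch of the two-heap structure (illustrative names; assumes median() is only called after at least one add):

import java.util.*;

class RunningMedian {
    private final PriorityQueue<Integer> lo = new PriorityQueue<>(Collections.reverseOrder()); // max-heap: lower half
    private final PriorityQueue<Integer> hi = new PriorityQueue<>();                           // min-heap: upper half

    void add(int x) {                          // O(log n)
        if (lo.isEmpty() || x <= lo.peek()) lo.add(x); else hi.add(x);
        // Rebalance so the sizes never differ by more than 1.
        if (lo.size() > hi.size() + 1) hi.add(lo.poll());
        else if (hi.size() > lo.size() + 1) lo.add(hi.poll());
    }

    double median() {                          // O(1)
        if (lo.size() == hi.size()) return (lo.peek() + hi.peek()) / 2.0;
        return lo.size() > hi.size() ? lo.peek() : hi.peek();
    }
}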
- eugene.yarovoi July 10, 2014
In Java, a TreeMap is an implementation of the Map interface backed by a self-balancing binary search tree (a red-black tree).
- eugene.yarovoi July 10, 2014
You could use a Set to keep track of what you've seen already, and only delete an element if you've seen it before. That way, the first occurrence will not be deleted.
- eugene.yarovoi June 08, 2014
Column names are only guaranteed to be unique within the context of a particular database, schema, and table, so the operation you're asking for is not well-defined.
If you happen to know for a fact that the column you want has a name that is unique throughout all the databases, schemas, and tables on a particular machine, you should be able to run a query on the system metadata to determine the name of the database and table containing the column. For example, in T-SQL you could try running the following query on each database where the column could be located:
select s.name + '.' + t.name
from sys.columns c
join sys.tables t
    on t.object_id = c.object_id
    and c.name = @VariableHoldingYourColumnName
join sys.schemas s
    on t.schema_id = s.schema_id
This should give you the full name of the table containing the column. You can then dynamically build a query that queries this table.
If you don't even know the database name, you'd have to use a temp table to hold the results while using a cursor to iterate over the databases.
1. Just because you didn't hear about a limitation in your OS textbook doesn't mean it doesn't exist in practice. There is no hard technical reason for why an OS couldn't accommodate billions of files in theory; it's that most OSes are not optimized for it. Go ahead and try to create, say, even a mere 100 million files in the same directory. See how your OS responds.
2. On many OSes, an empty file takes as much disk space as one minimal disk allocation. This can often be something like 4 KB. If you think that's not a lot, consider that you will need 4 KB for each distinct word, when the word itself is a few orders of magnitude smaller. For 1 billion distinct words, you will need 4 TB.
3. Creating files on disk requires I/O, which is far slower than working in memory. You may not have framed it as key-value pairs, but using the file system as a giant key-value store is effectively what this solution does.
4. You seem to think that something like touch -a word.txt takes O(1) time. But...if the number of files in a directory gets large, the OS may not be optimized to do this operation in O(1) time. In any case, even if the operation were done using an efficient lookup, the constant factors here are absolutely enormous.
5. Why would the file system's sort be O(n)?
All in all, while I like that you're trying to think outside the box, this is not a solution you'd want to give in an interview. If you still don't believe me, try coding this solution and comparing its performance to some other approaches. Make sure to try on large inputs where billions of files would need to be created.
There is only 1 BFS. It just starts at all the door positions simultaneously. If there were O(n^2) doors, the BFS actually would be constant time per door when amortized over all doors. By O(grid size), I mean O(m*n) on an m x n grid. Maybe I should have said "grid area" or "size of input".
- eugene.yarovoi May 28, 2014
We can prove that if there is a run of consecutive horses of the same color, it is always optimal to place them into the same stable. Suppose some arrangement splits such a run, and consider two adjacent same-color horses from the run that were placed in different stables. If one of the two stables has fewer horses of the opposite color, the function we're optimizing would be increased by having both horses there. If the two stables have equally many horses of the opposite color, transferring one of the horses to the other stable does not change the value of the function. Therefore, we need not consider breaking up consecutive runs of horses of the same color, and we can reduce the N in the O(K*N^2) runtime to just the number of runs of horses of the same color.
- eugene.yarovoi May 25, 2014
You can just do BFS with multiple sources. This works in exactly the same way as regular BFS, but instead of starting with just one node, you would put all your sources (doors) in the queue at the beginning. That is, make an initial pass over the grid to find all doors and start your BFS queue with all of them at distance 0. Then proceed with BFS as normal. This solution is O(grid size), which is optimal because you have to look at all the input.
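A sketch (the names and the cell encoding -- 'D' for door, 'W' for wall, '.' for open -- are illustrative assumptions):

import java.util.*;

static int[][] distancesToNearestDoor(char[][] g) {
    int m = g.length, n = g[0].length;
    int[][] dist = new int[m][n];
    for (int[] row : dist) Arrays.fill(row, -1);          // -1 = unreached
    Deque<int[]> queue = new ArrayDeque<>();
    for (int r = 0; r < m; r++)                           // seed the queue with ALL doors at distance 0
        for (int c = 0; c < n; c++)
            if (g[r][c] == 'D') { dist[r][c] = 0; queue.add(new int[]{r, c}); }
    int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    while (!queue.isEmpty()) {                            // ordinary BFS from here on
        int[] cur = queue.poll();
        for (int[] d : dirs) {
            int r = cur[0] + d[0], c = cur[1] + d[1];
            if (r >= 0 && r < m && c >= 0 && c < n && g[r][c] == '.' && dist[r][c] == -1) {
                dist[r][c] = dist[cur[0]][cur[1]] + 1;
                queue.add(new int[]{r, c});
            }
        }
    }
    return dist;
}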
- eugene.yarovoi May 25, 2014@Anon: Oops. Edited. Also clarified the logic.
- eugene.yarovoi May 21, 2014This turns out to simply be the Fibonacci sequence. Consider in what ways you can cover the top left corner. If you lay down a tile horizontally, you reduce the problem to 2 x (W-1), and if you lay down a tile vertically, you have no choice but to lay down another tile vertically right next to it (top right corner), leaving you with a space of 2 x (W-2) to fill when you're done. All the tilings you get with the first choice are different from all the tilings you get from the second choice, so the total number of tilings is the number of ways to tile a 2 x (W-1) board plus the number of ways to tile a 2 x (W-2) board. Thus, letting F(W) be the number of tilings for a 2 x W board, F(W) = F(W-1) + F(W-2) subject to F(1) = 1 and F(2) = 2 (if you want, you can define F(0) = 1). This is just the Fibonacci sequence and may be computed in any manner in which you normally compute Fibonacci numbers -- with recursion & DP, iteratively, or using the efficient matrix-squaring approach.
- eugene.yarovoi May 20, 2014
One problem with this solution is that it could create an absolutely enormous number of files. In the event that most of the strings are distinct and short, this solution might create billions of files, overwhelming just about any operating system. Directory file lists are not optimized to serve as a giant key-value store.
This also does not produce the correct output. At the end, you end up with the K most recently seen words, not the K most frequent words. While frequent words are more likely to be recently seen than infrequent ones, you will not always get the correct answer.
@csemanoj: That wouldn't be the condition that would make this solution right. Adjacent nodes in a BST are not always in-order successors, and not all in-order successors are adjacent. To make inversion counting correct, you'd have to have a condition that says you can only swap in-order successors, but that seems very contrived.
- eugene.yarovoi May 20, 2014
Consider that a convolution is of size O(n^2) and therefore contains the sums of only O(n^2) submatrices. However, there are O(n^4) different submatrices, so you're not considering all the possible submatrices.
- eugene.yarovoi May 20, 2014
@Le Subbu: That works too. I like that idea actually -- that way you don't have to take a special case just to guarantee termination.
@Fake Le Subbu: Most of the "theoretical opinions" you're criticizing are situations where the algorithm is stated without code. This has nothing to do with not spending enough time on the problem, and everything to do with an assumption that readers can code for themselves if they know the algorithm.
When you do int* myPointer = new int(); you always get back a pointer to valid, writeable memory, assuming the allocation itself doesn't fail. If you just cast some arbitrary address to a pointer, that address might not be allocated to the process, or might not be writeable.
Another thing that you have to consider with an address like 0x12345 is that it's not aligned to a 4-byte boundary. On some architectures integer writes must be aligned to 4-byte boundaries, so this could be another reason it fails. When you do new int(), the returned memory is always correctly aligned.
Suppose you could do this in O(n). Then, you could iterate through the list in sorted order in O(n), which would yield an O(n) sorting algorithm.
- eugene.yarovoi May 14, 2014
I think you meant that you guarantee that it won't crash if they do cout<< p <<endl. If the pointer is dereferenced (as in cout<< *p <<endl), it might well crash.
To set the value, you'd set it in the same way you set any other pointer, e.g. *myPtr = 55. The only problem is that depending on what memory address you're pointing to, it might not be legal to write to that address, in which case you'd get a segfault. That has nothing to do with the fact that the pointer was cast from int, and everything to do with the address in the pointer.
The memory location numbered 1000 is likely not writeable, which would explain the segfault.
- eugene.yarovoi May 10, 2014
Here is an approach that runs in O(n) time.
- eugene.yarovoi October 10, 2015
To clarify the idea, we will first assume that every character occurs in the string fewer than N/2 times, where N is the length of the string. This is not a perfect assumption, because the string may still be valid if one letter occurs N/2 or (N+1)/2 times; e.g. abaca is a valid arrangement. After discussing the general approach, we can take care of this special case.
First, we count the number of occurrences of each character using an array or hash table, depending on whether the size of the alphabet is small or large. After that, we place occurrences of the first character at indices 0, 2, 4, 6, ..., so that occurrences of the character are never together. If the first character used indices 0, 2, ..., 20, the second character would be placed at 22, 24, 26, etc. Upon reaching the end of the array, we wrap around and start filling the odd indices 1, 3, 5, ..., in that order.
To be precise, we keep a numCharsPlaced variable. If we've already written a total of numCharsPlaced characters (counting repeated chars), the next character is placed at (2*numCharsPlaced) % arr.length + adjustment, where adjustment is 1 if the array size is even and the wraparound has already happened, and 0 otherwise.
Note that this algorithm never places two identical characters next to each other, except when a character's count is so high that the wraparound to odd indices occurs and we get all the way back to where the even-indexed occurrences of the character occurred. This can only happen if a character occurs at least N/2 times.
If a character occurs (N+1)/2 times, all other character counts are less than N/2. The solution is to detect that character when validating the character occurrence counts, and ensure we start (at index 0) with that character (this will give this character enough space, and all other characters have enough space by the general proof given before). If a character occurs exactly N/2 times, we use the same remedy as for the (N+1)/2 case. In the N/2 case, it's possible that there is one other character that also occurs N/2 times, but we verify that the algorithm behaves correctly in this case.
If a character occurs more than (N+1)/2 times, there is no solution. That's because you can split the input into ceil(N/2) pairs of adjacent locations, and argue by the pigeonhole principle that at least one such pair must contain two occurrences of that character, which would mean the same character occurs twice consecutively.
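A sketch of the whole algorithm, assuming a lowercase-ASCII alphabet (illustrative names; returns null when no arrangement exists):

static String rearrange(String s) {
    int n = s.length();
    int[] count = new int[26];
    for (char c : s.toCharArray()) count[c - 'a']++;

    // Find the most frequent character; more than (N+1)/2 occurrences is impossible.
    int maxChar = 0;
    for (int c = 1; c < 26; c++) if (count[c] > count[maxChar]) maxChar = c;
    if (count[maxChar] > (n + 1) / 2) return null;

    char[] out = new char[n];
    int placed = 0;
    // Start at index 0 with the most frequent character, then the rest in any order.
    for (int round = 0; round < 26; round++) {
        int c = (round == 0) ? maxChar : (round <= maxChar ? round - 1 : round);
        for (int k = 0; k < count[c]; k++) {
            int idx = 2 * placed;
            if (idx >= n) idx = idx % n + (n % 2 == 0 ? 1 : 0); // wrap to the odd indices
            out[idx] = (char) ('a' + c);
            placed++;
        }
    }
    return new String(out);
}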