Miguel Oliveira
BAN USERI enjoy algorithms, puzzles and programming competitions :)
- 1 Answer Interview "red flags"
Hi,
- Miguel Oliveira September 09, 2013
In the beginning of CtCI book, Gayle talks about the case of a student which she referred but had too many "red flags".
Could you elaborate more on the "red flags"? Maybe with a few more examples of these kind of mistakes that we should avoid in an interview?| Flag | PURGE
Lets say we have N words and K = 100.
First, I would use a hash map to count the number of occurrences of each word. A Trie is another option but it consumes a ton of memory.
Then, we have 2 options:
1) Process all <word, frequency> pairs. Use a min-heap to keep the current 10 most frequent words. If the current word has a higher frequency than the minimum of the heap, remove the minimum and add this word. Finally, extract the 10 words from the heap. This will take O(N log K) time and O(N + K) space.
2) Use the median of medians selection algorithm (search wikipedia "Median_of_medians") to find the 10th highest frequency. As a result, we find the 10th highest and the top 9 are the words to the right of the 10th word.
This will take O(N) time and O(N) space.
Lets say the string has N characters and we can search a substring from positions i to j in the dictionary in O(1) using the given function.
a) Create a boolean array V of size N and initialize its elements to false, except V[N] = true. Traverse the positions from end to beginning. V[i] will be true if we can reach position N from position i. We check all positions j (j > i) such that characters i..j-1 form a word from the dictionary. If V[j] is true, so is V[i]. The text can be converted if we reach position N. The time complexity is O(N^2). Space complexity is O(N).
b) Very similar to a) except that V will count the number of ways we can reach position N from i. V[N] is 1 and we sum the number of ways, V[j], for each valid j > i. The time complexity is O(N^2). Space complexity is O(N).
c) Very similar to b) except that we don't sum, but take the minimum. V[i] = min(V[j]+1) for all valid j or infinity if there is no valid j.
d) Save in an additional array Next the index of the minimum for each position.
e) Traverse Next, starting from 0 and jumping to the next position.
1) Greedy approaches will take a local optimum and may fail to find an existing solution. Simple example: string = "aab", dict = {"a", "aa", "ab"}. There is a solution "a ab" but the most common greedy algorithm will pick "aa" first and not find a solution because there is no "b" in dict.
2) Backtracking works but is extremely slow. There are 2^N possible splits and it would explore them all, yielding a O(2^N) solution.
Eric, a N^2 table also wouldn't fit in memory. Refer to the comment in the code
// for simplicity. we should use only a window of size 2*k+1 or
// dp[2][MAX] and alternate rows. only need row i-1
It's not O(n^2), otherwise the partition problem would not be NP-Complete.
It is O(n * sum_all_values). When half the sum of numbers is too large, we need to use the O(2^n) approach.
it's not enough to check just 2 consecutive rows.
in the following example, only row 4 and column 3 match the problem's requirements, but there is no way to "guess" that we need to look up row 4. we need to check all rows
0 0 1 1 0
0 1 1 0 0
1 1 1 0 0
0 0 1 0 0
0 0 1 0 1
why use regular expressions instead of something like
if c in ['a', 'e', 'i', 'o', 'u']:
?
About the Binary Tree check, I think it gives the correct answer, but just a minor detail: when the node is already contained in the HashMap, it does not necessarily means there's a cycle. Example:
A -> B
A -> C
B -> D
C -> D
D will be revisited twice. It is not a cycle, but this is not a tree, just a DAG, and the code returns false as expected.
Yes, it would have to be a tree rooted on the given node. In my opinion, this does not make much difference.
Interview questions are not set in a "real life" software project. So, unless the interviewer said so in the question, I believe we should treat it as a normal question like "check if a binary tree is a bst"
The "possibility" part of the question is about using the same structure to represent a DLL, Binary Tree or something else.
Then, the question is very direct "Given a random pointer recognize whether it forms DLL, Binary Tree or none.". Thus, we should traverse from this pointer to its adjacent nodes.
To check if it is a DLL, we need to traverse the list both forward and backwards and check if the previous and next pointers are correct. Also, if the DLL may be circular (ask the interviewer), we need to detect cycles.
Checking if it is a binary tree is a bit more complex. A Tree does not have cycles. On top of that, we also need to check that each node has at most one parent (otherwise it is just a Directed Acyclic Graph and not a Tree)
the Binary Tree part is incorrect. A Tree (binary or not) does not have cycles and each node has at most one parent. we need to check that.
- Miguel Oliveira December 16, 2013the Binary Tree part is incorrect. A Tree (binary or not) does not have cycles and each node has at most one parent. we need to check that.
- Miguel Oliveira December 16, 2013This question is from an ongoing competition on CodeChef.
Don't use this site to cheat.
the code does not work because it does not count smaller palindromes like 'iai' in 'diaid'. Also, doing it that way and checking with a hash table will make the algorithm run in O(N^2) - just use a string like "aaaaaa"
- Miguel Oliveira December 13, 2013The question asks for distinct palindromes. I believe Manacher's algorithm won't work in this case.
I also think the best answer involves suffix trees, but I'm not sure how to construct the tree to check the distinct palindromes
if you consider closed intervals like [start, end] in seconds, the amount of time is end-start+1 seconds.
Example [2, 10] : 10 - 2 + 1 = 9 seconds.
I believe O(n) is possible since they are sorted by start time.
Keep auxiliary variables storing the current maximum end time of the processes seen so far and the time the current set of consecutive processes started.
Go through the entries in the log in order. If the current process starts at a time larger than the previous maximum time the processor was busy, then this is a break (it was idle) and update the current start time.
// for simplicity, consider the log as an array with N entrys each with start and end times.
// in reality this would be just fetching from the log file - no big difference.
int GetMaxBusyTime(Entry* entries, int N) {
int max_end = 0, res = 0, cur_begin = 0;
for (int i = 0 ; i < N ; i++) {
if (entries[i].start > max_end) { // it was idle before this one
res = max(res , max_end - cur_begin + 1);
cur_begin = entries[i].start;
max_end = entries[i].end;
}
else {
max_end = max(max_end, entries[i].end);
}
}
return max(res , max_end - cur_begin + 1); // also check the final one
}
True @sambatyon, but I don't think sorting huge files works exactly in O(n log n) does it? I never did it (huge files) in practice.
- Miguel Oliveira December 05, 2013I think the point of mentioning "billion usernames" is that they won't fit in memory. So the question is more about exploring other options
- Miguel Oliveira December 05, 2013the drawback of tries is the memory. it would take too much memory to store 2 billion usernames in memory as Alexis mentioned as well.
- Miguel Oliveira December 05, 2013This is a sort of open ended question. It is crucial to ask more details from the interviewer.
- Are we using a single core or a cluster of cpus?
- How much memory is available? Do the files fit in memory?
- Can we create auxiliary files?
Assuming 1 cpu core, a possible approach could be partition the usernames into buckets. Each bucket contains a portion of the valid usernames.
Assume for simplicity that the alphabet used in the usernames are only letters A-Z. One possible partition would be to split all usernames starting with A to one file, B to a another file, etc. Then, each file will have a fraction of the billion usernames and we can intersect the usernames from the log files.
The above approach is very simple and won't partition the usernames into the buckets uniformly. We should discuss it further with the interviewer. We could try to check the usual distribution of the usernames over the alphabet and partition (start with A is probably more likely than starting with Q). Of course we don't need to split just by the starting character, but the first 2, 3, etc chars.
Note: don't assume the usernames use the latin charset. Ask first if other chars like Chinese or Japanese are used (unicode in general).
@kilaka, you didn't read the posts. The selection algorithm page has a link to a deterministic O(n) algorithm called "median of medians". In my latest reply, I posted the link - en.wikipedia.org/wiki/Median_of_medians
- Miguel Oliveira December 01, 2013Hmm that's pure backtracking.
Dynamic programming implies that you're saving subproblems results somewhere (like a table) and reusing them later.
that's *if* you assume the compiler will make those optimizations
- Miguel Oliveira November 16, 2013The recursive version has O(log n) space complexity.
2 possible reasons why you see O(1) everywhere may be:
a) Many people forget about the recursion stack
b) I believe most people/libraries implement the iterative version
you misread my approach. that was just an example.
K should just contain numbers between 0 and N
the values go up to 50 million whereas the unsigned 4 byte int holds values in the order of 4 billion..
- Miguel Oliveira November 08, 2013you're right. it had been a while since i first read the problem and forgot about that
- Miguel Oliveira November 06, 2013"uniform()" is just a pseudo-code used by the author of the question. i just reused it. as the comment says, it generates a random number uniformly between 1 and X
The question is not entirely clear, but it doesn't make much sense if the elements in K would be greater than N. i think "probability of generated number should be 1/(N-K.length)" would be impossible
Both players play optimally. Hence if B changes to 10011, then A changes to 00111 and A wins
- Miguel Oliveira November 02, 2013yeah, it sucks that the edit/remove feature is gone :s
- Miguel Oliveira November 02, 2013"(K[i]-last > x) = true"
No. x = 1, K[0] - last = 1 - 0 = 1 which is not > x. So the cycle will ignore all values in K and return 1 + K[2] = 1 + 3 = 4
from the example given in the question, what we want is to select one number in the interval [0, N) except the numbers in array K. Hence, there are N - K.length valid numbers.
Example: N = 10 and K = [2,3, 5, 8]. We should give a random number of the set [0, 1, 4, 6, 7, 10] with 1/6 probability
from the example given in the question, what we want is to select one number in the interval [0, N) except the numbers in array K. Hence, there are N - K.length valid numbers.
Example: N = 10 and K = [2,3, 5, 8]. We should give a random number of the set [0, 1, 4, 6, 7, 10] with 1/6 probability
can't edit or delete. please ignore. correct post below
- Miguel Oliveira November 01, 2013can't edit or remove the post. there's a little bug >= should be >. fixed below
- Miguel Oliveira November 01, 2013I'm assuming numbers between 1 and N.
Generate a random number X in the range 1..N-K.length. Then, process the K array while we haven't seen X numbers.
public int randomNumber(int N, int[] K) { // K is sorted
int x = uniform(N - K.length); // 1 .. N - K.length
int last = 0 , i;
for (i = 0; i < K.length; i++) {
if (K[i] - last > x)
return x + last;
x -= (K[i] - last - 1); // we've seen K[i] - last - 1 valid numbers
last = K[i];
}
return x + K[ K.length-1 ];
}
runtime O(K), O(1) space
- Miguel Oliveira November 01, 2013I'm assuming numbers between 1 and N.
Generate a random number X in the range 1..N-K.length. Then, process the K array while we haven't seen X numbers.
public int randomNumber(int N, int[] K) { // K is sorted
int x = uniform(N - K.length); // 1 .. N - K.length
int last = 0 , i;
for (i = 0; i < K.length; i++) {
if (K[i] - last > x)
return x + last;
x -= (K[i] - last - 1); // we've seen K[i] - last - 1 valid numbers
last = K[i];
}
return x + K[ K.length-1 ];
}
runtime O(K), O(1) space
I'm assuming numbers between 1 and N.
Generate a random number X in the range 1..N-K.length. Then, process the K array while we haven't seen X numbers.
public int randomNumber(int N, int[] K) { // K is sorted
int x = uniform(N - K.length); // 1 .. N - K.length
int last = 0 , i;
for (i = 0; i < K.length; i++) {
if (K[i] - last >= x)
return x + last;
x -= (K[i] - last - 1); // we've seen K[i] - last - 1 valid numbers
last = K[i];
}
return x + K[ K.length-1 ];
}
runtime O(K), O(1) space
- Miguel Oliveira November 01, 2013@Keith, that was explained before. The code answers 2 because it is transforming the string into its reverse. If we need 1 delete in the original string, we need another delete in the reversed string to make them equal. That's why we compare it with 2*K
- Miguel Oliveira October 30, 2013@Zuzu, that code is incorrect. See the other post with test cases
- Miguel Oliveira October 28, 2013@Zuzu, it's easy to check by hand or even by brute force that the answer to "abbc" is 25. It's the strings "abab", "abac", "abad".. "abaz".
- Miguel Oliveira October 28, 2013I didn't think too much about it because i was considering reading the input as part of the algorithm complexity.
but you have a point there. the approach you mentioned is much better to actually solve the problem
i don't see any better method. i think we always have to process all the input in the worst case.
- Miguel Oliveira October 28, 2013If we can't sort, then we probably can't change the array in any way.
A possible approach would be to use a min-heap of size k.
- Insert the first k numbers into the heap
- for each number insert it in the heap if it is larger than the current minimum and pop the minimum
The runtime will be O(n log k) with extra O(k) space.
Im my opinion, I expect 'k' to be much smaller than 'n'. Thus this approach ends up being better in practice than other approaches that run in O(n) but need to either
a) change it (it seems we're forbidden to change the array)
b) or use extra O(n) memory (remember the "enormous" word)
If we can't sort, then we probably can't change the array in any way and those methods require that.
- Miguel Oliveira October 26, 2013it is incorrect because the word "hiree" is invalid and the regular expression would accept it.
- Miguel Oliveira October 24, 2013the way the language is defined, the greedy is ok if there are no equal consecutive letters like the way the question was posed. no need to overcomplicate things
- Miguel Oliveira October 24, 2013If you have good grades in a top school, I don't see why you don't get an interview call.
Maybe you need to work on your CV to stand out your qualities in an objective way (and quantitative if possible).
Do you have any relevant extra-curricular activities related to CS or software engineering?
I suggest you watch this video about CV tips from Google recruiters youtube.com/watch?v=5wa9J7iXOh0
Ups, K = 10
- Miguel Oliveira May 20, 2014