Miguel Oliveira
BAN USERI enjoy algorithms, puzzles and programming competitions :)
 1 Answer Interview "red flags"
Hi,
 Miguel Oliveira September 09, 2013
In the beginning of CtCI book, Gayle talks about the case of a student which she referred but had too many "red flags".
Could you elaborate more on the "red flags"? Maybe with a few more examples of these kind of mistakes that we should avoid in an interview? Flag  PURGE
there's no need for dynamic programming. hashing strings will be much more expensive than performing a O(string_len) check.
 Miguel Oliveira October 24, 2013O(1) means that the time complexity is constant disregarding the input size. This does *not* mean "instant time". e.g.
int func(int n) {
int res = 0;
for (int i = 0; i < 100; i++)
res += i * n;
return res;
}
This function runs in constant time, because it always does 100 operations regardless of the value of n.
HashMap operations run in amortized constant time. Just look up the java documentation for example.
it fails for n = 12. it's not a power of 4
 Miguel Oliveira October 21, 20132 problems:
1) Set operations run in O(log n), so the overall solution will work in O(n log n), not O(n).
2) You would be using extra O(n) memory, violating the constant space constraint.
Since the values are only in the interval [0, n[ we could just use a boolean array with n positions to keep it O(n), but it still violates 2).
i think it's better to use different letters. mixing 'n' and 'N' makes it unclear.
change to something like M log N, and say that M = N^2, the total amount of numbers.
The runtime of this approach is O(N^3). The merge sizes are
n+n
2n+n
3n+n
...
n*n+n
Summing up: (1+2+3+..+N)*(N+N) = N*(N+1)/2 * (2N) = O(N^3)
You can do it in O(N^2 log N) using the same method to merge k sorted arrays (a heap). In this question, k = n.
like was mentioned before, you can use a heap to make it O(n log k) with only extra O(k) memory as well.
 Miguel Oliveira October 19, 2013why are you using xor instead of a[k] == k?
 Miguel Oliveira October 18, 2013for quick select yes. but there is a O(n) worst case selection algorithm. you may check the link i gave above or this one en.wikipedia.org/wiki/Median_of_medians
 Miguel Oliveira October 18, 2013we're given a list of points (x, y). your code does not make any sense..
 Miguel Oliveira October 18, 2013constants are ignored in bigO notation. thus O(n^2 + n^2) = O(n^2)
 Miguel Oliveira October 17, 2013Yes, the selection algorithm guarantees O(n) regardless of K and it is not considering K as a constant.
you can look it up on the Introduction to Algorithms book or wikipedia en.wikipedia.org/wiki/Selection_algorithm
the question says using constant space, so the interviewer wants us to avoid the trivial solutions.
 Miguel Oliveira October 17, 2013read the other posts. heap operations will make the solution O(n log k)
 Miguel Oliveira October 17, 2013we still need to delete the maximum when we find a lower value and deletions run in O(log n) even with a Fibonacci heap
 Miguel Oliveira October 17, 2013You can do it in O(n), but you have to avoid heaps.
First build an array with the distances to the origin and the corresponding point. O(n)
Then find the Kth largest distance using the Selection algorithm. O(n)
The K1 smallest distances are to the left of the Kth distance, in no particular order.
So the total is O(n).
this is not O(n). each heap operation costs O(log k). Hence the runtime is O(n log k)
 Miguel Oliveira October 17, 2013Basically, if there are no duplicates, we receive a permutation of the numbers 0..n1.
We can rearrange the array such that number i is at position i.
Take this example: 0,3,2,5,4,1
0 is in the correct position.
3 is at position 1, we will swap it with the number at position 3 (5) to put it in the right position
0,5,2,3,4,1
now 5 is wrong, swap it with the number at position 5 (1)
0,1,2,3,4,5
now 1 is ok. 2, 3, 4 and 5 are too.
If at any step we try to swap a number into a position which already contains that number, then there are duplicates.
#include <iostream>
using namespace std;
bool Check(int* v, int n) {
for (int i = 0; i < n; i++)
if (v[i] != i) {
do {
if (v[v[i]] == v[i])
return true;
swap(v[i], v[v[i]]);
} while (v[i] != i);
}
return false;
}
int main() {
int v1[] = {0,1,2,3,4,5};
int v2[] = {0,5,2,3,4,5};
int v3[] = {0,3,2,5,4,1};
cout << Check(v1, 6) << endl;
cout << Check(v2, 6) << endl;
cout << Check(v3, 6) << endl;
return 0;
}

Miguel Oliveira
October 16, 2013 you code uses O(n) extra memory.
 Miguel Oliveira October 16, 2013oh, it's so bad that people post questions here to cheat :S
at least i didn't post the code, but it was a huge spoiler that way
Gayle talks about this quite a bit. The companies will never tell you why they rejected you.
The most likely reason is that someone else performed better and was hired instead. The visa issue and the lack of experience with the Microsoft stack can affect this. It depends on the other points of comparison between candidates.
 You never know how well you did in the interview. Interviewers being really friendly or gloomy does not mean much.
 Even if you gave correct answers, there may be better answers (or answers that the company prefer)
 The company may be looking more into how the candidate thinks rather than the right answers.
 Cultural fit matters a lot
I've seen a true genius (medals in international scientific olympiads, really high ratings in programming competitions, etc) post on his facebook page that he was rejected at a few large companies and startups, but eventually got a job at one of the most popular big companies.
IQ is not enough. This is highly subjective.
You may check my answer to Gedo Mazo to see why that's incorrect.
 Miguel Oliveira October 13, 2013I was using a general graph as example because it's easier to find counterexamples to the greedy approaches.
To see why your greedy approach is wrong, imagine the grid has only one line. 's' denotes the starting position and A,B,C the special places.
A..s.B.....C
The best path is s > A > B > C with cost 3 + 5 + 6 = 14. However, the greedy algorithm will go first to B because it is closer to the start, then to A and finally to C yielding a cost of 2 + 5 + 11 = 18
 Miguel Oliveira October 13, 2013An MST gives us the minimum cost of connecting all vertices. However, in this problem we want a route that will pass through all specific islands. This is not the same as MST and it's why it's similar to TSP.
Consider this graph as counterexample:
A  B, cost 1
A  D, cost 2
B  C, cost 2
D  E, cost 2
The MST has cost 7, using all edges. However, lets say A is the starting point and B..E the specific islands. The best route would be A > B > C > B > A > D > E with cost 1+2+1+2+2 = 10.
No, I intended that way. If we consider all permutations of the specific islands like
start > island_1 > island_2 > island_3
start > island_1 > island_3 > island_2
start > island_2 > island_1 > island_3
start > island_2 > island_3 > island_2
start > island_3 > island_1 > island_2
start > island_3 > island_2 > island_1
to compute the optimal cost, this will take O(S! * S) because there are S! permutations and we take O(S) time to compute the cost of each permutation (path), using the previously computed distances between islands.
However, we can use the dynamic programming approach i mentioned above to compute the optimal cost in O(S^2 * 2^S) which is much better than the factorial time complexity.
my first interpretation was the same as yours, but like the question samples show, the fibonacci sequence starts with 2 equal numbers. so, i think the ones you mentioned are not valid.
anyway, my first version of the code implemented that approach. it's a trivial change: just add a nested cycle for the second number
for (int a = 1; ; a++) {
long long n = AppendNumber(AppendNumber(a, a), a+a);
if (n > to)
return;
for (int a1 = a; ; a1++) {
n = AppendNumber(AppendNumber(a, a1), a+a1);
if (n > to)
break;
// same prevs and the while loop

Miguel Oliveira
October 13, 2013 yeah, you're right.
btw, you can always omit the http and www part.
Instead of checking all numbers in the given range, we can actually generate all numbers.
Note that if we fix A, the other digits must follow the fibonacci rule.
This will be much faster than checking all the numbers in the range. It takes 0.006s to print all fibonacci numbers between 1 and 1,000,000,000
#include <stdio.h>
#include <math.h>
int NumDigits(long long num) {
int nd = 0;
do {
nd++, num /= 10;
} while (num);
return nd;
}
long long AppendNumber(long long cur, int num) { // Append num to the end of cur
return cur * pow(10, NumDigits(num)) + num;
}
void PrintFibInRange(int from, int to) {
for (int a = 1; ; a++) {
long long n = AppendNumber(AppendNumber(a, a), a+a); // number "a a a+a"
if (n > to)
return;
int prev1 = a+a, prev2 = a, tmp;
while (n <= to) { // generate the remaining numbers
if (n >= from) // using the fibonacci rule
printf("%lld\n", n);
tmp = prev1+prev2;
prev2 = prev1;
prev1 = tmp;
n = AppendNumber(n, tmp);
}
}
}
int main() {
PrintFibInRange(1, 1000000000);
return 0;
}

Miguel Oliveira
October 12, 2013 This is similar to the Travelling Salesman problem, which is NPhard.
 If the number S of specific islands is quite small, say <= 20, we can perform S+1 Breadth First Searches to find all pairs of distances between the S islands and the initial position. Then, we can check all permutations of the S islands and calculate the cost of the possible paths "Initial pos > Island_1 > .. Island_S" using the distances calculated above. We can use dynamic programming to find the best path. The DP state can be the current specific island and a bitmask with the covered islands so far.
The running time will be O(S^2 * 2^S + S * N^2).
Note that 2^20 is about 1 million which allows this solution to be fast.
 If there are more specific islands, we can try A*. I don't know much about heuristics but we should try to discuss them with the interviewer. Like using the number of the islands covered so far and the current cost.
@aka777, G1 is not part of the top of server A but it will be part of the top of server B with visits >= 10 / 2. So, the algorithm will ask for all G1 occurrences in other servers and it will correctly put this as top1.
it will yield the correct result, with possibly the cost of multiple rounds.
When a "stream of integers" is mentioned, we don't have the numbers saved in something like an array and we should process them one by one as they arrive.
Hence, your part2 answer would actually require O(n) memory. It sounds to me one of those questions where we can't get 100% correctness given those constraints and we have to be creative and talkative with the interviewer.
I think so as well.
Someone 1 all my replies to this and a few other threads, for some reason, and I guess this reply was kind of forgotten.
The correspondence between characters inserted and deleted is done because we're transforming the input string into its reverse. So those operations will lead to a palindrome.
"You can do (1 insertion + 1 deletion)*n times, and you will still be in the main diagonal"
Sure, but the cost will be 2*N. As explained above, the final step is to compare DP[N][N] with 2*K. only then we decide the answer.
test it with a few cases. you can't check palindromes that way
 Miguel Oliveira October 09, 2013very nice gaoxiao, that seems correct. you should post in a main reply to the question to get your answer more visible
 Miguel Oliveira October 09, 2013That sounds like homework, not an interview question..
 Miguel Oliveira October 09, 2013It depends on the encoding. UTF8 uses 8, 16 or 32 bits per unicode character, depending on the character's identifier.
There are other possible encodings. I suggest you check Wikipedia.
I don't know about console colors, but IIRC standard error stream is flushed automatically while standard output is not. So if both stderr and stdout are directed to the console, the output is likely to be
0 2 4 6 8 10 12 .. 98 100 1 3 5 7 9 11 .. 97 99 (spaces for clarity)
no. first, we don't know "n" (infinite number of floors). second, if the egg breaks, we can't use it again, so it disqualifies binary search
 Miguel Oliveira October 06, 2013your recruiter probably gave you some information about interview preparation. this usually includes topics and websites with information. if the recruiter didn't provide this, you should ask.
if it is for an infrastructure role, it will go beyond UI for sure.
radix sort runs in O(n*k), where k is the average key length. I think it's unlikely that this k is smaller (or considerably smaller) than log(n) in this case.
 Miguel Oliveira October 04, 2013To count the number of pairs in an efficient way, we need to have some ordering on the numbers.
Besides sorting, i think the most efficient way is to use binary search for each number to count the number of other valid integers such that the sum is <= threshold. So, I think O(n log n) is the best we can do.
Why do you say O(log n) ?
for (Integer number : integers) {
sortedSet.add(number);
}
this alone is O(n log n). each operation in a TreeSet costs O(log n).
With a bit enough threshold, there are n*(n1)/2 pairs. You can't find the actual O(n^2) pairs in O(n) time.
beaten by zero2 by a few secs :)
 Miguel Oliveira October 04, 2013As oldtimer said, the game ends when all ones are on the least significant bits.
So we basically need to count how many swaps we need to make the zeros go to the most significant bits.
For each bit set to 1, the number of swaps needed is equal to the number of 0s in the least significant bits.
If the total number of swaps is odd, player A wins, else B wins.
void Solve(int n) {
int i, zeros = 0, sum = 0;
for (i = 0; (i << 1) <= n; i++)
if ( n & (i << 1) ) // bit set to 1
sum += zeros;
else
zeros++;
if (sum & 1)
puts("Player A wins");
else
puts("Player B wins");
}

Miguel Oliveira
October 04, 2013 jobs.amazon.com
 Miguel Oliveira October 03, 2013yes, it's similar to dhamu's answer, but for more numbers (if possible, i'm not sure, i didn't think about it properly) i think we need to split the numbers in groups depending on their lsb.
 Miguel Oliveira October 02, 2013The constraints puzzle me a bit, especially the "using MapReduce directly, is not allowed" one. I would try to discuss what that means exactly in an interview.
I'll give another shot at the question:
Denote N as the number of computers in our network.
1) Pick a good string hash function. This function must take urls as input and produce hash values uniformly in the range [0, MAX_HASH_VALUE]
2) Divide the hash range [0, MAX_HASH_VALUE] into N intervals with equal length MAX_HASH_VALUE / N, and assign each interval to 1 computer in the network, e.g.
CPU_1) [0, length1]
CPU_2) [length, 2*length1]
CPU_3) [2*length, 3*length1]
...
CPU_N) [(N1)*length, MAX_HASH_VALUE]
3) Each computer computes the hash values of its list of urls and sends the url and its visited information to the computer responsible by the hash of that url.
4) Each computer received a list of information url>visits for the urls in its hash interval. Now it must combine the visits of the urls and produce its top 10.
5) Each computer sends its top 10 to a central computer. This central computer will receive 10*N urls and compute the overall top 10.
Due to our good hash function, we expect that each computer will receive roughly the same amount of urls to process in the 4th step. I think this approach is the best we can do to distribute the work among the cluster.
About the constraints, they're not very clear.
a) This is sending all urls over the network but not to a single computer. In order to produce an exact result, I think all approaches end up sending all urls over the network in the worst case. Again, we would need to discuss with the interviewer if this is ok (the "especially sending all of them to a central server" part).
b) This is similar to MapReduce. I think that by saying "using MapReduce directly is not allowed", the interviewer meant that we have to give a detailed explanation about how the work is distributed among the network of computers, instead of just saying "the MapReduce framework will distribute and combine the work for us".
i didn't downvote. i think it's more likely that the interviewer was looking into xorbased approaches instead of sorting or hashing, but this is a valid approach.
 Miguel Oliveira October 01, 2013
you misunderstood the problem. "hiirrreeee" is invalid.
 Miguel Oliveira October 24, 2013