Safi
No, merging does not imply removing duplicates. Merging means that you create the union of two or more (data) sets where the set property (if any) is maintained. For instance, when you do a kway merge you don't omit duplicates by default (you just make sure that the final set is sorted as well).
 Safi December 20, 2014Java code without using String.split()
public static void main(String[] args) {
String rev = "esrever hcae drow NI siht enil";
String orig = "reverse each word IN this line";
assert rev.equals(reverse(orig));
}
private static String reverse(String str) {
if (str == null) {
throw new IllegalArgumentException();
}
if (str.trim().isEmpty()) {
return str;
}
StringBuilder sb = new StringBuilder();
int ws = 1; // word start index
for (int i = 0; i < str.length(); i++) {
if (str.charAt(i) == ' ') {
if (ws != 1) {
sb.append(revWord(str.substring(ws, i)));
}
sb.append(' ');
ws = 1;
} else {
if (ws == 1) {
ws = i;
}
}
}
if (ws >= 0) {
sb.append(revWord(str.substring(ws, str.length())));
}
return sb.toString();
}
private static String revWord(String str) {
StringBuilder sb = new StringBuilder();
for (int i = str.length()  1; i >= 0; i) {
sb.append(str.charAt(i));
}
return sb.toString();
}

Safi
December 20, 2014 Recursively try all possible characters for current digit (e.g. {a,b,c} for 2).
private static List<String> listWords(String num) {
List<String> res = new ArrayList<>();
gen(num, 0, new StringBuilder(), res);
return res;
}
private static void gen(String num, int i, StringBuilder sb, List<String> res) {
if (i >= num.length()) {
res.add(sb.toString());
} else {
char ch = num.charAt(i);
if (ch == '1') {
sb.append(ch);
gen(num, i + 1, sb, res);
sb.delete(sb.length()  1, sb.length());
} else {
char first = getFirst(ch);
for (int j = 0; j < (ch == '7'  ch == '9' ? 4 : 3); j++) {
char c = (char) (first + j);
sb.append(c);
gen(num, i + 1, sb, res);
sb.delete(sb.length()  1, sb.length());
}
}
}
}
private static char getFirst(char ch) {
switch (ch) {
case '2':
return 'a';
case '3':
return 'e';
case '4':
return 'g';
case '5':
return 'j';
case '6':
return 'm';
case '7':
return 'p';
case '8':
return 't';
case '9':
return 'w';
default:
throw new IllegalArgumentException();
}
}

Safi
December 18, 2014 I like your solution but it also not O(1); it's O(b/8). Where b is number of bits.
If you argue and say that the number of bits itself is a constant (16 or 32 or 64 or whatever) and so O(b/8) is also a constant then why this fancy algorithm? In this case every algorithm someone could come up with (even the most trivial ones) would be constant time algorithms. Counting the 1bits, or counting the number of times "value & (value  1) != 0" can happen are all O(b) algorithms and thus would count as constant time algorithms the very same way...
No guys, Anonymous is right. He splits up the 1TB file into 1024 smaller files. Sorts each file separately in memory (each small file is 1GB). Then does a kway (1024way) merge of the sorted smaller files in memory (and writes out result to a new file).
For the merge part we don't need to have all 1TB available in memory. We just need the (current) smallest element from each chunk (smaller file); the smallest of these 1024 smallest elements gets written to the final result file. We can maintain the smallest element with a minheap. Merging (i.e. the heap) requires only 1024*4 bytes (assuming one element is 4 bytes) which is just 4K.
So we'll have plenty of space for extra stuff to improve performance like 1024 buffers to read elements from the smaller files and an output buffer to write elements into the resulting file (after popping head of the heap 2^38 times)  so we don't have to read/write elements from/to disk onebyone.
Two way? That would mean splitting up the file to 2 smaller files each with size of 512 GB. To perform the merge sort you have to sort the smaller files first; but they don't fit into the main memory... Of course you can recursively twoway external sort each of the smaller files; if that's what you meant. ;)
 Safi December 16, 2014In case of plainold recursion time complexity is O(2^n) and space complexity is O(n). In case of DP both time and space are O(n).
Note that these complexities are expressed in the value of n and not in the length (number of bits) of n. So O(2^n) is exponential and all the other O(n) complexities are pseudopolynomial.
In short, it doesn't matter how you implement it, both time and space complexities are exponential. ;)
Trivial solution is to sort the array in decreasing order and start picking elements until sum is >= K. That's O(n*logn).
You can't use bucket sort because we don't know anything about the n objects. Yeah, sure you can create 2^32 buckets but at the end iterating over those buckets won't be that efficient. :) You can try arguing that 2^32 is a constant which is better than O(n*logn). :D
You can't use radix sort because, again, we don't know anything about the n objects. We don't know how big they can be thus we don't know how many 0/1 bits are needed to represent them. In the worst case each number is represented with logn bits so radix sort won't give us better than O(n*logn).
IMO, smartest and easiest thing to do is to maxheapify the n objects and pop head as long sum of popped elements doesn't reach K. This will be O(n + k*logn) where k is the number of elements required to be popped. On average this is better than n*logn but because k = O(n) it's still O(n*logn). :)
You are right this is a weighted interval scheduling problem. But the time complexity is not linear. You can't do better than O(n*logn) time and O(n) space where n is the number of intervals ("gene predictions"). Just sorting the intervals takes more than linear time...
 Safi December 13, 2014Java implementation. No need for "visited" set because when we color a pixel with the new color we know we've already visited it.
private static void fill(int r, int c, int newColor, int[][] mat) {
int rows = mat.length;
if (rows == 0) {
return;
}
int cols = mat[0].length;
int oldColor = mat[r][c];
if (oldColor == newColor) {
return;
}
Queue<Integer> queue = new ArrayDeque<>();
queue.add(r * cols + c);
while (!queue.isEmpty()) {
int pos = queue.poll();
int currR = pos / cols;
int currC = pos % cols;
if (mat[currR][currC] != oldColor) {
continue;
}
mat[currR][currC] = newColor;
if (currC > 0)
queue.add(currR * cols + currC  1);
if (currC < cols  1)
queue.add(currR * cols + currC + 1);
if (currR > 0)
queue.add((currR  1) * cols + currC);
if (currR < rows  1)
queue.add((currR + 1) * cols + currC);
}
}

Safi
December 12, 2014 1) "The weight of the edge should be proportional to the darkness of the pixel." you said the VERTEX represents the pixel. So which pixels' darkness did you mean when mentioning edge weight?
2) "an edge between 2 neighboring pixels" based on this you will have no separate islands in the graph; but one single component. So why the spanning tree algorithm?
Your solution doesn't make any sense to me.
Java implementation with distance based maxheap. Time complexity O(n^2).
private static List<List<Integer>> getPoints(Point[] points) {
List<List<Integer>> res = new ArrayList<>();
for (int i = 0; i < points.length; i++) {
List<Integer> closestPoints = new ArrayList<>();
Queue<PointWithId> queue =
new PriorityQueue<>(3, new ClosestComparator(points[i]));
for (int j=0; j<points.length; j++) {
if (j==i) {
continue;
}
queue.add(new PointWithId(points[j], j));
if (queue.size() > 3) {
queue.poll();
}
}
while (!queue.isEmpty()) {
closestPoints.add(queue.poll().id+1);
}
Collections.reverse(closestPoints);
res.add(closestPoints);
}
return res;
}
static class Point {
double x, y;
Point(double x, double y) {
this.x = x;
this.y = y;
}
}
static class PointWithId extends Point {
int id;
PointWithId(Point p, int id) {
super(p.x, p.y);
this.id = id;
}
}
private static class ClosestComparator implements Comparator<Point> {
private Point point;
public ClosestComparator(Point point) {
this.point = point;
}
@Override
public int compare(Point p1, Point p2) {
double d1 = Math.pow(Math.abs(point.x  p1.x), 2) +
Math.pow(Math.abs(point.y  p1.y), 2);
double d2 = Math.pow(Math.abs(point.x  p2.x), 2) +
Math.pow(Math.abs(point.y  p2.y), 2);
if (d1 == d2) {
return 0;
}
return d1 > d2 ? 1 : 1;
}
}
You can get O(n*logn) average time complexity with a kdtree and its nearestneighbour query. Good luck coding that on an interview. :)
 Safi December 12, 2014I don't think average is going to work. Consider this: let's simplify the problem to 1D and let's assume each guess has the answer "the point you are searching for is to the {left, right} relative to your guess".
Let's say the hidden point is at x=14. These are my guess, answer pairs:
0, totheright
1, totheright
3, totheright
7, totheright
15, totheleft (ok, now I know it's (7..15) let's switch to binary search)
11, totheright
13, totheright
14, foundit!
If you take the average of these guesses then you are far from the actual point we were searching for.
If you look at these points and assume that they were guessed based on the above method then the best guess is the LAST guess. If you can recreate the guessing process then it can be found.
How can you recreate? Start at the left/rightmost point, then take points onebyone as long as distance increases exponentially. If 2^i distance stops being true then you reached the part where binary search kicks in. You do binary search along the remaining point as long as any point remains. If no point remains then that was the last point. That's the best guess.
In general the whole solution depends on how the guesses are made and what responses you receive for them.
Yes, this is a quadratic time and linear space algorithm; without sorting. Count the occurrence of each number then check each pair of numbers and see whether the third number is in the map. You need the occurrence for inputs like {4, 2, 100} where there is no result if you (correctly) don't use a number more than once (if you would just simply use a Set then you might use up a number more than once and the above input would give a false result like {4, 2, 2}).
private static List<Integer> find(int[] arr) {
Map<Integer, Integer> occs = new HashMap<>();
for (int i : arr) {
Integer occ = occs.get(i);
if (occ == null) {
occ = 0;
}
occ++;
occs.put(i, occ);
}
for (int i = 0; i < arr.length; i++) {
occs.put(arr[i], occs.get(arr[i])  1);
for (int j = 0; j < arr.length; j++) {
if (j == i) {
continue;
}
occs.put(arr[j], occs.get(arr[j])  1);
int third = (arr[i] + arr[j]);
Integer thirdOcc = occs.get(third);
if (thirdOcc != null && thirdOcc > 0) {
List<Integer> res = new ArrayList<>();
res.add(arr[i]);
res.add(arr[j]);
res.add(third);
return res;
}
occs.put(arr[j], occs.get(arr[j]) + 1);
}
occs.put(arr[i], occs.get(arr[i]) + 1);
}
return null;
}

Safi
December 11, 2014 // example
// f(4)
//
// 1 1 1 1
// 2 1 1
// 1 2 1
// 1 1 2
// 2 2
// 3 1
// 1 3
// 4
public static void main(String[] args) {
assert getSeqCount(4) == 8;
assert getSeqCount2(4) == 8;
}
/**
* DP (quadratic)
*/
private static int getSeqCount2(int n) {
int[] dp = new int[n + 1];
dp[0] = dp[1] = 1;
for (int i = 2; i < dp.length; i++) {
for (int j = 0; j < i; j++) {
dp[i] += dp[j];
}
}
return dp[n];
}
/**
* naive recursive (exponential)
*/
private static int getSeqCount(int n) {
if (n <= 1) {
return 1;
}
int res = 0;
for (int i = 1; i <= n; i++) {
int newN = n  i;
if (newN >= 0) {
res += getSeqCount(newN);
}
}
return res;
}

Safi
December 11, 2014 public static void main(String[] args) {
assert isPalindrome("abbbcbbba");
assert isPalindrome("aba");
assert isPalindrome("aa");
assert isPalindrome("a");
assert isPalindrome("");
assert !isPalindrome("ab");
assert !isPalindrome("abc");
}
private static boolean isPalindrome(String str) {
int l = 0, r = str.length()  1;
while (l < r) {
if (str.charAt(l++) != str.charAt(r)) {
return false;
}
}
return true;
}

Safi
December 11, 2014 Simple DFS implementation.
public static void main(String[] args) {
char[][] mat = {
{'.', '.', '.', '.', 'W', 'X'},
{'.', '.', 'W', '.', 'W', '.'},
{'.', '.', 'W', '.', '.', '.'},
{'.', '.', 'W', 'W', 'W', '.'},
{'.', 'O', '.', '.', 'W', '.'},
{'.', '.', '.', '.', '.', '.'}
};
assert existsPath(mat);
}
static class Pos {
int r, c;
Pos(int r, int c) {
this.r = r;
this.c = c;
}
}
private static boolean existsPath(char[][] mat) {
int rows = mat.length;
if (rows == 0) {
return false;
}
int cols = mat[0].length;
Pos start = find(mat, 'O', rows, cols);
Pos end = find(mat, 'X', rows, cols);
if (start == null  end == null) {
return false;
}
Set<Integer> visited = new HashSet<>();
return exists(start, end, visited, mat, rows, cols);
}
private static boolean exists(Pos curr, Pos end, Set<Integer> visited, char[][] mat, int rows, int cols) {
if (curr.r < 0  curr.r >= rows  curr.c < 0  curr.c >= cols) {
return false;
}
if (curr.r == end.r && curr.c == end.c) {
return true;
}
final int pos = curr.r * cols + curr.c;
if (visited.contains(pos)) {
return false;
}
if (mat[curr.r][curr.c] == 'W') {
return false;
}
visited.add(pos);
if (exists(new Pos(curr.r  1, curr.c), end, visited, mat, rows, cols) 
exists(new Pos(curr.r + 1, curr.c), end, visited, mat, rows, cols) 
exists(new Pos(curr.r, curr.c  1), end, visited, mat, rows, cols) 
exists(new Pos(curr.r, curr.c + 1), end, visited, mat, rows, cols)) {
return true;
}
visited.remove(pos);
return false;
}
private static Pos find(char[][] mat, char ch, int rows, int cols) {
for (int r = 0; r < rows; r++) {
for (int c = 0; c < cols; c++) {
if (mat[r][c] == ch) {
return new Pos(r, c);
}
}
}
return null;
}

Safi
December 11, 2014 Considering how poor the question was formulated with all those typos I assume the last 'A' is a typo and it should be 'CAE2W3D'. Btw, it can be solved in linear time: assume only uppercase characters and numbers. Iterate over input and count occurrences of letters (~ bucket sort); plus maintain a sum variable for cases a number is encountered. At the end go over occurrences array and at the end append sum.
public static void main(String[] args) {
assert "ACDEW5".equals(sort("CAE2W3D"));
}
private static String sort(String str) {
if (str.length() == 0) {
return str;
}
int[] occs = new int[26];
int sum = 0;
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
if (ch >= 'A' && ch <= 'Z') {
occs[ch  'A']++;
} else {
sum += ch  '0';
}
}
StringBuilder sb = new StringBuilder();
for (int i = 0; i < occs.length; i++) {
char ch = (char) ('A' + i);
while (occs[i] > 0) {
sb.append(ch);
occs[i];
}
}
if (sum > 0) {
sb.append(sum);
}
return sb.toString();
}

Safi
December 11, 2014 Great algorithm. Here is the java version:
public static void main(String[] args) {
assert "bac".equals(produce("abacb"));
assert "cba".equals(produce("bacba"));
assert "ba".equals(produce("babab"));
assert "tsocrpkijgdqnbafhmle".equals(produce("nlhthgrfdnnlprjtecpdrthigjoqdejsfkasoctjijaoebqlrgaiakfsbljmpibkidjsrtkgrdnqsknbarpabgokbsrfhmeklrle"));
}
private static String produce(String str) {
int[] remaining = new int[26];
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
remaining[ch  'a']++;
}
StringBuilder output = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
int prevOccIndex = output.indexOf("" + ch);
if (prevOccIndex == 1) {
output.append(ch);
} else {
int lexGreaterIndex = 1;
for (int j = prevOccIndex + 1; j < output.length(); j++) {
if (output.charAt(j) > ch) {
lexGreaterIndex = j;
break;
}
}
if (lexGreaterIndex != 1) {
boolean allHaveSubs = true;
for (int j = prevOccIndex + 1; j < lexGreaterIndex; j++) {
if (remaining[output.charAt(j)  'a'] <= 0) {
allHaveSubs = false;
break;
}
}
if (allHaveSubs) {
output.delete(prevOccIndex, lexGreaterIndex);
output.append(ch);
}
}
}
remaining[ch  'a'];
}
return output.toString();
}

Safi
December 10, 2014 Both 3,4,6 and 1,3,4,6 are sequences and subsequences of 1,5,3,4,6,4. :)
The question is about contiguous subsequences.
Otherwise we would have an online longest increasing subsequence problem. In this case imo there is no difference between the offline and an online algorithm
Not the length but the actual sequence is asked. So in the worst case we need to store the whole input. Plus we don't need to answer the question while the numbers come in; only at the end (and because we are reading from a file we know where the end is). Thus, again, there's no point having an online algorithm. We can simply store all numbers in a list and do the offline LIS algorithm after the last number.
It's not so difficult if you are going with the O(n^2) time, O(n^2) space implementation of the suffix tree. The collapsed O(n) space suffix tree is a bit harder; to build it in O(n) time is even more.
I'd doubt they would require you to implement it with a linear time and space suffix tree. THAT would be quite difficult indeed.
Open Chat in New Window
We don't win anything with the bloom filter. We only want to store "presence"; whether a number was present in those billion numbers or not. We can do this by using a BitSet, for instance, which requires 1 bit per element. Again, we don't win anything with a bloom filter if it requires ~10 bits per element. Not to mention the error rate that comes with it.
 Safi January 26, 2015