skeptical.scientist
BAN USER 2of 2 votes
AnswersWe define C(n) as the number of ways to take n identical objects out of a bucket, where objects may be taken 1, 2, or 3 at a time.
 skeptical.scientist in United States
Example: C(4)=7, because you can take 4 objects in the following 7 ways:
1,1,1,1
2,1,1
1,2,1
1,1,2
2,2
3,1
1,3
Write a function for C(n) in the language of your choice. Report Duplicate  Flag  PURGE
Google Software Engineer / Developer Coding  0of 0 votes
AnswersHow would you define a data structure that stores a set of values (i.e., a value cannot appear more than one time), and implements the following functions:
 skeptical.scientist in United States
add(p)adds the value p to the set
delete(p)removes the value p from the set
getrandom()returns a random value from the set (all items should be equally likely). (Assume you have access to some nice random() function.)
Need not write actual code, just sketch a structure to implement this efficiently. Report Duplicate  Flag  PURGE
Google Software Engineer / Developer Data Structures  1of 1 vote
AnswersImplement a bucket fill function for a bitmap image. Assume your bitmap image is a 2d array of integers, where each integer corresponds to a different color. bucketfill should take three inputs, newcolor, x, and y, and change the color to newcolor in the largest contiguous monochromatic region containing the point (x,y)
 skeptical.scientist in United States Report Duplicate  Flag  PURGE
Google Software Engineer / Developer Coding
"But I will explain it later (after a few days), if no one posts an explanation and someone is actually interested in knowing."
Ha. Haha. Ha.
I guess I'll have to figure it out myself.
This is based on the recurrence C(n+m) = C(n)C(m3)+C(n+1)(C(m3)+C(m4))+C(n+2)C(m2). The recurrence is easily proved by induction on m. By using this recurrence, you can find C(2n), C(2n+1), C(2n+2) in terms of C(n), C(n+1), C(n+2) in O(1) time. This gives an O(log(n)) algorithm for computing C(n).
The way this works in the python code here is by substituting m=n, n+1, n+2 in this recurrence relation, to get equations for C(2n), C(2n+1), C(2n+2) in terms of C(n4)...C(n+2). On input k, set n = k//2, so k==2n or k==2n+1. The function is first called recursively to determine C(n)...C(n+2), then the standard recurrence relation C(n+3) = C(n)+C(n+1)+C(n+2) is applied backwards to find C(n4)...C(n1). Then the three equations can be applied to find C(2n), C(2n+1), C(2n+2). Finally the algorithm returns either [C(2n+2), C(2n+1), C(2n)] or [C(2n+3), C(2n+2), C(2n+1)], depending on whether k is even (so k==2n) or k is odd (so k==2n+1).
@Anonymous:
Since Alva's hasn't answered your time space question, I will. It's O(1) space and O(log(N)) time.
For this kind of thing, you want to work it out function by function. In this case we start with matrix_multiply since it only uses elementary operations, then work out matrix_pow since it only uses matrix_multiply and elementary ops, and finally C since it calls matrix_pow.
1) matrix_multiply() takes O(N^2) space (because of the new long[N][N]) and O(N^3) time (because of the three nested for(i,j,k < N) loops) to multiply NxN matrices. All other operations are elementary O(1) operations.
2) matrix_pow() takes O(N^2) space (because of the new long[N][N]) and O(N^3*log(pow)) time (because while(pow>0) { pow /= 2; } runs for log(pow) iterations, and in each iteration we are doing two O(N^3) matrix_multiply() calls on NxN matrices).
3) C takes O(log(N)) time and O(1) space, since it calls matrix_pow() on a 3x3 matrix with pow=N, and otherwise only does elementary operations.
1+1+1+2 = 5 objects. We are looking at ways to take 4 objects total, not ways to take objects in 4 different sets.
 skeptical.scientist September 26, 2013This doesn't implement getrandom() efficiently.
 skeptical.scientist May 28, 2013C(1)=1: take 1
C(2)=2: take 2, or 1+1
C(3)=4: take 3, 2+1, 1+2, or 1+1+1
C(4) is 7, as in the problem statement.
C(5)=13: take 3+2, 2+3, 3+1+1, 1+3+1, 1+1+3, 2+2+1, 2+1+2, 1+2+2, 2+1+1+1, 1+2+1+1, 1+1+2+1, 1+1+1+2, 1+1+1+1+1
Try again.
Let's say there are n objects in the bucket. You can start by picking 1, 2, or 3. If you start by picking 1, there are C(n1) ways to pick the remaining n1 objects. That means there are C(n1) ways to pick n objects, if you start by taking a single object. Similarly, there are C(n2) ways to pick n objects if you start with taking 2 objects, and C(n3) if you start with 3. So in total there are C(n1)+C(n2)+C(n3) ways of picking n objects out of the bucket.
 skeptical.scientist January 11, 2013That works. But it is extremely slow. It will take you around 10 minutes to compute C(40) with this method, and two decades to compute C(60). A better algorithm will take under a millisecond.
This is because your algorithm requires O(1.84^n) arithmetic operations to compute C(n). A reasonable algorithm will take O(n) arithmetic operations to compute C(n). A fast algorithm will take O(log n) operations.
Ah, thanks anonymous. I was forgetting to update the hash table with the new location of the value formerly at the end of the array when I moved the value.
 skeptical.scientist January 07, 2013Okay, two people have now said there's a problem with my delete function, which probably means there is... but what? I'm not seeing it.
 skeptical.scientist January 07, 2013@anonymous: if you are turning the hashmap into an array every time get_random() is called, that will take O(n) time. Unless you have the array stored all the time, and just update it when you add/remove an element from the set. That is exactly what my solution does.
 skeptical.scientist January 07, 2013zylo, get *which* node? You need to write a function which returns each element with equal probability. I'm not saying you can't do this in O(1) time with just a hash table, but I don't see how you would do it.
 skeptical.scientist January 06, 2013Reread the question. Items can be taken 1, 2, or 3 at a time, regardless of n.
 skeptical.scientist January 05, 2013Good point.
 skeptical.scientist January 05, 2013If you do linkedlist+hashtable, get_random() will take O(size) time instead of O(1) time, because you need to traverse the first k list elements to find the kth.
The array size being fixed is not a big issue  if you run out of space, you can move the array somewhere else where you have twice as much space. That means you get n O(1) time insertions before a single O(n) time insertion (because of an array move), which means that insertion/deletion is still O(1) time on average.
If you have a linked list or array, you can't quickly insert/delete, because you can't quickly check whether the element is present and where in the list/array it is. You can do it, but it will be slower.
These operations will take O(n) time once your set grows to size n, whereas in the array+hashtable implementation both operations are O(1).
Yes, C(0)=1. There is exactly 1 way to take 0 items: just sit there. Also, if C(0) is defined at all, it must be defined as 1 in order to satisfy the recurrence relation at n=3.
However, if having C(0) defined bothers you that much, you could start the definition at 1:
C(1)=1
C(2)=2
C(3)=4
C(n)=C(n3)+C(n2)+C(n1) (if n>3).
Use an array to store the elements of the set, and a hash table to store a list of (key,value) pairs, where the keys are elements of the set, and the values are the array indices for the keys. For convenience, we also store the size of the set.
add(p):
if p is already a key in hash table, return
array[size]=p \\move array if overflow
add p:size to the hash table
size++
delete(p):
i = value stored in hash table with key p
size
array[i]=array[size]
update the value in hash table with key array[i] to i \\Thanks, anon.
delete the key p from the hash table
get_random():
return array[randrange(0,size)]
The function randrange(x,y) is assumed to return a random integer in the interval [x,y).
This has amortized O(1) time for all three functions.
It's not hard to see that C(n) has the following recursive definition (similar to the Fibonacci numbers):
C(0)=1
C(1)=1
C(2)=2
C(n)=C(n3)+C(n2)+C(n1) (if n>2)
This recursive definition leads to an obvious (and slow) algorithm for computing C(n). Slightly better is:
def C(n):
L=[1,1,2]
while n:
n=1
new=L[0]+L[1]+L[2]
L[0]=L[1]
L[1]=L[2]
L[2]=new
return L[0]
This is okay, but still slow if n is very large. Faster algorithms can be found using linear algebra, similar to fast algorithms for the Fibonacci numbers. One way to regard the previous algorithm is to think of L=[1,1,2] as a vector, and then each iteration replaces L by M*L, where M is the matrix
[0 1 0]
[0 0 1]
[1 1 1]
So what we are really doing is computing the first entry of (M^n)*L. Fast algorithms for matrix exponentiation can accelerate this much faster than just repeatedly multiplying (on the left) by M, n times in a row.
 skeptical.scientist January 05, 2013def bucketfill(image,newcolor,x,y):
width=len(image)
height=len(image[0])
oldcolor=image[x][y]
if oldcolor==newcolor: return #added because of eugene's reply
image[x][y]=newcolor
if x>0:
if image[x1][y]==oldcolor: bucketfill(image,newcolor,x1,y)
if y>0:
if image[x][y1]==oldcolor: bucketfill(image,newcolor,x,y1)
if x<width1:
if image[x+1][y]==oldcolor: bucketfill(image,newcolor,x+1,y)
if y<height1:
if image[x][y+1]==oldcolor: bucketfill(image,newcolor,x,y+1)
return

skeptical.scientist
January 05, 2013 Open Chat in New Window
"Python seems so cryptic. Your code is hard to understand."
 skeptical.scientist October 12, 2014I disagree. Python is normally very readable. It's just this code in particular which is cryptic, not the language in general.