Google Interview Question
Site Reliability EngineersTeam: Site reliabilty
Country: United States
Interview Type: Phone Interview
In this algo, the probability of generating each valid number is not same ... the numbers next to the blacklisted numbers are having double the chance than others
@mrb, 6 should become 9, because, it just three times.
6 > 2 => 6 -> 7
7 > 4 => 7 ->8
8 > 7 => 8 ->9
@Ashupriya no, the probability is the same. my method is actually the same as the follows:
1. generate a sequence s from 1 to n,
2. remove the elements in the blacklist from s.
3. generate a random number i, we should get the element from s with index i.
I am not sure if I get this question. In your example if you get 5, why can't we simply return 5 since it is not there in the blacklist?
Probability is the same for all, I tried this with a range(0, 9), with a blacklist [1, 2, 5] and counted how many times each number got selected. After 10 million iterations this is what I got:
[1429912, 0, 0, 1428264, 1428591, 0, 1426628, 1430268, 1428323, 1428014]
You will never get 5 as the ans.
In a way, you have added 5 in the blacklist as well.
Sry. -1 to this soln.
@upper Anonymous, if you get a rand 3, the output will be 5. I think this is a perfect solution.
What is the actual question here?
1. If the actual question is to generate a random number in a range but not in a blacklist without using rand function of a particular programming language then its a different question altogether.
( You have to write your own random number generator in this case. )
2. If actual question is to generate the random numbers uniformly which are not in the blacklist and you can use built-in rand function of programming language of your choice then many algorithms discusses here can achieve this.
( Any algorithm you devise to achieve this depends upon the uniformity of build-in rand function you chose to use. )
3. If the question is to do it efficiently then none of the above answers talk about doing it in linear or sub-linear time.
( Need to optimize any methods discussed above. )
create a hashtable from [2,3,5,9,199,200,344] list
while(true){
int num = random(1,n);
if(!hashtable.contains(num))
echo num;
}
What if random function always returned the same blacklisted number. The while loop will never terminate
This method will always work because the probability of the event where there would be an infinite loop is given by P(e)=(r/n)^k where k approaches infinity and r is the total number of unique words in the list that are in range (1,n).
So if you apply limits P(e) = 0 as long as all the numbers in range (1,n) aren't present in the blacklist(which is an invalid test case and should be checked before entering the loop)
This is by far the most efficient in both compute and space, and produces the best random distribution.
Hashing the original "exclude" list is O(n), where n = size of exclude list. Memory use is n, and compute is bounded by the time it takes to create a hash table entry of an integer.
All subsequent calls to solve this question can then reference that hash table. And a hashtable lookup is a fixed time.
So, you have O(n) setup time, O(n) storage space requirement, and O(1) time for execution.
1. Do number1 = random(1000), if the number1 is not in the list, return
2. If it's in the list, switch the value on index (1000 - 1) with the value on index(number1 - 1)
3. Do number1 = random(999), if the number1 is not in the list, return; else go to step 2
Keep doing it till all the numbers in the list are switched out.
#python
import random
n = 100
blacklist = [2,3,59,99]
max_rand = n-len(blacklist)
blackhash = {}
for i, b in enumerate(blacklist):
blackhash[b] = i+1
def gen_random():
r = random.randint(1, max_rand)
if (blackhash.has_key(r)):
r = max_rand + blackhash[r]
return r
print gen_random()
Just a thought:
1) keep an array from A[]={12-----n}
2) for each number in the blacklist, find their index in the array and swap it with the first,second,third and so on...such that all the blacklisted numbers are out of our probes and in the beggining of the array
or example range is (1,1000), list is [2,3,5,9,199,200,344] == [1 4 6 7 8....198 201 ....343 345...1000]
3) now we have a new range say [i - n] Here i = 8 and N = 1000
4) Generate numbers between this Range.
Can we take the question literally? It doesn't say anything about probabilty. Just find the first number in (1,n) that is not in (i,j) and return it :P
The efficient way is as follows:
If the list length is k, it divides the range to k+1 intervals. Randomly choose the interval first, weighing the intervals according to their lengths (i.e. [1,4] is twice as likely to pe picked as [6,7]). Intervals can easily be picked by mapping the interval range to [0,1] and selecting a random number from there, and seeing which interval it corresponds to. Once you chose the interval, choose the number uniformly from that interval.
On the other hand, choosing over and over until you find a number not in the list would work in most of the cases.
Does this work? It's either O(N) with the blacklist or O(NlogN) with the blacklist (wasn't clear if a sort was required).
We could also do O(N + R) if R was small enough. Make a hashtable, like was suggested, then traverse R looking for presence in the table.
// assume all the numbers are positive so -1 means no value possible
int findValue( int rangeStart, int rangeEnd, int[] blacklist ) {
sort(blacklist) // assumes its not already sorted...example was
int lookingFor = rangeStart;
for (int i = 0; i < blacklist.length; i++) {
if (blacklist[i] < rangeStart) continue;
if (blacklist[i] > rangeEnd) break;
if (blacklist[i] == lookingFor) {
lookingFor++;
} else {
return lookingFor;
}
}
return -1;
}
1. Trivial solution: pre-build a mapping from [1..n-len(blacklisted)] to [1..n]. Time O(1), but a wasteful O(n) in space.
2. Pick a random number in [1..n-len(blacklisted)] and add an offset to skip the blacklisted values. You get this offset by bisecting a pre-generating ordered interval calculated from the blacklisted values. A good O(log n) on time and much better O(m) in space (m=len(blacklisted)). Python code:
import random
import bisect
def generate_random(lower, upper, blacklisted):
intervals = [b - i for (i, b) in enumerate(blacklisted)]
rand = random.randint(lower, upper - len(blacklisted))
offset = bisect.bisect_right(intervals, rand)
return rand + offset
Sort the list.
Now divide 1..n into intervals,till the blacklisted number.
e.g in this case
for example range is (1,1000), list is [2,3,5,9,199,200,344]
intervals will be {{1}, {4}, {6..8}, {10..198}, {201..343}, {345..1000}}
each interval can be stored with start and end and all intervals can be stored in array.
Pick one interval randomly among the interval, by using rand generator on total interval length.
One the interval is picked, a rand number can be generated between interval start and end and returned.
Let N be the size of the list. Split the range into intervals (at most N+1) using the numbers in the list as markers. Pick an initial random number from each of these intervals. Now, from among these initial random numbers, pick one final random number according to some distribution based on interval size. So, larger the interval, higher the probability that the final random number will be chosen from this interval.
eg: range is (1,1000), list is [2,3,5,9,199,200,344]
[1], [4], [6, 8], [10, 198], [201, 343], [345,1000]
#include <iostream>
#include <vector>
unsigned int random(unsigned int n, const std::vector<unsigned int> &list)
{
unsigned int maxnum = n - list.size();
unsigned int randomNum = (rand() % maxnum) + 1;
int index = 0;
for (int i = 0; i < randomNum; ++i)
{
if (i == list[index])
{
randomNum++;
index++;
}
}
return randomNum;
}
int main(int argc, char *argv[])
{
std::vector<unsigned int> list;
list.push_back(2);
list.push_back(3);
list.push_back(4);
list.push_back(5);
list.push_back(199);
list.push_back(200);
list.push_back(344);
srand(4515345123412534);
printf("random number: %u", random(350, list));
getchar();
}
// TIME: O(log(L))
// MEMORY: O(1)
// L -> length of the blacklist
// Generate random numbers with some of the numbers blacklisted,
// for example range is (-2,12), list is [2,3,5,9].
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <map>
int generate(int lb, int ub, const int bl[], int L) {
int gen = rand() % (ub - lb + 1 - L);
int l = 0;
int u = L;
while (l < u) {
int m = l + (u-l)/2;
int f = bl[m] - m - lb;
if (f >= gen+1) {
u = m;
} else {
l = m + 1;
}
}
int f = bl[l] - l - lb;
int d = f - gen;
return bl[l] - d;
}
int main() {
srand(time(NULL)); // init seed
const int lb = -2;
const int ub = 12;
const int L = 4;
const int bl[L+1] = { 2, 3, 5, 9, ub+1 };
std::map<int, int> count;
for (int i = 0; i < 1000000; ++i) {
++count[ generate(lb, ub, bl, L) ];
}
for (std::map<int, int>::iterator it = count.begin(); it != count.end(); ++it) {
std::cout << "cnt[" << it->first << "] = " << it->second << std::endl;
}
}
Umm, folks:
This is a classic, simple Dynamic Programming question, where memo-ization is the answer.
step 1: traverse the "forbidden" element array, putting each value into a hashtable
step 2: use rand() to pick a number in the 1..n range.
step 3: compute the hash of #2, and see if there's anything in the hashtable with that value.
Step 1 is O(n) + hash_insert_time, which is O(n). But it's ONLY DONE ONCE.
Step 2 is whatever time rand() requires (which should likely be O(1))
Step 3 is constant time (hash_insert_time), or O(1)
Since you only pay Step 1 costs a single time (all subsequent calls to this function don't need to generate the hashtable), the time is:
Initial run: O(n), where n = size of FORBIDDEN array
All subsequent runs: O(1), as it consists only of the cost to run rand() and a hash table lookup.
You use a rand() generator to pick a number.
Assuming that we will be trying to get a random number multiple times, with O(L) preprocessing (L is the length of the list), we can get it to 1 random number call + O(log L) time.
Given a list L, we can form a list of intervals which have the numbers we are interested in (i.e. the complement of L).
The idea is to maintain an almost complete tree of the relevant intervals (the interval being only on the leaf nodes), which each internal node maintains the sum of lengths of intervals which are its descendants.
The root will contain the total length (basically n - L).
Make a random number call to generate a number R in 1 to n-L.
Do a search for R in the tree we created.
Example: If R > total length of intervals in left subtree (say C) of root, subtract that length and recursively look for R-C in the right subtree.
Writing code for this approach in an interview would probably be too difficult.
If the random calls aren't expensive, it is probably better to use a hashtable and generate random numbers multiple times till we find one not in the hashtable (as in kamoliddin's answer). I believe this is called rejection sampling, and is quite a useful technique.
actually, we can make the number in the blacklist a hole in valid range.
1. generate a random number in the valid range(1, 1000 - len(list)).
2. iterate through the blacklist, if the generated number is greater than or equals than the number in the blacklist, we should increase the generated number by 1.
Example: valid range [1,10], black list[2,4,7]
we should generate a number between[1, 10 - 3]
if we got a 1, since 1 is smaller than any number in the blacklist, we don't need to jump any hole, just return 1
if we got a 5, we iterator through the black list:
5 >= 2: jump the hole, 5 -> 6
6>= 4: jump the hole, 6 ->7
7>=7, jump the hole, 7 ->8
so 8 will be returned.
Code as follows:
- gnahzy September 27, 2012