Knewton Interview Question
Software Engineer Interns
Country: United States
Interview Type: Phone Interview
Actually, you can generate a random sequence of distinct indices in a lazy fashion (or equivalently, generate a random permutation in a lazy fashion).
That way you don't need the fallback scan at the end at all.
The algorithm will look like
    for index in random_permutation(n):
        if a[index-1] != null:
            return a[index-1]
    throw AllNullException
random_permutation can be implemented to yield the permutation lazily, using an algorithm similar to Fisher-Yates but maintaining the array as a hashtable.
import random

# Generates a random permutation of 1, 2, ..., n lazily.
def random_permutation(n):
    rand = random.SystemRandom()
    num_left = n
    table = {}
    while num_left > 0:
        r = rand.randint(1, num_left)
        oldr = r
        if r in table:
            # This slot already holds a swapped-in element; take that instead.
            r = table[r]
        if num_left in table:
            # The case when the last slot was picked earlier:
            # move its current occupant.
            table[oldr] = table[num_left]
        else:
            # The swap-last-element-with-picked-element step
            # of Fisher-Yates.
            table[oldr] = num_left
        num_left -= 1
        yield r
Each non-null is equally likely to show up first.
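For completeness, here is the generator together with the selection loop as one runnable Python sketch (the function name random_non_null and the use of ValueError in place of AllNullException are my choices, not from the original post):

```python
import random

def random_permutation(n):
    """Lazily yield a uniform random permutation of 1..n
    (Fisher-Yates maintained in a dict instead of an array)."""
    rand = random.SystemRandom()
    num_left = n
    table = {}
    while num_left > 0:
        r = rand.randint(1, num_left)
        oldr = r
        if r in table:
            r = table[r]
        # Move whatever currently occupies the last slot into the picked slot.
        table[oldr] = table.get(num_left, num_left)
        num_left -= 1
        yield r

def random_non_null(a):
    """Return a uniformly random non-None element of a,
    raising if every element is None."""
    for index in random_permutation(len(a)):
        if a[index - 1] is not None:
            return a[index - 1]
    raise ValueError("all elements are None")
```

Because the permutation is generated lazily, only as many indices are drawn as there are probes, so the expected work stays low when non-nulls are plentiful.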
@vz, well, Eugene is sometimes right too, and here his idea does work. His problem is that he thinks too highly of himself to spend much time on any problem, so he ends up offering theoretical opinions that are right only about half the time.
@Le Subbu: That works too. I like that idea actually -- that way you don't have to take a special case just to guarantee termination.
@Fake Le Subbu: Most of the "theoretical opinions" you're criticizing are situations where the algorithm is stated without code. This has nothing to do with not spending enough time on the problem, and everything to do with an assumption that readers can code for themselves if they know the algorithm.
* What is implied by 10 million? That doesn't look large to me to hold in memory - only 40MB.
* Can I modify the array?
If so, perform binary search until you find a non-null element, then swap that element with the central-most null element in the array, and return it.
The modifiability and the swap step are there so that over time you collect non-null elements near the center of the array, gradually making the hunt more efficient; perhaps you can cease doing the swap step eventually - say after about (10 million / # of non-nulls) calls to the method.
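The answer gives no code, and "binary search" on unsorted data is ambiguous; one possible reading is midpoint-first probing plus the swap-toward-center step, sketched below in Python. The names find_non_null and pick_and_centralize are hypothetical, and this interpretation is mine, not the original poster's:

```python
def find_non_null(a, lo, hi):
    # Bisection-style probe: check the midpoint, then recurse into each half.
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if a[mid] is not None:
        return mid
    left = find_non_null(a, lo, mid - 1)
    return left if left != -1 else find_non_null(a, mid + 1, hi)

def pick_and_centralize(a):
    # Find a non-None element, then swap it with the None slot nearest the
    # center, so repeated calls cluster non-None values around the middle.
    i = find_non_null(a, 0, len(a) - 1)
    if i == -1:
        return None  # every element is None
    center = len(a) // 2
    for d in range(len(a)):
        for j in (center - d, center + d):
            if 0 <= j < len(a) and a[j] is None:
                a[i], a[j] = a[j], a[i]
                return a[j]
    return a[i]  # no None slot anywhere: nothing left to centralize
```

Note that midpoint-first probing is not uniform over the non-null elements; the original answer does not claim uniformity either.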
An alternative solution is to iterate through the array to filter out the null elements. You can then just choose an element at random from the filtered array.
This has a time complexity of O(n) and a space complexity of O(n). However, if the array does not change regularly, you only need to do the filter stage once (O(n)), which makes the select method constant time.
For cases where space matters, the array changes regularly, or the method is only called once, my other answer is better (as it is constant space and linear time).
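A minimal Python sketch of the filter-once, pick-many approach (the NonNullPicker class and its method names are illustrative, not from the original answer):

```python
import random

class NonNullPicker:
    """Pay one O(n) filtering pass up front; each pick is then O(1)."""

    def __init__(self, a):
        # O(n) extra space: keep only the non-None elements.
        self.non_nulls = [x for x in a if x is not None]

    def pick(self):
        """Return a uniformly random non-None element, or None if none exist."""
        if not self.non_nulls:
            return None
        return random.choice(self.non_nulls)
```

If the underlying array changes, the filtered list must be rebuilt, which is why this trade-off only pays off for a static array queried many times.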
If the values in the given array are randomly distributed, without any prior information on the distribution, then a simple linear scan is good enough: it is just as likely to return any given non-null element as picking random indices would be.
That is because randomizing an already random array adds no benefit.
A simple (and quick, O(n)) answer is to choose a random start position and then search linearly for a non-null element.
public static int randomElement(Object[] a, int start, int end) {
    int offset = (new Random()).nextInt(end - start);
    for (int counter = 0; counter < end - start; counter++) {
        // i is start + counter + offset, wrapped so that start <= i < end
        int i = start + ((counter + offset) % (end - start));
        if (a[i] != null) {
            return i;
        }
    }
    // Return -1 if all elements are null
    return -1;
}
This will be fine if the elements are evenly distributed, but if they are grouped, the element at the start of the group will be returned more often. For example, in the array [null, null, null, foo, bar], foo will be returned more often than bar.
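That bias can be verified exactly by enumerating every possible start offset for the example above (a Python rendering of the same cyclic scan; the counts are exhaustive, not sampled):

```python
def scan_from(a, offset):
    """Return the first non-None element scanning cyclically from offset."""
    n = len(a)
    for counter in range(n):
        i = (offset + counter) % n
        if a[i] is not None:
            return a[i]
    return None

a = [None, None, None, 'foo', 'bar']
hits = [scan_from(a, off) for off in range(len(a))]
# 'foo' is reached from offsets 0, 1, 2, and 3; 'bar' only from offset 4.
print(hits.count('foo'), hits.count('bar'))  # 4 1
```

So with a uniformly random start offset, foo is returned with probability 4/5 and bar with probability 1/5.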
What happens if the random start position generated contains no non-null elements after it?
If you look at the assignment to i, you can see that the combined offset (counter + offset) is taken modulo the size of the range and added to the start index.
This means i always stays within the range start <= i < end, with its value wrapping around like ..., end-2, end-1, start, start+1, ...
Therefore, if there are no non-null elements after the (start + offset) position (and the linear search reaches end), it continues the linear search from the start position.
If there are no non-null elements at all, the loop finishes (after end - start iterations) and returns -1 (to indicate there are no non-null elements).
I don't like this question.
1) The situation is contrived. Why would someone who wants this query allow such an array to be built in the first place?
2) We don't know what process created the 10 million element array (e.g., if we can assume it was a uniform random process, then we can scan to the first non-null element and return it). How many such queries will be performed? This, along with other considerations, determines whether preprocessing makes sense.
3) Why would
Eh?
This is a simplified version.
Think of it as selecting a random element satisfying a specific predicate, where the predicate isn't known in advance.
If you are totally in the dark about the possible predicates (even sets of them?), then make sure you place the elements into a data structure that randomizes on insertion.
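No specific structure is named; one minimal sketch of a randomize-on-insert container is an append-then-swap list (an online Fisher-Yates shuffle), so a later linear scan returns a uniformly random element among those matching a predicate. The RandomizedBag name is hypothetical:

```python
import random

class RandomizedBag:
    """Append-and-swap keeps the backing list in uniformly random order."""

    def __init__(self):
        self.items = []

    def insert(self, x):
        # Append, then swap the new element into a random slot.
        # This is the inside-out Fisher-Yates shuffle, applied online.
        self.items.append(x)
        j = random.randrange(len(self.items))
        self.items[-1], self.items[j] = self.items[j], self.items[-1]

    def first_matching(self, pred):
        """Linear scan; because the order is uniformly random, the first
        match is a uniformly random element among those satisfying pred."""
        for x in self.items:
            if pred(x):
                return x
        return None
```

Each insert is O(1), and a query costs O(n/k) expected scans when k of the n elements satisfy the predicate.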
Perhaps the data structure is that way because of other constraints unknown to us?
Don't call a problem stupid so easily. This is one of the problems which sounds likely to have come from a "real world" problem.
Funny that you of all people find it stupid.
Anyway...
The way the question is posed has no interesting/optimal/"cool" answers.
Prove the above wrong and then talk.
Why should there be optimal/cool solutions? Real world problems usually don't.
btw, your points are what make this a good question!
It is ripe for discussion, discuss a solution, see if a different data structure works, etc.
Anyway. I don't want to "talk" anymore. You are worse than I thought.
The question is stupid the way it is posed ("in a reasonable amount of time"??). The only answers possible in the question's current form are brute-forcish.
A good engineer would know the predicates beforehand and split the relevant streams off into auxiliary ArrayLists/vectors, or, without knowing the predicates, would have duplicated/siphoned off the whole stream into a randomize-on-insert data structure.
The OP should post the discussions he had with the interviewer or he should mention more details.
Don't support the current quality of postings on this website.
Pick an index at random and check if the element there is non-null. Keep re-picking indexes at random until you get a non-null element. This approach makes no assumptions about the distribution of the data and has an expected running time of O(N/K) if there are K non-null elements in the array, which is the best you can hope for if you have no information about how elements are distributed.
- eugene.yarovoi March 16, 2014

The algorithm as stated above is unbounded in terms of worst-case (as opposed to expected) time, but that can be remedied by adding a condition that if we fail to find a non-null element after n steps, we will do a linear scan of the array to find all the non-null values, and pick randomly from that list. This change caps the algorithm at O(n) time. Since the linear-time scan is only done when O(n) time has already been used anyway, the bounded algorithm is slower than the unbounded one by at most a constant factor for any input. It is therefore O(N/K) expected time and O(N) worst-case time.
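A sketch of the bounded version in Python, as described (probe up to n times, then fall back to a full scan; returning None for an empty or all-null array is my choice, not stated in the answer):

```python
import random

def random_non_null(a):
    """Expected O(n/k) random probing, capped at O(n) by a fallback scan."""
    n = len(a)
    if n == 0:
        return None
    for _ in range(n):  # after n failed probes, stop probing
        x = a[random.randrange(n)]
        if x is not None:
            return x
    # Fallback: collect all non-None values and pick uniformly among them.
    # This scan only runs after O(n) work has already been spent probing.
    non_nulls = [x for x in a if x is not None]
    return random.choice(non_nulls) if non_nulls else None
```

Both branches pick uniformly among the non-null elements, so the cutoff changes the worst-case bound without changing the output distribution.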