Facebook Interview Question
Software Engineer / Developers

I think you can solve this in expected O(n) with the same method used to find the k-th order statistic, i.e. Quickselect's partitioning step.
It's better than all the solutions above for the same reason Quicksort is the best sorting algorithm in practice.
I think your solution is correct; choosing a random pivot at each step, the average time will be O(n).
1. "heapify" (build heap in place in the input array) a max heap in O(n).
2. Keep extracting the max element and accumulating the sum. If the x largest elements sum to at least K, this step is O(x log n).
Overall complexity is O(n + x log n).
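A minimal Java sketch of the two steps above (note: filling a `java.util.PriorityQueue` via `add` costs O(n log n) rather than a true in-place O(n) heapify, but the extract-and-sum step is the same; `minCount` is a made-up name):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class HeapMinSet {
    // Pop the largest elements until their running sum reaches at least k.
    // Returns the number of elements used, or -1 if the total sum is < k.
    static int minCount(int[] a, int k) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int x : a) maxHeap.add(x); // O(n log n) here; an in-place heapify would be O(n)
        long sum = 0;
        int count = 0;
        while (sum < k && !maxHeap.isEmpty()) {
            sum += maxHeap.poll(); // extract current max: O(log n) each
            count++;
        }
        return sum >= k ? count : -1;
    }
}
```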
There is a method by which you can extract the top X values in a heap in order in O(X log X) time, improving the worst-case bound given by @Iasthope to O(N + X log X).
Also, there is a deterministic version of Quickselect that solves this problem in O(N).
Assume you know the linear algorithm for finding the median.
1. Find the median M.
2. Partition the array such that the first half is smaller than M, and the second half is larger than M.
3. Compute the sum S of the elements in the second half.
4. If S < K, take the whole second half and recurse on the first half to find the smallest set whose sum is at least K - S. If S > K, recurse on the second half. If S == K, you have found the solution (the second half).
Forgot to mention the running time.
Since you only need to recurse on one half, the recurrence becomes T(n) = T(n/2) + O(n), which is linear.
Exactly my thought, but you need to take care of zeros and negatives, e.g. {0,0,1,2} with k=3, or {-1,-2,-3} with k=-2. Here is my code:
import java.util.Random;

public class SelectMinNumWithAtLeastSum {
    int tmpSum = 0;

    private void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // Partitions a[s..e] around a[pivot]; elements >= the pivot value end up on
    // the right. Side effect: tmpSum holds the sum of a[tmp..e] afterwards.
    private int partition(int[] a, int s, int e, int pivot) {
        int pv = a[pivot];
        swap(a, pivot, e);
        int i = s, tmp = s;
        while (i < e) {
            if (a[i] < pv) {
                swap(a, tmp++, i);
            }
            i++;
        }
        swap(a, tmp, e);
        tmpSum = 0;
        for (i = tmp; i <= e; i++)
            tmpSum += a[i];
        return tmp;
    }

    private int selectCutPoint(int[] a, int s, int e, int sum) {
        int pivot = new Random().nextInt(e - s + 1) + s;
        int p = partition(a, s, e, pivot);
        if (tmpSum >= sum) {
            if (p < e && tmpSum - a[p] >= sum) // make sure advancing p still keeps tmpSum >= sum
                return selectCutPoint(a, p + 1, e, sum);
            else
                return p; // cannot advance p, so p is the cut point
        } else {
            if (a[p] >= 0) {
                if (p > s)
                    return selectCutPoint(a, s, p - 1, sum - tmpSum);
                else
                    // p == s but tmpSum < sum, so it's impossible to reach the sum
                    return -1;
            } else { // the pivot is negative
                if (p < e)
                    return selectCutPoint(a, p + 1, e, sum);
                else
                    return -1;
            }
        }
    }

    public int selectMinNum(int[] a, int sum) {
        int pos = selectCutPoint(a, 0, a.length - 1, sum);
        if (pos != -1) {
            for (int i = pos; i < a.length; i++)
                System.out.print(a[i] + " ");
            System.out.println();
            return a.length - pos; // number of elements whose sum is at least 'sum'
        } else
            return Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        for (int i = -13; i < 22; i++) {
            int[] a = { -2, -3, -2, -8, -6, -100 };
            System.out.println(i + " " + new SelectMinNumWithAtLeastSum().selectMinNum(a, i));
        }
    }
}
The linear algorithm to find the median is the selection (rank) algorithm. However, the randomized version is not guaranteed O(n); the worst case may be O(n^2). Still, I think this is an amazing solution. I had considered using a heap, but it is not comparable to this method in the average case.
There is actually a method to find the median of a list in deterministic linear time: it's called median of medians.
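For reference, a sketch of median of medians in Java: a generic worst-case O(n) selection of the k-th smallest element, not tied to this problem (class and method names are made up, and the array is mutated in place):

```java
import java.util.Arrays;

public class MedianOfMedians {
    // Returns the k-th smallest (0-based) element of a[lo..hi], worst-case O(n).
    static int select(int[] a, int lo, int hi, int k) {
        while (lo < hi) {
            int pivot = pivotValue(a, lo, hi);
            // three-way partition around the pivot value
            int lt = lo, gt = hi, i = lo;
            while (i <= gt) {
                if (a[i] < pivot) swap(a, lt++, i++);
                else if (a[i] > pivot) swap(a, i, gt--);
                else i++;
            }
            if (k < lt) hi = lt - 1;      // answer lies left of the pivot block
            else if (k > gt) lo = gt + 1; // answer lies right of the pivot block
            else return pivot;            // k falls inside the pivot block
        }
        return a[lo];
    }

    // Median of the medians of groups of 5: yields a pivot guaranteed to sit
    // roughly between the 30th and 70th percentiles of the range.
    static int pivotValue(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        if (n <= 5) {
            int[] grp = Arrays.copyOfRange(a, lo, hi + 1);
            Arrays.sort(grp);
            return grp[grp.length / 2];
        }
        int numGroups = (n + 4) / 5;
        int[] medians = new int[numGroups];
        for (int g = 0; g < numGroups; g++) {
            int s = lo + g * 5;
            int[] grp = Arrays.copyOfRange(a, s, Math.min(s + 5, hi + 1));
            Arrays.sort(grp); // sorting 5 elements is O(1)
            medians[g] = grp[grp.length / 2];
        }
        return select(medians, 0, numGroups - 1, numGroups / 2);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```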
If S < K, how can the left side contain your answer? Everything on the left side is smaller than everything on the right side, and the right side sums to less than the target.
Or what if your target = 7 and the array looks like: 1, 2, 3?
Neither side sums to >= 7, but if you use all of the numbers you get 7.
Assume the minimal subset contains m numbers. We can do:
1. Put all numbers in an array and compute the total sum along the way - O(n).
2. Heapify the array (min heap) - O(n).
3. Keep removing the min from the root as long as the total sum stays >= k - O((n-m) log n).
The above works well if m is close to n, i.e. it takes most of the numbers to sum up to k.
In the opposite case, i.e. it only takes a few numbers to sum up to k, we can try a different approach:
1. Insert array elements into a min heap until the sum of the heap is >= k.
2. Remove the min from the root as long as the remaining sum is >= k.
3. Scan the remaining array elements. If an element is bigger than the heap root, add it to the heap.
4. Repeat step 2.
The above takes O(n log m) where m is the heap size. Since the assumption is that m is small, this beats O(n log n).
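A sketch of that second approach in Java, assuming non-negative values: the heap holds the current candidate set and is shrunk whenever its sum still covers k without its smallest member (`minCount` is a made-up name):

```java
import java.util.PriorityQueue;

public class SmallHeapMinSet {
    // Keep a min heap of candidate elements whose sum is >= k, dropping the
    // smallest members whenever the sum still covers k without them.
    // Returns the final candidate-set size, or -1 if no subset reaches k.
    static int minCount(int[] a, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        long sum = 0;
        for (int x : a) {
            // add x if we haven't reached k yet, or if x beats the current minimum
            if (sum < k || (!heap.isEmpty() && x > heap.peek())) {
                heap.add(x);
                sum += x;
            }
            // shrink: remove the root while the remaining sum is still >= k
            while (!heap.isEmpty() && sum - heap.peek() >= k) {
                sum -= heap.poll();
            }
        }
        return sum >= k ? heap.size() : -1;
    }
}
```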
1) Sort the list.
2) Starting from the end of the list, keep adding elements to the set until the sum is at least K. I guess this is the greedy approach.
O(n log n)
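That greedy approach as a short sketch, assuming non-negative values so that taking elements from the large end first is optimal (`minCount` is a made-up name):

```java
import java.util.Arrays;

public class SortGreedy {
    // Sort ascending, then take elements from the largest end until the sum
    // reaches k. Returns the count, or -1 if even the full array falls short.
    static int minCount(int[] a, int k) {
        int[] b = a.clone();
        Arrays.sort(b);
        long sum = 0;
        int count = 0;
        for (int i = b.length - 1; i >= 0 && sum < k; i--) {
            sum += b[i];
            count++;
        }
        return sum >= k ? count : -1;
    }
}
```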
Maybe use counting sort to sort the list in O(n)...
What if the list contains negative numbers?
ex: { -5, 1, 1, 3, 10 } K = 5
your algorithm will find { 1, 1, 3 } while a better solution would be { -5, 10 }
Here is my code, which uses radix sort. It runs in O(n * log2(k)), which for most values of n and k is better than O(n log n). I don't think it's possible to guarantee performance better than O(n log n) for all values of n and k.
/**
 * Run from the command line passing n then k.
 *
 * Example: java MinSet 1000 25000 creates an array of 1000 ints and
 * returns the minimum set that sums to 25000.
 */
import java.util.*;

class MinSet
{
    public static void main(String[] args)
    {
        int n, k;
        if (args.length == 2) {
            n = Integer.parseInt(args[0]);
            k = Integer.parseInt(args[1]);
        } else {
            n = 1000;
            k = 25000;
        }
        int[] tester = new int[n];
        Random generator = new Random();
        for (int i = 0; i < n; i++) {
            tester[i] = generator.nextInt(n * 5);
        }
        ArrayList<Integer> result = minSet(tester, n, k);
        if (result != null) {
            System.out.println("k is: " + k + " and n is: " + n);
            System.out.println(result);
        } else {
            System.out.println("It is impossible to sum to the value: " + k);
        }
    }

    /**
     * Uses radix sort in base 2 on the low-order bits, which is O(n * log2(k)).
     *
     * The keys are the rightmost log2(k)+1 binary bits of every int in nums[],
     * i.e. we only sort on as many bits as are needed to "cover" the value k.
     *
     * @return an ArrayList representing the minimum set of numbers that sums
     *         to at least k, or null if no set does
     */
    public static ArrayList<Integer> minSet(int[] nums, int n, int k)
    {
        // Because we are summing to an int k, we only care about the
        // rightmost log2(k)+1 bits of each number
        int keyLength = ((int) (Math.log(k) / Math.log(2))) + 1;
        // Base 2 means two buckets: one for 0-bits, one for 1-bits
        LinkedList<Integer>[] buckets = new LinkedList[2];
        for (int i = 0; i < 2; i++) {
            buckets[i] = new LinkedList<Integer>();
        }
        // LSD radix sort over the low keyLength bits. Note that we are not
        // sorting the original array fully -- only on as many bits as
        // necessary to cover the value k.
        for (int i = 0; i < keyLength; i++) {
            for (int j = 0; j < n; j++) {
                // The buckets must be FIFO, so append at the tail
                buckets[getBit(nums[j], i)].addLast(nums[j]);
            }
            // Copy the values from the buckets back to the array
            int m = 0; // index of the current position in the original array
            for (int j = 0; j < 2; j++) {
                while (!buckets[j].isEmpty()) {
                    // Remove items in FIFO order, otherwise the sort is not stable
                    nums[m++] = buckets[j].removeFirst();
                }
            }
        }
        // At this point the array is sorted (on the low bits) with the largest
        // keys to the right. Iterate backwards through the array and add
        // elements to the set to be returned until the sum reaches k.
        int sum = 0;
        int i = n - 1;
        ArrayList<Integer> toBeReturned = new ArrayList<Integer>();
        while (sum < k && i > -1) {
            toBeReturned.add(nums[i]);
            sum += nums[i--];
        }
        if (sum >= k) {
            return toBeReturned;
        }
        // The total sum never reached k, so no set can sum to at least k
        return null;
    }

    /**
     * Returns the bit of the given number at the given position.
     * @param decimal  the number whose bit value is being requested
     * @param position the position counted from the right (rightmost bit = 0)
     */
    private static int getBit(int decimal, int position)
    {
        int bitVal = decimal & (1 << position);
        return (bitVal > 0) ? 1 : 0;
    }
}
It's really funny to claim that O(n log k) is better than O(n log n). Even for your OWN input, k = 25000 whereas n = 1000.
What if n = 10, but k = 2^32? Think deeply.
Radix sort is based on a radix (base). I implemented the function with base 2 because that allows bitwise operations, making comparisons inherently faster and speeding up the function.
Had I chosen to implement radix sort in base 10, the computational cost would be n * log10(k), which is better than n * log(n). You're right that for my input n*log(n) is smaller than n*log(k), but the assumption is that n*log(k) using bitwise operations is faster than n*log(n) using decimal comparisons. If you assert this assumption is not true, the code could easily be changed to use decimal digits with runtime O(n * log10(k)).
Do you know where the lower bound for comparison-based sorting comes from? It comes from a decision tree with branching factor 2 (i.e. a binary tree). Now, if I claim I can branch 100 ways, it would indeed be n*log_100(n).
To cut it short, n*log(k) equals n*log(n) IF and ONLY IF it's guaranteed that k = O(n). Otherwise, O(n log n) is preferable to O(n log k). If your claim were good, how could 0/1 knapsack still be considered NP-complete even though an O(nW) DP solution exists (where W = the knapsack capacity)? Have you ever heard of an O(n log n) solution falling in NP-complete?
Take the first m numbers, sort them, and compute their sum. If the sum is less than k, add the next number and delete the smallest, repeating until the sum reaches k. If the sum still falls short, add back the deleted numbers in reverse order until the sum reaches k.
This takes O(m log m) + O(n - m). As an optimization, m is chosen based on the given n.
This can be done using a partial quicksort technique.
1. Take k as the pivot value and scan the array; if there is at least 1 element greater than k, return that element. If not, go to 2.
2. Now take k/2 as the pivot and scan the array; if there are at least 2 elements greater than k/2, return those elements. The output here could also be 0 or 1 elements greater than k/2; in that case, update no_of_elements_required accordingly in the next step.
... If we continue doing that, at the log(n)-th step the pivot is k/2^(log n) and we expect up to n elements; if we don't get them, we are done.
So there are log(n) passes and every pass scans <= n elements, so the overall complexity is <= O(n log(n)).
Trivial solution is to sort the array in decreasing order and start picking elements until sum is >= K. That's O(n*logn).
You can't use bucket sort because we don't know anything about the n objects. Yeah, sure you can create 2^32 buckets but at the end iterating over those buckets won't be that efficient. :) You can try arguing that 2^32 is a constant which is better than O(n*logn). :D
You can't use radix sort because, again, we don't know anything about the n objects. We don't know how big they can be thus we don't know how many 0/1 bits are needed to represent them. In the worst case each number is represented with logn bits so radix sort won't give us better than O(n*logn).
IMO, the smartest and easiest thing to do is to max-heapify the n objects and pop the head as long as the sum of popped elements hasn't reached K. This is O(n + k*logn) where k is the number of elements that must be popped. On average this is better than n*logn, but because k = O(n) it's still O(n*logn). :)
Using direct hashing it can be done in O(n). Let me know if more clarification is needed.
I guess you would use direct hashing to sort the list in O(n). Then walk the sorted list from the end and keep adding elements until the sum is at least K, which is also O(n).
You guys are too blind to claim that sorting can be done in O(n) using a hashtable! Let me see your smart solution. I'm pretty sure you are missing some issues in analyzing complexity.
You can't use Bucket Sort.
use RADIX SORT, then iterate backwards from the end of the list adding elements to the set until the sum of elements is greater than or equal to k. I'll post the solution in a bit.
Guys,
Direct hashing is not a hashtable; it's just a way of using an existing value (e.g. the value of the element) directly as the key. An example is the way we usually "map" characters in an array using their ASCII values as the indexes.
Such array/mapping can be precomputed in O(n) time, however the correct solution to this problem depends on the following circumstances:
1. The values are not repeated
2. The value range has to be known in advance and can be placed in another array that would fit in memory
NOTE: Even though the problem states "n objects", I'm assuming Ints since we're talking about a sum.
This way the algorithm described above will work in O(n) time. However the space complexity could be horrible depending on the values.
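Under those assumptions, a counting-array sketch (non-negative ints with a known maximum value, passed here as a hypothetical `maxVal` parameter; using counts also tolerates repeated values):

```java
public class DirectHashMinSet {
    // "Direct hashing": index a count array by the element's own value.
    // O(n + maxVal) time and O(maxVal) space -- the space can be horrible.
    static int minCount(int[] a, int k, int maxVal) {
        int[] count = new int[maxVal + 1];
        for (int x : a) count[x]++; // O(n) direct mapping
        long sum = 0;
        int taken = 0;
        // walk values from largest to smallest, greedily taking elements
        for (int v = maxVal; v >= 0 && sum < k; v--) {
            while (count[v] > 0 && sum < k) {
                count[v]--;
                sum += v;
                taken++;
            }
        }
        return sum >= k ? taken : -1;
    }
}
```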
A max heap can be used here. If m is the minimum number of elements (from the given n) that sum to at least k, the complexity would be O(m log n). In the worst case it could be O(n log n).
- lol May 24, 2011