Groupon Interview Question
Software Engineer / Developer
Country: United States
Interview Type: Phone Interview
How large is k? If k could be something like the number of integers / 2, an array might not fit into memory.
You could try the Quickselect algorithm. Just search for it. To implement it on disk, you'll just bring the input into memory one piece at a time.
I asked this question, and the interviewer said we don't know. We don't know how many integers there are.
If you want your code to work well for large values of k, Quickselect is a good choice. Well, that depends, actually. Is it OK to either destroy the original input or use 4 GB of extra space on disk?
You can't use 4 GB of extra space.
What are your suggestions if you can destroy the original input? And what if you're not allowed to destroy the original input?
You can't use the extra space even if it's on disk? Well, basically, using Quickselect requires you to destroy the input. So if you can use extra space, you copy the input and destroy the copy. If you can't, you have to destroy the original to use Quickselect.
If you can do neither, it's sort of hard to solve this if k can be large. If k=10 or a small number, just keep a heap of the largest or smallest elements seen so far. By small number, I just mean that the heap should fit into memory.
So you mean we should maintain the largest k elements seen so far in an array or stack and keep updating it as we encounter new numbers?
No, we should maintain them in a heap for greater efficiency, although an array will do for really small k (like k = 5 or so).
@eugene.yarovoi, I had some concerns regarding quickselect.
The file may not have fixed-size records. How would you implement iterating through the file? It may not be trivial: there may be multiple integers on one line, and each line may have a different number of integers. Even with one integer per line, finding the next number requires some parsing (e.g., reading the bytes between the next two newlines).
Assuming that 4 GB can fit into memory all at once may or may not be appropriate. Here is a streaming solution in Java that runs in O(n log k), where n is the number of integers in the file:
void printKLargest(File file, int k) throws FileNotFoundException {
    // Min-heap of the k largest values seen so far; the smallest of them
    // sits at the head, so Integer's natural ordering is exactly what we
    // need (no custom Comparator, which also avoids comparing Integers
    // with ==, a reference comparison).
    PriorityQueue<Integer> queue = new PriorityQueue<>(k);
    Scanner scanner = new Scanner(file);
    try {
        while (scanner.hasNextInt()) {          // skips any whitespace, including newlines
            queue.add(scanner.nextInt());
            if (queue.size() > k) queue.poll(); // evict the smallest of the k+1
        }
    } finally {
        scanner.close();
    }
    while (!queue.isEmpty()) {
        System.out.println(queue.poll());       // prints the k largest, smallest first
    }
}
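Regarding the parsing concern above: java.util.Scanner already treats any run of whitespace (spaces or newlines) as a delimiter, so variable-length records aren't actually a problem for the heap solution. A quick self-contained sketch (the class and method names here are just for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDemo {
    // Pulls every int token out of the text, regardless of how the
    // integers are split across lines.
    static List<Integer> readInts(String text) {
        List<Integer> out = new ArrayList<>();
        Scanner sc = new Scanner(text);
        while (sc.hasNextInt()) {   // spaces and newlines are treated alike
            out.add(sc.nextInt());
        }
        sc.close();
        return out;
    }
}
```

For example, `readInts("7 42\n-3\n10 10 99")` yields [7, 42, -3, 10, 10, 99] even though the lines have different numbers of integers on them. The same Scanner can be constructed over a File instead of a String.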
Yes, Quickselect is a good choice. After each partition, check which side of the pivot the (k-1)th index falls on, and recurse into that side (lower index to pivot, or pivot to upper index), eliminating the other side, until the pivot lands at index k-1. On average, each partition cuts the search space roughly in half.
- shani July 18, 2012
Time complexity: O(n) on average, even if the list is not sorted.
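To make the partition-based approach above concrete, here is a minimal in-place Quickselect sketch in Java (the class name and method signature are my own; it assumes the data fits in an int array, i.e., it's the destroy-the-input or destroy-a-copy variant discussed earlier in the thread):

```java
import java.util.Random;

class Quickselect {
    // Returns the k-th largest element (k = 1 means the maximum).
    // Partially reorders a[], which is why the thread notes that
    // Quickselect destroys the input (or a copy of it).
    static int kthLargest(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        int target = a.length - k;          // index of the answer in sorted order
        Random rnd = new Random();
        while (lo < hi) {
            // Lomuto partition around a randomly chosen pivot.
            int p = lo + rnd.nextInt(hi - lo + 1);
            swap(a, p, hi);
            int pivot = a[hi], store = lo;
            for (int i = lo; i < hi; i++) {
                if (a[i] < pivot) swap(a, i, store++);
            }
            swap(a, store, hi);             // pivot is now in its final position
            if (store == target) return a[store];
            if (store < target) lo = store + 1;   // answer is to the right
            else hi = store - 1;                  // answer is to the left
        }
        return a[lo];
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

The random pivot gives O(n) expected time; a worst case of O(n^2) is still possible but unlikely on any fixed input.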