Google Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
Can you please provide some code snippet?
Also why do you need a min-heap?
Shouldn't the external merge sort be enough?
@Subhajit I was thinking of two different approaches. One of them is external sorting and it is clear to understand (see wiki for details). This is a good solution, if you read the entire streams in to a file/disk first and then merge them. What if though, you're doing the external sorting and meanwhile you have still numbers coming in from the input streams non-stop?
The other approach is sorting the numbers as they are coming in from multiple input streams and they may be coming asynchronously. For this, you need to find a way to sort the numbers as they are coming in in different speeds. A heap is a perfect data structure for asynchronous streams to sort the streaming elements. The single insert operation is O(log n) and you can build the heap in O(n log n) time from multiple streams for n numbers.
Design a stream data structure (interface) that is "iterable" by numbers for your input streams, which will have let's say hasNext() and next() method implementations. For instance for Java, this is like implementing the Iterable interface:
public interface MyStreamInterface<Integer> implements Iterable<Integer> {}
All of your input stream classes will implement this interface. Then you can have two threads sharing a min-heap, one for reading from the streams and inserting them into the min-heap (Producer) and the other one writing from the min-heap to a file (Consumer):
Producer:
for (MyStreamInterface stream: streams) {
if (stream.hasNext()) {
minHeap.insert(stream.next());
}
}
Consumer:
while (!minMeap.isEmpty()) {
file.writeInt(minHeap.extractMin())
}
Of course, writing to and reading from the shared min-heap has to be synchronized as in Producer-Consumer problem.
I like the heap idea and the Iterator idea. The whole merge could be also wrapped as Iterator:
public class MergeIterator implements Iterator<Integer> {
public MinHeap h = new MinHeap();
class Node { int v; Iterator<Integer> i; }
public MergeIterator(Collection<Iterator<Integer>> inputs) {
for(Iterator<Integer> i : inputs) {
if(i.hasNext())
h.insert(new Node(i.next(), i));
}
}
public boolean hasNext() { return ! h.isEmpty(); }
public Integer next() {
Node min = h.extractMin();
if(min.i.hasNext())
h.insert(new Node(min.i.next(), min.i));
return min.v;
}
}
N = input sources
int A[N];
for( int i = 0; i < N; ++i )
{
int ret = readNextValue( i, &A[i] );
if( ret == NO_MORE_INPUTS )
{
A[i] = MAX_INT;
}
}
int min_index;
do
{
min_index = 0;
for( i = 1; i < N; ++i )
{
if( A[i] < A[min_index] )
{
min_index = i;
}
}
if( A[min_index] != MAX_INT )
{
writeIntoSortedArray( A[min_index] );
int ret = readNextValue( min_index, &A[min_index] );
if( ret == NO_MORE_INPUTS )
{
A[min_index] = MAX_INT;
}
}
} while( A[min_index] != MAX_INT );
import heapq
class HeapNode:
def __init__(self, value, collIndex):
self.value = value
self.collIndex = collIndex
def __eq__(self, other):
return self.value == other.value
def __lt__(self, other):
return self.value < other.value
def __gt__(self, other):
return self.value > other.value
class MergeIterator:
def __init__(self, sources):
self.sources = sources
self.sourcesCount = len(sources)
self.minHeap = []
for i in range(0, self.sourcesCount):
current = self.sources[i]
if len(current) > 0:
node = HeapNode(current.pop(0), i)
heapq.heappush(self.minHeap, node)
def hasNext(self):
return len(self.minHeap) > 0
def next(self):
if not self.hasNext():
return None
min = heapq.heappop(self.minHeap)
collection = self.sources[min.collIndex]
if len(collection) > 0:
node = HeapNode(collection.pop(0), min.collIndex)
heapq.heappush(self.minHeap, node)
return min.value
en.wikipedia.org/wiki/External_sorting#External_merge_sort
- oOZz June 15, 2013Read one item at a time from each input stream and put them in a min-heap.