Google Interview Question
Software Engineer / Developer
Country: United States
Interview Type: In-Person
I like this idea. Do you have the implementation? Do you use an array of counters to keep track of which element of each chunk has been added to the min-heap? Also, when you insert an element into the min-heap, you have to keep a reference to the chunk index along with the element, so that we will know which chunk the element belongs to when we pop it out. I am guessing we will have to create a HeapNode class that stores a String element and an int chunk_index?
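Such a HeapNode might look like this (a sketch; beyond the String element and int chunk_index mentioned above, the element-index field is an assumption standing in for the "array of counters"):

```java
// Hypothetical HeapNode pairing a chunk's current element with its origin.
// Comparable so it can sit directly in a PriorityQueue (Java's min-heap).
class HeapNode implements Comparable<HeapNode> {
    final String element;   // current value drawn from the chunk
    final int chunkIndex;   // which chunk it came from
    final int elementIndex; // position within that chunk

    HeapNode(String element, int chunkIndex, int elementIndex) {
        this.element = element;
        this.chunkIndex = chunkIndex;
        this.elementIndex = elementIndex;
    }

    @Override
    public int compareTo(HeapNode other) {
        return this.element.compareTo(other.element);
    }
}
```

When a node is popped, chunkIndex tells you which chunk to advance, and elementIndex + 1 is the next element to push.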
This looks like merging k sorted arrays into a single array!
We can merge the arrays in O(nk log k) time using a Min Heap. Following is the detailed algorithm.
1. Create an output array of size n*k.
2. Create a min heap of size k and insert the first element of each array into the heap.
3. Repeat the following steps n*k times.
a) Get the minimum element from the heap (the minimum is always at the root) and store it in the output array.
b) Replace the heap root with the next element from the array from which the element was extracted. If that array doesn't have any more elements, replace the root with infinity. After replacing the root, heapify the tree.
geeksforgeeks.org/merge-k-sorted-arrays/
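The steps above can be sketched with Java's PriorityQueue as the min heap (a sketch, not the linked article's code; instead of replacing the root with infinity it simply does not reinsert from an exhausted chunk):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class KWayMerge {
    // Merge k sorted lists of strings in O(nk log k) total time.
    static List<String> merge(List<List<String>> chunks) {
        // Each heap entry is {chunkIndex, elementIndex}, ordered by the
        // string it currently points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> chunks.get(a[0]).get(a[1]).compareTo(chunks.get(b[0]).get(b[1])));
        for (int i = 0; i < chunks.size(); i++) {
            if (!chunks.get(i).isEmpty()) heap.add(new int[]{i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();                     // step 3a: extract min
            out.add(chunks.get(top[0]).get(top[1]));
            if (top[1] + 1 < chunks.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1}); // step 3b: next from same chunk
            }
        }
        return out;
    }
}
```

The heap never holds more than k entries, so each of the n*k extract/insert rounds costs O(log k).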
K-way merge. Keep merging one pair of chunks at a time. After the first pass, k/2 chunks will be left; after the second pass, k/4; and so on. On each pass, the last remaining chunk needs to be handled as a special case whenever the number of chunks in that pass is odd.
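A minimal sketch of this pairwise scheme (an odd chunk left over in a pass is simply carried forward to the next pass):

```java
import java.util.ArrayList;
import java.util.List;

class PairwiseMerge {
    // Standard two-way merge of two sorted lists.
    static List<String> mergeTwo(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i).compareTo(b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Repeatedly merge adjacent pairs until one chunk remains: log k passes,
    // each touching all n*k elements, so O(nk log k) overall.
    static List<String> mergeAll(List<List<String>> chunks) {
        while (chunks.size() > 1) {
            List<List<String>> next = new ArrayList<>();
            for (int i = 0; i + 1 < chunks.size(); i += 2) {
                next.add(mergeTwo(chunks.get(i), chunks.get(i + 1)));
            }
            if (chunks.size() % 2 == 1) {              // odd chunk this pass
                next.add(chunks.get(chunks.size() - 1));
            }
            chunks = next;
        }
        return chunks.isEmpty() ? new ArrayList<>() : chunks.get(0);
    }
}
```

The merges inside a single pass are independent, which is what makes this version parallelizable, unlike the left-to-right accumulating variant.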
I had this idea of merging one pair of chunks at a time too, but my original approach was a little naive: it merges chunks 1 and 2, then the merged result of 1 and 2 merges with chunk 3, the merged result of chunks 1, 2, 3 merges with chunk 4, and so on. I think your approach will be more efficient, since it concurrently merges chunks 1 and 2, 3 and 4, 5 and 6, ..., n-1 and n. But could you please provide some code to show how it should be implemented? I believe it's a great idea, but I am not sure how to code it. Thanks a lot.
How are you going to do this in parallel for each pass? Even if you divide and conquer, you would end up with O(n log n). You have to merge each search string with the subsequent one. Why not do it in linear time, O(n)?
StringBuilder sb = new StringBuilder();
for (String s : searchList) {
    sb.append(s);
}
Or am I missing something?
Further optimize with something similar to the reduce step in MapReduce? Usually the map would do the sorting and the reduce would do the merging; use a good hash such that reducer[i] has all the elements that are smaller than those of reducer[i+1]; merging on an individual reducer can then be done with the heap. These steps can be parallelized.
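A toy version of that partitioning idea (hypothetical: bucket by first letter, assuming lowercase a-z keys, so bucket i holds only strings smaller than everything in bucket i+1; each bucket can then be sorted or heap-merged independently):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RangePartition {
    // Route each string to one of 26 buckets by first letter. Because the
    // partitioning key preserves order across buckets, sorting each bucket
    // independently (e.g. on separate reducers) and concatenating the
    // buckets in order yields a globally sorted result.
    static List<String> partitionSort(List<String> words) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 26; i++) buckets.add(new ArrayList<>());
        for (String w : words) {
            buckets.get(w.charAt(0) - 'a').add(w);
        }
        List<String> out = new ArrayList<>();
        for (List<String> bucket : buckets) {
            Collections.sort(bucket);   // per-bucket work: parallelizable
            out.addAll(bucket);
        }
        return out;
    }
}
```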
We should create a MinHeap containing the first element of each chunk. Each time, we extract the min from the heap and push it to the output, then add a new element from the head of the chunk whose element was just pushed to the output.
- glebstepanov1992 January 30, 2014