30 Day Risk-Free Guarantee:
100% money back if you're unsatisfied.
Book (308 Pages):
  • 150 Programming Interview Questions and Solutions
  • Five Proven Approaches to Solving Tough Algorithm Questions
  • Ten Mistakes Candidates Make -- And How to Avoid Them
  • Steps to Prepare for Behavioral and Technical Questions
  • Interview War Stories: A View from the Interviewer's Side
  • Book Preview and More Info

Video (One Hour):
  • Watch CareerCup's founder perform a totally unscripted technical interview of a candidate.
  • Overview of what interviewers look for in software engineering jobs.
  • Technical questions with white-boarding coding where the candidate does well - and struggles.
  • Interviewer reviews with what went well and poorly.
  • Video Preview and More Info

Resume Review (24 - 48hr)
  • Get your resume reviewed by a trained engineering interviewer. More Info
All Products / Services


Express Prep Package (Book)
$29.99 for e-book | $32.45 for paperback
Buy E-Book Buy on Amazon


Standard Prep Package (E-Book & Video)
Emailed Instantly | $39.99
Buy Standard

Elite Prep Package (E-Book, Video & Resume)
Emailed Instantly | $99.99
Buy Elite
 



Given a document and a query of K words, how do u find the smallest window that covers all the words at least once in that document? (given you know the inverted lists of all K words, that is, for each word, you have a list of all its occurrrences). This one is really hard. Could someone propose an algorithm in O(n)?

36


geniusxsy on November 11, 2009 |Edit | Edit

very easy. I hope I will have such easy questions in my interview.

You can do it in one scan of K arrays, O(KN). Each array records the integer positions of every occurrence of a word.

Anonymous on November 11, 2009 |Edit | Edit

please at least leave the idea. What you said is the information I give u.

geniusxsy on November 11, 2009 |Edit | Edit

Hint:

1. use k pointers to iterate through each array
2. at a single step, you can only increment one pointer
3. so, which one? you should be able to find some clue if you draw a simple sketch of multiple arrays on several parallel number axises

Anonymous on November 12, 2009 |Edit | Edit

geniusxsy,
please give the pseudo code

Anonymous on November 12, 2009 |Edit | Edit

geniusxsy,
please give the pseudo code

Anonymous on November 12, 2009 |Edit | Edit

The interviewer asked for a O(n) solution. You guys should find better approach.

geniusxsy on November 11, 2009 |Edit | Edit

By the way, I heard this was (also) an MS interview problem?

Anonymous on November 12, 2009 |Edit | Edit

I was asked this question google more than 4yrs back.

alex on November 12, 2009 |Edit | Edit

I have a solution:
1. first, combine the inverted lists of all K words into a array A(length l), and sort it, and we suppose the initial smallest window is [A[0], A[l-1]],
2. next, use a cursor on the beginning point of the array, one by one, check whether the array element is the last occurrence for specific Word, e.g. if A[0] is one of occurrences of Word W1, check if it is the last occurence, if not, move the cursor to the next element, else stop at pos1
3. Do the similar job like Step 2, use another cursor from the last element of the array. Stop at pos2
4. The smallest window is [pos1, pos2]

alex on November 12, 2009 |Edit | Edit

Improved:
1. let K pointers point to the start point of K arrays(which each contain all the occurrences of specific word), get the minimum value of all this K values, check whether the min value is the last element of the containing array. If yes, stop at this point, the position of this occurrences is pos1.If not, move this pointer to the next, and do the same job again.
2. let K pointers pointing to the last element of K arrays, get the maximum value ,check whether it is the first one of the array whose element's value bigger than pos1. If not, move to previous element, else stop. Get the position, pos2.
We get the smallest window is [pos1, pos2]

amit on November 14, 2009 |Edit | Edit

@alex
I am still not able to visualize the solution. Could you please try explaining it taking an example or something

Anonymous on November 15, 2009 |Edit | Edit

alex, does your method work for below example?

w1: 1, 1000, 2500
w2: 400, 1001, 2000
w3: 500, 1002, 1500

Anonymous on November 15, 2009 |Edit | Edit

alex, does your method work for below example?

w1: 1, 1000, 2500
w2: 400, 1001, 2000
w3: 500, 1002, 1500

Anonymous on November 15, 2009 |Edit | Edit

Suppose N(i) gives the occurrence count of i-th word, i going from 1 to K words.
The total number of possible window sizes is the product P(N(i) for i= 1..K).
Go through each of the possible windows and find the one(s) where the difference
of the largest offset and smallest offset is the lowest.
So the algorithm depends on the number of times the K words occur, and not the document size. It can stop once it finds a window size that equals K, or continue to find other windows of the same size.

Anonymous on November 15, 2009 |Edit | Edit

Example: Suppose the document is Z A Y B C X Z D E F Y G, and you want to find smallest window of XYZ. The inverted index list is X = {5}, Y = {2, 10}, Z = {0, 6}
The total number of windows is the product 1 x 2 x 2 = 4. They are (5, 2, 0)-> size 6,
(5, 2, 6) -> size 5, (5, 10, 0) -> size 6, (5, 10, 6) -> size 6. The smallest window is (5, 2, 6) i.e. from 2 (Y) to 6 (Z) -> "Y B C X Z". I think this is correct.:-,

anony on December 21, 2009 |Edit | Edit

the size of (5,10,0) -> 11 not 6

trey on November 15, 2009 |Edit | Edit

That solution takes O(<Avg. Occurances>^K)...cleanly in the NP arena. Probably not a workable solution. What optimizations can you make?

Anonymous on November 17, 2009 |Edit | Edit

One optimization I can think of but not sure is as follows:
For K words, the minimum window size is K. So find the shortest occurrence list, then
for each offsets O, look for offsets in others words (K - 1) to be within 0 - K < N < O + K. That is a 2K window size. If you find an occurrence of all the words in that window,
then eliminate the rest, and find the set with lowest window size. So in the
above example, K = 3, X is the shortest occurring word. For O = 5, find offsets of Y and Z between 2 (5 - 3) and 8 (5 + 3). The only set, in this example, is Y = 2, and Z = 6. So the answer is X = 5, Y = 2, Z = 6, with window size of 5, which is K + 2. If no offsets are found in the range, increase the bound to O - 2K < N < O + 2K..and so on.

Anonymous on November 22, 2009 |Edit | Edit

for given ex:
w1: 1, 1000, 2500
w2: 400, 1001, 2000
w3: 500, 1002, 1500

keep 2 arrays negdev (negative deviation from base) and posdev(positive deviation from base) of size 3 (same as size of base array) along with base array which is same as w1:
base=[1, 1000,2500] same as w1 ()
negdev=[0,0,0] and posdev=[0,0,0] (zero initialization)
now calculate both arrays traversing through the k array of occurances of given words , here w1,w2 and w3
negdev(i)= maximum negative deviation from base[i] to find all k words
posdev(i)= maximum positive deviation from base[i] to find all k words
so after traversal we get
posdev=[499,2,0]
and negdev=[0,0,1000]
now calculate range array range(i)=posdev(i)+negdev(i)
so range=[499,2,1000]
so minvalue of range will be the answer.
time complexity : O(n)
space complexity: 2 arrays of size of w1 (if you want to minimize this, then instead of taking w1 as base first find base in set of k array , word which has minimum occurances)

Anonymous on December 03, 2009 |Edit | Edit

I don't see you can do it in time O(n). For each occurrence in w1, you have go through all occurrences in w2 and w3 to get max deviation. You can sort, then binary search the max deviation, but O(n) is impossible.

Anonymous on December 03, 2009 |Edit | Edit

I was wrong. You were right. O(n) is the one.

DMB on December 04, 2009 |Edit | Edit

Still don't see how this is linear. To me it seems that, as you rightfully pointed out, while scanning through w1 is indeed linear, you need to do O(n^(m-1)) operations (worst case) to exhaustively search for minimal spans.

hero on December 08, 2009 |Edit | Edit

can be solved in O(nlogK).

nec on December 08, 2009 |Edit | Edit

Use dynamic programming, and it can be solved with O(nk) time and O(k) space.

fragzilla on December 11, 2009 |Edit | Edit

here is my take and quite certain its not accomplished in O(nlogn) but worth a shot.

create a kxk graph where the edges signify the minimum distance from each word to the other. ( this part needs the optimization)

once the graph is generated, compute the minimum spanning tree and the resultant tree will be your desired positions.

what now needs to be done is figure out how to efficiently generate the KxK graph.

cheers

WTF on January 02, 2010 |Edit | Edit

I don't understand what your MST is supposed to accomplish, even if I give you the matrix. How does a tree in your scheme even translate into a window, let alone a minimum one?

fragzilla on January 07, 2010 |Edit | Edit

let me try to explain this.

you have a list of words and the list of distance from start.

abc - 2 , 3 , 15, 19 , 24
def - 4 , 5, 10 , 28
ghi - 12,33,55

random values
etc.

these are distances from the start. let each word represent a node. let each node have an edge to each other node. the number of edges from one node to another node will be more than 1 (because of multiple occurrence of the words in the text).

compute the difference for each word distance with regard to every other word distance and this difference will be the distance on the edge for that word (for optimality we can take the minimum distance for two words and just add one edge between them) .

as a result you have a graph where all nodes are connected with each other with the relative distance. with this we have the information on how close the words are to each other.

now to find the smallest window, we need to find the edges that would connect all the nodes(words) such that the weight of the edges will be the smallest. this translates as finding an mst for the graph.

WTF on January 07, 2010 |Edit | Edit

Ok let's use the 3-node example you have given:

min_dist between abc and def is between 3 and 4 = 1. min_dist between abc and ghi is between 12 and 15 = 3. min_dist between def and ghi is between 12 and 10 = 2. The MST in this case will of course be a length two path going from abc to def to ghi i.e. sum 1+2 = 3. But what is the window then? Recall that abc to def is (3,4) and ghi to def is (10,12). So I'm at a loss as to what your window is. Surely it can't be (3,12)?

Maybe I'm missing something, but your latest explanation still does not make clear how MST gives the desired minimum window, or even a window at all.

bbb on January 10, 2010 |Edit | Edit

Here is a solution I have come up with:

Let a1, a2, a3 and so on be the array that holds the position of occurrence of each word.
k1, k2, k3 and so on be the pointers to the start of each array.
Initialize the minimum_window = N, where N is the number of words in the input
Follow the steps until the pointer reach the end of each array.
1. Compute the value: largest-smallest+1 for each pointer. This is the current window. If this value is less than the minimum_window then update the minimum window and temp_result = current positions of the pointers.
2. Check if all the pointers pointing to the last position in respective array. If yes, minimum_window is the smallest window and If not , increment the pointer with the minimum value.

Result: current positions of the pointers in the temp_result and size is minimum_window.

Mahesh on April 14, 2010 |Edit | Edit

This works.

peng on April 16, 2010 |Edit | Edit

I don't see how it's O(n). Suppose you have k words and the each word occurs m times, to find the "largest" and "smallest", you have to scan k arrays. Then, perhaps you have to move pointers m times, then it's at least O(k*m).

Please clarify.

Anonymous on February 22, 2010 |Edit | Edit

This is for generating a snippet of a document.

Bala on April 18, 2010 |Edit | Edit

As all the inverted lists are sorted, lets suppose there are k of them.
Now build a min-heap with the first index of all the k lists and also keep track of the max element. Now check the value of max - min and store it.
Now keep reading the indexes from each list, and check if its index > min value, then insert it into the heap and pop the min element. Keep doing this till you exhaust elements in one of the lists.

godlIkeNoob on April 25, 2010 |Edit | Edit

Imagine the entries in the lists as points in 2-D planes, where Y axis = the word and X axis = position . Imagine the posting lists for I am one

I- > 1 5 7 11
am -> 3 6 9 12
one -> 7 8 99

Sort all the entries - so u get
1 3 5 6 7 8 9 11 12 99 in o(n) time as the lists are already sorted

Now, we just have to come up with a window having 3 non-similar elements. As there can be max n-k (where k=no of term we are looking it, here 3), looking at those windows and determining the min window shouldn't take more than o(n) time

Does this seem too easy? Am I missing something?

KC on May 24, 2010 |Edit | Edit

This seems good. You can use hash table to find which index corresponds to which word

Anonymous on August 06, 2010 |Edit | Edit

come on, this is not o(N), it is o(NlogK)...

Add a Text Comment | Add Runnable Code
Name:
Comment:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.








Amazon (1033)Bloomberg LP (403)Qualcomm (117)Adobe (88)Goldman Sachs (78)NetApp (49)IBM (43)Morgan Stanley (33)CapitalIQ (30)Sophos (25)Achieve Internet (23)Electronic Arts (19)Motorola (18)Research In Motion (17)Flipkart (16)
Microsoft (867)Google (141)NVIDIA (98)Yahoo (82)Epic Systems (69)Expedia (47)VMWare Inc (43)Apple (32)Cisco Systems (28)Facebook (23)Infosys (22)Agilent Technologies (19)Sage Software (17)Deshaw Inc (16)FlexTrade (15)
More Companies »
Software Engineer / Dev... (1062)Financial Software Deve... (170)Testing / Quality Assur... (56)Analyst (35)Virus Researcher (25)Field Sales (15)Developer Program Engin... (9)Front-end Software Engi... (6)MyJoB (5)area sales manager (4)Assistant (3)Cabin crew (2)Accountant (1)personnel (1)Intern (1)
Software Engineer in Te... (288)Program Manager (65)Development Support Eng... (47)INTERN(MSIDC) (28)Web Developer (18)System Administrator (10)Consultant (10)Production Engineer (5)Associate Technology L2 (5)AcquireKnowledge (4)Product Security Engine... (3)Solutions Architect (2)Gamer (1)mts (1)Fresh graduate interview (0)
More Job Titles »
Algorithm (1073)Terminology & Trivia (294)C (166)Object Oriented Design (159)Java (121)Testing (114)Arrays (101)Operating System (78)Database (70)Linked List (62)String Manipulation (56)Networking / Web / Inte... (44)Threads (36)Linux Kernel (33)PHP (22)
Coding (511)C++ (204)Behavioral (159)Data Structures (155)Experience (116)Brain Teasers (111)Computer Architecture &... (79)General Questions and C... (73)Trees and Graphs (69)Math & Computation (57)Application / UI Design (45)Ideas (38)System Design (35)Puzzles (30)Bit Manipulation (20)
More Topics »
CareerCup Official Interview Book Daily Questions Requests for Help Mock Interviews Video for Cracking the Coding Interview Job Placement Service CareerCup Blog
My Profile Edit Profile & Email Settings Sign Out