Google Interview Question for Software Engineer / Developers


Country: United States
Interview Type: In-Person




Comment hidden because of low score. Click to expand.
13
of 13 vote

Strategy of splitting a search query across multiple machines would depend on the type of distributed index maintained by the machines.

Typically for large scaled index, the architecture be, there will be set of master index servers 1 to n that index documents from the crawler. Index is maintained across all these servers.

For every index master server, there will be multiple (1 to m) slave servers with replicated indexes. These slave servers will handle queries.

Now typically a query analyzer will receive query fired from the browser. Query analyzer will split the query into multiple sub queries as follows:
So if the query is like "Google Mountain view jobs"
Sub queries:
1. complete query: "Google Mountain view jobs"
2. boolean query: "Google AND Mountain AND view AND jobs"
3. Distance / Proximity query: (Google Mountain) AND (Mountain view) AND (View jobs) AND (Google Mountain View) AND (Mountain View Jobs)
4 .Single word queries: Google, Mountain, view, Jobs

Typically there will be another fuzzy query analyzer which will re-issue single word queries as fuzzy queries
5. Single word fuzzy queries: Google~, Mountain~, view~, jobs~
6. Range Query: if the search term includes a date range, regular expressions will pickup the date range and create a date range query.

Once all the queries are finalized, the queries are fired across multiple shards (slave servers) in parallel.

Now this could be done using Publish Subscribe Messaging model or by directly firing query across multiple slave server using software or hardware load balanced way.

All the servers will receive one query each and will come up with the matching document result with the type weight (vector like structure) and count weight (no of word counts matched in the document).

Based on their individual weights, a dot product will be calculated to compute an IR score.Also a page rank score is calculated based on the search result page weight. (example: if the search resulted in wikipedia.org page then this result will have heigher value than mamapapa.com page which happens to have same searched terms)

Multiple results will be combined together based on their weight rank and page rank and first K results will be sent back to the server caching the complete result in the cache servers so that if the user clicks on page 2 of the result, the result can be sent back form the cache server than to compute the result all over again.

- Saurabh March 13, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

if we are dealing with boolean search query we can store different sets of documents on different machines, and do the intersect of the results from each machine.

for example if we shard our documents to 1 machine having a-h, 2 machine i-o , 3 machine p-z

and we get a search query like "number of employees at google today"
employees, at google -> machine 1
number, of -> machine 2
today -> machine 3

if we store the documents on inverted index for each word, for example
employees -> doc1, doc3, doc5
at -> doc1, doc2, doc3, doc4, doc 5
google -> doc1 doc5
do machine 1 will return doc1, doc5 and intersect it with the result of the other machines.

- abe February 11, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

if we are dealing with boolean search query we can store different sets of documents on different machines, and do the intersect of the results from each machine.

for example if we shard our documents to 1 machine having a-h, 2 machine i-o , 3 machine p-z

and we get a search query like "number of employees at google today"
employees, at google -> machine 1
number, of -> machine 2
today -> machine 3

if we store the documents on inverted index for each word, for example
employees -> doc1, doc3, doc5
at -> doc1, doc2, doc3, doc4, doc 5
google -> doc1 doc5
do machine 1 will return doc1, doc5 and intersect it with the result of the other machines.

- abe February 11, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

First, talk with the interviewer, "Why split a search query across multiple machines?". It is for high performance or completeness or both. Or "What is on the multiple machines?"

Second, Split Query Index and Document.

Third, if the index is slitted into K parts kept on K machines.
then the query must be split into K data-based sub-queries.
(It depends on the split method)

Forth, if the index is duplicated into M parts on M machines.
If the query is time costly, one way is to split the query into M logic-based sub-query.

What is data-based sub-query and logic-based sub-query?
data-based sub-query: each sub-query scans different parts of the Index using same keywords.
logic-based sub-query: each sub-query scans the same Index using different keywords.
logic-based sub-query (Divide and Merge) is more complex.

Finally, when the target index is complete. Get the target documents on Document Server.
The same method is used on duplication servers and partition servers.

- xuyan.nus June 26, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

What about MapReduce execution in Google's clusters?

- Inucoder August 15, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

ll

- sdf January 20, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

if we are dealing with boolean search query we can store different sets of documents on different machines, and do the intersect of the results from each machine.

for example if we shard our documents to 1 machine having a-h, 2 machine i-o , 3 machine p-z

and we get a search query like "number of employees at google today"
employees, at google -> machine 1
number, of -> machine 2
today -> machine 3

if we store the documents on inverted index for each word, for example
employees -> doc1, doc3, doc5
at -> doc1, doc2, doc3, doc4, doc 5
google -> doc1 doc5
do machine 1 will return doc1, doc5 and intersect it with the result of the other machines.

- abe February 11, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

if we are dealing with boolean search query we can store different sets of documents on different machines, and do the intersect of the results from each machine.

for example if we shard our documents to 1 machine having a-h, 2 machine i-o , 3 machine p-z

and we get a search query like "number of employees at google today"
employees, at google -> machine 1
number, of -> machine 2
today -> machine 3

if we store the documents on inverted index for each word, for example
employees -> doc1, doc3, doc5
at -> doc1, doc2, doc3, doc4, doc 5
google -> doc1 doc5
do machine 1 will return doc1, doc5 and intersect it with the result of the other machines.

- abe February 11, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

if we are dealing with boolean search query we can store different sets of documents on different machines, and do the intersect of the results from each machine.

for example if we shard our documents to 1 machine having a-h, 2 machine i-o , 3 machine p-z

and we get a search query like "number of employees at google today"
employees, at google -> machine 1
number, of -> machine 2
today -> machine 3

if we store the documents on inverted index for each word, for example
employees -> doc1, doc3, doc5
at -> doc1, doc2, doc3, doc4, doc 5
google -> doc1 doc5
do machine 1 will return doc1, doc5 and intersect it with the result of the other machines.

- abe February 11, 2014 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More