Facebook Interview Question Software Engineer / Developers

  • facebook-interview-questions
    0
    of 0 votes
    2
    Answers

    Design a system for showing quotes on the web?

    For example, when the user is looking at page A, part of which is reproduced in page B, the system could highlight part of page A present the user with a link to page B.
    What constitutes a quote?
    How do you find quotes?
    How do you make it scale to the web?
    How do you handle updates?
    How would you arrange the servers?
    What data structures would you use?

    - Steve on September 24, 2012 in United States Report Duplicate | Flag
    Facebook Software Engineer / Developer Application / UI Design

Country: United States
Interview Type: In-Person


Comment hidden because of low score. Click to expand.
0
of 0 vote

I would chunk each page in blocks of text (I can leverage the DOM of the page... the way you optimize this step deeply affect the final result). Then I would apply an hash function on each chunk and represent the page A as the set of that hash numbers A = [h_0, ... h_n].

Now, I can build a inverted index of the hashes so that h_i => {A,B} meaning that the h_i is present both in A and in B. At this point whenever I render the page A I will generate a link to the page B associated to the chunk of text having h_i as hash number.

I will keep updated the index by keep crowling the Web and updating internal representation of the pages and the index like a SE does.

- Claudio on October 05, 2012 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

What constitutes a quote? - A fragment of text in a page constitutes a quote. For now we will only consider a text dump rather than gathering any semantic meaning to it.

How do you find quotes?
If more than one page has common occurrences of fragments of text, then it constitutes a quote. As for which page is going to be quoted can be determined by some page rank or the timestamp of the page that was created.

How do you make it scale to the web?
We crawl the web across to gather documents. These documents would then have to be compared amongst each other. The comparison can be limited to similar domains of the pages.

How do you handle updates?

The way the information would be laid out is, typically there would/could be a many(children) to one (parent) relationship of pages. Lets call this a group.

When a new page comes in, there are three possibilities:
1. It is a page which will be added to an already existing grouping.
2. It is a page for which a grouping doesn't exist yet. And this will be the new parent referring to itself.
3. It is a page that will replace the parent of a group.

Even within a domain there would be many such groupings. To assign some priority, we can have the number of children quoting the parent treated as a rank. So the node with the highest rank gets checked against first.

#2
If there is no match against any of the highest rank nodes, then that means the children will not match as well. So we create a new parent with rank 1 and the page points to itself.

#1It matches one of the high rank document and gets added to the other nodes.

#3
Along side checking against the highest rank i.e. checkIfQuoteFrom(parent,new_page) , we should also do a checkIfQuoteFrom(new_page,parent). If it returns true for the second call, then either a) the pages are the same. b) new_page is the parent of the current parent ( determined by timestamp amongst other metrics).

If it is a) then we can discard it. If it is b) then we replace the parent with new_page and have the parent join the other set of nodes.

How would you arrange the servers?
There would be a set of servers that would be responsible for crawling the web.
Another set of servers who would be doing a comparison of pages to get the quotes.
As soon as the the crawlers would return back with the set of results, it would be written to some cache and not the database. After massaging the data received is when they will be stored in the db with their state ( discussed above).

Even to fetch the results, we would cache the high frequency requests to ensure minimum db reads.


What data structures would you use?

A decreasing order of nodes ( based on rank ) for the pages to scan. Each of these nodes would be referred to by children. An object representing a domain pointing to these nodes.

A tree like ds that represents the reverse domain and each node there in would store the location to access the aforementioned list.

- capt_caveman on March 17, 2014 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book walking you through getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More