Facebook Interview Question for Software Engineer / Developers


Country: United States
Interview Type: In-Person





I was asked the same question at Amazon a year back.

- nr November 09, 2012

Hi Steve, would you mind sharing more about what you were thinking and how the interviewer responded to your approach? Thanks a lot.

- amyue.xing November 15, 2012

My approach:

1) Data Model 
I'll have 2 hash maps:

  - One keyed on the source page address 
  - One keyed on the destination page address 
  - For both maps, the value will point to an entry in a central list of quote structs.
     
    struct quote
    {
      string src;                  // Links
      string dest;                 // Links
      string src_element;          // We can have div, p etc
      int src_id;                  // Id of element on source page i.e. to be quoted
      string dest_element;         // We can have div, p etc
      int dest_id;                  // Id of element on destination page i.e. the quoting element
    };

    unordered_map<string, quote*> srcMap;
    unordered_map<string, quote*> destMap;

2) Operations supported will be:
    
    - void insertQuote(string src, string dest ...)    // properties to be added to our struct
    - string fetchQuotedString(string dest, string element, int id); 
    - string fetchDestPage(string src, string element, int id);

  The hash can be calculated on the complete web address, which might look like: "http://www.blah.com&type=id"
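
For illustration, here is a minimal C++ sketch of how the struct, a central list, and the two maps could fit together. The container choices (std::list plus unordered_multimap so a page can hold several quotes) and the simplified fetch signatures are my assumptions, not part of the original description.

    #include <iterator>
    #include <list>
    #include <string>
    #include <unordered_map>

    struct quote {                       // same fields as the struct above
      std::string src, dest;             // source and destination page URLs
      std::string src_element;           // e.g. "div", "p"
      int src_id;                        // id of the quoted element on the source page
      std::string dest_element;
      int dest_id;                       // id of the quoting element on the destination page
    };

    // Central list of quotes; both maps hold iterators into it, so a page
    // with several quotes simply has several map entries.
    std::list<quote> quotes;
    std::unordered_multimap<std::string, std::list<quote>::iterator> srcMap;
    std::unordered_multimap<std::string, std::list<quote>::iterator> destMap;

    void insertQuote(const quote& q) {
      quotes.push_back(q);               // O(1) append to the list
      auto it = std::prev(quotes.end());
      srcMap.emplace(q.src, it);         // index by source page address
      destMap.emplace(q.dest, it);       // index by destination page address
    }

    // Simplified lookups: return the first matching quote entry, or nullptr.
    const quote* fetchQuotedString(const std::string& dest) {
      auto it = destMap.find(dest);
      return it == destMap.end() ? nullptr : &*it->second;
    }

    const quote* fetchDestPage(const std::string& src) {
      auto it = srcMap.find(src);
      return it == srcMap.end() ? nullptr : &*it->second;
    }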

3) Execution
 
   - The client sends an insertQuote request to the server. 
   - Insertion can be asynchronous: the server returns a success code and lazily updates the data structures.
 
4) Complexity 

   - Since we're using hash maps and simply appending a new node to the end of an STL list, the expected time complexity for each operation is O(1). 

5) Write to Disks
We can keep a threshold on time or on the size of data held in memory, after which the contents are dumped to disk (these will basically be serial writes).
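
As a rough sketch of such a threshold check (the limits and the flushToDisk() helper are placeholders I am assuming, not values given above):

    #include <chrono>
    #include <cstddef>

    // Placeholder: serially writes the buffered quotes to an append-only file.
    void flushToDisk() { /* ... */ }

    constexpr std::size_t kMaxBufferedBytes = 64 * 1024 * 1024;  // assumed 64 MB cap
    constexpr auto kMaxBufferedAge = std::chrono::minutes(5);    // assumed 5-minute cap

    void maybeFlush(std::size_t bufferedBytes,
                    std::chrono::steady_clock::time_point lastFlush) {
      bool tooBig = bufferedBytes >= kMaxBufferedBytes;
      bool tooOld = std::chrono::steady_clock::now() - lastFlush >= kMaxBufferedAge;
      if (tooBig || tooOld) flushToDisk();                       // dump contents to disk
    }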

6) Concurrency 

  - Read operation 

   - To display page A, whose text is quoted, we invoke fetchDestPage(...); to display page B, we invoke fetchQuotedString(...).
   - For reads, we can use a reader-writer locking mechanism on the maps. If no writer is updating the corresponding entry, the reader process can fetch the node value based on the type of page being displayed (hashing the source or destination web address). 
  - Write operation 

    - As mentioned earlier, insertion can be asynchronous. Once a client invokes insertQuote(...), a preforked server process can record the invocation and return a success code to the caller. 
    - Each process updates the list of nodes and receives an iterator. To insert the entry into the maps, the process waits to acquire locks on the hashed positions; once acquired, it inserts a new entry into each map.
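
A small sketch of the reader-writer locking described above, using std::shared_mutex (C++17). A single lock over a simplified map is shown; the original answer implies finer-grained locks on the hashed positions, which would work the same way per bucket.

    #include <mutex>
    #include <shared_mutex>
    #include <string>
    #include <unordered_map>

    std::shared_mutex mapLock;                                      // protects the map below
    std::unordered_map<std::string, std::string> quotedTextByDest;  // simplified value type

    // Reader: any number of concurrent readers, as long as no writer holds the lock.
    std::string readQuote(const std::string& destUrl) {
      std::shared_lock<std::shared_mutex> guard(mapLock);
      auto it = quotedTextByDest.find(destUrl);
      return it == quotedTextByDest.end() ? "" : it->second;
    }

    // Writer: exclusive access while inserting or updating an entry.
    void writeQuote(const std::string& destUrl, const std::string& text) {
      std::unique_lock<std::shared_mutex> guard(mapLock);
      quotedTextByDest[destUrl] = text;
    }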

7) Scalability
We can distribute our hash maps across different nodes with a DHT-based setup (caching the most frequently used data). When a server receives a request to fetch or insert an entry, it will:

  - Return 2xx to the client 
  - Hash the "web-address" and accordingly find the destination host
  - Route the message to the destination 
  - The recipient node will have its own segment of the hash tables and list.
We basically import the features of a DHT to share resources among a number of hosts and provide better performance to the user.
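
One way the "hash the web address and find the destination host" step might look is a small consistent-hash ring; the ring structure and host names here are assumptions for illustration, not something specified above.

    #include <cstddef>
    #include <functional>
    #include <map>
    #include <string>

    // Minimal consistent-hash ring: each node sits at hash(nodeName); a key is
    // owned by the first node clockwise from hash(key). Assumes at least one node.
    class Ring {
      std::map<std::size_t, std::string> ring_;
      std::hash<std::string> hash_;
    public:
      void addNode(const std::string& node) { ring_[hash_(node)] = node; }
      std::string ownerOf(const std::string& key) const {
        auto it = ring_.lower_bound(hash_(key));
        return it == ring_.end() ? ring_.begin()->second : it->second;
      }
    };

    // Usage: route a fetch/insert for a given web address to its owner node.
    //   Ring ring; ring.addNode("host-a"); ring.addNode("host-b");
    //   std::string target = ring.ownerOf("http://www.blah.com&type=id");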

8) Replication for availability 

Data can be replicated on many DHT nodes, with DHT-based bootstrapping and division of labor. For every write operation, the node that receives the request can be considered the coordinator, which takes care of a successful commit to each replica using a two-phase commit protocol. In case of an aborted transaction, a failure code is returned to the client; the client is responsible for retrying. 
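
A hedged sketch of the coordinator side of that two-phase commit; the Replica interface is assumed, and real deployments would add timeouts and a recovery log that are omitted here.

    #include <string>
    #include <vector>

    struct Quote { std::string src, dest, content; };   // payload being replicated

    // Assumed replica interface: prepare() votes yes/no, commit()/abort() finalize.
    struct Replica {
      virtual bool prepare(const Quote& q) = 0;          // phase 1: can you commit?
      virtual void commit() = 0;                         // phase 2a
      virtual void abort() = 0;                          // phase 2b
      virtual ~Replica() = default;
    };

    // Returns true on success; false lets the client retry, as described above.
    bool twoPhaseCommit(std::vector<Replica*>& replicas, const Quote& q) {
      for (auto* r : replicas) {
        if (!r->prepare(q)) {                            // any "no" vote aborts the write
          for (auto* a : replicas) a->abort();
          return false;
        }
      }
      for (auto* r : replicas) r->commit();              // all voted yes: commit everywhere
      return true;
    }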

9) Fault Tolerance

- Again, we're importing fault-tolerance features from the DHT. In case of a faulty destination host, its replicas should be able to service the request. 
- If none of the replicas is able to respond, we'll treat it as a cache miss and fetch the data from disk.

- Learning Design November 17, 2012

Breaking it down into aspects:

1. Definition (what counts as a quote) --
a. Paragraph-based, not just a single word ('the' appears everywhere; we cannot call that a quote). Check paragraph length.
b. Main text only: not headers, footers, menus, signatures, etc., and no HTML tags.
c. Allow some tolerance (similarity > 95%, for instance). This is important for dynamic pages, since parts of them change very often.

2. Detection --
a. Remove headers/footers/menus and strip HTML tags.
b. Calculate the edit distance between the two whole pages. If it is large, run edit distance over all paragraph pairs (O(n^2) pairs * O(edit distance) each) to find the quotes. Store any that are found.
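
For reference, a textbook Levenshtein edit-distance routine that the per-paragraph comparison could use, plus the 95% similarity test from 1(c); this is standard DP, not code from the original comment.

    #include <algorithm>
    #include <string>
    #include <vector>

    // Classic O(|a| * |b|) dynamic-programming edit distance.
    int editDistance(const std::string& a, const std::string& b) {
      std::vector<std::vector<int>> d(a.size() + 1, std::vector<int>(b.size() + 1));
      for (std::size_t i = 0; i <= a.size(); ++i) d[i][0] = static_cast<int>(i);
      for (std::size_t j = 0; j <= b.size(); ++j) d[0][j] = static_cast<int>(j);
      for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
          d[i][j] = std::min({ d[i - 1][j] + 1,                                 // deletion
                               d[i][j - 1] + 1,                                 // insertion
                               d[i - 1][j - 1] + (a[i - 1] != b[j - 1]) });     // substitution
      return d[a.size()][b.size()];
    }

    // Tolerance check from 1(c): treat two paragraphs as a quote if similarity > 95%.
    bool isQuote(const std::string& p1, const std::string& p2) {
      double maxLen = static_cast<double>(std::max(p1.size(), p2.size()));
      return maxLen > 0 && 1.0 - editDistance(p1, p2) / maxLen > 0.95;
    }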

3. Representation, storage --
CREATE TABLE Quote (id BIGINT PRIMARY KEY DEFAULT NEXTSEQ, url1 VARCHAR(MAX) NOT NULL, url2 VARCHAR(MAX) NOT NULL, content TEXT NOT NULL);
CREATE INDEX idx_Quote_Url1 ON Quote (url1) INCLUDE(id, url2, content);
CREATE INDEX idx_Quote_Url2 ON Quote (url2) INCLUDE(id, url1, content);

4. Present --
a. $url is given. $page = get_content_from_url($url); $altered_page = $page;
b. $quotes = sql("SELECT * FROM Quote WHERE url1 = '$url' OR url2 = '$url'");
c. foreach ($quotes as $quote) {
    // Drop stale quotes whose text no longer exists on either page.
    $verify = verify_content_exists($page, $quote['content']);
    if (!$verify) { sql("DELETE FROM Quote WHERE id = " . $quote['id']); continue; }
    $other_url = $quote['url2'];
    if ($other_url == $url) $other_url = $quote['url1'];
    $other_page = get_content_from_url($other_url);
    $verify = verify_content_exists($other_page, $quote['content']);
    if (!$verify) { sql("DELETE FROM Quote WHERE id = " . $quote['id']); continue; }
    // Annotate the quoted text with a link to the other page.
    $altered_page = str_replace($quote['content'],
        "\"" . $quote['content'] . "\" (<a href=\"" . $other_url . "\">exists here too</a>)",
        $altered_page);
}

5. Update --
a. Delete on the fly as shown in 4.
b. Run detection regularly to add new quotes.

- paul4156 September 27, 2014

How do you find quotes?
There are on the order of 1T web pages on the internet, so we cannot afford anything much more than linear work. Just check whether each paragraph appears in any other web page using KMP.
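
Since the answer leans on KMP, here is a standard KMP substring check for "is this paragraph part of that page"; it is a textbook implementation added for illustration, not from the original post.

    #include <string>
    #include <vector>

    // Knuth-Morris-Pratt: true if `needle` (a paragraph) occurs in `haystack`
    // (another page's text), in O(|haystack| + |needle|) time.
    bool kmpContains(const std::string& haystack, const std::string& needle) {
      if (needle.empty()) return true;
      // Failure function: longest proper prefix of needle[0..i] that is also a suffix.
      std::vector<std::size_t> fail(needle.size(), 0);
      for (std::size_t i = 1, k = 0; i < needle.size(); ++i) {
        while (k > 0 && needle[i] != needle[k]) k = fail[k - 1];
        if (needle[i] == needle[k]) ++k;
        fail[i] = k;
      }
      // Scan the haystack, falling back via the failure function on mismatches.
      for (std::size_t i = 0, k = 0; i < haystack.size(); ++i) {
        while (k > 0 && haystack[i] != needle[k]) k = fail[k - 1];
        if (haystack[i] == needle[k]) ++k;
        if (k == needle.size()) return true;
      }
      return false;
    }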

How do you make it scale to the web?
We cannot afford to check whether a paragraph is quoted across all 1T web pages. We have to cluster the web pages and then check within each cluster; hopefully each cluster is not very big.

How do you handle updates?
Store all the web pages and periodically re-crawl them.
Schema: paragraph_id, page_id1, page_id2, ...
If a page is updated, check whether the changed paragraph is part of any paragraph in its cluster.

How would you arrange the servers?
A NoSQL database for the web pages.
A NoSQL database to store the quotes.

How much storage would you need?
Assuming ~10 KB per page, 1T pages x 10 KB is roughly 10 PB to store all pages.

- Anonymous October 05, 2016

