Amazon Interview Question for SDE-3s

Country: United States

Comment hidden because of low score. Click to expand.
of 0 vote

Given a seed URL, the crawler needed to auto-discover the value of the missing fields for a particular record. So if a web page didn't contain the information that I was looking for, the crawler needed to follow outbound links, until the information was found.
It needed to be some kind of crawler-scraper hybrid, because it had to simultaneously follow outbound links and extract specific information from web pages.
The whole thing needed to be distributed, because there were potentially hundreds of millions of URLs to visit.
The scraped data needed to be stored somewhere, most likely in a database.
The crawler needed to work 24/7, so running it on my laptop wasn't an option.
I didn't want it to cost too much in cloud hosting1.
It needed to be coded in Python, my language of choice.

- justhelping September 11, 2019 | Flag Reply

Add a Comment

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.


is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More


CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More