Amazon Interview Question for Software Engineer / Developers


Team: AWS
Country: United States
Interview Type: In-Person




Comment hidden because of low score. Click to expand.
4
of 4 vote

This sound like a simple web crawler.

I think the key point of question is
"We have one machine that has enough disk space but limited memory."

No matter what kind of data structure is used to store the data in memory, there must be a grabage-collector-alike job that periodically check for node/record that are stale(not written for a long time). So they can be dumped into file system to free the memory usage.

To avoid the duplication you could either check for existence when new node is created and load it if there is or create a mechanism to merge data when dumping into disk.

- gameboy1024 August 21, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

tree data structure must be used , so that different information link url , tuple ,content etc can be kept under different nodes holding value of same kind .This would increase searching and will store data efficiently.

- Anshul August 20, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

B+Tree or equivalent

- Shan August 24, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

I think just a memcacheDB (or any other harddisk-based key-value store) will satisfy this. So do something like

<?php

function incr($url, $host, $inLinks, $outLinks) {
  global $memcache;
  $memcache->incr("url:$url");
  $memcache->incr("host:$host");
  foreach ($inLinks as $link) {
    $memcache->incr("inlink:$link");
  }
  foreach ($outLinks as $link) {
    $memcache->incr("outlink:$link");
  }
}

- paul4156 September 23, 2014 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

public class InfiniteMachine {

	String[]	domains	= { ".com", ".net", ".org" };	// etc.

	public void storeTuple(Tuple record) {
		String url = record.url;
		String domain = url.substring(url.indexOf("://")).substring(0, url.indexOf("/"));
		String host = null;
		try {
			host = new URL(url).getHost();
		} catch (MalformedURLException murle) {
			murle.printStackTrace();
			System.out.println("Couldn't get host from URL " + url + "!");
			System.exit(1);
		}

		// use JSoup for links
		Document doc = Jsoup.parse(record.html);
		Elements links = doc.select("a[href]");

		// write everything to its respective file
		write(url, domain, host, links, url.replaceAll(":", "").replaceAll("/", "") + ".txt");
	}

	public void write(String url, String domain, String host, Elements links, String filePath) {
		try (BufferedWriter writer = new BufferedWriter(new FileWriter(new File(filePath)))) {
			writer.write("URL\t" + url + "\n\r");
			writer.write("DOMAIN\t" + domain + "\n\r");
			writer.write("HOST\t" + host + "\n\r");
			for (Element link : links) {
				writer.write("LINK\t" + link.text() + "\n\r");
			}
		} catch (IOException ioe) {
			ioe.printStackTrace();
			System.out.println("Couldn't write to file " + filePath + "!");
			System.exit(1);
		}
	}

}

- Jeremy Gilreath January 04, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

I believe B+ tree would be useful which will have all content on leaves only. So we can keep whole tree in memory except leaves. Besides it, memory mapped files with LRU approach would be also useful.

- Phoenix August 23, 2015 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More