Amazon Interview Question for SDE-2s


Country: India




Comment hidden because of low score. Click to expand.
2
of 2 vote

We can make use of method known as "Content Filtering"

Intro:

> Google news is a large multi-user streaming news application
> The data set has very high churn due to the abundance of new news stories
> There are too many stories to rely on the user’s discretion to filter interesting and uninteresting news
> There are thousands of users with known usage histories in the form of “click data”
> Because this is an online system with streaming data (users don’t want old news) an offline method will not work.


Make Decisions for Users:


> The method is called content filtering
> The goal is to use any data at your disposal about a user and about incoming articles to predict the user’s interest.
> More specifically you would need a function that would take a user and an article and generate a prediction for interest (0,1).
> Then order the list such that articles with a high interest prediction are presented to the user.


Available Data:


> We have a limited variety of data, however we have an abundance of a certain type of data from which we can compute metadata about users and news stories.
> Click Histories
>> Record the stories that a user has clicked on in the past
>> Store this along with all other users click histories in a large table
>> The table consists simply of news articles for columns and users for rows.


MAP REDUCE - a large cluster data processing framework can be used. In simple words this framework will do:

> User Program executes the framework.
> The framework assigns a master computer
> And partitions the data set over the available workers
> The workers map the data
> Once all mapping is complete the workers are reassigned reducing tasks.
> The result of the reducing tasks are returned to the User Program 


Map Reduce Example of Word Counts

Map - scan words from documents
	Map(String key, String value)‏
		//key: document name
		//value: document contents
		for each word w in value
			EmitIntermediate(w,”1”)‏
Reduce – add all word’s counts together
	Reduce(String key, Iterator value)‏
		//key: a word
		//value: a list of counts
		int result = 0
		for each v in value:
			result+= ParseInt(v)‏
		Emit(AsString(result))‏	

Memroy Based Method:

> Makes predictions based on a user’s past selections
> For each story generate a vector of users that clicked on the story.
> Then use a similarity measurement (dot product) in the user-vector space to compute a users’s similarity with all other users.
> Unfortunately this is not scalable
	O(n^2) to compare all stories with all other stories
	So we will borrow a previously mentioned comparison technique – Min-wise Hashing


Can look at Min-Hash Clustering, Map-Reduced Min-Hash Clustering.

Please refer the presented paper at the below:
Google News Personalization paper at www2007.org papers paper570.pdf

- R@M3$H.N November 06, 2015 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More