## Apple Interview Question for SDE-3s

Country: United States
Interview Type: In-Person

``II. Someone put distribute Random()*ID in a Hive script to prevent data skew. What would be the problem here?``

The problem is that with Random()*ID the same ID gets a different partition number from row to row, so its rows are sent to different reducers. Any aggregation that relies on all rows for an ID reaching the same reducer will then produce incorrect (partial) results.
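
To make this concrete, here is a small pure-Python simulation rather than real Hive. The reducer count, the two partitioner functions, and the sample IDs are all made up for illustration; it only assumes that a reducer is chosen by taking the distribution expression modulo the number of reducers.

```python
import random
from collections import Counter, defaultdict

NUM_REDUCERS = 4  # hypothetical number of reducers


def partition_by_key(user_id, num_reducers=NUM_REDUCERS):
    # Deterministic: the same ID always maps to the same reducer.
    return hash(user_id) % num_reducers


def partition_randomly(user_id, rng, num_reducers=NUM_REDUCERS):
    # Mimics distributing by Random()*ID: the target reducer depends on a
    # fresh random draw, so rows with the same ID scatter across reducers.
    return int(rng.random() * user_id) % num_reducers


def count_per_reducer(rows, partitioner):
    # Each reducer counts only the rows it receives.
    reducers = defaultdict(Counter)
    for user_id in rows:
        reducers[partitioner(user_id)][user_id] += 1
    return reducers


rows = [7] * 1000 + [8] * 10  # ID 7 is the skewed ("hot") key
rng = random.Random(0)

by_key = count_per_reducer(rows, partition_by_key)
by_random = count_per_reducer(rows, lambda k: partition_randomly(k, rng))

reducers_with_7_by_key = [r for r, c in by_key.items() if 7 in c]
reducers_with_7_by_random = [r for r, c in by_random.items() if 7 in c]
partial_counts_for_7 = [c[7] for c in by_random.values() if 7 in c]
```

With key-based partitioning a single reducer sees every row for ID 7, so its count is final. With the random expression, several reducers each hold a partial count for ID 7, and none of them alone is the right answer; the partial counts are only correct after they are merged in a further aggregation step.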

1. Assume that every line in the input data contains a user ID followed by a list of product IDs.
In the map phase, we extract all products purchased by a user, form every pair of them, and emit each pair with a count of 1.
e.g. CUST_123, PROD_1, PROD_2, PROD_3

Result of the map phase:
(PROD_1:PROD_2,1)
(PROD_1:PROD_3,1)
(PROD_2:PROD_3,1)

In the reduce phase, we sum the counts for each pair across all users, then sort the pairs by total count and return the top 100.
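
The map and reduce steps above can be sketched in plain Python. This is a single-process illustration, not a real MapReduce job; the function names map_pairs, reduce_counts, and top_pairs are invented for the example, and it assumes comma-separated lines in the format shown above.

```python
from collections import Counter
from itertools import combinations


def map_pairs(line):
    # "CUST_123, PROD_1, PROD_2, PROD_3" -> user ID first, then product IDs.
    fields = [field.strip() for field in line.split(",")]
    # Deduplicate and sort so (PROD_1, PROD_2) and (PROD_2, PROD_1)
    # always produce the same key.
    products = sorted(set(fields[1:]))
    for first, second in combinations(products, 2):
        yield (first, second), 1


def reduce_counts(mapped):
    # Sum the counts emitted for each pair across all users.
    totals = Counter()
    for pair, count in mapped:
        totals[pair] += count
    return totals


def top_pairs(lines, n=100):
    mapped = (item for line in lines for item in map_pairs(line))
    return reduce_counts(mapped).most_common(n)


lines = [
    "CUST_123, PROD_1, PROD_2, PROD_3",
    "CUST_456, PROD_2, PROD_1",
    "CUST_789, PROD_3",
]
result = top_pairs(lines)
```

Sorting the products inside the mapper matters: without a canonical order, the same pair could be emitted under two different keys and its count would be split.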

Here is how to answer the second question in plain old SQL: join the table with itself on user ID, so that each product a user bought is paired with every product the same user bought. The < condition in the join then drops rows that pair a product with itself and keeps only one of (p1, p2) and (p2, p1):

```
SELECT
    FP.product AS product1,
    T.product AS product2,
    COUNT(1) AS bought_count
FROM Purchases AS FP
-- the < sign in the join so that we keep only 1 pair of (p1,p2) and (p2,p1)
INNER JOIN Purchases AS T
    ON FP.user = T.user AND FP.product < T.product
GROUP BY FP.product, T.product
ORDER BY bought_count DESC
LIMIT 100
```
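
As a quick sanity check, the query can be run against an in-memory SQLite database from Python. The table layout (text columns user and product) and the sample rows are assumptions made for this sketch, and it assumes one row per (user, product); duplicate purchase rows would inflate the counts.

```python
import sqlite3

QUERY = """
SELECT
    FP.product AS product1,
    T.product AS product2,
    COUNT(1) AS bought_count
FROM Purchases AS FP
INNER JOIN Purchases AS T
    ON FP.user = T.user AND FP.product < T.product
GROUP BY FP.product, T.product
ORDER BY bought_count DESC
LIMIT 100
"""

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per product a user bought.
conn.execute("CREATE TABLE Purchases (user TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO Purchases (user, product) VALUES (?, ?)",
    [
        ("u1", "A"), ("u1", "B"), ("u1", "C"),
        ("u2", "A"), ("u2", "B"),
        ("u3", "C"),
    ],
)

pairs = conn.execute(QUERY).fetchall()
conn.close()
```

Here the pair (A, B) comes out on top because two users bought both products, while user u3, who bought a single product, contributes no pairs at all.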

I have no idea how to do this in Spark, though. The bottleneck is obviously the inner join, but what can we do to optimize it? Maybe the question is really about distributing the load evenly among workers; I don't know.
