Apple Interview Question for SDE-3s
Apple On-site at Cupertino
Team Data Warehousing
Questions on Hadoop, Hive and Spark
I. Given a table with 1B rows of user IDs and the product IDs those users bought, and another table mapping product ID to product name, we want to find pairs of products that are often purchased together by the same user, such as wine and a bottle opener, or chips and beer. How would you find the top 100 such co-purchased pairs? If you go with Hadoop, where is the bottleneck and how would you optimize it?
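One possible way to reason about question I, shown as a small single-machine Python sketch rather than a real Hadoop or Spark job: group the distinct products per user, emit every unordered pair within each user's basket, count the pairs, and keep the k largest. The function name `top_copurchased_pairs` and its parameters are made up for illustration. At scale, the pair-emission step is the likely bottleneck, since it is quadratic in basket size and a few heavy buyers can dominate the shuffle; capping or sampling very large baskets and pre-aggregating pair counts before the shuffle are common mitigations.

```python
from collections import Counter, defaultdict
from itertools import combinations
import heapq


def top_copurchased_pairs(purchases, product_names, k=100):
    """Return the k most frequent product pairs bought by the same user.

    purchases: iterable of (user_id, product_id) rows (hypothetical schema).
    product_names: dict mapping product_id -> product name.
    """
    # Step 1: collect each user's distinct products (the "group by user" stage).
    baskets = defaultdict(set)
    for user_id, product_id in purchases:
        baskets[user_id].add(product_id)

    # Step 2: emit each unordered pair once per user and count it.
    # Sorting the basket gives every pair a canonical (smaller, larger) order.
    pair_counts = Counter()
    for products in baskets.values():
        for a, b in combinations(sorted(products), 2):
            pair_counts[(a, b)] += 1

    # Step 3: keep only the k largest counts instead of sorting everything.
    top = heapq.nlargest(k, pair_counts.items(), key=lambda item: item[1])

    # Step 4: join against the small name table only for the surviving pairs.
    return [((product_names[a], product_names[b]), n) for (a, b), n in top]
```

Joining product names after the top-k step, not before, keeps the name lookup off the large table entirely, which is one of the optimizations an interviewer may be looking for.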
II. Someone put DISTRIBUTE BY rand()*ID in a Hive script to prevent data skew. What is the problem with this?
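One commonly cited concern with a random distribution key is that it is not deterministic: if a map task is rerun (after a failure or speculative execution), the same row can be routed to a different reducer than on the first attempt. The toy simulation below is only a sketch of that idea, not Hive's actual shuffle. It assumes two reducers, where reducer 0 read the first attempt's output and reducer 1 reads the rerun's output; `partition` and `simulate_retry` are hypothetical names.

```python
import random
from collections import Counter


def partition(rows, num_reducers, rng):
    """Route each row id with a fresh random draw, mimicking rand() * id."""
    buckets = [[] for _ in range(num_reducers)]
    for row_id in rows:
        key = int(rng.random() * row_id)  # changes whenever the draw changes
        buckets[key % num_reducers].append(row_id)
    return buckets


def simulate_retry(rows, seed_first=1, seed_retry=2):
    """Return (lost, duplicated) row ids when a map task is rerun mid-shuffle."""
    rows = list(rows)
    first = partition(rows, 2, random.Random(seed_first))
    retry = partition(rows, 2, random.Random(seed_retry))

    # Reducer 0 fetched from the first attempt, reducer 1 from the rerun.
    received = Counter(first[0] + retry[1])

    lost = [r for r in rows if received[r] == 0]
    duplicated = [r for r in rows if received[r] > 1]
    return lost, duplicated
```

With different seeds, standing in for two independent runs of rand(), some rows typically end up in neither reducer and others in both. With the same seed, standing in for a deterministic key such as a hash of a salted column, both lists come back empty. A separate point worth raising in an answer: rows sharing an ID are no longer co-located, so any later per-ID aggregation or join needs another shuffle.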
Country: United States
Interview Type: In-Person