How do find out the top trendi

Amazon Interview Question for SDE-2s

1

of 3 votes

9
Answers
How do find out the top trendings products on a last hour/day/monthly basis?

Given that we have access to a real-time stream of sold product ids.
- koustav.adorable July 13, 2018 in India | Report Duplicate | Flag | PURGE
Amazon SDE-2 System Design

Email me when people comment.

An error occurred in subscribing you.

Country: India
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 6 vote

There are 2 parts to this question
1. Keeping track of Top K products at all times
2. Data comes as a Stream. So we do not have access to all the data at once. So overall sorting is not possible.

Based on these requirements sliding window is a good option where we keep track of top K products each time a new data come in from a stream.
Note: Trending of products could be on any basis like "quantity sold" or "user ratings" etc.

Idea: We use a sliding window to keep track of Top K products, Within the sliding window, we sort the items based on their trends.

Example, assuming for a stream of products with "quantity sold" values coming as [10,5,13,2,21,8,3,40,12,15,19], what are the top 3 products

Sliding window = 3

Stream: 10 comes in
Window: [10]

Stream: 5 comes in
Window[10, 5]

Stream: 13 comes in
Window: [13, 10, 5]

Stream: 2 comes in
Window: [13, 10, 5] --> 2 is ignored as its not in Top K

etc.

For Sliding Window we can use TreeMap that can sort elements within the window internally. And for keeping it a fixed size, we can have our own implementation of TreeMap as follows:

Time Complexity: O(N)
Note: Sorting within TreeMap will be constant as regardless of the stream size, it will only sort K items in the Map.
Space Complexity: O(K) where K is the size of the window.

import java.util.Collections;
import java.util.TreeMap;

public class TopKTrending {

	public static void main(String[] args) {
		int[] a = {10,5,13,2,21,8,3,40,12,15,19};

		topKTrending(a, 3);
	}
	
	private static void topKTrending(int[] a, int k) {
		FixedMap<Integer, String> fMap = new FixedMap<Integer, String>(k);

		for(int i = 0; i < a.length; i++) {
			fMap.put(a[i], "Prod"+i);
			printMap(fMap);
		}
	}
	
	private static void printMap(FixedMap<Integer, String> fMap) {
		for(Integer key: fMap.keySet()) {
			System.out.print("{" + fMap.get(key) + "|" + key + "}");
		}
		System.out.println();
	}
}
/*Custom implementation of TreeMap with Fixed Size*/
class FixedMap<K,V> extends TreeMap<K,V> {
	int size = 0;
	FixedMap(int size) {
		super(Collections.reverseOrder()); //descending order elements inside TM starting highest
		this.size = size;
	}
	
	public V put(K key, V value) {
	  V old = super.put(key, value);
	  if (size() > size) {
	      super.remove(lastKey());
	  }
	  return old;
	}
}


Output:
======
{Prod0 | 10} 
{Prod0 | 10} {Prod1 | 5} 
{Prod2 | 13} {Prod0 | 10} {Prod1 | 5} 
{Prod2 | 13} {Prod0 | 10} {Prod1 | 5} 
{Prod4 | 21} {Prod2 | 13} {Prod0 | 10} 
{Prod4 | 21} {Prod2 | 13} {Prod0 | 10} 
{Prod4 | 21} {Prod2 | 13} {Prod0 | 10} 
{Prod7 | 40} {Prod4 | 21} {Prod2 | 13} 
{Prod7 | 40} {Prod4 | 21} {Prod2 | 13} 
{Prod7 | 40} {Prod4 | 21} {Prod9 | 15} 
{Prod7 | 40} {Prod4 | 21} {Prod10 | 19}

- Saurabh July 16, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

View Count could be a major factor. Question is what is the condition or factor to cause a post / product a trend? is it people view, click, people response? Based of that you can use priority queue at least for normal interview based question but in real time, this could blow up heavily

- hprem991 July 16, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

The following code is for hourly basis. And can be easily extended.
Check the items list on every 100 milliseconds to clean the older data.
To simplify the code names are used instead of product ids. But can be easily extended.

import java.util.SortedSet;
import java.util.TreeSet;

public class FindTrending {

	private SortedSet<Item> items = new TreeSet<>();
	
	public static void main(String[] args) throws InterruptedException {
		FindTrending findTrending = new FindTrending();
		RemoveAction removeAction = findTrending.new RemoveAction(findTrending.items);
		Thread subThread  = new Thread(removeAction);
		subThread.start();
		for(int i=0;i<100;i++){
			findTrending.addItem("test"+i);
			Thread.sleep(500);
			synchronized (findTrending.items) {				
				System.out.println(findTrending.items);
			}
		}
		subThread.interrupt();
	}
	
	public void addItem(String itemName){
		Item item = new Item(itemName);
		synchronized (items) {			
			if(items.contains(item)){
				items.remove(item);
			}
			items.add(item);
		}
	}
	
	class Item implements Comparable<Item>{
		String name;
		long time;
		
		public Item(String name) {
			this.name = name;
			time = System.currentTimeMillis();
		}
		
		public Item(String name,long time){
			this.name = name;
			this.time = time;
		}
		
		@Override
		public boolean equals(Object obj) {
			if(obj instanceof Item){
				((Item) obj).name.equalsIgnoreCase(this.name);
			}
			return false;
		}
		
		@Override
		public String toString() {
			return name;
		}
		
		@Override
		public int hashCode() {
			return name.hashCode();
		}
		
		@Override
		public int compareTo(Item o) {
			if(time-o.time > 0){
				return 1;
			}else{
				return -1;
			}
		}
		
	}
	
	class RemoveAction implements Runnable{
		
		private SortedSet<Item> items;
		
		public RemoveAction(SortedSet<Item> items) {
			this.items = items;
		}
		
		@Override
		public void run() {
			while(true){				
				synchronized (items) {
					long currTime = System.currentTimeMillis();
					long diff = currTime - 60*60*1000;
					Item temp = new Item("Dummy",diff);
					SortedSet<Item> headSet = items.headSet(temp);
					headSet.clear();
				}
				try {
					Thread.sleep(100);
				} catch (InterruptedException e) {
					
				}
			}
		}
	}
}

- dadakhalandhar August 05, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I am a beginner so this might not be the most optimal solution. I was thinking about using a maxHeap to keep track of the products with largest counts.

- Om July 14, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

MapHeap or a Deque would be a good solution if the requirement is to keep track of only the top trending product (1 product) since we need to keep track of Top K products, we will need to keep track of multiple top elements at the same time.

A sliding window is one of the solutions. I have explained the approach in another answer.

- Saurabh July 16, 2018 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Use an LRU cache where every time it gets refreshed when we have a new product with latest trend . Then from this cache we can show the n products .

- upumesh499 August 28, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

What about this solution.

We maintain a rolling hash map(productId,sumOfSoldQuantiesTillNow)
For each time slot, We store the map of (products,sumOfSoldQuantiesTillNow)
which can be retrieved later.

time hour1:
hour1_map={(p1,34),(p2,23),(p3,12),(p4,17),(p6,12)}

time hour2:
hour2_map=hour1_map+{(p1,3),(p2,14),(p3,8),(p4,34),(p5,12)}

time hour3:
Here, we can calculate the count of product sold of time_hour2 simply
by subtraction and take out top K

- koustav.adorable July 17, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

So, if I understand the problem description, you are given (timestamp, id) data stream and asked to compute the most frequent ids falling into the specified time window?

This is a well known problem. We need to know the size of the data stream that fits in the biggest time window of interest as compared to the available memory. If you need to compute the exact top-K heavy hitters it will require O(N) memory (it's proved that even exact top-1 requires O(N) space worst-case), where N is the biggest number of data points fitting into the time window. If you can satisfy this requirement, then it's one thing and the solution is relatively easy. On the other hand (I suspect this is what the interviewer wanted to hear) if the size of the stream is too huge and you can tolerate errors, then it calls for an approximation solution like lossy counting etc. see https://en.m.wikipedia.org/wiki/Streaming_algorithm#Frequent_elements

See also here https://cstheory.stackexchange.com/questions/19802/top-k-frequent-items-in-data-stream

- adr July 18, 2018 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

I am considering the case where the selling rate is too high. I am assuming that the trending products are the products that are sold most.

As the selling rate is high, we need to process quicky, so I will divide it further into more queues for faster processing.
Q1->W1
Q -> Q2->W2
Q3->W3
.
.
.
.
In the above figure, Q is the real time selling data. Q1,Q2,Q3 are the queues we are further dividing into. W1,W2, W3 are the corresponding workers. I explained about the workers in the next para.

There will be one worker for every queue which consumes the data and inserts into a database say IntervalDB_2018_01_01_12:25to12:30. In the database IntervalDB.., I will be storing the product ids. For every 5 minutes we create a new IntervalDB. So each IntervalDB corresponds to one five minute interval as shown below.

IntervalDB_2018_01_01_12:25to12:30
IntervalDB_2018_01_01_12:30to12:35
IntervalDB_2018_01_01_12:35to12:40
.
.
.

Note : For increasing the write speed to IntervalDB.. , we can use DB sharding, which I explained here.

Here for multiple orders with same pId, I am inserting the pId instead of increasing the count of pId. Because in the case where we need to process multiple orders with same pids, processing gets slowed because of concurrency.

Once insertion is done in a 5 minutes intervalDB, we count the number of items sold in that interval for each pId and insert to another db say CountDB.
______________________
| CountDB |
|_____________________|
| pId | 5MinInterval | count |
------------------------------------
1 ...... 5
.
.
.
.

We can delete the old intervalDBs whose counting is done.

For finding the products that are sold most in an hour or day or month, we can query the CountDB.

- ravi July 22, 2018 | Flag Reply

CareerCup

Amazon Interview Question for SDE-2s

Books

Videos

Resume Review

Mock Interviews