Country: India

A few signals from the requirements:
1. An uninterrupted stream of data suggests a big data problem.
2. A real-time dashboard suggests a top-K / heavy hitters problem whose results are displayed on a dashboard.
3. Data displayed for the last 10 mins could suggest either a single 10 min time window, or 1 min time windows that are refreshed every minute and combined to show data for the last 10 mins.
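
To make the second option in point 3 concrete, here is a minimal sketch of 1 min buckets combined into a 10 min view. The class and method names are hypothetical, and it assumes events arrive with non-decreasing timestamps given in seconds.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Exact per-IP counts over the last `window_minutes` one-minute buckets."""

    def __init__(self, window_minutes=10):
        self.window_minutes = window_minutes
        self.buckets = deque()  # (minute_index, Counter), oldest first

    def add(self, ip, ts_seconds):
        # Assumption: timestamps never go backwards.
        minute = int(ts_seconds // 60)
        if not self.buckets or self.buckets[-1][0] != minute:
            self.buckets.append((minute, Counter()))
        self.buckets[-1][1][ip] += 1
        self._evict(minute)

    def _evict(self, current_minute):
        # Drop buckets that fell out of the window.
        while self.buckets and self.buckets[0][0] <= current_minute - self.window_minutes:
            self.buckets.popleft()

    def top(self, k, now_seconds):
        # Merge the live buckets and return the k most frequent IPs.
        self._evict(int(now_seconds // 60))
        total = Counter()
        for _, counts in self.buckets:
            total.update(counts)
        return total.most_common(k)
```

Calling `top(20, now)` once a minute would give the dashboard its refresh. This version keeps exact counts, so memory grows with the number of distinct IPs per minute.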

The overall design could expose endpoints over WebSocket connections to accept a stream of real-time data. This data is then processed by a Spark streaming job that writes to an analytics store (like Apache HBase / Apache Kudu). A dashboard then pulls data from the analytics store and displays the top hitters on screen.

For an approximate calculation, we can also talk about a faster route that uses a Count-Min Sketch (CMS) data structure on a single node, plus a min priority queue that holds at most 20 IP addresses that are heavy hitters. The advantage of CMS over the Spark streaming route is that CMS uses a fixed amount of memory that is sublinear in the number of distinct IPs and does constant work per event, so it needs less computing power and storage. The trade-off is that its counts can be overestimated, never underestimated.
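
A rough single-node sketch of the CMS route could look like the following. The class names, the `width` and `depth` values, and the use of salted BLAKE2b as the hash family are all assumptions for illustration. For brevity, a small dict scanned for its minimum stands in for the min priority queue, which is cheap when only 20 entries are tracked.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-looking hash per row, derived by salting BLAKE2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))


class HeavyHitters:
    """Tracks up to k candidate IPs by their CMS estimates."""

    def __init__(self, k=20, sketch=None):
        self.k = k
        self.sketch = sketch if sketch is not None else CountMinSketch()
        self.tracked = {}  # ip -> latest estimated count

    def add(self, ip):
        self.sketch.add(ip)
        est = self.sketch.estimate(ip)
        if ip in self.tracked or len(self.tracked) < self.k:
            self.tracked[ip] = est
            return
        # Replace the weakest tracked IP if the newcomer now beats it.
        weakest = min(self.tracked, key=self.tracked.get)
        if est > self.tracked[weakest]:
            del self.tracked[weakest]
            self.tracked[ip] = est

    def top(self):
        return sorted(self.tracked.items(), key=lambda kv: kv[1], reverse=True)
```

This sketch counts over the whole stream. To respect the 10 min window, one option is to keep one sketch per minute and sum the estimates of the live ones.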

Since the overall design is large, I have created a Google Doc to show the overall architecture.

- Saurabh January 03, 2020
Splunk Dashboard can help

- Naag December 22, 2019

