Linkedin Interview Question for Senior Software Development Engineers


Country: United States
Interview Type: In-Person




Comment hidden because of low score. Click to expand.
1
of 1 vote

Data Structure: BoundedPriorityQueues - one for 1hrs, one for 24 hrs and one for 5 months.

- ss January 10, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
1
of 1 vote

You guys are nuts! :-)

Problem explicitly states scale of the problem - facebook/twitter, so you should rightfully expect millions of events per hour. You should also understand that topics freq chart has a long-long tail. Keeping these millions of tags in RAM (all these heap-based solutions) would kill you. Storing them on disk is an option (DB based solutions) but then computing and retrieving them would be an offline job with certain latency.
I guess the most appropriate solution is any of the streaming counting algorithms - see sticky sampling or lossy counting for instance.

- Igor March 04, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
1
of 1 vote

This is a system design question and most answers seem to be solutions for a single system. Here is the high-level design points
1.) We need to keep the counts for 5min, 1 hour and 24 hours in a rolling window fashion.
2.) There are multiple machines, each machine should keep track of the counts of topics. At every delta interval, each machine updates counts with central server and resets it counts to zero.
3.) The central server merges the counts from all machines to have final list of topics and counts. At every delta interval, the central takes a snap shot of the top N topics and counts in an array and resets its counter. N is a tunable parameter. Higher N means more memory and accurate metrics, lower N means lower memory but less accurate metrics.
4.) There are three counters that keep track of topic counts for last 5 mins, 1 hr and 24 hrs. As the central server takes a snap shot, it adds the counts with each of these counters and remove the counts from the 5mins/delta, 1h2/delta and 24hrs/delta array value from the coressponding counter thus maintaining the rolling counts for topics.
5.) at any point of time, the sytems pull the topics and values from these counters to show the top tending topics for this duration

--------------------- Alternate Approach ----------------------
This a cleaner approach that gives statistically right results but the counts are not accurate. The central server or Load balancer can sample queries and increment the topic counter and the last update time. There are three counters, one for the 5mins, one for the 1hr and one for the 24 hrs. Each counter automatically removes the topics if the last update time is < corresponding duration, thus continuously trimming the counts. The sampling rate can be different for different counters which also is a tunable parameter between accuracy and performance.

- VMK September 25, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Thanks for your answer; I'd like to propose some improvements:
1.) in-memory db, such as Redis, with 16G RAM can keep all counters for all topics. However, if size of topics grow, we can use a cluster of Redis, with partitioned keys;
1.5.) Introduce a queue of events, and our compute servers will simply process the stream, aggregate counts for each batch, and update those aggregates in the DB. This is a huge game changer, because it reduces the number of updates to DB. A few aggregators can process events from the queue at 1,000 RPS; if queue builds up, aggregators can be simply increased.
If we use Kafka, we can set 10-20 partitions no the topic, so consumers can scale out easily to 10-20.

2) So, multiple machines need to be clarified: is it handful or thousands.

3) With introducing streaming, compute is separated and results are kept in 1 or two in-memory DBs. Central server aggregates and updates the 3 counts in moving window: 5min every minute, 1hour every 5 minutes gets updated, and 24hr number every hour gets updated.

- keighobadi May 26, 2022 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Yes, this is more of system design question rather than algorithmic. That said, just wanted to add few notes to already answered algorithmic answers.

I think both a max heap and min heap of N tags (built based on their counts) would be required,
So whenever a new tag comes,

a. Get count of tag
b. Check min heap O(1) if its worthy to add i.e. if its really a Top N tag
c. Abort if not
d. if yes, then add it to both max and min heap (at end, as its a complete tree) and call heapify.
e. Now max heap always has the top tag at its root O(1)

Let me know if this can be really done with help of 1 heap?

w.r.t System design, i will leave it to some person to answer it but I believe the heaps should be designed like a big cache residing in-memory (for O(1) faster access, strong use-case) and the challenge here is how to design such a distributed cache in-memory, persisting asynchronously the history of trending tags over time to some big data store for later retrieval, in case if needed (slow access, not a strong use-case)

- Hari November 20, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

yes, not sure if everyone here is realizing that you need to keep the data structure up-to-date with cutoffs at 5m/1hr/24hrs. Using the heap or whatever, doesnt automatically solve that issue. The calculation of what is "currently" trending (in those set intervals) needs to be done elsewhere, may be once every minute or so and update the heaps or even a in-memory map somewhere for clients to read from.

- somedude January 11, 2016 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

Design a Cassandra database for this.
Query: Select original_posts where shared_time > current_time - (1 hr/24hrs/5months) order by numshares desc

So design the database around this query.

Create table posts_by_time_numshares (
   original_post_id uuid,
   original_post_title text,
   original_post_creator frozen <user>,
   shared_by frozen <user>,
   time_shared_at timestamp,
   numshares counter
   PRIMARY KEY ((original_post_id), time_shared_at, numshares, shared_by)
   CLUSTERNING ORDER BY (time_shared_at DESC, numshares DESC, shared_by ASC)  
);

So when the write happens it will write to the Cassandra nodes lets say with RF=3, We can read the nodes with CONSISTENCY QUORUM, so we have the agreement with majority of nodes for the counter numshares.

- dhirendra.sinha January 18, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

In my opinion, we can maintain a nosql DB ( for better performance ) for the users and the posts. So, will have a list of all the users in the network ( lets say it's U) and a list of all the posts ( lets say it's P).
Next we will have a p2u table which will have a mapping between a particular post with the list of users that have shared it. And we need to update this table every 5 mins/1 hr etc. So, at every point of time p2u table will look like :

p2u = { 'p1': [u1, u2, u3, u5], 'p2': [u4,u5]

So, this dict can fairly solve the problem.

- Agniswar Bakshi January 26, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

So, whats the real answer of this question ? Can anybody help...

- .NetGeek May 07, 2016 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

There's an article which explains it all. Sliding window technique.

www . michael-noll . com / blog / 2013 / 01/18/implementing-real-time-trending-topics-in-storm/

- saav August 19, 2017 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

There's an article which explains it all. Sliding window technique.

www . michael-noll . com / blog / 2013 / 01/18/implementing-real-time-trending-topics-in-storm/

- saav August 19, 2017 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

There's an article which explains it all. Sliding window technique.

michael-noll . com / blog / 2013 / 01 / 18 / implementing - real-time-trending- topics-in-storm /

- saav August 19, 2017 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

Check out the youtube video with ID = kx-XDoPjoHw ("System Design Interview - Top K Problem (Heavy Hitters)") for a thorough treatment of the modern approach

- G Chao July 29, 2022 | Flag Reply
Comment hidden because of low score. Click to expand.
-1
of 1 vote

Max Heap can be used in this scenario.

Basic tending system works based on #tag and its count e.g. how many times it has been shared in particular geography,You can go with Heap data structure when designing such system, say for example you wanna find out the top N trending keywords in you application/website,you can create heap of N #tag based on count (its should be max-heap).

Algorithm:

> Create max heap of size N. Time complexity to build the heap is O(N).
> For every new #tag compare its count with existing root (#tag) count if its greater then root count , replace root with this node (#tag, count) and call heapify to maintain the max-heap property. Heapify will take O(log N) time.
> Sort the heap if you're interested in ordering, it will take O(N log N) to sort the heap.
> At any time T heap will only contain the top N trending keywords.

- R@M3$H.N October 02, 2015 | Flag Reply
Comment hidden because of low score. Click to expand.
1
of 1 vote

I think when we get a new tag we can't just compare with the root.
What if the number of times it has appeared is less than the root but greater than the 2nd/ 3rd highest tag???

I suggest using a min heap. Add randomly n tags and build min-heap and then for remaining of the tags compare with root of heap and change if current_tag.count > roothead.count and call heapify. Same can be done when we get a new tag.
This way we always maintain a heap of N elements with maximum counts




Also I think we could use a balanced BST and whenever we want the trending topics, we could just do reverse inorder traversal.

- anubhav9042 October 14, 2015 | Flag
Comment hidden because of low score. Click to expand.
2
of 2 votes

I'm not sure if this is necessarily a question of algorithms / data structure. This seems to me more on the side of system design, like using Hadoop to store data, writing API in front of it, how to store data in a scalable way, etc.

- John October 24, 2015 | Flag
Comment hidden because of low score. Click to expand.
-1
of 1 vote

public class ImprovedHashMap {
	Map<Integer, TreeNode> map = new HashMap<Integer, TreeNode>();

	public static void main(String[] args) {
		

		ImprovedHashMap m = new ImprovedHashMap();
		m.put(4, 1, 3);
		m.put(4, 0, 4);
		m.put(4, 3, 16);
		m.put(4, 5, 17);
		m.put(4, 2, 12);
		int x = m.get(4, 0.6f);
		System.out.println("x is " + x);
	}

	public int get(int key, float t) {
		TreeNode root = map.get(key);
		int i = searchBST(root, t, root.value, root.t);
		return i;
	}

	public int searchBST(TreeNode root, float t, int closestMatch, float closestT) {
		if (root == null) {
			return closestMatch;
		}
		if (root.t > t) {
			return searchBST(root.left, t, closestMatch, t);
		} else if (root.t < t) {
			if (closestT > root.t) {
				closestMatch = root.value;
				closestT = root.t;
			}
			return searchBST(root.right, t, closestMatch, closestT);
		} else {
			return root.value;
		}

	}

	public void put(int key, float t, int value) {
		// TreeNode node = new TreeNode(value, key);
		TreeNode n = new TreeNode(value, t);
		TreeNode root = map.get(key);
		if (root == null) {
			map.put(key, n);
		} else {
			insert(root, n);
		}
	}

	public static void insert(TreeNode root, TreeNode child) {
		if ((root.t > child.t)) {
			if (root.left == null) {
				root.left = child;
			} else {
				insert(root.left, child);
			}
		} else if (root.t < child.t) {
			if (root.right == null) {
				root.right = child;
			} else {
				insert(root.right, child);
			}
		}
	}

	private static class TreeNode {
		float t;
		int value;
		TreeNode left;
		TreeNode right;

		public TreeNode(int i, float t) {
			this.value = i;
			this.t = t;
		}

}

- Amey November 02, 2015 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More