Facebook Interview Question
SDE1s
Country: United States
Load balancing is what the interviewer is looking for, but it's ambiguous when he says "slow". It can mean anything from a broad spectrum.
If it's getting slow relative to its earlier performance, then the cause ranges from network speed to the amount of space / resources available at the server end.
In terms of code, the best you can do is limit the number of threads by using a thread pool. That means you allow only a specific number of threads at a time, so that the system won't get hogged doing nothing but scheduling.
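A minimal sketch of the thread pool idea, assuming Python and its standard `concurrent.futures` module. `handle_request`, `serve`, and the cap of 4 workers are hypothetical names and values chosen for illustration, not something from the question:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # assumed cap; the right value depends on the actual server


def handle_request(request_id):
    """Hypothetical request handler; the sleep stands in for real I/O."""
    time.sleep(0.01)
    return request_id, threading.current_thread().name


def serve(request_ids):
    # The pool reuses at most MAX_WORKERS threads instead of
    # spawning one new thread per incoming request.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS,
                            thread_name_prefix="worker") as pool:
        return list(pool.map(handle_request, request_ids))


if __name__ == "__main__":
    results = serve(range(20))
    distinct_threads = {name for _, name in results}
    print(len(results), "requests handled by", len(distinct_threads), "threads")
```

No matter how many requests arrive, the number of live worker threads stays bounded, so the scheduler is not flooded. Requests beyond the cap simply wait in the pool's queue.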
This can mean anything: the system was slow from the very beginning, even with one user; it got slow after a new feature was introduced; it got slow after no obvious change; it got slow overnight; ... There are a thousand possible causes, ranging from classical scalability problems, attacks, and wrong configurations to programming errors or unexpected user behavior, etc.
- Chris June 09, 2017
The approach to bring light into the dark would be:
1) Verify and understand the objectives
2) Based on data, form a hypothesis about what's wrong and prove it.
3) Solve it.
1: What does "slow" mean, and what is the actual objective? For example, 50 ms end-to-end in 99.9% of cases for clients no further away than 3000 miles, or it may be the serving time, from the moment the request starts to the moment the response is sent out completely, etc. Not being specific about the service level objective is usually the first cause of many downstream problems. So figure out first what's needed and what should be measured.
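To make an objective like "50 ms in 99.9% of cases" checkable, here is a rough sketch in Python. It assumes latency samples in milliseconds are already collected somewhere. The function names `percentile` and `meets_objective` are made up for this example, and the nearest-rank method is just one common way to define a percentile:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_objective(latencies_ms, threshold_ms=50.0, pct=99.9):
    # Assumed objective shape: pct percent of requests finish
    # within threshold_ms.
    return percentile(latencies_ms, pct) <= threshold_ms


if __name__ == "__main__":
    healthy = [10.0] * 1000
    degraded = [10.0] * 990 + [500.0] * 10  # 1% of requests are very slow
    print("healthy meets objective:", meets_objective(healthy))
    print("degraded meets objective:", meets_objective(degraded))
```

The average of the degraded sample is still under 50 ms, which is why the objective should be stated as a percentile rather than a mean.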
2: If monitoring information is available, try to understand which measurement changed in correlation with the "slow response" time. Depending on the situation, other potential evidence may be available.
Another approach may be to gather the people who built it, get their intuition on what's wrong, form a hypothesis, and try to prove it. If that's not available either, start with standard monitoring information from the web server and the database, narrow down an isolated and reproducible case that is "slow", understand it, maybe instrument the code and analyze accumulated runtimes in call trees, try to identify the component that causes the most delay, etc. (work your way downwards).
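As an illustration of instrumenting code and looking at accumulated runtimes, here is a small sketch using Python's built-in `cProfile` and `pstats` modules. The functions `query_database`, `render_page`, and `handle_request` are hypothetical stand-ins for a reproducible slow case, and the sleeps only simulate work:

```python
import cProfile
import io
import pstats
import time


def query_database():
    time.sleep(0.05)  # stand-in for a slow query


def render_page():
    time.sleep(0.005)  # stand-in for cheap rendering work


def handle_request():
    query_database()
    render_page()


def profile_call(func):
    """Run func under the profiler and return a report sorted by
    cumulative time (time spent in a function plus everything it calls)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_call(handle_request))
```

Reading the report from the top and following the largest cumulative times downwards is one way to "work your way downwards" to the component that causes the most delay.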
I think the key is: do not change anything until you understand what the goal is and what causes the problem. Too many people start out based on their personal experience, which might or might not apply in a specific situation.