Bloomberg LP Interview Question
Senior Software Development Engineers
Country: United States
How about distributing the data evenly across all the nodes?
E.g., every node has a configuration parameter that defines its near-cache nodes (if the data is not present on the node itself, it will be present on one of its near-cache nodes).
As data gets added to a node, it is replicated to that node's near-cache nodes, as set by the configuration.
This would keep the data distributed and should improve overall CPU time in transaction-oriented operations.
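A minimal in-memory sketch of the near-cache idea, under some assumptions the answer does not state: keys are routed to an owner node by a CRC32 hash, each node's near-cache nodes are simply the next `replicas` nodes in a ring, and the `Cluster`, `Node`, `replicas` and `alive` names are all hypothetical. It only illustrates the data flow; there is no networking, no rebalancing and no conflict handling.

```python
import zlib


class Node:
    """One cache node with a local store and a list of near-cache nodes."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.near = []      # hypothetical "near cache nodes" configuration
        self.alive = True   # flipped to False to simulate a failed node


class Cluster:
    def __init__(self, names, replicas=1):
        self.nodes = [Node(n) for n in names]
        count = len(self.nodes)
        # Assumption: the near-cache nodes are the next `replicas` nodes in a ring.
        for i, node in enumerate(self.nodes):
            node.near = [self.nodes[(i + k) % count] for k in range(1, replicas + 1)]

    def owner(self, key):
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        # Write to the owner, then copy the entry to its near-cache nodes.
        node = self.owner(key)
        node.store[key] = value
        for peer in node.near:
            peer.store[key] = value

    def get(self, key):
        # Try the owner first, then fall back to its near-cache nodes.
        node = self.owner(key)
        for candidate in [node] + node.near:
            if candidate.alive and key in candidate.store:
                return candidate.store[key]
        return None
```

With four nodes and `replicas=2`, every key lives on three nodes, so a read still succeeds when the owner is marked as not alive. The trade-off is extra memory and write traffic per replica.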
In networking, there are algorithms by which each router maintains an up-to-date routing table that is refreshed periodically through collaboration among all the routers. In the same way, each node can keep a record of which nodes are free and available, and that record can be updated regularly using some collaboration algorithm.
A submitted job can be placed in a queue, and all nodes can poll the queue at a certain frequency. The node that picks up the job can then send a request to another node to schedule it. If that node does not accept the request, the sender marks it as busy and sends the request to another node that it believes is free.
This may not be an exact solution, since there are certain trade-offs, but it is something that can be proposed.
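The queue-and-forward scheme above can be sketched in a few lines. This is only an illustration under assumptions: everything runs in one process, `Worker`, `believed_free`, `capacity` and `make_cluster` are hypothetical names, `refresh` stands in for whatever collaboration algorithm keeps the views up to date, and a node that finds no free peer tries to run the job itself before putting it back on the queue.

```python
from collections import deque


class Worker:
    def __init__(self, name, capacity=1):
        self.name = name
        self.capacity = capacity
        self.jobs = []
        self.believed_free = set()  # local view of which peers are free

    def accept(self, job):
        # Accept the job only if this node has spare capacity.
        if len(self.jobs) < self.capacity:
            self.jobs.append(job)
            return True
        return False

    def refresh(self, peers):
        # Stand-in for the periodic collaboration step: rebuild the local view.
        self.believed_free = {
            name for name, peer in peers.items()
            if name != self.name and len(peer.jobs) < peer.capacity
        }

    def poll(self, queue, peers):
        # Pick one job from the shared queue and try to place it on a peer.
        if not queue:
            return None
        job = queue.popleft()
        for name in sorted(self.believed_free):
            if peers[name].accept(job):
                return name
            self.believed_free.discard(name)  # refusal: mark the peer as busy
        if self.accept(job):                  # assumption: fall back to running it locally
            return self.name
        queue.appendleft(job)                 # nobody accepted: put the job back
        return None


def make_cluster(names, capacity=1):
    peers = {name: Worker(name, capacity) for name in names}
    for worker in peers.values():
        worker.believed_free = set(names) - {worker.name}
    return peers
```

Each call to `poll` returns the name of the node that took the job, or `None` if the queue was empty or every node refused. Because each node only has a local view, that view can be stale between refreshes, which is one of the trade-offs mentioned above.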
Use Paxos for leader election. Apache ZooKeeper implements a Paxos-like consensus protocol (Zab), and ZooKeeper is used by Hadoop, Solr, etc.
- Jen August 28, 2014
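To make the Paxos suggestion concrete, here is a rough single-decree Paxos sketch in which the value being agreed on is the leader's name. It is a toy under stated assumptions: acceptors are plain in-memory objects called synchronously, nothing fails or is persisted, ballot numbers are assumed to be unique per proposer, and `Acceptor` and `propose` are hypothetical names. It is not how ZooKeeper is implemented internally.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1           # highest ballot this acceptor has promised
        self.accepted_ballot = -1    # ballot of the value it last accepted
        self.accepted_value = None

    def prepare(self, ballot):
        # Phase 1: promise to ignore lower ballots and report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot, value):
        # Phase 2: accept unless a higher ballot has been promised meanwhile.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors, ballot, candidate):
    """Run one round; return the chosen leader, or None if the round fails."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [(b, v) for ok, b, v in replies if ok]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, adopt the one with the
    # highest ballot instead of our own candidate.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    value = prior_value if prior_ballot >= 0 else candidate
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes >= majority else None
```

Once a leader has been chosen, a later proposer with a higher ballot learns and re-proposes the same leader rather than its own candidate, and a proposer with a stale ballot gets no promises. That is the property that keeps two nodes from both believing they won.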