Bloomberg LP Interview Question
Financial Software DevelopersCountry: United States
Interview Type: Phone Interview
Because at any time I need to change/throw 1 company out of the heap, I would like the one which has been traded the minimum. If I keep maximum heap, every time a company is traded, I have to check that company's number of share, with the current heap's minimum, because that's the only 1 which will be thrown out of the heap
Thanks Aditya for explaining. Sorry, I didn't quite follow you. Can you explain by giving an example.
Everyday u create a new heap and hasmap. 1st 10 companies traded automatically enter and create the heap, as they are the most (and only) traded company till then. Then, as and when more trades happen, every time there is a live update, you check the updated record with the min heap only, so only, a constant number of 10 checks. If the company is in the Heap already, you update the number of shares, and rebalance the heap if required. If the company is not in the heap, you just see if the total trade of this new company is more than the trade of the root company. If yes, you replace the root with the new company, and rebalance if required.
How can we find the ten most traded companies from the HashMap that is maintained? Do we need to scan the entire hashmap to find the top 10 every time a live update comes in?
Nope, never said you find top 10 from HashMap. For the top 10, you maintain a min heap, and everytime there is a live update, you check the updated record with the min heap only, so only, a constant number of 10 checks. If the company is in the Heap already, you update the number of shares, and rebalance the heap if required. If the company is not in the heap, you just see if the total trade of this new company is more than the trade of the root company. If yes, you replace the root with the new company, and rebalance if required.
I was wondering why a simple sorted array is not the solution. It provides O(1) modification and search and max.
Exactly how do you get O(1) modification, search or max in a sorted array?
None of these operations are possible in O(1), except maybe Modification, that also, ONLY if you know the index you want to modify
You are right. Search and max-10 are O(1) since it is sorted. But the sorting after modifying one company is O(n) which is not good.
But a combination of sorted array and heap might help. If we assume "N = the number of companies" and N >> 10, then you could imagine a sorted array of length 9 and a heap of length N - 9 with "extract-max".
Now if the load for a company changes, if it already belongs to the top 9, then it is at most "9" swaps to re-sort. If it belongs to 10'th company, then we compare the max of heap with the min of array. Then if necessary, in one swap, we could change the "max" of heap with min of array and in at most 9 swaps re-order array. If the company is already in the heap, then it is O(log(N - 9)). So the time complexity is max(10, c log(N)) swaps.
1) In your original approach, even searching isn't O(1), since the array was sorted by number of trades, and not some company index.
2) In your current approach, if the company is in the heap, and the number of trades isn't sufficient for it to qualify as top 10, you still end up re-balancing the heap in O(log(N)) time, which isn't good. You will end up re-balancing the heap at every trade, not just when you need the top 10.
1) Searching is O(1). You can assume you keep a separate array to keep the company index and apply any modification on the trade load array on the index array as well. Then the i-th position of the trade-load is the i-th largest, and the i-th position of the index array, is the company index.
2) True. But each new modification (if arbitrary) requires O(log(n)) comparisons to determine the position.
3) <<<On a separate note>>> I guess we don't even need to have a heap for this application. Divide the companies into two sets: Set(1) = {top 9}, Set(2)={Other companies}. Keep a pointer to the company in Set(2) with maximum trade load at this time.
Event: A company "X" changes in trade load:
Let:
Y = max{Set(2)}, Z = arg min{Set(1)}
A) X is in Set(2):
A-1) X < Y ==> Do nothing.
A-2) (X > Y) & (X < Z) ==> max{S(2)} = X.
A-3) X > Z ==> Add X to array of top 10. Drop Z from the array. Let max{Set(2)} = Z.
B)
The X = Y (also in Set(2))
B-1) Y > Z ==> Set max{S(2)} = &Z. Add to position of Z. Reorder top 9.
B-2) Y < Z ==> Do nothing.
C)
X is in Set(1)
C-1) A simple reorder under 9 swaps. No change to Set(2).
Then, the time complexity of each modification would be O(1).
This approach works if trade load change at each instance is a non-negative number.
1) Still not sure how you will get O(1) search. Can you please give an example? (Also, you are now using 2 arrays, twice d space).
2) In your approach, each modification requires O(log(N)) comparisons to determine the position. That will not be the case in a 10 element min heap and a HashTable, the solution I proposed after the hint from the interviewer.
3) Not sure how you intend to implement the Set Interface. That will dictate how much time it will take for you to find and update the trade load value of companies in both the sets. If that's taken care of, this look good, at least to me. The only thing, though outside the scale of this specific problem is, what if the query of top 10 changes, say, to a 100, or a 1000. Maintaining a min heap still scales better, as a new entry in the heap will require a operation of logN time, while sorting a set (or array, whatever you are using) will need more time.
3) This is a crude implementation:
Company* top_companies[10]; \\ Pointer to members of Set 1 (I included the 10th company in Set(1)). This is going to be a sorted array where comparison is based on Company.load.
Company* all_companies = new Company[N]; // Set of all companies = Set(1) U Set(2)
Company* MaxInSet2; // Pointer to company with max load in the low-(N-10) companies.
void OnCompanyLoadReceived(Event e)
{
Company* c = e.PtrToCompany; // Find the company whose load is changed
c->load += e.load; // Update load
// If the company load has become large enough to join the top 10
if (c->load > top_companies[9]->load)
{
c = InsertNewCompany(c);
// "c" is inserted in top_10 and the company which is kicked out is returned. If "c" WAS already in top 10, nullptr is returned
};
if (c != nullptr)
{
// the initial company, or some other company does not belong to top 10
if (c->load > MaxInSet2->load)
{
MaxInSet2 = c;
};
}
}
As I look again, the pointer to maximum is not necessary. All we care is the top 10. When a company changes in load, we just need to compare it with top 10.
As you said, it scales badly with number of top companies. Maybe for large enough such number, a balanced binary tree is a solution.
2) You are right. At first I was thinking of sorted array implementation and I did not notice you solution.
1) What I meant was something like this:
order = {1, 2, 3, 4, 5};
load = {0, 0, 0, 0, 0};
if load for some company changes, e.g., company "3", then
order = [1, 2, 3, 4, 5];
load = [0, 0, 10, 0, 0];
After sort we swap load[2] and load[0]. Do the same for order.
load = [10, 0, 0, 0, 0];
order = [3, 1, 2, 4, 5];
Now order[0] refers to the company with maximum load.
However, this is useless since we could define an array of Company objects already including all the information.
class Solution {
public:
int topsize;
struct cmpIntDesc {
bool operator()(int a, int b) const {
return a>b;
}
};
unordered_map<string, int> svi;
multimap<int,string, cmpIntDesc> ivs;
Solution(int size=10){
topsize=size;
}
void add(string name, int count){
multimap<int,string, cmpIntDesc>::iterator itr=ivs.find(svi[name]);
if(itr!=ivs.end()){
ivs.erase(itr);
}
svi[name]+=count;
ivs.insert({svi[name], name});
if(topsize<(int)ivs.size()){
multimap<int,string>::iterator bottom = ivs.end();
ivs.erase(--bottom);
}
}
void printtop(){
for(auto itr=ivs.begin(); itr!=ivs.end(); ++itr){
cout<<itr->second<<":"<<itr->first<<endl;
}
}
};
class Solution {
public:
int topsize;
struct cmpIntDesc {
bool operator()(int a, int b) const {
return a>b;
}
};
unordered_map<string, int> svi;
multimap<int,string, cmpIntDesc> ivs;
Solution(int size=10){
topsize=size;
}
void add(string name, int count){
multimap<int,string, cmpIntDesc>::iterator itr=ivs.find(svi[name]);
if(itr!=ivs.end()){
ivs.erase(itr);
}
svi[name]+=count;
ivs.insert({svi[name], name});
if(topsize<(int)ivs.size()){
multimap<int,string>::iterator bottom = ivs.end();
ivs.erase(--bottom);
}
}
void printtop(){
for(auto itr=ivs.begin(); itr!=ivs.end(); ++itr){
cout<<itr->second<<":"<<itr->first<<endl;
}
}
};
Hi Aditya,
Totally unrelated to your question, I just wanted to know if the interview process in Bloomberg specifies any language like C or C++ or Java, or is it open to us to use anything.
I think bloomberg has bigger focus on c++, because their work is mainly in that, except a couple of team who work in JAVA. That said, if u are strong in JAVA and not C++, dnt say to d interviewer that you will give the interview in C++. To them it doesn't matter that much I believe, as long as you can show that you understand the approach to the problem sufficiently well.
Why maitaining a MinHeap instead of a MaxHeap?
- Anonymous April 28, 2013