Interview Question
Country: India
Interview Type: Written Test
how many time does an element occur at most in a set? ;-) once otherwise it's not a set, right? but I guess, he didn't really mean a set but a data set / array / what ever with the semantics that if you add 10 two times and delete it once, it's still once in there.
- mean: Sum/n (as long as no overflow danger exists...)
- median: Method with two heaps. To delete use the same structure as with mode below but instead of element count, use element value (a max heap and a min heap...)
- mode: heap of element'counts combined with a HT to the index of such an element into the heap (requires the heap to patch the HT when heapifying, bubbling up/down)
requires some memory... but the question didn't mention about usage. If you insert / delete most of the time, it might be more convenient to just insert/delete into a HT (value, count) and perform the queries in linear time which works for mean, median and mode.
Calculating mode over stream of elements/number is tricky.
If the numbers are known in advance, then O(nlogn) to sort and then iterate over the list to find the mode. or just O(n) for inserting all the numbers into hashtable<key, count> and then O(n) to find the mode. However, to operate over the stream of numbers, the combination of HT and sorted list is probably better. the data structure to use will be
struct list_element {
int count;
element e;
};
list<list_element > ls;
unordered_map<element, list< list_element >::iterator> ht;
where the value of the hashtable is the iterator into the list_element of {element + count}.
Insertion is O(1) and will cause two kinds of operations into the list
a) if not found in ht, push_back the entry into list (element, count =1), and then add into HT (element, and list.iterator).
b) if found, then get the iterator of the node into the list, update the count and then look for nodes prior to this node until we find node->count > this node->count and insert this node after the node. Thus maintaining the ordering of the list by the count.
However, a set, probably should not have a mode.
- NoOne May 31, 2017Mode is defined to be the element having maximal occurrence in a collection of data points.
In that case, a sorted list set should suffice.