Microsoft Interview Question
Country: United States
Interview Type: In-Person
Correct.
You can reduce some key space if you hold images only in the full B-Tree and add links from the favorites B-Tree nodes to the proper node in the full B-Tree to retrieve the image. Thus avoiding the HashMap all together.
But image space is usually going to be much bigger then key space.
Use a binary tree of some type (pick your favorite flavor). Each node of the tree would preserve a single photo and a flag indicating if the photo is a favorite. The tree is sorted on the time that the photographs are taken.
Insert(x, t, m): inserts a new tree node at a location 't' containing photo 'x' and flagged favorite = 'm'==1. This operation is O(log n) complexity and takes O(1) memory where n is the number of stored photos
Search(t): do a search for the first node that occurs immediately after 't'. This operation is O(log n) in complexity and O(1) memory where n is the number of stored photos
NextFavorite(t): do a search for the first node with the flag == true and time after 't'. This operation is also O(log n) in complexity and O(1) memory where n is the number of stored photos
For the most part, this is the 90% immediate solution. There are naturally other things that could be done like bucketing chunks of time or maintaining separate trees for the favorites vs all the photos.
Use multiple hash maps
like hashmap : (key date/value hashmap : (key hour/value hashmap : (key minutes/value tree storing data in order of time)) )...
The granularity can be a design decision, but should be flexible to change and add more HashMaps or remove them , may be by some intellegent layer to retreive the photos efficiently..
NextFavorite(t) would not be logarithmic time under that structure. You'd have to start an in-order traversal from the first node greater than t. Finding that first node is logarithmic but the traversal could touch every remaining node in the tree.
For instance, NextFavorite(0) in a tree where no nodes are marked favorite.
This could be addressed by using two trees, one of which only holds favorited pictures.
B+ tree is a data structure that can be used here.
Leaf nodes contain (pointer to photograph stored in memory, time and favorite). and all leaf nodes are connected.
Now for inserting new record it will take O(logn base b) b - order of b+ tree
Search also it takes O(logn base b) Time complexity.
B+ tree is a data structure that can be used here.
Leaf nodes contain (pointer to photograph stored in memory, time and favorite). and all leaf nodes are connected.
Now for inserting new record it will take O(logn base b) b - order of b+ tree
Search also it takes O(logn base b) Time complexity.
Use multiple hash maps
like hashmap : (key date/value hashmap : (key hour/value hashmap : (key minutes/value tree storing data in order of time)) )...
The granularity can be a design decision, but should be flexible to change and add more HashMaps or remove them , may be by some intellegent layer to retreive the photos efficiently..
how would you perform the search in this case? can you detail your search operation and the time complexity for it?
Use multiple hash maps
like hashmap : (key date/value hashmap : (key hour/value hashmap : (key minutes/value tree storing data in order of time)) )...
The granularity can be a design decision, but should be flexible to change and add more HashMaps or remove them , may be by some intellegent layer to retreive the photos efficiently..
In the game of Jack Straws, a number of plastic or wooden "straws" are dumped on the table and players try to remove them one-by-one without disturbing the other straws. In the question here, we are concerned with the following, related problem: First n straws are dumped on the table. Next, for every pair of straws, we want to determine if the straws are connected by a path of touching straws. In other words, given a list of the endpoints for n > 1 straws (as if they were dumped on a large piece of graph paper), determine all the pairs of straws that are connected. Note that touching is connecting, but also two straws can be connected indirectly via other connected straws. Give an algorithm to compute all the pairs of straws that are connected.
Here is some additional information that you can use:
i.As a basic step you need to determine whether two straws directly touch (intersect), i.e., whether straw ab with end points a and b intersects with straw cd (with end points c and d).
ii. To get full credit also explain how the elementary step i) can be done. If you cannot find an algorithm for this step, assume an O(n 2 ) algorithm and explain how to find all pairs of straws that are connected.
Analyse and explain the worst-case computational complexity of your algorithm in terms of n.
1. Have a linked list of timestamps that have at least 1 photo; ordered by time
2. Have a hash that maps from t to:
a) the corresponding timestamp in the linked list above
b) a set of photos taken at t that are not favourite
c) a set of photos taken at t that are marked as favourite
The linked list of timestamps is to quickly find the next available timestamp in O(1) time.
The hash allows for finding a timestamp along with the next available timestamp and photos taken at t, all in O(1) time.
insert(x,t,m): add timestamp t to LL and hash if not exists. Add x to set of photos taken at t (which set depends on m).
search(t) and nextFavourite(t): use hash to find t in linked list, then call next() to get the next timestamp t' if exists. Can then use hash to find t' and the next photo after time t that has a favourite or non-favourite.
Overall, every type of operation (add, remove, search) will be O(1) time.
Space is O(n) where n is the number of photos.
I have come up with BST approach
it would be a binary search tree indexed by time t
1. Insert operation: same as the standard put operation. Avg case O(lg N) worst case O(N) ,, memory O(1)
2. Search operation: this is like a ceiling operation in BST where we find the node that is immediately greater than the current key value
Avg case: O(lg N) , worst case : O(N) memory: O(1)
public Node search(int t)
{
return search(root,t); // Assume that we have a pointer to the root.
}
private Node search(Node x, int t)
{
if (x==null) return null;
else if(compareTo(x.t,t)<0) // x.t less than t , focus on right
return search(x.right,t);
else if(compareTo(x.t,t)>=0)
{
Node t=min(x.left); // check to ensure that the we don't have an element which is smaller than the current element x
if(t!=null)
return t;
else
return x;
}
}
3. NextFavourite(t): similar operation as above but replace min(x.left) with minfav(x.left)
avg case : O(lg N) ,, worst case O(N) , memory O(1)
minfav(Node x)
{
Node y;
if(x==null) return null;
y=minfav(x.left);
if(y==null)
{
if(x.fav==1);
return x
else
y=minfav(x.right);
}
return y;
}
Here is the complete solution for the problem
Please note that, the author has emphasized on the size of data, so we will store the photographs in external memory only.
here is the background and assumptions behind the algorithm
We will use B+ tree to implement this, where all the internal nodes only store the indexes and all the terminal node stores the real data(the photographs)
The indexes will stay in the main memory only where the assumption the indexes can stay in the memory(primary memory), if not, we have to save the indexes also into the secondary memory, which will increase few I/O operations.
we will keep two trees, one to store normal images and another to store favourites
Here is the algorithm for Search(t)
Search(t)
Figure out the memory block where the photographs are stored which are inserted after t both from normal tree and favourite-tree.
Node = T.Root;Directive = Next_Comparison_Directive(null); // return year
while(true)
If Node.IsDevisionPresent
Node = Node.Child_a ? Node.Child_a.Year = t.Year(can use BST to figure out the child, as not Year have the child)
Directive = Next_Comparison_Directive(Directive); // return in following order Year > Month > Day > Hour > Minute > Second > MillSecond as so on
else
read Node.Photograph block from secondary memory(we may have to read two blocks)
break;
After that, you can do the BST to find out an images whose insert time is just next to given t
return the one having minimum time
Here is the algorithm for NextFavourite(t)
NextFavourite(t)
Figure out the memory block where the photographs are stored which are inserted after t from favourite-tree.
Node = T.Root;Directive = Next_Comparison_Directive(null); // return year
while(true)
If Node.IsDevisionPresent
Node = Node.Child_a ? Node.Child_a.Year = t.Year(can use BST to figure out the child, as not Year have the child)
Directive = Next_Comparison_Directive(Directive); // return in following order Year > Month > Day > Hour > Minute > Second > MillSecond as so on
else
read Node.Photograph block from secondary memory(we may have to read two blocks)
break;
After that, you can do the BST to find out an images whose insert time is just next to given t
return the image.
Here is the algorithm for Insert(x,t,m)
Insert(x,t,m)
Based upon m decide the tree;
Figure out the memory block where the photographs can be stored with t time.
Node = T.Root;Directive = Next_Comparison_Directive(null); // return year
while(true)
If Node.IsDevisionPresent
Node = Node.Child_a ? Node.Child_a.Year = t.Year(can use BST to figure out the child, as not Year have the child)
if(Node is null or no node exists, add a new child to parent node with directive value, and insert this new data into the block of this new node).
Directive = Next_Comparison_Directive(Directive); // return in following order Year > Month > Day > Hour > Minute > Second > MillSecond as so on
else
read Node.Photograph block from secondary memory(we may have to read two blocks)
break;
insert the entry at proper location in bst.
while(node is full) // means the block size is full
set Node.IsDevisionPresent = true;
figure out distinct count of Directive time units from the block;
Create one child for each, and set there IsDevisionPresent = false;
set the list of photographs into each of the child based upon the Directive.
If any of the node is full(actually there will be one only)
set node = node.child which is full.
else
save all the nodes block to secondary memory.
break;
Complexities
Search
Time : Constant(which is the search time) + 4.Log(M), where M is the a memory block size + 4 I/O
NextFavourite
Time : Constant(which is the search time) + 2.Log(M), where M is the a memory block size + 2 I/O
Insert
Time :
Best case : Constant(which is the search time) + Log(M), where M is the a memory block size + 2 I/O, which occur most of the time,
Node spiting case : Constant(which is the search time) + Log(M), where M is the a memory block size + M.I/O
With little modification, we can reduce Node spiting case Complexities to the best case, we can maintain Min and Max insert time of photos which are residing in a block at last internal node(which are just above the terminal node)
Space : O(N) : size of the input in secondary memory + constant size of internal memory.
In the game of Jack Straws, a number of plastic or wooden "straws" are dumped on the table and players try to remove them one-by-one without disturbing the other straws. In the question here, we are concerned with the following, related problem: First n straws are dumped on the table. Next, for every pair of straws, we want to determine if the straws are connected by a path of touching straws. In other words, given a list of the endpoints for n > 1 straws (as if they were dumped on a large piece of graph paper), determine all the pairs of straws that are connected. Note that touching is connecting, but also two straws can be connected indirectly via other connected straws. Give an algorithm to compute all the pairs of straws that are connected.
Here is some additional information that you can use:
i.As a basic step you need to determine whether two straws directly touch (intersect), i.e., whether straw ab with end points a and b intersects with straw cd (with end points c and d).
ii. To get full credit also explain how the elementary step i) can be done. If you cannot find an algorithm for this step, assume an O(n 2 ) algorithm and explain how to find all pairs of straws that are connected.
Analyse and explain the worst-case computational complexity of your algorithm in terms of n.
Use 2 B-Trees for storing the photo Node indexes based on their timestamp, one tree for favorites and other for non-favorites so search will be
- guilhebl May 29, 2015O (Log N), you can search for first immediate node after Timestamp T, this will still be O (log N), insert and remove O(log N) each node of the tree would store a key value which would point to a HashMap<key, photo> which would store the actual photo data (or else retrieve it from a persistent storage).