Amazon Adobe Interview Question
Software Engineer / Developer
Country: United States
@eugene.yarovoi: Before writing my thoughts, I experimented with how Firefox works in this respect. I found that it even keeps browser history up to six months old.
Here is my thought:
I would use a doubly linked list of dates. Each day the browser is used, I would add a new node to it. Each node of the doubly linked list would in turn hold another linked list storing the URLs. I don't think a timestamp is necessary here. We can add newly visited pages to the head of the list, so the most recently visited pages stay near the front.
I liked your idea of a TreeSet. It will help when a user revisits a previously visited page, since we then need to adjust the nodes of the list.
Now, if the user wants to see the history by date, the list will serve the purpose. If the user wants to search for a particular URL, we can provide prefix searching with a trie.
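A minimal sketch of this list-of-lists idea (using Deques as a stand-in for the hand-rolled doubly linked lists, and a plain string date key as an assumption, since the original post doesn't specify how dates are represented):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DayHistory {
    // Outer deque: one node per day, most recent day first.
    // Inner deque: URLs visited that day, most recent visit first.
    private final Deque<Deque<String>> days = new ArrayDeque<>();
    private String currentDay = null;

    // Record a visit; 'day' is a date key like "2012-09-03".
    public void visit(String url, String day) {
        if (!day.equals(currentDay)) {
            days.addFirst(new ArrayDeque<>()); // new node when a new day starts
            currentDay = day;
        }
        days.peekFirst().addFirst(url); // recent pages stay near the front
    }

    public Deque<String> today() {
        return days.peekFirst();
    }
}
```

Browsing "today's history" is then just reading the front node; older days are reached by walking the outer list.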
Let me know your thoughts.
I believe Firefox has a setting for how long to keep browser history.
I proposed a timestamp because that's something the user might actually want to look at. "When did I access this page?"
Sure, you can use a linked list of linked lists if you want. I proposed just using one linked list and storing a pointer to something that's a day old, then two days old, etc. It doesn't matter much which of these strategies you use. It's much like how you can implement a 2D array as either an array of arrays or as a 1D array and an array of offsets into it.
A prefix tree is a good idea because it's probably a little easier to use it to provide search results as the user types (before the user has finished typing, that is). A modified TreeSet would also work for this purpose.
Java developers can use LinkedHashMap and override the removeEldestEntry(Map.Entry) method to implement the browsing-history retention policy.
More info on LinkedHashMap: docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html
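A sketch of that idea, assuming a simple size-based retention policy (a time-based policy would instead inspect the eldest entry's timestamp in removeEldestEntry):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// URL -> visit timestamp, evicting the oldest entry past a fixed capacity.
public class BoundedHistory extends LinkedHashMap<String, Long> {
    private final int capacity;

    public BoundedHistory(int capacity) {
        // accessOrder=false: entries stay in insertion order, like a visit log
        super(16, 0.75f, false);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        // Returning true tells LinkedHashMap to drop the eldest entry
        return size() > capacity;
    }
}
```

LinkedHashMap handles the eviction itself on each put, so the caller never manages the retention logic directly.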
How about using an ArrayList or Vector to store the page-access history, so that users can perform range queries (e.g., which pages were accessed during June 1st to June 5th)?
@First Anonymous: yes, LinkedHashMap is a convenient class that does a lot of what you need. It may not do everything that the combination of data structures I propose above does; namely, there's no convenient way to jump to a particular date and search for the pages accessed on that date.
@Second Anonymous: the data structures I proposed above are sufficient to be able to do range queries.
Web browsers commonly allow you to navigate through a "history" of web pages that have previously been visited. The mechanism is somewhat like a stack, in that the most recently visited pages are at the top of the history and revisited when the "back" button is pressed.
However, the history does not necessarily have unbounded capacity. In reality, there may exist a fixed limit on the size of the history. The issue arises as to what should happen when the capacity is exhausted and a new item is pushed onto the stack. One possibility is to throw an exception. But this is not how a Web browser behaves. If it only has room to save 50 pages in its history and yet you visit more pages, it will make room in the history for a new page by throwing away the page that is on the very bottom of the history (i.e., the least recently visited page). The formal Stack interface does not help, as it gives us no way to directly access or remove the object on the bottom of the stack.
I typically think of browser history as being distinct from browser navigation (forward / back buttons). The record that you visited a page should still be there even if you press "back".
There are supposed to be two stacks, one for the Back button and one for the Forward button. When you navigate to a new page, push the current page's details onto the back stack; when you move back, pop the top element of the back stack (which becomes the current page) and push the previous current page onto the forward stack. Don't worry about the size of the array; a dynamic array can be used. This can also be done with a list.
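A sketch of the two-stack navigation described above. One detail the post doesn't mention, added here as a common refinement: visiting a fresh page usually clears the forward stack, since the old forward path is no longer reachable.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class Navigator {
    private final Deque<String> backStack = new ArrayDeque<>();
    private final Deque<String> forwardStack = new ArrayDeque<>();
    private String current;

    public Navigator(String startPage) { current = startPage; }

    public void visit(String url) {
        backStack.push(current);
        current = url;
        forwardStack.clear(); // a fresh visit invalidates the forward history
    }

    public String back() {
        if (backStack.isEmpty()) return current; // nothing to go back to
        forwardStack.push(current);
        current = backStack.pop();
        return current;
    }

    public String forward() {
        if (forwardStack.isEmpty()) return current;
        backStack.push(current);
        current = forwardStack.pop();
        return current;
    }
}
```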
A set should be used, so that a page visited again is overwritten. I would go with a TreeSet sorted according to time. Any problems with this approach are welcome.
You're saying your equality criterion will be the URLs of two PageVisitedRecord objects matching, but yet your tree ordering criterion will be the date. So two PageVisitedRecord objects could be equal (same URLs) and yet still be supposed to be inserted in different places in the tree. This won't work for a variety of reasons.
First, when you update the set with an equal PageVisitedRecord to something you already have in the tree (equal by your equality criterion -- the URLs match), how will the set search for this equality? The tree is sorted by date.
Second, even if this update did somehow work, the set might do something stupid. It might think that since you're not inserting a new element, it doesn't need to move the existing element. That might be bad because your tree is sorted by date and your new element has a new date that might need to be in a different place in the tree. Or the set might think that it can just ignore your update request, since inserting an element into a set that is already there should have no effect, in theory. Then the date might not get updated.
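The first problem can be demonstrated concretely. In this sketch (the PageVisit class and its URL-based equals are assumptions matching the scenario described above), a TreeSet ordered by timestamp never consults equals() at all, so two visits to the same URL both survive:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetPitfall {
    static class PageVisit {
        final String url;
        final long timestamp;
        PageVisit(String url, long timestamp) { this.url = url; this.timestamp = timestamp; }

        // Intended equality criterion: same URL means "same page"
        @Override public boolean equals(Object o) {
            return o instanceof PageVisit && ((PageVisit) o).url.equals(url);
        }
        @Override public int hashCode() { return url.hashCode(); }
    }

    // Insert two visits to the SAME url and report how many entries survive.
    static int duplicateCount() {
        // Ordered by timestamp: a TreeSet uses ONLY the comparator, never equals()
        TreeSet<PageVisit> history =
            new TreeSet<>(Comparator.comparingLong((PageVisit v) -> v.timestamp));
        history.add(new PageVisit("example.com/a", 100));
        history.add(new PageVisit("example.com/a", 200));
        return history.size();
    }
}
```

This is exactly the "comparator inconsistent with equals" hazard the TreeSet documentation warns about.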
It can't be a stack, since there you would need to pop each element.
For history, we simply need to update the entry for a URL, or add it if it's not already there.
I would use a heap, with each node containing the timestamp and the URL. Ordered as a max-heap on timestamp, the most recent URL (the one with the largest timestamp) will be at the top of the heap.
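In Java this can be sketched with a PriorityQueue; note that keeping the most recent visit (largest timestamp) on top requires a reversed comparator, i.e. max-heap order. One caveat: a heap only exposes its top element, so listing the full history in order would need repeated polls.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapHistory {
    static class Visit {
        final long timestamp;
        final String url;
        Visit(long timestamp, String url) { this.timestamp = timestamp; this.url = url; }
    }

    // Max-heap on timestamp: the most recent visit sits at the head.
    private final PriorityQueue<Visit> heap =
        new PriorityQueue<>(Comparator.comparingLong((Visit v) -> v.timestamp).reversed());

    public void visit(long timestamp, String url) {
        heap.add(new Visit(timestamp, url));
    }

    public String mostRecent() {
        return heap.peek() == null ? null : heap.peek().url;
    }
}
```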
The actions the user can perform are:
- 1) Add a new URL in history (it would have the greatest timestamp).
- 2) See history
-> The user can browse today's history
-> The user can browse yesterday's history
-> The user can browse the history of a specific month
-> The user can browse the history of a specific year
- 3) The user can delete history (last day, last week, last month, all, a particular entry).
- 4) Search in browsing history (can use page's URL or page's title)
Maybe, to make the search more efficient, we want to find a way to get quicker access to pages that are visited more frequently.
For 2) and 3), I guess a skip-list would do it. The skip list would have 4 levels (days, weeks, months, years).
For 4), we could use a Generalized Suffix Tree.
The nodes from the skip list would reference nodes from the suffix tree and vice versa.
I think the best data structure is a hash map in association with linked lists. The hash map would hold name/value pairs of the form <Date, LinkedList*>, where for a given date the browsing details are added to that date's linked list. The addition is done at the head of the linked list, so that the most recently visited URL comes first.
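A sketch of this date-keyed design (the string date key is an assumption; any hashable date representation would do):

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

public class DateKeyedHistory {
    // date key (e.g. "2012-09-03") -> URLs visited that day, newest first
    private final Map<String, LinkedList<String>> byDate = new HashMap<>();

    public void visit(String date, String url) {
        // addFirst keeps the latest visit at the front of that day's list
        byDate.computeIfAbsent(date, d -> new LinkedList<>()).addFirst(url);
    }

    public LinkedList<String> historyFor(String date) {
        return byDate.getOrDefault(date, new LinkedList<>());
    }
}
```

One trade-off worth noting: a plain HashMap gives O(1) lookup for a single date but no ordering across dates, so a date-range query would need to enumerate the keys.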
Comments to it will be appreciated.
If we visit chrome://history/ we see that the browsing history is grouped by date. The immediate browsing history is of course separate from this. I think every browser exhibits more or less the same behavior.
So the data structure for storing the browsing history needs to be split into 2 categories:
1. Immediate - less than 6 hours
2. Long term - more than 6 hours
For the immediate history, we can use a stack. When entries age out, we can copy them into a BST where each node has a timestamp as its key and a URL as its value.
The long-term storage could also go into a hash map/hash table, where the timestamp is the key and the URL the value. Agreed, some timestamps could be pretty close to each other, but the hash function should be able to distinguish them down to the second.
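A sketch of the two-tier idea. One substitution made here: the long-term store uses a TreeMap (a BST, as the post's first suggestion) rather than a hash table, since hashing timestamps would lose the ability to answer range queries; the 6-hour window and second-granularity timestamps are assumptions from the post.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.SortedMap;
import java.util.TreeMap;

public class TwoTierHistory {
    static class Visit {
        final long timestamp;
        final String url;
        Visit(long timestamp, String url) { this.timestamp = timestamp; this.url = url; }
    }

    private static final long IMMEDIATE_WINDOW = 6 * 60 * 60; // 6 hours, in seconds

    private final Deque<Visit> immediate = new ArrayDeque<>();    // recent visits, stack-like
    private final TreeMap<Long, String> longTerm = new TreeMap<>(); // BST: timestamp -> URL

    public void visit(long now, String url) {
        immediate.push(new Visit(now, url));
        age(now);
    }

    // Move entries older than the window from the stack into the BST.
    private void age(long now) {
        while (!immediate.isEmpty() && now - immediate.peekLast().timestamp > IMMEDIATE_WINDOW) {
            Visit v = immediate.removeLast();
            longTerm.put(v.timestamp, v.url);
        }
    }

    // Range query over long-term history: visits with from <= timestamp < to.
    public SortedMap<Long, String> between(long from, long to) {
        return longTerm.subMap(from, to);
    }
}
```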
Browse through the WebKit open-source browser project. It will have the answer:
www.webkit.org
Mentioning at least a little bit about what strategy they use, instead of just inviting readers to look through a huge open-source project, would be a good idea.
I think it should be some implementation of MRU (most recently used). I don't know the data structure, though. Could it be a heap?
I will go with an array of linked lists.
Let's say my browser stores the history of the last 7 days; then there will be an array of length 7, and each element of the array is a linked list holding all the URLs accessed that day. The array is used like a circular queue: each day, the oldest linked list is deleted to make room for a new linked list for that day.
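A sketch of this rolling 7-day array. The "days since epoch" day numbering is an assumption; the modulo makes the array behave like the circular queue described:

```java
import java.util.ArrayList;
import java.util.List;

public class WeekHistory {
    private static final int DAYS = 7;

    // One slot per day; index = dayNumber % 7 makes the array circular.
    private final List<List<String>> days = new ArrayList<>();
    private long lastDay = -1;

    public WeekHistory() {
        for (int i = 0; i < DAYS; i++) days.add(new ArrayList<>());
    }

    // dayNumber: days since some epoch; on a new day, the week-old slot is recycled.
    public void visit(long dayNumber, String url) {
        int slot = (int) (dayNumber % DAYS);
        if (dayNumber != lastDay) {
            days.get(slot).clear(); // drop the stale list before reusing the slot
            lastDay = dayNumber;
        }
        days.get(slot).add(0, url); // newest first
    }

    public List<String> historyFor(long dayNumber) {
        return days.get((int) (dayNumber % DAYS));
    }
}
```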
What if we create a hash table where the key is a simhash of the domain name and the value is a map from page name to a list of access times (this gives us the flexibility to keep track of multiple access times)? I feel this will improve searching, and we don't keep redundant information as in the suggestions above.
Can someone comment on why this approach is not good? It looks good to me, but I am not confident.
This is a very complex question that needs a detailed analysis of the requirements and typical use cases. Only then can we properly weigh the different possible solutions. Below are just some simple thoughts off the top of my head:
- eugene.yarovoi, September 03, 2012

A simple idea is to keep 2 data structures. First, I'd have a doubly linked list-based queue sorted by date, with a node for each page access. Each node would have the timestamp, the URL, and a pointer to the node in the queue representing the next older access of that same URL. The queue would be capped to both a certain size and a certain time range, and I could then delete entries from the back of the queue as needed.
Second, I would have a TreeSet sorted alphabetically by URL domain name first and then by page URL within that domain name, mapping URLs to pointers to the node in the doubly-linked list that represents the most recent access to that page.
That way, if someone wanted to look at when all their accesses to a specific page occurred, it could be done very quickly. They'd be presented with URLs in alphabetical order and they'd select the right one, a lookup would be done and they'd get the record of the most recent access, and the record of the most recent access has a pointer to the next access, etc. Then if they wanted to see their *entire* history of visited sites, they could view the entire queue and it would be showing them a log of things they've visited in chronological order.
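A sketch of this two-structure design. Two simplifications as assumptions: a TreeMap stands in for the sorted URL index (sorted by full URL rather than domain-then-page), and the size/time capping of the queue is omitted for brevity:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

public class IndexedHistory {
    static class Node {
        final long timestamp;
        final String url;
        Node olderSameUrl; // pointer to the next older access of the same URL
        Node(long timestamp, String url) { this.timestamp = timestamp; this.url = url; }
    }

    // Chronological log of all accesses, newest first (the queue).
    private final Deque<Node> log = new ArrayDeque<>();
    // URL -> most recent access of that URL, kept in sorted order (the index).
    private final TreeMap<String, Node> index = new TreeMap<>();

    public void visit(long timestamp, String url) {
        Node n = new Node(timestamp, url);
        n.olderSameUrl = index.get(url); // chain to the previous access, if any
        log.addFirst(n);
        index.put(url, n);
    }

    // All access times for one page, newest first, by walking the per-URL chain.
    public List<Long> accessesTo(String url) {
        List<Long> times = new ArrayList<>();
        for (Node n = index.get(url); n != null; n = n.olderSameUrl) times.add(n.timestamp);
        return times;
    }
}
```

The per-URL chain is what makes the "when did I visit this page?" query fast: one sorted lookup, then a walk along the chain, with no scan of the full log.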
The use of additional data structures is possibly desirable. For instance, maybe we'd store a pointer somewhere to a node that's a day old, a node that's 2 days old, etc., so certain timeslices of the history could be presented with the least amount of node traversals. Of course these pointers would have to be updated too, but background work is much better than having time during which the user is actively waiting for something to happen.
That's just if it's an in-memory structure. Another thought is that you need to persist this data structure. So maybe I'd rather use something that's more like a database (but local, without the whole remote server aspect that technologies like SQL Server have). If you used an indexed database table, you'd want at least an index on the date and an index on the URL.