I would try to approach the question in a structured way: where do these queries come from, and what do we expect to be asked tomorrow, so we don't have to change the structure all the time. Anyway, for the exact questions:

page(s) visited by exactly 100 users:
- Store, per day, the pages grouped by their number of unique users.
- Use a B-tree with the number of unique users as key and (page id, total visit count) as value. I would lose which user it was with this aggregation, but I could get fast statistics for a page ranking based on user visits.
- To ask which pages were visited by exactly 100 users, I'd have an O(lg(m) + k) query, where m is the number of distinct per-day page frequencies and k is the number of pages with exactly 100 users that day. Instead of a B-tree I could use a hash table with the frequency as key, making it O(k); however, I'd also expect range queries such as > 100, > 1000, or > 100 and < 200, etc.
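A minimal sketch of that index in Python (names are mine; a sorted key list with binary search stands in for the B-tree, since Python has no built-in one, and gives the same O(lg m + k) behavior for both exact and range lookups):

```python
import bisect
from collections import defaultdict

def build_user_count_index(visits):
    """visits: iterable of (user_id, page_id) pairs for one day."""
    users_per_page = defaultdict(set)
    for user, page in visits:
        users_per_page[page].add(user)
    by_count = defaultdict(list)        # unique-user count -> list of pages
    for page, users in users_per_page.items():
        by_count[len(users)].append(page)
    keys = sorted(by_count)             # stands in for the B-tree's sorted keys
    return keys, by_count

def pages_with_user_count(keys, by_count, lo, hi=None):
    """Pages whose unique-visitor count lies in [lo, hi]; hi=None means exactly lo."""
    hi = lo if hi is None else hi
    i = bisect.bisect_left(keys, lo)    # O(lg m)
    j = bisect.bisect_right(keys, hi)
    result = []
    for k in keys[i:j]:                 # O(k) to collect matching pages
        result.extend(by_count[k])
    return result

visits = [('u1', 'p1'), ('u2', 'p1'), ('u1', 'p2'), ('u1', 'p1')]
keys, idx = build_user_count_index(visits)
print(pages_with_user_count(keys, idx, 2))  # -> ['p1'] (exactly 2 unique users)
```

Swapping the sorted key list for a plain dict lookup gives the O(k) hash-table variant, at the cost of the range queries.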

pages visited by 1 user 15 times:
- Same as above: I would get all pages visited by exactly 1 unique user and check which of them had exactly 15 visits in total. If that's not good enough, because there are many pages with only 1 unique user per day, I would use a different index on the B-tree from the first query.
- That index would consist of two integers: the number of unique users and the number of total page visits (ordered primarily on unique users, secondarily on total page visits). I could then answer both questions on the same index with logarithmic upper bounds (depending on what exactly I want, e.g. one result or multiple results).
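That composite index could look like this (my sketch; again a sorted list plays the B-tree, keyed by the (unique users, total visits) pair):

```python
import bisect
from collections import defaultdict

def build_composite_index(visits):
    """Index entries: ((unique_users, total_visits), page), sorted by the pair."""
    users = defaultdict(set)
    totals = defaultdict(int)
    for user, page in visits:
        users[page].add(user)
        totals[page] += 1
    return sorted(((len(users[p]), totals[p]), p) for p in totals)

def pages_matching(index, unique_users, total_visits):
    """All pages with exactly this (unique users, total visits) combination."""
    key = (unique_users, total_visits)
    i = bisect.bisect_left(index, (key, ''))   # '' sorts before any page id
    out = []
    while i < len(index) and index[i][0] == key:
        out.append(index[i][1])
        i += 1
    return out
```

Because the key is ordered primarily on unique users, a range scan over keys with first component 1 also answers the "one user, 15 visits" question, and a scan over the first component alone answers the "exactly 100 users" question.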

page visited by u3 more than 20 times (there may be more than one; first match? all > 20? any, if one exists?):
- I would assume the vector describing visits per page for a single user and day is sparse.
- Hash table with user id as key and a vector as value.
- The vector contains (page id, #visits) tuples.
- Lookup is O(1) to get to the vector, then a linear scan over it. If I sort the vector by visit count, I'd have O(lg(n)) to find the first entry above 20, O(1) to find any match or report none, or O(n) to collect all entries > 20.
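A sketch of the sorted-vector variant (names are mine): sorting each user's sparse vector by visit count makes every "more than t" query a binary search followed by a suffix scan.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

def build_user_vectors(visits):
    counts = defaultdict(Counter)              # user -> page -> #visits
    for user, page in visits:
        counts[user][page] += 1
    # Sparse vector per user: (#visits, page) tuples sorted by visit count,
    # so any count range is a contiguous slice at the end of the list.
    return {u: sorted((n, p) for p, n in c.items()) for u, c in counts.items()}

def pages_visited_more_than(vectors, user, threshold):
    vec = vectors.get(user, [])                # O(1) hash-table lookup
    # (threshold + 1,) compares below every (threshold + 1, page) tuple,
    # so bisect finds the first entry with count > threshold in O(lg n).
    i = bisect_left(vec, (threshold + 1,))
    return [page for n, page in vec[i:]]       # O(k) to emit all matches
```

Checking whether any page qualifies is then just `vec and vec[-1][0] > threshold`, the O(1) case mentioned above.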

I think a hashmap for each of those query categories, with the help of a master hashmap (one that keeps track of all entries), would solve this specific question. What do you guys think?
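One way to read this suggestion (a sketch, with all names mine): a master map from (user, page) to visit count, kept in sync with one derived map per query category.

```python
from collections import defaultdict

class VisitIndex:
    """Master map of (user, page) -> visits, plus per-query maps kept in sync."""
    def __init__(self):
        self.master = defaultdict(int)            # (user, page) -> #visits
        self.users_per_page = defaultdict(set)    # page -> set of unique users
        self.visits_per_page = defaultdict(int)   # page -> total visits

    def record(self, user, page):
        self.master[(user, page)] += 1
        self.users_per_page[page].add(user)
        self.visits_per_page[page] += 1

    def pages_with_exact_users(self, n):          # query 1
        return [p for p, u in self.users_per_page.items() if len(u) == n]

    def pages_one_user_n_visits(self, n):         # query 2
        return [p for p, u in self.users_per_page.items()
                if len(u) == 1 and self.visits_per_page[p] == n]

    def pages_user_over(self, user, n):           # query 3 (full scan of master)
        return [p for (u, p), c in self.master.items() if u == user and c > n]
```

Inserts are O(1), but queries 1 and 2 scan all pages and query 3 scans the whole master map; a per-user map (as in other answers here) would make query 3 cheaper.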

I would use a hash map in which the keys are the pages and the values are the lists of users that visited them. Below is Python code that illustrates how to answer the queries using the proposed data structure.

``````from collections import defaultdict

def build_query_structure(data):
    d = defaultdict(list)
    for user, page in data:
        d[page].append(user)
    return d

query_data = [('u1', 'p4'), ('u3', 'p2'), ('u7', 'p9')]
query = build_query_structure(query_data)

# Query 1: Pages visited by exactly 100 users
print([p for p in query if len(set(query[p])) == 100])

# Query 2: Pages visited by only one user exactly 15 times
print([p for p in query
       if len(query[p]) == 15 and len(set(query[p])) == 1])

# Query 3: Pages visited by u3 more than 20 times
print([p for p in query if query[p].count('u3') > 20])``````

I'm probably missing something here, but here's my approach, depending on which efficiency takes priority:
Efficient queries: a 3-dimensional structure over userId, pageId, and visitCounter, indexed/sorted by all of them.
Efficient inserts: 2 dimensions, userId and pageId, allowing duplicates.
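A sketch of a middle ground between those two (my naming): a sparse "matrix" of counts held in nested hash maps, indexed by both dimensions, so inserts stay O(1) and each query reads the dimension it needs.

```python
from collections import defaultdict

class DualIndex:
    """Sparse 'matrix' of visit counts, indexed by both dimensions."""
    def __init__(self):
        self.by_user = defaultdict(lambda: defaultdict(int))  # user -> page -> count
        self.by_page = defaultdict(lambda: defaultdict(int))  # page -> user -> count

    def insert(self, user, page):
        # O(1) per log entry; a duplicate visit just bumps the counters.
        self.by_user[user][page] += 1
        self.by_page[page][user] += 1

    def pages_for_user_over(self, user, n):
        # Reads one row of the user dimension (query 3).
        return [p for p, c in self.by_user[user].items() if c > n]

    def pages_with_exact_users(self, n):
        # Scans the page dimension (query 1).
        return [p for p, users in self.by_page.items() if len(users) == n]
```

The third dimension (visitCounter) is implicit in the stored counts; sorting by it would need an extra index like the B-tree proposals above.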


Struct PageUserStat {
pages: array of pointers to Page objects, sorted by visitorCount
users: array of pointers to User objects, sorted by userId
}

Page {
pageId: int
visitorCount: int
visitCount: int
}

User {
userId: int
pageCounts: array of pointers to Page objects, sorted by visitCount
}

This provides:

Query 1/2: Θ/Ω(log P), O(P); the worst case happens when a large number of pages satisfies the condition.
Query 3: Θ/Ω(log U + log P), O(P); the worst case happens when a large number of users satisfies the condition.

U is the number of distinct users.
P is the number of distinct pages.

I see the bulk of the work for this problem to be done just by building trees of page hits and user hits (user visits). We need a tree for all page hits, page hits for a specific user, and user hits (user visits) for a given page.

With super large data sets, additional data structures should be added to build all the users and pages and hits ... so I've added dictionaries (hash tables) to allow for the fast creation of the tree data.

``````using System;
using System.Collections.Generic;

// To execute C#, please define "static void Main" on a class
// named Solution.

class Solution
{

class PageHit
{
public Page Page;     // the page
public int HitCount;  // hit count
public PageHit Left;  // tree of hit counts <= this.HitCount
public PageHit Right; // tree of hit counts > this.HitCount
}

class UserHit
{
public User User;     // the user
public int HitCount;  // hit count
public UserHit Left;  // tree of hit counts <= this.HitCount
public UserHit Right; // tree of hit counts > this.HitCount
}

class Page
{

// tree of user hits (user visits) balanced/sorted by hit count
private UserHit m_userHitTree;

// fast lookup of user hits by user
private Dictionary<User, UserHit> m_userHitsByUser;

public void AddUserHit( User user )
{
// increment UserHit.HitCount
// sort (or resort) into m_userHitTree
}
public User FindUserByVisitCount( int visitCount )
{
// search m_userHitTree O(log(n))
}
}

class User
{

// user identifier
public string UserId;

// tree of page hits balanced/sorted by page hit
private PageHit m_pageHitTree;

// fast lookup of page hits by page
private Dictionary<Page, PageHit> m_pageHitByPage;

public void AddPageHit( Page page )
{
// increment PageHit.Hitcount
// sort (or resort) PageHit into m_pageHitTree
}
public Page FindPageByHitCount( int hitCount )
{
// search m_pageHitTree O(log(n))
}
}

class PageManager
{

// tree of page hits balanced/sorted by pagehit
private PageHit m_pageHitTree;

// fast lookup of page hit by page
private Dictionary<Page, PageHit> m_pageHitsByPage;

// fast lookup of page by url
private Dictionary<string, Page> m_pagesByUrl;

public Page GetPage( string url, bool createIfNotFound = false )
{
// get existing page from m_pagesByUrl
// return page
}
public void AddPageHit( Page page )
{
// increment PageHit.HitCount
// sort (or resort) PageHit in m_pageHitTree
}
public Page FindPageByHitCount( int hitCount, Page searchStart = null )
{
// search binary tree for hit count starting at searchStart
// or from m_pageHitTree root node
// O(log(n)) from root or O(1) if given searchStart
}

}

class UserManager
{

private Dictionary<string, User> m_usersById;

public User GetUser( string userId, bool createIfNotFound = false )
{
// get existing user in m_usersById
// return user
}
}

static void Main(string[] args)
{

// data structs
UserManager userManager = new UserManager();
PageManager pageManager = new PageManager();

foreach ((string userId, string pageUrl) in log)
{
User user = userManager.GetUser( userId, true );
Page page = pageManager.GetPage( pageUrl, true );
user.AddPageHit( page );       // record the hit on both sides
page.AddUserHit( user );
pageManager.AddPageHit( page );
}

// which page was visited by 100 users in a day
Page pageWith100UserVisits = pageManager.FindPageByHitCount( 100 );

// which page was visited by only one user exactly 15 times in a day
Page pageWith1UserVisiting15Times = null;
Page prevPage = null;
while (true)
{
Page page = pageManager.FindPageByHitCount( 1, prevPage );
if (page == null)
{
break;
}
if (page.FindUserByVisitCount( 15 ) != null)
{
pageWith1UserVisiting15Times = page;
break;
}
prevPage = page;
}

// which page was visited by u3 more than 20 times in a day
User user3 = userManager.GetUser( "u3" );
if (user3 != null)
{
Page pageU3VisitedMoreThan20Times = user3.FindPageByHitCount( 20 );
}

}

}``````

I think there should be a directed graph having users and pages as vertices. Each edge goes from a user to a page and has a weight giving the number of visits by that user to that page; on each visit, increase the weight by 1.
So the answers for the three questions:
1. Which page was visited by exactly 100 users in a day? Get the list of all page vertices with 100 incoming edges.
2. Which page was visited by only one user exactly 15 times in a day? Vertices of type "Page" (not "User") with exactly one incoming edge, of weight 15.
3. Which page was visited by u3 more than 20 times a day? Follow all the outgoing edges from the vertex u3 having weight more than 20.
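This weighted bipartite graph can be sketched with adjacency maps in both directions (class and method names are mine):

```python
from collections import defaultdict

class VisitGraph:
    """Directed bipartite graph: edges user -> page, weighted by visit count."""
    def __init__(self):
        self.out_edges = defaultdict(lambda: defaultdict(int))  # user -> page -> w
        self.in_edges = defaultdict(lambda: defaultdict(int))   # page -> user -> w

    def visit(self, user, page):
        self.out_edges[user][page] += 1    # increase the edge weight by 1
        self.in_edges[page][user] += 1

    def pages_with_in_degree(self, n):
        # 1: pages with exactly n incoming edges (n distinct users)
        return [p for p, e in self.in_edges.items() if len(e) == n]

    def pages_single_user_weight(self, w):
        # 2: pages with one incoming edge whose weight is exactly w
        return [p for p, e in self.in_edges.items()
                if len(e) == 1 and next(iter(e.values())) == w]

    def pages_from_user_over(self, user, w):
        # 3: outgoing edges from the user with weight > w
        return [p for p, c in self.out_edges[user].items() if c > w]
```

Query 3 touches only u3's out-edges, while queries 1 and 2 scan every page vertex; an index on in-degree would avoid that scan.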
