You are given logs that contai

Facebook Interview Question

1

of 1 vote

9
Answers
You are given logs that contain user and page visits for a given day.
u1 -> p4
u3 -> p2
u7 -> p9
...
comeup with efficient data structure that answers these queries

Which page was visited by exactly 100 users in day?
Which page was visited by only one user exactly 15 times in a day?
Which page was visited by u3 more than 20 times in a day?
- smitha August 13, 2017 in United States | Report Duplicate | Flag | PURGE
Facebook

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

edited (mixed up terms)
I would try to approach the question in a structured way, where do these queries come from, what do we expect to ask tomorrow, so we don't have to change this structure all the time. Anyway, for the exact question:

page(s) visited by 100 users:
- would store for each number of unique users per day the pages
- as a b-tree with #of unique users as key and page id and number of total visits as value (I would loose which user it was with this aggregation), but i could have fast statistics for a page ranking based on user visits.
- to ask which was visited 100 times, I had a O(lg(m)+k) query. m is the different page frequencies per day, k the number of pages with exactly 100 users per day. Instead of b-tree I could use a HT with the frequency as index, so it would be O(k), how ever, I'd expect a query > 100, > 1000 or >100 < 200 etc...

pages visited by 1 user 15 times:
- same as above, I would get all pages visited by 1 unique user and ask which one had only 15 visits in total. if that's not good enough, because there are many pages with 1 unique user a day, I would use a different index on the b-tree of first query.
that index would consist of two integers, the #unique users and the #of total page visits (ordered primarily on unique users, secondary on total page visits) So I could ask both questions on the same index with logarithmic upper bounds (depends what I want exactly, e.g. one result or multiple results)

page visited by u3 more than 20 times (may be more than one...first, all > 20?, any if exists):
- I would assume the vector describing #visits / page for a single user and day is sparse.
- hash-table with user id as key and a vector as value
- the vector contains the tuple page-id, #visits
- to find I would have O(1) to get to the vector and a linear scan on the vector, If I sort the vector, I had O(lg(n)) to find the min(>20), O(1) to find any or none, or O(n) to find all > 20

- Chris August 13, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think a hashmap for each of those query categories with the help of a master hashmap(one that keeps track of all enteries) would solve this specific question. What do you guys think?

- Itsme August 13, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I would use a hash in which the keys are the pages and the values the list of users that visited them. Next the code in Python that illustrates it showing how to solve the queries using the proposed data structure

from collections import defaultdict

def build_query_structure(data):
        d = defaultdict(list)
        for x, y in data:
                d[y].append(x)
        return d

query_data = [('u1', 'p4'), ('u3', 'p2'), ('u7', 'p9')]
query = build_query_structure(query_data)

# Query 1: Pages visited exactly 100 times
print filter(lambda x: len(query[x]) == 100, query.iterkeys())

# Query 2: Pages visited by only one user exactly 15 times
print filter(lambda x: len(query[x]) == 15 and len(set(query[x])) == 1,\
             query.iterkeys())

# Query 3: Page visited by u3 more than 20 times
print filter(lambda x: query[x].count('u3') >= 20, query.iterkeys())

- Fernando August 14, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I'm probably missing something here, but there's my approach depending on efficiency priority:
Efficiency for queries: 3-dimensional matrix with userId, pageId, visitCounter; indexed/sorted by all of them
Efficiency for inserts: 2-dimensions userId, pageId, allowing duplication

- Andres MJ August 14, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Fernnado, its not efficient.

- tomlobato August 19, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Struct PageUserStat {
pages: array of pointers to Page objects, sorted by visitorCount
users: array of pointers to User objects, sorted by userId
}

Page {
pageId: int
visitorCount: int
visitCount: int
}

User {
userId: int
pageCounts: array of pointers to Page objects, sorted by visitCount
}

This provides:

Query 1/2: Teta/Omeg(logP) O(P), worst case happens when a large number of pages satisfies the condition.
Query 3: Teta/Omeg(logU+logP) O(P), worst case happens when a large number of users satisfies the condition.

U is the number of distinct users.
P is the number of distinct pages.

- tomlobato August 19, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I see the bulk of the work for this problem to be done just by building trees of page hits and user hits (user visits). We need a tree for all page hits, page hits for a specific user, and user hits (user visits) for a given page.

With super large data sets, additional data structures should be added to build all the users and pages and hits ... so I've added dictionaries (hash tables) to allow for the fast creation of the tree data.

using System;

// To execute C#, please define "static void Main" on a class
// named Solution.

class Solution
{
    
    class PageHit
    {
        public Page Page;     // the page
        public int HitCount;  // hit count
        public PageHit Left;  // tree of hit counts <= this.HitCount
        public PageHit Right; // tree of hit counts > this.HitCount
    }

    class UserHit
    {
        public User User;     // the user
        public int HitCount;  // hit count
        public PageHit Left;  // tree of hit counts <= this.HitCount
        public PageHit Right; // tree of hit counts > this.HitCount
    }
    
    class Page
    {
        
        // tree of user hits (user visits) balanced/sorted by hit count
        private UserHit m_userHitTree;
        
        // fast lookup of user hits by user
        private Dictionary<User, UserHit> m_userHitsByUser;
        
        public void AddUserHit( User user )
        {
            // get UserHit from m_userHitsByUser, create if not found
            // increment UserHit.HitCount
            // sort (or resort) into m_userHitTree
        }
        public User FindUserByVisitCount( int visitCount )
        {
            // search m_userHitTree O(log(n))
        }
    }
    
    class User
    {
        
        // user identifier
        public string UserId;
        
        // tree of page hits balanced/sorted by page hit
        private PageHit m_pageHitTree;
        
        // fast lookup of page hits by page
        private Dictinary<Page, PageHit> m_pageHitByPage;
        
        public void AddPageHit( Page page )
        {
            // get PageHit from m_pageHitsByPage, create if not found
            // increment PageHit.Hitcount
            // sort (or resort) PageHit into m_pageHitTree
        }
        public Page FindPageByHitCount( int hitCount )
        {
            // search m_pageHitTree O(log(n))
        }
    }
        
    class PageManager
    {
        
        // tree of page hits balanced/sorted by pagehit
        private PageHit m_pageHitTree;
        
        // fast lookup of page hit by page
        private Dictionary<Page, PageHit> m_pageHitsByPage;
        
        // fast lookup of page by url
        private Dictionary<string, Page> m_pagesByUrl;
        
        public Page GetPage( string url, bool createIfNotFound = false )
        {
            // get existing page from m_pagesByUrl
            // possibly create if not found
            // return page
        }
        public void AddPageHit( Page page )
        {
            // get PageHit from m_pageHitsByPage, create if not found
            // increment PageHit.HitCount
            // sort (or resort) PageHit in m_pageHitTree
        }
        public Page FindPageByHitCount( int hitCount, Page searchStart = null )
        {
            // search binary tree for hit count starting at searchStart
            // or from m_pageHitTree root node
            // O(log(n)) from root or O(1) if given searchStart
        }
        
    }
    
    class UserManager
    {
        
        private Dictionary<string, User> m_usersById;
        
        public User GetUser( string userId, bool createIfNotFound = false )
        {
            // get existing user in m_usersById
            // possibly create if not found
            // return user
        }
    }
    
    static void Main(string[] args)
    {
        
        // data structs
        UserManager userManager = new UserManager();
        PageManager pageManager = new PageManager();
        
        // load data from logs
        foreach (string userId and string pageUrl in log)
        {
            User user = userManager.GetUser( userId, true );
            Page page = pageManager.GetPage( pageUrl, true );
            pageManager.AddPageHit( page );
            page.AddUserHit( user );
            user.AddPageHit( page );
        }
        
        // which page was visited by 100 users in a day
        Page pageWith100UserVisits = pageManager.FindPageByHitCount( 100 );
        
        // which page was visited by only one user exactly 15 times in a day
        Page pageWith1UserVisiting15Times = null;
        Page prevPage = null;
        while (true)
        {
            Page page = pageManager.FindPageByHitCount( 1, prevPage != null ? prevPage.Left : null );
            if (page == null)
            {
                break;
            }
            if (page.FindUserByVisitCount( 15 ))
            {
                pageWith1UserVisiting15Times = page;
                break;
            }
            prevPage = page;
        }
        
        // which page ws visited by u3 more than 20 times in a day
        User user3 = userManager.GetOrAddUser( "u3" );
        if (user3 != null)
        {
            Page pageU3VisitedMoreThan20Times = user3.FindPageByHitCount( 20 );
        }
        
    }
    
}

- colin@colinday.net August 31, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I think, there should be a directional graph having users and pages as vertex. Edge goes from user to page. Each edge will have weightage which will show the number of visits by user to that page. at each visit, increase the weight by 1.
So the answer for the three questions:-
1. Which page was visited by exactly 100 users in a day? get the list of all vertex with 100 incoming edges.
2. Which page was visited by only one user exactly 15 times in a day? vertices sharing the edge with the weightage 15 and is vertex type "Page" not "User".
3. Which page was visited by u3 more than 20 times a day? All the outgoing edges from the vertex u3 having weigthage more then 20.

- Priyank September 28, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

/* 
Too much convolution in all the answers
Actual, things can be done much faster
*/
def UserActivity : {
   user_id : "user-id" ,
   actions_at : [/* list of timestamps */ ] 
}

def PageActivityForDay : { 
  page_id : "page-id" , 
  day : "YYY-MM-dd" , 
  user_activity : { /* map of User Activities  keyed by user-id */ }
}
// which pages has exactly 100 users in a day?
yo = select(  list_of_page_activities_per_day ) where { size( $.user_activity ) == 100  }
// Which page was visited by only one user exactly 15 times in a day? 
meh = select(  list_of_page_activities_per_day ) where {  
    exists ( $.user_activity ) where { size( $.actions_at ) == 15  }
} 
// Which page was visited by u3 more than 20 times in a day? 
foo = select(  list_of_page_activities_per_day ) where {  
    "u3" @ $.user_activity && size( $.user_activity["u3"].actions_at ) > 20  
}

- NoOne April 17, 2019 | Flag Reply

CareerCup

Facebook Interview Question

Books

Videos

Resume Review

Mock Interviews