Google Interview Question
Country: India
Interview Type: In-Person
To spice up the question a little bit, you can redesign for the case that the list of videos is too large to fit on one server.
Dedupe: route video to be processed on a server based on its hash value, then on that server combine objects with same video name and update watch rate (updated rate would be the average of the old rate and the new rate), O(n)
Per server: heapify your list, O(n), then extract top 10, O(1)
Central server: get top 10 from each server, and add elements into a min heap of size 10 only when an element is larger than the min of the heap (and remove the min to maintain size 10).
This is how I would do it
1 list.add() adds only if duplicate not found, else increment existing
2 comparable interface on Entry class implementing compareTo()
3 sort list using the sortt command pass the compareTo of Entry to sort
4 after sorting printTopTen()
public class VideoTopTen {
public static void main(String[] args) {
// populate list using List list.add() eg.
List.add(new Entry("Mine", 1234));
List.add(new Entry("Yours", 5555));
List.add(new Entry("Mine", 1234));
List.list.sort(Entry::compareTo);
List.printTop10();
}
static class List {
static ArrayList<Entry> list;
static {
list = new ArrayList<>();
}
static void add(Entry tmp) {
boolean flag = true;
for (Entry e : list)
if (e.name.equals(tmp.name)) {
e.count += tmp.count;
flag = false;
break;
}
if (flag)
list.add(tmp);
}
static void printTop10() {
int i = 0;
for (Entry e : list) {
System.out.println(e.name + e.count);
if (i == 9) break;
i++;
}
}
}
static class Entry implements Comparable<Entry> {
String name;
int count;
Entry(String name, int count) {
this.name = name;
this.count = count;
}
@Override
public int compareTo(Entry tmp) {
return tmp.count - count;
}
}
}
Tried with python. Somehow python dict kept removing dup hence used csv for the input. csv column1 is movie name and column2 is rate. assumed i have to do this in less than 20 min (for real interview). the basic idea is to keep those given top 10 or 5 in stack and interate thru from the begin to the end one time.. since problem didn't indicate "sort the result", didn't add sort.. i do have interview experience with Google and got very similar question - find top 10 frequent words from given huge file.. but they asked sorting. last time i failed. they didn't want to see i use simple sorting because file is huge.. hence better i had to keep top 10 list from the beginning and maintaining it while iterating from beginning of file to the end.. sorry i didn't test many test cases. last time, they didn't ask error case but just asked big O. min() with lambda seems to use sorting but should be quick because list is only top 10 or 5...
import csv
path = "c:/work/python/movies.csv"
class movie:
def __init__(self, path):
self.path = path
self.stack = {}
self.capacity = 0
self.minval = ""
def setmovie(self, capacity):
with open(self.path) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
header = next(csv_reader) # read out first line
for row in csv_reader:
self.helper(row[0], int(row[1]), capacity)
def helper(self, key, value, capacity):
if key not in self.stack:
self.stack[key] = value
else:
self.stack[key] += value
if len(self.stack) > capacity:
self.minval = min(self.stack.keys(), key=lambda k: self.stack[k])
self.stack.pop(self.minval)
def printmovie(self):
print(self.stack)
m = movie(path)
m.setmovie(5)
m.printmovie()
Hi there,
my solution for this problem is to use Queue. Depending on output: if we can allow videos with same or unique name we can use cache like Dictionary to check if item is already there. As an alternative you can enumerate queue itself for a bit slower but also nice solution.
static int Main(string[] args)
{
List<VideoInfo> videoList = new List<VideoInfo>();
for(int i =0 ; i< 30;i++){
videoList.Add(new VideoInfo(){Index=i, Name="Star Wars "+i%6, Rating=i%6});
}
var videos = GetTopRatedVideosUnique(videoList, 3);
for(int i = videos.Length-1;i >= 0;i--){
Console.WriteLine("Index: "+videos[i].Index+" Name: "+videos[i].Name+" Rating: "+videos[i].Rating);
}
return 0;
}
private static VideoInfo[] GetTopRatedVideosUnique(List<VideoInfo> list, int maxCount){
Dictionary<string,VideoInfo> _uniqueItems = new Dictionary<string,VideoInfo>();
Queue<VideoInfo> _queue = new Queue<VideoInfo>();
float maxRating = 0;
foreach(var vid in list){
//we assume that same title has same rating
if(vid.Rating > maxRating || (vid.Rating == maxRating && !_uniqueItems.ContainsKey(vid.Name))){
_uniqueItems.Add(vid.Name,vid);
_queue.Enqueue(vid);
maxRating = vid.Rating;
if(_queue.Count > maxCount){
_uniqueItems.Remove(_queue.Dequeue().Name);
}
}
}
return _queue.ToArray();
}
private static VideoInfo[] GetTopRatedVideos(List<VideoInfo> list, int maxCount){
Queue<VideoInfo> _queue = new Queue<VideoInfo>();
float maxRating = 0;
foreach(var vid in list){
if(vid.Rating >= maxRating){
_queue.Enqueue(vid);
maxRating = vid.Rating;
if(_queue.Count > maxCount){
_queue.Dequeue();
}
}
}
return _queue.ToArray();
}
I assume this should do. Group it based on video name as there may be duplicate video name, then add the rate.
Sort it by values in reverse and limit X number of records.
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class Top10RatesFromListOfObjects {
// Given a list L of video names and their watch rates,
// write a function that will return the videos with the top 10 watch rates.
// Video names may appear more than once.
private static List<Video> L;
public static void main(String[] args) {
L = Arrays.asList(new Video("A", 2), new Video("C", 5), new Video("B", 5), new Video("A", 4));
// Add more
Map<String, Integer> grouped = L.stream()
.collect(Collectors.groupingBy(Video::getName, Collectors.summingInt(Video::getRate)));
grouped.entrySet().stream()
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.limit(10)
.forEach(e -> System.out.println(e.getKey()));
}
private static class Video {
private String name;
private int rate;
Video(String name, int rate) {
this.name = name;
this.rate = rate;
}
public String toString() {
return "{" + name + ", " + rate + "}";
}
public String getName() {
return this.name;
}
public int getRate() {
return this.rate;
}
}
}
class VideoTop10 {
public static void main(String[] args) {
List<Video> L = new ArrayList<>();
L.add(new Video("Friends", 1234));
L.add(new Video("Snoopy", 23232));
L.add(new Video("Sienfield", 11234));
L.add(new Video("Games", 33));
L.add(new Video("Thoran", 3433));
L.stream()
.distinct()
.limit(10)
.sorted(Video::compareTo)
.forEach((video) -> System.out.println(video.name + "," + video.rate));
}
static class Video implements Comparable<Video> {
private String name;
private Integer rate;
public Video(String name, Integer rate) {
this.name = name;
this.rate = rate;
}
@Override
public int compareTo(Video o) {
return o.rate - this.rate;
}
}
}
use a MaxHeap based on Rate, and run:
1. pass through collection adding all elements to heap - O ( n )
2. Extract 10 max elements from heap.
Total time: O ( n )
In Java 8 Heap can be implemented as a PriorityQueue.
Example using Scala:
case class Video (name: String, rate: Int)
val sample = Seq(Video("Ghost", 56), Video("Titanic", 74), Video("Alien", 88), Video("Krappe", 18),
Video("Krappe", 18), Video("VideoA1", 43), Video("VideoA2", 63), Video("VideoA3", 58),
Video("Krappe2", 12), Video("VideoA4", 62), Video("VideoA5", 31), Video("VideoA6", 39))
def videoRatingOrderer(t: Video) = t.rate
def topTen() = {
val pq = collection.mutable.PriorityQueue()(Ordering.by(videoRatingOrderer))
pq.enqueue(sample: _*)
println(pq.take(10))
}
println(topTen())
like this
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <unordered_map>
#include <vector>
using namespace std;
vector<string> topNVideos(const vector<pair<string, unsigned int>>& videos,
size_t topN) {
// TODO: topN is a constant 10 in the given problem, this method only behaves
// nice if topN is small.
unordered_map<string, unsigned int> video_count;
vector<pair<string, unsigned int>> result_count;
for (const auto& v : videos) {
video_count[v.first] += v.second; // TODO: ensure no overflow
// notice here as well, the string as key, which makes the thing not O(n)
// but O(n*avg_string_length(video_title)), probabily in practice we would
// work with an integer e.g. a reference to the actual title ...
// As well notice the copying around of string's which is not nice
// maybe we would use somethign like std::string_view in practice with C++ 17.
}
for (const auto& vc : video_count) {
// insertion sort for small and constant topN ok, makes that thing
// O(N*10) which is O(N). There are better ways to cut off topN.
// (e.g. quicksorts partition, especially if topN doesn't need to be sorted)
result_count.push_back(vc);
for (size_t i = result_count.size() - 1; i > 0; --i) {
if (result_count[i].second <= result_count[i - 1].second) break;
swap(result_count[i], result_count[i - 1]);
}
if(result_count.size() > topN) {
result_count.pop_back();
}
}
vector<string> result(result_count.size());
transform(result_count.begin(), result_count.end(), result.begin(),
[](const pair<string, unsigned int>& vcount) -> string { return vcount.first; });
return result;
}
int main()
{
vector<pair<string, unsigned int>> movies = {
{"Rambo", 1},
{"Indiana Jones", 2},
{"Lethal Weapon", 2},
{"Indiana Jones", 5},
{"Rambo", 1},
{"Lethal Weapon", 4},
{"Casablanca", 7},
{"Star Wars", 9},
{"The Godfather", 15},
{"Wallstreet", 3}};
for(const string& video : topNVideos(movies, 4)) { // top 4 instead of top 10
cout << video << endl;
}
}
Use a hash table that hashes on the name and stores a count.
- DR milota September 05, 2018Once you have the counts you can do better than sorting using partition rather than sorting everything. This is an algorithm worth knowing for interviews. It is basically a partition sort but only partition branch that get you towards having the desired 10 on one side and don’t sort the items you do not care about.
You could also try using a max heap of size 10. Fill the heap and heapify it. Then run through the rest of the items. If a new item is bigger than the one on the top it replaces it otherwise go to the next one. With only 10 in the top the heap will have good cash locality especially if you only keep links to the names in the heap and then you will only get misses for the one time pass through the count list.
In both cases the counting will be what takes all the time or rather cash misses in the count hash table will be what eats it.