Design a cache module for an i

Google Interview Question for Software Engineer in Tests

1

of 1 vote

16
Answers
Design a cache module for an image server. The server accepts image requests from users and sends them back the images. The cache should always hold in-memory the 10 most recently requested images. The cache should also support multiple requests simultaneously
- assaf.keret1 June 28, 2015 in Switzerland | Report Duplicate | Flag | PURGE
Google Software Engineer in Test Coding

Email me when people comment.

An error occurred in subscribing you.

Country: Switzerland
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 1 vote

Use a LinkedListHashMap to keep a list of keys ordered by insertion.
Every time you reach the limit (10 in this case), you remove the first inserted key (older one)

public static final class BoundedLinkedListHashMap<Key, Value> {
        private LinkedHashMap<Key, Value> map = new LinkedHashMap<>();
        private int N;

        public BoundedLinkedListHashMap(int N) {
            this.N = N;
        }

        public synchronized void put(Key k, Value v) {
            map.put(k, v);

            if (map.keySet().size() > N) {
                removeLast();
            }
        }

        public synchronized Value get(Key k) {
            return map.get(k);
        }

        private void removeLast() {
            map.remove(map.keySet().iterator().next());
        }
    }

- Felipe Cerqueira June 28, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

The method removeLast should me renamed to removeFirst or removeOlder to be better.

- Felipe Cerqueira June 28, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

But doesnt this not support concurrent reads? Dont you want multiple reads to happen simultaneously?

- CJ August 02, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

I did this in Go by using a combination of a map and container/list, which implements a linked list to create an LRU cache. It's made threadsafe by embedding a sync.Mutex. Assuming "the 10 most recently requested images" means that we should bump up existing entries when they're requested again, every request needs to Lock() the mutex to avoid race issues. If, on the other hand, it's just the age of the images that matters, this can be converted to use a sync.RWMutex and only cache misses need to Lock().

For this example, I'm mocking up the image data with just a call to time.Now().UnixNano()

package main

import (
	"container/list"
	"fmt"
	"sync"
	"time"
)

const (
	cacheSize = 10
)

type Image struct {
	name string
	data string
}

type ImageCache struct {
	sync.Mutex
	imgIdx  map[string]*list.Element
	imgList *list.List
}

func NewImageCache() *ImageCache {
	return &ImageCache{
		imgIdx:  make(map[string]*list.Element),
		imgList: list.New(),
	}
}

func (ic *ImageCache) Request(imageName string) string {
	ic.Lock()
	defer ic.Unlock()
	i, ok := ic.imgIdx[imageName]
	//If we requested an image we don't have
	if !ok {
		//Have we hit the maximum size?
		if ic.imgList.Len() >= cacheSize {
			//remove the oldest entry from the list and map
			delete(ic.imgIdx, ic.imgList.Remove(ic.imgList.Back()).(*Image).name)
		}
		newImage := &Image{
			name: imageName,
			data: fmt.Sprintf("%s:%d", imageName, time.Now().UnixNano()),
		}
		//push the new image to the front and update the map
		ic.imgIdx[imageName] = ic.imgList.PushFront(newImage)
		return newImage.data
	} else {
		//push the accessed image to the front
		ic.imgList.MoveToFront(i)
		return i.Value.(*Image).data
	}
}

func main() {
	//runtime.GOMAXPROCS(runtime.NumCPU())
	ic := NewImageCache()
	//wg := new(sync.WaitGroup)
	for i := 0; i <= cacheSize; i++ {
		//	wg.Add(1)
		//	go func(i int) {
		fmt.Println(ic.Request(fmt.Sprintf("image%d", i)))
		//		wg.Done()
		//	}(i)
	}
	//wg.Wait()
	fmt.Println("---")
	for i := cacheSize; i >= 0; i-- {
		fmt.Println(ic.Request(fmt.Sprintf("image%d", i)))
	}
}

- alaska July 01, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

public class ImageCacheRetriever {
    private static final int CACHE_MAX_SIZE = 10;

    ImageCache cache = new ImageCache(CACHE_MAX_SIZE);

    public Image retrieve(final long id) {
        Image cacheImage = cache.getImage(id);
        if (cacheImage == null) {
            final Image diskImage = getFromDisk(id);
            new Thread( new Runnable() {
                @Override
                public void run() {
                    cache.put(id, diskImage);
                }

            }).run();
            return diskImage;
        }
        return cacheImage;
    }

    private Image getFromDisk(long id) {
        return new Image();
    }
}

class ImageCache {

    private ImageCacheEntry[] cache;
    int cacheSize = 0;
    int cacheHead = 0;
    private int cacheMaxSize;
    Object monitor = new Object();

    public ImageCache(int cacheMaxSize) {
        this.cacheMaxSize = cacheMaxSize;
        cache = new ImageCacheEntry[cacheMaxSize];
    }

    public Image getImage(long id) {
        synchronized (monitor) {
            for (int i = 0; i < cacheSize; i++) {
                if (cache[i].id == id) {
                    return cache[i].data;
                }
            }
            return null;
        }
    }

    public void put(long id, Image data) {
        synchronized (monitor) {
            ImageCacheEntry newEntry = new ImageCacheEntry(id, data);
            if (cacheSize < cacheMaxSize) {
                cache[cacheSize] = newEntry;
            } else {
                cache[cacheHead] = newEntry;
                cacheHead = (cacheHead + 1) % cacheMaxSize;
            }
        }
    }

    public static class ImageCacheEntry {

        protected long id;
        protected Image data;

        public ImageCacheEntry() {
            id = -1;
        }

        public ImageCacheEntry(long id, Image data) {
            this.id = id;
            this.data = data;
        }
    }
}

- ctash June 28, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

how do you store users and their 10 recent requests in an efficient manner, also isn't requests for each individual user in which case you don't need monitors

- monica_shankar June 29, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Just explain the logic before jumping to coding part

- worried June 29, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

explain to logic first before jumping to coding

- coderrrr June 29, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

The implemented cache does not hold the most-recent requests. A proper most-recently used cache should implement a priority queue where the removed item is the least recently used.

- ittai gilat June 30, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 2 votes

This gives you O(N) for searching for a hit. Not very efficient. You can use a HashMap to actually store cached objects and a Queue (which you already have) to keep track of objects arrival times

- Mohammad July 04, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

public class ImageCacheRetriever {
    private static final int CACHE_MAX_SIZE = 10;

    ImageCache cache = new ImageCache(CACHE_MAX_SIZE);

    public Image retrieve(final long id) {
        Image cacheImage = cache.getImage(id);
        if (cacheImage == null) {
            final Image diskImage = getFromDisk(id);
            new Thread( new Runnable() {
                @Override
                public void run() {
                    cache.put(id, diskImage);
                }

            }).run();
            return diskImage;
        }
        return cacheImage;
    }

    private Image getFromDisk(long id) {
        return new Image();
    }
}

class ImageCache {

    private ImageCacheEntry[] cache;
    int cacheSize = 0;
    int cacheHead = 0;
    private int cacheMaxSize;
    Object monitor = new Object();

    public ImageCache(int cacheMaxSize) {
        this.cacheMaxSize = cacheMaxSize;
        cache = new ImageCacheEntry[cacheMaxSize];
    }

    public Image getImage(long id) {
        synchronized (monitor) {
            for (int i = 0; i < cacheSize; i++) {
                if (cache[i].id == id) {
                    return cache[i].data;
                }
            }
            return null;
        }
    }

    public void put(long id, Image data) {
        synchronized (monitor) {
            ImageCacheEntry newEntry = new ImageCacheEntry(id, data);
            if (cacheSize < cacheMaxSize) {
                cache[cacheSize] = newEntry;
            } else {
                cache[cacheHead] = newEntry;
                cacheHead = (cacheHead + 1) % cacheMaxSize;
            }
        }
    }

    public static class ImageCacheEntry {

        protected long id;
        protected Image data;

        public ImageCacheEntry() {
            id = -1;
        }

        public ImageCacheEntry(long id, Image data) {
            this.id = id;
            this.data = data;
        }
    }

}

- ctash June 28, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

The images are SHA'ed and their long ids are obtained, each user has a unique user id which is stored in dictionary order. Each user has a hash table of 10 recent image requests <Integer, Long ids>. If excess, the least recent one is deleted. This is just image retrieval hence I don't think synchronization is necessary. Please correct me if I am wrong.

- monica_shankar June 29, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Assuming each image is queried by it's path, I would use two data structures -

1. HashMap<String(path), Blob(data)>
2. DoublyLinkedList<String(path)>

Explanation:

When a request for an image path is received, check in Hashmap -

if the path already exists
(a) remove the path from Doubly linked list and put it in the beginning
(b) return blob data of the image
if the path doesn't exist in HashMap
(a) fetch the image data
(b) put <path, blob_data_of_the_image> in HashMap
(c) remove the last element from Doubly linked list
(d) insert path in the beginning of Doubly linked list
(e) return blob data

- Virupaksha Swamy October 19, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Also remove from HashMap the key that was found by removing after step(c)

- Virupaksha Swamy October 19, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Some brain storm. The challenge is to support multi-threading. Besides lock, I am thinking about a lock free design.

1. use below data structure:

a. a hashmap (let's call it image map) whose key is the image name and the value is the double linked list node. Assumption: each image has different name so names can be used as hash key.
b. a double linked list (let's call it position list) stores the image name and the image index in the online image array.
c. two image arrays. Any time one array is online serving the requests and the other is offline processing the write operations (e.g. image adding/deletion).
d. an image write operation buffer. Just like a queue. All image write operation is put in this buffer.
e. an image write thread responsible for handling the write request in the buffer and update the offline image array. After updating, this thread put the original online array offline and the original offline array online. This design ensures the high availability - the cache always responds the image read request.

The image map and position list is immutable. Each request thread will clone and maintain its own copy of image map and position list. These two pieces of data are small so it is ok to clone repeatedly.

The image array could be big so it is shared. But there is no lock for it because: (1) we use two rotating image array for writing. One thread is handling these requests in a serialized way. (2) there could be chances that a thread tries to read an image which has been replaced/deleted in the online image array. This is a cache miss. It is ok to return "not found" by comparing the image name from the position list node with that from the image array. It should never return a wrong image. In this case, the request thread should be able to look up the online image array to check if the wanted image is in the cache but just in a different slot. If not, send an image adding request to the write buffer and return.

For all image write operations, the request threads just fire and forget.

This is a start and a lot of more details to add. For example, how the request thread update the hash table and position list if the image is deleted or changing the image array slot. How to reduce cache miss.

- jiangok2006 November 11, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

#written code in python using deque.
 from collections import deque
 class Cache10Images(object):
	def __init__(self):
		self.image = []
		self.imagelist = deque(self.image)
		
	def append(self, new_image):
		self.imagelist.appendleft(new_image)
		if len(self.imagelist) >= 10: # as 0 is the index
			self.imagelist.pop()
	def get(self, image_at_position):
		return self.imagelist[image_at_position]

- rampappula April 16, 2017 | Flag Reply

CareerCup

Google Interview Question for Software Engineer in Tests

Books

Videos

Resume Review

Mock Interviews