Amazon Interview Question
SDE1s
Team: Advertising
Country: United States
Interview Type: In-Person
It seems like one issue is sharing the same set of ad IDs across keys (queries). If the intersection of the lists across all the queries is a large set, then reducing duplication would be important. If you maintain a list<long> for each query, you could have a large number of duplicate ad IDs across lists. A list<long*> doesn't save space, since each pointer is 64 bits pointing at a 64-bit long. But with a list<short>, each short is a key into a map that contains the associated 64-bit ad ID, so each distinct ID is stored only once.
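The list<short> indirection above can be sketched roughly as follows. This is a minimal illustration (class and method names are my own, not from the original): a pool stores each distinct 64-bit ad ID once, and per-query lists would hold 16-bit indices into the pool instead of full longs.

```java
import java.util.*;

// Sketch of the ad-ID interning idea: each distinct 64-bit ad ID is
// stored once in a pool; per-query lists hold 16-bit indices into
// the pool instead of full 64-bit values.
public class AdIdPool {
    private final List<Long> idsByIndex = new ArrayList<>();
    private final Map<Long, Short> indexById = new HashMap<>();

    // Intern an ad ID, returning its 16-bit pool index.
    public short intern(long adId) {
        Short idx = indexById.get(adId);
        if (idx != null) return idx;
        if (idsByIndex.size() > 0xFFFF)
            throw new IllegalStateException("pool full: more than 65536 distinct ad IDs");
        short newIdx = (short) idsByIndex.size();
        idsByIndex.add(adId);
        indexById.put(adId, newIdx);
        return newIdx;
    }

    // Resolve a 16-bit index back to the 64-bit ad ID.
    public long resolve(short index) {
        return idsByIndex.get(index & 0xFFFF); // mask handles negative shorts
    }
}
```

The obvious caveat: a 16-bit index caps the pool at 65,536 distinct ad IDs, so this only pays off when the shared set is that small; otherwise an int index gives the same deduplication at half the savings.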
Yes, definitely. It could be partial. Depending on the constraints/requirements of your application, a search query may return a huge number of ad IDs. In order not to fill up the in-memory cache quickly, we use pagination. Pagination introduces two extra parameters to the system, limit and offset (naming can differ): limit is the number of elements returned from the query at a time (ad IDs in this case), and offset is where the search should continue from.
Having said that, let's walk through an example.
<HOST>/search?q="Comedy"&offset=0&limit=25 => this will return the first 25 ad IDs from this search query
<HOST>/search?q="Comedy"&offset=25&limit=25 => this will return the next 25 ad IDs for the same search query.
(Think of offset and limit as automatically populated by the UI.)
These two queries will stay in the cache as separate entries. Whenever the same search query is requested, the same cached result will be delivered.
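For those separate entries, the cache key has to combine the query with its pagination parameters; otherwise page 1 and page 2 of "Comedy" would collide. A minimal sketch of such a key (class name is my own, not from the original):

```java
import java.util.Objects;

// Hypothetical cache key combining query, offset, and limit, so that
// different pages of the same search query map to distinct entries.
public final class SearchKey {
    final String query;
    final int offset;
    final int limit;

    SearchKey(String query, int offset, int limit) {
        this.query = query;
        this.offset = offset;
        this.limit = limit;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SearchKey)) return false;
        SearchKey k = (SearchKey) o;
        return offset == k.offset && limit == k.limit && query.equals(k.query);
    }

    @Override public int hashCode() {
        return Objects.hash(query, offset, limit);
    }
}
```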
The question then becomes: when does the result of a search query get refreshed in the cache if it is accessed all the time; in other words, if LRU hasn't discarded it for a long time?
To cover this scenario, I'd add a TTL (time to live) to the solution I provided earlier. A single TTL can be used globally, or a TTL per search-query entry. Let's consider a global TTL. Say we want our customers to see newly added ads right away. We can keep the creation time per search-query entry in the cache, and once the TTL expires we can easily invalidate the search result and remove it from the cache. [removeEldestEntry is a good place to do this] When the same search query is requested again, the fresh result is populated into the cache and used for the next TTL period.
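The global-TTL idea can be sketched like this (a minimal illustration, names are my own): each cached value carries its creation time, and a lookup that finds an expired entry removes it and returns null, forcing the caller to re-run the search and repopulate the cache.

```java
import java.util.*;

// Sketch of a global TTL on cache entries: each value is stored with
// its creation timestamp; lookups treat entries older than ttlMillis
// as absent and evict them.
public class TtlCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long createdAt;
        Entry(V value, long createdAt) { this.value = value; this.createdAt = createdAt; }
    }

    private final long ttlMillis;
    private final Map<K, Entry<V>> map = new HashMap<>();

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    // Returns null when absent or expired; an expired entry is removed,
    // so the next put() stores a fresh query result.
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() - e.createdAt > ttlMillis) {
            map.remove(key);
            return null;
        }
        return e.value;
    }
}
```

In the answer's actual design the expiry check would live alongside the LRU bookkeeping (e.g. inside removeEldestEntry, as suggested above) rather than in a standalone class; this sketch just isolates the TTL mechanics.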
Implementation of LRU (least recently used) cache
I kept the cache size small for simplicity. Over time there would be more readers than writers of the cache, so if multiple readers can read the cache concurrently (as far as the locking scheme allows), cache performance improves. We can therefore choose either ConcurrentHashMap (lock striping) or a ReentrantReadWriteLock, as coded below.
However, if we choose ConcurrentHashMap, we lose the LRU capability that comes in handy with LinkedHashMap.
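The referenced code isn't shown, so here is a minimal sketch of what such an LRU cache could look like: a LinkedHashMap in access order with removeEldestEntry for eviction, guarded by a ReentrantReadWriteLock. One subtlety worth noting: with accessOrder=true, get() reorders entries internally, so it must take the write lock; the read lock only helps for genuinely non-mutating calls such as size() or containsKey().

```java
import java.util.*;
import java.util.concurrent.locks.*;

// Minimal LRU cache sketch: LinkedHashMap with accessOrder=true moves
// each accessed entry to the tail, and removeEldestEntry evicts the
// head (least recently used) once capacity is exceeded.
public class LruCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> map;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public LruCache(int capacity) {
        this.capacity = capacity;
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LruCache.this.capacity;
            }
        };
    }

    public V get(K key) {
        lock.writeLock().lock(); // get() mutates access order, so it needs the write lock
        try { return map.get(key); } finally { lock.writeLock().unlock(); }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();
        try { map.put(key, value); } finally { lock.writeLock().unlock(); }
    }

    public int size() {
        lock.readLock().lock(); // size() doesn't mutate, so readers can share
        try { return map.size(); } finally { lock.readLock().unlock(); }
    }
}
```

Because nearly every operation ends up behind the write lock, the read/write split buys less here than the comment above suggests; that trade-off is exactly why ConcurrentHashMap looks tempting despite losing the built-in LRU ordering.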
- Hope September 26, 2014