Facebook Interview Question for SDE1s


Country: United States




Comment hidden because of low score. Click to expand.
0
of 0 vote

/*
 *
 * Analyss
 * the snaphot is read-only. it doesn *NOT* change once created. we change only the latest map.
 * a lookup takes a snapshot as argument & searches the given snapshot.
 * note that a snap maintains *everything* that is already in the map. for space-efficiency, this MUST be shared with the latest (writeable) mapping.
 *  	so an entry might be added in some time & then searchable in multiple snaps following it, until its removed.
 * also note that "put()" can update the value of an existing mapping which was created in older snaphot. so we need to maintain BOTH values !!!
 *
 * Requirement
 * This question requires space-efficiency over running time.
 *
 * Solutions:
 * 1) a simple mapping without snaps is obviously implemented as a hash-map.
 * for snapshots, we can use multiple has-maps.
 * when a snapshot is taken, we cn copy all of the mapping.
 * in short: bad for space efficiency
 *
 *
 * 2) we can use a monotonically increasing generation number to attach a unique generation per snapshot.
 * we than define the key of a search DS as generation + mapping key. so we search in a specific snapshot using both generation & key
 *  in this case, deletion of a mapping might need to remove it if its was introduced in current snap (latest) or leave it in place for older snaps but designate it as non existent for latest generation. (e.g.: by decreasing its generation number by 1).
 * a binary search tree can fit this.
 * in short : too complex
 *
 * 3) consider a simple hash-map for the mapping of string->value in which we maintain for the "value" a DS which is searchable by generation.
 * to save memory, the DS will hold generation ranges (like a range tree). each range will point to the string which is the real result of the mapping for that generation.
 *  	id the mapping remained the same for multilpe snapshots, it will appear in a single range.
 *  	once its deleted, we check
 *  		if it was added in latest generation, we simply remove that generation indication. this means that during a snapshot, the range of generations must end in "latest_generation - 1" bcz its not valid for current generation.
 *  			if its added back, we add it again & it becomes valid for the latest generation range,
 * since the value of the same key can change between generations, we need the value of the main hash-map to be a searchable DS in which we search by generation
 * to save memory of the value that remains the same between generations, an entry will designate a range of generations along with the pointer to the string.
 *  	in real-life, the string would be shared between generations that have the same value (e.g.: all generation ranges point to same string)
 *  	or better, since key & values will probably remain for long (in many snapshots, until a snap is deleted), we can put them all in a set & always point at them so the string is shaed by all users, be them a key, the values of different keys, the value of same key but in different generations
 *  		that set of strings can be managed using a ref-count on each string, for simplicity.
 *
 * so the hashmap will always contain just a single mapping of each mapping that was ever inserted (this is a must as snaps remain forver with this API)
 * 		we actualy delete a mapping only if it was added in latest generation & removing it from that generations makes it invalid of ANY generation.
 *  	but when we do lookup, once a key was found, we check whether the snapshot in which the search was made consideres that mapping as valid or not. if not, we dont return the mapping even though its found within the overall hash-map.
 * for the validity of a value per snaps, this can be maintain in many variations:
 * a) a bitmap, in case there arent so many snaps, yet each key participate in many snaps.
 * 		otherwise, it wastes memory for long bitmaps, though the compactness of representation might make it valid for up to 1000 snaps considering pointer-based DS have a lot of extra memory for pointers.
 * b) a hash-map, in case a mapping would be valid for just a few generations while the number of overall generatons is very big.
 *  	bcz taking a snapshot requires adding the mapping to new generation mapping of that key, for any mapping that is valid for latest generation.
 *  	so a generation-id might be inserted (as key) to the multiple times when it remains valid in successive snapshots.
 * c) a range tree for an adaptive mapping.
 *
 * Design (3)
 * the main DS is a hash-map, mapping a string -> set of {generation range, string}
 *  	key is a string
 *  	value is a set (keyeed by generation) of :
 *  		1) generations id : in which this key-value mapping is considered valid (i.e.: will be found)
 *  		2) string : the value of mapping, given in put(key, value) API.
 * init()
 *  	set current_generation = 0
 *  	initialize hash-map (empty)
 *
 * put()
 *  	searche for the mapping.
 *  	if its not found
 *  		add the mapping
 *  		set its first generation range as spanning the current (latest) generation & upwards (open ended)
 *  	else
 *  		if key valid in latest generation
 *  			if key has same value
 *  				return
 *  			else
 *  				update last range to end in previous range.
 *  				if last range becomes empty update remove it.
 *  				add new range for current generation with new value string (if removed previous, we can reuse instead of delete & new)
 *  		else
 *  				add new range for current generation with new value string
 *
 * delete()
 *  	search for the mapping.
 *  	if its not found
 *  		return
 *  	// key was found, check validity in latest range
 *  	if key isnt valid in latest generation
 *  		return
 *  	// key is valid in latest generation
 *  	if range contains only latest generation
 *  			remove the whole range.
 *  			if this was the only range, remove the whole mapping
 *  		else
 *  			otherwise, mark the value as valid until previous generation.
 *
 * get()
 *  	searche for the mapping.
 *  	if its not found
 *  		return
 *  	else
 *  		search for specified snapshot generation (range that contains it)
 *  		if not found
 *  			return
 *  		return value attached to the generation range.
 *
 * take_snapshot()
 * #ifdef USING_CLOSED_ENDED_RANGES
 *  	iterate over all mappings & for each
 *  		if its latest range is current increment it by 1, to make it valid for next generation
 * #endif
 *  	return latest_generation ++
 *
 * deleteSnapshot()
 *  	iterate over all mappings & for each:
 *  		search generation id of the reuqired snap
 *  		if not found
 *  			continue (its not valid in that snapshot, so not affected in any way)
 *  		if deleted generation is the only generation in found range
 *  			delete the range we found
 *  			if there are no more ranges in main mapping (empty set of ranges)
 *  				remove whole mapping
 *  			continue
 *  		if deleted generation is first generation in found range
 *  			shorten found range from lower end, i.e.: increase lower end of range by 1
 *  			continue
 *  		if deleted generation is last generation in found range
 *  			shorten found range from higher end, i.e.: decrease higher end of range by 1
 *  			continue
 *  		// break range into 2 ranges
 *  		copy found range. we now have A & B
 *  		del_gen = the generation to be deleted
 *  		update A's end-of-range to be (del_gen - 1)
 *  		update B's start-of-range to be del_gen + 1)
 *  		add the new copy to mapping in *proper* position
 *  		NOTE: update of range (which is a key) of range that is already in set might require a trick (possibly remove & insert again.
 *
 * Implementation
 * the set of {generation-range, string} cannot be implemented using a std::set<> bcz
 * 		1) we need search based on containment comparator while ordering based on non-intersecting ranges.
 *  		this might work with single comparator that treats A == B even when A is contained in B bcz this is used only in search.
 *  		but this is risky bcz some API's might not take it or there is a hidden assumption in the code.
 *  	2) any range of generation-id that contains the latest-generation, needs to be maintained open-ended, bcz as the latest generation increases, these ranges include new generations implicitly, without the need to update them.
 *  		if we had a more encapsulated DS, since this question asks for memory efficiency & allowe sacrificing speed, a std:set<> could use a more sophisticated comparator or just set all ranges to end at the latest generation & then update all of then when the latest generation increases.
 *  	a fast solution is to use a vector for the set & then search with lower_bound while insertion always adds new ranges at the end of the vector & thus maintains ordering of ranges (ascending generation id)
 *  	if the cost of vector reallocatin (when adding new generation range) or squeeze (when deleting a snapshot) is too much, we can use a linked list instead of vector. but the search of a range of the list is then linear time search
 *
 * picks:
 * 1) i use a list for the ranges in which a key is valid. this means memory allocation is in exact match to number of ranges (unlike with a vector whose size doubles when reallocated).
 * 2) i pick here to use open ended ranges, so end of range with latest generation is open on the higher end. saves update time
 */


#include <unordered_map>
#include <list>
#include <string>
#include <cassert>
#include <limits>


using namespace std;

class snap_map {
public:
typedef unsigned long long 				gen_id;		// generation id
typedef gen_id							Snapshot;	// a snapshot only maintains the generation id

protected:
static const gen_id						gen_id_max;	// the maximum gen_id value. setting the end-of-range with this ensures the range is in fact open-ended, until we close it.

// properties of a range of generation id
struct gen_range {
	gen_id	start_gen_id;
	gen_id	last_gen_id;
	string	value;
gen_range(gen_id	_start_gen_id,
		  gen_id	_last_gen_id,
		  string	_value)
: start_gen_id(_start_gen_id), last_gen_id(_last_gen_id), value(_value)
{
}

gen_range(const gen_range		&o)
: start_gen_id(o.start_gen_id), last_gen_id(o.last_gen_id), value(o.value)
{
}

bool is_contain_gen(gen_id	gen) const
{
	if ((gen >= start_gen_id) &&
		(gen <= last_gen_id))
		return true;
	return false;
}

};	// end of class 'gen_range'


typedef unordered_map<string, list<gen_range>>		main_map;

protected:
main_map	map;
gen_id		latest_gen_id;

public:
snap_map()
: map(), latest_gen_id(0)
{
}


// insert/update a mapping in latest generation
void put(string	&key,
		 string	&value)
{
	main_map::iterator	fit = map.find(key);
	if (fit == map.end()) {
		// no such mapping, in any generation
		list<gen_range>		l;
		l.push_back(gen_range(latest_gen_id, gen_id_max, value));
		map.insert(make_pair(key, l));
		return;
	}
	list<gen_range>	&lr = fit->second;
	assert(!lr.empty());
	gen_range			&last_r = lr.back();	// pick last range in which the key was valid
	if (last_r.is_contain_gen(latest_gen_id)) {
		if (last_r.value == value) {
			// no need to set value since its already with same value
			return;
		} else {
			if (last_r.start_gen_id == latest_gen_id) {
				// reuse last range for new value, as it became empty
				last_r.value = value;
			} else {
				// shorten last range
				last_r.last_gen_id = latest_gen_id - 1;
				assert(last_r.last_gen_id >= last_r.start_gen_id);
				// add a new range
				lr.push_back(gen_range(latest_gen_id, gen_id_max, value));
			}
		}
	} else {
		// add a new range
		lr.push_back(gen_range(latest_gen_id, gen_id_max, value));
	}
}

// delete a key mapping from latest revision only
void del(string 	&key)
{
	main_map::iterator	fit = map.find(key);
	if (fit == map.end()) {
		// no such mapping, in any generation
		return;
	}
	list<gen_range>	&lr = fit->second;
	assert(!lr.empty());
	gen_range			&last_rng = lr.back();	// pick last range in which the key was valid
	if (!last_rng.is_contain_gen(latest_gen_id)) {
		// key isnt valid in latest generation - nothing to delete
		return;
	}
	// key is valid in latest generation
	if (last_rng.start_gen_id == latest_gen_id) {
		// range contains only latest generation
		lr.pop_back();
		if (lr.empty()) {
			map.erase(key);
		}
	} else {
		last_rng.last_gen_id = latest_gen_id - 1;
	}
}

// find a range that contains the requested generation within the list & return iterator that points at it, l.end() if not found
list<gen_range>::iterator find_range_iterator_by_id(list<gen_range>	&l,
													gen_id			gen)
{
	for (auto it = l.begin(); it != l.end(); ++it) {
		gen_range	&rng = *it;
		if (rng.is_contain_gen(gen)) {
			return it;
		}
	}
	return l.end();
}

// find a range that contains the requested generation within the list, nullptr if not found
gen_range* find_range_by_id(list<gen_range>	&l,
							gen_id			gen)
{
	auto it = find_range_iterator_by_id(l, gen);
	if (it == l.end())
		return nullptr;
	return &(*it);
}

const string *get(string	&key,
				  Snapshot	gen)
{
	auto	fit = map.find(key);
	if (fit == map.end()) {
		// no such mapping, in any generation
		return nullptr;
	}
	list<gen_range>	&lr = fit->second;
	assert(!lr.empty());
	gen_range	*rng = find_range_by_id(lr,	gen);
	if (rng == nullptr)
		return nullptr;
	assert(rng->is_contain_gen(gen));
	return &rng->value;
}

Snapshot take_snapshot(void)
{
	return latest_gen_id ++;
}

void del_snapshot(Snapshot 	gen)
{
	list<string>	to_del;	// maintain keys to be deleted, so we dont invalidate map iterator

	for (auto mit = map.begin(); mit != map.end(); ++mit) {
		list<gen_range>	&lr = mit->second;
		assert(!lr.empty());
		list<gen_range>::iterator lit = find_range_iterator_by_id(lr, gen);
		if (lit == lr.end())
			continue;
		if ((lit->start_gen_id == gen) &&
			(lit->last_gen_id  == gen)) {
			// deleted generation is the only generation in found range - remove range
			lr.erase(lit);
			if (lr.empty())
				to_del.push_back(mit->first);
			continue;
		}
		if (lit->start_gen_id == gen) {
			// deleted generation is first generation in found range - shorten range
			assert(lit->last_gen_id > gen);
			lit->start_gen_id ++;
			continue;
		}
		if (lit->last_gen_id == gen) {
			// deleted generation is last generation in found range
			assert(lit->start_gen_id < gen);
			lit->last_gen_id --;
			continue;
		}
		// break range into 2 ranges - copy orig to hi & fix orig to become low range
		gen_range	hi_rng(*lit);
		// update existing range in-place to become the low part of range
		lit->last_gen_id = gen - 1;
		assert(lit->last_gen_id >= lit->start_gen_id);
		// update copy to become hi part of range & insert to list
		hi_rng.start_gen_id = gen + 1;
		assert(hi_rng.start_gen_id <= hi_rng.last_gen_id);
		lr.insert(lit, hi_rng);
	}

	// delete all mappings that became empty
	while (!to_del.empty()) {
		string	&key = to_del.front();
		map.erase(key);
		to_del.pop_front();
	}
}

};	// end of class 'snap_map'

const snap_map::gen_id 	snap_map::gen_id_max = numeric_limits<gen_id>::max();


int main ()
{
	snap_map 	map;
	string	k1("k1"), k2("k2"), k3("k3"), k4("k4");
	string	v1("v1"), v2("v2"), v3("v3"), v4("v4");

	map.put(k1, v1);
	map.put(k2, v2);
	map.put(k3, v3);
	snap_map::Snapshot	snap1 = map.take_snapshot();	//Snapshot -> snap1 { k1 -> v1, k2 -> v2, k3 -> v3 }
	assert(*map.get(k1, snap1) == v1);
	map.put(k1, v4);
	map.del(k3);
	snap_map::Snapshot snap2 = map.take_snapshot();	//eSnapshot -> snap2 { k1 -> v4, k2 -> v2 }
	assert(*map.get(k1, snap2) == v4);
	assert(*map.get(k1, snap1) == v1);
	assert( map.get(k3, snap2) == nullptr);
	map.del_snapshot(snap1);
	assert( map.get(k1, snap1) == nullptr);

	return 0;
}

- Eitan BA June 09, 2018 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More