Amazon Interview Question for SDE-3s


Country: India
Interview Type: In-Person





Regarding storage optimization: we can compress the lists of indices, e.g. by storing deltas instead of absolute values, something like writing 38447347 (32 bits), +1 (2 bits), +1 (2 bits), instead of 38447347 (32 bits), 38447348 (32 bits), 38447349 (32 bits).

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using namespace std;

class QueryVector {
	public:
		// Build the word -> indices map in one O(n) pass over the input.
		QueryVector(vector<string> const &words)
		{
			for (size_t i = 0; i < words.size(); ++i) {
				index_[words[i]].push_back(i);
			}
		}
		// Concatenate the index lists of all queried words;
		// words that never occurred contribute nothing.
		vector<int> GetIndices(vector<string> const &words) const
		{
			vector<int> out;
			for (auto const &w : words) {
				auto it = index_.find(w);
				if (it != index_.end()) {
					out.insert(out.end(), it->second.begin(), it->second.end());
				}
			}
			return out;
		}

	private:
		unordered_map<string, vector<int>> index_;
};

int main()
{
	QueryVector qv({"an", "apple", "day", "keeps", "doctor", "away", "apple", "iphones", "cool", "doctor", "recommends"});

	vector<vector<string>> in = {
		{"apple"},
		{"doctor", "cool"},
		{"random"},
		{"random","keeps","day","doctor"}
	};

	for (auto &a : in) {
		for (int i : qv.GetIndices(a)) {
			cout << i << ", ";
		}
		cout << "\n";
	}
}

- Alex December 01, 2017

The interviewer was not happy with the solution I gave using a map; his concern was that it wouldn't scale.

I talked about using a trie; he then asked how I would scale it, and how I would make sure it suffices for millions of strings.

Another concern was what to do if the whole input can't be loaded into memory.

- CoolGuy December 02, 2017

To scale, we can shard the data. E.g. one server holds the QueryVector object for all words beginning with "a"-"ad", the second server "ae"-"ag", ..., the last server "zu"-"zz". The server that receives the query can issue simultaneous requests to the servers holding the queried words and merge the results.

- Alex December 02, 2017

Each node of a trie holds 26 child pointers, so 26 * 8 = 208 < 2^8 bytes per node. Assume the longest word you'll have to index is 32 = 2^5 characters (Wiki has longer "coined" words, but we can ignore them for this discussion). A single word then contributes at most 2^5 nodes to the trie, and shared prefixes make the real trie much smaller. You also have to store the positions of each word at its trie leaf: assuming millions of distinct words (1M < 2^20), each appearing thousands of times (say 2^10 occurrences, at 4 = 2^2 bytes per int), the position lists dominate.

A rough upper bound on the memory footprint:
- trie nodes: 2^20 words * 2^5 nodes * 2^8 bytes = 2^33 bytes = 8 GB (far less in practice, thanks to shared prefixes)
- position lists: 2^20 words * 2^10 occurrences * 2^2 bytes = 2^32 bytes = 4 GB

That is workable on a single large machine, but it can be significantly reduced if we notice that a major contributor is the storage of the lists of ints, which can be taken out of this data structure.

The alternate scheme is to split the "index" and the "occurrences" into two parts. The index is a trie that points into the occurrences store. Each pointer refers to a linked list stored in an implicit data structure such as a contiguous array (i.e. allocate a block of 32, 128, 512... ints, where the last one points to the next block; the rest of the block holds the indices of the occurrences of the word).

The scheme above has the advantage that the "index" is roughly fixed-size and depends only on your dictionary, while the occurrences store grows amortized-linearly with the length of the input. You could theoretically scale the occurrences store horizontally and put a lookup service in front of it. You could also use a Bloom filter to quickly return empty results instead of searching the trie.

The scheme sounds elaborate, but Lucene uses a similar idea to maintain its index (I can't find the source, and am actually not even very sure, but I vaguely remember that it does). It also uses delta encoding for the ints (i.e. occurrences at, say, 100, 128, 132, 158, 1158... are encoded as 100, 28, 4, 26, 1000 and so on, to save some space).

- Ankola December 08, 2017

We can use a trie for this. Each leaf can hold the list of indices at which the word occurs.

class TrieNode {

    TrieNode[] children = new TrieNode[26]; // one link per letter
    boolean isLeaf;                         // true if a word ends here
    List<Integer> pos;                      // indices of that word in the input

}

- Mani K December 23, 2017

How about using a hash table to maintain the input data? Parse the input vector once in O(n) and put each input string in the hash table (use chaining), together with the indices at which it appears in the vector (e.g. 1 and 6 in the case of "apple").
Whenever GetIndices() gets called, it hashes each query string and fetches the indices stored there. It's not exactly O(1), but expected O(1) per lookup is close to it.

- Punit December 04, 2017

