Second Attempt
We have 2 data centers with equivalent datasets and a system that keeps them approximately in sync. Example table:
 Second Attempt in United States
aaa.aardvark.com 10.10.10.10
bbb. ..
www.
Datacenter 1 has :
aaa
bbb
ddd
...
Datacenter 2 has:
aaa
ccc
ddd
...
zzz
We know the data is never perfectly consistent and we need to
find up to 100 errors (there are at most 100 out of a billion entries). We'd want to find bbb and ccc as the inconsistent entries. How can we diff that much data across an ocean?
Amazon Software Engineer / Developer Algorithm
Write a hash table (with templates) and implement:
 Second Attempt in United States for Platform Engineering
1) Insert(key,value)
2) Delete(key)
3) Get(key)
Focused on collision resolution, chaining practices, and concurrency. I added locality of reference to speed up get(key).
Software Developer
Write a hashmap class and implement insert, get, and delete (open addressing/chaining). What should be considered before going to production (focus on concurrency)?
 Second Attempt in United States for platform engineering
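As a sketch of what such a hashmap might look like: a minimal chained version with no resizing and no locking (all names here are mine, not from the interview):

```cpp
#include <functional>
#include <list>
#include <optional>
#include <utility>
#include <vector>

// Minimal chained hash map sketch (no resizing, no thread safety).
template <typename K, typename V>
class ChainedHashMap {
    std::vector<std::list<std::pair<K, V>>> buckets;
    std::hash<K> hasher;

    std::size_t index(const K& key) const { return hasher(key) % buckets.size(); }

public:
    explicit ChainedHashMap(std::size_t n = 64) : buckets(n) {}

    void insert(const K& key, const V& value) {
        auto& chain = buckets[index(key)];
        for (auto& kv : chain)
            if (kv.first == key) { kv.second = value; return; }  // overwrite existing key
        chain.emplace_back(key, value);
    }

    std::optional<V> get(const K& key) const {
        for (const auto& kv : buckets[index(key)])
            if (kv.first == key) return kv.second;
        return std::nullopt;
    }

    bool erase(const K& key) {
        auto& chain = buckets[index(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) { chain.erase(it); return true; }
        return false;
    }
};
```

For production, the per-bucket chains are also a natural place to hang fine-grained locks, which is one way to address the concurrency follow-up.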
Twitter Software Engineer / Developer Algorithm
Design Minesweeper.
 Second Attempt in United States
Yahoo Software Engineer / Developer Data Structures
These approaches work, but why can't we use a k-d tree? I agree that creating the tree would take the same time as these operations, but if the tree is already built and we have to perform such queries multiple times, a k-d tree would be a good solution, wouldn't it?
 Second Attempt March 18, 2013
+1 @Bevan
Complexity is O(k*k) and not O(k log k).
@genys The remove and insert operations are to be done on the matrix rows. You are correct that the complexity of finding the kth smallest combination is O(k log k), but to be able to run your algorithm the input needs to be created, and that WOULD take O(k*k) time.
I update/correct my comment above: it can be done in O(n^2) complexity. Please follow stackoverflow.com/questions/15036821/find-common-elements-in-n-sorted-arrays-with-no-extra-space
 Second Attempt March 17, 2013
Algorithm explained here: codercareer.blogspot.com/2012/02/no-33-maximums-in-sliding-windows.html
 Second Attempt March 17, 2013
I think he is right. If the interviewer wants constant time, then he/she should be ready to spend extra memory on this. The hash map will point into a vector so that delete can be done in constant time.
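The hash-map-into-vector idea can be sketched like this: a hypothetical set with O(1) insert/delete, where delete swaps the victim with the last vector element (names are mine):

```cpp
#include <unordered_map>
#include <vector>

// Sketch of "hash map pointing into a vector": the map stores each
// value's index in the vector, so erase is O(1) via swap-with-last.
class ConstantTimeSet {
    std::vector<int> items;
    std::unordered_map<int, std::size_t> pos;  // value -> index in items

public:
    bool insert(int v) {
        if (pos.count(v)) return false;
        pos[v] = items.size();
        items.push_back(v);
        return true;
    }

    bool erase(int v) {
        auto it = pos.find(v);
        if (it == pos.end()) return false;
        std::size_t i = it->second;
        items[i] = items.back();  // move the last element into the hole
        pos[items[i]] = i;        // fix its recorded index
        items.pop_back();
        pos.erase(v);
        return true;
    }

    bool contains(int v) const { return pos.count(v) != 0; }
};
```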
 Second Attempt March 17, 2013
Worst-case complexity should be linear in the number of elements in the binary tree.
 Second Attempt March 16, 2013
This is cool; the only issue I see is overflow.
 Second Attempt March 16, 2013
Could anyone explain the question?
1,1,1,1,1,25 has 8 coins in it, not n.
How about interval trees? They're simple BSTs (the key being the starting time of an appointment) but store the time range and the max ending time seen in the subtree at every node to improve collision detection. Search and insertion will be logarithmic.
If a node's interval has start and end times start1 and end1, maxend is the max end time in its subtree, and a new interval (start2, end2) is to be inserted, the collision check is:
if (start2 >= maxend)
    // No collision anywhere in this subtree
else // Possible collision in this subtree
    if (end2 <= start1 || start2 > end1)
        // No collision with this node's interval; check the children
    else
    {
        // Collision
    }

 Second Attempt March 14, 2013
The question is subset sum with negative numbers.
 Second Attempt March 14, 2013
DFS, or simply say that we traverse the tree marking each node as seen. If during traversal we ever find a node which is already visited, then the tree has a cycle.
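A hedged sketch of the marking idea, phrased for a general directed graph given as an adjacency list (all names are mine):

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// DFS cycle detection: revisiting a node that is on the current
// DFS path (a back edge) means the graph has a cycle.
bool hasCycleFrom(int u,
                  const std::unordered_map<int, std::vector<int>>& adj,
                  std::unordered_set<int>& onPath,
                  std::unordered_set<int>& done) {
    if (onPath.count(u)) return true;   // back edge -> cycle
    if (done.count(u)) return false;    // already fully explored
    onPath.insert(u);
    auto it = adj.find(u);
    if (it != adj.end())
        for (int v : it->second)
            if (hasCycleFrom(v, adj, onPath, done)) return true;
    onPath.erase(u);
    done.insert(u);
    return false;
}

bool hasCycle(const std::unordered_map<int, std::vector<int>>& adj) {
    std::unordered_set<int> onPath, done;
    for (const auto& entry : adj)
        if (hasCycleFrom(entry.first, adj, onPath, done)) return true;
    return false;
}
```

For an undirected tree, a plain visited set suffices, as the comment says; the on-path set is only needed once edges are directed.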
 Second Attempt March 13, 2013
In that case too, the cost of inserting/deleting an element will be linear in the worst case. Or am I missing something here?
 Second Attempt March 13, 2013
Indeed, a trie implementation would consume a lot of space. Therefore we have succinct data structures, which are compressed trie-based implementations. This is a very good explanation of using them for T9: stevehanov.ca/blog/index.php?id=120
 Second Attempt March 12, 2013
When we want to leverage the runtime polymorphism features provided by the language.
When a class defines a virtual function in C++/Java and that function is invoked on a concrete object rather than through the expected reference/pointer, we get compile-time binding, because the object on which the function is invoked is known to the compiler.
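A small illustration of the two binding modes described above (the types are illustrative, not from the thread):

```cpp
#include <string>

// A virtual call through a base reference binds at run time (vtable);
// a qualified or direct call on a concrete object binds at compile time.
struct Shape {
    virtual std::string name() const { return "shape"; }
    virtual ~Shape() = default;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

std::string describe(const Shape& s) {
    return s.name();  // dynamic binding: resolved per the runtime type
}
```

Calling `describe(c)` on a `Circle c` yields the derived override, while `c.Shape::name()` is bound statically to the base version.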
There is nothing to explain in the algorithm really, just a bunch of conditions.
bool validateNum(string in)
{
    int i = 0;
    bool hexFound = false;
    bool decFound = false;
    bool numFound = false;
    bool zeroFound = false;
    bool signFound = false;
    if (in[0] == '+' || in[0] == '-') { i = 1; signFound = true; }
    for (; i < in.length(); i++)
    {
        if (in[i] == '.')
        {
            if (hexFound || decFound || !numFound) return false;
            decFound = true;
        }
        else if (in[i] == 'x')
        {
            if (signFound || hexFound || decFound || !numFound || !zeroFound) return false;
            hexFound = true;
        }
        else if (in[i] == '0')
        {
            numFound = true; zeroFound = true;
        }
        else if ((in[i] >= '0' && in[i] <= '9') || (in[i] >= 'A' && in[i] <= 'F') || (in[i] >= 'a' && in[i] <= 'f')) { numFound = true; }
        else return false;
    }
    return true;
}

 Second Attempt March 04, 2013
1) Start traversing the ransom note and the magazine text simultaneously.
2) Maintain an ASCII character-count map for the magazine text.
3) For each character in the ransom note (except spaces), check whether we have already seen that character in the text by checking its count (>= 1). If so, decrement the count by 1 and move to the next character in the note.
4) If we haven't found the char in the magazine text yet, keep parsing the magazine and adding newly found characters to the map until you find the character you were looking for.
The worst-case complexity is O(m+n), when we can't make the ransom note out of the text or when we are unlucky; otherwise we don't have to traverse the entire magazine and finish much earlier. For example, I ran the following code on a pattern of length 50 and a text of length 1000, and had to parse only 204 characters (out of 1000) before reaching the answer.
#include <algorithm> // std::transform
#include <string>
using namespace std;

bool makeRansomNote(string ransomNote, string text)
{
    int ASCII[26] = {0};
    std::transform(ransomNote.begin(), ransomNote.end(), ransomNote.begin(), ::tolower);
    std::transform(text.begin(), text.end(), text.begin(), ::tolower);
    int pos = 0;
    for (int i = 0; i < ransomNote.length(); i++)
    {
        if (ransomNote[i] != ' ')
        {
            // Consume magazine characters until the needed one is available
            while (pos < text.length() && ASCII[ransomNote[i] - 97] == 0)
            {
                if (text[pos] == ' ') { pos++; continue; }
                ASCII[text[pos++] - 97]++;
            }
            if (ASCII[ransomNote[i] - 97] >= 1)
            {
                ASCII[ransomNote[i] - 97] -= 1;
            }
            else return false;
        }
    }
    return true;
}

 Second Attempt March 03, 2013
bool validateNum(string in)
{
    bool decimalFound = false, numFound = false;
    int i = 0;
    if (in[0] == '+' || in[0] == '-') i = 1;
    for (; i < in.length(); i++)
    {
        switch (in[i])
        {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            numFound = true;
            break;
        case '.':
            if (decimalFound) return false;
            else if (numFound) decimalFound = true;
            else return false;
            break;
        default:
            return false;
        }
    }
    return numFound; // reject strings like "+" with no digits
}

 Second Attempt March 02, 2013
You are not dealing with negative powers.
 Second Attempt March 02, 2013
+1 for Paras and ftfish
 Second Attempt March 02, 2013
Using the implicit stack:
Lnode* kthFromTail(Lnode* start, int k)
{
    static int count = 0; // note: the static counter makes this single-use
    if (!start)
        return NULL;
    Lnode* end = kthFromTail(start->next, k);
    count++;
    if (count == k) end = start;
    return end;
}

 Second Attempt March 02, 2013
If you do node->next->next->next and one of the pointers is NULL, you'd get a segmentation fault. How do you plan to do it?
 Second Attempt March 02, 2013
This won't work for both little- and big-endian machines.
 Second Attempt March 02, 2013
What if they're not sorted? In that case we'd have to use a hash map, a map (red-black trees in C++), a bit vector [we'd need to know the range], or sets (BSTs in C++) to achieve O(m+n) complexity, wouldn't we?
 Second Attempt March 02, 2013
I think this would do:
void fetchVerticalSum(Tnode* root, map<int, int>& vsum, int idx)
{
    if (!root) return;
    if (vsum.find(idx) == vsum.end())
        vsum[idx] = root->val;
    else
        vsum.find(idx)->second += root->val;
    fetchVerticalSum(root->left, vsum, idx - 1);
    fetchVerticalSum(root->right, vsum, idx + 1);
}

 Second Attempt February 23, 2013
How is the string "ABAAMAZONAMAZONAAAMAZON" lexicographically sorted? Could someone please explain the question to me?
 Second Attempt February 23, 2013
Traverse the list of points <lat, long>, calculate the Euclidean distance from the given point, and maintain a max-heap of size k, where k is the number of closest points we are expected to return. If the heap size is < k, just append the new point to the heap; otherwise compare with the max-heap root: if the new point is closer than the farthest seen so far, replace the root with the new point and max-heapify.
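The heap-based scan above might look like this (squared distances avoid an unneeded sqrt; names and layout are mine):

```cpp
#include <queue>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;

// Keep the k closest points seen so far in a max-heap ordered by
// distance: the root is the farthest of them and is evicted when a
// closer point arrives. O(n log k) overall.
std::vector<Point> kClosest(const std::vector<Point>& pts, Point origin, std::size_t k) {
    auto dist2 = [&](const Point& p) {
        double dx = p.first - origin.first, dy = p.second - origin.second;
        return dx * dx + dy * dy;
    };
    auto farther = [&](const Point& a, const Point& b) { return dist2(a) < dist2(b); };
    std::priority_queue<Point, std::vector<Point>, decltype(farther)> heap(farther);
    for (const auto& p : pts) {
        if (heap.size() < k) heap.push(p);
        else if (dist2(p) < dist2(heap.top())) { heap.pop(); heap.push(p); }
    }
    std::vector<Point> out;  // farthest of the kept points pops first
    while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
    return out;
}
```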
 Second Attempt February 23, 2013
If the minimum possible complexity looks like O(n^2 log n), we could simply do this:
for each element v in row 1
    do a binary search in the remaining (N-1) rows
We could improve this by comparing the min and max of each row and deciding based on that whether to perform the binary search at all, i.e. whether it is even possible for a number 'num' to fall between the 'min_num' and 'max_num' of that row.
Binary search -> O(log n)
Searching 1 number in n-1 rows: O(n log n)
Binary search for each number in the first row: O(n^2 log n)
I selected the first row, but we can pick any row; if no element of the picked row is found in all of the other (N-1) rows, then we don't have any common data.
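The row scan with the min/max pre-check could be sketched as follows (function name and layout are mine):

```cpp
#include <algorithm>
#include <vector>

// Binary-search each element of the first row in every other sorted row,
// skipping a row when the element falls outside its [min, max] range.
std::vector<int> commonToAllRows(const std::vector<std::vector<int>>& rows) {
    std::vector<int> common;
    if (rows.empty()) return common;
    for (int num : rows[0]) {
        bool inAll = true;
        for (std::size_t r = 1; r < rows.size() && inAll; ++r) {
            const auto& row = rows[r];
            if (row.empty() || num < row.front() || num > row.back())
                inAll = false;  // outside this sorted row's range: cheap reject
            else
                inAll = std::binary_search(row.begin(), row.end(), num);
        }
        if (inAll) common.push_back(num);
    }
    return common;
}
```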
This should cover all the cases:
void findDiff(int a1, int a2, int a3, int a4)
{
    cout << "(" << a1 << "," << a2 << ") - (" << a3 << "," << a4 << ") = \n";
    if (a1 < a3)
    {
        int end = (a2 < a3) ? a2 : a3 - 1;
        for (int i = a1; i <= end; i++) cout << i << "\t";
    }
    if (a4 < a2)
    {
        int start = (a1 > a4) ? a1 : (a4 + 1);
        for (int i = start; i <= a2; i++) cout << i << "\t";
    }
    cout << "\n";
}

 Second Attempt February 22, 2013
Versioned writes. Each thread has its own copy of the object in question (a shadow copy) which it works on; no need for locks.
 Second Attempt February 19, 2013
From stackoverflow.com/questions/4599501/nth-smallest-number-with-n-bits-set-to-1 : (2<<n) - 3
 Second Attempt February 19, 2013
Our map should be of the form <word, pair<count, pointer>>. The pointer should be an index into a heap structure of size k (5 here). We should maintain a min-heap; every time the count of a word is greater than the root of the heap, we remove the minimum count seen, replace it with the current value, and min-heapify. The cost of heapify is lg k. Even if we have to heapify after reading every word, the complexity comes out to O(n lg k), whereas sorting would be O(n lg n).
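A simplified sketch of the counting plus size-k min-heap step (it omits the pointer-back-into-the-heap bookkeeping described above and runs after counting; names are mine):

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Top-k frequent words via a size-k min-heap over counts: the heap root
// is the smallest count among the current top k, so each candidate costs
// at most O(log k), for O(n log k) overall.
std::vector<std::string> topKWords(const std::vector<std::string>& words, std::size_t k) {
    std::unordered_map<std::string, int> count;
    for (const auto& w : words) ++count[w];

    using Entry = std::pair<int, std::string>;  // (count, word)
    auto minFirst = [](const Entry& a, const Entry& b) { return a.first > b.first; };
    std::priority_queue<Entry, std::vector<Entry>, decltype(minFirst)> heap(minFirst);
    for (const auto& kv : count) {
        if (heap.size() < k) heap.push({kv.second, kv.first});
        else if (kv.second > heap.top().first) {
            heap.pop();
            heap.push({kv.second, kv.first});
        }
    }
    std::vector<std::string> out;  // least frequent of the kept words pops first
    while (!heap.empty()) { out.push_back(heap.top().second); heap.pop(); }
    return out;
}
```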
 Second Attempt February 17, 2013
Traverse one tree and keep a hash map; for each node add a <key, pair<left, right>> entry, where the key is the parent node's value and the pair is its two children (duplicates can be resolved via chaining). Now traverse the other tree and check, against the hash table, whether each node (value) has the same children as in the first tree.
 Second Attempt February 17, 2013
1) distinct keys and distinct values -> hashing; have a good hash function to minimize collisions, and in case of collision use chaining
2) non-distinct keys and distinct values -> still hashing, but with a relatively smaller hash table; and if we know the values are spread uniformly over the namespace/alphabet we can do bucketing
3) non-distinct keys and non-distinct values -> hashing, with a small hash table and another hash map for values, <value, count>, with count being the number of times the value has occurred
Why can't the reducers implement a merge algorithm like we do in an N-way merge? No doubt sorting will be much more efficient if we have dupes, but for unique numbers it should work too.
 Second Attempt February 17, 2013
Counting sort for 5 billion numbers would need an auxiliary array occupying ~20GB of space. If we don't have that much memory at hand [which we might on production systems], then we'd have to resort to external sort.
 Second Attempt February 17, 2013
DNF won't work as it expects all elements to be of only 3 types, but 3-way partitioning is the correct answer. Linear in the size of the input.
 Second Attempt February 16, 2013
geeksforgeeks.org/archives/9591
 Second Attempt December 07, 2012
Taken from Stanford's site, word for word..
 Second Attempt December 07, 2012
We could use Lamport's algorithm to maintain a total order in distributed systems without using physical clocks.
 Second Attempt November 22, 2012
Isn't this just finding nCk? (a combinatorial problem)
 Second Attempt November 21, 2012
Run Dijkstra's on the graph (assuming we don't have negative edge cycles). For each node keep an adjacency list of best paths; for example, I can reach node 'f' directly from s, but I also know the next best path is via 'g', so the adjacency list of 'f' will have s->g->... Once we have this adjacency list set up, we just have to find a path from destination to source traversing the best available paths (fetched from our list).
 Second Attempt November 21, 2012
- Earlier, users had to buy credits per application, but now with virtual currency a user can use the same credit in any Facebook-affiliated application.
- A single user's credit balance is now more a piece of user-specific information than per-application account info. Therefore it can be combined with existing user info.
- Our aim is to atomically and securely purchase and spend virtual currency, updating the user's current balance.
- At FB, user info is cached with memcached, which gives us performance in fetching user data, replication for availability, atomic transactions using 2-phase commit, fault tolerance, cache coherency, etc.
- For each expenditure we check the user's balance; if it is greater than or equal to the amount to be spent, the balance is recalculated accordingly and saved.
I'm really not sure what exactly they expect here. If they're OK with us leveraging their architecture then we don't have much to say, but if they want us to design everything from scratch then we probably need to explain memcached and DHTs here.
1) TCP-based session establishment (a process-based system where each request is handled by forking/pre-forking a new process instance).
2) A per-user session is created. The session class can look like:
class HMSession
{
    string userid;
    string word;
    vector<int> correct_guesses; // positions guessed already
    int chances_left;
    bool isGuessCorrect(char c);
public:
    string getRandomWord();
    bool guessChar(char c);
    bool guessWord(string word);
};
3) The basic operations expected are searching for the presence of a given char in the string to be guessed and updating the number of turns left.
4) I think the request-responses can be synchronous. The operations to be done are pretty basic; a single server should be able to serve hundreds of users concurrently without any significant latency.
5) Since the user data is confined within the process's address-space boundaries, concurrency issues should not come into the picture.
6) Session info will be in-memory (if scores are to be kept, those can be written to disk). User info should also be saved in cookies, so if the connection is lost the game can be resumed easily.
7) These Hangman servers can sit behind a load balancer, which can simply serve requests in a round-robin manner, or a sophisticated monitoring system can be built for health checks and keeping track of the current load on each server.
8) New hosts can be added to the load balancer, and inactive/dead hosts removed, without needing any special configuration.
Please point out issues and pour in ideas.
paulbutler.org/archives/data-structures-for-range-sum-queries-slides/
 Second Attempt November 15, 2012
Don't you have to find combinations and not permutations?
 Second Attempt November 15, 2012
Since we know none of the intervals overlap, we can simply store them in a sorted array with information about whether each entry is the start or the end of an interval.
Modify the binary search algorithm to return the low index in case of a failed search. This gives us the index of the key if found, else the low index. This low index gives us the position where the new element should be inserted. For example:
intervals: [1, 10), [15, 25), [40, 50)
My array has: 1, 9, 15, 24, 40, 49
I want to insert a new interval [30, 32). The modified binary search gives the position of 40. We need to verify that 40 is a start and 24 is an end, i.e. that the two entries don't constitute an interval, or else [30, 32) would overlap. If they don't constitute an interval, insert [30, 32) and shift 40, 49 over. Therefore insertion is linear in the worst case.
Search and delete work similarly.
Interval trees are used to find intersections. Here we know none of the intervals overlap, so interval trees would be overkill.
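The modified-binary-search idea can also be sketched with std::lower_bound over (start, end) pairs rather than a flat endpoint array (a variation on the layout above, not the answer's exact one; names are mine):

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

using Interval = std::pair<int, int>;  // half-open [start, end)

// Insert iv into a sorted vector of disjoint intervals if it overlaps
// nothing: find the slot with lower_bound, then check only the neighbors.
// The vector insert itself is linear, matching the answer's bound.
bool insertIfDisjoint(std::vector<Interval>& sorted, Interval iv) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), iv);
    if (it != sorted.end() && iv.second > it->first)
        return false;  // overlaps the next interval
    if (it != sorted.begin() && std::prev(it)->second > iv.first)
        return false;  // overlaps the previous interval
    sorted.insert(it, iv);
    return true;
}
```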
 Second Attempt November 10, 2012
Can we have multiple mutexes? The question says there is only 1 mutex. Shouldn't the issue be that with 1 mutex we can only lock either the entire directory structure at a time for a write, or a single entry?
So, as you said, we can have a reader-writer lock: readers don't need to acquire the lock, but if there are write requests only a single writer can be entertained, even if the rest of them want to write to a different entry.
This makes sense, but please explain: given sorted input, how can we find two numbers which sum to a given number in O(n) time?
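One standard O(n) answer for sorted input is the two-pointer scan (a sketch; names are mine): move the left pointer right when the sum is too small, the right pointer left when it is too large.

```cpp
#include <utility>
#include <vector>

// Two-pointer pair-sum on a sorted array. Returns the pair of indices,
// or (-1, -1) if no pair sums to target. O(n) time, O(1) space.
std::pair<int, int> twoSumSorted(const std::vector<int>& a, int target) {
    int lo = 0, hi = static_cast<int>(a.size()) - 1;
    while (lo < hi) {
        int sum = a[lo] + a[hi];
        if (sum == target) return {lo, hi};
        if (sum < target) ++lo;  // need a bigger sum
        else --hi;               // need a smaller sum
    }
    return {-1, -1};
}
```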
 Second Attempt October 10, 2012
@Bin, store the leaves of the first tree in a queue and of the second tree in a stack, then keep popping from both in parallel and comparing for equality.
 Second Attempt March 18, 2013