Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
We can also say this question as
find all subsets of set {1-9} where all numbers are repeated m (max size of cheque num) times in the set i.e. {m1s, m2s....m9s} such that the subset is increasing order with difference of 1 or equal and size m.
max number of subset is 2^n. but while creating a subset we can validate if the subset matches the pattern and size is <= m.
creating subset can be done with bit pattern matching from 0-2^n,
First of all I cannot aasume from the problem statement that we are dealing with prefix or suffixes, so I think we could imagine that any check number containing any pattern would be considered a fraud.
Taking this into account, we could apply String matching with finite automata instead of using a Trie. If the automata accepts the string we are done. The running time, without the automata construction preprocessing, is Θ(k) where k is the length of the input to be checked. Neither Trie nor DAFSA provides O(1) running time and the Hash solution proposed by JerryC do not work for suffixes or prefixes because it will only match the exact string.
How will you construct the automata? Will you make 1 automaton per pattern, and run each one individually when matching? In that case, the complexity would be at least O(k * n) where n is the number of patterns. Also, matching a pattern of length k with an automaton is not O(k) because your automaton is going to be an NFA here, not a DFA.
I will construct one automata to accept all the patterns.
My ideia is to use a directed acyclic word graph.
This will be an NFA that looks exactly like a trie, right? This NFA will have up to O(max pattern size) active states at any given time, and will therefore require O(k * max pattern size) time to check.
let me clear the confusion about the question the interviewer asked me:
first of all, the numbers in the pattern are not consecutive so, 36512 could be a prefix. Interviewer asked me first what data structure would you use for efficiently detecting a new cheque. I answered Trie. He said, good, that's the best choice to go. Now, If I say you need to match in O(1) time does your solution still correct. I said yes as we know that number of digits in a cheque number is fixed. SO length of prefixes are constant. That means traversing the trie for a prefix is of constant order i.e. O(1). It is a easy stuff guys, not that complex!!
I wrote exactly what the interviewer asked me.... may be it was confusing but it is what exactly was asked...
but this is good in a sense that people now solves complex version of the problem. That is good anyway :)
Specifically, those are prefixes. So, you want to find any prefix that matches with the new cheque to detect fraud. Straight forward answer is to use a Trie.
Agreed... Trie is one of the answer but as you wrote in question complexity should be O(1). This can't be achieve through Trie.. I think we should create Hash map with patterns. when new cheque arrived try to match the patters with newly arrived cheque(take prefix equal to pattern length) . This will also take 'N' (Number of patters) comparison.
You could apply the solution proposed by JerryC so you would end up with a complexity O(1).
JerryC's solution is not O(1). No solution can be O(1) because in some cases, it will be necessary to scan the input for at least max (prefix length over all prefixes) characters before determining whether there's a match. The only way it can be O(1) is if we assume all prefixes are of a size bounded by a constant. Then there are solutions that are O(1) because they are only dependent on the size of the longest prefix and independent of the number of patterns we have to match. We can use a prefix tree to obtain one such solution.
Since we have variable string length, we pin down the length of each string we're checking and only check it against strings of that length. Store the fraud values in linked lists, within a hashtable <integer, linkedlist> The key value integer represents the length of the string. For example, the hashtable entry under key value 8 in this case would be a linked list of 22334455-->11111111-->etc. Then your search is just value.length and run the O(1) search through that list.
Or is that considered O(2) - once for the hashtable and once for the linked list?
@JerryC ... yes if the number of fraud patterns are small then its certainly a nice idea(which one should ask the interviewer and in this case most probably it will be). however if the the fraud pattern are large in number then it will be O(n) in searching in linked list. So we can use a balanced BST(AVL, RB) instead of linked list and hence time will be O(1)+O(logn) =>logn
Instead of using a BST you could use another Hash for storing fraud numbers patterns ending up with O(1) running time.
@pisaruk If you are gonna use the fraud numbers strings as keys then why dont you create a hash of <fraud numbers, bool> in the first place instead of having hash inside a hash
You are absolutely right.
Sorry for that confusion.
Actually, I think I would propose the solution using finite automata.
this solution assumes that we are looking for the exact match of the fraud pattern and the cheque number, however, it seems that we are looking for a partial match - whether the cheque number contains one of the fraud patterns..
Here's my understanding of the problem.
Since the numbers in question are cheque numbers and we can very safely assume that the check numbers are of fixed length which would be around 10. The second thing that I can figure out from this is that the pattern can occur anywhere in the cheque. So it would mean that we cannot use a prefix tree, instead we'll use a suffix tree. The run time for search in a suffix tree is O(n).
Since the length of check is constant, it would be O(c), where c is a constant. And thus constant run time can be easily represented as O(1).
The run time would be O(n) only if the length of the cheque was not a constant, but that is not the case because cheque are of constant length.
@abhisek, you are wrong... the patterns are prefixes from beginning . See my comment above.
Well i think the Question is simple we need to match a pattern in O(1)
- Debu February 14, 2013and the pattern is simple. Its starting with a number and next number is equal to current number or current number+1.
now we need to design a data structure to do that.
so
1ques to interviewer : is there a max size of cheque number.
most probably ans should be yes. say n (makes our life easy)
2. do i have a constrain in memory
most probably he would say consider u have 2-4 gig (cool)
3.since this is a system where we need O(1) sol. there must be large number of cheque to be validated. am i rite?
ans should be yes.
Now problem is a combination question
we need to find all possible unique combinations that can be generated with size less than or equal to n and digits (1-9) and the numbers should be in increasing order.
when ever we create such combination we create it and add to hash table.
and this data is finite set
so 1s we create such table we are done.
example N = 3 and number of digits are 1-2
so possible unique values are
1,
11,
111
2
22
222
12
112
122
I am guess good enough to create a hash table and find it in O(1)
i