Suppose we are detecting fraud

Amazon Interview Question for Software Engineer / Developers

1

of 1 vote

42
Answers
Suppose we are detecting fraud cheques and we found the cheques with the following list of patterns are fraud:

111122234455
1234
22334455
11111111
234567
etc.

Now if you have a new cheque and wan to detect fraud in O(1) time what data structure you want to use?
- zahidbuet106 February 13, 2013 in United States | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Data Structures

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 5 vote

Well i think the Question is simple we need to match a pattern in O(1)
and the pattern is simple. Its starting with a number and next number is equal to current number or current number+1.
now we need to design a data structure to do that.
so
1ques to interviewer : is there a max size of cheque number.
most probably ans should be yes. say n (makes our life easy)
2. do i have a constrain in memory
most probably he would say consider u have 2-4 gig (cool)
3.since this is a system where we need O(1) sol. there must be large number of cheque to be validated. am i rite?
ans should be yes.

Now problem is a combination question
we need to find all possible unique combinations that can be generated with size less than or equal to n and digits (1-9) and the numbers should be in increasing order.

when ever we create such combination we create it and add to hash table.
and this data is finite set
so 1s we create such table we are done.

example N = 3 and number of digits are 1-2
so possible unique values are
1,
11,
111
2
22
222
12
112
122

I am guess good enough to create a hash table and find it in O(1)
i

- Debu February 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

We can also say this question as
find all subsets of set {1-9} where all numbers are repeated m (max size of cheque num) times in the set i.e. {m1s, m2s....m9s} such that the subset is increasing order with difference of 1 or equal and size m.
max number of subset is 2^n. but while creating a subset we can validate if the subset matches the pattern and size is <= m.
creating subset can be done with bit pattern matching from 0-2^n,

- Debu February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Debu So do you mean that after creation of HashTable of n numbers that Hashtable will contain all correct and fraud numbers? or just correct ones. If both then How you will check that new cheque contains fraud numbers?

- Geet February 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

First of all I cannot aasume from the problem statement that we are dealing with prefix or suffixes, so I think we could imagine that any check number containing any pattern would be considered a fraud.
Taking this into account, we could apply String matching with finite automata instead of using a Trie. If the automata accepts the string we are done. The running time, without the automata construction preprocessing, is Θ(k) where k is the length of the input to be checked. Neither Trie nor DAFSA provides O(1) running time and the Hash solution proposed by JerryC do not work for suffixes or prefixes because it will only match the exact string.

- pisaruk February 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How will you construct the automata? Will you make 1 automaton per pattern, and run each one individually when matching? In that case, the complexity would be at least O(k * n) where n is the number of patterns. Also, matching a pattern of length k with an automaton is not O(k) because your automaton is going to be an NFA here, not a DFA.

- eugene.yarovoi February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I will construct one automata to accept all the patterns.
My ideia is to use a directed acyclic word graph.

- pisaruk February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This will be an NFA that looks exactly like a trie, right? This NFA will have up to O(max pattern size) active states at any given time, and will therefore require O(k * max pattern size) time to check.

- eugene.yarovoi February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I agree with autometa.. what do think about my solution below

- Debu February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

How about using KMP for each pattern.

- Nitin Gupta February 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

let me clear the confusion about the question the interviewer asked me:

first of all, the numbers in the pattern are not consecutive so, 36512 could be a prefix. Interviewer asked me first what data structure would you use for efficiently detecting a new cheque. I answered Trie. He said, good, that's the best choice to go. Now, If I say you need to match in O(1) time does your solution still correct. I said yes as we know that number of digits in a cheque number is fixed. SO length of prefixes are constant. That means traversing the trie for a prefix is of constant order i.e. O(1). It is a easy stuff guys, not that complex!!

- zahidbuet106 February 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

I wrote exactly what the interviewer asked me.... may be it was confusing but it is what exactly was asked...
but this is good in a sense that people now solves complex version of the problem. That is good anyway :)

- zahidbuet106 February 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Taking into account that we are dealing with prefix matching we could use a Trie as pointed out by @zahidbuet106 or we could use a DFA and not a NFA(this is required only if we are going to deal with suffixes, patterns like: .*1234). This would require less space.

- pisaruk February 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Hashmap??

- Raja February 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

even i think a hashmap will suffice.but I mean if assuming the fraud numbers are in the form of consecutive numbers or repeated numbers. So cheque number will be the keys of hashmap. If too many numbers are repeated then its a fraud cheque?

- Anonymous February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Specifically, those are prefixes. So, you want to find any prefix that matches with the new cheque to detect fraud. Straight forward answer is to use a Trie.

- zahidbuet106 February 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Not really O(1) time. It's O(max length of pattern)...I suppose perhaps you're assuming that the patterns are all O(1) in size, and you just don't want the runtime to be proportional to the number of patterns?

- eugene.yarovoi February 13, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Use trie.

- zahidbuet106 February 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Agreed... Trie is one of the answer but as you wrote in question complexity should be O(1). This can't be achieve through Trie.. I think we should create Hash map with patterns. when new cheque arrived try to match the patters with newly arrived cheque(take prefix equal to pattern length) . This will also take 'N' (Number of patters) comparison.

- Anonymous February 13, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

You could apply the solution proposed by JerryC so you would end up with a complexity O(1).

- Fabio Pisaruk February 13, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

JerryC's solution is not O(1). No solution can be O(1) because in some cases, it will be necessary to scan the input for at least max (prefix length over all prefixes) characters before determining whether there's a match. The only way it can be O(1) is if we assume all prefixes are of a size bounded by a constant. Then there are solutions that are O(1) because they are only dependent on the size of the longest prefix and independent of the number of patterns we have to match. We can use a prefix tree to obtain one such solution.

- eugene.yarovoi February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

with Hashmap you can not do it because prefixes may contain other prefix. it'll take O(n) ultimately.

Eugene is right... .we don't want the runtime to be proportional to the number of patterns... in that sense it is order of constant time actually.

- zahidbuet106 February 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 4 vote

Since we have variable string length, we pin down the length of each string we're checking and only check it against strings of that length. Store the fraud values in linked lists, within a hashtable <integer, linkedlist> The key value integer represents the length of the string. For example, the hashtable entry under key value 8 in this case would be a linked list of 22334455-->11111111-->etc. Then your search is just value.length and run the O(1) search through that list.

Or is that considered O(2) - once for the hashtable and once for the linked list?

- JerryC February 13, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

@JerryC ... yes if the number of fraud patterns are small then its certainly a nice idea(which one should ask the interviewer and in this case most probably it will be). however if the the fraud pattern are large in number then it will be O(n) in searching in linked list. So we can use a balanced BST(AVL, RB) instead of linked list and hence time will be O(1)+O(logn) =>logn

- vik February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Instead of using a BST you could use another Hash for storing fraud numbers patterns ending up with O(1) running time.

- pisaruk February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@pisaruk If you are gonna use the fraud numbers strings as keys then why dont you create a hash of <fraud numbers, bool> in the first place instead of having hash inside a hash

- vik February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

You are absolutely right.
Sorry for that confusion.
Actually, I think I would propose the solution using finite automata.

- pisaruk February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

this solution assumes that we are looking for the exact match of the fraud pattern and the cheque number, however, it seems that we are looking for a partial match - whether the cheque number contains one of the fraud patterns..

- S.Abakumoff February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

JerryC i guess your solution checks limited pattern only.

check my solution .

- Debu February 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Can we use Regex?? what kinda data structure is it??
we can make regex string like

s1 = "^\\d{8}[11111111]$"
s 2= "^?[234567]$"
etc.

- Aditya February 26, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

using stack>use top[stack] for pushing next element
a[]={let our input is in form of array}
x=each element of array
if(x==top[stack] || x==top[stack]+1)
stack[top]=stack[top]+1
stack[top]=x
else "say invalid"

- nitesh anone February 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

It has to be a hashtable of patterns

- praveenkcs28 February 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

No O(1) solution exists

- S.Abakumoff February 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

If you have static keys (they are predefined somewhere), you can use PERFECT HASHING (this is hash of hashes without collissions, see wiki for more details). This allow us to do lookup in O(1) worth time.

- m@}{ February 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Here's my understanding of the problem.

Since the numbers in question are cheque numbers and we can very safely assume that the check numbers are of fixed length which would be around 10. The second thing that I can figure out from this is that the pattern can occur anywhere in the cheque. So it would mean that we cannot use a prefix tree, instead we'll use a suffix tree. The run time for search in a suffix tree is O(n).
Since the length of check is constant, it would be O(c), where c is a constant. And thus constant run time can be easily represented as O(1).
The run time would be O(n) only if the length of the cheque was not a constant, but that is not the case because cheque are of constant length.

- as09 February 15, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

@abhisek, you are wrong... the patterns are prefixes from beginning . See my comment above.

- zahidbuet106 February 15, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Can you please update the description of the question with same then? Anyone looking at the problem first tried to solve it on his own and insufficient information makes the task a bit difficult. Thanks

- as09 February 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Suffix tree requires O(substring length) in this case since the length of check string has an upper bound we will get this in O(1).. The substring cannot exeed the max length string.

- isandesh7 February 17, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

How about Rabin–Karp multiple pattern search?

- AndrewJD February 19, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

What about Regex??
we can simply define regex patterns for the given strings. It can be one of the answers, can't it??

- Aditya February 26, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

use trie or suffix tree

- Vaibhav March 04, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

What about "bloom filter"? There is a possibility of "False positive" but no "False negative". In fact bloom filter is used for situations such as these

- Parvez July 04, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Total digits in Cheque is fixed. Create a trie of all such fraud numbers and for a given number, do lookup in trie to check its existence. This comparison is O(1)

- mithya September 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

/*The following code will work if the cheque number is a unsigned integer */

BOOL isValid(unsigned long int nL)
{
	int nCR, nPR =nL%10;
	nL/=10;
	while(nL)
	{
		nCR=nL%10;
		if(nPR<nCR)
			return TRUE;
		nPR = nCR;
		nL/=10;
	};
	return FALSE;
}

- dreamBlitzer March 14, 2015 | Flag Reply

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews