Given a large file, having N (

Amazon Interview Question for Software Engineer in Tests

0

of 0 votes

25
Answers
Given a large file, having N (N is very large) positive integers, how will you find a pair of numbers that add up to x (eg. 100). What data structure will you use and give appropriate Algo/code. It should be efficient in time and space.
- Nick April 29, 2010 | Report Duplicate | Flag | PURGE
Amazon Software Engineer in Test Algorithm

Email me when people comment.

An error occurred in subscribing you.

More Questions from This Interview

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 0 vote

One solution that comes to my mind is we approach the solution through a greedy algorithm. We can read a constant number of positive integers from the file. This way it becomes memory efficient. Then apply the algorithm to find the pair of numbers which add up to x. This algo gives a sub optimal solution.

- Genie May 01, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Genie May 01, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

What if solution pair is (A,B), and A, B doesn't belong to same set of integers being read from file ?

- Shoonya May 01, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

That's why Genie said it is a sub-optimal solution.

- rrs February 10, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Order the numbers with a merge sort (you can do this in batches), using the X as pivot. Now you know that only two numbers on the left can be added up to equals X. Then you can have i=0 and j=A[X] you will do j-- when X < A[i] + A[j] or you would i++ when X > A[i] + A[j]

- Chris May 01, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Pivot solution works only when the array only contains positive numbers. If the intervier doesn't specify the file only contains positive integer only, you need to use the leftmost index as i and and rightmost index as j.
e.g. -9, -7. -6, 1,3, 5, 9, 10. X = 2

- Stan May 03, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

scan the entire file and remove duplicates & numbers greater than X(100) , store the remaining numbers ( non-dups ) in a hash or linked list.. o(n)

sort them - o(nlogn )..

take first element (a) & start checking from last element(b) of the array or linked list to see if a+b = 100 ; continue this way till you reach the middle of array or linked list - 0(n)

- Anonymous May 02, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Yup. My thoughts exactly. Seems to be O(nlogn).

- Bandicoot May 06, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

What if the duplicates add to 100? eg (50,50). By removing them you are losing out on a pair of numbers.

- SP May 13, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

SP has a valid point. That approach does not work. There may be very few numbers greater than 100, so getting rid of them may not be helpful too.

- rrs February 10, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

If x is much smaller than N, space complexity would be O(x), time complexity would be O(N).

If N is much bigger than x: if a space complexity of O(N) is allowed, time complexity will be O(N).

If cannot have O(N) space, use an other file to store temp data in sorted order and a resulted file for results.
Space complexity = O(1), Time complexity O(NlogN)

- Thn May 04, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Oops, case 2: x is much bigger than N

- Thn May 04, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

There is a O(N) solution to this problem. Just maintain an array of size 100, A[100]. Now as you progress through the N numbers, if the current number x is smaller than 100 then try to find its complement (100-x)in A (see if A[100-x] == true). If it's there then you have the solution. If not then set A[x] = true.

- dp May 08, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- drekhi May 08, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I agree, you do not need to scan the whole file with this method unless the only usable complement is at the end. Well done!

- shiggity May 26, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Actually, now that I reread, it only uses 100 as an example. A hash table should be the optimal method.

- shiggity May 27, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

How about we try this out:
Lets say the numbers sum-up to SUM. Define a hashtable H.Then go through the numbers and for each number, n, check if n is in H and if its not, then you store SUM-n in H (i.e. you store the number needed to add n to give SUM). This allows you to scan for the needed number, and not those you have seen. Duplicates will be automatically eliminated as we wont store duplicates in has table. Algorithm will look something like this:

Hashtable H;
Number SUM;

for each number n in Numbers (assuming Numbers is an array)
if (H has n)
Then a pair is found (i.e. n and SUM-n)
else
Store SUM-n in H

I guess time complexity here will be O(N), not sure of space, i guess worst case of O(N) as well if all the numbers are different and no pair sums up to SUM

- Newbe May 08, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Number are positive I am using BitSet.

public class Amazon{
        BitSet aBitSet = new BitSet(Integer.MAX_VALUE);
        HashSet<Integer> aSet = new HashSet<Integer>();
        int sum=0; 
        public Amazon(String aFile,int sum) throws IOException{
             Scanner s = new Scanner(new File(aFile));
             this.sum = sum; 
             while(s.hasNextInt()){
                int x = s.nextInt();
                int y = sum - x;
                if(y>0 && aBitSet.get(y))
                   aSet.add(x);  
                else
                   aBitSet.set(x);
             }
        }
     public void printPair()
     {
        foreach(Integer i: aSet)
        {
           System.out.print(i);
           System.out.print(","); 
           System.out.print(sum-i);
           System.out.println();  
        }
       
     }
}

- MaYank May 08, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

excellent solution!

- chenming831 May 16, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

BitSet doesn't have to be this long Integer.MAX_VALUE
Its length could be "sum" long

- Jing July 11, 2010 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

In that way, you won't be able to find solutions like (50, 50) as someone mentioned above. However, the original bit set approach will fail too.

- rrs February 10, 2011 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

One solution which is coming to my mind is:
Divide the range of number into k number of subranges. We will maintain a datastructure with k number of buckets and populate the datastructure with numbers taken from the original array. time O(n), space O(n).

Now, sort the numbers in each range. Time: O(n*log(n/k)) on average, if numbers are well distributed.

The ranges are choosen in such a way the pairing of ranges are possible. For example, for x+y=100; If we have ranges 0-9, 10-19, 11-19 .., 50, ...,81-90, 91-100.

Here pairing will be like (0-9, 91-100), (10-19, 81-90), etc.

From each such pairs pickup one number at a time (by traversing one range top-down, and other bottom-up) and try to match sum with the given number (100 for example).

- devnull007 May 15, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

c++ code

func(int a[],int x)
{
map<int,int> table;
for(i=0;i<N;i++)
 {
if(p=table.count(a[i])==1)
    cout<<a[i]<<','<<x-a[i];}
else if (p==0)
   {
   table(x-a[i])=a[i];
   }
}

- heartbreakkid May 21, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Hash table is the best solution to do in O(n) time and O(n) space.

Otherwise two pointer sum (one from the first and one on the last) after sorting the numbers would do in O(nlogn). In space comparison, hence no space complexity involved. Everything else ruled out!

- Anon June 10, 2010 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Improving on mayank's solution and using only bitset. I don't see the need of hash if all we are doing is printing out.

public void PrintNumbers(String file,int sum) throws FileNotFoundException
	{
	 BitSet bs= new BitSet(sum);
	 //HashSet<Integer> hash = new HashSet<Integer>();
	 Scanner s= new Scanner(new File(file));
	  while(s.hasNextInt()){
		 int x=s.nextInt();
		 if(x<=sum){
			 if(bs.get(sum-x)==true){
			//if(hash.contains(sum-x)){
				System.out.println(x +"  "+(sum-x));
			    bs.set(sum-x, false);//To take care of duplicates
				}
			else 
				bs.set(x);
		 }
	  }
	 }

- prolific.coder June 22, 2010 | Flag Reply

CareerCup

Amazon Interview Question for Software Engineer in Tests

Books

Videos

Resume Review

Mock Interviews