Amazon Interview Question for Software Engineer / Developers


Country: United States
Interview Type: Phone Interview





Assume that 1) the size of file A > the size of file B; 2) each file is too big to be loaded into memory, and memory holds only N integers at a time.

do {
    read N integers from A into an in-memory hash table; if no more integers are available, break;
    read each integer from B and check whether it is in the hash table; if it is, write it to the output file C;
} while (true);

Running time: O(B * A / N), since all of B is rescanned once for each of the A/N chunks of A.
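A minimal Java sketch of this chunked approach, assuming whitespace-separated integers (the file names and chunk size are illustrative, and duplicate handling is ignored):

    import java.io.*;
    import java.util.*;

    public class ChunkedIntersect {
        // Writes integers common to files a and b into file c, keeping at
        // most n integers from a in memory at a time.
        static void intersect(File a, File b, File c, int n) throws IOException {
            try (Scanner scanA = new Scanner(a);
                 PrintWriter out = new PrintWriter(new FileWriter(c))) {
                while (scanA.hasNextInt()) {
                    // Load the next chunk of up to n integers from A.
                    Set<Integer> chunk = new HashSet<>();
                    while (chunk.size() < n && scanA.hasNextInt()) {
                        chunk.add(scanA.nextInt());
                    }
                    // Rescan all of B against this chunk.
                    try (Scanner scanB = new Scanner(b)) {
                        while (scanB.hasNextInt()) {
                            int x = scanB.nextInt();
                            if (chunk.contains(x)) {
                                out.println(x);     // common to both files
                            }
                        }
                    }
                }
            }
        }
    }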

- sqw June 07, 2012 | Flag Reply

Wouldn't hash tables consume a lot of memory?
Can we do this using a radix sort?

- chaos June 08, 2012 | Flag

I see that here I do not have to write into both files, but the condition that A > B is a little restrictive. If B > A, I believe we can write a function to check for that and swap the roles of the files.

Could you please help me with the hash functions? I am new to them.

- chaos June 08, 2012 | Flag

Reasonable approach.

- Anonymous June 09, 2012 | Flag

This is a simple implementation of the same. I got a clarification that the numbers are integers and that the files are large but on one disk only.

I am using a similar approach to the one suggested above. If a file is too big to fit into memory in one go, break it into parts or read n integers at a time.

import java.io.*;
import java.util.*;

public class FileRead {

    public static void main(String[] args) {
        File file1 = new File("C:\\Documents and Settings\\chaos\\Desktop\\input1.txt");
        File file2 = new File("C:\\Documents and Settings\\chaos\\Desktop\\input2.txt");
        try {
            Set<Integer> seen = new HashSet<Integer>();

            Scanner scan1 = new Scanner(file1);
            while (scan1.hasNextInt()) {
                seen.add(scan1.nextInt());      // record every integer in file 1
            }
            scan1.close();

            // Open the output file once, outside the loop; opening it per
            // match would overwrite the previous match each time.
            BufferedWriter out = new BufferedWriter(
                new FileWriter("C:\\Documents and Settings\\chaos\\Desktop\\output.txt"));

            Scanner scan2 = new Scanner(file2);
            while (scan2.hasNextInt()) {
                int n = scan2.nextInt();
                if (seen.contains(n)) {         // common to both files
                    out.write(Integer.toString(n)); // write the digits, not a char code
                    out.newLine();
                    System.out.println(n);
                }
            }
            scan2.close();
            out.close();                        // flush the buffered output
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

- chaos June 11, 2012 | Flag

What does "large integer" mean?

- sk June 07, 2012 | Flag Reply

It can be a 64-bit or larger integer. But for simplicity, I wanted to assume 32-bit for now.

- chaos June 08, 2012 | Flag

Do you realize that 6 bits is actually incredibly small (0-63 range)? 6 bytes, on the other hand, would be quite big, if that's what you meant.

- Anonymous June 09, 2012 | Flag

corrected to 64

- chaos June 10, 2012 | Flag

Assume that the "large integers" are unsigned Int32s.
Allocate a bit array of 2^32 bits, i.e., a byte array of 2^29 bytes (512 MB).
Read through the first file:
read the next int N, and set bit N % 8 of element N / 8 of the byte array.
Close the 1st file.
Open the output file for writing.
Read through the second file:
read the next int N, and check bit N % 8 of element N / 8 of the byte array; if it is set to 1, write N to the output file.
Close the 2nd file.
Close the output file.

Complexity is O(n1 + n2), where n1 and n2 are the sizes of the input files.
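For intuition, here is a small Java sketch of the same bit-per-value idea using java.util.BitSet, which grows on demand (a table covering every unsigned 32-bit value would be 2^32 bits = 512 MB). It assumes non-negative integers, and clears each bit after writing so duplicates in the second file print only once:

    import java.io.*;
    import java.util.*;

    public class BitmapIntersect {
        static void intersect(File a, File b, File out) throws IOException {
            BitSet seen = new BitSet();
            try (Scanner in = new Scanner(a)) {
                while (in.hasNextInt()) {
                    seen.set(in.nextInt());      // mark every value in file A
                }
            }
            try (Scanner in = new Scanner(b);
                 PrintWriter w = new PrintWriter(new FileWriter(out))) {
                while (in.hasNextInt()) {
                    int v = in.nextInt();
                    if (seen.get(v)) {           // present in both files
                        w.println(v);
                        seen.clear(v);           // print each common value once
                    }
                }
            }
        }
    }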

- sa June 08, 2012 | Flag Reply

Can you please elaborate a little more?

- chaos June 11, 2012 | Flag

Is it really about two large integers or two large numbers? If it is integers, we do not need strings to hold them; if it is arbitrary numbers, strings are easier.

Steps:
Read each line from file A and store the number as a String in a HashSet (I would prefer a caching mechanism, or multiple HashSets of sqrt(n) size each). This takes care of duplicates within the same file.

After the first file is completely read and stored in the HashSets, read the second file and use the Boyer-Moore algorithm to search for the strings from the second file against the HashSets.

If a hit is found in any of the sets, write the String to the third file.

Analysis: reading and storing into the HashSets takes O(N) time, and in the worst case the comparisons also take O(N) time, where you have to compare against each element.

We also need to analyze the Boyer-Moore algorithm's complexity when the strings to compare are almost the same length. In that case, when a mismatch occurs, the average cost is O(L/10), where L is the length of the string and 10 is the base of the numbers.

- dharmendra June 08, 2012 | Flag Reply

1) Get file 1's size in bits and create a bitmap of that size, initialized to zero.
2) Read all the integers from file 1:
if bitmap[file1[i]] is not set, then set bitmap[file1[i]] = 1;
else the number is a duplicate within the same file, so take no action.
3) Now read the integers from file 2.
For each number, check the bitmap built from file 1:
if the bit is already set, print the number as a common element;
otherwise, don't print it.
Time: O(number of entries in file1 + file2)
Space: O(size(file1))

- VillageMonkey June 08, 2012 | Flag Reply

Sorry, a bitmap of size 2^32 is needed; creating a bitmap of size file1 won't work.

- VillageMonkey June 08, 2012 | Flag

This indeed is the fastest and most efficient solution. No need for a trie/set/hash table/<insert a high-level data structure here>.

- Saurabh Rawat June 12, 2012 | Flag

What if the integers are 64 bits? That's why folks worried about "high level data structures".

- eugene.yarovoi June 14, 2012 | Flag

Can't we use a trie?

- Hector June 08, 2012 | Flag Reply

By all means mention random data structure names with no indication of how you intend to use them.

- Anonymous June 09, 2012 | Flag

So let's store the long integers in our trie by keeping one digit of the number at each node. After storing them all, we search the trie for each number from the second file.
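As a sketch of that idea (my reading of it, not the poster's own code), here is a minimal decimal-digit trie in Java: each node has ten children, one per digit, plus a flag marking the end of a stored number:

    public class DigitTrie {
        private final DigitTrie[] child = new DigitTrie[10];
        private boolean end;                 // true if a stored number ends here

        void insert(String number) {
            DigitTrie node = this;
            for (char c : number.toCharArray()) {
                int d = c - '0';
                if (node.child[d] == null) node.child[d] = new DigitTrie();
                node = node.child[d];
            }
            node.end = true;
        }

        boolean contains(String number) {
            DigitTrie node = this;
            for (char c : number.toCharArray()) {
                node = node.child[c - '0'];
                if (node == null) return false;
            }
            return node.end;                 // prefixes alone don't count
        }
    }

Since each digit is one node, arbitrarily long numbers work; insert and lookup both cost O(number of digits).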

- Hector June 21, 2012 | Flag

The problem is similar to a two-set intersection; the only difference is the big numbers, which we can treat as strings, so any set-intersection solution should work:
1) Use external mergesort, with radix sort for the in-memory chunks; then traverse both sorted files and determine the intersection.
2) Store all the strings from the 1st file in a trie; then traverse the 2nd file and look each string up in the trie.
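The merge step of option 1 can be sketched as follows, assuming both inputs are already sorted (e.g., produced by an external mergesort) and streamed via iterators; it emits each common value once:

    import java.util.*;

    public class SortedIntersect {
        // Walk two sorted streams in lockstep and keep the common values.
        static List<Long> intersect(Iterator<Long> a, Iterator<Long> b) {
            List<Long> out = new ArrayList<>();
            Long x = a.hasNext() ? a.next() : null;
            Long y = b.hasNext() ? b.next() : null;
            while (x != null && y != null) {
                int cmp = x.compareTo(y);
                if (cmp < 0)      x = a.hasNext() ? a.next() : null;
                else if (cmp > 0) y = b.hasNext() ? b.next() : null;
                else {
                    if (out.isEmpty() || !out.get(out.size() - 1).equals(x))
                        out.add(x);          // record once per common value
                    x = a.hasNext() ? a.next() : null;
                    y = b.hasNext() ? b.next() : null;
                }
            }
            return out;
        }
    }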

- m@}{ June 08, 2012 | Flag Reply

Use bool[Int32.MaxValue] as the hash table, to save memory and make it faster. Here is my solution.

bool[] intindex = new bool[Int32.MaxValue];   // note: one byte per element in .NET
string file1name = "c:\\test1.txt";
string file2name = "c:\\test2.txt";

StreamReader file1 = new StreamReader(file1name);
string line;
while ((line = file1.ReadLine()) != null)
{
    try
    {
        intindex[Int32.Parse(line)] = true;
    }
    catch (Exception)
    {
        // skip non-numeric lines and negative values
    }
}
file1.Close();

StreamReader file2 = new StreamReader(file2name);
StreamWriter newfile = new StreamWriter("c:\\newfile.txt");

while ((line = file2.ReadLine()) != null)
{
    try
    {
        if (intindex[Int32.Parse(line)])
        {
            newfile.WriteLine(line);
        }
    }
    catch (Exception)
    {
    }
}
file2.Close();
newfile.Close();   // flush the buffered output

- Capsen June 22, 2012 | Flag Reply

What do these integers represent? How much memory do we have? What about duplicates? Is either file smaller? Much smaller? Can we tell which file is smaller?

In the simplest case, we can use a simple counting sort.
In the tougher case, we might have to resort to external mergesort.

E.g., if we don't have to worry about duplicates and the integers are 32 bits, we should be able to use a bitset that will fit in half a GB of RAM.

- ViX July 03, 2012 | Flag Reply

Correction 4GB

- ViX July 03, 2012 | Flag

Hmm, we may be able to use MapReduce to deal with large files. For example, during the map phase we emit <int, file-id>; since we only have two files, file-id will be file1 or file2. During the reduce phase, <key, value> pairs with the same key are sent to the same reducer, so a reducer can check whether an int appears in both files and, if so, write it to the output file.
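As a toy illustration of that idea (an in-memory simulation, not a real Hadoop job; the names here are made up for the example), the map step tags each integer with a file id, grouping collects the ids per key, and the "reducer" keeps keys seen under both ids:

    import java.util.*;

    public class MiniMapReduce {
        static Set<Integer> commonInts(int[] file1, int[] file2) {
            // Map + shuffle: group file ids by integer key.
            Map<Integer, Set<Integer>> grouped = new HashMap<>();
            for (int v : file1) grouped.computeIfAbsent(v, k -> new HashSet<>()).add(1);
            for (int v : file2) grouped.computeIfAbsent(v, k -> new HashSet<>()).add(2);
            // Reduce: keep keys that appeared under both file ids.
            Set<Integer> common = new TreeSet<>();
            for (Map.Entry<Integer, Set<Integer>> e : grouped.entrySet())
                if (e.getValue().size() == 2) common.add(e.getKey());
            return common;
        }
    }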

- Anonymous October 14, 2012 | Flag Reply

Assuming the CPU has multiple processors:
three threads, 1 hashtable (or trie), 1 queue --

thread #1:
read from file #1 (treat the long ints as strings);
insert each long int into the hashtable, with the number as the key and a bool as the value;
if the number is already in the table, check the bool to see if it's been queued for writing;
if it hasn't, add the number to the queue and set the bool to true.

thread #2:
same as thread #1, except it reads from file #2.

thread #3:
dequeue a number from the queue and write it to the output file;
when thread #1 and thread #2 are finished and the queue is empty, we're done.

Access to the hashtable and the queue needs to be synchronized across threads.

Assuming the files are on disk, most of the time is going to be spent reading and writing files, so breaking that work up across multiple threads is ideal (assuming the files are large).

The memory usage might be too high with a hashtable; this could be reduced by using a trie/radix tree.

- MessyHack June 07, 2012 | Flag Reply

What happens when an integer is present twice in one file and not present at all in the other? I think your algorithm will count that as a common entry.

- Anonymous June 08, 2012 | Flag

Yep, good catch.

It's easily fixed by additionally storing which thread (1 or 2) the number was encountered in: if it hasn't been queued yet and has now been seen by the other thread, queue it for writing.
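A compact Java sketch of the scheme with that fix, under some simplifying assumptions: arrays stand in for the two files, the map value records which files a number has been seen in, and the writer thread is collapsed into a final drain of the queue:

    import java.util.*;
    import java.util.concurrent.*;

    public class ThreadedIntersect {
        static final int QUEUED = 4;           // value bits: 1 = file 1, 2 = file 2

        static List<Integer> intersect(int[] f1, int[] f2) throws InterruptedException {
            ConcurrentHashMap<Integer, Integer> seen = new ConcurrentHashMap<>();
            BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
            Thread t1 = new Thread(() -> read(f1, 1, seen, queue));
            Thread t2 = new Thread(() -> read(f2, 2, seen, queue));
            t1.start(); t2.start();
            t1.join(); t2.join();
            List<Integer> out = new ArrayList<>();
            queue.drainTo(out);                // "writer thread" collapsed to a drain
            return out;
        }

        static void read(int[] file, int bit,
                         ConcurrentHashMap<Integer, Integer> seen,
                         BlockingQueue<Integer> queue) {
            for (int v : file) {
                seen.compute(v, (k, old) -> {  // compute runs atomically per key
                    int s = (old == null ? 0 : old) | bit;
                    if (s == 3) {              // first time seen in BOTH files
                        queue.add(v);          // so enqueue exactly once
                        s |= QUEUED;
                    }
                    return s;
                });
            }
        }
    }

Because the per-key update is atomic, a duplicate within one file only re-sets that file's bit, and a number common to both files is queued exactly once.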

- MessyHack June 08, 2012 | Flag

Looks like a good approach.
After some more reading, it looks like radix sort would be more optimal.

Can you help me with the functions here? I am new to hashmaps: I can add a key-value pair, but I have never tried it within a thread or done complex manipulations. I really need to see some code to understand how this could be put to work.

- chaos June 08, 2012 | Flag

Radix sort doesn't involve hashmaps.

- Anonymous June 09, 2012 | Flag

And there are actually many reasons this is a terrible solution. A hashtable is only reasonably efficient while everything fits into memory, and the threaded approach you suggested requires a huge amount of fine-grained synchronization that will have lock-contention issues.

Your best bet would probably be to mergesort both files independently first (in parallel if desired), and then run a modified merge algorithm to find the common elements. This has relatively good disk I/O performance.

The approach suggested by sqw can also be a good one, depending on just how large the files are. Many real database implementations use similar techniques.

- Anonymous June 09, 2012 | Flag

