Amazon Interview Question for SDE-3s


Country: United States




Comment hidden because of low score. Click to expand.
1
of 1 vote

1. let's say there are 100 files of such 6GB of data.
2. Conceptually divide each file in to 3 segments, i.e. there will be 300 segment
3. now run this loop -

for (int i = 0; i < 300; i++) {
	for (int j = 0; j < i; j++) {
		load_in_memory(segment[j], segment[j-1]);
		sort_data_in_memory();
		save_in_disk_at_segment_position(i, 1st_half_of_memory);
		save_in_disk_at_segment_position(i+1, 2nd_half_of_memory);
	}
	// after i == 0 segment[0] is the smallest segments of all files.
	// after i == 300 all segments are sorted.
}

does it make sense? it's like bubble sort, in each bubble we sort two consecutive segment.

- ahmedhasan July 15, 2020 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

correction -
save_in_disk_at_segment_position(j-1, 1st_half_of_memory);
save_in_disk_at_segment_position(i, 2nd_half_of_memory);

- ahmedhasan July 15, 2020 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

As all the files hold stream of integers, I would design a program that read each file and count the number of each integer in the files. When the program starts, it will create 8 temporary count files to hold the count of all integers each of which will keep counts of 2^29 numbers in sorted order. As each count will be a 64 bit number, each file will be 4GB. 8 files will cover count of all possible numbers.

A program will be designed to read each integer from each file and store integer counts temporarily in a sorted key-pair values such as sorted dictionary Dictionary<int, long>. When the number of element in the dictionary reach a point where it would run out of memory (about 340k element), the program will update the count files by adding the new value to existing one and saved to the same location. The program then clears the dictionary and continues the loop until all input files are read and processed.
When all the input files are processed, the program reads the 8 temporary count files, the output the number of integer to the final output file.

This solution limit is that there are at most 2^64-1 appearance of each number in all the input files

- Dung Nguyen April 16, 2019 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

Assuming numbers are uniformly distributed, this approach would require a lot of count file switches -> lots of 4GB disk writes. As disk operations are way slower than inmemory operations, we need to minimize disk access for an efficient solution.

what i would do instead; read from 6GB files into RAM 3 GB at a time, sort inmemory and write back.
so N x 6GB files becomes -> 2N x 3GB sorted files

then I would do a 2N way merge sort on those sorted files. Whenever the overall sorted block size in RAM reaches 4GB, write it to disk, empty the memory and go on until all 3GB chunks are processed.

This requires reading + writing all data once to create sorted chunks and then once more to create single sorted file, overall a O(N) solution

- Anonymous74 May 06, 2020 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

This can be solved with map/reduce paradigm. basically you create K mapper (k is the number of buckets you want to do rough partition sort, say (2^31-1)/20, 20 will be # of mappers) and each mapper covers a range of integers to contain. After mapper tasks, each mapper contains a range of integers and each mapper's bucket is in ascending order. Now create same number of sorter and each sorter is sorting the corresponding mapper's outtput. Each sorter (reducer) will output the sorted result into the final output file. As you see the number of mapper and reducer will match and process streams coming from the file sources. This is a scalable and efficient design with map/reduce paradigm.

- Anonymous June 18, 2019 | Flag Reply


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More