Amazon Interview Question
Software Engineer / Developers — Country: United States
Great answer, @sonesh.
One thing to clarify with the interviewer is whether we take it as given that the addition itself is expensive enough to justify the overhead of passing values between threads. Even for arbitrarily big integers, it's probably gonna cost more to hand a number off to a thread than to simply add it to a single accumulated sum managed by one core.
Also, since summing N numbers is O(N), a binary-tree breakdown is way overkill. For a really long list of numbers, you're eventually gonna do N-1 additions no matter what, and the only real engineering concern is keeping the cores busy doing the real work (arithmetic) instead of busy doing synchronization (pure overhead). So, maybe send 10,000 integers at a time to each worker, let it spin a while, and have each worker add its partial result to the overall sum after each batch, using a mutex to manage contention.
You're right, @showell.
In general there are two organizations of memory: shared memory and distributed memory.
With shared memory we don't need to pass anything, since all cores share the same address space, so the solution above works as-is.
But in a distributed system, where each core has its own address space, we use message passing (e.g. MPI): each core actually has to send its data to another core, so there are two kinds of cost:
1) computation cost, and
2) communication cost.
The computation cost stays the same, but the communication cost varies with how the distributed system is implemented. For example, the cores might all be connected by a common bus, or over a network; the software implementation of the message-passing layer also affects the actual cost.
What you're describing in your last paragraph is essentially vector processing: we read not just a single data item but a vector of items at a time. Similarly, in a message-passing interface we optimize in terms of packet size.
Why there needs to be a relationship between thread count and core count:
Nowadays each core typically has 2 hardware threads, so we can run 2 software threads per core. Creating more gives no performance benefit, because each core can execute only two threads truly in parallel (one mapped to each hardware thread). Extra threads just add overhead: thread creation, synchronization, cache-coherency traffic, and CPU context switching, since the same work is now divided among more threads. So creating more than 2 threads per core actually reduces performance (in wall-clock time).
Now, to increase performance, we add cores. Here is an algorithm to sum n numbers using n/2 cores.
Let A[1..n] be the given array. The cores sum it in binary-tree form. Example:

            23
        12      11
      6    6    11
     1 5  2 4  5 6
This requires O(log(n)) time and O(1) space per core, though overall it still requires O(n) work and O(n) space.
- sonesh March 14, 2013
But here (in a parallel program) we consider the complexity per core.
Now, what if the number of cores is limited, so we can only create, say, P threads?
Then we split the array into P sets of n/P numbers each and sum each set on its own thread, which takes O(n/P) time; then we apply the tree procedure above to the P partial sums, which takes O(log(P)).
So the total complexity becomes O(n/P) + O(log(P)).