Suppose there is a distance ma

Amazon Interview Question for Software Engineer / Developers

0

of 0 votes

30
Answers
Suppose there is a distance matrix which consists of n points and that gives the distance of say (a,b) = 6 , (a,c) = 5. If there are N points assume 10000, then it requires N * N matrix to store the corressponding distances.How to store the matrix in such a fashion that it gives a fast retrieval and optimized storage.
- sowmya January 14, 2013 in United States | Report Duplicate | Flag | PURGE
Amazon Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 6 vote

Store the Matrix in single dimensional array by removing all the elements on & above major diagonal.

E.g. Let matrix size is 5X5, then the single dimensional array will have following list
-------------------------------------------------------------------------------
|(1,2) | (1,3) | (1,4) | (1,5) | (2,3) | (2,4) | (2,5) | (3,4) | (3,5) | (4,5)|
-------------------------------------------------------------------------------

The following function will calculate the exact index position of (i,j) in single dimensional array of matrix size NXN.

int index(int i, int j, int N)
{
    int Max=0, s=0, sub_size=0, sub_index=0, index=0;
    if(i>=N || j>=N) return -1;
    if(i == j) return 0;
    if(i>j) {i = i+j; j=i-j; i=i-j;} //swap i & j
 
    Max = N*(N-1)/2;
    s = N-i;
    
    sub_size = s*(s+1)/2;
    sub_index = Max-sub_size;
    index = sub_index+j-i;
    rerurn index;
}

S(N) = O(N*(N-1)/2)
O(N) = O(1)

- SRB January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

can you explain the subindex please?

- Developer January 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

For Example, suppose we are looking for M(2,4), then Max, sub_size, sub_index, index goes as follow.

-------------------------------------------------------------------------------
|(1,2) | (1,3) | (1,4) | (1,5) | (2,3) | (2,4) | (2,5) | (3,4) | (3,5) | (4,5)|
-------------------------------------------------------------------------------
|   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |  10 |
-------------------------------------------------------------------------------

<-----------------------------Max---------------------------------------------->
                                 <-----------------------sub_size-------------->
                             ^
			     |
			  sub_index
                                            ^
			                    |
					   index

- SRB January 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I understand that it takes O(N^2) space.
how about run time for query. I believe it is O(lgN). Correct me if I am wrong. Thank you

- Vincent January 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Actually, this is not optimized storage. Given any 3 points, you can have the 3rd distance figured out by the other 2. Using that logic, one can cut the size of the 1 array to at least half (if not more). With some more thinking, one can index all those, too.

- Son January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Son: Given the lengths of AB and BC in a triangle, can you find the length of AC? The answer is no. It might still be possible to use a knowledge of certain distances to compute certain other distances, but it's not as simple as you suggest.

@Vincent: It's constant time, assuming O(1) arithmetic operations. There's nothing in there such that the running time depends on N.

- eugene.yarovoi January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 2 votes

@SRB though the logic is correct but you mention "Store the Matrix in single dimensional array by removing all the elements on & above major diagonal" which means only those elements will be stored for which i>j however in the example you have taken elements above major diagonal but thats fine as the logic is correct

My Q is do we need to do so much calculations? consider if we only save lower half as u mentioned (i>j). Then in your example elements saved will be

-------------------------------------------------------------------------------
|(2,1) | (3),1 | (3,2) | (4,1) | (4,2) | (4,3) | (5,1) | (5,2) | (5,3) | (5,4)|
-------------------------------------------------------------------------------
So for calculating index in new array we can just use
new_index = (i-2)*(i-1)/2 +j

because for reaching any element in virtual 2D array we will have to skip (i-2)*(i-1)/2 elements in i-1 rows and then j elements in ith column

- vik January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@vik .. Ur logic also correct. The same thing come to me after I post this solution.

- SRB January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Sorry, I just figured out that this is the same SRB wrote

- vpodlesnyak January 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@Vik - how did you come up with the idea for new_index = (i-2)*(i-1)/2 +j

- kevinshah2006 January 17, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

@kevin: Here is the formula:
Address of any element a(i,j) = number of elements up to a(i,j) element
= Total number of elements in first (i-1) rows + number of elements upto jth column in the ith row
= 0+1+2+...(i-2) + j
= (i-2)*(i-1)/2 + j

Here is why (0+1+2...(i-2) )
suppose take 3x3 matrix
a11 a12 a13
a21 a22 a23
a31 a32 a33

we are taking lower triangular matrix excluding diagnol element, so here is the number in each row
0
1
2

- buddy January 19, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 1 vote

Assuming distances are symmetric, you only need to store the entries below or above the diagonal.

- Anonymous January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

They are symmetric i.e (a,b) = (b,a). @jasmeet Its not like what you think. Its not like a graph.

- sowmya January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I suggest following solution (written in java):

public static void main(String[] args) {
		int N = 5;
		int[][] matrix = new int[][]{
			{ 0, 1, 2, 3, 4},
			{ 1, 0, 5, 6, 7},
			{ 2, 5, 0, 8, 9},
			{ 3, 6, 8, 0,10},
			{ 4, 7, 9,10, 0},
		};
		
		int[] matrixArray = new int[]{
			1, 2, 3, 4,
			   5, 6, 7,
			      8, 9,
			        10
		};
		
		// print to ensure that representation is ok
		for (int i = 0; i < N; i++) {
			for (int j = 0; j < N; j++) {
				System.out.print(getMatrixElement(matrixArray, i, j, N)+" ");
			}
			System.out.println();
		}
	}
	
	static int getMatrixElement(int[] matrixArray, int i, int j, int N) {
		//check index validity
		if (i < 0 || N <= i || j < 0 || N <= j) throw new IllegalArgumentException();
		
		//check equality	
		if (i == j) return 0;
		
		// ensure j<i
		if (i < j) {
			int tmp = i;
			i = j;
			j = tmp;
		}
		
		int rowOffset;
		if (j == 0){
			rowOffset = 0;
		} else {
			rowOffset = (2*N-j-1) * j/2; // offset is an arithmetic progression like: 	4 .. 3 .. 2 .. 1 
		}
		
		return matrixArray[rowOffset + (i-j-1)];
	}

- vpodlesnyak January 15, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

This solution requires (N-1)*N/2 place for storage and seek time is constant

- vpodlesnyak January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

/*
Assume the elements are stored in the array as given below
(2,1),
(3,1),(3,2),
(4,1),(4,2),(4,3),
(5,1),(5,2).......

*/

#include<stdio.h>
#define N 20
int find_distance_index(int a,int b);
main()
{
	int i,j,distance[(N*N-1)/2];
	FILE *fp;
	
	fp=fopen("vin.txt","w");

	for (i=1;i<=N;i++)
	{
		for(j=1;j<=N;j++)			
		{
			fprintf(fp,"%d \t",find_distance_index(i,j));
		}
		fprintf(fp,"\n");
		
	}
	fclose(fp);
	getchar();
	return 0;

}
int find_distance_index(int a,int b)
{
	int return_value = 0;
	if(a!=b)
	{
		return_value= (a>b)?((a-1)*(a-2)/2)+b: ((b-1)*(b-2))/2+a;
	}
	return(return_value-1);
}

- vinod January 19, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 3 vote

If you know the distance from a to b and a to c, you don't need to store the distance from b to c

- jasmeet@iastate.edu January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

yes, we need to if the points are in 2-D space. In 1-D space we can sort the points on the basis of their co-ordinate and store it.

- aasshishh January 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

we don't need to even if the points are in n-D space.

- van.truongson January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

If you know the distance AC and you also know AB, how will you find BC? You cannot infer the length of the third side of a triangle from the length of the two other sides.

- eugene.yarovoi January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

right. sorry. my bad. i wasn't thinking straight.

- van.truongson January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

The problem with the matrix is that it occupies N*N rooms of the memory, where only a small subset of node-pairs (let's say m) have distance information. To solve this problem, one can use a hashmap, where the keys are the pair information, and the values are the distance between the two. The key of a pair could be defined as an interger, which can be uniquely built having the pair ids. For example, (x*N + y) can be used as the key id for the nodes with ids x, and y.
Using this scheme, the required space for storing the information will be: 2*len(Integer) *m/alpha , which for an enough large N, it would be much smaller than N*N*len(Integer)

- taheri.javad January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

But here say if all N*N values are there after removing the symmetric ones, Then also the amount of N * (N-1) / 2 values is so huge to store in a hashmap right. Lets say N is 10000.

- sowmya January 14, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

"The problem with the matrix is that it occupies N*N rooms of the memory, where only a small subset of node-pairs (let's say m) have distance information."

I don't understand what you mean by this sentence. The problem statement suggests a distance is defined between every pair of points, not some small subset.

- eugene.yarovoi January 15, 2013 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

In case the matrix is asymmetric, then can we use the following approach?

Create a hashMap, with key=distance, value=bucket of pairs having such a distance between them.

PROS: If the overall original matrix size >> number of distance distances between pairs, then this approach will have a lesser space requirement.

CONS: If the number of distinct distances between pairs is comparable to size of the original matrix, then this approach will fail.

- Learner January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

What happens if the distance between any two pair is same....
Instead wouldn't it be a good approach to store the pair as key since it is unique and store their distance as values...
This would also help in fast retrieval as given a pair distance could be computed in constant time

- Anuj Agrawal February 02, 2013 | Flag

Comment hidden because of low score. Click to expand.

-1

of 1 vote

hey guys i made a hash function for this :
check this

index= ((i-1)/2) * (2*(n-1)-(i-2)) + (j-i);

tell me whether it is working or not?

- deepak malik January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

-1

of 1 vote

KD trees must be used for storing the data.

building a kd-tree has O(N·logN) time complexity and O(K·N) space complexity
nearest neighbor search - close to O(logN)
M nearest neighbors - close to O(M·logN)

- Lyubomir January 14, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

I am not wrong guys :)

- Lyubomir January 19, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

This isn't a nearest-neghbors problem. You're asked to store the exact data in the distance matrix.

- eugene.yarovoi January 19, 2013 | Flag

CareerCup

Amazon Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews