## Microsoft Interview Question

**Country:**United States

Do you know the maximum integer value? Also can the array be changed or is it immutable?

If its not to be changed, you can create a new byte array, with number of elements = MAX_INT/8, with all elements initialized to 0, all bits are 0's. In this array each bit represents a integer value in your original array. So you make a pass of your original array and set the corresponding bit in the new byte array. The moment you encounter a already set bit during pass, you are done. This is O(n) in time and takes space in bytes equal to max integer value in the array.

Which idiot downvoted this? This is a perfectly valid question!

People are posting homework on this site, or interpreting the question completely wrong and posting their own version here.

For instance, trying to figure this out in the minimum number of comparison is going to be HARD problem and a moronic interview question.

I back u up. It is because your question is valid.However, please don't blame on others. =^_^=

Assume we have one more limitation on this question. The duplicate value appear twice in the second half of array. Then the question becomes

the question for finding duplicate in an array. (We consider the second half of array as an new array)

@Psycho: Are you absolutely sure the interviewer said _minimum number_ of comparisons? (Not asymptotically optimal, but _minimum number_).

Also, what about the other questions in the interview? Why don't you post them?

And why not indicate the company as Microsoft in the question.

Actually my friend sat for microsoft. And told me the same.

I m sure that they say the minimum number of comparison !!

I think Hashset and HashTable is now allowed here. In addition, the original array is immutable.

@Psycho: So this is hearsay :-)

Perhaps you can ask your friend to recall what exactly was said...

I have seen a lot of candidates interpret statements like "can you do better" to mean "can you find the best possible solution" etc.

And given the way people post questions here, it is not a surprise.

My understanding of the problem is, given an array a[1], ..., a[n], only a[i] and a[j] are the same number, while other elements are all distinct. However we don't know the index i, j. The only thing we know is j \in (i, i + arr_size/2]. We need to find both i and j.

I'm thinking of a method using a hash table which may work. For each element in a[i], if there is the same element in the hash table, then we have found this duplicated element, otherwise put a[i] into the hash table. Time complexity should be O(n). This algorithm should work, right?

One problem is that it did not take advantage of the information that the duplicated number must appear in the range (i, i + arr_size/2]. Also, since the range of the number is unknown, the size of hash table could be large if the largest number is very large and the hash function is not well chosen. Hope to see better method.

Yes your understanding is correct.

I also try to use the information that "half of the array size" - but till now i m not successful!

Please correct me if I am wrong or my understanding of problem is wrong. Here is my logic which makes use of the given range. Worst case would be O(n) if i is 0.

void find_dup(int *a, int i, int asize)

{

int n = (i+(asize/2) );

int ele = a[i];

if (n>asize)

printf("invalid range given!");

for (i=i+1; i<n; i++)

{

if (a[i] == ele)

printf ("duplicate ele found @%d", i+1);

}

printf ("No duplicate ele found within given range :%d", n);

}

Please let me know if this deals with the problem.

The problem is that we don't know i (if I understand correctly). In this case, the input i of your function will be unknown.

Hi Richard,

If only the element @a[i] and size is given, then we cant make use of the range in which case we should go for hashing!!! Or else, we can use of offsetof()?

we can use a hash map here of max size (n/2)........start traversing the array from 0 to (n/2)-1 element and put it into the map. If during this you encounter any element which is already in the map and has appeared once once .. stop .. dats ur answer ..

else from (n/2) index "i" remove the last range element (i - n/2 ) from the hash and insert "i"th element... continue the process untill u find ur answer

We can use the hastable as follows. First, travel one round and set H[a[i]] = 0. Use a temporal variable total = 0. Do one more round, by adding to total: total += i * (++H[a[i]]). When we finish, we will have total = 0 + 1 + ... + (n-1) + j where a[i] = a[j], i < j, so we can computer j. Similarly, if we travel from right to left of the array, we can compute i.

A not so bad solution would be to sort the array (O(n * log n)), then just scan all the elements from start to end to see for repetitions (O(n)). So overall the algorithm complexity is O(n*logn)

- Sunday Programmer August 23, 2012If you give specifics about problem like, only positive integer or O(1) space or max integer is already given etc.. then there are better solutions.