unknown Interview Question
Software EngineersInterview Type: Phone Interview
@NoOne: Maybe I'm bit slow today, but let's take this example with only one repetition.
array = [5,9,13,15,20,20]
start = 0, end = 5, middle = 2 (value 13)
how to know it's in the interval (middle..end] when looking at 13 and maybe 9 and 15?
the array could be 5,5,13,15,17,20 and it would look the same at the first step of binary search, but in order to get the O(lg(n)) behavior one has to decide for interval [start..middle] or (middle..end]
...
if you could post the python/java/c code, that finds the single repeated element in a sorted array in less than O(N), that would be awesome ;-) for example with the array above.
@ChrisK you are right you can't perform that search in less than O(n). You can perform the search in a binary fashion but you still would have to go trough all the elements in the array in order to find the repeated sequence in the worst case. To be honest the answer to the question comes before that part so I did not think that very thoroughly. The real answer is that you can't solve the problem better than O(n). The source code I gave is the proof. You are solving a relaxed version of the problem and it still takes O(n). 2log(n) is still linear as the base of the logarithm is also 2. You can get a formal demonstration using the master theorem. The recursion is T(n) = 2T(n/2) + O(1). The thing is that on the interview I got assured by the interviewer that you can do better. What a nice feeling when I finally got the proof that you can't. I think I am satisfied with that answer so I will stop looking for a way to find an improvement over O(n).
@Fernando: I agree. Concluded:
1) The best conceivable run-time is O(n) for the initial problem.(an informal "prove" is that you need to look at all elements because if you do not look at one element and if this one is the repeated element, there is no way to tell which element (value) repeats.
2) You can relax it in several ways:
a) define that adjacent elements differ by at most 1 (see my initial post), you can find the number of repeated elements then in O(1) and the element in O(lg(n))
b) you know the position of one element and thus need to find the end and start of the sequence as you pointed out
BUT, O(2*log(n)) is O(log(n) if "*" is a multiplication) because constant factors can be eliminated (if it was O(2^log(n)) where "^" being power then this is n because 2^log(n) = n, if log means the logarithm with base 2)
I don't agree with your recursion that's why the application of the master theorem turned out to be O(n): The recursion is T(n) = T(n/2). You do it twice for n, but not on every "recursion level" that's why you can't put the 2 in front of the T. They way you wrote it would be, in your binary search you had to either use recursion or introduce a stack so you can follow both path, the left and the right side which is clearly not the case, a T(n) = 2*T(n/2) recursion would look like:
def fun(s, e):
if s == e: return 1
m = (s + e + 1) // 2
return fun(s, m) + fun(m, e) # or max(fun(s, m), fun(m, e))
fun(0, n-1)
but you do clearly:
def fun(s, e):
if s == e: return 1
m = (s + e + 1) // 2
if (condition): fun(s, m)
else fun(m, e)
fun(0, n-1)
fun(0, n-1)
an other way to thin about is that doing a single binary search leads to O(lg(n)) why should two binary searches done after each other lead to O(n).
Even your prove might be wrong, I think it's important to remember interviewer and interviewee are humans with their own background in terms of professional experience, education and personal, cultural background. So whatever is said, is said from an implicit context, which is a bias. It can be, the person really thought he helped you, even if it wasn't the case for you since you didn't have his context...
If you know that the array it is sorted you can reduce the memory cost to O(1). Here is an implementation in python:
def length_of_repeated_elements(s):
l = 0
for x in xrange(len(s) - 1):
y = x
while s[y] == s[y + 1]:
l += 1
y += 1
if x != y:
break
return l
To see if we can do better than linear cost let's assume that we know the repeated value and a random position in the array that it occurs. In that situation we can perform a binary search towards both sides to get the lower and upper limits of the repeated sequence.
For example the array [1,2,3,4,4,4,7], the repeated value 4, and the position 5
We could perform from both sides a binary search to reach the limits. In this case 3 and 5.
Here is a python code that solves this relaxed problem.
def length_of_repeated_elements(s, value, position):
if s[0] == s[-1]:
return len(s)
high = position
low = 0
while (low < high):
middle = (low + high) / 2
print "Low", low, high, middle
if s[middle] == value:
if s[middle-1] != value:
break
else:
high = middle
else:
if low == middle:
middle = high
break
else:
low = middle
first_element = middle
high = len(s)
low = position
while (low < high):
middle = (low + high) / 2
print "High", low, high, middle
if s[middle] == value:
if middle == len(s) - 1 or s[middle+1] != value:
break
else:
low = middle
else:
high = middle
last_element = middle
return first_element, last_element
The cost is 2 log(n).
Now to answer the question about how to find the repeated value and the position the answer is similar we can perform the binary search until we hit a random position with element as we have shown using the previous code it doesn't matter which position we find.
def find_max_repeated_elem(a){
m = mset(a)
#(min,max) = minmax(m){ $.l.value < $.r.value }
max.value
}
def find_max_repeated_elem_sorted(a){
p = { 'item': a[0] , 'count' : 0, 'max' : 0 }
p = fold ( a , p ) -> {
if( $.o == $.p.item ){
$.p.count += 1
} else {
if ( $.p.count > $.p.max ){
$.p.max = $.p.count
}
$.p.count = 1
$.p.item = $.o
}
$.p // return
}
p.item
}
First Case: Array Not Sorted. We iterate through the input. We keep track of previously encountered elements in an unordered_set. Once we find a previously encountered element we stop searching and count the number of remaining elements in the input. Space complexity: O(n); (If we never find a match we will store all the elements in the unordered_set). Time complexity: O(n).
template<typename iter>
std::size_t
lengthOfRepeatedSequence(iter first, iter last)
{
using value_t = std::iterator_traits<iter>::value;
std::unordered_set<value_t> previously_encountered;
std::size_t result = 0;
value_t repeated_elem = value_t();
bool notFound = true;
for(; first != last && notFound; ++first){
const auto term = *first;
if(1 == previously_encountered.count(term)){
notFound = false;
repeated_elem = *term;
} else {
previously_encountered.insert(term);
}
}
if(!notFound){
assert((first != last));
result = 1 + std::count(first,last,repeated_elem);
}
return result;
}
If the array is already sorted we can cut down on the space complexity because repeated elements will "live" next to each other. (I don't think) we can cut down on the time complexity because we need to examine every element in the list to conclude a repeated element doesn't exist. If the input has n elements and we only examine $n-1$ of them, it is possible that the element we neglected is part of the the only pair of repeated elements.
template<typename iter>
std::size_t
lengthOfRepeatedSequenceSorted(iter first, iter last)
{
assert(std::is_sorted(first,last) );
bool notFound = true;
std::size_t result = 0;
while(first != last && notFound){
const auto val = *first;
int count = 0;
while(first != last && *first == val){
++count;
++first;
}
if(count != 1){
notFound = false;
result = count;
}
}
return count;
}
actually, if it is sorted and if max-diference of two adjacent elements is 1, then the length of the repeated sequence is n-1-(array[n-1]-array[0]). If you want to know the elements value, you can binary search i:
- Chris May 29, 2017