Amazon Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: In-Person
How does this reduce to longest repeated substring problem? I could have a sub-array of size 10 that repeats 3 times and a sub-array of size 4 that repeats 20 times. If these are the only candidates, the sub-array of size 4 is the answer.
@Epic_coder
You are right, the statement that it reduces to finding the longest repeated substring is misleading. I am editing my post. Thanks for the correction.
However, the technique of building the suffix array still holds, because, it is going to make life easier in terms of matching subarrays. The other variables - frequency and length - as I mentioned helps in finding the solution.
@Dumbo: Is it the same approach as I mentioned below?I don't know much about sufix array.
@aka
Suffix array is actually a 2D array. The suffix array for the given array {4,5,6,8,3,1,4,5,6,3,1} would be as below. Here, each element of the array itself is an array.
{4,5,6,8,3,1,4,5,6,3,1}
{5,6,8,3,1,4,5,6,3,1}
{6,8,3,1,4,5,6,3,1}
{8,3,1,4,5,6,3,1}
{3,1,4,5,6,3,1}
{1,4,5,6,3,1}
{4,5,6,3,1}
{5,6,3,1}
{6,3,1}
{3,1}
{1}
After sorting the suffix array, you'd get:
{8,3,1,4,5,6,3,1}
{6,8,3,1,4,5,6,3,1}
{6,3,1}
{5,6,8,3,1,4,5,6,3,1}
{5,6,3,1}
{4,5,6,8,3,1,4,5,6,3,1}
{4,5,6,3,1}
{3,1,4,5,6,3,1}
{3,1}
{1,4,5,6,3,1}
{1}
Checking for matching subarrays is easily done in a suffix array by comparing the prefixes. If you traverse the above sorted array and compare adjacent elements for similarity you'd see the prefix [4,5,6] is occurring maximum number(=2) of times and is also of maximum length. There are other subarrays as well, like [6], [5,6],[3,1] and [1] that are occurring 2 times, but they are shorter than the subarray [4,5,6], which is our required answer. HTH.
A Dumbo' s implementation can be found on sites.google.com/site/spaceofjameschen/annnocements/findthemostlongestormostfrequentlongestsubarray--amazon
@Dumbo
Can you throw some light on why you chose to use suffix array and not suffix tree? I believe creation of suffix array is at least as complex as creation of suffix tree. Using suffix tree we just have to find the deepest internal node than has at least n children, where n is the frequency of the most frequent character in the array.
@Epic_coder.
Yeah, the problem could be solved by Suffix tree as well. I believe that using suffix array eases the implementation.
@vankasundeep.
The complexity I believe is O(n^2lgn) due to the sorting step.
@vankasundeep O(nlogn).
@Dumbo can u please explain for what type of problems shall we use suffix array
and cant we do it using dynamic programing
@mani 4m sklm
Suffix arrays are usually used to solve problems that involve dealing with contiguous subarray/substrings etc.
DP on the other hand is particularly used to solve optimization(maximization or minimization) problems. Also, the given optimization problem has to have the following properties in order for DP to be applied.
1. Optimal substructure (Optimal solution to the problem is constructed from optimal solutions to subproblems)
2. Overlapping subproblems
I am unable to clearly see the above 2 properties in the given problem and so resorted to using a suffix array. You can give it a shot using DP, though, if you deem it fit to be applied.
@Epic_coder
I think the sorting step below can get a bit more expensive.
>> 2.Arranging/Sorting suffixes lexicographically using the suffix array << O(n)
Any comparison-based sorting of n elements has a lower bound of big-Omega(nlgn). n*lgn comparisons is again based on the premise that comparison between two elements takes constant time.
In the case of a suffix array, there are n elements, but the issue here is each element itself is an array. In order to compare 2 elements(here, 2 arrays), you need to check n numbers on avg to determine whether one element is larger or smaller to the other. In other words, the comparison between 2 array elements in a suffix array is not constant, it takes n comparisons on avg. Hence the total complexity would be (n^2 * lgn) in the sorting step.
{4,5,6,8,3,1,4,5,6,3,1}
start putting elements in the array:
array = {4,5,6,8,3,1}
Once you encounter same element again which is already present in the array then increment the count of the element i.e. in this case for 4.
Once you again find some element, in this case 5 then again increment the count again and do this for others.
In the end we will have something as below:
{element(count)}
{4(1),5(1),6(1),8(0),3(1),1(1),4(0),5(0),6(0),3(0),1(0)}
Count array will be as below:
{1,1,1,0,1,1,0,0,0,0,0}
Now all we need to find out is the longest 1's or greater than '1' in the count array.
I don't know if this will pass all tests but counter tests are most welcome
It won't.
consider
{1,2,3,4,4,1,3,2,1,2}
according to you, count array should be
{1,1,1,1,0,0,0,0,0,0}
which yields
{1,2,3,4}
as answer, but correct answer should be
{1,2}
correct answer for {1,2,3,4,4,1,3,2,1,2} is {1} or {2}, they appear 3 times but {1,2} only 2
Find the frequency of each number. Consider the most frequent numbers.
We need two consider 2 cases to get ans:
Case 1) There is only one most frequent number ( in this case ans is most frequent no.)
say given array is {4,5,7,4, 7, 4,}.
most frequent number is {4}.
In this case ans is [4] since we can not create subarray which is more frequent then subarray[4].
case 2) more than one most frequent number
ex. given array is [4, 6, 5, 5, 4]
In this case most frequent nos are [4,5].
Possible ans could be [4,5] or [4/5] or [5,4]
If each occurrence of 4 (and 5) has next number 5 (and 4) then ans is [4,5] ( in second case [5,4]) and if it doesn't then again ans is only one number array i.e either [4] or [5]
Here is a different solution using one-pass through the array and a trailing back-pointer. A Map is used to keep track of the number of occurrences of any particular sub-array. Since an individual integer will always be equal to the greatest number of sub-arrays, if we find an individual integer that is greater than the sub-array we are looking at - move the back-pointer forward and don't consider cases behind it.
public static int[] mostFrequentSubArray(int[] array) {
Map<String, Integer> arrayCounts = new HashMap<String, Integer>();
int[] mostFrequestSubArray = new int[0];
int mostFrequestCount = 0;
int arrayStartPointer = 0;
for (int i=0; i < array.length; i++) {
int initialCount = 0;
// add subarrays to backpointer
for (int j=i; j >= arrayStartPointer; j--) {
// would nice to have a better way of doing this rather than copying the array
int[] subarray = Arrays.copyOfRange(array, j, i+1);
// also, wish I had a better key - the int array itself cannot be used as a key
String subArrayKey = Arrays.toString(subarray);
Integer subArrayCount = arrayCounts.get(subArrayKey);
if (subArrayCount == null) {
subArrayCount = 0;
}
subArrayCount++;
if (subarray.length == 1) {
initialCount = subArrayCount;
}
arrayCounts.put(subArrayKey, subArrayCount);
if (subArrayCount < initialCount) {
arrayStartPointer = j + 1;
} else {
if (initialCount > mostFrequestCount || (initialCount == mostFrequestCount && subarray.length > mostFrequestSubArray.length)) {
mostFrequestCount = subArrayCount;
mostFrequestSubArray = subarray;
}
}
}
}
return mostFrequestSubArray;
}
var input = [4, 5, 6, 8, 3, 1, 4, 5, 6, 3, 1];
var result = [];
for (var cursor = 0; cursor < input.length; cursor++) {
for (var matchCursor = cursor + 1; matchCursor < input.length; matchCursor++) {
var item = [];
for (var index = 0; index < input.length - matchCursor; index++) {
if (input[cursor + index] == input[matchCursor + index]) {
item.push(input[cursor + index]);
} else {
break;
}
}
if (result.length < item.length) {
result = item;
}
}
}
Here is an inplace algorithm that works in roughly O(n^2)
public class LongPat {
public static void main(String args[]) {
int arr[] = { 3, 5, 1, 2, 7, 0, 4, 5, 6, 8, 3, 1, 4, 5 };
int result[] = findLong(arr);
System.out.println();
for (int i = 0; i < result[2]; i++)
System.out.print(arr[result[0] + i] + " ");
}
public static int[] findLong(int arr[]) {
if (arr == null || arr.length < 2)
return null;
int result[] = new int[3];
// locate and expand
int len = 0, maxlen;
int biglen = 0;
for (int i = 0; i < arr.length - 1; i++) {
for (int j = i + 1; j < arr.length; j++) {
// match found then expand
maxlen = (j - i < arr.length - j) ? (j - i) : (arr.length - j);
if (maxlen > biglen) {
for (len = 0; len < maxlen; len++)
if (arr[i + len] != arr[j + len])
break;
if (len > biglen) {
storeResult(result, i, j, len);
biglen = len;
}
}
}
}
return result;
}
public static void storeResult(int arr[], int stpos1, int stpos2, int length) {
arr[0] = stpos1;
arr[1] = stpos2;
arr[2] = length;
}
}
Here is an in place algorithm of complexity O(n^2).
The method FindLong returns the starting locations in result[0], result[1] and length of the string in result[2]
public class LongPat {
public static void main(String args[]) {
int arr[] = { 3, 5, 1, 2, 7, 0, 4, 5, 6, 8, 3, 1, 4, 5 };
int result[] = findLong(arr);
System.out.println();
for (int i = 0; i < result[2]; i++)
System.out.print(arr[result[0] + i] + " ");
}
public static int[] findLong(int arr[]) {
if (arr == null || arr.length < 2)
return null;
int result[] = new int[3];
// locate and expand
int len = 0, maxlen;
int biglen = 0;
for (int i = 0; i < arr.length - 1; i++) {
for (int j = i + 1; j < arr.length; j++) {
// match found then expand
maxlen = (j - i < arr.length - j) ? (j - i) : (arr.length - j);
if (maxlen > biglen) {
for (len = 0; len < maxlen; len++)
if (arr[i + len] != arr[j + len])
break;
if (len > biglen) {
storeResult(result, i, j, len);
biglen = len;
}
}
}
}
return result;
}
public static void storeResult(int arr[], int stpos1, int stpos2, int length) {
arr[0] = stpos1;
arr[1] = stpos2;
arr[2] = length;
}
}
If we take this example
For example: if input = {4,5,6,8,3,1,4,5,6,3,1}
Result: {4,5,6}
Algorithm
1.Take an master list of all the elements and iterate through the elements in the list
List l=new ArrayList();
add the elements {4,5,6,8,3,1,4,5,6,3,1} to l
l={4,5,6,8,3,1,4,5,6,3,1}
2.iterate through the elements and check if the element in the current iteration exists before the current current in the Master list
-->does 4 exist before position 0 in the list -->No
-->does 5 exist before position 1 in the list -->No
-->does 6 exist before position 2 in the list -->No
-->does 8 exist before position 3 in the list -->No
-->does 3 exist before position 4 in the list -->No
-->does 1 exist before position 5 in the list -->No
-->does 4 exist before position 6 in the list -->Yes
if yes add the element to a new list ,
l1={4}
if the next element also exists in the mater list in after the previous element in the sequence then add this element in the list
does 5 exist before position 7 in the list -->Yes
l1={4,5}
and keep on doing this till we get elements in the sequence existing in the master list .
l1={4,5,6}
when the an element that dosent exist in the master list is encountered , then stop adding the elements to the list
and a new list will be formed for and matches going forward
does 6 exist before position 8 in the list -->Yes
l2={6}
does 3 exist before position 9 in the list -->Yes
l3={3}
does 1 exist before position 10 in the list -->No
in the end see how many unique lists we have
l1={4,5,6}
l2={6}
l3={3}
and output is the longest one
{4,5,6}
1. start using n/2 as window size.
2. calculate a hash key = n[0] + n[1] * 2 + n[2] * 4 + n[3] * 8 + ....
3. when calculate next hash key, use previous one by:
next key = previous key - n[0] / 2 + n[w] * 2 ^ w where w is window size
4. check if hash key exists, increase count by 1.
5. Once done, check hash table if there is any key with count greater than 1; then stop and return result.
6. Otherwise, reduce window size by 1 and loop again
7. if window size is less than or equal to 1, stop and return no result.
1. start using n/2 as window size.
2. calculate a hash key = n[0] + n[1] * 2 + n[2] * 4 + n[3] * 8 + ....
3. when calculate next hash key, use previous one by:
next key = previous key - n[0] / 2 + n[w] * 2 ^ w where w is window size
4. check if hash key exists, increase count by 1.
5. Once done, check hash table if there is any key with count greater than 1; then stop and return result.
6. Otherwise, reduce window size by 1 and loop again
7. if window size is less than or equal to 1, stop and return no result.
Below code will take O(n^2) time without extra space except some variables....
I have testing it by many scenario, please give some scenario where it is not working.....
#include<stdio.h>
int main(void)
{
int len, arr[100];
int i, j, j1, start, end, st, ed, cnt;
int count =0;
printf("\nEnter the No of Elements : ");
scanf("%d", &len);
for(i=0; i<len; i++)
{
printf("\nEnter the Element at Array[%d] : ", i);
scanf("%d", &arr[i]);
}
for(j = len-1; j>=0; j--)
{
st=j; ed=j; cnt=0;
j1 = j;
for(i=0; i<j && j1<=len-1; i++)
{
if(arr[i] == arr[j1])
{
j1++; ed++; cnt++;
}
else if(j1 != j)
{
if(count < cnt)
{ count=cnt; start=st; end=ed-1; }
j1=j; st=j; ed=j; cnt=0;
}
}
if(count < cnt)
{count=cnt; start=st; end=ed-1; }
}
if(count > 0)
{
printf("\n Lenght of the Most Frequent non-empty subarray : %d", count);
printf("\n Array : ");
while(start <= end)
printf("\t%d", arr[start++]);
}
else
printf("\nNo Frequent non-empty subarray found.....\n");
return 1;
}
What if the input was {4,5,6,8,3,1,4,5,7,3,1}, here {4,5,6} is not repeating. instead both {3,1} and {4,5} are repeating. What should be the output in this case?
Very simple program to understand, but it takes more memory and processing.
package com.bala.logical;
import java.util.ArrayList;
import java.util.List;
public class RepitivieArray {
public static void main(String[] args) {
int[] array = new int[] { 1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 9, 7, 8, 9, 1,
2, 7, 8, 9, 1, 2 };
List list = new ArrayList();
for (int i = 0; i < array.length; i++) {
int index = 2;
while (index <= array.length / 2 && (i + index) < array.length) {
int[] array1 = new int[index];
int[] array2 = new int[index];
int curIndex = 0;
while (curIndex < index) {
array1[curIndex] = array[i + curIndex];
curIndex++;
}
for (int k = index; k < array.length; k++) {
int len = array.length - k - i;
if (len < index)
break;
curIndex = 0;
while (curIndex < index) {
array2[curIndex] = array[i + k + curIndex];
curIndex++;
}
if (compareArrays(array1, array2)) {
if (list.size() > 0) {
int[] duplicateArray = (int[]) list.get(0);
if (array1.length > duplicateArray.length) {
list.clear();
list.add(array1);
}
} else {
list.add(array1);
}
}
}
index++;
}
}
for (int i = 0; i < list.size(); i++) {
for (int m = 0; m < ((int[]) list.get(i)).length; m++) {
System.out.print(((int[]) list.get(i))[m]);
}
System.out.println("");
}
}
public static boolean compareArrays(int[] a, int[] b) {
for (int j = 0; j < a.length; j++) {
if (a[j] != b[j])
return false;
}
return true;
}
}
#define SIZE(A) (sizeof(A)/sizeof(A[0]))
int MaxValue(int *a, int n)
{
int val=INT_MIN;
for(int i=0;i<n;i++)
{
if(a[i]>val)
val=a[i];
}
return (val+1);
}
void MaxRepeatedSequence(int *a, int n)
{
int i,j;
int flag=-1;
int max=INT_MIN;
int I,J;
int m=MaxValue(a,n);
int *t=new int[m];
memset(t,0,sizeof(int)*m);
for(i=0;i<n;i++)
{
if((a[i]==a[i+1]) || (a[i]+1==a[i+1]) || (a[i]==a[i-1]+1))
t[a[i]]++;
}
for(i=0;i<m;i++)
{
if(t[i]!=0 && t[i]>max)
{
max=t[i];
j=i;
while(t[j]==t[j+1])
j++;
I=i;
J=j;
i=j;
}
}
printf("\n\nMax Repeated Sequence:\t");
for(i=I;i<=J;i++)
printf("%d\t",i);
printf("\n");
delete [] t;
}
int main()
{
int a[]={4,5,6,8,3,1,4,5,6,3,1}//{4,5,6,7,8,3,1,2,3,1,2,4,1,2,5,1,2,6,1,2,3,4,5,6,7,8,9,3,1,2,3,1,2,3};
int n=SIZE(a);
MaxRepeatedSequence(a,n);
return 1;
}
Find each occurrence of number and then display the maximum occurred numbers as subarray.....
Consider {4,5,4,5}. In this case, the answer should be [4, 5] instead of just [4] or [5], because we want the longest possible.
thats what i am saying find each occurence of the number in the array , and print the maximum occured numbers ..
in the above case 4 and 5 has occured 2 times so print both..
in above example 4,5,6 has occured 3 times so print it
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
public class MaxLongestOccuranceFinder {
public void find(Integer[] integers) {
populateMap(integers);
findMaxLongest(integers);
}
private void findMaxLongest(Integer[] integers) {
for(int arrIndex = 0; arrIndex < integers.length;arrIndex++){
int arrVal = integers[arrIndex];
for(int listEntry:_map.get(arrVal)){
if(listEntry > arrIndex){
int a = arrIndex, b = listEntry;
while(a < integers.length && b < integers.length){
if(integers[a] != integers[b]){
break;
}else{
a++;b++;
}
}
if(a-arrIndex > length){
length = a-arrIndex;
index = arrIndex;
}
}
}
}
}
private void populateMap(Integer[] integers) {
int index = 0;
for(Integer i:integers){
if(_map.containsKey(i)){
_map.get(i).add(index);
}else{
List<Integer> l = new LinkedList<>();
l.add(index);
_map.put(i, l);
}
index++;
}
}
Map<Integer,List<Integer>> _map = new HashMap<>();
public int index = -1;
public int length = -1;
}
1. Build a suffix array and sort the array. Use 2 variables - one to maintain the length of the longest repeated sub array and the other to maintain the frequency.
- Murali Mohan June 29, 20132. Traverse the sorted array to find out the most occurring and longest repeated subarray and return it.