## Google Interview Question

Solutions Engineers**Country:**United States

**Interview Type:**In-Person

Don't think this works.

Since it only bumps

`result`

when it encounters a full set, it ignores duplicates along the way. Consider input a=[1, 2, 3, 4, 1, 2, 3, 4] and k=4. You have 2 of every number 1 to k so you can create all sequences of length 1 and 2, so the answer should be 3 and this function returns exactly that. But if you move the second 1 so you instead have a=[1, 1, 2, 3, 4, 2, 3, 4] and still k=4, then this function will pass over the duplicate 1 and only increment

`result`

once, returning 2 which is incorrect.

Seems like the best option is to have an array of length k where each index, i, contains the number of occurrences of i+1 in the array (so index 0 is the number of occurrences of 1 in the array, index 1 is for 2, etc.). Then you look for the min of this array and return min+1. Should be O(n) to build the array, and O(k) to find the min for a total of O(n + k) time and O(k) space.

You may be able to do better by using a min heap to keep track of the number of occurrences. If you have an array of length k where each index, i, contains the node in the heap for occurrences of i+1, then since you're only incrementing by 1 each time you should get amortized constant time operations in the heap. Then finding the min of the heap is of course constant and you'd just return min+1. Your heap and map would both take up O(k) space still but your time would drop to O(n).

@Flint The problem can be rephrased as such: your input is an integer K>0, and a sequence S = s1 s2 .. sn, 1<=si<=K, 1<=i<=n. Consider a set Q of all possible sequences q1 q2 .. qz, 1 <= qi <= K, z >= 1 that are not subsequences of S (that cannot be derived from S by deleting some or no elements without changing the order of the remaining elements). You need to find the length of the shortest sequence in Q.

@aonecoding provided a cool solution. It took me a while to convince myself that it actually works. Here is the sketch of a proof if you are interested:

1) If an input string S does not contain all K characters, then the answer is obviously 1, as any missing character forms a string that is not a substring of S

2) Otherwise, an input string S can be represented as a concatenation S1 S2 where

S1 is the *shortest* prefix of S that contains all K characters and (a possibly empty) suffix S2.

Now, if the suffix S2 contains all K characters it can also be represented as a concatenation of a shortest prefix that contains all K characters and (a possibly empty) suffix.

Repeating this process until we get a suffix that does not contain all K characters, an input string S can uniquely be represented as a concatenation of S1 S2 .. Sn where Si (i in [1..n-1]) is the shortest prefix of the rest of S (to be more precise, of the suffix of S produced by taking S1 S2 .. Si-1 prefix off) containing all K characters, and Sn - is the (possibly empty) suffix that does not contain all K characters (meaning some of K characters are not in Sn).

3) Since Si is the shortest prefix with all K characters, its last character occurs only once (otherwise this character could be omitted producing a shorter prefix with all K characters).

4) This is how to build a string that is guaranteed to not be a sub-sequence of S and also having minimum length among other strings having this property, meaning that any shorter string is necessarily a sub-sequence of S: s1 s2 .. sn, where si (i in [1..n-1]) is the last character of Si, and sn is any of the K characters not present in Sn.

First, let's prove that any shorter sequence q1 q2 .. qk, k<n is a sub-sequence of S. This part is obvious since qi is in Si, since Si contains all K characters (i in [1..k]).

Now, let's prove that the string s1 s2 .. sn is not a sub-sequence of S. Let's assume otherwise. Since S1 contains s1 only as the last character, either the whole s1 s2 .. sn is a subsequence of S2 .. Sn, or just s2 s3 .. sn is a subsequence of S2 .. Sn. Either way, s2 s3 .. sn is a subsequence of S2 S3 .. Sn. Now, since S2 contains s2 only as the last character, s3 .. sn is a subsequence of S3 .. Sn. By repeating this n-1 times we get that sn is a subsequence of Sn, but this cannot be true because sn was specifically chosen as one of the K characters not in Sn. We get a contradiction which means s1 s2 .. sn is not a subsequence of S.

Now, having proved that s1 .. sn is the minimal string that is not a subsequence of S, we see that in order to compute n we just need to break up the input string S into concatenations S1 .. Sn-1 Sn, as outlined in 2) and count the number of these. This is exactly what the algorithm is doing.

```
int shortestSubSequence(int[] arr, int k) {
Map<Integer, Integer> map = new TreeMap<Integer, Integer>();
for(int i=0; i<arr.length; i++) {
if(map.containsKey(arr[i])) {
map.put(arr[i], map.get(arr[i]) + 1);
} else {
map.put(arr[i], 1);
}
}
int result = k;
for(Integer key: map.keySet()) {
int val = map.get(key);
if(val < result) result = val;
}
return result + 1;
```

}

My solution:

```
int result = 0;
int start = 0;
HashSet<Integer> remaining = new HashSet<>();
while(start < arr.length){
for(int i = 1; i <= K; i++){
remaining.add(i);
}
while(start < arr.length && !remaining.isEmpty()){
if (remaining.contains(arr[start])) remaining.remove(arr[start]);
start++;
}
if(start == arr.length){
return remaining.isEmpty() ? result+1 : result;
}else{
result++;
}
}
return result;
```

My solution:

```
int result = 0;
int start = 0;
HashSet<Integer> remaining = new HashSet<>();
while(start < arr.length){
for(int i = 1; i <= K; i++){
remaining.add(i);
}
while(start < arr.length && !remaining.isEmpty()){
if (remaining.contains(arr[start])) remaining.remove(arr[start]);
start++;
}
if(start == arr.length){
return remaining.isEmpty() ? result+1 : result;
}else{
result++;
}
}
return result;
```

Seeing no one is talking about progressively how to approach the problem, i though to take the chance and present a way to approach the problem;

now lets take a example;

A = [4, 2, 1, 2, 3, 3, 2, 4, 1], K = 4 => X[1,2,3,4]

Brute Force:

Here what we are trying to do is basically try to find is there any subsequence of length (L) which present in X but not in A, right. What about if we just generate

all subsequence of lenght 1 to k and check does it present in A or not, the one which does not present is your answer;

Algo:

1. first check does all numbers are present in A or not ( L=1); if not return 1 othrewise go to step 2;

2. Start generating different subsequence of L = 2 ... k, Let say this subsequence name is Y

2.1 for each subsequence check LCS(A,Y)!=len(Y) [ we find the longest common subsequence of A and Y and if there is a subsequence in A of Y length then this is not your answer] otehrwise its your answer len(Y)

3. Keep doing it until you find a seq Y which follow the rule LCS(A,Y)!=len(Y) ; if no one then answer is -1;

Complexity:

Step 1: O(n)

Step 2: Generating all the subsequence of each length is big, which is in a nutshell generating all the subsequence, In k length array there are K^k subsequence.

Step 2.1 checking each in A, will take O(n*K)

Hence; O(K^k*n*K) = O(n*K^(k+1)) = O(n*K^k)

Bottle neck; K^k

Brute Force 2: with slight improvement

Instead blindly generating of every length subsequence from X, first find the longest common subsequence in LCS(A,X), why? this will gurentee to you that

this length subsequence is present in A and all the shorter than this also present in A (this is what LCS does).

Algo:

0. first check does all numbers are present in A or not ( L=1); if not return 1 othrewise go to step 1;

1. Find the LCS(A,X) = len

2. generate all the subsequence of length len from X, Say Y and keep matching them in A; LCS(A,Y)!=len(Y); then one don't match is your answer;

Complexity:

Step 1: O(n*k)

Step 2: Generating all the subsequence of len length is also big, which is in a nutshell generating all the subsequence of length (len), In k length array there are len^k subsequence.

Step 2.1 checking each in A, will take O(n*len)

Hence; O(n*k) + O(n*len*len^k) = O(n*len^(k+1))

This is just a slight improvement though the worst case is still O(n*K^k) [ i guess; correct me if i'm wrong]

Bottle neck; len^k

Bleed Force :

What we are doing wrong in above brute force is generating blindly checking all subsequence of length len in A, right? We can avoid them

1. first check does all numbers are present in A or not ( L=1); if not return 1 othrewise go to step 1;

2. Find the LCS(A,X) = len

3. Backtrack the LCS array and find out all the subsequence of length len and keep them in a set;

4. generate all the subsequence of length len from X, Say Y which is not in set and keep matching them in A; LCS(A,Y)!=len(Y); then one don't match is your answer;

Complexity:

Step 1: O(n*k)

Step 2: Generating all the subsequence of len length is also big, which is in a nutshell generating all the subsequence of length (len), In k length array there are len^k subsequence.

Step 2.1 checking each in A, will take O(n*len)

Hence; O(n*k) + O(n*len*(len^k)/fact(len)) [ simplify it :P ]

This is just a slight improvement though the worst case is still O(n*K^k) [ i guess; correct me if i'm wrong]

Still ; Bottle neck; len^k

Better solution:

If we observe carefully, what we are doing is just finding the subsequence of len size and checking them in A, which is nothing but marking out the bad possibility of subsequence of length len;

if we some how mark them in bit faster way, we can find our SCS is much faster way.

Algo:

1. Create an frequency array of size K+1; Y

2. count the frequency of each item of A and store them in the Y. [ this will tell all the subsequence of length present A of Y, each element tells the subsequence made by him from A ]

3. Find the minimum and return min+1; Since all the subsequence of len=min is present in A and all the subsequence of length > min is also present in A [ there could be some combination which is not present but we are intersted in smallest]

Complexity:

Step 2: O(n);

Step 3: O(k)

Complexity: O(n+k); yupee

Can we do more better? Yes we can (slightly)

Best Solution:

Better solution:

If we observe carefully, what we are doing is just finding the subsequence of len size and checking them in A, which is nothing but marking out the bad possibility of subsequence of length len;

if we some how mark them in bit faster way, we can find our SCS is much faster way.

Algo:

1. Create an min heap size K; heap; it contains two element in each node ( element, frequency) and heapify based on frequency; also keep a map for cross reference so that you can update the value in heap in constant time; Map<Integer, Node>

2. now, start doing step 2 of above algo, every time you increase the frequency of element (from 1...k) update it in heap too and heapify.

3. Find the minimum and return min+1; constant time

Complexity:

Step 2: O(n*logk) amortized;

Step 3: O(1)

Complexity: O(nlogk); amortized; yupee

Some code of last Algo;

```
package Java;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
/**
* Author: Nitin Gupta(nitin.gupta@walmart.com)
* Date: 14/04/19
* Description:
* google-interview-questions7
* <p>
* Give an array A of n integers where 1 <= a[i] <= K.
* Find out the length of the shortest sequence that can be constructed out of numbers 1, 2, .. k that is NOT a subsequence of A.
* eg. A = [4, 2, 1, 2, 3, 3, 2, 4, 1], K = 4
* All single digits appears. Each of the 16 double digit sequences, (1,1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2) ... appears. Because (1, 1, 2) doesn't appear, return 3.
*/
public class ShortestSequence {
static class Node implements Comparable<Node> {
int item;
int freq;
public Node(int item, int freq) {
this.item = item;
this.freq = freq;
}
@Override
public int compareTo(Node o) {
return Integer.compare(this.freq, o.freq);
}
}
public static void main(String args[]) {
int array[] = {4, 2, 1, 2, 3, 3, 2, 4, 1};
test(array, 4);
test(array, 3); //does not follow the rule
int array2[] = {1, 2, 3, 4, 1, 2, 3, 4};
test(array2, 4);
int array3[] = {1, 1, 3, 4, 2, 2, 3, 4};
test(array3, 4);
int array4[] = {1, 1, 3, 2, 2, 2, 3, 3};
test(array4, 4); //should return 1
int array5[] = {1, 1, 2, 3, 4, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4}; //all till 4 length seq kicked out
test(array5, 4);
}
private static void test(int array[], int k) {
System.out.println(shortestSequence(array, k));
}
//O(n) + O(n*logk) = O(n*logk)
private static int shortestSequence(int[] array, int k) {
if (null == array)
return -1;
//O(n)
if (!ofSize1(array, k))
return 1;
if (!validateInput(array, k)) {
PriorityQueue<Node> pq = new PriorityQueue<>(k); //create a pq of size k only;
Map<Integer, Node> map = new HashMap<>();
for (int i = 0; i < array.length; i++) {
if (map.containsKey(array[i])) {
Node n = map.get(array[i]);
n.freq++; //since this same node is also present in pq, so pq will automatically hepify it self due to cross reference
} else {
Node n = new Node(array[i], 1);
map.put(array[i], n);
pq.offer(n);
}
}
if (pq.isEmpty())
return -1;
Node n = pq.poll();
return n.freq + 1;
}
return -1;
}
//O(n)
private static boolean ofSize1(int[] array, int k) {
IntStream stream = IntStream.range(1, k + 1);
Set<Integer> set = stream.boxed().collect(Collectors.toSet());
for (int i = 0; i < array.length; i++)
if (set.contains(array[i]))
set.remove(array[i]);
return set.isEmpty();
}
private static boolean validateInput(int[] array, int k) {
return Arrays.stream(array).filter(x -> !(x >= 1 && x <= k)).count() > 0;
}
}
```

My version:

```
int result = 0;
int start = 0;
HashSet<Integer> remaining = new HashSet<>();
while(start < arr.length){
for(int i = 1; i <= K; i++){
remaining.add(i);
}
while(start < arr.length && !remaining.isEmpty()){
if (remaining.contains(arr[start])) remaining.remove(arr[start]);
start++;
}
if(start == arr.length){
return remaining.isEmpty() ? result+1 : result;
}else{
result++;
}
}
return result;
```

Looking for interview experience sharing and coaching?

Visit AONECODE.COM for ONE-ON-ONE private lessons by FB, Google and Uber engineers!

SYSTEM DESIGN

ALGORITHMS (conquer DP, Greedy, Graph, Advanced Algorithms, Clean Coding),

latest interview questions sorted by companies,

mock interviews.

Get hired from G, U, FB, Amazon, LinkedIn, Yahoo and other top-tier companies after weeks of training.

Email us aonecoding@gmail.com with any questions. Thanks!

The optimal solution is a little tricky. Can you prove it?

Optimal Solution:

- aonecoding July 12, 2018