Amazon Interview Question
Country: United States
Number of sub-arrays having at most 2 different letters = number of arrays having nothing but a's and/or c's + number of arrays having nothing but b's and/or c's + number of arrays having nothing but a's and/or b's - number of arrays having only a's - number of arrays having only b's - number of arrays having only c's.
Why is that true? It's because we know arrays that have a and c are disjoint from those that have b and c and so on, if indeed they have both letters. But, some of the arrays we count in the "having nothing but a's and/or c's" category just have a's and do not have c's, for example. These are double-counted when we count all arrays consisting of nothing but a's and/or b's, in this example. Following similar reasoning for the other letters, we see that every subarray composed only of repetitions of the same letter is double-counted. We thus need to subtract the number of subarrays consisting of repetitions of a single letter, hence the formula above.
So how do we use this to make an O(n) algorithm? We scan the array for runs consisting of only a's and b's, and we note the length of such runs. A run of size k contributes k(k+1)/2 valid sub-arrays to the total. (We don't need to inspect all of them, we just increment a counter to know k and then calculate k(k+1)/2 once we hit the third letter.) We evaluate the sum of the number of subarrays for a&b, a&c, b&c, a alone, b alone, and c alone, and then apply the formula at the top of this answer. The runs of single letters can all be obtained in one pass, so we'll make a total of 4 passes over the data.
you need to avoid double counting. For example, aaaabbbbaaa, in your first parse, you find "aaaa", "bbbb", and "aaa". Then when you parse a&b, you find "aaaabbbbaaa". However you can't simply apply the k(k+1)/2 formula again. Otherwise, you double count the "aaaa", bbbb", and "aaa' portions and their substrings.
Did you read the entire part of my solution that deals with how I avoid this double counting (I subtract out a bunch of stuff to account for this)? Or did you read it and still have doubts about its correctness?
I edited the last sentence or two of the second paragraph to hopefully clarify my solution. I've now stated things in somewhat more precise terms.
Your method needs 2 fixes. First, you don't subtract the single-character series if it's in the "middle" of a 2-character series.
Second, you don't subtract a single-character series if it's the first and last in the input array.
(1) For an string like "caaabbbaaac", your method is to count(caaa) + count (aaabbbaaa) + count(aaac) - count(c) - count(aaa) - count(bbb) - count(aaa) - count (c), which is incorrect.
The correct calculation is count(caaa) + count (aaabbbaaa) + count(aaac) - count(c) - count(aaa) - count(aaa) - count (c), you don't need to subtract coutn(bbb) here.
It also applies to "aaabbbaaabbbaaabbb" and so on.
(2) You also don't need to subtract the first and last series of 1 character. If the input array is "aaabbbc.....", you don't subtract "aaa" here.
I think you misunderstand my method. You *do* need to subtract bbb in the example you gave, because it will be counted by both the b&c pass and the a&b pass.
I'm going to provide some code and you can test this to your satisfaction.
Anonymous, you said: "(1) For an string like "caaabbbaaac", your method is to count(caaa) + count (aaabbbaaa) + count(aaac) - count(c) - count(aaa) - count(bbb) - count(aaa) - count (c), which is incorrect."
That's not the right assessment of what this method computes. This method would compute count (aaabbbaaa) [a&b pass] + count(caaa) + count(aaac) [a&c pass] + count(c) + count(bbb) + count(c) [b&c pass] - count(c) - count(aaa) - count(bbb) - count(aaa) - count (c) [single letter sequence subtraction pass]. Canceling out like terms, this is count (aaabbbaaa) + count(caaa) + count(aaac) - count(aaa) - count (aaa).
This almost coincides with what you stated the correct answer should be, except that for some reason you additionally want to subtract out count(c) twice -- I'm not quite sure why. I think this is related to your second point, but I don't see the logic of it. I mean, for a case like "cac" we definitely want the answer to be 6, right?
When you said "number of arrays having only b and c", do you mean "an array contains b and c, which also includes an array only has b or only has c"?
I think what needs to be subtracted is the character which has been counted twice when you count the (a&b), (a&c),(b&c),
for the above example "caaabbbaaac", I think the answer should be count(caaa) + count(aaabbbaaa) + count(aaac) - count(aaa) - count(aaa), because "b" and "c" only appear once, so they don't need to be subtracted. what do you think ?
@Blizzard: the expression you give is correct...I lost a term when I was doing the math for that particular example. I've corrected the example above to match what you gave. The algorithm was and remains correct.
This question requires a Suffix Tree. A suffix tree for a string "abba" will be built with following strings: "abba","bba","ba" and "a". In my solution, I have built such a tree. Note that it does not count individual single character substrings - but you can easily count it with str.length. Once you have a suffix tree, you can traverse in level order to get the required strings. Please note that there is a complex algo that can build the tree in O(n) using automata but I have not used it (since expectation to know a specific automata is just too much in one algo question). Following is the complete code:
package org.algocode;
import java.util.ArrayDeque;
import java.util.Queue;
public class TernaryStringCombinator {
private static final int MAX_SIZE = 10000;
public static void main(String[] args) throws Exception {
// Read the input string.
String str = "abac";
if (str == null || str.length() > MAX_SIZE)
throw new Exception(
"Error! Input string is either null or greater than maximum allowed "
+ MAX_SIZE);
// Create the suffix tree
SuffixTree st = new SuffixTree(str);
st.levelOrderTraverse();
// You deduct one here because you don't want to count the root
// placeholder node.
System.out.println("Total nodes in tree " + (st.size() - 1));
// You deduct one here because you don't want to count the original
// string.
System.out.println("Total substrings with only one or two chars "
+ st.size2char());
}
/*
* A suffix tree is a tree of all possible contagious substrings of a
* string. It is a kind of a Trie. For example, for a given input string,
* ABBCD, the tree will store: ABBCD BBCD BCD CD D
*/
private static class SuffixTree {
private Node root;
private String s;
private int size;
private int size2char;
SuffixTree(String s_) throws Exception {
s = s_.toLowerCase();
size2char = s.length();
for (int i = 0; i < s.length(); i++)
insert(s.substring(i));
}
private void insert(String value) throws Exception {
if (root == null) {
root = new Node();
size++;
}
insert(root, value, 0);
}
// This is a recurrsive call to do the insertion
private void insert(Node node, String value, int index)
throws Exception {
Node next = null;
switch (value.charAt(index)) {
case 'a':
if (node.getA() == null)
createChildLink(node, value.charAt(index));
next = node.getA();
break;
case 'b':
if (node.getB() == null)
createChildLink(node, value.charAt(index));
next = node.getB();
break;
case 'c':
if (node.getC() == null)
createChildLink(node, value.charAt(index));
next = node.getC();
break;
default:
throw new Exception("Error! Character is not permitted. "
+ value);
}
if (index < (value.length() - 1)) {
insert(next, value.substring(index + 1), 0);
}
}
void levelOrderTraverse() {
if (root == null || root.isLeaf())
return;
Queue<Node> q = new ArrayDeque<Node>();
if (root.getA() != null)
q.add(root.getA());
if (root.getB() != null)
q.add(root.getB());
if (root.getC() != null)
q.add(root.getC());
while (!q.isEmpty()) {
Node n = (Node) q.poll();
// Only show if path has a color of 0 (two characters). Also the original
// string is not counted as a substring.
if (n.color() == 0) {
if (n.myPath().length() > 1 && !n.myPath().equalsIgnoreCase(s)) {
System.out.println("Two or more char path = "
+ n.myPath());
size2char++;
}
}
if (n.getA() != null)
q.add(n.getA());
if (n.getB() != null)
q.add(n.getB());
if (n.getC() != null)
q.add(n.getC());
}
}
Node root() {
return root;
}
int size() {
return size;
}
int size2char() {
return size2char;
}
private void createChildLink(Node parent, char childVal)
throws Exception {
Node child = new Node(parent, childVal);
size++;
switch (childVal) {
case 'a':
parent.setA(child);
break;
case 'b':
parent.setB(child);
break;
case 'c':
parent.setC(child);
break;
default:
throw new Exception("Error! Character is not permitted. "
+ childVal);
}
}
}
/*
* We will define an inner class to store a suffix tree node. Since it is a
* ternary string, we will have three child nodes of each string.
*/
private static class Node {
private Node parent;
private Node aLink;
private Node bLink;
private Node cLink;
private char value;
private String path;
// Color of a path. 0 if only one or two chars. 1 if all three are
// present in path.
private int color = 0;
Node(Node parent_, char value_) {
value = value_;
parent = parent_;
// Eagerly insert path
path = getPath();
// Eagerly calculate color. If my parent has a 1, then I will
// also be 1. If my parent has 0, then addition of myself can create
// my color to 0. This means that if my parent's path already has
// three
// characters, then I would be on a three character path.
if (parent.color() == 1)
color = 1;
else
colormyself();
}
Node() {
};
void setA(Node aLink_) {
this.aLink = aLink_;
}
void setB(Node bLink_) {
this.bLink = bLink_;
}
void setC(Node cLink_) {
this.cLink = cLink_;
}
Node getParent() {
return parent;
}
Node getA() {
return aLink;
}
Node getB() {
return bLink;
}
Node getC() {
return cLink;
}
char getValue() {
return value;
}
int color() {
return color;
}
// A special method to return combined string of parent
private String getPath() {
if (parent == null)
return null;
String path = parent.myPath();
if (path == null)
return String.valueOf(value);
StringBuilder stb = new StringBuilder();
stb.append(path);
stb.append(value);
return stb.toString();
}
String myPath() {
return path;
}
boolean isLeaf() {
return aLink == null && bLink == null && cLink == null;
}
private void colormyself() {
boolean sawA = false;
boolean sawB = false;
boolean sawC = false;
for (char c : path.toCharArray()) {
switch (c) {
case 'a':
sawA = true;
break;
case 'b':
sawB = true;
break;
case 'c':
sawC = true;
break;
}
if (sawA && sawB && sawC) {
color = 1;
break;
}
}
}
}
}
Time Complexity: O(n) Space Complexity: O(1)
public class Solution {
public static int count(int n) {
return n * (n + 1) / 2;
}
public static int solve(String s) {
char a = 0, b = 0;
int anum = 0, bnum = 0;
int result = 0;
int n = s.length();
int i = 0, j = 0;
while (j < n) {
char curr = s.charAt(j);
if (curr == a) {
anum++;
} else if (curr == b) {
bnum++;
} else if (a == 0) {
a = curr;
anum = 1;
} else if (b == 0) {
b = curr;
bnum = 1;
} else {
result += count(j - i);
char prev = s.charAt(j - 1);
if(prev == b){
while(anum > 0){
char ch = s.charAt(i);
if(ch == a){
anum --;
}else{
bnum --;
}
i ++;
}
a = curr;
anum = 1;
}else{ // prev == a
while(bnum > 0){
char ch = s.charAt(i);
if(ch == a){
anum --;
}else{
bnum --;
}
i ++;
}
b = curr;
bnum = 1;
}
result -= count(j - i);
}
j++;
}
result += count(j - i);
return result;
}
public static void main(String args[]) {
System.out.println(solve("aabc"));
System.out.println(solve("abc"));
System.out.println(solve("baaccb"));
System.out.println(solve("ababaaaabba"));
System.out.println(solve("aabbbaa"));
}
}
Algorithm
group the string by unique characters
aabc =>(aa)(b)(c)
the substrings are either formed by the characters within same group or adjacent groups
each group alone forms n * (n-1)/2 unique substring (the group contains n characters)
adjacent groups form n * m unique substring (the groups contains m and n characters respectively)
Example
aabc
(3 + 1 + 1) + (2*1 + 1* 1) = 8
abc
(1 + 1 + 1) + (1*1 + 1*1) = 5
baaccb
(1 + 3 + 3 + 1) + (1*2 + 2*2 + 2*1) = 16
f(1) = 1
f(2) = 3
f(3) = 6
...
f(n) = n + f(n-1) = n*(n-1)/2
"the substrings are either formed by the characters within same group or adjacent groups"
Not true. Consider something like ababaaaabba.
This doesn't seem like it could be correct, therefore.
a dynamic programming solution. O(n) time complexity.
def count_unique_strings(string):
if not string: return 0
# c1[i] denotes for string[:i +1 ], the number of unique string with one character
# c2[i] denotes for string[:i + 1], the number of unique string with two character
c1 = [0] * len(string)
c2 = [0] * len(string)
c1[0] = 1
for i in range(1, len(string)):
if string[i] == string[i - 1]:
c1[i] = c1[i - 1] + 1
c2[i] = c2[i - 1]
else:
c1[i] = 1
c2[i] = c1[i - 1]
return sum(c1) + sum(c2)
please give a minute to the else part:
i think
It should be
C2[i] = C2[i-1];
otherwise It produces incorrect result for aabc
@Pankaj. no, it is supposed to be C2[i] = C1[i-1], which means, if we find a char that is different from the last char, the total number of substrings which consists of two unique letters and ends at the current position equals to the total number of substrings which consists of one unique letter and ends at the last position.
My code outputs 9 for the input "aabc"
@eugene, not sure how you get this 78. but i don't think my code is incorrect. the following is the results of c1 and c2. please take a look and see if there is anything wrong.
c1: [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
c2: [0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
38
@gnahzy: yes, I'm afraid it's wrong. For aabbaabbaabb, every subarray is valid, not just those that consist of first a's and then b's or b's and then a's. aabbbaab, for instance, is supposed to be counted too. You can see how I'm getting 78. For the first letter, there are 12 substrings starting with it, 11 starting with the second letter, and so on. 12 + 11 + 10 + ... + 1 = 12*(12+1)/2 = 78.
@eugene, yes... you are right. my code does not work case like 'aba'. made some modification to handle this case:
def count_unique_strings(string):
if not string: return 0
# c1[i] denotes the number of substrings
# which consists of one unique char and ends at string[i]
c1 = [0] * len(string)
# c2[i] denotes the number of substrings
# which consists of two unique chars and ends at string[i]
c2 = [0] * len(string)
# other[i] represents the other character
# in the two-char substrings which ends at string[i].
# so in the two-char substrings, one char is string[i],
# the other one is other[i]
other = [''] * len(string)
c1[0] = 1
for i in range(1, len(string)):
if string[i] == string[i - 1]:
c1[i] = c1[i - 1] + 1
c2[i] = c2[i - 1]
other[i] = other[i - 1]
else:
c1[i] = 1
# if the current char is the same as the other char
# for substrings ending at string[i-1], we have two methods to constructs
# two-char substring ending at string[i]:
# 1. use the one-char substring ending at string[i - 1] + the current character
# 2. use the two-char substring ending at string[i - 1]
if string[i] == other[i - 1]:
c2[i] = c1[i - 1] + c2[i - 1]
# otherwise, we only have one way to construct the two-char
# substrings ending at string[i - 1]
else:
c2[i] = c1[i - 1]
other[i] = string[i - 1]
print(c1)
print(c2)
return sum(c1) + sum(c2)
[1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
[0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10]
78
Nope. I already had a testing framework for this, so I converted your code to Java (the test was in Java) and ran it. You can see some failed test cases here: ideone. com/4e7tLZ
@Eugene, interesting test framework. but actually you didn't implement my code correctly. that what i got:
>>> DP.count_unique_strings("acababbbbcbcaabaaacabcacbaabbc")
[1, 1, 1, 1, 1, 1, 2, 3, 4, 1, 1, 1, 1, 2, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 2
, 1, 2, 1]
[0, 1, 2, 1, 2, 3, 3, 3, 3, 4, 5, 6, 1, 1, 2, 3, 3, 3, 3, 4, 1, 1, 1, 2, 1, 1, 1
, 3, 3, 2]
111
>>> DP.count_unique_strings("abbabaaaacccbccabbaabccabacbaacabcbaabbbbccbbcbacac
bcabcaccbbbaaacabacacabcbbaccaaacaaaabaccccbcccca")
[1, 1, 2, 1, 1, 1, 2, 3, 4, 1, 2, 3, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1
, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2, 1, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2,
1, 2, 3, 1, 1, 2, 3, 4, 1, 1, 1, 2, 3, 4, 1, 1, 2, 3, 4, 1]
[0, 1, 1, 3, 4, 5, 5, 5, 5, 4, 4, 4, 3, 4, 4, 2, 1, 1, 3, 3, 5, 1, 1, 2, 1, 2, 1
, 1, 1, 1, 2, 3, 1, 1, 2, 1, 1, 3, 3, 3, 3, 4, 4, 6, 6, 8, 9, 1, 1, 2, 3, 1, 2,
1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2, 2, 2, 1, 1,
3, 3, 3, 6, 7, 7, 7, 7, 4, 5, 1, 1, 1, 1, 4, 5, 5, 5, 5, 4]
439
@Eugene, found where your code goes wrong. in line 112, you should move "other[i] = input.charAt(i - 1);" out of the else block.
import java.util.Set;
import java.util.HashSet;
import java.util.Arrays;
public class Q14967736 {
public static int count(final String str) {
if (str == null || str.isEmpty()) {
return 0;
} else if (str.length() == 1) {
return 1;
}
String last = "";
int count = 0;
for (int index = 0; index < str.length(); index++) {
final String current = last + str.charAt(index);
if (uniqueCount(current) <= 2) {
count++;
last = current;
} else {
break;
}
}
return count + count(str.substring(1));
}
public static int uniqueCount(final String str) {
final Set<Character> chars = new HashSet<Character>();
for (char c : str.toCharArray()) {
chars.add(c);
}
return chars.size();
}
}
O(n) solution..
static int numberOfSubstrings(String input) {
int count = 0;
int mode = 3; //ternary
boolean[] isInSubString = new boolean[mode];
initArray(isInSubString);
int subStringStart = 0;
int[] lastIndexOfCharInSubString = new int[mode];
for(int i = 0; i < input.length(); i++){
boolean b = isInSubString[input.charAt(i) -'a'];
int j = input.charAt(i) - 'a';
if(b == false){
isInSubString[j] = true;
lastIndexOfCharInSubString[j] = i;
}else{
lastIndexOfCharInSubString[j] = i;
}
if (doAllCharactersInSubString(isInSubString)) {
if (subStringStart == 0) {
int n = i - subStringStart;
count += n * (n+1)/2;
}
char charAt = input.charAt(subStringStart);
while (!(lastIndexOfCharInSubString[charAt - 'a'] == subStringStart)) {
subStringStart++;
charAt = input.charAt(subStringStart);
}
char removedChar = input.charAt(subStringStart);
isInSubString[removedChar - 'a'] = false;
subStringStart = subStringStart + 1;
count += i - subStringStart + 1;
} else if (count > 0) {
count += i - subStringStart + 1;
}
}
return count;
}
private static boolean doAllCharactersInSubString(boolean[] isInSubString) {
for (boolean b : isInSubString) {
if(b == false)
return false;
}
return true;
}
private static void initArray(boolean[] isInSubString) {
isInSubString[0] = false;
isInSubString[1] = false;
isInSubString[2] = false;
}
// PHP version
// get all the substring of length 1;
// get all the substrings of lenght 2 and more
function numberOfSubstrings($A){
$strLen = strlen($A);
$initial_count = $strLen; // lenght 1
$sub_count =0;
// starting
for($i=0;$i<$strLen-1;$i++){
// variable length
for($j=2; $j<$strLen;$j++){
$str_2 = substr($A, $i, $j);
$strLen_2 = strlen($str_2);
if($strLen_2 < $j){break;}
if($j>2){
// check for char count.
$charCount = array();
for($k=0; $k<$strLen_2; $k++){
$charCount[$str_2[$k]]++;
}
$diff_chars = count($charCount);
if($diff_chars <= 2){
$sub_count++;
// echo $str_2.'--'.$i.'|';
}
}else{
$sub_count++;
// echo $str_2.'--'.$i.'|';
}
}
echo "<br>";
}
$totalCount = $initial_count + $sub_count;
return $totalCount;
}
We increment the result count under 3 cases:
1. Every time "distinct" single characters are found. (Ex.: a, b, c, d...)
2. Every time "distinct" double characters are found (Ex.: ab, bc, cd...)
3a. Every time characters are repeated after a character (Ex.: abb, bcc ...)
3b. Every time characters are repeated before a character (Ex.: aa, bb, cc...)
below is the code that hopefully works.
//main method
public static void main (String[] args) throws java.lang.Exception
{
String input="baaccb";
int result=findNumSubStrings(input);
System.out.println(result);
}
//wrapper
public static int findNumSubStrings(String input)
{
//base cases
if (input.length()==0)
return 0;
if (input.length()==1)
return 1;
return (FNSS(input, 0, "", 0));
}
//recursive function to find substrings
//input=original string input, index=currently "selected" index,
//temp=String built so far, distinctChar=number of Distinct characters in temp
public static int FNSS(String input, int index, String temp, int distinctChar)
{
int result=0;
//if we are at the end of the string and have something in temp, add 1
if (index==input.length())
{
if (distinctChar>0)
{
//System.out.println(temp);
return 1;
}
else
return 0;
}
//case 3a: if we have 2 distinct characters in temp, count self and increment result by 1 everytime we encounter same char afterwards
if (distinctChar==2)
{
++result;
//System.out.println(temp);
for (int i=index; i<input.length(); ++i)
{
if (temp.charAt(temp.length()-1)==input.charAt(i))
{
++result;
//temp+=input.charAt(i);
//System.out.println(temp);
}
else
break;
}
return result;
}
//initial split
if (temp.length()==0)
{
result+=FNSS(input, index+1, temp, distinctChar);
result+=FNSS(input, index+1, temp+input.charAt(index), distinctChar+1);
}
else
{
//case 1 or 3b: we have something in temp, increment count by 1
result+=FNSS(input, input.length(), temp, distinctChar);
//case 2: if we have 1 distinct character and next char is also distinct, add it to temp
if (temp.charAt(temp.length()-1)!=input.charAt(index) && distinctChar<2)
{
result+=FNSS(input, index+1, temp+input.charAt(index), distinctChar+1);
}
//case 3b: if we have 1 or 2 distinct character and next char is NOT distinct, add it to temp
else if (temp.charAt(temp.length()-1)==input.charAt(index)&& distinctChar<=2)
{
result+=FNSS(input, index+1, temp+input.charAt(index), distinctChar);
}
}
return result;
}
O(N^2) solution would be:
n=strlen(input string)
for i = 0 to <n-1:
From current index i, run j until [i,j] has atmost only 2 different characters. Add 1 to count every time this is satisfied.
Answer would be count[n-2]+n.
int main()
{
char * inp = "baaccb";
int n=strlen(inp);
int *a = new int[strlen(inp)];
for(int i=0;i<n;i++) a[i]=0;
for(int i=0;i<n-1;i++)
{
a[i]=1+(i>0?a[i-1]:0);
int j=i+2;
while(isOK(inp,i,j))
// isOK is a helper that returns true if the char* has only 1 or //different characters from a[i] to a[j]
{
a[i]+=1;
j++;
}
}
cout<<(a[n-2]+n)<<endl;
return 0;
}
- Anupam March 11, 2013