Google Interview Question
Software Engineer / DevelopersCountry: India
Interview Type: In-Person
Sorry, there was a small bug.
Integer s1 = getSubsetSize(i-1, x-n0s,y-n1s) + 1;
Integer s2 = getSubsetSize(i-1, x, y);
r = Math.max(s1,s2);
should be
Integer s1 = getSubsetSize(i-1, x-n0s,y-n1s) + 1;
Integer s2 = getSubsetSize(i-1, x, y);
if (s1 == 0) s1 = -1;
r = Math.max(s1,s2);
I am think this hashmap won't work in the way you want since you didn't override the .hashcode() and .equals() method in the array object, hence h.get(...) will always return null...
This code has a bug that you didn't keep track the ones of in the array used. Like the example from the question, if we are looking for x=3, y=2. We want to find indexes 0, 1, 2 from the array. If you didn't keep track the one used, you may run into 0, 0, 2.
Hey, seeing your code, I got confused. Please clarify. Are you just making combinations of the String in the set?
#include <iostream>
using namespace std;
int main()
{
int f[100][100];
int X = 3;
int Y = 2;
memset(f, 0, sizeof(f));
const int aSize = 4;
string a[aSize] = {"01", "10", "0", "110"};
for(int i = 0; i < aSize; i++)
{
int zeroNum = 0;
int oneNum = 0;
for(int j = 0; j < a[i].size(); j++)
if (a[i][j] == '1')
oneNum++;
else
zeroNum++;
for(int x = X; x >= 0; x--)
for(int y = Y; y >= 0; y--)
{
int preX = x - zeroNum;
int preY = y - oneNum;
if (preX >= 0 && preY >= 0 && (f[preX][preY] > 0 || (preX == 0 && preY == 0)))
f[x][y] = max(f[x][y], f[preX][preY] + 1);
}
}
cout << f[X][Y] << endl;
}
dp[i][j] = j - i + 1 ; // if after concatination of these i to j elements the required no. of zeros and 1's are found.
or
dp[i][j] = max_int ; // if not found.
dp[i][j] = min(dp[i][j-1] ,dp[i+1][j], dp[i][j])
There are a couple of issues here. Firstly it should be max and not min. Secondly, the elements in the subset may or may not be contiguous.
What if the given array is like {"01", "10", "110", "0"}? Does your solution still returns 3?
Sort the given array, in increasing no. of symbols each string contain.
like array[] = {"01", "10", "0", "110"} will become after sorting
array[] = {"0","01", "10", "110"}
int ans=0; // this will be the answer
Now for i=0 to n-1(array length)
a= No. of 0's in array[i];
b= No. of 1's in array[i];
if(a<=x&&b<=y){
ans++;//this element will be in subset
x-=a; y-=b;
}
<pre lang="" line="1" title="CodeMonkey56585" class="run-this">typedef struct node
{
int zeroNums;
int oneNums;
int index;
}node;
int maxSubArrNum(node *arr, int nLength,int X,int Y)
{
if (NULL == arr || nLength <= 0)
return -1;
if (X < 0 || Y < 0)
return -1;
if (0 == X && 0 == Y)
return 0;
if (X == arr[0].zeroNums && Y == arr[0].oneNums)
return 1;
int max = -1;
for (int i = 1;i < nLength;i++)
{
int tmp = maxSubArrNum(arr+i,nLength-i,X-arr[0].zeroNums,Y-arr[0].oneNums);
if (tmp >= 0 && tmp > max) max = tmp;
}
if (max != -1)
max++;
return max;
}
int getArrResults(char **arr, int nLength, int X, int Y)
{
if (NULL == arr)
return -1;
node *nodeArr = (node*)malloc(sizeof(node)*nLength);
char *pstr = NULL;
for (int i = 0;i < nLength;i++)
{
pstr = arr[i];
nodeArr[i].oneNums = nodeArr[i].zeroNums = 0;
nodeArr[i].index = i;
while (*pstr)
*pstr++ == '0' ? nodeArr[i].zeroNums++:nodeArr[i].oneNums++;
}
int max = 0;
for (int i = 0;i < nLength;i++)
{
int tmp = maxSubArrNum(nodeArr+i,nLength-i,X,Y);
if (tmp > max) max = tmp;
}
return max;
}</pre><pre title="CodeMonkey56585" input="yes">
</pre>
It's a typical DP problem. Similar to Knapsack. Here's the logic. The recursion equation is like this:
1. Assume ai0/ai1 is the number of 0's/1's in string array[i].
2. Define s[i,j,k] is the max # of elements in array from element 0 to i which contain j 0's and k 1's. 0<=i<=N, 0<=j<=X, 1<=k<=Y
3. Initialize s[i,ai0,ai1]=1
4. s[i,j,k]=s[i-1,j,k](if j<ai0 or k<ai1) or
s[i,j,k]=max(s[i-1,j,k],s[i-1,j-ai0,k-ai1]) (if j>=ai0 and k>=ai1 and s[i-1,j-ai0,k-ai1]>0)
The time complexity is O(N*X*Y) and Space could be O(X*Y) if we only need to return the number of elements.
The recurrence relation should be
dp[i][j] = max number of strings such that when concatenated, they contain i Xs and j Ys.
Then
initialize dp[][] with -1
dp[0][0] = 0;
for(i = 0 to 3) {
for(j = 0 to 2) {
for(all strings as string){
nX = countX(string);
nY = countY(string);
if(nX >=i && nY >= j)
dp[i][j] = max(dp[i-nX][j-nY]+1,dp[i][j]);
}
}
}
dp[3][2] should be yor answer.
int maxNumber(vector<string> arr, int X, int Y) {
int dp[arr.size()][X][Y];
int ai0[arr.size()], ai1[arr.size()];
for(int i = 0; i <= X; i++) {
for(int j = 0; j <= Y; j++) {
dp[0][i][j] = 0;
}
}
for(int i = 0; i < arr.size(); i++) {
ai1[i] = 0;
ai0[i] = 0;
for(int j = 0; j < arr[i].length(); j++) {
if(arr[i][j] == '1')
ai1[i]++;
else
ai0[i]++;
}
}
if(ai0[0] <= X && ai1[0] <= Y)
dp[0][ai0[0]][ai1[0]] = 1;
for(int i = 1; i < arr.size(); i++) {
for(int x = 0; x <= X; x++) {
for(int y = 0; y <= Y; y++) {
dp[i][x][y] = dp[i-1][x][y];
if(x-ai0[i] >= 0 && y-ai1[i] >=0) {
if(dp[i-1][x-ai0[i]][y-ai1[i]] != 0)
dp[i][x][y] = max(dp[i][x][y], dp[i-1][x-ai0[i]][y-ai1[i]] + 1);
else {
if(x == ai0[i] && y == ai1[i])
dp[i][x][y] = max(dp[i][x][y], 1);
}
}
}
}
}
cout << dp[arr.size() - 1][X][Y] << endl;
}
int maxNumber(vector<string> arr, int X, int Y) {
int dp[arr.size()][X][Y];
int ai0[arr.size()], ai1[arr.size()];
for(int i = 0; i <= X; i++) {
for(int j = 0; j <= Y; j++) {
dp[0][i][j] = 0;
}
}
for(int i = 0; i < arr.size(); i++) {
ai1[i] = 0;
ai0[i] = 0;
for(int j = 0; j < arr[i].length(); j++) {
if(arr[i][j] == '1')
ai1[i]++;
else
ai0[i]++;
}
}
if(ai0[0] <= X && ai1[0] <= Y)
dp[0][ai0[0]][ai1[0]] = 1;
for(int i = 1; i < arr.size(); i++) {
for(int x = 0; x <= X; x++) {
for(int y = 0; y <= Y; y++) {
dp[i][x][y] = dp[i-1][x][y];
if(x-ai0[i] >= 0 && y-ai1[i] >=0) {
if(dp[i-1][x-ai0[i]][y-ai1[i]] != 0)
dp[i][x][y] = max(dp[i][x][y], dp[i-1][x-ai0[i]][y-ai1[i]] + 1);
else {
if(x == ai0[i] && y == ai1[i])
dp[i][x][y] = max(dp[i][x][y], 1);
}
}
}
}
}
cout << dp[arr.size() - 1][X][Y] << endl;
}
I use dp to solve it. N represent the index of string, X represent the number of 0 and Y the number of 1. Here is the code. It passes several cases. Please tell me if you find it wrong.
#include <iostream>
#include <vector>
#include <climits>
using namespace std;
int minumum_numnber01(vector<string> strs, int X, int Y)
{
int N=strs.size();
int dp[N+1][X+1][Y+1];
for(int i=0; i<=N; i++)
{
for(int j=0; j<=X; j++)
for(int k=0; k<=Y; k++)
dp[i][j][k]=INT_MAX/2;
dp[i][0][0]=0;
}
for(int i=1; i<=N; i++)
for(int j=1; j<=X; j++)
for(int k=1; k<=Y; k++)
{
int tx=0, ty=0;
for(auto ele:strs[i-1])
{
if(ele=='0') tx++;
if(ele=='1') ty++;
}
dp[i][j][k]=dp[i-1][j][k];
if(j-tx>=0 && k-ty>=0)
dp[i][j][k]=min(dp[i-1][j][k], dp[i-1][j-tx][k-ty]+1);
}
cout<<dp[1][2][1]<<endl;
return dp[N][X][Y];
}
int main()
{
vector<string> strs{"001","01","0001","00011"};
cout<<minumum_numnber01(strs, 6, 4)<<endl;
return 0;
}
typedef struct node
{
int nums0;
int nums1;
int index;
};
struct node means node.index string has node.nums0 0 and node.nums1 1.
so the question is equal to find node[i],node[j]..., that node[i].nums0
+node[j].nums0+... = x,node[i].nums1+node[j].nums1+... = y.
let SubSumEle(nodeArr, Arrlength,x,y) is result, the max result is
max(SubSumEle(nodeArr+0, Arrlength-0,x,y) ,SubSumEle(nodeArr+1, Arrlength-1,x,y) ,...)
It is NP-complete.
For simplicity lets assume that the number of 0's or 1's is less than 100.
Then we can encode each string as y*100 + x where y is the # of 1's and x of 0's.
eg. the array {"01", "10", "0", "110"} becomes {101,101,1,201} and for X=3, Y=2 we get 203.
Now the problem is reduced to subset sum problem which is NP-complete.
You have changed the encoding. If the input is unary (like in this case) there is a linear time algorithm for subset sum, based on dynamic programming. Search the web for Strongly NP-Complete and pseudo polynomial time algorithm.
So as given, the problem is not NP-Complete (or to be more precise, NP-Hard, as it is not a decision problem).
@anonymous: I think you are wrong here. No linear time algorithm exists for subset sum. It calls pseudo-polynomial time. Look up in wiki.
@anonymous2: Read properly: If the input is unary, then there is a polynomial time algorithm (linear might not be completely accurate) for subset sum.
i.e if array is {2,5,7} it is given as {"11", "11111", "1111111"}. Here size of the input is 14.
Check this out: cs.toronto.edu/~pmccabe/csc363-2005S/notes17.pdf
Look for UNARY-SUBSET-SUM.
I concur with you. Thanks for introducing the fancy term "UNARY-SUBSET-SUM". As it shows the complexity is O(nt) where n = # of items, and t = size of input (represented in unary system).
So, for the given problem, how could we derive the solution? What would be the complexity of the DP solution?
It is not fancy. All it says is the input representation matters when deciding complexity (especially so when taking about NP-Completeness).
See the first pages of pdf for the DP solution.
I termed it "fancy" because it doesn't convince me to tell that generalized SUBSET SUM problem belongs to P, even corresponding UNARY-SUBSET-SUM does. What the point of thinking the input in UNARY system (where we formulate I/O in binary) and thus claiming that UNARY-SUBSET-SUM is solvable in polynomial time, when the complexity is O(nt) - same complexity of solving generalized subset sum. The only difference in UNARY-SUBSET-SUM formulation is t = "size" of input, whereas in generalized SUBSET-SUM t depends on "value" of input (and thus make the running time pseudo-polynomial in later case).
Now for the given problem, how could you encode it in UNARY? I can think about DP that solve it in O(nXY) where X = # of 0s, Y = # of 1s. Any faster method there?
@anaonymous2: You are missing the point entirely. UNARY-SUBSET-SUM is in P. The 'normal' SUBSET-SUM is NP-Complete.
As to the original interview question: it is ALREADY IN UNARY! How much time will it take to even read the whole input array? Isn't it Theta(t)?
If the encoding was different (eg you are given number of zeroes and ones instead of the whole string), it would be O(n + log t) (or something like that). Exponential difference.
For instance, say n was 5 and the five strings had around 2^(32) zeroes and ones each. This is like 40GB of data!
If instead the encoding was number of zeroes and ones, you would not require more than 320 bits!
By doing brute force in the second scenario (320 bits), you can quickly solve subset sum (as n=5).
But in the first scenario, you cannot avoid reading 40GB of data!
So saying there is no difference (rephrasing you) is incorrect.
And you are missing the context of the comments. Some guy comes in and claims the original problem is NP-Complete, which is _incorrect_ as the input is in _unary_. The comment was to point out that mistake.
This solution seems to work, however I think the time complexity is O(X*Y*L) where L is the number of characters in the array of strings:
The recurrence relation is subsetSize(i,x,y) = max(subsetSize(i-1,x,y), subsetSize(i-1,x-number of 0s in arr[i], y-number of 1s in array[i]))
- Anonymous September 21, 2011Each sub-problem takes O(l) where l is the length of array[i]. There are n*X*Y subproblems so the entire solution runs in O(n*X*Y*l). Since the number of individual characters in the array is equal to n*l, the solution runs in O(X*Y*L) which seems to be pseudo-polynomial due to X and Y not being given in unary encoding.