Amazon Interview Question
SDE1sCountry: United States
Interview Type: Phone Interview
We can DFS starting from some cell travelling until the DFS reaches a 0. We record the number of visited cells as we DFS and return this value. Let's call this DFS function F.
We now do F on every cell, and return the maximum value found from the Fs.
There are of course many ways of optimizing this, but this is the general idea.
We can think of this problem as a sort of graph coloring problem, where each "island" of 1s should have it's unique color, and also each "oceans" of 0s should have it's unique color.
Based on that,:
1. calculate each group or "island" of elements(those clustered together) of 1s, tagging each country with a number starting from 1 (color), and 0s too.
2. after all groups are calculated retrieve size of largest cluster group of 1s.
public static void solveCountNumCountries() {
int m[][]= {{1,1,0,0,0},{1,1,0,0,0},{1,0,0,1,1},{0,0,0,0,0},{1,0,1,0,1}};
System.out.println(countNumCountries(m));
}
public static int countNumCountries(int[][] m) {
int[][] country = new int[m.length][m[0].length];
for (int i = 0; i < country.length; i++) {
for (int j = 0; j < country[0].length; j++) {
country[i][j] = -1;
}
}
int ctyCount = 1;
for (int i = 0; i < m.length; i++) {
for (int j = 0; j < m[0].length; j++) {
if(country[i][j] == -1) {
expandBorders(m, country, i, j, m[i][j], ctyCount++);
}
}
}
int[] ctyCounts = new int[ctyCount];
for (int i = 0; i < ctyCounts.length; i++) {
ctyCounts[i] = 0;
}
int maxCount = 0;
for (int i = 0; i < m.length; i++) {
for (int j = 0; j < m[0].length; j++) {
ctyCounts[country[i][j] - 1]++;
if(m[i][j] == 1) {
maxCount = Math.max(maxCount, ctyCounts[country[i][j] - 1]);
}
}
}
return maxCount;
}
private static void expandBorders(int[][] m, int[][] c, int i, int j, int elemVal, int ctyCount) {
if (i >= 0 && i < m.length && j >= 0 && j < m[0].length) {
if(c[i][j] == - 1 && m[i][j] == elemVal) {
c[i][j] = ctyCount;
// check borders clockwise rotation
if (i >= 0 && i < m.length && j > 0 && j < m[0].length) {
expandBorders(m, c, i, j - 1, elemVal, ctyCount);
}
if (i > 0 && i < m.length && j > 0 && j < m[0].length) {
expandBorders(m, c, i-1, j-1, elemVal, ctyCount);
}
if (i > 0 && i < m.length && j >= 0 && j < m[0].length) {
expandBorders(m, c, i-1, j, elemVal, ctyCount);
}
if (i > 0 && i < m.length && j >= 0 && j < m[0].length-1) {
expandBorders(m, c, i-1, j+1, elemVal, ctyCount);
}
if (i >= 0 && i < m.length && j >= 0 && j < m[0].length-1) {
expandBorders(m, c, i, j+1, elemVal, ctyCount);
}
if (i >= 0 && i < m.length-1 && j >= 0 && j < m[0].length-1) {
expandBorders(m, c, i+1, j+1, elemVal, ctyCount);
}
if (i >= 0 && i < m.length-1 && j >= 0 && j < m[0].length) {
expandBorders(m, c, i+1, j, elemVal, ctyCount);
}
if (i >= 0 && i < m.length-1 && j > 0 && j < m[0].length) {
expandBorders(m, c, i+1, j-1, elemVal, ctyCount);
}
}
}
}
// output: 5
General idea:
- View matrix as a graph
- Have a maxSize variable to track the max cluster found so far
1. Search for next unvisited node n with value 1
2. Perform DFS or BFS starting at n where you only continue traversing if the current node is unvisited with a value 1. Keep incrementing a size variable as you traverse.
3. Update maxSize if needed
4. Go back to 1 until there are no more unvisited nodes with value 1
5. Return maxSize
This problem can be broken down into a few methods. I use an auxiliary boolean matrix d to track visited nodes and a Point class to pass around (x,y) pairs.
// Returns next point that contains an unvisited node with value 1
// As a small optimization, start search from x, y, searching row by row, left to right.
Point findNextAvailableCluster(int [][] m, boolean [][] d, int x, int y) {...}
// Perform DFS or BFS starting at (x,y) in m and return the size of the cluster
// Precondition is that (x,y) is an unvisited node with value 1
int getClusterSize(int [][] m, boolean [][] d, int x, int y) {...}
// Keep finding the next unvisited node with value 1
int findMaxClusterSize(int [][] m) {
if (m == null || m.length <= 0 || m[0].length <= 0) {
return 0;
}
int maxSize = 0;
boolean [][] d = new boolean[m.length][m[0].length];
Point p = findNextAvailableCluster(m, d, 0, 0);
while (p != null) {
int size = getClusterSize(m, d, p.x, p.y);
if (size > maxSize) {
maxSize = size;
}
p = findNextAvailableCluster(m, d, p.x, p.y);
}
return maxSize;
}
Note that the BFS/DFS only needs to traverse two 'children' at each step: (x+1,y) and (x,y+1)
This is b/c we are always going left->right, top->bottom when finding the next available cluster, and that (x+1,y+1) will be intrinsically searched when looking at one of the children.
I decided to tackle this problem a little differently and limit myself to iterative solutions only since others have already done recursive solutions. Here's my C# solution. (If the 2D array size is variable, this would be pretty nasty since it'd be O(n^4). But that wasn't specified, so I'm assuming constant array sizes here.)
public struct ClusterNode
{
public bool Checked { get; set; }
public int ClusterId { get; set; }
}
private const int MAX_X = 10;
private const int MAX_Y = 10;
private int LargestCluster(ClusterNode[,] clusterArray)
{
int result = 0;
List<int> clusterCount = new List<int>();
for (int x = 0; x < MAX_X; ++x)
{
for (int y = 0; y < MAX_Y; ++y)
{
if (clusterArray[x, y].Checked)
{
// Assign same cluster ID as checked node above
if (y > 0 && clusterArray[x, y - 1].Checked)
{
clusterArray[x, y].ClusterId = clusterArray[x, y - 1].ClusterId;
clusterCount[clusterArray[x, y].ClusterId]++;
// If node to the left is also checked but with a different cluster ID, merge them into one cluster.
if (x > 0 && clusterArray[x - 1, y].Checked
&& clusterArray[x - 1, y].ClusterId != clusterArray[x, y].ClusterId)
{
TakeOverBadCluster(clusterArray, clusterCount,
clusterArray[x - 1, y].ClusterId, clusterArray[x, y].ClusterId, x, y);
}
}
// Assign same cluster ID as checked node to the left
else if (x > 0 && clusterArray[x - 1, y].Checked)
{
clusterArray[x, y].ClusterId = clusterArray[x - 1, y].ClusterId;
clusterCount[clusterArray[x, y].ClusterId]++;
}
// Assign new cluster ID
else
{
clusterCount.Add(1);
clusterArray[x, y].ClusterId = clusterCount.Count - 1;
}
}
}
}
return clusterCount.Count > 0 ? clusterCount.Max() : 0;
}
private void TakeOverBadCluster(ClusterNode[,] clusterArray, List<int> clusterCount,
int badClusterId, int goodClusterId, int currentX, int currentY)
{
for (int x = 0; x < MAX_X; x++)
{
for (int y = 0; y < MAX_Y; y++)
{
if (x == currentX && y == currentY)
{
break;
}
if (x >= currentX)
{
continue;
}
if (clusterArray[x, y].Checked && clusterArray[x, y].ClusterId == badClusterId)
{
clusterArray[x, y].ClusterId = goodClusterId;
clusterCount[goodClusterId]++;
}
}
clusterCount[badClusterId] = 0;
}
}
public static int FindBiggestClusterInField(int[][] field)
{
// put all positions in hash set that have 1.
// take one out find all positions that exist in set adjacent.
// in this case as it is 5 X 5 we can join the indexes so that it is 00 to 44 as keys in the set.
// we ensure we do not try to check the same position twice as it is removed from the groomed set.
HashSet<int> uncheckedOneSpaces = new HashSet<int>();
// O(n) cost.
// groom and store positions to be counted
for (int row = 0; row < field.Length; row++)
{
for (int column = 0; column < field[row].Length; column++)
{
int position = (row * 10) + column;
if (field[row][column] == 1)
{
uncheckedOneSpaces.Add(position);
}
}
}
int highestCount = 0;
// O(m*4) : m <= n cost. Still O(n)
// while we have uncounted clusters keep going
while (uncheckedOneSpaces.Count > 0)
{
Queue<int> positionsToCheck = new Queue<int>();
// Code reduction action
Action<int> queueIfPresentinUncheckedSet = (position) =>
{
if (uncheckedOneSpaces.Remove(position))
{
positionsToCheck.Enqueue(position);
}
};
// Seed queue with one of the positions that have not been checked.
// and remove it so it wont be counted again.
var tempPosition = uncheckedOneSpaces.First();
uncheckedOneSpaces.Remove(tempPosition);
positionsToCheck.Enqueue(tempPosition);
int count = 0;
while (positionsToCheck.Count > 0)
{
// foreach position find its adjacent positions that exist in unchecked set;
// place them in positionsToCheckSet
var currentPosition = positionsToCheck.Dequeue();
count++;
// add up one
queueIfPresentinUncheckedSet(currentPosition - 10);
// add down one
queueIfPresentinUncheckedSet(currentPosition + 10);
// add left one * works due to current limitation of width of field.*
queueIfPresentinUncheckedSet(currentPosition - 1);
// add right one
queueIfPresentinUncheckedSet(currentPosition + 1);
}
if (count > highestCount)
{
highestCount = count;
}
}
return highestCount;
}
def find_max_cluster_size(arr):
checked_status = [[False] * len(arr[0]) for i in range(len(arr))]
max_cluster_size = 0
next_unchecked = (0, 0)
while next_unchecked:
x, y = next_unchecked[0], next_unchecked[1]
max_cluster_size = max(
max_cluster_size, find_cluster_size(arr, x, y, checked_status))
next_unchecked = find_next_unchecked(x, y, checked_status)
return max_cluster_size
def find_next_unchecked(x, y, checked_status):
"""
Finds the next node that has not yet been checked
Starts at the last known node that we attempted a find_cluster_size
"""
starting_x = x
for i in range(y, len(checked_status)):
for j in range(starting_x, len(checked_status[0])):
if not checked_status[i][j]:
return (j, i)
starting_x = 0
return None
def find_cluster_size(arr, x, y, checked_status):
"""
Get the size of the cluster starting from this node in directions left,
right and down
"""
try:
if checked_status[y][x]:
return 0
except IndexError:
# We're beyond the size of the array. Easier than checking if x and y
# valid
return 0
checked_status[y][x] = True
if arr[y][x] == 0:
return 0
# Add one for this block, and everything in the cluster to the right, to
# the left and underneath
return (
1
+ find_cluster_size(arr, x+1, y, checked_status)
+ find_cluster_size(arr, x-1, y, checked_status)
+ find_cluster_size(arr, x, y+1, checked_status)
)
if __name__ == '__main__':
table = [
[1, 1, 1, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[1, 1, 1, 1, 1],
]
for row in table:
print row
print find_max_cluster_size(table)
Possible approach :
- Anonymous April 24, 20151. Assume that each element is a node.
2. Each '1' node is connected to the neighbor if the neighbor is also '1'
3. Find the largest connected component of the graph.