Find largest rectangle containing only zeros in an N×N binary matrix

Given an NxN binary matrix (containing only 0's or 1's), how can we go about finding largest rectangle containing all 0's?

Example:

      I
    0 0 0 0 1 0
    0 0 1 0 0 1
II->0 0 0 0 0 0
    1 0 0 0 0 0
    0 0 0 0 0 1 <--IV
    0 0 1 0 0 0
            IV 

For the above example, it is a 6×6 binary matrix. the return value in this case will be Cell 1:(2, 1) and Cell 2:(4, 4). The resulting sub-matrix can be square or rectangular. The return value can also be the size of the largest sub-matrix of all 0's, in this example 3 × 4.


Here's a solution based on the "Largest Rectangle in a Histogram" problem suggested by @j_random_hacker in the comments:

[Algorithm] works by iterating through rows from top to bottom, for each row solving this problem, where the "bars" in the "histogram" consist of all unbroken upward trails of zeros that start at the current row (a column has height 0 if it has a 1 in the current row).

The input matrix mat may be an arbitrary iterable e.g., a file or a network stream. Only one row is required to be available at a time.

#!/usr/bin/env python
from collections import namedtuple
from operator import mul

Info = namedtuple('Info', 'start height')

def max_size(mat, value=0):
    """Find height, width of the largest rectangle containing all `value`'s."""
    it = iter(mat)
    hist = [(el==value) for el in next(it, [])]
    max_size = max_rectangle_size(hist)
    for row in it:
        hist = [(1+h) if el == value else 0 for h, el in zip(hist, row)]
        max_size = max(max_size, max_rectangle_size(hist), key=area)
    return max_size

def max_rectangle_size(histogram):
    """Find height, width of the largest rectangle that fits entirely under
    the histogram.
    """
    stack = []
    top = lambda: stack[-1]
    max_size = (0, 0) # height, width of the largest rectangle
    pos = 0 # current position in the histogram
    for pos, height in enumerate(histogram):
        start = pos # position where rectangle starts
        while True:
            if not stack or height > top().height:
                stack.append(Info(start, height)) # push
            elif stack and height < top().height:
                max_size = max(max_size, (top().height, (pos - top().start)),
                               key=area)
                start, _ = stack.pop()
                continue
            break # height == top().height goes here

    pos += 1
    for start, height in stack:
        max_size = max(max_size, (height, (pos - start)), key=area)    
    return max_size

def area(size):
    return reduce(mul, size)

The solution is O(N), where N is the number of elements in a matrix. It requires O(ncols) additional memory, where ncols is the number of columns in a matrix.

Latest version with tests is at https://gist.github.com/776423


Please take a look at Maximize the rectangular area under Histogram and then continue reading the solution below.

Traverse the matrix once and store the following;

For x=1 to N and y=1 to N    
F[x][y] = 1 + F[x][y-1] if A[x][y] is 0 , else 0

Then for each row for x=N to 1 
We have F[x] -> array with heights of the histograms with base at x.
Use O(N) algorithm to find the largest area of rectangle in this histogram = H[x]

From all areas computed, report the largest.

Time complexity is O(N*N) = O(N²) (for an NxN binary matrix)

Example:

Initial array    F[x][y] array
 0 0 0 0 1 0     1 1 1 1 0 1
 0 0 1 0 0 1     2 2 0 2 1 0
 0 0 0 0 0 0     3 3 1 3 2 1
 1 0 0 0 0 0     0 4 2 4 3 2
 0 0 0 0 0 1     1 5 3 5 4 0
 0 0 1 0 0 0     2 6 0 6 5 1

 For x = N to 1
 H[6] = 2 6 0 6 5 1 -> 10 (5*2)
 H[5] = 1 5 3 5 4 0 -> 12 (3*4)
 H[4] = 0 4 2 4 3 2 -> 10 (2*5)
 H[3] = 3 3 1 3 2 1 -> 6 (3*2)
 H[2] = 2 2 0 2 1 0 -> 4 (2*2)
 H[1] = 1 1 1 1 0 1 -> 4 (1*4)

 The largest area is thus H[5] = 12

Here is a Python3 solution, which returns the position in addition to the area of the largest rectangle:

#!/usr/bin/env python3

import numpy

s = '''0 0 0 0 1 0
0 0 1 0 0 1
0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 1
0 0 1 0 0 0'''

nrows = 6
ncols = 6
skip = 1
area_max = (0, [])

a = numpy.fromstring(s, dtype=int, sep=' ').reshape(nrows, ncols)
w = numpy.zeros(dtype=int, shape=a.shape)
h = numpy.zeros(dtype=int, shape=a.shape)
for r in range(nrows):
    for c in range(ncols):
        if a[r][c] == skip:
            continue
        if r == 0:
            h[r][c] = 1
        else:
            h[r][c] = h[r-1][c]+1
        if c == 0:
            w[r][c] = 1
        else:
            w[r][c] = w[r][c-1]+1
        minw = w[r][c]
        for dh in range(h[r][c]):
            minw = min(minw, w[r-dh][c])
            area = (dh+1)*minw
            if area > area_max[0]:
                area_max = (area, [(r-dh, c-minw+1, r, c)])

print('area', area_max[0])
for t in area_max[1]:
    print('Cell 1:({}, {}) and Cell 2:({}, {})'.format(*t))

Output:

area 12
Cell 1:(2, 1) and Cell 2:(4, 4)