Find the set of largest contiguous rectangles to cover multiple areas [duplicate]

I'm working on a tool called Quickfort for the game Dwarf Fortress. Quickfort turns spreadsheets in csv/xls format into a series of commands for Dwarf Fortress to carry out in order to plot a "blueprint" within the game.

I am currently trying to optimally solve an area-plotting problem for the 2.0 release of this tool.

Consider the following "blueprint" which defines plotting commands for a 2-dimensional grid. Each cell in the grid should either be dug out ("d"), channeled ("c"), or left unplotted ("."). Any number of distinct plotting commands might be present in actual usage.

. d . d c c
d d d d c c
. d d d . c
d d d d d c
. d . d d c

To minimize the number of instructions that need to be sent to Dwarf Fortress, I would like to find the set of largest contiguous rectangles that can be formed to completely cover, or "plot", all of the plottable cells. To be valid, all of a given rectangle's cells must contain the same command.

This is a faster approach than Quickfort 1.0 took: plotting every cell individually as a 1x1 rectangle. This video shows the performance difference between the two versions.

For the above blueprint, the solution looks like this:

. 9 . 0 3 2
8 1 1 1 3 2
. 1 1 1 . 2
7 1 1 1 4 2
. 6 . 5 4 2

Each same-numbered rectangle above denotes a contiguous rectangle. The largest rectangles take precedence over smaller rectangles that could also be formed in their areas. The order of the numbering/rectangles is unimportant.

My current approach is iterative. In each iteration, I build a list of the largest rectangles that could be formed from each of the grid's plottable cells by extending in all 4 directions from the cell. After sorting the list largest first, I begin with the largest rectangle found, mark its underlying cells as "plotted", and record the rectangle in a list. Before plotting each rectangle, its underlying cells are checked to ensure they are not yet plotted (overlapping a previous plot). We then start again, finding the largest remaining rectangles that can be formed and plotting them until all cells have been plotted as part of some rectangle.

I consider this approach slightly more optimized than a dumb brute-force search, but I am wasting a lot of cycles (re)calculating cells' largest rectangles and checking underlying cells' states.

Currently, this rectangle-discovery routine takes the lion's share of the total runtime of the tool, especially for large blueprints. I have sacrificed some accuracy for the sake of speed by only considering rectangles from cells which appear to form a rectangle's corner (determined using some neighboring-cell heuristics which aren't always correct). As a result of this 'optimization', my current code doesn't actually generate the above solution correctly, but it's close enough.

More broadly, I consider the goal of largest-rectangles-first to be a "good enough" approach for this application. However I observe that if the goal is instead to find the minimum set (fewest number) of rectangles to completely cover multiple areas, the solution would look like this instead:

. 3 . 5 6 8
1 3 4 5 6 8
. 3 4 5 . 8
2 3 4 5 7 8
. 3 . 5 7 8

This second goal actually represents a more optimal solution to the problem, as fewer rectangles usually means fewer commands sent to Dwarf Fortress. However, this approach strikes me as closer to NP-Hard, based on my limited math knowledge.

Watch the video if you'd like to better understand the overall strategy; I have not addressed other aspects of Quickfort's process, such as finding the shortest cursor-path that plots all rectangles. Possibly there is a solution to this problem that coherently combines these multiple strategies.

Help of any form would be appreciated.


Solution 1:

I found the paper Fast Algorithms To Partition Simple Rectilinear Polygons from San-Yuan Wu and Sartaj Sahni, which could be of interest to you. In your example the region with character 'd' forms a rectilinear polygon, also the regions with 'c' and '.'. This paper includes algorithms for hole-free simple rectilinear polygons.

If a polygon includes holes, there are algorithms running with O(n^3/2 log n) time, as JM Keil in the paper Polygon Decomposition on page 11 states.

If minimizing the total length of the line segments introduced in the partitioning process is the other optimization criterion, the problem becomes NP-complete if the polygon contains holes (page 12). For these problems approximation algorithms exist (the paper refers to papers with such algorithms). If the polygon doesn't contain holes there is an O(n^4) time algorithm.

Solution 2:

This is not really an answer but using a naive search you can get

. 1 . 2 3 3
4 1 5 2 3 3
. 1 5 2 . 6
7 1 5 2 8 6
. 1 . 2 8 6

Basically you start from the top left corner and use it as the top left corner of your next rectangle, then you check how far you can extend it to the right and down, then find the topmost and leftmost cell of the remaining bits and so on.

This is probably very ineffective in some cases but it's quick as you don't have to recalculate anything..

Solution 3:

In my view, all solutions that find a set of rectangles that cover the original area is a correct one. Finding a smaller set of rectangles is better because it compresses/performs better.

So I would not advise trying to find the optimal solution. (I would guess it is NP-hard as well).

For a faster running solution, you could tile the matrix into groups of 4 cells initially, and try to merge them if they are the same. After that, you can merge groups of 4 groups, if they are the same. And do so recursively if you are done.

This won't find the optimal solution but will be very fast. If your matrixes are large, with large contiguous areas, the difference with the optimal won't be that great.