I am trying to find a robust way to classify and distinguish polyominoes. I would like to write a simple algorithm that could identify similar free polyominoes (under translation, rotation, reflection or glide reflection), given a set. Since I am new to this topic, I am getting a bit lost in the literature and I don't really know where to look.

I was thinking that something based on counting the length of the sides could work. For example, the following polyomino:

enter image description here

has one 5-length side, two 2-length sides and five 1-length sides. So we could describe it with the following notation: $\{(5,1),(2,2),(1,5)\}\,$, where the first number in parenthesis stands for the length of the side and the second one for the number of sides of that length. Then, all translations, rotations, reflections or glide reflections of this shape would result in the same description. Which is good.

However, this doesn't work in general. For example, the following two different polyominoes:

enter image description here

would result in the same description $\{(5,1), (3,1), (2,2), (1,6) \}$.

Probably, someone has already figured out a better way to do this, but, as I said, I am getting lost in the literature of the topic. Can anybody point me to the right direction?

Update:

Thanks for the useful suggestions in the comments. After some more research I found an interesting method, explained in the book "Linknot: Knot Theory by Computer - S. Jablan, R. Sazdanovic", which works like this:

Let's imagine the path of a ray of light (bold line in figure) in a polyomino with the borders consisting of mirrors, where in every border cell we place two-sided mirrors (bolder lines) perpendicular to the internal edges in their midpoints . After a series of reflections, the ray of light will describe a shape called self-avoiding curve. If we denote a reflection in a border mirror by 0, and a reflection in an internal mirror by 1, we have 0-1 words for polyominoes, where these words are cyclically equivalent (this means, could be red starting from any sign 0 or 1 and ending in it). For n = 1 we will have only one polyomino 0000, for n = 2 the polyomino 00010001, for n = 3 two polyominoes: 000101000101 and 000100100011, for n = 4 five of them: 0001010100010101, 0001010001100011, 001001001001, 0001001100010011, and 0001001010001011, etc.

enter image description here

We can translate these symbols (or binary numbers) into hexadecimal numbers and assign one number to each polyomino, in order to establish one-to-one correspondence between the numbers and polyominoes. For example, this number can be the minimum of all cyclic-equivalent symbols (e.g. to the polyomino 00010001 correspond cyclically equivalent symbols 00100010, 01000100, 10001000 and the minimum of them is 00010001 = 11 in the hexadecimal system). Hence, we have a notation for polyominoes where exactly one number corresponds to every polyomino, and vice versa.

This last part of the method is similar to what achille hui was suggesting in his "dumb" method, in the comments. I don't know how the efficiency of this approach compares to the efficiency of the "dumb" one, but I think it should be faster, since we are only considering the "perimeter" of the polyomino and forgetting about the cells in the inside (this could be a problem for polyominoes with holes, but for my application I am only interested in the outer perimeter, so it's not a big deal).

However, I did not find a proof that in this notation "exactly one number corresponds to every polyomino, and vice versa", even though it intuitively feels right.

Update 2:

To clarify, the reason why I want to label (or encode or hash, the difference is still not completely clear to me...) the polyominoes is because I have a code that simulates the evolution of a 2d system where the different states of the system can look like polyominoes (something like a cellular automaton). Basically, labelling the different shapes would allow me to give at each state a unique identifier and, for example, efficiently count how many times a given state has been visited.


I haven't studied the subject, so I hesitate to suggest this idea. It is geometrically intuitive, and surely simple to code in any language, but perhaps it is dreadfully inefficient.

Assign Gaussian integer coordinates to the centres of the squares in a given $n$-polyomino, compute their centroid (i.e. their arithmetic mean), and replace the coordinates with $n$ times their offsets from the centroid.

For example, the L-shape with centres $(3 + 3i, 3 + 4i, 3 + 5i, 4 + 3i)$ has centroid $\tfrac14(13 + 15i),$ so it is coded as $(-1 - 3i, -1 + i, -1 + 5i, 3 - 3i).$ (In any order, it doesn't matter.)

Let $(z_1, z_2, \ldots, z_n)$ and $(w_1, w_2, \ldots, w_n)$ be two such lists. They represent isomorphic polyominoes if and only if $f\{z_1, z_2, \ldots, z_n\} = \{w_1, w_2, \ldots, w_n\}$ for one of the $8$ functions $f(z) = \pm z, \pm iz, \pm\overline{z}, \pm i\overline{z}$ (forming a dihedral group of order $8$). So one only has to check $z_1$ against each of the $w_k$. If $f(z_1) = w_k,$ test if $f$ matches the two lists in some order. If not, try another $w_k$. If all such tests fail, the polyominoes are non-isomorphic.

(There's no need to use complex arithmetic, of course; it's just a convenient way of describing the functions $f$.)

Example

Two isomorphic tetrominoes.

The initial lists of pairs of integer-valued square centre coordinates are: \begin{gather*} P : [(5, 2), (5, 3), (5, 4), (6, 4)], \\ Q : [(-1, -1), (-1, -2), (-2, -2), (-3, -2)]. \end{gather*} The centroids are: \begin{gather*} P : \left(5\tfrac14, 3\tfrac14\right), \\ Q : \left(-1\tfrac34, -1\tfrac34\right). \end{gather*} The scaled offsets of the square centre coordinates from the centroids are: \begin{gather*} P : [(-1, -5), (-1, -1), (-1, 3), (3, 3)], \\ Q : [(3, 3), (3, -1), (-1, -1), (-5, -1)]. \end{gather*} The algorithm compares $z_1 = (-1, -5)$ with $w_1 = (3, 3),$ $w_2 = (3, -1),$ $w_3 = (-1, -1),$ in turn, and fails to find a match with any of them. Next, it compares $z_1 = (-1, -5)$ with $w_4 = (-5, -1),$ and finds that $f(z_1) = w_4,$ where $f(x, y) = (y, x).$ (In terms of Gaussian integers, $f(z) = i\overline{z}.$) Finally, it finds $f(z_2) = w_3,$ $f(z_3) = w_2,$ and $f(z_4) = w_1,$ and announces that $P$ and $Q$ are isomorphic.


If you want a unique way to identify polyominoes - essentially a name - you need to consider what the name will be used for.

If your intention is to make it easy for humans to understand what polyomino is meant, then none of the proposed schemes work well, since it will take you some time to work out which polyomino is meant by a given name. There has been other systems proposed, for example this naming works like chemistry names (and suffer from the same problems).

The best way to identify a polyomino is by its image. So whatever your computer produces, simply outputting the image instead of a string name may be best. (This convention is also used in many papers on polyominoes.)

One compromise is a name like "011/110/010" (this denotes the F-pentomino shown below). You can normalize the name; that is, finding a way to choose a canonical name from the 8 possibilities, but a human can recognize all pretty easily. This is basically the same as achille huis method, but you don't convert the binary. (I use this in my computer notes).

enter image description here

If humans are not your concern, I would recommend using a point set presentation as described below. It is the most direct, easiest to debug, and fast enough for analytics.


(I leave this here for others stumbling on the question.) Before using any clever system, implement the simple algorithm first:

  • Represent a polyomino as a set of coordinates, such that the minimum $x$-and $y$- coordinates are both 0. Essentially, you are subtracting the minimum corner of the hull from each point. You need the points in a consistent order, left to right for each row, and rows from bottom to top, for example.

  • To compare, you do for each of the 8 orientation a point by point comparison. If all $n$ points are equal, the polyominoes are equal, otherwise, go to the next orientation. If none are equal, the polyominoes are not equal.

Already, this algorithm is not bad; you need to make less than $8n$ comparisons (in practice, way less). Also, you can get the isometries of integer points with simple (very cheap) integer arithmetic. For example, vertical reflection is $(w - x + 1, y)$.

You can get a considerable speedup by caching a few things with your polyomino that are easy to calculate. As a start, the number of cells, and the dimensions of the bounding box. If your polyominoes are big, you may store projections (how many cells in each row, how many in each column).

So you can check in order the number of cells, the bounding box dimensions, the projections, and finally a full check.

I doubt any clever scheme can do better than this. Even so, this will be a very good benchmark for speed (And correctness).

Update: I want to point out the answer by Calum Gilhooley is essentially the same as the one described here. He uses the centroid to normalize translation (instead of a corner of the bounding box), and keeps calculations scaled by a factor of $n$ to avoid fractions (which also is a common strategy when working rational numbers on computers).