How to compare two shapes?
Is there a way to compare two geometric shapes (or any two more generic data structures), without using the brute force when a tolerance is involved?
The brute force (that is comparing each value of each object against each value of the other object) works but it's slow, and I can't use it.
I tried sorting the data and comparing two sorted collections. It's fast, but it only works with zero tolerance. As soon as I add the tolerance I get lost. The problem is that two values can be identical when I compare and different when I sort.
Here are some details of my problem.
In my Excel VBA add-in I have a collection of Shape
objects made by a collection of Line
objects made by two Point
objects each. The add-in scans a CAD drawing via COM and creates the collection of Shape
objects.
An simplified version could generate this:
Shape 1 Shape 2
Point 1 0.0 5.0 0.0 4.9
Point 2 4.9 0.0 5.1 0.0
Point 3 5.0 5.0 5.0 5.0
I need to find which shapes are identical to which shapes, where identical means has the same shape, size and orientation, but not the same position (so far it's trivial) plus or minus a tolerance (not so trivial now!)
The Point.IsIdenticalTo(OtherPoint)
is defined as:
Function IsIdenticalTo(OtherPoint As Point) As Boolean
IsIdenticalTo = Abs(X - OtherPoint.X) < Tolerance And Abs(Y - OtherPoint.Y) < Tolerance
End Function
The brute force implementation of the Shape.IsIdenticalTo(OtherShape)
works but it's too slow: if each Line(I)
has an identical OtherShape.Line(J)
and viceversa, then the two shapes are identical. Sometimes I have hundreds of shapes with hundreds of lines each, so the brute force solution doesn't work for me.
I tried two approaches involving sorted collections. Both are fast because comparing two sorted collections is faster than the brute force way, but both fail in some conditions:
-
A
SortedValues
collection contains all the X and Y values of all the lines. The values are sorted, so the info about whether a value is an X or a Y is lost. I have used this approach for months without problems, but it fails for example when the only difference between two shapes is between the points(10, 20)
and(20, 10)
. I added the line angle to the list of values, things have improved, but there are still cases where this approach fails, because some info is lost when the values are sorted together. In the example above this approach would work with the following collections:Shape 1 Shape 2 0.0 0.0 0.0 0.0 4.9 4.9 5.0 5.0 5.0 5.0 5.0 5.1
-
A
SortedLines
collection contains all the lines sorted counter-clockwise and starting from the point closest to the origin. This approach doesn't lose any info, but it fails in the example above because the sorting algorithm doesn't agree with the equality comparison. If the tolerance is 0.5 they should be identical, but the sorting algorithm produces the collections shown below. Things get more difficult because my shapes contain sub-shapes, so there are many starting points on each shape.Shape 1 Shape 2 Point 1 4.9 0.0 0.0 4.9 Point 2 5.0 5.0 5.1 0.0 Point 3 0.0 5.0 5.0 5.0
EDIT:
Shapes are imported from an external graphical application via COM. A shape can be as simple as rectangle or as complex as any fancy outline with 10 circles inside, 20 internal shapes and 30 lines. They represent panels with holes and simple decorations, and sometimes they have a saw-tooth shape, which makes dozen of edges.
Solution 1:
-
handle shape as polygon
convert your points (each line) to set of lines
(length,angle)
like on this image:this ensures invariance on rotation/translation. If you see more lines with
angle=PI
join them together to avoid miss comparisons of the same shapes with different sampling also try to match the same CW/CCW polygon winding rule for both shapes. -
find start point
Can be biggest or smallest
angle, length
... or specific order ofangles+lengths
. So reorder lines of one polygon(cyclic shift)
so your shapes are compared from the 'same point' if they can. -
comparison - for exact match
- number of lines have to be the same
- perimeters must be the same +/- some accuracy
so for example:
fabs (sum of all lengths of poly1 - sum of all lengths of poly2) <= 1e-3
if not shapes are different. Then compare all lengths and angles. If any one value differs more then accuracy value then shapes are different.
-
comparison - size does not matter
compute perimeter of both polygons
l1,l2
and resize all lengths of comparedpoly2
to match perimeter ofpoly1
so all lengths ofpoly2
are multiplied byvalue = l1/l2;
. After this use comparison from bullet #3 -
comparison - shape deviations can still do positive match (size must be the same)
try to set the number of lines to the same value (join all lines with angle close to
PI
). Then perimeters should "match" ...fabs(l1-l2)<=1e-3*l1
. You can use bullet #4 comparison -
comparison - size and shape deviations can still match
just resize
poly2
to match perimeter ofpoly1
as in bullet #4 and then use bullet #5
If you can not find the start point in booth polygons (bullet #2)
Then you have to check for all start point shifts so if your polygons have booth 5 lines:
poly1: l11,l12,l13,l14,l15
poly2: l21,l22,l23,l24,l25
Then you have to compare all 5 combinations (unless you found match sooner):
cmp (l11,l12,l13,l14,l15),(l21,l22,l23,l24,l25)
cmp (l11,l12,l13,l14,l15),(l22,l23,l24,l25,l21)
cmp (l11,l12,l13,l14,l15),(l22,l23,l24,l25,l21)
cmp (l11,l12,l13,l14,l15),(l23,l24,l25,l21,l22)
cmp (l11,l12,l13,l14,l15),(l24,l25,l21,l22,l23)
cmp (l11,l12,l13,l14,l15),(l25,l21,l22,l23,l24)
[Notes]
-
There are also faster ways to compare but they can miss in some cases
- you can compare histograms of lines, angles
- you can use neural network (I do not like them but they are ideal for classifications like this)
-
if your shapes have to be oriented in the same ways (no rotation invariance)
then instead of vertex angle use the line direction angle
-
if you can not ensure the same winding rule for both compared polygons
then you have to check them booth:
cmp (l11,l12,l13,l14,l15),(l21,l22,l23,l24,l25) cmp (l11,l12,l13,l14,l15),(l25,l24,l23,l22,l21)
I know it is a bit vague answer but still hope it helps at least a little ...
Solution 2:
I have the same problem. I compute the adjacent matrix of the vertex weighted with the distances. This compute all the sides length and diagonals. Then if the module of each row or column of the matrix are the same with the other matrix, then the two shapes are the same. For the tolerance just use the function round() before start. The complexity is O(n2 / 2), because you have to compute just an half of the adjacent matrix that is symmetric. The problem is that I cannot detect if a shape is flipped.