Coordinate systems' correspondence

Take into consideration that Vision/CoreML coordinate system doesn't correspond to ARKit/SceneKit coordinate system. For details look at this post.

Rotation's direction

I suppose the problem is not in matrix. It's in vertices placement. For tracking 2D images you need to place ABCD vertices counter-clockwise (the starting point is A vertex located in imaginary origin x:0, y:0). I think Apple Documentation on VNRectangleObservation class (info about projected rectangular regions detected by an image analysis request) is vague. You placed your vertices in the same order as is in official documentation:

var bottomLeft: CGPoint
var bottomRight: CGPoint
var topLeft: CGPoint
var topRight: CGPoint

But they need to be placed the same way like positive rotation direction (about Z axis) occurs in Cartesian coordinates system:

enter image description here

World Coordinate Space in ARKit (as well as in SceneKit and Vision) always follows a right-handed convention (the positive Y axis points upward, the positive Z axis points toward the viewer and the positive X axis points toward the viewer's right), but is oriented based on your session's configuration. Camera works in Local Coordinate Space.

Rotation direction about any axis is positive (Counter-Clockwise) and negative (Clockwise). For tracking in ARKit and Vision it's critically important.

enter image description here

The order of rotation also makes sense. ARKit, as well as SceneKit, applies rotation relative to the node’s pivot property in the reverse order of the components: first roll (about Z axis), then yaw (about Y axis), then pitch (about X axis). So the rotation order is ZYX.


Math (Trig.):

Equation

Notes: the bottom is l (the QR code length), the left angle is k, and the top angle is i (the camera)

Picture