ARKit – What do the different columns in Transform Matrix represent?

Solution 1:

ARKit, RealityKit and SceneKit frameworks use 4 x 4 Transformation Matrices to translate, rotate, scale and shear 3D objects (just like simd_float4x4 matrix type). Let's see how these matrices look like.

In 3D Graphics we often use a 4x4 Matrix with 16 useful elements. The Identity 4x4 Matrix is as following:

enter image description here

Between those sixteen elements there are 6 different shearing coefficients:

shear XY
shear XZ
shear YX
shear YZ
shear ZX
shear ZY

In Shear Matrix they are as followings:

enter image description here

Because there are no Rotation coefficients at all in this Matrix, six Shear coefficients along with three Scale coefficients allow you rotate 3D objects about X, Y, and Z axis using magical trigonometry (sin and cos).

Here's an example how to rotate 3D object (CCW) about its Z axis using Shear and Scale elements:

enter image description here

Look at 3 different Rotation patterns using Shear and Scale elements:

enter image description here

And, of course, 3 elements for translation (tx,ty,tz) in the 4x4 Matrix are located on the last column:

 ┌               ┐
 |  1  0  0  tx  |
 |  0  1  0  ty  |
 |  0  0  1  tz  |
 |  0  0  0  1   |
 └               ┘

The columns' indices in ARKit, SceneKit and RealityKit are: 0, 1, 2, 3.

The fourth column (index 3) is for translation values:

var translation = matrix_identity_float4x4

translation.columns.3.x
translation.columns.3.y
translation.columns.3.z


You can read my illustrated story about MATRICES on Medium.


Perspective and orthographic projections

Values located in the bottom row of a 4x4 matrix is used for perspective projection.

And, of course, you need to see an example of how to setup an orthographic projection.

Solution 2:

If you're new to 3D then these transformation matrices will seem like magic. Basically, every "point" in ARKit space is represented by a 4x4 transform matrix. This matrix describes the distance from the ARKit origin (the point at which ARKit woke up to the world), commonly known as the translation, and the orientation of the device, aka pitch, roll, and yaw. A transform matrix can also describe scale, although typically you won't deal with scale until you render something.

What do the columns mean? That gets complicated but just remember that the first 3 elements of the 4th column are the x,y,z translation. That will come in handy. The rest holds scale and rotation information.