What is an orthogonal transformation?
You can transform a vector into another vector by multiplying it by a matrix: $$w=Av$$ Say you have two vectors $v_1, v_2$; let's transform them into $w_1, w_2$ and take their inner product. Note that the inner product is the same as transposing and then matrix multiplying: $$w_1\cdot w_2\equiv w_1^Tw_2=v_1^TA^TAv_2$$
Now, if the matrix is orthogonal, meaning $A^TA=I$, you get: $$w_1\cdot w_2=v_1^Tv_2\equiv v_1\cdot v_2$$
So we see that the inner product is preserved when the transformation is orthogonal. Isn't this interesting? It means that if you have a geometrical object (which can be represented by a set of vectors), then an orthogonal transformation $A$ will simply rotate or flip your object, preserving its geometry (all angles and sizes).
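If you want to see this numerically, here is a minimal NumPy sketch (the rotation angle and the two vectors are made up purely for illustration; they are not part of the argument above):

```python
import numpy as np

# A rotation matrix is orthogonal, so it should leave inner products unchanged.
theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v1 = np.array([1.0, 2.0])
v2 = np.array([-3.0, 0.5])

w1, w2 = A @ v1, A @ v2

print(np.dot(v1, v2))   # original inner product
print(np.dot(w1, w2))   # same value, since A.T @ A = I
```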
The equation $A^T A = A A^T = I$ says that $A^{-1} = A^T$, so, at the very least, orthogonal matrices are easy to invert. What does it mean? Matrices represent linear transformations (once a basis is given). Orthogonal matrices represent transformations that preserve the lengths of vectors and all angles between vectors, and all transformations that preserve lengths and angles are orthogonal. Examples are rotations (about the origin) and reflections in some subspace (and compositions of these account for all the examples).
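As a quick sanity check on "easy to invert", here is a hedged NumPy sketch (the particular reflection matrix is just one I chose as an example):

```python
import numpy as np

# Reflection across the line y = x: an orthogonal matrix,
# so its inverse is simply its transpose.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(np.allclose(A.T @ A, np.eye(2)))      # True: A is orthogonal
print(np.allclose(np.linalg.inv(A), A.T))   # True: the transpose is the inverse
```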
In addition to all of the above, you can think of a matrix as jamming together a bunch of responses: take some transformation $\mathcal T$ and the "basis vectors" $\hat e_1, \hat e_2, \dots, \hat e_n$, where $\hat e_k$ is the column with a $1$ in the $k^\text{th}$ spot and zeroes everywhere else. That transformation has the "response vectors" $\mathcal T(\hat e_k) = \vec v_k$, which can themselves be represented as columns with the above basis.
If we jam together these vectors horizontally we get the matrix. If that transformation is linear, meaning $\mathcal T(\vec a + \vec b) = \mathcal T(\vec a) + \mathcal T(\vec b)$ and $\mathcal T(k~\vec a) = k \mathcal T(\vec a),$ then matrix multiplication describes the entire transformation, because $\mathcal T(\vec u) = \mathcal T(\sum_k u_k ~\hat e_k) = \sum_k u_k ~\vec v_k$: you just combine the columns weighted by the respective components.
Just as an example before I go further, the matrix $$\begin{bmatrix}5&6&7\\1&2&7\\0&1&7\end{bmatrix}$$maps this input column $\hat e_1 = [1,0,0]^T$ to the output column $\vec v_1 = [5,1,0]^T$ and so on; you can therefore see that the matrix takes the form $[\vec v_1 ~~\vec v_2~~\vec v_3]$ with all these basic responses jammed together.
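Here is a small NumPy sketch of that column-combination view, using the example matrix above (the test vector `u` is an arbitrary choice of mine):

```python
import numpy as np

# The k-th column is the image of e_k, and M @ u is a weighted sum of columns.
M = np.array([[5, 6, 7],
              [1, 2, 7],
              [0, 1, 7]])

e1 = np.array([1, 0, 0])
print(M @ e1)                                # [5 1 0], the first column

u = np.array([2, -1, 3])
by_columns = u[0]*M[:, 0] + u[1]*M[:, 1] + u[2]*M[:, 2]
print(np.array_equal(M @ u, by_columns))     # True: columns weighted by components
```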
Now if you have this background, the key fact about an orthogonal matrix ($A^{-1} = A^T$) is that the columns of the matrix are all orthogonal unit vectors which span the space. You can see this because $\mathcal T^{-1}(\vec v_k) = \hat e_k$ by definition, but $A^T~\vec v_k = \big(\vec v_k^T ~A\big)^T.$ As a matrix, $\vec v_k^T$ represents the function from vectors to dot products $(\vec v_k \cdot{}),$ while matrix multiplication implements function composition (so you feed every output vector of the matrix on the right to the transform on the left). So, equating this with $\hat e_k^T$, you get $$\hat e_k^T = \begin{bmatrix}\big(\vec v_k^T~\vec v_1\big)&\big(\vec v_k^T~\vec v_2\big)&\dots&\big(\vec v_k^T~\vec v_n\big)\end{bmatrix}$$and that is how you arrive at the idea that $\vec v_k^T ~ \vec v_\ell$ must be $0$ if $k \ne \ell$ and $1$ if $k = \ell$, which makes the columns an orthonormal basis.
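In code this says the Gram matrix of the columns, $A^TA$, is the identity; a minimal NumPy check (the rotation angle is again an arbitrary example):

```python
import numpy as np

# For an orthogonal matrix the columns form an orthonormal basis,
# so the Gram matrix of the columns, A^T A, is the identity.
theta = 1.2
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

gram = A.T @ A                        # entry (k, l) is v_k . v_l
print(np.allclose(gram, np.eye(2)))   # True: 1 on the diagonal, 0 elsewhere
```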
An orthogonal matrix can therefore be thought of as any "coordinate transformation" from your usual orthonormal basis $\{\hat e_i\}$ to some new orthonormal basis $\{\hat v_i\}.$ You can view other matrices as "coordinate transformations" too (as long as they're nondegenerate square matrices), but they will in general mess with your formula for the "dot product" of vectors, since it takes a different form in skewed coordinates: there you have to define something called a "dual basis" and distinguish the "contravariant" and "covariant" components of vectors, and by the time you get to that point you will have generalized from your familiar "dot product" to a more general "metric tensor". If you want to avoid all of these headaches, you must restrict your notion of "coordinate transformation" to precisely these orthogonal matrices.
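Here is a hedged sketch of that last point in NumPy (the bases `Q` and `S` and the vectors are invented for illustration): naively dotting the coordinate columns gives the right answer in an orthonormal basis but not in a skewed one.

```python
import numpy as np

# Express x and y in a new basis B (columns of B are the new basis vectors)
# and dot the coordinate columns as if nothing had changed.
x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
true_dot = x @ y

Q = np.array([[np.cos(0.5), -np.sin(0.5)],
              [np.sin(0.5),  np.cos(0.5)]])   # orthonormal basis (a rotation)
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])                    # skewed, non-orthogonal basis

for B in (Q, S):
    xc = np.linalg.solve(B, x)   # coordinates of x in basis B
    yc = np.linalg.solve(B, y)   # coordinates of y in basis B
    print(true_dot, xc @ yc)     # equal for Q, generally different for S
```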