Why is the trace of a matrix the sum along its diagonal?
Define the trace of a matrix with entries in $\mathbb C$ to be the sum of its eigenvalues, counted with multiplicity. It is a standard (but I think extremely surprising) fact that this is the sum of the elements along the diagonal. One proof of this is as follows:
Define $Tr'(A)$ to be the sum of the entries along the diagonal of $A$. If $A$ is an $n\times m$ matrix and $B$ an $m\times n$ matrix, we have $$Tr'(AB)=\sum_{i=1}^n\sum_{j=1}^m a_{ij}b_{ji}=\sum_{j=1}^m\sum_{i=1}^n b_{ji}a_{ij}=Tr'(BA)$$ and thus for any invertible matrix $P$ we have $Tr'(PAP^{-1})=Tr'(P^{-1}PA)=Tr'(A)$, i.e. $Tr'$ is independent of basis. Thus it suffices to note that when $A$ is in Jordan Normal Form, $Tr'(A)$ is the trace of $A$.
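As a quick sanity check of the computation above, here is a minimal sketch in plain Python. The matrices `A`, `B`, `M`, `P` and the helper names `matmul` and `diag_sum` are illustrative choices, not part of the question. It checks $Tr'(AB)=Tr'(BA)$ for rectangular matrices, and shows that conjugation changes the individual diagonal entries but not their sum.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def diag_sum(X):
    """Tr': sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))

# Rectangular case: A is 2 x 3 and B is 3 x 2, so AB is 2 x 2 and BA is 3 x 3.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
tr_AB = diag_sum(matmul(A, B))
tr_BA = diag_sum(matmul(B, A))

# Conjugation: M is upper triangular with eigenvalues 2 and 3 on the diagonal.
M = [[2, 1],
     [0, 3]]
P = [[1, 1],
     [1, 2]]          # det P = 1
P_inv = [[2, -1],
         [-1, 1]]     # inverse of P, written out by hand
conj = matmul(matmul(P, M), P_inv)   # P M P^{-1}

print(tr_AB, tr_BA)                  # the two diagonal sums agree
print(conj, diag_sum(conj), diag_sum(M))
```

The conjugated matrix has different diagonal entries from `M`, yet the same diagonal sum, which is exactly the phenomenon the question asks about.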
I find this proof pretty unsatisfying, mainly because I don't see any reason I would expect the sum along the diagonal to be basis-independent. Is there a more illuminating proof of this out there?
Let us start with another basis-independent yet more tractable (as it does not require the characteristic polynomial to split) definition of the trace. We will check in the end that it coincides with your definition, and with the sum of the diagonal coefficients with respect to any basis.
Let $V$ be an $n$-dimensional vector space over a field $F$, and let $L(V)$ be the algebra of $F$-linear maps from $V$ to $V$.
Note that we have a canonical isomorphism
$$ L(V)\simeq V\otimes V^* $$
via $v\otimes w^* \longmapsto w^*(\cdot)v$. In other words, $L(V)$ is a natural incarnation of the tensor product of $V$ with its dual $V^*$, with rank-one operators as elementary tensors.
Observe that the bilinear map $(v,w^*)\longmapsto w^*(v)$ factors uniquely through the tensor product.
The resulting linear map is the trace, which is therefore characterized by $$ \mathrm{tr}:V\otimes V^*\longrightarrow F\qquad \mathrm{tr}(v\otimes w^*)=w^*(v). $$
Now choose any basis $\{e_i\}$ for $V$ and denote its dual basis by $\{e_i^*\}$. We have $\mathrm{tr}(e_i\otimes e_j^*)=\delta_{ij}$. Therefore, for every $x=\sum x_{ij}e_i\otimes e_j^*\in L(V)$, we have $$ \mathrm{tr} (x)=\sum_{i=1}^n x_{ii}. $$
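To make this concrete, here is a small sketch in plain Python, assuming we identify $V$ with $F^3$ so that $v\otimes w^*$ has matrix entries $v_i w_j$. The vectors and the helper names `outer`, `diag_sum` and `pairing` are illustrative. It checks that the diagonal sum of a rank-one matrix is the pairing $w^*(v)$, and that summing $x_{ij}\,\mathrm{tr}(e_i\otimes e_j^*)$ recovers the diagonal sum of a general matrix.

```python
def outer(v, w):
    """Matrix of the rank-one operator u -> w^*(u) v, with entries v_i * w_j."""
    return [[vi * wj for wj in w] for vi in v]

def diag_sum(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))

def pairing(w, v):
    """The scalar w^*(v), with w^* written in the dual basis."""
    return sum(wi * vi for wi, vi in zip(w, v))

# Rank-one case: tr(v (x) w^*) should equal w^*(v).
v = [2, -1, 3]
w = [4, 5, 6]
rank_one_trace = diag_sum(outer(v, w))
rank_one_pairing = pairing(w, v)

# General case: expand x in the elementary tensors e_i (x) e_j^* and
# apply tr(e_i (x) e_j^*) = delta_ij term by term.
x = [[1, 2, 0],
     [3, 4, 5],
     [6, 0, 7]]
n = len(x)
basis = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
trace_by_linearity = sum(x[i][j] * pairing(basis[j], basis[i])
                         for i in range(n) for j in range(n))

print(rank_one_trace, rank_one_pairing)
print(trace_by_linearity, diag_sum(x))
```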
Conclusion. Given a matrix $x$ in $M_n(F)$, think of it as an operator in $L(F^n)$ via the canonical basis of $F^n$. Its trace is then defined canonically as above, and whatever basis you choose for $F^n$, the sum of the diagonal coefficients will be equal to $\mathrm{tr}(x)$. In particular, when the characteristic polynomial of $x$ splits, $x$ is upper triangular in some basis with its eigenvalues on the diagonal, so $\mathrm{tr}(x)$ is also the sum of the eigenvalues counted with multiplicities.
Note. This also helps explain why $\mathrm{tr} (ab)=\mathrm{tr}(ba)$, beyond the calculation you mentioned. Indeed, on elementary tensors, $$ \mathrm{tr}((v_1\otimes w_1^*)(v_2\otimes w_2^*))=w_1^*(v_2)\mathrm{tr}(v_1\otimes w_2^*)=w_1^*(v_2)w_2^*(v_1) $$ $$ =w_2^*(v_1)w_1^*(v_2)=w_2^*(v_1)\mathrm{tr}(v_2\otimes w_1^*)=\mathrm{tr}((v_2\otimes w_2^*)(v_1\otimes w_1^*)), $$ and the general case follows by bilinearity.
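The composition rule used in this note, $(v_1\otimes w_1^*)(v_2\otimes w_2^*)=w_1^*(v_2)\,v_1\otimes w_2^*$, can be checked on a small example. The sketch below uses plain Python with illustrative vectors in $F^2$; the helper names are hypothetical and the matrix of $v\otimes w^*$ is taken to have entries $v_i w_j$.

```python
def outer(v, w):
    """Matrix of v (x) w^*, i.e. the operator u -> w^*(u) v."""
    return [[vi * wj for wj in w] for vi in v]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def diag_sum(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))

def pairing(w, v):
    """The scalar w^*(v)."""
    return sum(wi * vi for wi, vi in zip(w, v))

v1, w1 = [1, 2], [3, 4]
v2, w2 = [5, 6], [7, 8]

a = outer(v1, w1)
b = outer(v2, w2)

# Composition rule: a b = w1^*(v2) * (v1 (x) w2^*).
ab = matmul(a, b)
scaled = [[pairing(w1, v2) * entry for entry in row] for row in outer(v1, w2)]

# Both traces reduce to the symmetric product w1^*(v2) * w2^*(v1).
tr_ab = diag_sum(ab)
tr_ba = diag_sum(matmul(b, a))
symmetric_product = pairing(w1, v2) * pairing(w2, v1)

print(ab == scaled, tr_ab, tr_ba, symmetric_product)
```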
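Are you surprised that if a polynomial $f(x) = x^n + a_{n-1}x^{n-1} + \ldots$ of degree $n$ has roots $r_1, \ldots, r_n$, then $a_{n-1} = - (r_1 + \ldots + r_n)$? Now think about how the coefficient of $x^{n-1}$ arises in the characteristic polynomial $\det(xI - M)$ of a matrix $M$: the only term of the determinant expansion that contributes to it is the product of the diagonal entries $\prod_i (x - m_{ii})$.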
(And the characteristic polynomial is basis-independent, because the eigenvalues and their algebraic multiplicities, i.e. the dimensions of the generalized eigenspaces, are basis-independent and determine the characteristic polynomial.)
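The claim about the $x^{n-1}$ coefficient can be checked directly. The following is a minimal sketch in plain Python that expands $\det(xI - M)$ with the Leibniz (permutation) formula, representing polynomials as coefficient lists with the lowest degree first. The matrix `M` and the helper names are illustrative, and this brute-force expansion is only sensible for small $n$.

```python
from itertools import permutations

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def perm_sign(perm):
    """Sign of a permutation, computed from its number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def char_poly(M):
    """Coefficients of det(xI - M), lowest degree first."""
    n = len(M)
    total = [0] * (n + 1)
    for perm in permutations(range(n)):
        term = [perm_sign(perm)]
        for i in range(n):
            j = perm[i]
            # Entry (i, j) of xI - M: x - M[i][i] on the diagonal, -M[i][j] off it.
            entry = [-M[i][j], 1] if i == j else [-M[i][j]]
            term = poly_mul(term, entry)
        for k, c in enumerate(term):
            total[k] += c
    return total

M = [[2, 1, 0],
     [0, 3, 4],
     [5, 0, 1]]
coeffs = char_poly(M)
diagonal_sum = sum(M[i][i] for i in range(len(M)))

# The coefficient of x^(n-1) should be minus the diagonal sum.
print(coeffs, coeffs[len(M) - 1], -diagonal_sum)
```

Any non-identity permutation moves at least two indices, so its term has degree at most $n-2$ in $x$. That is why only the diagonal product contributes to the $x^{n-1}$ coefficient.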
If you work over an algebraically closed field (if your matrix is over ${\mathbb R}$, just regard it as a matrix over ${\mathbb C}$), then your matrix is similar to an upper triangular matrix.
The diagonal entries of that triangular matrix are the eigenvalues, counted with multiplicity. By the computation you showed, the sum of the diagonal entries is independent of the basis chosen ($tr(PAP^{-1})=tr(AP^{-1}P)=tr(A)$), so the equality follows.
The rest of the answer has already been written in this posting.