Show that the trace is the unique linear functional $f$ on square matrices satisfying $f(AB)=f(BA)$ (up to normalization)
Solution 1:
Consider the matrix $A=\begin{pmatrix}0&1\\0&0\end{pmatrix}$. We want to show that necessarily $f(A)=0$. Let $B=\begin{pmatrix}1&0\\0&0\end{pmatrix}$. Then $BA=A$ and $AB=0$, so $f(A)=f(BA)=f(AB)=f(0)=0$, using the hypothesis for the middle equality and linearity for $f(0)=0$. The same argument works for any matrix with a single $1$ off the diagonal, in any dimension. Thus, by linearity, $f$ depends only on the diagonal entries of its argument.
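Explicitly, writing $E_{ij}$ for the matrix with a $1$ in position $(i,j)$ and zeros elsewhere (notation introduced here for convenience, not used above), the general computation reads

$E_{ii}E_{ij}=E_{ij}, \quad E_{ij}E_{ii}=0 \quad (i\neq j), \qquad \text{so} \quad f(E_{ij})=f(E_{ii}E_{ij})=f(E_{ij}E_{ii})=f(0)=0.$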
Let us then suppose $A$ is invertible and choose $B=A^{-1}C$ for an arbitrary square matrix $C$. Then the condition $f(AB)=f(BA)$ becomes $f(C)=f(A^{-1}CA)$. If we choose $A$ to be a permutation matrix and let $C$ be a diagonal matrix, we see that $f$ is invariant under permutations of the diagonal entries. Combined with linearity, this forces $f(\text{diag}(c_1,\dots,c_n))=\lambda(c_1+\dots+c_n)$ for some constant $\lambda$. The normalization then implies that $f$ is the trace.
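Spelling out that last step (the normalization $f(I_n)=n$ is the one used in Solution 2 below): linearity alone gives coefficients $\lambda_i$, and permutation invariance makes them equal,

$f(\text{diag}(c_1,\dots,c_n))=\sum_i \lambda_i c_i, \qquad \lambda_{\sigma(i)}=\lambda_i \text{ for all permutations } \sigma \implies \lambda_1=\dots=\lambda_n=\lambda, \qquad f(I_n)=n\lambda=n \implies \lambda=1.$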
Solution 2:
I apologize for posting a belated answer, but I humbly suggest that the other answers so far don't get to the heart of the matter.
You can skip this paragraph if the terms don't mean anything: The invariant definition of the trace in higher linear algebra uses the isomorphism between $Hom(V, V)$ and the tensor product $V^*\otimes V$ for a finite-dimensional vector space $V$. In the infinite-dimensional case there is still an isomorphism, but only onto the finite-rank linear operators. On $V^*\otimes V$ there is a linear evaluation functional, defined by its action on simple tensors as $Tr(\phi \otimes x) = \phi(x)$. The well-definedness of this map follows from the universal property of tensor products, since evaluation is bilinear. This is just tensor contraction.
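For concreteness, the isomorphism in question sends a simple tensor to the corresponding rank 1 operator,

$V^*\otimes V \to Hom(V,V), \qquad \phi\otimes x \mapsto (v \mapsto \phi(v)\,x),$

and $Tr$ is the evaluation functional transported through the inverse of this map.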
In matrix terms this means that $Tr$ is determined by its action on rank 1 matrices $x y^T$, namely $Tr(x y^T) = y^T x$. You can regard $y^T$ as just a notation for a linear functional if you want to think invariantly. Let's see how to derive this from your stated conditions:
A rank 1 projection has the form $z z^T$ where $z^T z = 1$. All rank 1 projections are conjugate, and conjugation invariance follows from commutativity of the trace ($f(S P S^{-1}) = f(P S^{-1} S) = f(P)$), so they must all have the same trace. Since $Tr(I_n) = n$ and $I_n$ decomposes as a sum of $n$ rank 1 projections, it follows that $Tr(z z^T) = 1$. (A rank $k$ projection gets trace $k$.) Replace the non-square $x$ and $y^T$ with the square $x z^T$ and $z y^T$:
$Tr(x y^T) = Tr(x z^T z y^T) = Tr(z y^T x z^T) = y^T x Tr(z z^T) = y^T x$
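For completeness, the decomposition of the identity used above is the standard one (in the basis notation introduced in the next paragraph):

$I_n = \sum_{i=1}^n e_i e_i^T, \qquad n = Tr(I_n) = \sum_{i=1}^n Tr(e_i e_i^T) = n \, Tr(z z^T) \implies Tr(z z^T) = 1,$

and the middle equality in the display above is exactly the hypothesis $Tr(AB)=Tr(BA)$ applied to $A = x z^T$, $B = z y^T$.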
From this everything else readily follows. Expanding a matrix in a basis expresses it as a sum of rank 1 matrices: $A = \sum_{ij} A_{ij} e_i e_j^T$. Here $\{e_i\}$ is a basis and $\{e_i^T\}$ is the dual basis, uniquely defined by $e_i^T e_i = 1$ and $e_i^T e_j = 0$ if $i \neq j$. Hence
$Tr(A) = \sum_{ij} A_{ij} Tr(e_i e_j^T) = \sum_{ij} A_{ij} e_j^T e_i = \sum_i A_{ii}$
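As a concrete sanity check (an illustration, not part of the original argument), in the $2\times 2$ case:

$A=\begin{pmatrix}a&b\\c&d\end{pmatrix} = a\,e_1 e_1^T + b\,e_1 e_2^T + c\,e_2 e_1^T + d\,e_2 e_2^T, \qquad Tr(A) = a + 0 + 0 + d = a+d,$

since $Tr(e_1 e_2^T) = e_2^T e_1 = 0$ and likewise for the $c$ term.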
I want to emphasize that this proof goes through for finite-rank linear operators if you interpret the transpose symbols as merely designating linear functionals. No inner product is required: in $z z^T$, just think of $z^T$ as any linear functional satisfying $z^T z = 1$. (When an inner product is present and you interpret the transpose symbol as an actual operation, $z z^T$ becomes an orthogonal projection.) You can think of this as the "abstract matrix notation" counterpart of abstract index notation for Einstein's summation convention for tensors.
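Stated in that coordinate-free language (a restatement, under the same finite-rank assumption): if $T$ is a finite-rank operator written as $T = \sum_k \phi_k(\cdot)\, x_k$ with $\phi_k \in V^*$ and $x_k \in V$, then

$Tr(T) = \sum_k \phi_k(x_k),$

and well-definedness, i.e. independence of the chosen representation, is exactly the universal-property argument mentioned above.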