Why should I believe that the derivative of the determinant is the trace?
Using a Taylor expansion, it is not hard to show that the derivative of the determinant function at the identity is the trace: $$ \lim_{t \to 0} \frac{ \det(I + tA) - \det(I) }{ t } = \operatorname{tr}(A). $$ The determinant of a matrix may be viewed as the change in area of regions in $\mathbb{R}^n$. A square of area $1$ will be converted to a shape of area $\det(A)$. This change in area is the product of the eigenvalues, as opposed to the sum. What is a good intuitive way of seeing that the derivative would be the trace?
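As a quick sanity check of the limit stated above, here is a minimal Python sketch that evaluates the difference quotient for shrinking $t$. The matrix `A` and the helper names (`det`, `identity_plus`, `difference_quotient`) are made up for illustration; exact rational arithmetic is used so that rounding does not obscure the trend.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def identity_plus(t, a):
    """Return the matrix I + t*a."""
    n = len(a)
    return [[(1 if i == j else 0) + t * a[i][j] for j in range(n)]
            for i in range(n)]

def difference_quotient(a, t):
    """(det(I + t*a) - det(I)) / t, using det(I) = 1."""
    return (det(identity_plus(t, a)) - 1) / t

# Arbitrary example matrix (an assumption, not taken from the question).
A = [[Fraction(2), Fraction(-1), Fraction(3)],
     [Fraction(0), Fraction(5), Fraction(1)],
     [Fraction(4), Fraction(2), Fraction(-3)]]

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        t = Fraction(1, 10 ** k)
        print(float(t), float(difference_quotient(A, t)), "trace:", trace(A))
```

The printed quotients should approach the trace as $t$ shrinks, with an error roughly proportional to $t$.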
Take the $N$ vectors with coordinates $(1, 0, 0, \dots)^T, (0, 1, 0, \dots)^T, \dots, (0, 0, \dots, 1)^T$.
$N$ vectors determine a parallelepiped. To find its volume, we form the matrix whose columns are the components of these vectors and compute its determinant. In our case the matrix is the identity, the parallelepiped is the unit cube, and its volume is $\det(I) = 1$.
We can also regard the columns of $I+tA$ as the coordinates of $N$ vectors; $\det(I+tA)$ is the volume of the parallelepiped they span.
Each of these $N$ vectors is close to the corresponding unit vector, and the whole parallelepiped is just a slightly distorted unit cube.
See what happens. Write $a_{ij}$ for the entries of $A$. The vector $(1, 0, 0, \dots)^T$ becomes the slightly different vector $(1+a_{11}t, a_{21}t, \dots)^T$. Changing its first coordinate increases the volume of the parallelepiped by approximately $1 \cdot a_{11} t$: this is "area of a face of the cube times thickness of the added layer". Changing the other coordinates only affects the regions along the edges of the cube; the resulting change in volume is $O(t^2)$ and can be ignored.
It's easy to visualise this in the 3-D case, and not much changes in higher dimensions.
So the total change of volume is $t(a_{11}+a_{22}+\dots+a_{NN}) + o(t)$.
So: $\frac{d}{dt}\det(I + tA)\big|_{t=0} = \frac{dV}{dt}\big|_{t=0} = \operatorname{tr}(A)$.
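The split between first-order and second-order effects can be checked numerically. The sketch below (an illustration with an arbitrary matrix `A` and hypothetical helper names, not part of the original argument) perturbs the identity by the diagonal entries of `A` only, then by the off-diagonal entries only, and compares how the volume change scales with $t$.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def perturbed_identity(t, a, keep):
    """Return I + t*B, where B keeps only the entries a[i][j] with keep(i, j) true."""
    n = len(a)
    return [[(1 if i == j else 0) + (t * a[i][j] if keep(i, j) else 0)
             for j in range(n)] for i in range(n)]

def volume_change(t, a, keep):
    """Volume of the perturbed cube minus the volume of the unit cube."""
    return det(perturbed_identity(t, a, keep)) - 1

def diagonal_only(i, j):
    return i == j

def off_diagonal_only(i, j):
    return i != j

# Arbitrary example matrix (an assumption chosen for illustration); its trace is 4.
A = [[Fraction(2), Fraction(-1), Fraction(3)],
     [Fraction(0), Fraction(5), Fraction(1)],
     [Fraction(4), Fraction(2), Fraction(-3)]]

if __name__ == "__main__":
    for k in (1, 2, 3):
        t = Fraction(1, 10 ** k)
        stretch = volume_change(t, A, diagonal_only) / t          # tends to tr(A)
        shear = volume_change(t, A, off_diagonal_only) / t ** 2   # stays bounded
        print(float(t), float(stretch), float(shear))
```

Dividing the off-diagonal contribution by $t^2$ rather than $t$ gives a bounded quantity, which is the numerical counterpart of "only the regions along the edges are affected".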
Update: I guess V.I.Arnold (link suggested in comments) explained the same, but better...
The identity $\det\exp X =\exp\text{tr}X$ is obviously valid for diagonal $X$, and this generalises to diagonalisable matrices (since $X\to PXP^{-1}$ with invertible $P$ changes neither determinants nor traces) and from there to all square matrices (because the diagonalisable matrices are dense over $\mathbb{C}$ and both sides are continuous in $X$). The choice $X=\ln (I-tA)$ for small $t$ gives $$\det (I-tA)=\exp\text{tr}\ln (I-tA)\approx\exp(-t\text{tr}A)\approx 1-t\text{tr}A=\det I-t\text{tr}A.$$
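The identity $\det\exp X=\exp\text{tr}X$ can also be spot-checked numerically. The following is a rough sketch, assuming a truncated Taylor series is an adequate stand-in for the matrix exponential when the entries of $X$ are small; the matrix `X` and the helpers `matmul` and `mat_exp` are illustrative choices, not a library API.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(x, terms=30):
    """Truncated Taylor series: sum of x^k / k! for k = 0 .. terms-1."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds x^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Arbitrary non-symmetric example with small entries (an assumption).
X = [[0.3, -0.2, 0.1],
     [0.5, 0.1, 0.4],
     [-0.1, 0.2, -0.6]]

if __name__ == "__main__":
    print("det(exp X)  =", det(mat_exp(X)))
    print("exp(tr X)   =", math.exp(trace(X)))
```

The two printed values should agree up to floating-point error, even though this `X` is not symmetric.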
Off-diagonal elements correspond to transvections, which do not change the volume, while a (positive) diagonal element changes the volume by precisely that factor. Write $E_{ij}$ for the matrix with 1 in the $i,j$-th place, all other entries being zero. By what we have just said, the factor by which the volume changes when applying the matrix $I+a_{ij} E_{ij}$ is $1$ if $i\neq j$ and $1+a_{ii}$ if $i=j$.
To first order in $t$ we may decompose $I+tA$ as: $$ I+ tA = \prod_{i,j} (I+ t \; a_{ij} E_{ij}) + O(t^2) $$
On the other hand, successively applying each of the elementary operations indicated in the product we get the volume transformation: $$ \prod_{i,j} (1 + t \; a_{ij} \delta_{i,j}) = \prod_{i} (1+ t a_{ii}) = 1+ t \sum_i a_{ii} +O(t^2) = 1 + t \; {\rm tr} A + O(t^2) $$ (The above argument is similar to the one used to show geometrically that the determinant indeed gives the volume transformation of a linear map.)
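The decomposition into elementary factors can be tested directly. In the sketch below (an illustration under assumptions: the matrix `A`, the row-major order of the factors, and all helper names are arbitrary choices), the product of the matrices $I + t\,a_{ij}E_{ij}$ is compared entrywise with $I+tA$, and its determinant is compared with $\prod_i (1+ta_{ii})$.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def elementary(n, i, j, c):
    """Return I + c * E_ij, where E_ij has a single 1 at position (i, j)."""
    m = [[Fraction(1 if r == s else 0) for s in range(n)] for r in range(n)]
    m[i][j] += c
    return m

def elementary_product(t, a):
    """Product of I + t*a[i][j]*E_ij over all (i, j), taken in row-major order."""
    n = len(a)
    p = [[Fraction(1 if r == s else 0) for s in range(n)] for r in range(n)]
    for i in range(n):
        for j in range(n):
            p = matmul(p, elementary(n, i, j, t * a[i][j]))
    return p

def max_entry_gap(t, a):
    """Largest entrywise difference between the product and I + t*a."""
    n = len(a)
    p = elementary_product(t, a)
    return max(abs(p[i][j] - ((1 if i == j else 0) + t * a[i][j]))
               for i in range(n) for j in range(n))

def diagonal_volume_factor(t, a):
    """Product of the factors 1 + t*a[i][i] contributed by the diagonal steps."""
    factor = Fraction(1)
    for i in range(len(a)):
        factor *= 1 + t * a[i][i]
    return factor

# Arbitrary example matrix (an assumption chosen for illustration); its trace is 4.
A = [[Fraction(2), Fraction(-1), Fraction(3)],
     [Fraction(0), Fraction(5), Fraction(1)],
     [Fraction(4), Fraction(2), Fraction(-3)]]

if __name__ == "__main__":
    for k in (1, 2, 3):
        t = Fraction(1, 10 ** k)
        gap_over_t2 = max_entry_gap(t, A) / t ** 2   # bounded, so the gap is O(t^2)
        same_det = det(elementary_product(t, A)) == diagonal_volume_factor(t, A)
        print(float(t), float(gap_over_t2), same_det)
```

Because the determinant is multiplicative, the determinant of the product equals $\prod_i(1+ta_{ii})$ exactly, while the product itself matches $I+tA$ only up to $O(t^2)$; a different order of the factors would change the $O(t^2)$ remainder but not this conclusion.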