Derivative of matrix w.r.t. itself
This sounds like a joke, but I am genuinely interested and would like an answer: what is the derivative of a matrix $C$ with respect to itself?
$$ \text{What is: } \frac{\delta C}{\delta C}\text{?} $$
Is it a matrix with shape equal to $C$ and filled with ones?
Solution 1:
A definition of the derivative of a matrix is provided in Kronecker Products & Matrix Calculus with Applications by A. Graham.
Derivative of a matrix $\boldsymbol{Y}$ with respect to a scalar $x_{rs}$:
In order to become familiar with the used notation we start with the derivative of a matrix with respect to a scalar. Let $\boldsymbol{Y}=\left(y_{ij}\right)$ be a matrix of order $(p\times q)$. The derivative of $\boldsymbol{Y}$ with respect to a scalar $x_{rs}$ is defined as the matrix \begin{align*} \frac{\partial\boldsymbol{Y}}{\partial x_{rs}}= \begin{pmatrix} \frac{\partial y_{11}}{\partial x_{rs}}&\frac{\partial y_{12}}{\partial x_{rs}}&\cdots &\frac{\partial y_{1q}}{\partial x_{rs}}\\ \frac{\partial y_{21}}{\partial x_{rs}}&\frac{\partial y_{22}}{\partial x_{rs}}&\cdots &\frac{\partial y_{2q}}{\partial x_{rs}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial y_{p1}}{\partial x_{rs}}&\frac{\partial y_{p2}}{\partial x_{rs}}&\cdots &\frac{\partial y_{pq}}{\partial x_{rs}}\\ \end{pmatrix} =\sum_{i,j}E_{i,j}\frac{\partial y_{ij}}{\partial x_{rs}} \end{align*} with $E_{i,j}=\left(\delta_{k,i}\delta_{l,j}\right)_{{1\leq k\leq p}\atop{1\leq l\leq q}}$ the elementary matrix of order $(p\times q)$ which has a $1$ in the $(i,j)$-th position and all other elements are zero.
Derivative of a matrix $\boldsymbol{Y}$ with respect to a matrix $\boldsymbol{X}$:
We generalise the previous section in order to obtain the derivative of a matrix $\boldsymbol{Y}$ with respect to a matrix $\boldsymbol{X}$. Let $\boldsymbol{X}=\left(x_{rs}\right)$ be a matrix of order $m\times n$. The derivative of $\boldsymbol{Y}$ with respect to $\boldsymbol{X}$ is defined as the partitioned matrix \begin{align*} \frac{\partial \boldsymbol{Y}}{\partial \boldsymbol{X}}= \begin{pmatrix} \frac{\partial \boldsymbol{Y}}{\partial x_{11}}&\frac{\partial \boldsymbol{Y}}{\partial x_{12}}&\cdots &\frac{\partial \boldsymbol{Y}}{\partial x_{1n}}\\ \frac{\partial \boldsymbol{Y}}{\partial x_{21}}&\frac{\partial \boldsymbol{Y}}{\partial x_{22}}&\cdots &\frac{\partial \boldsymbol{Y}}{\partial x_{2n}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial \boldsymbol{Y}}{\partial x_{m1}}&\frac{\partial \boldsymbol{Y}}{\partial x_{m2}}&\cdots &\frac{\partial \boldsymbol{Y}}{\partial x_{mn}}\\ \end{pmatrix} =\sum_{r,s}E_{r,s}\otimes\frac{\partial \boldsymbol{Y}}{\partial x_{rs}}\tag{*} \end{align*} of order $(mp\times nq)$. Here we use the Kronecker product $\otimes$. The definition (*) can be found in section 6.2.
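As a sanity check, the partitioned definition (*) can be approximated numerically. The sketch below (the helper name `matrix_derivative` is my own, not from Graham's book) builds $\frac{\partial\boldsymbol{Y}}{\partial\boldsymbol{X}}$ block by block with finite differences:

```python
import numpy as np

def matrix_derivative(f, X, h=1e-6):
    """Partitioned derivative per (*): block (r, s) holds dY/dx_rs.
    f maps an (m, n) matrix to a (p, q) matrix; finite-difference sketch."""
    m, n = X.shape
    p, q = f(X).shape
    D = np.zeros((m * p, n * q))
    for r in range(m):
        for s in range(n):
            Xp = X.copy()
            Xp[r, s] += h
            # block (r, s) of the (mp x nq) result is dY/dx_rs
            D[r*p:(r+1)*p, s*q:(s+1)*q] = (f(Xp) - f(X)) / h
    return D

X = np.array([[1.0, 2.0], [3.0, 4.0]])
D = matrix_derivative(lambda A: A, X)   # dX/dX, of order (4 x 4)
```

For the identity map the finite differences are exact, so `D` reproduces the matrix computed in the $(2\times 2)$ example below.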
It follows from (*) \begin{align*} \color{blue}{\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}} =\sum_{r,s}E_{r,s}\otimes\frac{\partial \boldsymbol{X}}{\partial x_{rs}} =\sum_{r,s}E_{r,s}\otimes E_{r,s}}\tag{**} \end{align*}
We note that according to this definition $\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}$ is not equal to the identity matrix $\boldsymbol{I}$.
Example $\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}$ with $\boldsymbol{X}$ of order $(2\times 2)$: We take a small matrix $\boldsymbol{X}$ of order $(2\times 2)$ and calculate $\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}$ to better see what's going on.
We obtain according to (*) \begin{align*} \frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}} &=\begin{pmatrix} \color{blue}{\frac{\partial \boldsymbol{X}}{\partial x_{11}}}&\frac{\partial \boldsymbol{X}}{\partial x_{12}}\\ \frac{\partial \boldsymbol{X}}{\partial x_{21}}&\frac{\partial \boldsymbol{X}}{\partial x_{22}} \end{pmatrix}\\ &=\begin{pmatrix} \color{blue}{\frac{\partial x_{11}}{\partial x_{11}}}&\color{blue}{\frac{\partial x_{12}}{\partial x_{11}}} &\frac{\partial x_{11}}{\partial x_{12}}&\frac{\partial x_{12}}{\partial x_{12}}\\ \color{blue}{\frac{\partial x_{21}}{\partial x_{11}}}&\color{blue}{\frac{\partial x_{22}}{\partial x_{11}}} &\frac{\partial x_{21}}{\partial x_{12}}&\frac{\partial x_{22}}{\partial x_{12}}\\ \frac{\partial x_{11}}{\partial x_{21}}&\frac{\partial x_{12}}{\partial x_{21}} &\frac{\partial x_{11}}{\partial x_{22}}&\frac{\partial x_{12}}{\partial x_{22}}\\ \frac{\partial x_{21}}{\partial x_{21}}&\frac{\partial x_{22}}{\partial x_{21}} &\frac{\partial x_{21}}{\partial x_{22}}&\frac{\partial x_{22}}{\partial x_{22}}\\ \end{pmatrix}\\ &=\begin{pmatrix} \color{blue}{1}&\color{blue}{0}&0&1\\ \color{blue}{0}&\color{blue}{0}&0&0\\ 0&0&0&0\\ 1&0&0&1 \end{pmatrix}\\ &=\begin{pmatrix} \color{blue}{E_{11}}&E_{12}\\ E_{21}&E_{22} \end{pmatrix}\\ &=\begin{pmatrix} \color{blue}{E_{11}}&0\\ 0&0 \end{pmatrix} +\begin{pmatrix} 0&E_{12}\\ 0&0 \end{pmatrix} +\begin{pmatrix} 0&0\\ E_{21}&0 \end{pmatrix} +\begin{pmatrix} 0&0\\ 0&E_{22} \end{pmatrix}\\ &=\color{blue}{E_{11}}\otimes E_{11}+E_{12}\otimes E_{12}+E_{21}\otimes E_{21}+E_{22}\otimes E_{22} \end{align*} in accordance with (**).
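The same $(2\times 2)$ result follows directly from (**) in a few lines of NumPy; the helper `E` is just the elementary matrix defined earlier:

```python
import numpy as np

def E(i, j, p=2, q=2):
    """Elementary matrix of order (p, q): a 1 at position (i, j), zeros elsewhere."""
    M = np.zeros((p, q))
    M[i, j] = 1.0
    return M

# dX/dX per (**): sum over r, s of the Kronecker products E_rs (x) E_rs
dXdX = sum(np.kron(E(r, s), E(r, s)) for r in range(2) for s in range(2))
```

The result matches the $(4\times 4)$ matrix computed by hand above, and is clearly not the identity.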
Note: Although we have $\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}\ne\boldsymbol{I}$, when taking the transpose $\boldsymbol{X^T}$ we get \begin{align*} \frac{\partial \boldsymbol{X^T}}{\partial \boldsymbol{X}} =\begin{pmatrix} \color{blue}{\frac{\partial \boldsymbol{X^T}}{\partial x_{11}}}&\frac{\partial \boldsymbol{X^T}}{\partial x_{12}}\\ \frac{\partial \boldsymbol{X^T}}{\partial x_{21}}&\frac{\partial \boldsymbol{X^T}}{\partial x_{22}} \end{pmatrix} =\begin{pmatrix} \color{blue}{1}&\color{blue}{0}&0&0\\ \color{blue}{0}&\color{blue}{0}&1&0\\ 0&1&0&0\\ 0&0&0&1 \end{pmatrix}\\ \end{align*} which is a permutation matrix, and so rather close to the identity matrix.
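Since $\frac{\partial \boldsymbol{X^T}}{\partial x_{rs}}=E_{s,r}$, the transpose case reads $\frac{\partial \boldsymbol{X^T}}{\partial \boldsymbol{X}}=\sum_{r,s}E_{r,s}\otimes E_{s,r}$ (my restatement of the block pattern above), which can be checked the same way:

```python
import numpy as np

def E(i, j, p=2, q=2):
    """Elementary matrix of order (p, q): a 1 at position (i, j), zeros elsewhere."""
    M = np.zeros((p, q))
    M[i, j] = 1.0
    return M

# dX^T/dX: block (r, s) is dX^T/dx_rs = E_sr
dXTdX = sum(np.kron(E(r, s), E(s, r)) for r in range(2) for s in range(2))

# a permutation matrix has exactly one 1 in every row and every column
assert (dXTdX.sum(axis=0) == 1).all() and (dXTdX.sum(axis=1) == 1).all()
```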
Solution 2:
When differentiating an $n$-dimensional object by an $m$-dimensional object, the result has $mn$ components, recording how each of the $n$ components varies with respect to each of the $m$ components.
If we assume that the components $(C)_{ij}$ are all independent, then when we differentiate, we are computing: $$\frac{d(C)_{ij}}{d(C)_{kl}}$$
which has four indices $i, j, k, l$. Since we assume that the $(C)_{ij}$ are all independent, we have:
$$ \frac{d(C)_{ij}}{d(C)_{kl}} \equiv \begin{cases} 1 & i = k \land j = l \\ 0 & \text{otherwise} \end{cases} $$
If $i = k$ and $j = l$, then the expression becomes $d(C)_{ij}/d(C)_{ij}$, which is $1$ since the derivative of any variable with respect to itself is $1$. If $i \neq k$ or $j \neq l$, then we get the derivative of some variable $(C)_{ij}$ by another independent variable $(C)_{kl}$, which is zero.
The above expression is sometimes compactly denoted as:
$$ \frac{d(C)_{ij}}{d(C)_{kl}} \equiv \delta_{ik}\delta_{jl} $$
where $\delta_{ik}$ is the Kronecker delta.
To wrap up, the derivative $d(C)_{ij}/d(C)_{kl}$ is a four-dimensional object, indexed by $i, j, k, l$. It has an entry $1$ if $i = k$ and $j = l$, and $0$ otherwise.
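In NumPy terms this four-dimensional object is just the outer product of two identity matrices; a small sketch:

```python
import numpy as np

n = 3
I = np.eye(n)
# T[i, j, k, l] = delta_ik * delta_jl: the 4-D derivative d(C)_ij / d(C)_kl
T = np.einsum('ik,jl->ijkl', I, I)

# Contracting T with any matrix C over the indices (k, l) returns C itself,
# so T acts as the identity on matrices even though it is not a 2-D identity.
C = np.arange(9.0).reshape(n, n)
```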