A matrix calculus problem in backpropagation encountered when studying Deep Learning

I'm going to use subscripts because they're easier to type; this reserves superscripts for things like transposes and conjugates.

Algorithm 6.4 tells you how to calculate the vector $g$. It's a chain of derivatives extending from the output layer back to the $k^{th}$ layer $$\eqalign{ g &= \frac{\partial L}{\partial {\hat y}} \frac{\partial {\hat y}}{\partial h_l} \frac{\partial h_l}{\partial a_l} \frac{\partial a_l}{\partial h_{l-1}} \frac{\partial h_{l-1}}{\partial a_{l-1}} \frac{\partial a_{l-1}}{\partial h_{l-2}} \cdots \frac{\partial h_{k+1}}{\partial a_{k+1}} \frac{\partial a_{k+1}}{\partial h_{k}} \frac{\partial h_{k}}{\partial a_{k}} \cr &= \frac{\partial L}{\partial a_{k}} \cr }$$ Even though $\frac{\partial {\hat y}}{\partial h_l}=1$, I included that factor in the chain for clarity.
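For concreteness, here is a minimal NumPy sketch of that backward recursion, assuming a fully-connected net $a_k = W_k h_{k-1} + b_k$, $h_k = f(a_k)$ with an elementwise activation and a squared-error loss on $\hat y = h_l$; the function and variable names are mine, not from the book, and it also uses the $gh_{k-1}^T$ formula derived below.

```python
import numpy as np

def f(a):            # elementwise activation (tanh chosen arbitrarily)
    return np.tanh(a)

def fprime(a):       # its elementwise derivative
    return 1.0 - np.tanh(a) ** 2

def forward(Ws, bs, x):
    """Return pre-activations a_k and activations h_k, with h_0 = x."""
    hs, pre = [x], []
    for W, b in zip(Ws, bs):
        a = W @ hs[-1] + b
        pre.append(a)
        hs.append(f(a))
    return pre, hs

def backward(Ws, pre, hs, y):
    """Propagate g = dL/da_k from the output layer back to layer 1."""
    g = hs[-1] - y                        # dL/d(y_hat) for L = 1/2 ||y_hat - y||^2
    grads_W, grads_b = [], []
    for k in reversed(range(len(Ws))):
        g = g * fprime(pre[k])            # g <- dL/da_k  (elementwise dh_k/da_k)
        grads_W.insert(0, np.outer(g, hs[k]))   # dL/dW_k = g h_{k-1}^T
        grads_b.insert(0, g.copy())             # dL/db_k = g
        g = Ws[k].T @ g                   # g <- dL/dh_{k-1} for the next layer down
    return grads_W, grads_b

# Example: a 2-layer net with sizes 4 -> 3 -> 2
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
bs = [rng.standard_normal(3), rng.standard_normal(2)]
x, y = rng.standard_normal(4), rng.standard_normal(2)
pre, hs = forward(Ws, bs, x)
gW, gb = backward(Ws, pre, hs, y)
```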

Use $g$ to write the differential of $L$, then change variables from $a_k$ to $W_k$ (since $a_k = W_k\,h_{k-1} + b_k$, we have $da_k = dW_k\,h_{k-1}$ when $h_{k-1}$ and $b_k$ are held constant) $$\eqalign{ dL&= g:da_k\cr &= g:dW_k\,h_{k-1}\cr &= gh_{k-1}^T:dW_k\cr \frac{\partial L}{\partial W_k} &= gh_{k-1}^T \cr }$$ where the colon denotes the trace/Frobenius product, i.e. $$A:B = {\rm tr}(A^TB)$$
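That formula is easy to sanity-check numerically. Here is a small finite-difference check of $\frac{\partial L}{\partial W} = gh^T$ for a single layer $a = Wh$ with an arbitrary smooth scalar loss $L(a)$; the particular loss and all names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
h = rng.standard_normal(4)
c = rng.standard_normal(3)             # fixed vector defining a toy loss

def L(W):
    a = W @ h
    return np.sum(np.sin(a) * c)       # any smooth scalar function of a

g = np.cos(W @ h) * c                  # g = dL/da for this toy loss
analytic = np.outer(g, h)              # claimed gradient  g h^T

# central finite differences, one weight at a time
numeric = np.zeros_like(W)
eps = 1e-6
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        E = np.zeros_like(W); E[i, j] = eps
        numeric[i, j] = (L(W + E) - L(W - E)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # should be tiny (~1e-9)
```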

The properties of the trace allow one to write things like $$\eqalign{ &{\rm tr}(ABC) = {\rm tr}(CAB) = {\rm tr}(BCA) \cr &{\rm tr}(AB) = {\rm tr}(BA) = {\rm tr}(B^TA^T) \cr }$$ which correspond to rules for rearranging the terms in a Frobenius product $$\eqalign{ &A:BC = B^TA:C = AC^T:B \cr &A:B = B:A = B^T:A^T \cr }$$ Note that the objects on each side of the colon must have the same shape, i.e. the same number of rows and the same number of columns. In that sense, it's similar to a Hadamard product.
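These rearrangement rules are also easy to verify numerically. Below is a quick NumPy check, where `frob` is just a helper name I've introduced for the product $A:B = {\rm tr}(A^TB)$.

```python
import numpy as np

def frob(X, Y):
    return np.trace(X.T @ Y)           # A:B = tr(A^T B) = sum of elementwise products

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))        # shapes chosen so BC has the same shape as A

print(np.isclose(frob(A, B @ C), frob(B.T @ A, C)))       # A:BC = B^T A : C
print(np.isclose(frob(A, B @ C), frob(A @ C.T, B)))       # A:BC = A C^T : B
print(np.isclose(frob(A, B @ C), frob(B @ C, A)))         # A:B  = B : A
print(np.isclose(frob(A, B @ C), frob((B @ C).T, A.T)))   # A:B  = B^T : A^T
```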