How to choose the initial value of backpropagating Seed in neural network when using automatic differentiation?

I think @Greg's answer is a bit complicated for your question. This may be because of your nomenclature: "seed" is not properly defined and does not appear in the given material.

The rules for matrix multiplication $\mathbf{C=AB}$ are quite simple. From $$ d\mathbf{C}=(d\mathbf{A})\mathbf{B}+\mathbf{A}(d\mathbf{B}) $$ you easily deduce

$$ \frac{\partial \phi}{\partial \mathbf{A}}= \frac{\partial \phi}{\partial \mathbf{C}}\mathbf{B}^T $$

This relation indicates how to obtain the sensitivity of the objective function $\phi$ with respect to a perturbation in $\mathbf{A}$ when the sensitivity with respect to $\mathbf{C}$ is known.
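This rule is easy to sanity-check numerically. Below is a sketch (not from the original post) using an assumed concrete objective $\phi(\mathbf{C})=\tfrac12\|\mathbf{C}\|_F^2$, for which $\partial\phi/\partial\mathbf{C}=\mathbf{C}$, verified against a finite difference:

```python
import numpy as np

# Sanity check of the rule above (a sketch with an assumed objective):
# phi(C) = ||C||_F^2 / 2, so dphi/dC = C, and the rule gives
# dphi/dA = (dphi/dC) B^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
C = A @ B

dphi_dC = C                # sensitivity w.r.t. C (assumed known)
dphi_dA = dphi_dC @ B.T    # the rule: dphi/dA = (dphi/dC) B^T

# Central finite difference for one entry of dphi/dA.
phi = lambda M: 0.5 * np.sum((M @ B) ** 2)
eps = 1e-6
E = np.zeros_like(A)
E[1, 2] = eps
fd = (phi(A + E) - phi(A - E)) / (2 * eps)
assert np.isclose(fd, dphi_dA[1, 2], atol=1e-5)
```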

The initialization you indicate is the sensitivity with respect to the last layer/variable, and it depends on the data at hand.
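To make that initialization concrete, here is a minimal reverse-mode sketch in NumPy with an assumed example objective (not the OP's setup): for a scalar $\phi$ the backward pass is seeded with $\partial\phi/\partial\phi=1$, and what reaches the last matrix variable $\mathbf{C}$ is $\partial\phi/\partial\mathbf{C}$ evaluated at the forward values.

```python
import numpy as np

# Minimal reverse-mode sketch (example objective assumed, not from the
# question): seed the backward pass with dphi/dphi = 1; the array that
# reaches the last matrix variable C is dphi/dC, which depends on the
# forward data.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
C = A @ B
phi = np.sum(C)                    # example scalar objective

seed = 1.0                         # dphi/dphi
dphi_dC = seed * np.ones_like(C)   # for phi = sum(C), dphi/dC is all ones
dphi_dA = dphi_dC @ B.T            # rule from above
dphi_dB = A.T @ dphi_dC
```

For $\phi=\sum_{i\ell} C_{i\ell}$ the entry $(\partial\phi/\partial\mathbf{A})_{00}$ equals $\sum_j B_{0j}$, which the sketch reproduces.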


$ \def\c#1{\color{red}{#1}} \def\a{\c{{\cal E}}}\def\aa#1{\a_{#1}} \def\d{\delta}\def\dd#1{\d_{#1}} \def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} $Use a colon to denote the Frobenius product, which is a convenient notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ This is also known as the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the ordinary dot product.
The properties of the underlying trace function allow the terms in such a product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$ The fourth-order isotropic tensor can be defined in terms of Kronecker deltas as $$\aa{j\ell km} = \dd{jk}\dd{\ell m}$$ and it can be used to rearrange matrix products
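These rearrangement identities are easy to verify numerically; the following quick check (a sketch with arbitrary random shapes) uses `np.sum(X * Y)` for the Frobenius product:

```python
import numpy as np

# Numerical check of the Frobenius-product identities above.
frob = lambda X, Y: np.sum(X * Y)   # X:Y = Tr(X^T Y)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 4))
C = rng.standard_normal((3, 4))
D = rng.standard_normal((3, 4))

assert np.isclose(frob(C, D), frob(D, C))            # A:B = B:A
assert np.isclose(frob(C, D), frob(C.T, D.T))        # A:B = A^T:B^T
assert np.isclose(frob(C, D), np.trace(C.T @ D))     # A:B = Tr(A^T B)
assert np.isclose(frob(C, A @ B), frob(C @ B.T, A))  # C:AB = CB^T:A
assert np.isclose(frob(C, A @ B), frob(A.T @ C, B))  # C:AB = A^T C:B
```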
$$AXB = A\a B^T:X$$
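This identity can also be checked with `np.einsum`, building $\a$ from identity matrices (shapes are arbitrary; the double contraction pairs the last two indices of $A\a B^T$ with the two indices of $X$):

```python
import numpy as np

# Check AXB = (A . E . B^T) : X, where E[j,l,k,m] = delta_jk delta_lm
# and ":" double-contracts the last two indices against X.
rng = np.random.default_rng(3)
m, k, n, p = 2, 3, 4, 5
A = rng.standard_normal((m, k))
X = rng.standard_normal((k, n))
B = rng.standard_normal((n, p))

E = np.einsum('jk,lm->jlkm', np.eye(k), np.eye(p))   # isotropic tensor
AE = np.einsum('ij,jlkm->ilkm', A, E)                # single-dot: A . E
T = np.einsum('ilkm,ms->ilks', AE, B.T)              # ... . B^T
lhs = A @ X @ B
rhs = np.einsum('ilks,ks->il', T, X)                 # double contraction with X
assert np.allclose(lhs, rhs)
```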


Using the above tools we can address your question.

Start with the matrix product and calculate its differential and gradients $$\eqalign{ C &= AB \\ dC &= A\,dB + dA\,B \\ &= A\a:dB \;+\; \a B^T:dA \\ \grad{C}{B} &= A\a, \quad\quad\quad \a B^T = \grad{C}{A} \\ }$$ Comparing this to your "rules", we see that the Seed is actually a 4th order tensor $(\a)\,$ and the second gradient formula contains a spurious transpose.
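A brute-force check of these fourth-order gradients (a sketch): since $C=AB$ is linear in each factor, each slice of $\partial C/\partial A$ is exactly the response to a unit perturbation of one entry of $A$, and likewise for $B$.

```python
import numpy as np

# Brute-force check of the fourth-order gradients above: perturb one
# entry at a time and compare with the closed forms E B^T and A E.
rng = np.random.default_rng(4)
m, k, n = 2, 3, 4
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))

dC_dA = np.zeros((m, n, m, k))
dC_dB = np.zeros((m, n, k, n))
for p in range(m):
    for q in range(k):
        Ep = np.zeros_like(A); Ep[p, q] = 1.0
        dC_dA[:, :, p, q] = Ep @ B        # C is linear in A
for p in range(k):
    for q in range(n):
        Ep = np.zeros_like(B); Ep[p, q] = 1.0
        dC_dB[:, :, p, q] = A @ Ep        # C is linear in B

# Closed forms from the derivation:
# (E B^T)[i,l,p,q] = delta_ip * B[q,l]   and   (A E)[i,l,p,q] = A[i,p] * delta_lq
EB = np.einsum('ip,ql->ilpq', np.eye(m), B)
AE = np.einsum('ip,lq->ilpq', A, np.eye(n))
assert np.allclose(dC_dA, EB)
assert np.allclose(dC_dB, AE)
```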

NB: In matrix notation, juxtaposition implies a single-dot (aka single contraction) product, i.e. $$\eqalign{ ABC &= A\cdot B\cdot C \\ A\a C &= A\cdot \a\cdot C \\ }$$