Matrix chain rule question: what is $\frac{d}{dX} f(S)$ where $S = (A+X)^{-1}$?

We know how to calculate the gradient with respect to $S$:
$$G=\frac{\partial f}{\partial S}$$
Since $A$ is constant, we also know that
$$\eqalign{
X &= S^{-1} - A \cr
dX &= -S^{-1}\,dS\,S^{-1} \quad\implies\quad dS = -S\,dX\,S \cr
}$$
Use this to write the differential of the function, then change variables to obtain a result in terms of $X$:
$$\eqalign{
df &= G:dS \cr
&= -G:S\,dX\,S \cr
&= -S^TGS^T:dX \cr
&= -S^T\,\frac{\partial f}{\partial S}\,S^T:dX \cr
\cr
\frac{\partial f}{\partial X} &= -S^T\,\frac{\partial f}{\partial S}\,S^T \cr
}$$
where the colon denotes the Frobenius inner product, i.e.
$$A:B={\rm tr}(A^TB)$$
The properties of the trace (cyclic invariance and invariance under transposition) give rules for rearranging such products. For arbitrary conformable matrices $A,B,C$ (not the $A$ of the problem):
$$\eqalign{
A:BC &= AC^T:B \cr
A:BC &= B^TA:C \cr
A:BC &= BC:A \cr
}$$
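If you want to convince yourself numerically, here is a minimal NumPy sketch that compares $-S^T G S^T$ against central finite differences. The test function $f(S)=\sum_{ij}\sin S_{ij}$ (so $G=\cos S$ elementwise), the matrix size, the random matrices, and the step `h` are all arbitrary choices made for illustration; none of them come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Hypothetical data: A is shifted by n*I so that A + X stays safely invertible.
A = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
X = 0.1 * rng.standard_normal((n, n))

def f_of_S(S):
    # Illustrative scalar function of S; its gradient is G = cos(S) elementwise.
    return np.sum(np.sin(S))

def f_of_X(X):
    # The same function viewed as a function of X through S = (A + X)^{-1}.
    return f_of_S(np.linalg.inv(A + X))

S = np.linalg.inv(A + X)
G = np.cos(S)                    # df/dS for the illustrative f
grad_formula = -S.T @ G @ S.T    # the result derived above

# Central finite differences, one entry of X at a time.
h = 1e-6
grad_fd = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n))
        E[k, l] = h
        grad_fd[k, l] = (f_of_X(X + E) - f_of_X(X - E)) / (2 * h)

max_err = np.max(np.abs(grad_formula - grad_fd))
print("max abs difference:", max_err)
```

The two gradients should agree up to finite-difference error; swapping in a different smooth $f$ only requires changing `f_of_S` and the line that computes `G`.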

As you've discovered, the chain rule can be difficult to apply to matrix problems when the intermediate quantities, i.e. matrix-by-matrix or vector-by-matrix derivatives, are higher-order tensors.

The virtue of the differential approach is that the differential of a matrix behaves like an ordinary matrix.
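To make the contrast concrete, the sketch below builds the fourth-order tensor $\partial S_{ij}/\partial X_{kl} = -S_{ik}S_{lj}$ (read off componentwise from $dS=-S\,dX\,S$) and contracts it with $G$, which is what a literal chain rule would do. It then checks that the result matches the matrix expression $-S^TGS^T$. The sizes and random matrices are made up for illustration, and `G` is just a random stand-in for $\partial f/\partial S$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# Hypothetical data, chosen only so that A + X is comfortably invertible.
A = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
X = 0.1 * rng.standard_normal((n, n))
S = np.linalg.inv(A + X)
G = rng.standard_normal((n, n))   # stand-in for df/dS

# Fourth-order tensor J[i, j, k, l] = dS_ij / dX_kl = -S_ik * S_lj.
J = -np.einsum('ik,lj->ijkl', S, S)

# Literal chain rule: (df/dX)_kl = sum_ij G_ij * J[i, j, k, l].
grad_chain = np.einsum('ij,ijkl->kl', G, J)

# Differential approach: only ordinary matrix products are needed.
grad_diff = -S.T @ G @ S.T

gap = np.max(np.abs(grad_chain - grad_diff))
print("tensor shape:", J.shape, "max abs difference:", gap)
```

Both routes give the same gradient, but the tensor route stores $n^4$ entries while the differential route never leaves $n\times n$ matrices.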