Consider, for example, the map $f: \mathbb{R}^{n \times n} \rightarrow \mathbb{R}^{n \times n}, f(A) = A^2.$ Then its differential is $df(A)(T) = AT+TA$. I would like a reference that explains what this differential means and how to obtain such results, not necessarily in a completely rigorous way. I also understand that differentials can be defined and manipulated in the usual way for functionals (e.g. for the Lagrangian, leading to the Euler-Lagrange equations), and I'd like to see this done without developing the whole machinery of variational calculus.

In short, I'm looking for a clear treatment of differentials of operator-valued functions. I've tried looking up books on matrix calculus, calculus on normed vector spaces and variational calculus but haven't found anything suitable (the closest option was Cartan's Differential Calculus, but I'd like something more concrete). Where do people learn this sort of thing?


Just compute the directional derivative, as you would in ordinary calculus: $df(A)(T) = \lim\limits_{h\to 0} \dfrac{f(A+hT)-f(A)}h$. Now do the matrix computation: \begin{align*} \frac{f(A+hT)-f(A)}h &= \frac{(A+hT)^2-A^2}h = \frac{h(AT+TA) + h^2T^2}h \\ &= (AT+TA) + hT^2 \to AT+TA \quad\text{as}\quad h\to 0. \end{align*} The point is that it's nothing different from calculus in Euclidean space, since the space of matrices is naturally a finite-dimensional Euclidean space.
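If it helps to see the limit concretely, here is a quick numerical sanity check in Python/NumPy (an illustration only; the particular matrices $A$, $T$ and the step sizes $h$ below are arbitrary choices):

```python
import numpy as np

# Check that (f(A+hT) - f(A)) / h approaches AT + TA as h -> 0.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
T = rng.standard_normal((n, n))

exact = A @ T + T @ A  # the claimed differential df(A)(T)

for h in (1e-1, 1e-3, 1e-5):
    finite_diff = ((A + h * T) @ (A + h * T) - A @ A) / h
    err = np.linalg.norm(finite_diff - exact)
    print(f"h = {h:.0e}, error = {err:.2e}")  # error shrinks like h (it equals h * ||T^2||)
```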

Aside from other texts mentioned, Dieudonné's Treatise on Analysis is a standard reference. Differential Calculus in normed spaces appears in Volume 1.


The total derivative of a differentiable map $f\colon \Omega \subseteq \Bbb R^n \to \Bbb R^k$ at a point $x \in \Omega$, where $\Omega$ is open, is the unique linear map $Df(x)$ such that $$\lim_{h \to 0} \frac{f(x+h)-f(x)- Df(x)(h)}{\|h\|} = 0.$$ Since matrix spaces are identified with Euclidean spaces themselves, it makes sense to compute derivatives of maps between matrix spaces. For instance, we have the chain rule $D(g\circ f)(x) = Dg(f(x))\circ Df(x)$, the total derivative of a linear map is the map itself, and if $B\colon \Bbb R^n \times \Bbb R^m \to \Bbb R^p$ is bilinear, its derivative is given by $$DB(x,y)(h,k) = B(x,k) + B(h,y).$$ In your case, we can write $f(A) = A^2$ as $f(A) = g(\Delta(A))$, where $\Delta(A)= (A,A)$ is the (linear) diagonal map and $g(A,B) = AB$ is bilinear. So $$\begin{align} Df(A)(T) &= D(g\circ \Delta)(A)(T) = \big(Dg(A,A) \circ D\Delta(A)\big)(T) \\ &= Dg(A,A)(T,T) = g(A,T)+g(T,A) \\ &= AT+TA, \end{align}$$ as wanted.
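To see the bilinear rule at work numerically, here is a small Python/NumPy sketch (just an illustration; the matrices and the scaling factors are arbitrary choices, and $B(X,Y) = XY$ is the matrix product) checking that the remainder in the definition of the total derivative is small compared to $\|(h,k)\|$:

```python
import numpy as np

# Check DB(x,y)(h,k) = B(x,k) + B(h,y) for the bilinear map B(X, Y) = X @ Y.
rng = np.random.default_rng(1)
n = 4
X, Y = rng.standard_normal((2, n, n))
H, K = rng.standard_normal((2, n, n))

for t in (1e-1, 1e-3, 1e-5):
    Ht, Kt = t * H, t * K
    increment = (X + Ht) @ (Y + Kt) - X @ Y          # B(x+h, y+k) - B(x, y)
    linear_part = X @ Kt + Ht @ Y                    # DB(x, y)(h, k)
    remainder = np.linalg.norm(increment - linear_part)
    size = np.linalg.norm(np.concatenate([Ht, Kt]))  # ||(h, k)||
    print(f"t = {t:.0e}, remainder / ||(h,k)|| = {remainder / size:.2e}")  # -> 0
```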


The right setting in which to talk about differentiability is a normed vector space. For example, the real $n\times n$ matrices obviously form a vector space, and you can introduce a norm on it. Functionals in the calculus of variations can also often be viewed as maps between two normed vector spaces (the source being some vector space of functions, the target being the real numbers).
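To make that last remark concrete, here is a small Python/NumPy sketch (my own toy example, not part of the answer) that treats a discretized functional $J[u] = \int_0^1 u(x)^2\,dx$ as a map from grid functions to the reals and checks its directional derivative $dJ[u](\varphi) = \int_0^1 2u\varphi\,dx$:

```python
import numpy as np

# Discretize [0, 1] and compare the difference quotient of J with the claimed derivative.
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
u = np.sin(np.pi * x)        # a sample "point" in the function space
phi = x * (1.0 - x)          # a sample direction (variation)

def J(v):
    return np.sum(v**2) * dx  # crude quadrature of the integral of v(x)^2

exact = np.sum(2.0 * u * phi) * dx
for h in (1e-2, 1e-4, 1e-6):
    quotient = (J(u + h * phi) - J(u)) / h
    print(f"h = {h:.0e}: difference quotient = {quotient:.8f}, exact = {exact:.8f}")
```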

However, I'd recommend starting with something a bit simpler – learning how this formalism works in Euclidean spaces – and then moving on to the more specialized settings.

I'd recommend any of the following books:

  • W. Rudin's Principles of mathematical analysis,
  • T. Shifrin's Multivariable mathematics,
  • M. Spivak's Calculus on manifolds.

Edit: I'd also recommend these online materials:

  • Introduction to Manifolds from Oxford,
  • Multivariable calculus from Bristol,
  • Ted Shifrin's lectures on YouTube. In the context of the posed problem, Lectures 21 and 22 are especially relevant.