Differentiation definition for spaces other than $\mathbb{R}^n$

Solution 1:

A lot of standard differential calculus can be generalized to the setting of Banach spaces (finite-dimensional or infinite-dimensional), and in fact I think it is conceptually much clearer there. All the standard results, like the chain rule, product rule, inverse function theorem, implicit function theorem, and even the theory of ODEs, carry over without too much effort to the Banach space setting.

Here are the relevant definitions.

  • Let $\Bbb{F} \in \{\Bbb{R}, \Bbb{C}\}$ be either the real or complex field. A Banach space over $\Bbb{F}$ is a normed vector space $(E, \lVert \cdot\rVert)$ which is complete with respect to the norm (i.e. every Cauchy sequence converges to some point of $E$ with respect to the given norm).

  • Let $(E_1, \lVert \cdot\rVert_1), (E_2, \lVert\cdot\rVert_2)$ be two Banach spaces over $\Bbb{F}$ (either the real or complex field). Let $U \subset E_1$ be open, and let $f:U \to E_2$ be a given map. We say that $f$ is $\Bbb{F}$-differentiable at a point $a \in U$ if there is a continuous linear transformation $B: E_1 \to E_2$ such that \begin{align} \lim_{h \to 0} \dfrac{\lVert f(a+h) - f(a) - B(h) \rVert_2}{\lVert h\rVert_1} = 0. \end{align} In other words, we require that for every $\epsilon > 0$, there exists a $\delta > 0$ such that for all $h \in E_1$, if $0 < \lVert h \rVert_1 < \delta$, then $a+h \in U$ (such a $\delta$ can be found because $U$ is open) and \begin{align} \dfrac{\lVert f(a+h) - f(a) - B(h) \rVert_2}{\lVert h\rVert_1} < \epsilon. \end{align}

Of course, if such a $B$ exists, one can prove it is unique; we can denote it by $Df_a, Df(a), df_a, df(a), f'(a)$, or anything else you like. The key point is that the derivative at a point is a continuous (equivalently, bounded) linear transformation from $E_1$ into $E_2$.
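As a quick illustration of the definition (my own example, not part of the original question): take $E_1 = E_2 = M_n(\Bbb{R})$ with a submultiplicative norm (e.g. the operator norm), and let $s(X) = X^2$. For a fixed $A$, \begin{align} s(A+h) - s(A) = Ah + hA + h^2, \end{align} and the candidate $B(h) = Ah + hA$ is a continuous linear map of $h$, while the remainder satisfies $\lVert h^2 \rVert \le \lVert h \rVert^2$. So the quotient in the definition is at most $\lVert h \rVert \to 0$, and hence $Ds_A(h) = Ah + hA$.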

Note that if the vector space is finite-dimensional, then we have the following facts:

  • We can always equip it with a norm.
  • It is a standard theorem that all norms on a finite-dimensional vector space are equivalent (i.e. they give rise to the same topology).
  • It is easily checked that if we replace the norms on the Banach spaces $E_1, E_2$ with equivalent norms, then the notion of continuity is unchanged (clear, because the topologies are unchanged and continuity is a purely topological property), and the notion of differentiability is unchanged as well. So, in the finite-dimensional case, one does not have to be too explicit about which norm is being used on the vector spaces in the definition of differentiability.
  • Every linear transformation $B: E_1 \to E_2$ between finite-dimensional Banach spaces is continuous (so, in the definition of differentiability, one doesn't have to explicitly verify this).
  • Every continuous linear transformation $B: E_1 \to E_2$ (with $E_1, E_2$ not necessarily finite-dimensional) is differentiable everywhere, and for every $a \in E_1$, we have $DB_a(\cdot) = B(\cdot)$; a quick check is spelled out just below this list.
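Quick check of the last point: if $B$ is linear, then for every $a$ and every $h \neq 0$, \begin{align} \dfrac{\lVert B(a+h) - B(a) - B(h)\rVert_2}{\lVert h \rVert_1} = \dfrac{\lVert 0 \rVert_2}{\lVert h \rVert_1} = 0, \end{align} so the limit in the definition is trivially $0$ with $B$ itself as the candidate derivative; the continuity of $B$ is exactly what the definition requires of that candidate.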

Your question seemed more focused on the general theory, which is why I addressed that first. As for your actual question: the inversion map $i : GL_n(\Bbb{R}) \to M_n(\Bbb{R})$ is indeed defined on an open subset of a normed vector space (again, the space is finite-dimensional, so it does not matter which norm we use). If $A \in GL_n(\Bbb{R})$, then the derivative $Di_A$ will be a linear transformation $M_n(\Bbb{R}) \to M_n(\Bbb{R})$. If you really want to think in terms of matrices, then sure, you can introduce a basis $\beta$ for $M_n(\Bbb{R})$, and since $Di_A$ is a linear transformation, you can consider its matrix representation $[Di_A]_{\beta}$. Note that this will be an $n^2 \times n^2$ matrix with real entries.
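For concreteness (my own addition, using a standard identity, and anticipating the formula derived below): one convenient basis is the one coming from column-stacking. Writing $\operatorname{vec}$ for the isomorphism $M_n(\Bbb{R}) \cong \Bbb{R}^{n^2}$ that stacks the columns of a matrix, any linear map of the form $X \mapsto BXC$ has matrix $C^T \otimes B$ in this basis, so $Di_A(h) = -A^{-1}hA^{-1}$ corresponds to the $n^2 \times n^2$ matrix \begin{align} [Di_A]_{\operatorname{vec}} = -(A^{-1})^T \otimes A^{-1} \end{align} acting on $\operatorname{vec}(h)$.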

However, I think introducing a basis is completely unnecessary, and in fact confusing. Here's an outline of the derivative calculation. I leave it to you to fill in the assumptions needed to make the reasoning work, and to carefully justify each equality which follows:

Fix $A \in GL_n(\Bbb{R})$, and take $h \in M_n(\Bbb{R})$ sufficiently small in norm so that $A+h \in GL_n(\Bbb{R})$, $I+ A^{-1}h \in GL_n(\Bbb{R})$, and $\lVert A^{-1}h\rVert < 1$ (why is it possible to choose such a small $h$?). Then, \begin{align} i(A+h) &= (A+h)^{-1} \\ &= \left[ A(I + A^{-1}h)\right]^{-1} \\ &= (I+A^{-1}h)^{-1} \cdot A^{-1} \\ &= \left( \sum_{k=0}^{\infty} (-A^{-1}h)^k \right) \cdot A^{-1} \\ &= \left( I - A^{-1}h + \mathcal{O}(\lVert h\rVert^2)\right) \cdot A^{-1} \\ &= A^{-1} - A^{-1}hA^{-1} + \mathcal{O}(\lVert h \rVert^2) \\ &= i(A) - A^{-1}hA^{-1} + \mathcal{O}(\lVert h \rVert^2). \end{align} I claim that from this it follows that $Di_A(h) = -A^{-1}hA^{-1}$ (a product of three $n \times n$ matrices). Why is this true? What are you supposed to check?
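If you want to sanity-check the claimed formula numerically before proving it, here is a minimal sketch (my own illustration, not part of the argument above; it assumes NumPy is available, and the matrices `A` and `H` are just random choices). It compares the exact increment $(A+h)^{-1} - A^{-1}$ against the candidate linear part $-A^{-1}hA^{-1}$ as $\lVert h\rVert$ shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random matrix; with probability 1 it is invertible.
A = rng.standard_normal((n, n))
A_inv = np.linalg.inv(A)

# A fixed random direction, rescaled to smaller and smaller norms.
H = rng.standard_normal((n, n))

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * H
    # Exact increment of the inversion map.
    increment = np.linalg.inv(A + h) - A_inv
    # Candidate derivative applied to h.
    linear_part = -A_inv @ h @ A_inv
    # The difference is the remainder, which should be O(||h||^2).
    error = np.linalg.norm(increment - linear_part)
    print(f"||h|| = {np.linalg.norm(h):.1e},  remainder = {error:.3e}")
```

Each time $\lVert h\rVert$ shrinks by a factor of $10$, the printed remainder should shrink by roughly a factor of $100$, consistent with the $\mathcal{O}(\lVert h \rVert^2)$ term above.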

Here's an answer I wrote a while back which talks about a related question.