Understanding the proof of the existence of the inverse function of a multivariable function.
Introduction:
I am having a hard time understanding this proof. A similar scheme of the proof can be found in countless books and lectures. However, so far the sources I looked up do not really explain every point of the proof, and the reader is left wondering about many points. I shall write out the proof now and highlight in $\color{Red}{red}$ the points about which I ask "why" or "how", and in $\color{green}{green}$ my interpretation of the points which I believe to have understood, together with my explanation of them.
Theorem:
(Requirement 1): Let $U\subset \mathbb{R^n}$ be open.
(Requirement 2): Let $f :U\rightarrow \mathbb{R^n}$ be a continuously differentiable function.
(Requirement 3): Let $Det(D(f(a)))\neq 0$ for some $a \in U$.
Claim: For $b:=f(a)$ there exist open sets $V_a \subset U$ and $V_b \subset \mathbb{R^n}$ such that the following hold:
(Result 1): $V_b = f(V_a)$
(Result 2): $f_{|_{V_a}} : V_a \rightarrow V_b$ is bijective.
(Result 3): $f^{-1} : V_b \rightarrow V_a$ is continuously differentiable.
(Result 4): If $f\in C^k(U)$, $k\geq 1$, then the inverse function $f^{-1}:V_b \rightarrow V_a$ satisfies $f^{-1} \in C^k(V_b)$.
Proof:
"Let us take note at start of the proof that for a general function $f$ applies at the point $a$ that $D(f(a)) = L$ and that in that point $\color{green}{D(L^{-1} \circ f) = DL^{-1} \circ Df= L^{-1} \circ L = I_n}$ $\color{green}{^{***1}}$ Where as we denote the identity matrix with $I_n$ \
*To $\color{green}{^{***1}}$: The matrix $L^{-1}$ exists because of Requirement 3, which states that the determinant of the matrix representing the derivative is nonzero, hence that matrix is invertible.
Its inverse is then denoted by $L^{-1}$, which (once the theorem is proved) is the same as $Df^{-1}(b)$. The derivative of any linear mapping $L$ is given by $DL= L$, hence also $DL^{-1}=L^{-1}$. We thus obtain the stated identity.*
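As a quick sanity check (a numerical example of my own, not from the book), in the $2\times 2$ case one can take, say,
$$Df(a)=L=\begin{pmatrix}2&1\\0&3\end{pmatrix},\qquad L^{-1}=\frac{1}{6}\begin{pmatrix}3&-1\\0&2\end{pmatrix},\qquad D(L^{-1}\circ f)(a)=L^{-1}\,Df(a)=\frac{1}{6}\begin{pmatrix}6&0\\0&6\end{pmatrix}=I_2,$$
using that $L^{-1}$ is linear, so its derivative at every point is $L^{-1}$ itself.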
$\color{Red}{\text{We can now assume that $D(f(a))= I_n$ without loss of generality; proving the statement for this case is sufficient}}$ $\color{red}{^{***1}}$. We thus have $\partial f_i(a)/\partial x_j = \delta_{ij}$. (The reduction works because if $L^{-1}\circ f$ is locally invertible, then $f=L\circ(L^{-1} \circ f)$ is locally invertible as well, since $L$ is a bijection.)
*To $\color{red}{^{***1}}$: I do not understand why it is sufficient to prove the statement for this case and ignore the other cases; it seems that some kind of implication is being derived from the relationships shown above, however it does not seem trivial to me!
$\color{red} {\text{We now choose a cuboid $K \subset U \subset \mathbb{R^n}$ which fulfills the following conditions:}}$ $1)\ a\in Int(K)\\ 2)\ Det(Df(x))\neq 0\ \ \forall x \in K\\ 3)\ |\partial f_i/\partial x_j(x)-\partial f_i/\partial x_j(a)|\leq \frac{1}{2n^2} \ \ \forall x\in K $ $\color{red}{^{***2}}$
To $\color{red}{^{***2}}$: Why does such a cuboid with these specific requirements even exist?
Note: We know that for a function $f=(f_1,...,f_n)$ such that, for a constant $M$, $|\partial f_i/\partial x_j| \leq M$ holds, it is true that $||f(a)-f(b)||\leq n^2M\, ||a-b||$. We will use this relationship to obtain the inequality below.
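A sketch of where this estimate comes from (my own reasoning, using that a cuboid is convex, so the segment from $a$ to $b$ stays inside it): applying the mean value theorem to each component $f_i$ along that segment gives
$$|f_i(a)-f_i(b)| = \Big|\sum_{j=1}^n \frac{\partial f_i}{\partial x_j}(\xi_i)\,(a_j-b_j)\Big| \leq nM\,||a-b||,$$
and summing over $i$ (and bounding the Euclidean norm by the sum of the absolute values of the components) yields
$$||f(a)-f(b)|| \leq \sum_{i=1}^n |f_i(a)-f_i(b)| \leq n^2M\,||a-b||.$$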
Let $g(x) := f(x)-x$; then for all $x \in K$ it is clear that $|\partial g_i/\partial x_j (x) |= |\partial f_i/\partial x_j (x) - \delta_{ij}| \leq \frac{1}{2n^2}$.
Comment: Due to the choice of $D(f(a))$ and the stated properties of the cuboid, we can compare this with the "Note" above (taking $M=\frac{1}{2n^2}$) and obtain the bound $||g(x_1)-g(x_2)|| \leq \frac{1}{2}\, ||x_1-x_2||$ for all $x_1,x_2\in K$. No magic here. Now write $x_1-x_2 = (f(x_1)-f(x_2)) - (g(x_1)-g(x_2))$ and apply the triangle inequality together with $||-x|| = ||x||$; this gives $||x_1-x_2|| \leq ||f(x_1)-f(x_2)|| + ||g(x_1)-g(x_2)|| \leq ||f(x_1)-f(x_2)|| + \frac{1}{2}\,||x_1-x_2||.$
Rearranging the inequality we obtain:
$||x_1-x_2|| \leq 2\,||f(x_1)-f(x_2)||$; in particular this means that the function is injective on the given cuboid.
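Indeed (a one-line check of my own, not from the book): if $f(x_1)=f(x_2)$ for some $x_1,x_2\in K$, then the inequality gives
$$||x_1-x_2|| \leq 2\,||f(x_1)-f(x_2)|| = 0,$$
hence $x_1=x_2$.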
By $Fr(K)$ we mean the boundary of $K$, that is, the frontier. $\color{red}{\text{It is in particular true that $a\notin Fr(K)\Rightarrow f(a)\notin f(Fr(K))$}}$ $\color{red}{^{***3}}$ $\color{green}{\text{However, $f(Int(K))$ need not be open, but since $Fr(K)$ is compact, so is $f(Fr(K))$}}$$\color{green}{^{***2}}$ $\color{red}{\text{Thus $f(a)$ has a strictly positive distance from this compact set; call it $\delta$, so that $0 < \delta := \inf_{x\in Fr(K)}||f(a)-f(x)||$ }}$ $\color{red}{^{***4}}$
To $\color{red}{^{***3}}$ Why is this implication true and where does it come from?
To $\color{green}{^{***2}}$: Because the function is continuous, and $Fr(K)$ is the boundary of a cuboid; cuboids are closed and bounded, hence $Fr(K)$ is compact. Continuous functions carry compactness over to their images, but not openness (for instance, $x\mapsto x^2$ maps the open interval $(-1,1)$ onto $[0,1)$, which is not open).
To $\color{red}{^{***4}}$: Why does this distance need to be strictly greater than zero? How does one derive this implication?
We now let $V_b:=\{y\in \mathbb{R^n}: ||y-f(a)||< \delta/2\}$. For $y\in V_b$, $x\in Fr(K)$ we thus have the inequalities $||y-f(a)||< \delta/2$ and $||f(a)-f(x)|| \geq \delta$.
$\color{green}{\text{Combining these yields the following inequality}} $ $\color{green}{^{***3}}$
$||y-f(a)|| < ||y - f(x)|| \quad \forall y \in V_b,\ x \in Fr(K)$ (Equation i)
To $\color{green}{^{***3}}$: I have convinced myself graphically of the correctness of this inequality, however I am not able to derive it rigorously; it must be some trick with inequalities that I am not finding! (Picture omitted.)
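For reference, here is a chain of inequalities that would yield Equation i from the two bounds above (my own sketch using the reverse triangle inequality, not from the original source): for $y\in V_b$ and $x\in Fr(K)$,
$$||y-f(x)|| \;\geq\; ||f(a)-f(x)|| - ||f(a)-y|| \;\geq\; \delta - \frac{\delta}{2} \;=\; \frac{\delta}{2} \;>\; ||y-f(a)||.$$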
Now we want to show that $\forall y \in V_b\ \exists !\, x \in Int(K) : f(x)=y$. For that we consider, for $y\in V_b$, the function $h_y: K \rightarrow \mathbb{R}$, $h_y(x)=||y-f(x)||^2=\sum_i(y_i-f_i(x))^2$.
Now $\color{green}{\text{$h_y$ takes its absolute minimum on the compact set $K$}}$ $\color{green}{^{***4}}$
To $\color{green}{^{***4}}$: Is this due to the extreme value theorem, because $h_y$ is continuous and defined on a compact set?
Now Equation i implies that for $x\in Fr(K)$ we have $h_y(a) < h_y(x)$; thus the function $h_y$ cannot attain its minimum on $Fr(K)$, rather it must be attained at a point $x_0 \in Int(K)$. At this point the following holds: $\frac{\partial h_y}{\partial x_j}(x_0) = -2\sum_{i=1}^n(y_i-f_i(x_0))\,\frac{\partial f_i}{\partial x_j}(x_0) = 0$
$\color{red}{\text{Because $x_0\in K$, it must hold that $Det\big(\frac{\partial f_i}{\partial x_j}(x_0)\big) \neq 0$}}$ $\color{red}{^{***5}}$
To $\color{red}{^{***5}}$: Is the determinant not defined for matrices? The given partial derivative at a point is not a matrix but rather a single entry (or a vector); or is it rather meant that we take the derivative of $f$ written as the matrix of all partial derivatives at this point, i.e. $Df(x_0)$?
$\color{red}{\text{Thus it accordingly follows that $y_i - f_i(x_0) = 0$ for all $i$, thus $y=f(x_0)$}}$$\color{red}{^{***6}}$
To $\color{red}{^{***6}}$: But what does the determinant being nonzero have to do with the sum above equaling zero? I understand that one of the factors must be zero, but why does the determinant being nonzero imply that the other one must be zero? Can we not have it mixed somehow, say for one $j$ the one factor is zero and for another $j$ the other one is zero?
This implies that $V_b \subset f(K)$. Now, because $V_a := Int (K) \cap f^{-1}(V_b)$, we have that $\color{green}{\text{$f_{|V_{a}}:V_a\rightarrow V_b$ is bijective}}$ $\color{green}{^{***5}}$
To $\color{green}{^{***5}}$: We have shown previously that $f$ is injective on $K$, and just now that for each $y\in V_b$ there exists an $x_0\in Int(K)$ with $f(x_0)= y$ (and by injectivity only one); thus the bijectivity follows.
$\color{red}{\text{Both sets $V_a,V_b$ are open. }} $ $\color{red}{^{***7}}$
To $\color{red}{^{***7}}$ Why is this true?
So far Result 1 and Result 2 have been shown; now we show Result 3.
We can now rewrite the inequality from before using $x_i= f^{-1}(y_i) \in V_a \subset K$, $i=1,2$; we thus obtain:
$||f^{-1}(y_1)-f^{-1}(y_2)|| \leq 2\, || y_1-y_2|| \quad \forall y_1,y_2 \in V_b$. This relationship says that $(f_{|V_a})^{-1}: V_b \rightarrow V_a$ is (Lipschitz) continuous.
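Spelled out (a one-line check of my own): the Lipschitz bound with constant $2$ gives continuity directly; for $\varepsilon>0$ choose $\delta_\varepsilon := \varepsilon/2$, then
$$||y_1-y_2|| < \delta_\varepsilon \;\Longrightarrow\; ||f^{-1}(y_1)-f^{-1}(y_2)|| \leq 2\,||y_1-y_2|| < \varepsilon.$$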
We only need to show differentiability:
Let $x \in V_a$, $y=f(x)\in V_b$ and let $L_x:=D(f(x))$. We claim that $f^{-1}$ is differentiable at $y$ and that $Df^{-1}(y)= L^{-1}_x= [D(f(x))]^{-1}$ holds. $\color{green}{\text{Indeed, for $x_1 \in V_a$ there exists $\phi: \mathbb{R^n} \rightarrow \mathbb{R^n}$ with $f(x_1)= f(x)+ L_x(x_1-x) + \phi(x_1-x)$ and $\lim_{x_{1} \rightarrow x} \frac{||\phi(x_1-x)||}{||x_1-x||}= 0$}}$ $\color{green}{^{***5}}$
To $\color{green}{^{***5}}$: Do we obtain this function by rearranging the equation, and does taking the limit $x_1 \rightarrow x$ give zero because $f$ is differentiable? Hence such a function with the given property exists?
Now we rearrange the equation and apply $L_x^{-1}$ to both sides (applying the same map to both sides of an equality preserves the equality); with the notation $f(x) = y$, $f(x_1)=y_1$ we obtain the following:
$L_x^{-1}(y_1-y)= f^{-1}(y_1)-f^{-1}(y)+L_x^{-1}\big(\phi(f^{-1}(y_1)-f^{-1}(y))\big)$
$\color{red}{\text{It thus suffices for the differentiability to show that $\lim_{y_{1}\rightarrow y} \frac{||L_x^{-1}\big(\phi(f^{-1}(y_1)-f^{-1}(y))\big)||}{||y_1-y||} = 0$}}$ $\color{red}{^{***8}}$
To $\color{red}{^{***8}}$: Why is this sufficient?
We proceed now. We know that for linear maps the following estimate holds:
$||L_x^{-1}\big(\phi(f^{-1}(y_1)-f^{-1}(y))\big)|| \leq A \, ||\phi(f^{-1}(y_1)-f^{-1}(y))||$ for some constant $A$. $\color{red}{\text{Now due to the continuity of $f^{-1}$}}$ $\color{red}{^{***9}}$
$\lim_{y_1\rightarrow y} \frac{||\phi(f^{-1}(y_1)-f^{-1}(y))||}{||y_1-y||} = \lim_{x_1\rightarrow x} \frac{||\phi(x_1-x)||}{||f(x_1)-f(x)||} = \lim_{x_1\rightarrow x} \frac{||\phi(x_1-x)||}{||x_1-x||}\cdot\lim_{x_1\rightarrow x} \frac{||x_1-x||}{||f(x_1)-f(x)||} = 0 \cdot (\text{a number less than or equal to } 2\text{, see above}) = 0$
Thus we conclude that $f^{-1}$ is differentiable and $\color{red}{\text{$Df^{-1}(f(x))=[Df(x)]^{-1}$}}$ $\color{red}{^{***9}}$
To $\color{red}{^{***9}}$: At this point, I am not sure if I am just tired, or whether I really cannot see why that applies!
Now let $f:= (f_1,...,f_n): V_a \rightarrow V_b$ with the inverse function $f^{-1}=(u_1,...,u_n):V_b \rightarrow V_a$. Thus $f_i(u_1(y),...,u_n(y)) = y_i$. Differentiating with respect to $y_j$ we obtain: $\sum_{k=1}^n \frac{\partial f_i}{\partial x_k} (f^{-1}(y))\frac{\partial u_k}{\partial y_j}(y) = \delta_{ij}$. Thus the following must hold: $[\frac{\partial u_k}{\partial y_j}(y)] =[\frac{\partial f_i}{\partial x_k} (f^{-1}(y))]^{-1}$
$\color{red}{\text{Thus every $\frac{\partial u_k}{\partial y_j}(y)$ is a rational function $q_{kj}$ of the $\frac{\partial f_i}{\partial x_k} (f^{-1}(y))$. If now $f \in C^k$, then it follows that $f^{-1} \in C^k$, because $f^{-1}$ is continuous.}}$ $\color{red}{^{***10}}$
To $\color{red}{^{***10}}$: I have absolutely no understanding of this colored line: I do not see why it is a rational function, where the continuity plays a role, or why $f^{-1}$ is also of the same class.
q.e.d
In advance, I thank the brave souls who will take on this question and bother with it. I realize it is very lengthy and probably hard to grasp, and one needs to take multiple looks; so for the engagement and for helping me understand the gaps, I am thankful. Please do not feel obligated to answer every point; even if you answer some points and someone else answers the others, I would be very happy.
Solution 1:
- Suppose the result holds for functions $\phi: U \to \mathbb{R}^n$ with $D\phi(a)=I_n$. Suppose $L=[Df(a)]^{-1}$ is an invertible linear transformation that is not $I_n$. As you have pointed out, the result can be applied to the function $\phi=L \circ f$. So let us apply it to get a neighborhood $W_a$ of $a$ such that $W_a$ and $W_b=\phi(W_a)$ satisfy all the results (for $\phi$). Set $V_a=W_a$ and $V_b=f(V_a)=L^{-1}(W_b)$; since $L$ is an invertible linear map, $L^{-1}$ is a homeomorphism, so $V_b$ is open. Since $L \circ f$ is a bijection from $W_a$ to $W_b$, $f=L^{-1}\circ(L\circ f)$ is a bijection from $V_a$ to $V_b$. An inverse can be defined and $f^{-1}=(L\circ f)^{-1}\circ L$ is continuously differentiable.
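In symbols, the reduction can be summarized as follows (a compact restatement of the argument above, with $L=[Df(a)]^{-1}$):
$$\phi := L\circ f,\qquad D\phi(a)=L\,Df(a)=I_n,\qquad f = L^{-1}\circ\phi,\qquad f^{-1}=\phi^{-1}\circ L \ \text{ on } V_b=L^{-1}(W_b),$$
so every conclusion obtained for $\phi$ transfers to $f$ by composing with the fixed linear bijections $L$ and $L^{-1}$.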
- The existence of such a cuboid $K$ uses two preliminary results that are usually covered long before proving the inverse function theorem. In words, the two results used here are: the map sending an invertible linear operator on $\mathbb{R}^n$ to its inverse is continuous with respect to a suitable metric on the space of linear operators; and a function $f: \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable if and only if all partial derivatives exist and are continuous. See Rudin's Principles of Mathematical Analysis, Theorems 9.8 and 9.21.
- Suppose on the contrary that $f(a) \in f(\text{Fr}(K))$. Then there exists an $x \in \text{Fr}(K)$ such that $f(x)=f(a)$. Let $\epsilon>0$. By continuity there is a $\delta >0$ such that $\|f(x)-f(y)\|< \epsilon$ for every $y \in \mathbb{R}^n$ satisfying $\|x-y\|< \delta$. Since $x \in \text{Fr}(K)$, every neighborhood of $x$ intersects $K$ and $K^c$. We can therefore choose such a $y$ inside $K$. However, you just proved that $2\|f(x_1)-f(x_2)\|\geqslant \|x_1-x_2\|$ for every $x_1$ and every $x_2$ inside $K$. Applying it with $x_1=y$ and $x_2=a$, you find, by the triangle inequality, that $2\epsilon \geqslant \|a-y\|$. This is not possible, as $a$ is an interior point and $y$ can be chosen arbitrarily close to $x$. (Draw the picture.)
- Once you understand 3, you can immediately understand this sentence.
- I think you are right that this is a typo.
- The reason the author emphasizes the determinant so much (to the point of producing the typo in 5) is that the author wants to use the determinant! For simplicity, denote $a_i=y_i-f_i(x_0)$ and $a=(a_1,\cdots,a_n)$. Indeed, apply the result you have for every $j \in \{1,2,\cdots, n\}$. We end up having, as matrix multiplication,
\begin{equation*} \big(Df(x_0)\big)^{T}a=0. \end{equation*} Since $Df(x_0)$ is invertible, so is its transpose, and it follows that $a$ is the zero vector.
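A small illustration of why the "mixed" scenario from the question cannot occur (a toy example of my own, not from the original solution): take the invertible matrix
$$A=\begin{pmatrix}1&1\\0&1\end{pmatrix};\qquad Aa=0 \ \Longleftrightarrow\ a_1+a_2=0 \ \text{and}\ a_2=0 \ \Longrightarrow\ a=0.$$
In general, if a matrix $A$ (equivalently, its transpose) is invertible, then $Aa=0$ forces $a=A^{-1}(Aa)=A^{-1}0=0$, so no nonzero choice of the $a_i$ can satisfy all $n$ equations simultaneously.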
- $V_b$ is by definition an open ball around $f(a)$, so it is open. Since $f$ is continuous, $f^{-1}(V_b)$ is an open set. $V_a$ is open as a finite intersection of open sets.
- According to the line right before this limit, the numerator is just $L_x^{-1}(y_1-y)-f^{-1}(y_1)+f^{-1}(y)$. If the limit is zero, then, by definition, this proves the derivative is $L_x^{-1}$.
- The continuity is used in the denominator: if $x_1$ tends to $x$, then $f(x_1)$ tends to $f(x)$. Please look at the definition of $L_x$ to see why the result applies.
- See: If $f$ is differentiable at $(a,b)$ then $\frac{1}{f}$ is differentiable at $(a,b)$, provided $f(a,b)\neq0$. Higher-order derivatives work the same way, with the same proof.
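Concretely, the "rational function" claim can be made explicit via the cofactor (adjugate) formula for the inverse of a matrix (standard linear algebra, added here only as an illustration). Writing $A(y) := \big[\frac{\partial f_i}{\partial x_k}(f^{-1}(y))\big]$, one has
$$\frac{\partial u_k}{\partial y_j}(y) = \big[A(y)^{-1}\big]_{kj} = \frac{(-1)^{j+k}\,M_{jk}(y)}{\det A(y)},$$
where $M_{jk}(y)$ is the determinant of $A(y)$ with row $j$ and column $k$ removed; this is a quotient of polynomials in the entries of $A(y)$, with nonvanishing denominator. If $f\in C^k$, the entries $\frac{\partial f_i}{\partial x_k}$ are $C^{k-1}$; since $f^{-1}$ is continuous, the composed entries of $A(y)$ are continuous, hence so are the $\frac{\partial u_k}{\partial y_j}$, i.e. $f^{-1}\in C^1$. Repeating the argument with the regularity just gained bootstraps this up to $f^{-1}\in C^k$.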