Baker-Hausdorff Lemma from Sakurai's book
I'd like to show that, given two Hermitian operators $A,G$ on a Hilbert space $\mathscr{H}$, the following identity holds: $$ e^{iG\lambda}A e^{-iG\lambda} = A + i\lambda [G,A] + \frac{\left(i\lambda\right)^2}{2!}[G,[G,A]]+\ldots+\frac{(i\lambda)^n}{n!}\underbrace{[G,[G,\ldots,[G}_{n\ \text{times}},A]\ldots]]+\ldots $$ where $\lambda$ denotes a real parameter and $[\cdot,\cdot]$ denotes the commutator.
This proof is left to the reader by Sakurai in his book *Modern Quantum Mechanics*.
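Before proving the identity, a quick numerical sanity check may be reassuring. The following is a minimal pure-Python sketch (no external libraries), assuming small $2\times 2$ Hermitian matrices; the particular $G$, $A$ and $\lambda$ are arbitrary illustrative choices, and both the exponentials and the commutator series are truncated, which is adequate only for matrices of modest norm.

```python
from math import factorial

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(a, X, b, Y):
    """Entrywise linear combination a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    """Scalar multiple c*X."""
    return [[c * x for x in row] for row in X]

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return lincomb(1, matmul(X, Y), -1, matmul(Y, X))

def expm(X, terms=40):
    """exp(X) from a truncated Taylor series (adequate for small norms only)."""
    n = len(X)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, X))  # now term = X^k / k!
        result = lincomb(1, result, 1, term)
    return result

def commutator_series(G, A, lam, terms=30):
    """Partial sum of sum_n (i*lam)^n / n! * [G, [G, ..., [G, A]...]]."""
    total = scale(0j, A)
    nested = A  # n-fold nested commutator, starting at n = 0
    for n in range(terms):
        total = lincomb(1, total, (1j * lam) ** n / factorial(n), nested)
        nested = comm(G, nested)
    return total

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Illustrative Hermitian matrices and a real parameter.
G = [[1.0 + 0j, 0.5 - 0.2j], [0.5 + 0.2j, -0.3 + 0j]]
A = [[0.2 + 0j, 1.0 + 1.0j], [1.0 - 1.0j, 0.7 + 0j]]
lam = 0.7

lhs = matmul(matmul(expm(scale(1j * lam, G)), A), expm(scale(-1j * lam, G)))
rhs = commutator_series(G, A, lam)
print(max_abs_diff(lhs, rhs))  # at round-off level
```

Since $G$ and $A$ here do not commute, the left-hand side genuinely differs from $A$, so the agreement is not trivial.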
Using the series definition of the exponential:
$$ e^{iG\lambda}A e^{-iG\lambda} = \sum_{p=0}^\infty\frac{(iG\lambda)^p}{p!}A\sum_{q=0}^\infty\frac{(-iG\lambda)^q}{q!} = \sum_{p=0}^\infty\sum_{q=0}^\infty(-)^q\frac{(i\lambda)^{p+q}}{p!q!}G^pAG^q=\\ \sum_{s=0}^\infty\sum_{d=0}^s(-)^d\frac{(i\lambda)^s}{d!(s-d)!}G^{s-d}AG^d=\\ A+i\lambda[G,A]+\frac{(i\lambda)^2}{2!}[G,[G,A]]+\ldots+\frac{(i\lambda)^n}{n!}\sum_{k=0}^n(-)^k \binom{n}{k}G^{n-k}AG^k+\ldots $$ where we regrouped the double sum by total degree $s=p+q$ and set $d=q$. So it remains to verify the following relation, which would prove the statement: $$ \mathscr{F}(n): \sum_{k=0}^n(-)^k \binom{n}{k}G^{n-k}AG^k=\underbrace{[G,[G,\ldots,[G}_{n\ \text{times}},A]\ldots]]. $$ Since $\mathscr{F}(0)$ reads $A=A$ and $\mathscr{F}(1)$ reads $GA-AG=[G,A]$, we may proceed by induction: it suffices to show that if $\mathscr{F}(n)$ holds, then $\mathscr{F}(n+1)$ holds as well.
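Before proving $\mathscr{F}(n)$ in general, it can be checked exactly for small $n$. The sketch below is pure Python with integer matrices, so the comparison involves no rounding at all. Since $\mathscr{F}(n)$ is a purely algebraic identity, the arbitrary non-Hermitian $3\times 3$ matrices chosen here (illustrative values only) are as good a test as Hermitian ones.

```python
from math import comb

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(a, X, b, Y):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matpow(X, p):
    """X**p for p >= 0, by repeated multiplication."""
    result = [[int(i == j) for j in range(len(X))] for i in range(len(X))]
    for _ in range(p):
        result = matmul(result, X)
    return result

def binomial_side(G, A, n):
    """Left side of F(n): sum_k (-1)^k C(n,k) G^(n-k) A G^k."""
    total = [[0] * len(A) for _ in A]
    for k in range(n + 1):
        term = matmul(matmul(matpow(G, n - k), A), matpow(G, k))
        total = lincomb(1, total, (-1) ** k * comb(n, k), term)
    return total

def nested_commutator(G, A, n):
    """Right side of F(n): [G, [G, ..., [G, A]...]] with n copies of G."""
    X = A
    for _ in range(n):
        X = lincomb(1, matmul(G, X), -1, matmul(X, G))
    return X

G = [[1, 2, 0], [0, -1, 3], [4, 1, 2]]
A = [[0, 1, -2], [5, 3, 1], [2, 0, 1]]
for n in range(7):
    assert binomial_side(G, A, n) == nested_commutator(G, A, n), n
print("F(n) holds exactly for n = 0..6")
```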
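To do this we exploit the fact that the $(n+1)$-fold nested commutator of $A$ is the $n$-fold nested commutator of $[G,A]$: $$ \underbrace{[G,[G,\ldots,[G}_{n+1\ \text{times}},A]\ldots]] = \underbrace{[G,[G,\ldots,[G}_{n\ \text{times}},[G,A]]\ldots]], $$ so $\mathscr{F}(n)$ may be applied with $A$ replaced by $[G,A]=GA-AG$.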
Then substituting $\mathscr{F}(n)$ yields: $$ \underbrace{[G,[G,\ldots,[G}_{n+1\ \text{times}},A]\ldots]]= \sum_{k=0}^n(-)^k \binom{n}{k}G^{n-k}(GA-AG)G^k =\\ \sum_{k=0}^n(-)^k \binom{n}{k}G^{n+1-k}AG^{k}-\sum_{k=0}^n(-)^k \binom{n}{k}G^{n-k}AG^{k+1}=\\ G^{n+1}A+\sum_{k=1}^n(-)^k \binom{n}{k}G^{n+1-k}AG^{k}-\sum_{k'=1}^{n}(-)^{k'-1} \binom{n}{k'-1}G^{n+1-k'}AG^{k'}+(-)^{n+1}AG^{n+1} $$ where in the last step we shifted the summation index of the second sum to $k'=k+1$, then split off the $k=0$ term of the first sum and the $k'=n+1$ term of the second. Since $-(-)^{k-1}=(-)^k$, the two middle sums combine with coefficient $(-)^k\left[\binom{n}{k}+\binom{n}{k-1}\right]$, and Pascal's rule $$ \binom{n}{k}+\binom{n}{k-1} = \binom{n+1}{k} $$ gives $$ \ldots=G^{n+1}A + \sum_{k=1}^n(-)^k \binom{n+1}{k}G^{n+1-k}AG^{k} + (-)^{n+1}AG^{n+1}= \sum_{k=0}^{n+1}(-)^k \binom{n+1}{k}G^{n+1-k}AG^{k}.$$
And therefore $\mathscr{F}$(n+1) holds.
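To see the induction step at work on a concrete case, here is the passage from $n=2$ to $n=3$ written out by hand (a worked illustration only, not part of the general argument); the middle coefficients combine exactly as in Pascal's rule:

$$
\begin{aligned}
[G,[G,A]] &= G(GA-AG)-(GA-AG)G = G^2A-2\,GAG+AG^2,\\
[G,[G,[G,A]]] &= G\left(G^2A-2\,GAG+AG^2\right)-\left(G^2A-2\,GAG+AG^2\right)G\\
&= G^3A-(2+1)\,G^2AG+(1+2)\,GAG^2-AG^3\\
&= G^3A-3\,G^2AG+3\,GAG^2-AG^3.
\end{aligned}
$$

The coefficients $1,-3,3,-1$ are $(-)^k\binom{3}{k}$, as $\mathscr{F}(3)$ requires.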
Let $A$ and $B$ be any two operators on the Hilbert space $\mathscr H$, hermitian or not. We assume $A, B \in L(\mathscr H)$, the Banach algebra of bounded linear maps from $\mathscr H$ to itself. Consider the linear operator ordinary differential equation
$$\dfrac{dX}{d \lambda} = [B, X] \tag{1}$$
with initial condition
$$X(0) = A. \tag{2}$$
We observe that
$$X(\lambda) = e^{\lambda B}Ae^{-\lambda B} \tag{3}$$
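is the unique solution to (1), (2). Uniqueness holds because (1) is a linear ODE on the Banach space $L(\mathscr H)$ whose right-hand side $X \mapsto [B, X]$ is a bounded, hence Lipschitz, linear map (its norm is at most $2\Vert B \Vert_L$, see (7) below), so the Picard-Lindelöf theorem applies; that (3) is a solution follows from
$$\begin{aligned}\dfrac{dX}{d \lambda} &= \dfrac{de^{\lambda B}}{d \lambda}Ae^{-\lambda B} + e^{\lambda B}\dfrac{dA}{d \lambda}e^{-\lambda B} + e^{\lambda B}A\dfrac{de^{-\lambda B}}{d \lambda} \\ &= Be^{\lambda B}Ae^{-\lambda B} - e^{\lambda B}Ae^{-\lambda B}B = [B, e^{\lambda B}Ae^{-\lambda B}],\end{aligned} \tag{4}$$
where in (4) we have used the Leibniz product rule, the fact that $dA / d \lambda = 0$, and $\dfrac{d}{d\lambda}e^{\pm\lambda B} = \pm Be^{\pm\lambda B} = \pm e^{\pm\lambda B}B$; furthermore, it is evident from (3) that $X(0) = A$.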
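The ODE characterization (1)-(3) can also be checked numerically. The following pure-Python sketch integrates (1) from the initial condition (2) with the classical fourth-order Runge-Kutta method and compares the result with $e^{\lambda B}Ae^{-\lambda B}$ computed from truncated Taylor series. The real, non-Hermitian $2\times 2$ matrices $B$, $A$, the value of $\lambda$ and the step count are illustrative assumptions, not part of the argument.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(a, X, b, Y):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    return lincomb(1, matmul(X, Y), -1, matmul(Y, X))

def expm(X, terms=40):
    """exp(X) from a truncated Taylor series (small norms only)."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t / k for t in row] for row in matmul(term, X)]  # X^k / k!
        result = lincomb(1, result, 1, term)
    return result

def solve_rk4(B, A, lam, steps=400):
    """Integrate dX/dlam = [B, X], X(0) = A, from 0 to lam with RK4."""
    h = lam / steps
    X = A
    for _ in range(steps):
        k1 = comm(B, X)
        k2 = comm(B, lincomb(1, X, h / 2, k1))
        k3 = comm(B, lincomb(1, X, h / 2, k2))
        k4 = comm(B, lincomb(1, X, h, k3))
        incr = lincomb(1, lincomb(1, k1, 2, k2), 1, lincomb(2, k3, 1, k4))
        X = lincomb(1, X, h / 6, incr)  # X + h/6 (k1 + 2k2 + 2k3 + k4)
    return X

B = [[0.3, -0.5], [0.8, 0.1]]
A = [[1.0, 2.0], [0.0, -1.0]]
lam = 1.0

X_ode = solve_rk4(B, A, lam)
X_exact = matmul(matmul(expm([[lam * b for b in row] for row in B]), A),
                 expm([[-lam * b for b in row] for row in B]))
err = max(abs(x - y) for rx, ry in zip(X_ode, X_exact) for x, y in zip(rx, ry))
print(err)  # small: RK4 has O(h^4) global error
```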
We next recall that for any $B \in L(\mathscr H)$ the adjoint linear operator $\text{ad}_B: L(\mathscr H) \to L(\mathscr H)$ may be defined via
$$\text{ad}_B(A) = [B, A] \tag{5}$$
for all $A \in L(\mathscr H)$. Denoting by $\Vert T \Vert _L$ the standard operator norm on $L(\mathscr H)$, we see that
$$\begin{aligned}\Vert \text{ad}_B(A) \Vert_L &= \Vert [B, A] \Vert_L = \Vert BA - AB \Vert_L \le \Vert BA \Vert_L + \Vert AB \Vert_L \\ &\le \Vert B \Vert_L \Vert A \Vert_L + \Vert A \Vert_L \Vert B \Vert_L = 2 \Vert B \Vert_L \Vert A \Vert_L,\end{aligned} \tag{6}$$
which shows that
$$\Vert \text{ad}_B \Vert_L \le 2 \Vert B \Vert_L, \tag{7}$$
i.e. that $\text{ad}_B \in L(L(\mathscr H))$ is itself a bounded linear operator on $L(\mathscr H)$, of norm at most $2\Vert B \Vert_L$. Furthermore, we have
$$\text{ad}_B^2(A) = \text{ad}_B (\text{ad}_B(A)) = \text{ad}_B([B, A]) = [B, [B, A]], \tag{8}$$
$$\text{ad}_B^3(A) = \text{ad}_B (\text{ad}_B^2(A)) = \text{ad}_B([B, [B, A]]) = [B, [B, [B, A]]], \tag{9}$$
and so on:
$$\text{ad}_B^n(A) = [B, [B, [B, \ldots [B, A]\ldots]]], \tag{10}$$
where the operator $\text{ad}_B = [B, \cdot]$ occurs a total of $n$ times on the right-hand side of (10). We see that in fact (1) may be written in terms of $\text{ad}_B$ as
$$\dfrac{dX}{d \lambda} = \text{ad}_B(X). \tag{11}$$
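The constant $2$ in (7) cannot be improved in general. A standard example, using the Pauli matrices on $\mathscr H = \Bbb C^2$ (chosen here only for illustration):

$$
B=\sigma_z=\begin{pmatrix}1&0\\0&-1\end{pmatrix},\qquad A=\sigma_x=\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad [B,A]=\begin{pmatrix}0&2\\-2&0\end{pmatrix}=2i\sigma_y.
$$

Each Pauli matrix is both Hermitian and unitary, so $\Vert\sigma_x\Vert_L=\Vert\sigma_y\Vert_L=\Vert\sigma_z\Vert_L=1$; hence $\Vert\text{ad}_B(A)\Vert_L = 2 = 2\Vert B\Vert_L\Vert A\Vert_L$, equality holds in (6), and $\Vert\text{ad}_{\sigma_z}\Vert_L = 2$.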
Now set
$$\begin{aligned}Y(\lambda) &= A + \lambda [B, A] + \dfrac{\lambda^2}{2!}[B, [B, A]] \\ &\quad + \ldots + \dfrac{\lambda^n}{n!}\underbrace{[B, [B, \ldots, [B}_{n \; \text{times}}, A]\ldots]] + \ldots;\end{aligned} \tag{12}$$
from the above we see that $Y(\lambda)$ may be written
$$\begin{aligned}Y(\lambda) &= A + \lambda\, \text{ad}_B(A) + \dfrac{\lambda^2}{2!} \text{ad}_B^2(A) + \ldots + \dfrac{\lambda^n}{n!} \text{ad}_B^n(A) + \ldots \\ &= \sum_{n=0}^\infty \dfrac{\lambda^n}{n!}\text{ad}_B^n(A) = e^{\lambda\, \text{ad}_B}(A);\end{aligned} \tag{13}$$
since by (7) $\text{ad}_B$ is a bounded operator on $L(\mathscr H)$ with $\Vert \text{ad}_B^n(A) \Vert_L \le (2\Vert B \Vert_L)^n \Vert A \Vert_L$, all the series occurring above converge absolutely, and uniformly on compact sets, for all $\lambda \in \Bbb R$, indeed for all $\lambda \in \Bbb C$. We may therefore differentiate term by term, exactly as in ordinary calculus, and find that the derivative $Y'(\lambda)$ is given by
$$\dfrac{dY}{d\lambda} = \text{ad}_B(e^{\lambda\, \text{ad}_B}(A)) = [B, e^{\lambda\, \text{ad}_B}(A)] = [B, Y(\lambda)], \tag{14}$$
and furthermore
$$Y(0) = A, \tag{15}$$
which follows trivially from (12) and/or (13). Comparing (1), (2), (11), (14) and (15), we see that $X(\lambda)$ and $Y(\lambda)$ satisfy the same ODE with identical initial conditions, so by uniqueness of solutions they must be identical for all $\lambda$: $X(\lambda) = Y(\lambda)$. Using (3), (12) and (13) we thus see that
$$\begin{aligned}e^{\lambda B}Ae^{-\lambda B} &= e^{\lambda\, \text{ad}_B}(A) \\ &= A + \lambda [B, A] + \ldots + \dfrac{\lambda^n}{n!}\underbrace{[B, [B, \ldots, [B}_{n \; \text{times}}, A]\ldots]] + \ldots;\end{aligned} \tag{16}$$
if we now set $B = iG$ we obtain
$$\begin{aligned}e^{i\lambda G}Ae^{-i\lambda G} &= e^{i\lambda\, \text{ad}_G}(A) \\ &= A + i\lambda [G, A] + \ldots + \dfrac{(i\lambda)^n}{n!}\underbrace{[G, [G, \ldots, [G}_{n \; \text{times}}, A]\ldots]] + \ldots,\end{aligned} \tag{17}$$
where we have used the fact that $\text{ad}_{iG} = i\text{ad}_G$, a consequence of the linearity of the bracket $[G, A]$ in each of its variables $A, G$. Equation (17) is the desired result. QED.
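The convergence claim made for the series (13) can be made quantitative: since $\Vert \text{ad}_B^n(A)\Vert_L \le (2\Vert B\Vert_L)^n\Vert A\Vert_L$, truncating (13) after the $N$-th term leaves an error of at most $\Vert A\Vert_L\sum_{n>N}(2\Vert B\Vert_L|\lambda|)^n/n!$. The pure-Python sketch below checks this bound for a complex $\lambda$. As a simplifying assumption it uses the Frobenius norm in place of the operator norm; the Frobenius norm is also submultiplicative, so (6) and (7) hold for it verbatim. The matrices and $\lambda$ are illustrative choices.

```python
from math import exp, factorial, sqrt

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(a, X, b, Y):
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def comm(X, Y):
    return lincomb(1, matmul(X, Y), -1, matmul(Y, X))

def expm(X, terms=40):
    """exp(X) from a truncated Taylor series (small norms only)."""
    n = len(X)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, X))
        result = lincomb(1, result, 1, term)
    return result

def frob(X):
    """Frobenius norm (submultiplicative), standing in for the operator norm."""
    return sqrt(sum(abs(x) ** 2 for row in X for x in row))

B = [[0.3, -0.5], [0.8, 0.1]]
A = [[1.0, 2.0], [0.0, -1.0]]
lam = 0.5 + 0.5j  # the series converges for complex lambda too

Y_ref = matmul(matmul(expm(scale(lam, B)), A), expm(scale(-lam, B)))
r = 2 * frob(B) * abs(lam)

partial = scale(0j, A)  # running partial sum Y_N
nested = A              # ad_B^N(A)
errors, bounds = [], []
for N in range(16):
    partial = lincomb(1, partial, lam ** N / factorial(N), nested)
    nested = comm(B, nested)
    tail = exp(r) - sum(r ** n / factorial(n) for n in range(N + 1))
    errors.append(frob(lincomb(1, Y_ref, -1, partial)))
    bounds.append(frob(A) * tail)

for N, (err, bound) in enumerate(zip(errors, bounds)):
    print(f"N={N:2d}  error={err:.2e}  bound={bound:.2e}")
```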
Note: The technique used here, based on uniqueness of solutions to ODEs, is similar in spirit to that used in my answers to several other questions.
Another Note: A couple of interesting formulas related to the above: $[B, e^{\lambda B}Ae^{-\lambda B}] = e^{\lambda B}[B, A]e^{-\lambda B}$ (since $B$ commutes with $e^{\lambda B}$) and $A = e^{-\lambda B}\, e^{\lambda\, \text{ad}_B}(A)\, e^{\lambda B}$.
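Both formulas in the note above are easy to spot-check numerically. A minimal pure-Python sketch, with illustrative real $2\times2$ matrices $B$, $A$ and an illustrative $\lambda$ (assumptions, not from the answer), evaluating $e^{\lambda\,\text{ad}_B}(A)$ by truncating the series (13):

```python
from math import factorial

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(a, X, b, Y):
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def comm(X, Y):
    return lincomb(1, matmul(X, Y), -1, matmul(Y, X))

def expm(X, terms=40):
    """exp(X) from a truncated Taylor series (small norms only)."""
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, X))
        result = lincomb(1, result, 1, term)
    return result

def ad_exp(B, A, lam, terms=30):
    """Truncation of e^{lam ad_B}(A) = sum_n lam^n / n! * ad_B^n(A)."""
    total = scale(0.0, A)
    nested = A
    for n in range(terms):
        total = lincomb(1, total, lam ** n / factorial(n), nested)
        nested = comm(B, nested)
    return total

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

B = [[0.3, -0.5], [0.8, 0.1]]
A = [[1.0, 2.0], [0.0, -1.0]]
lam = 0.9

E_plus, E_minus = expm(scale(lam, B)), expm(scale(-lam, B))
X = matmul(matmul(E_plus, A), E_minus)  # e^{lam B} A e^{-lam B}

# [B, e^{lam B} A e^{-lam B}] versus e^{lam B} [B, A] e^{-lam B}
err1 = max_abs_diff(comm(B, X), matmul(matmul(E_plus, comm(B, A)), E_minus))
# A versus e^{-lam B} e^{lam ad_B}(A) e^{lam B}
err2 = max_abs_diff(A, matmul(matmul(E_minus, ad_exp(B, A, lam)), E_plus))
print(err1, err2)  # both at round-off level
```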
Hope this helps. Cheerio,
and as always,
Fiat Lux!!!