Prove that simultaneously diagonalizable matrices commute
Solution 1:
This has undoubtedly been answered (likely multiple times) here before, so I post this at the risk of beating a dead (and decaying) horse.
Let me first link you to this page, which contains two excellent answers (I particularly recommend Keith Conrad's expository paper linked in Pierre-Yves Gaillard's answer). However, let me provide a perhaps more elementary viewpoint, since in my experience many people beginning this topic are not yet comfortable with arguments based on the minimal polynomial.
You seem to have covered part a quite adequately so let me focus on part b. I apologize in advance for the length, but I feel that this is a topic which requires thorough understanding.
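For completeness, and assuming part a is the direction stated in the title, here is the one-line computation behind it. The names $P$, $D$ and $E$ are my own notation: $P$ is the common change-of-basis matrix and $D$, $E$ are the diagonal forms of $A$ and $B$. $$A = PDP^{-1},\quad B = PEP^{-1} \quad\Longrightarrow\quad AB = PDEP^{-1} = PEDP^{-1} = BA,$$ where the middle equality holds because diagonal matrices commute.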
The main thing to remember about commuting matrices is the fact that commuting matrices respect each other's eigenspaces. What does this mean? To talk about that, we first have to introduce the topic of an invariant subspace.
Consider a matrix mapping $A:\ V \rightarrow V$ for a vector space $V$. If there is some subspace $U$ of $V$ such that the restriction of $A$ to $U$ remains an operator in the sense that $A:\ U\rightarrow U$, then we say that $U$ is an invariant subspace of $A$. The term stable is also sometimes used. Equivalently, $A(U) \subseteq U$: the image of $U$ is entirely contained within $U$. This is what makes it meaningful to talk about a restriction of the mapping to the smaller vector space $U$.
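As a small illustration (the matrix below is my own choice, not taken from the question), take $$A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$ Then $A\mathbf{e}_1 = \mathbf{e}_1$, so $U = \operatorname{span}\{\mathbf{e}_1\}$ is invariant under $A$. On the other hand $A\mathbf{e}_2 = \mathbf{e}_1 + 2\mathbf{e}_2$, which does not lie in $\operatorname{span}\{\mathbf{e}_2\}$, so that subspace is not invariant.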
This is desirable for several reasons, the main one being that linear mappings on smaller vector spaces are easier to analyze. We can look at the action of the mapping on each invariant subspace and then piece them together to get an overall picture. This is what diagonalization does; we break down the vector space into smaller invariant subspaces, the eigenspaces, and then piece together the facts to get a simpler picture of how the mapping works. Many of the simpler, canonical representations are dependent on this fact (for example, the Jordan canonical form looks at the invariant generalized eigenspaces).
Now, if we have two commuting, diagonalizable matrices $A$ and $B$, then each eigenspace of $B$ is invariant not only under $B$ itself but also under $A$. This is what we mean by respecting each other's eigenspaces. To see this, let $\mathbf{v}$ be an eigenvector of $B$ with eigenvalue $\lambda$. Then $$B(A\mathbf{v}) = A(B\mathbf{v}) = \lambda A\mathbf{v}$$ so that $A\mathbf{v}$ is either zero or again an eigenvector of $B$ with eigenvalue $\lambda$; in both cases it lies in the eigenspace $E_\lambda$ of $B$. In our new language, this means that $E_\lambda$ is invariant under $A$, so it makes sense to look at the restriction of $A$ to $E_\lambda$.
Now consider the restriction of $A$ to $E_\lambda$. If all the eigenvalues of $B$ are simple (multiplicity one), then each eigenspace of $B$ is one-dimensional. We have therefore restricted $A:\ E_\lambda \rightarrow E_\lambda$ to a mapping on a one-dimensional vector space. A linear mapping on a one-dimensional space is multiplication by a single scalar, so $A$ takes each vector of $E_\lambda$ to the same scalar multiple of itself. Hence every nonzero vector of $E_\lambda$ is also an eigenvector of $A$. Therefore any eigenbasis of $B$ (one exists because $B$ is diagonalizable) is also an eigenbasis of $A$. This means that the two matrices are simultaneously diagonalizable; they share a common eigenbasis.
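If you want to see the simple-eigenvalue case in action, here is a minimal sketch in plain Python. The matrices `A` and `B` and the helper names (`matmul`, `apply`, `eigenvalue_on`) are my own illustrative choices, not anything from the question. It checks that the two matrices commute and that an eigenbasis of `B` is also an eigenbasis of `A`.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in M]

def eigenvalue_on(M, v):
    """Return mu if M v == mu v for the nonzero vector v, else None."""
    w = apply(M, v)
    i = next(k for k, x in enumerate(v) if x != 0)  # first nonzero entry of v
    mu = Fraction(w[i], v[i])
    return mu if all(w[k] == mu * v[k] for k in range(len(v))) else None

# Hypothetical example: B has the simple eigenvalues 1 and -1.
A = [[2, 1], [1, 2]]
B = [[0, 1], [1, 0]]
eigenbasis_of_B = [[1, 1], [1, -1]]

commute = matmul(A, B) == matmul(B, A)
# For each eigenvector of B, record (eigenvalue under B, eigenvalue under A).
shared = [(eigenvalue_on(B, v), eigenvalue_on(A, v)) for v in eigenbasis_of_B]
print(commute, shared)
```

Neither entry of a pair in `shared` is `None`, which is exactly the statement that each eigenvector of `B` is also an eigenvector of `A`. Exact rational arithmetic is used so that the equality checks are not affected by floating-point rounding.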
The general case is a bit more involved in that the restrictions to the invariant subspaces are more complex (they're no longer one-dimensional), but the ideas are identical.
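To make that a little more concrete, here is a sketch of how the general case can be organized. The notation $E_\lambda(B)$, $E_\mu(A)$ is mine, and I am assuming without proof that the restriction of a diagonalizable matrix to an invariant subspace is again diagonalizable (this is where the minimal polynomial argument usually enters). $$V = \bigoplus_{\lambda} E_\lambda(B), \qquad A\big(E_\lambda(B)\big) \subseteq E_\lambda(B), \qquad E_\lambda(B) = \bigoplus_{\mu} \big(E_\lambda(B) \cap E_\mu(A)\big).$$ Choosing a basis of each nonzero intersection $E_\lambda(B) \cap E_\mu(A)$ and putting them together gives a basis of $V$ consisting of common eigenvectors. Note that, unlike in the simple case, not every eigenbasis of $B$ will do: if $B = I$, every basis is an eigenbasis of $B$, but only the eigenbases of $A$ diagonalize both.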
P.S. Since you seem to be interested in physics, let me mention a crucial application of commuting operators. In quantum mechanics, you have quantities called observables, each of which is, roughly speaking, represented by a Hermitian matrix. Unlike in classical physics, different observables need not be simultaneously measurable (for example, you cannot measure position and momentum simultaneously). This is ultimately due to the fact that the position operator and the momentum operator do not commute, which is the underlying reason behind the uncertainty principle: they do not have a shared eigenbasis in which the states of a system can be represented. Commuting operators therefore form a key element of quantum physics in that they define quantities which are compatible, i.e. simultaneously defined.