Why use Einstein Summation Notation?
Solution 1:
What is Einstein's summation notation?
While Einstein may have taken it to be simply a convention to sum over "any repeated indices", as Zev Chonoles alluded to in a comment, such a summation convention would not satisfy the "makes it impossible to write down anything that is not coordinate-independent" property that proponents of the convention often claim.
In modern geometric language, one should think of Einstein's summation convention as a very precise way to express the natural duality pairings/contractions when looking at a multilinear object.
More precisely: let $V$ be some vector space and $V^*$ its dual. There is a natural bilinear operation taking $v\in V$ and $\omega\in V^*$ to the scalar value $\omega(v)$; this could alternatively be denoted as $\omega\cdot v$ or $\langle \omega,v\rangle$. This duality pairing can also be called contraction and is sometimes denoted by $\mathfrak{c}: V\otimes V^* \to \mathbb{R}$ (or a different scalar field if your vector space is over some other field).
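If a numerical picture helps, here is a minimal sketch (my addition, not part of the original answer) in Python with NumPy: once bases are chosen, the pairing $\omega(v)$ is just a contraction of the component arrays, which `numpy.einsum` expresses with a repeated index.

```python
import numpy as np

# Components of a vector v in V and a covector omega in V* (dim V = 3).
v = np.array([1.0, 2.0, 3.0])
omega = np.array([0.5, -1.0, 2.0])

# The duality pairing omega(v): the repeated index i is summed over.
pairing = np.einsum('i,i->', omega, v)
assert np.isclose(pairing, omega @ v)  # same as the dot product of the components
print(pairing)  # 4.5
```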
Now, letting $\eta$ be an arbitrary element of $V^{p,q}:= (\otimes^p V)\otimes (\otimes^q V^*)$, as long as $p,q$ are both positive, we can contract any one factor of $V$ against any one factor of $V^*$. Each of these contractions gives a mapping $V^{p,q} \to V^{p-1,q-1}$, and it is tedious to name every one of them (you can index them by calling $\mathfrak{c}_{i,j}$ the contraction of the $i$th factor of $V$ with the $j$th factor of $V^*$).
The Einstein convention gets around this by being an index convention: $\eta$ is written as $\eta^{i_1\cdots i_p}_{j_1\cdots j_q}$, an indexed object in which each index corresponds to one of the $V$ or $V^*$ factors. Then instead of writing $\mathfrak{c}_{i,j}$, we just single out the relevant factors by repeating an index and trace over it. For example $$ \mathfrak{c}_{1,1}(\eta)^{i_1\cdots i_{p-1}}_{j_1 \cdots j_{q-1}} = \eta^{k i_1\cdots i_{p-1}}_{k j_1 \cdots j_{q-1}} $$ where the summation symbol over $k$ is suppressed. For a single contraction of one tensor the advantage of this notation is not obvious, but for multiple contractions you see the advantage:
$$ \mathfrak{c}_{1,1} \mathfrak{c}_{p,q} \eta = \mathfrak{c}_{p-1,q-1} \mathfrak{c}_{1,1} \eta $$
if $\eta \in V^{p,q}$. Basically, if you have multiple contractions in one expression, you have to keep careful track of how earlier contractions shift the slot numbering in order to put the correct indices on the contraction symbols; in particular, the symbols do not commute. The same expression above in Einstein notation is simply
$$ \eta^{k i_1\cdots i_{p-2} \ell}_{k j_1\cdots j_{q-2} \ell} $$
and it is immediately clear which slots are contracted together. Furthermore, it is manifest that the "formulae" obtained thus are independent of the choice of basis of $V$ and $V^*$ (with respect to which we can write down the actual components of $\eta$).
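To see this concretely, here is a small numerical sketch (my addition, not part of the answer) using NumPy's `einsum`, which implements exactly this suppressed-summation convention for component arrays. For a $(2,2)$-tensor the expression above becomes $\eta^{k\ell}_{k\ell}$, and the two orders of performing the contractions visibly agree:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
# Components of a (2,2)-tensor: axes ordered (upper 1, upper 2, lower 1, lower 2).
eta = rng.normal(size=(n, n, n, n))

# Contract the slot pairs one at a time, in either order...
one_one_first = np.einsum('kk->', np.einsum('kbkd->bd', eta))  # c_{1,1}, then the remaining pair
two_two_first = np.einsum('kk->', np.einsum('albl->ab', eta))  # c_{2,2}, then the remaining pair

# ...or write the Einstein expression eta^{kl}_{kl} directly:
both_at_once = np.einsum('klkl->', eta)

assert np.isclose(one_one_first, two_two_first)
assert np.isclose(one_one_first, both_at_once)
```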
What is the correct use of Einstein's notation?
- Einstein's notation should only be used to denote the contraction of one contravariant slot with one covariant slot. That's it. Don't sum over two covariant slots. Don't use triply-repeated indices. If you limit it to these kinds of contractions, you are using it to denote a "natural operation", and it will therefore never produce expressions that are coordinate-dependent/non-geometric (a numerical sketch of this appears after the list).
- This is especially an issue in Lorentzian or other pseudo-Riemannian geometric set-ups, or in situations where you don't have a metric at all. The reason that in Riemannian geometry we can often get away with contracting a pair of covariant indices or a pair of contravariant indices is that there is a natural isomorphism (given by the metric) between $V$ and $V^*$ in this situation. Furthermore, with the usual conventions this isomorphism doesn't "change sign". In the situation without any metric there is no preferred isomorphism between $V$ and $V^*$, and so a bilinear map $V\otimes V\to \mathbb{R}$ would necessarily be coordinate dependent. In the Lorentzian case there can be sign issues if you are not careful.
- Einstein's summation convention takes advantage of the fact that the dual pairing $\omega(v)$ can be expressed as first taking the tensor product $\omega\otimes v$ then taking the contraction. So you should only use it when this procedure makes sense: don't use it to do elementwise division, for example.
- Einstein's summation convention should be used when there are no "coordinate dependent manipulations". In particular, if you ever find the need to speak of one particular component of a tensor when expressed in one particular coordinate system, then you should not use Einstein notation. Alternatively, you should find an invariant way of expressing that particular component (for example, fixing a distinguished one-form/vector field and write the component as the contraction of your tensor against that one-form or vector field).
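Here is a hedged numerical sketch of the coordinate-(in)dependence point above (my addition; the transformation rules $T \mapsto P^{-1}TP$ for a $(1,1)$-tensor and $B \mapsto P^{\mathsf T}BP$ for a bilinear form are standard, the rest is illustration). The upper-lower contraction $T^k{}_k$ survives a change of basis; the covariant-covariant sum $B_{kk}$ does not:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)

T = rng.normal(size=(n, n))   # components T^i_j of a (1,1)-tensor
B = rng.normal(size=(n, n))   # components B_ij of a bilinear form (two covariant slots)
P = rng.normal(size=(n, n))   # change-of-basis matrix (invertible with probability 1)

# (1,1)-tensor components transform as T' = P^{-1} T P;
# the upper-lower contraction T^k_k (the trace) is basis independent.
T_new = np.linalg.inv(P) @ T @ P
print(np.einsum('kk->', T), np.einsum('kk->', T_new))   # equal up to rounding

# Bilinear-form components transform as B' = P^T B P;
# "summing two covariant indices" B_kk is NOT basis independent.
B_new = P.T @ B @ P
print(np.einsum('kk->', B), np.einsum('kk->', B_new))   # generally different
```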
Alternatives
Einstein's summation notation is ultimately about pairings between $V$ and $V^*$, so (in spite of its likely origin) you should not think of it primarily as a notation used for decluttering computations of tensor components in local coordinates, but rather as a way to efficiently solve the problem of "which two slots are we contracting again?"
From this point of view the alternatives to Einstein's notation are "invariant notation" (don't use any index; write everything in coordinate free manner) and the "Penrose diagrammatic notation" (see e.g. https://en.wikipedia.org/wiki/Penrose_graphical_notation).
Solution 2:
Einstein notation is a coordinate-based implementation of abstract index notation when there is a fixed set of bases for all vector spaces. This is the same as how matrices are coordinate-based implementations of linear maps between vector spaces.
In turn, abstract index notation is a highly convenient notation for chaining together complex combinations of multilinear functions, and fluidly converting inputs of multilinear functions to outputs and vice versa.
The inputs of a multilinear function can be turned into outputs, and vice versa, by viewing certain inputs to the function as "fixed" and other inputs as "free". Every time you convert an input to an output or vice versa, you dualize the relevant space. For example, consider a multilinear function with 3 inputs: $$T(\cdot, \cdot, \cdot) : U \times V \times W \rightarrow \mathbb{R}.$$ It can instead be viewed as the following multilinear function with the first two spaces as inputs, and the dual of the third space as the output: \begin{align} &\tilde T(u,v) := T(u,v,\cdot), \\ &\tilde T :U \times V \rightarrow W^*. \end{align}
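As a concrete sketch of this input-to-output conversion (my addition, assuming finite-dimensional spaces with chosen bases so that $T$ is represented by a component array), $\tilde T(u,v) = T(u,v,\cdot)$ is computed by contracting the first two slots of the array against the components of $u$ and $v$, leaving a covector in $W^*$:

```python
import numpy as np

dimU, dimV, dimW = 2, 3, 4
rng = np.random.default_rng(1)

T = rng.normal(size=(dimU, dimV, dimW))   # components of T : U x V x W -> R
u = rng.normal(size=dimU)                 # components of u in U
v = rng.normal(size=dimV)                 # components of v in V

# T_tilde(u, v) = T(u, v, .) is a linear functional on W, i.e. an element of W*;
# its components come from contracting the first two slots.
T_tilde_uv = np.einsum('uvw,u,v->w', T, u, v)

# Feeding it a vector w in W reproduces T(u, v, w).
w = rng.normal(size=dimW)
assert np.isclose(T_tilde_uv @ w, np.einsum('uvw,u,v,w->', T, u, v, w))
```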
Given a collection of multilinear functions, the outputs of one can be plugged in as inputs into another whenever the spaces are compatible, creating an interconnected network of multilinear functions.
Continuing the example, say $S$ is another multilinear function with $W^*$ as an input space: $$S:X \times W^* \rightarrow \mathbb{R}.$$ Then one can define a new multilinear function composing the two: $$(x,u,v) \mapsto S(x,\tilde{T}(u,v)).$$
Even though this combination of multilinear functions is conceptually simple, it is clunky to write down (requiring the definition of an auxiliary function $\tilde{T}$). Instead, one can use abstract index notation to write the same thing concisely as: $$T_{uvw}S_x^w.$$
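To see the index expression in action (my addition, again assuming chosen bases so everything is a component array), the repeated index $w$ marks the contracted slot, and `numpy.einsum` turns $T_{uvw}S_x^w$ into a one-liner:

```python
import numpy as np

dimU, dimV, dimW, dimX = 2, 3, 4, 5
rng = np.random.default_rng(2)

T = rng.normal(size=(dimU, dimV, dimW))   # components T_{uvw} of T : U x V x W -> R
S = rng.normal(size=(dimX, dimW))         # components S_x^w of S : X x W* -> R

# The composite (x, u, v) |-> S(x, T~(u, v)) = T_{uvw} S_x^w,
# with the repeated index w contracted:
composite = np.einsum('uvw,xw->xuv', T, S)

# Check on particular vectors against composing "by hand":
u, v, x = rng.normal(size=dimU), rng.normal(size=dimV), rng.normal(size=dimX)
T_tilde_uv = np.einsum('uvw,u,v->w', T, u, v)            # element of W*
val_index = np.einsum('xuv,x,u,v->', composite, x, u, v)
val_byhand = np.einsum('xw,x,w->', S, x, T_tilde_uv)
assert np.isclose(val_index, val_byhand)
```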