Why would I define Alexander–Spanier cohomology?
I'm not an expert, and the following is all guesswork. Like you, I found the original papers unenlightening with respect to their motivation.
As you said, the mystery mainly lies in the motivation for the additional step: modding out the group of functions $X^{k+1} \to R$ by the subcomplex of functions that vanish on some neighborhood of the diagonal.
First, let's justify looking at neighborhoods of a space. From Alexander duality we know the philosophy of studying the tautness of a subspace $U$ of a space $Y$.
We look at neighborhoods $N$ of $U$ in $Y$ (by a neighborhood, we mean a subset $N$ of $Y$ that contains $U$ in its interior). The intersection of two neighborhoods of $U$ in $Y$ is again a neighborhood of $U$ in $Y$, so the groups $\{H^q(N)\}$, where $N$ ranges over all neighborhoods of $U$ in $Y$ and the maps are induced by inclusions, form a direct system.
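For each $N$, the inclusion $U \hookrightarrow N$ induces a homomorphism $H^q(N) \to H^q(U)$, and these homomorphisms are compatible with inclusions of neighborhoods, so they assemble into a map $\varinjlim_N H^q(N) \to H^q(U)$. The subspace $U$ is said to be "tautly embedded" (or taut) in $Y$ if this map is an isomorphism for all $q$ and all coefficient groups. For example, a compact subspace of a Hausdorff space is taut with respect to Alexander-Spanier cohomology.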
This gives us a hint: we are probably modding out by this subcomplex in order to deal with NON compact Hausdorff spaces.
Second, let's justify looking at the diagonal. The diagonal embedding $X \xrightarrow{\Delta} X \times X$, with image $\Delta X := \{(x,x) : x \in X\} \subseteq X \times X$, is simply a canonical way to embed a space $X$ into an ambient space, here endowed with the product topology. It is useful when we want to look at a neighborhood of $X$ (e.g., at germs of functions on $X$) but $X$ sits in no ambient space. The term "diagonal embedding" comes from the example $\mathbb{R} \hookrightarrow \mathbb{R}^2$, $x \mapsto (x,x)$, which embeds the line $\mathbb{R}$ into the plane as the line $y=x$.
With this in mind, let's return our gaze to Alexander-Spanier cochains.
Here's my naive guess: modding out by the functions that vanish on some neighborhood $N(X)$ of (the diagonal copy of) $X$ artificially forces $X$ to satisfy the condition that $$H^q(\text{functions which vanish on }N(X)) \simeq H^q(\text{functions which vanish on }X)$$ for all $N$, all $q$, and all coefficient groups. Perhaps modding out by the subcomplex lets us "falsely" satisfy the condition that $X$ is tautly embedded in $X \times X$, so that we may treat $X$ as if it were a compact space.
Below are a few additional comments toward why someone might have thought of modding out by that particular subcomplex.
Establishing notation: $X^{p+1}$ is the $(p+1)$-fold product of $X$ with itself, so its elements are tuples $(x_1, \dots, x_{p+1})$ with $x_i \in X$.
$f^p(X) := \{\text{functions } X^{p+1} \to \mathbb{Z}\}$, with pointwise addition as the group operation.
$f^p_0(X) :=$ the elements of $f^p(X)$ that vanish on some neighborhood of the diagonal $\Delta X = \{(x, \dots, x) : x \in X\} \subseteq X^{p+1}$.
If we are examining functions defined pointwise on $X$, it's natural to look at $X$ embedded in an ambient space, rather than at the space $X$ by itself. In this sense, $N(X)$ is the natural home of the jet bundle of $X$.
Functions that vanish on some neighborhood of $X$ form a group: if $f$ vanishes on $N(X)$ and $f'$ vanishes on $N'(X)$, then $f - f'$ vanishes on $N(X) \cap N'(X)$, which is again a neighborhood of $X$.
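To make $f^p(X)$ and $f^p_0(X)$ concrete, here is a small Python sketch. It assumes the standard Alexander-Spanier coboundary $(\delta\varphi)(x_0,\dots,x_{p+1}) = \sum_i (-1)^i \varphi(x_0,\dots,\hat{x}_i,\dots,x_{p+1})$, which is not spelled out above, and it models "a neighborhood of the diagonal" by the tuples of diameter less than some `eps` in a finite set of reals. The names `coboundary`, `vanishes_near_diagonal`, and `eps` are hypothetical. A finite set of reals is discrete, so this only illustrates the algebra: $\delta$ maps functions vanishing near the diagonal to functions vanishing near the diagonal (every face of a small tuple is small), and a difference of two such functions vanishes on the smaller of the two neighborhoods.

```python
from itertools import product

# Assumptions (illustration only): X is a finite set of real numbers, and the
# "eps-neighborhood of the diagonal" in X^{p+1} is the set of tuples of
# diameter < eps. Cochains are Python functions of p+1 positional arguments.


def diameter(tup):
    """Largest distance between two entries of a tuple of reals."""
    return max(tup) - min(tup)


def coboundary(phi, p):
    """Alexander-Spanier coboundary of a p-cochain phi on X^{p+1}:
    (delta phi)(x_0, ..., x_{p+1}) = sum_i (-1)^i phi(x_0, ..., omit x_i, ..., x_{p+1})."""
    def delta_phi(*xs):
        assert len(xs) == p + 2
        return sum((-1) ** i * phi(*(xs[:i] + xs[i + 1:])) for i in range(p + 2))
    return delta_phi


def vanishes_near_diagonal(phi, X, p, eps):
    """True if the p-cochain phi is zero on every (p+1)-tuple of diameter < eps."""
    return all(phi(*t) == 0 for t in product(X, repeat=p + 1) if diameter(t) < eps)


X = [0, 1, 2, 3, 5]

# A 1-cochain that is zero on pairs of diameter < 1.5, and one that is zero on pairs of diameter < 2.5.
psi1 = lambda x, y: 0 if abs(x - y) < 1.5 else 7
psi2 = lambda x, y: 0 if abs(x - y) < 2.5 else -1

# delta preserves "vanishing near the diagonal": every face of a small triple is a small pair.
print(vanishes_near_diagonal(coboundary(psi1, 1), X, 2, 1.5))  # True

# The difference vanishes on the smaller neighborhood (eps = min(1.5, 2.5)), not on the larger one.
diff = lambda x, y: psi1(x, y) - psi2(x, y)
print(vanishes_near_diagonal(diff, X, 1, 1.5), vanishes_near_diagonal(diff, X, 1, 2.5))  # True False
```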
I'm not sure if the following is useful, nor how it fits into the story, but I figured I'd mention it.
The natural home of jet bundles over a space $X$ is over the diagonal of $X$. From reading this paper, it seems that Grothendieck brought the $k$th (infinitesimal) neighborhood of the diagonal of a manifold $X$ to the fore when he was porting notions of differential geometry into algebraic geometry (these were then ported back into differential geometry by Spencer, Kumpera, and Malgrange). We'll use the standard notation $\Delta X \subseteq X_{(k)} \subseteq X \times X$. The only points of $X_{(k)}$ are the diagonal points $(x, x)$, but we equip $X_{(k)}$ with a structure sheaf of functions and treat it as if it were made of "$k$-neighbor points" $(x,y)$, where $x$ and $y$ are infinitesimally close to one another (what Weil called "points proches").
To picture $X_{(1)}$, we might imagine $X$ with an infinitesimal normal bundle; for $X_{(2)}$, a slightly larger infinitesimal thickening that also captures second derivatives (as we need more local information to take a second derivative); and so on.
If we think of a function $\omega: X_{(k)} \to R$ that vanishes on $X \subseteq X_{(k)}$ as a "differential $k$-form," then maybe:
- the functions that vanish to first order can be thought of as closed forms, $d\omega = 0$;
- the functions that vanish to second order on the diagonal $X \subseteq X_{(k+1)}$ can be thought of as exact forms, since they satisfy $\omega = d\beta$, so that $d\omega = d(d\beta) = 0$.
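One concrete, if limited, instance of this analogy: assuming the standard Alexander-Spanier coboundary $(\delta\varphi)(x_0,\dots,x_{p+1}) = \sum_i (-1)^i \varphi(x_0,\dots,\hat{x}_i,\dots,x_{p+1})$ (not stated above), the coboundary of a $0$-cochain $f$ is a finite difference that vanishes on the diagonal, playing the role of $df$, and $d(d\beta) = 0$ has an exact counterpart:

$$\begin{aligned}
(\delta f)(x_0, x_1) &= f(x_1) - f(x_0), \qquad (\delta f)(x, x) = 0,\\
(\delta\delta f)(x_0, x_1, x_2) &= (\delta f)(x_1, x_2) - (\delta f)(x_0, x_2) + (\delta f)(x_0, x_1)\\
&= \big(f(x_2) - f(x_1)\big) - \big(f(x_2) - f(x_0)\big) + \big(f(x_1) - f(x_0)\big) = 0.
\end{aligned}$$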
I have just looked up the definition on Wikipedia, so I am not an expert on Alexander-Spanier cohomology, but here is what I understand from it.
I will think of cohomology as describing functions from $X$ to $G$: an ordinary cohomology theory lets you make sense of what a "continuous function" from $X$ to $G$ is. Different theories give different meanings, which should agree when $X$ is nice.
It's then a bit like algebraic geometry: if you have a closed subscheme $\operatorname{Spec}(A/I) \hookrightarrow \operatorname{Spec}(A)$, then the functions on it are exactly the functions on $\operatorname{Spec}(A)$ (that is, $A$) modulo the functions that vanish on it (that is, $I$).
In this analogy, $X$ plays the role of $\operatorname{Spec}(A/I)$, $X^n$ with the discrete topology (so that every function on it counts) plays the role of $\operatorname{Spec}(A)$, and the morphism $X \to X^n$ is the diagonal morphism. Instead of quotienting by the functions that are $0$ on $X$ (seen as a subspace of $X^n$), you quotient by the functions that are $0$ near $X$ (maybe because of the discrepancy between the discrete topology on $X^n$ and the product topology).
If you try to compare the singular complex with the Alexander-Spanier complex, then $X^{n+1}$ stands in for the $n$-simplices of $X$ (where you remember only the vertices). If you restrict to a neighborhood of the diagonal and $X$ is locally contractible, this makes more sense, because the diagonal corresponds to small simplices, and choosing a small $n$-simplex is almost the same as choosing its vertices.
Restricting attention to functions defined on small simplices is enough, because cocycles can be reconstructed by decomposing big simplices into small simplices.
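For 1-simplices, the decomposition idea can be sketched in Python. This assumes $X$ is the real line, a pair is "small" when its points are closer than `eps`, and only 1-cochains are considered; `subdivide` and `extend_from_small` are hypothetical helpers. A 1-cochain known only on small pairs is evaluated on a big pair by summing it over a chain of small pairs. For a coboundary $\delta f(x,y) = f(y) - f(x)$ the sum telescopes, so the value on the big pair is recovered whichever subdivision is used.

```python
from fractions import Fraction

# Assumptions (illustration only): X is the real line, a pair (x, y) is "small"
# when |x - y| < eps, and we only look at 1-cochains. Fractions keep the
# arithmetic exact, so the telescoping identity holds without rounding.


def subdivide(a, b, eps):
    """Hypothetical helper: points a = x_0, ..., x_n = b with |x_{k+1} - x_k| < eps."""
    n = int(abs(b - a) / eps) + 1  # n > |b - a| / eps, so each step is shorter than eps
    return [a + (b - a) * k / n for k in range(n + 1)]


def extend_from_small(phi_small, a, b, eps):
    """Evaluate a 1-cochain known only on small pairs on the big pair (a, b)
    by summing it over a chain of small pairs from a to b."""
    pts = subdivide(a, b, eps)
    return sum(phi_small(x, y) for x, y in zip(pts, pts[1:]))


f = lambda x: x ** 3            # an arbitrary 0-cochain
df = lambda x, y: f(y) - f(x)   # its coboundary, a 1-cocycle

a, b, eps = Fraction(-2), Fraction(5), Fraction(1, 3)
# Telescoping: the sum over small pairs recovers df on the big pair (a, b).
print(extend_from_small(df, a, b, eps) == df(a, b))  # True
```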
Now, I will try to be a bit more precise.
If you kill all functions vanishing near the diagonal, then only the behavior of a function near the diagonal matters; that is, you're left with the complex of germs of functions at the diagonal. To see this, note that any function $f$ defined on a subset can be extended by $0$ to a function $f^!$ defined on the whole set. Now, given any function $f$, pick any neighborhood $U$ of the diagonal and let $g$ be the restriction of $f$ to the complement of $U$. Then $g^!$ vanishes on $U$, hence $f$ is equivalent to $f-g^! = h^!$, where $h$ is the restriction of $f$ to $U$.
Therefore, $f$ is equivalent to any of its restrictions to a neighborhood of the diagonal, so we can indeed view the Alexander-Spanier complex $AS$ as germs of functions at the diagonal. I think this makes it more believable that it is dual to Vietoris homology.
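The decomposition $f = g^! + h^!$ above can be checked on a toy example. This Python sketch assumes a finite set of reals and takes the neighborhood $U$ of the diagonal to be the pairs at distance less than `EPS`; `in_U` and `extend_by_zero` are hypothetical names. It verifies that $f - g^! = h^!$ and that $g^!$ vanishes on $U$, so $f$ and $h^!$ agree modulo functions vanishing near the diagonal.

```python
from itertools import product

# Assumptions (illustration only): X is a finite set of reals, cochains are
# functions on pairs, and the neighborhood U of the diagonal is the set of
# pairs (x, y) with |x - y| < EPS. The names below are hypothetical.
X = [0, 1, 2, 4, 7]
EPS = 2


def in_U(x, y):
    return abs(x - y) < EPS


def extend_by_zero(func, domain):
    """f^!: extend func, regarded as defined only where domain(...) holds, by 0 elsewhere."""
    return lambda *t: func(*t) if domain(*t) else 0


f = lambda x, y: 10 * x + y                              # an arbitrary function on X^2
g_ext = extend_by_zero(f, lambda x, y: not in_U(x, y))   # g^!, where g = f off U
h_ext = extend_by_zero(f, in_U)                          # h^!, where h = f on U

pairs = list(product(X, repeat=2))
# f - g^! = h^!, and g^! vanishes on U, so f and h^! agree modulo functions vanishing near the diagonal.
print(all(f(*t) - g_ext(*t) == h_ext(*t) for t in pairs))  # True
print(all(g_ext(*t) == 0 for t in pairs if in_U(*t)))       # True
```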
Assume we have a "triangulation function" $t$ that maps any $(n+1)$-tuple of points to an $n$-simplex in $X$ whose vertices are that $(n+1)$-tuple. It is a section of the canonical map sending a simplex to its vertices.
To compare $AS$ with, say, the singular complex $S$, we could precompose singular cochains (the elements of $S$) with $t$, which gives a map $S \xrightarrow{m} AS$. In a way, we restrict singular cochains to functions defined on "canonical simplices", which are uniquely determined by their vertices through $t$.
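As a toy model of the map $m$, here is a Python sketch that assumes $X$ is a convex subset of $\mathbb{R}^2$, so the affine simplex on any tuple of points stays in $X$ and can serve as the triangulation function $t$ (on a manifold one would use small geodesic simplices instead). A singular simplex is represented as a map from barycentric coordinates to $\mathbb{R}^2$, and the cochain `c` is a hypothetical example chosen only to show the precomposition $m(c) = c \circ t$, not a cocycle of any interest.

```python
from fractions import Fraction

# Assumption (illustration only): X is a convex subset of R^2, so the affine simplex
# spanned by any tuple of points of X lies in X and can play the role of t.


def t(*vertices):
    """Affine singular simplex with the given vertices: a map sending barycentric
    coordinates (lam_0, ..., lam_n), summing to 1, to a point of R^2."""
    def sigma(*lam):
        assert len(lam) == len(vertices) and sum(lam) == 1
        return tuple(sum(l * v[k] for l, v in zip(lam, vertices)) for k in range(2))
    return sigma


def vertices_of(sigma, n):
    """The canonical map "simplex -> vertices": evaluate sigma at the corners e_0, ..., e_n."""
    corners = [tuple(1 if j == i else 0 for j in range(n + 1)) for i in range(n + 1)]
    return tuple(sigma(*e) for e in corners)


def m(c):
    """Precompose a singular cochain c (a function of singular simplices) with t,
    giving a function of tuples of points, i.e. an Alexander-Spanier cochain."""
    return lambda *vertices: c(t(*vertices))


# t is a section of vertices_of: recovering the vertices gives back the input tuple.
print(vertices_of(t((0, 0), (4, 1), (1, 5)), 2))  # ((0, 0), (4, 1), (1, 5))

# Hypothetical singular 1-cochain: 1 if the midpoint of the 1-simplex has positive x-coordinate.
half = Fraction(1, 2)
c = lambda sigma: 1 if sigma(half, half)[0] > 0 else 0
print(m(c)((-1, 0), (3, 0)), m(c)((-3, 0), (1, 0)))  # 1 0
```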
We want to understand why $m$ is an equivalence (say, when $X$ is a compact metric space).
There is clearly an issue with choosing $t$, but since $AS$ consists of germs of functions at the diagonal, it only cares about tuples close to the diagonal, so we only need to define $t$ on small tuples (those with small diameter). Then the actual choice of $t$ won't matter if these tuples lie in contractible open sets (say, if you take geodesic simplices).
The difference between $AS$ and $S$ is that $AS$ consists of functions defined only on small canonical simplices instead of on all simplices. But that's fine, since you can extend the functions in $AS$ to all simplices by linearity and up to homotopy (any small simplex is homotopic to a small canonical simplex, and any big canonical simplex can be decomposed into small canonical simplices).
A nice motivation for Alexander-Spanier cohomology in terms of sheaf cohomology is given in Godement's *Topologie algébrique et théorie des faisceaux*, Remark 4.3.2 on page 169. By "motivated in terms of sheaf cohomology" I do not merely mean that he presents the standard proof that Alexander-Spanier cohomology is an instance of sheaf cohomology, by giving a sheaf resolution whose global sections happen to coincide with the abelian group of AS cochains. What I mean is that he shows the following: if one defines sheaf cohomology via the canonical flasque resolution (cf. "How to define the canonical Godement resolution") and tries to write down explicitly the cochains in the $n$th object of the resolution, one gets something that looks very similar to the Alexander-Spanier construction.
Later, near the end of the book (in the chapter on products; I think section 6.4), he constructs another resolution of a sheaf, by a cosimplicial sheaf built from a monad. By careful analysis of the cochains of this cosimplicial sheaf, he again shows that they resemble the Alexander-Spanier cochains.
From this perspective, one might ask: given these two resolutions, if we simplified the definitions of their cochains to obtain the Alexander-Spanier cochains, would the result still be a good cohomology theory?