Why can we interchange summations?
Suppose we have the following
$$ \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}a_{ij}$$
where all the $a_{ij}$ are non-negative.
We know that we can interchange the order of summation here. My interpretation of why this is true is that both iterated sums are rearrangements of the same series and hence converge to the same value, or diverge to infinity (since convergence and absolute convergence are the same here, and all rearrangements of an absolutely convergent series converge to the same value as the series).
Is this interpretation correct? Or can someone offer a more insightful interpretation of this result?
Please note that I am not asking for a proof but for interpretations, although an insightful proof would be appreciated.
Solution 1:
This isn't a proof, but perhaps it can give you the insight you are looking for. Any nondecreasing sequence converges to its (possibly infinite) supremum. Thus a series of nonnegative terms converges to the supremum of its partial sums, and interchanging the order of summation doesn't affect the value of that supremum: there is no accidental cancellation between terms of opposite sign.
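For a quick numeric illustration (my own example, not part of the answer), take $a_{ij} = 2^{-i}3^{-j}$, whose double sum is exactly $\left(\sum_{i\ge1}2^{-i}\right)\left(\sum_{j\ge1}3^{-j}\right) = 1\cdot\frac12 = \frac12$; both iterated partial sums climb monotonically toward this common supremum:

```python
# Truncated iterated sums of the nonnegative array a_ij = 2^-i * 3^-j.
# Enlarging N can only increase these partial sums (no cancellation),
# and both orders approach the same supremum, 0.5.
N = 30  # truncation level

rows_first = sum(sum(2.0**-i * 3.0**-j for j in range(1, N + 1))
                 for i in range(1, N + 1))
cols_first = sum(sum(2.0**-i * 3.0**-j for i in range(1, N + 1))
                 for j in range(1, N + 1))
print(rows_first, cols_first)  # both approximately 0.5
```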
Solution 2:
The double sum you are asking about can be considered to be the sum of all terms of the infinite array of numbers $a_{ij}$:
$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots\\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots\\ \vdots& \vdots & &\vdots\\ a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots\\ \vdots & \vdots & & \vdots \\ \end{pmatrix},$$
where the sum is taken in a particular order: first add all the elements of each row, then add the resulting "row totals". The result you are citing says that if all the $a_{ij}$ are nonnegative, then it does not matter in which order you add the entries of the infinite matrix, be that rows first and then row totals, or columns first and then column totals.
I can't say your interpretation is right on the mark, because even though one adds the entries of the same infinite array, different orders of summation may in actuality produce different series. As the other answers point out, this result holds essentially because there are no terms diminishing the total sum. Saying "both iterated sums are rearrangements of the same series and hence converge to the same value, or diverge to infinity" puts no emphasis on this advantage, which I believe is swept under the phrase "same series".
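To make this concrete, here is a standard example (my addition, not from the original answer) where signs cause the two iterated sums to be genuinely different series:
$$a_{ij} := \begin{cases} 1, & \text{if } j = i,\\ -1, & \text{if } j = i+1,\\ 0, & \text{otherwise}. \end{cases}$$
Every row sums to $1 - 1 = 0$, so $\sum_i \sum_j a_{ij} = 0$; but the first column contains only the lone $1$ from $a_{11}$, while every later column contains a canceling $1$ and $-1$, so $\sum_j \sum_i a_{ij} = 1$. With nonnegative entries no such imbalance between the two orders can occur.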
I'd like to mention yet another (measure-theoretic) interpretation of this result, which I recently encountered in Rudin's Real and Complex Analysis (p. 23). There Rudin gives this result as a corollary of the Lebesgue Monotone Convergence Theorem applied to series of functions.
Theorem: Let $X$ be a measure space. If $\forall n: f_n:X\to [0,\infty]$ is measurable, then
$$\int_X \sum_n f_n \, d\mu = \sum_n \int_X f_n \, d\mu. \tag{$\ast$}$$
Corollary: Let $X:=\{x_1,x_2,...,x_n,...\}$ be a countable set and $\mu:\mathcal{M}_X:=\mathcal{P}(X)\to[0,\infty]$ be the counting measure:
$$\mu(E) := \begin{cases} |E|, & \text{if } |E| < \infty,\\ \infty, & \text{if } |E| = \infty. \end{cases}$$
If $\forall i,j:a_{ij}\geq0$, then
$$\sum_i \sum_j a_{ij}=\sum_j \sum_i a_{ij}.$$
Proof:
- Set $\forall j: f_j:X\to[0,\infty], f_j(x):=\sum_i a_{ij}\chi_{\{x_i\}}(x)$; and $\forall i: \bar{f_i}:X\to[0,\infty], \bar{f_i}(x):=\sum_j a_{ij}\chi_{\{x_i\}}(x).$ Then $\sum_j f_j=\sum_i \bar{f_i}$. Indeed, let $x_{i_0}\in X$. Then
$$\sum_j f_j(x_{i_0})= \sum_j \sum_i a_{ij} \chi_{\{x_i\}}(x_{i_0})= \sum_j a_{{i_0}j},$$
and
$$\sum_i \bar{f_i}(x_{i_0})=\bar{f_{i_0}}(x_{i_0})=\sum_j a_{{i_0}j}.$$
- Observe that the (inner) sum over $i$ is the integral of $f_j$ and the (inner) sum over $j$ is the integral of $\bar{f_i}$. Then we have:
$$\sum_j\sum_i a_{ij}= \sum_j \int_X f_j d\mu\stackrel{(\ast)}{=}\int_X \sum_j f_j d\mu=\int_X \sum_i \bar{f_i}d\mu\stackrel{(\ast)}{=}\sum_i\int_X\bar{f_i}d\mu =\sum_i \sum_j a_{ij}.$$
Now consider the interpretation suggested by the above discourse, viz., (countable) combinations of characteristic functions behave in such a way that one can concentrate the nonnegative "weights" of individual points. In the above proof $f_j$ puts a single weight on each point, while $\bar{f_i}$ concentrates all the weights at the point $x_i$.
Solution 3:
Well, I think that interpretation is a bit circular, as the reason absolute convergence allows for rearrangements without changing the limit is precisely this fact about sums of nonnegative terms.
The gist of it is that no cancellations due to sign are possible. I think it is very illuminating to consider the counterexample to the rearrangement theorem when you do not have absolute convergence but still have conditional convergence (rearrangements can lead to any limiting value!).
You can add up as many negative terms as you want until you are happy (since the series does not converge absolutely, you can keep adding until you are lower than any value on the real line you desire), and likewise for the positive terms. (Note: if adding the rest of the positive terms gives a finite value, then the whole sum didn't actually converge conditionally, and in fact will be $-\infty$.) This procedure of adding negative terms until your sum is below $L$, then adding positive terms until the sum is above $L$, and so on, converges because conditional convergence requires the terms you are summing to go to $0$ at the end of the day.
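Here is a hedged sketch of that greedy procedure in Python (my own code, not part of the original answer), applied to the alternating harmonic series $\sum_k (-1)^{k+1}/k$:

```python
def rearrange_to(target, n_terms=1_000_000):
    """Rearrange the alternating harmonic series toward `target`:
    add unused positive terms 1, 1/3, 1/5, ... while at or below the
    target, and unused negative terms -1/2, -1/4, -1/6, ... while
    above it.  Since the terms tend to 0, the partial sums converge
    to `target`."""
    pos, neg, total = 1, 2, 0.0  # next odd / even denominators
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / pos
            pos += 2
        else:
            total -= 1.0 / neg
            neg += 2
    return total

print(rearrange_to(3.14159))  # approximately 3.14159
```

The same series summed in its usual order gives $\ln 2 \approx 0.693$, so the rearrangement really has changed the limit.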
Okay, but back to the nonnegative double sum. Toward proving the double-sum result (since you are only considering two orderings), I suppose one cute thing you could try is to start with the rows (arrange the numbers in the upper-right quadrant for this picture). Assume first that the row sums converge. Now take the first term out of each row and set it aside, adding those terms upwards (the first column sum). Note that the value of the sum didn't change, of course, because you took out values one at a time, putting them into the column sum, and at no point did anything funny happen, because only finite sums were involved (the full sum, the partial column sum, and the full sum minus the finitely many elements you took out). Continue this procedure, extracting the second column sum, and so on. At the "end" you have what you want to prove.
There are two limiting procedures hidden here: when you "finish" the first column sum, you have to take a limit, and when you "finish" all the columns, there is another limit. The issue if you include negative terms is that the first column sum might already diverge; and even if all the column sums converge, summing the column sums might converge to an entirely different value! I think to really finish this proof idea, you should use the fact that the values you are working with vary monotonically at each step.
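For what it's worth, here is a small Python sketch (my own illustration, not from the original answer) of the column-extraction step on a finite truncation, where every move is a finite sum and the conservation invariant is easy to check:

```python
import random

random.seed(0)
M = N = 6
a = [[random.random() for _ in range(N)] for _ in range(M)]  # nonnegative entries

row_total = sum(sum(row) for row in a)  # rows first, then row totals

col_sums, remaining = [], [row[:] for row in a]
for j in range(N):
    # take the first remaining entry out of every row and set it aside
    col_sums.append(sum(remaining[i].pop(0) for i in range(M)))
    leftover = sum(sum(row) for row in remaining)
    # invariant: columns extracted so far + leftover == original total
    assert abs(sum(col_sums) + leftover - row_total) < 1e-9

print(row_total, sum(col_sums))  # equal: columns first gives the same value
```

In the infinite setting each of these finite equalities survives the two limits precisely because all the quantities involved increase monotonically.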