Why is the distributive property so pervasive in mathematics?

I just read this post which gives a geometric argument for the distributive law for real numbers, which I liked: https://math.stackexchange.com/a/466397/241685

However the distributive law comes up everywhere, not just for numbers. Set intersection distributes over union, inner products distribute over vector addition, wedge products distribute over vector addition, ring multiplication distributes over ring addition, matrix multiplication distributes over matrix addition, etc.

Is it that we intentionally study systems which generalize the distributive law for numbers, or is it that the interesting systems we study happen to generalize the distributive law? And either way, why is this the case?


Let's start very fundamentally. A lot of mathematics is concerned with sets of objects and sets of functions mapping one or more of those objects to another one.

Now let's look at the case where we have a set $X$ and a function $f$ that takes two elements of that set and gives a third one, that is, $f:X\times X\to X$. Note that this is just an arbitrary function acting on two elements of an arbitrary set. Now if we need such a function often, we like to write $f(x,y)$ in a slightly simpler form. This is usually something like $x*y$, $x\cdot y$ or simply $xy$, but especially if $f(x,y)=f(y,x)$ (and in some rare cases, even if not), it is not uncommon to write $f(x,y)=x+y$. Well, we usually want a few other conditions, but at the moment we don't even need those. So at this point, all we care about is that $x+y$ takes two elements from some set $X$ and gives another element of the same set $X$.
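
To make this concrete, here is a toy Python sketch (my example, not part of the original answer): the set and the rule are completely arbitrary, and writing the operation as $+$ is purely notational.

```python
# A binary operation on an arbitrary set: here X is the set of strings
# and f is concatenation.  Nothing about X needs to be numeric.
def f(x: str, y: str) -> str:
    return x + y

# All we ask so far: f takes two elements of X and returns one of X.
print(f("ab", "cd"))  # prints 'abcd'
```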

Now let's look at two sets $X$ and $Y$, both coming with their own $+_X$ and $+_Y$ (this includes the case that $X=Y$ and $+_X=+_Y$). Now as with any two sets, we can consider functions from $X$ to $Y$. However, certain functions are special: namely those functions $\phi:X\to Y$ which respect our additions, that is, $\phi(x+y)=\phi(x)+\phi(y)$. Now if we do this a lot, we may like to omit the parentheses wherever possible, that is, write the function application as a product of the function and the argument. The most prominent example of this is operators in linear algebra. If we do that, the above law reads: $$\phi(x+y) = \phi x + \phi y$$ Voila, a distributive law!
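
Here is a small sketch of mine illustrating this: take $X=Y=$ pairs of numbers with componentwise addition, and let $\phi$ be a linear operator on them.

```python
# Componentwise addition on pairs of numbers.
def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

# A map that respects addition: the linear operator (x, y) -> (2x + y, 3y).
def phi(u):
    return (2 * u[0] + u[1], 3 * u[1])

u, v = (1, 2), (4, -1)
# phi(u + v) == phi(u) + phi(v): the distributive-looking law above.
assert phi(add(u, v)) == add(phi(u), phi(v))
```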

But wait, there is more: given two functions $\phi:X\to Y$ and $\chi:X\to Y$, it is natural to ask about the function $x\mapsto \phi(x)+\chi(x)$. This gives an operation on functions from $X$ to $Y$ that takes two functions and returns a new function: the function that applies $\phi$ and $\chi$ to its argument and then adds the results. It is natural to consider this operation on functions also as an addition (the technical term is "pointwise addition"), and again denote it by $+$. So, using the product notation for function application, we have, by definition, $$(\phi + \chi)x = \phi x + \chi x$$ Voila, another distributive law!
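
In code (again my example), pointwise addition is literally the definition, so this second law holds by construction:

```python
# Pointwise addition of two functions from X to Y.
def pointwise_add(phi, chi):
    """Return the function x -> phi(x) + chi(x)."""
    return lambda x: phi(x) + chi(x)

phi = lambda x: 2 * x
chi = lambda x: x * x

# (phi + chi)(x) == phi(x) + chi(x):
sigma = pointwise_add(phi, chi)
assert sigma(5) == phi(5) + chi(5)  # 35 == 10 + 25
```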

OK, so now where do the numbers come in? Well, let's now consider one of the additional requirements which I didn't talk about above: Namely we also require of an operation we want to call $+$ that it is associative, that is, $(a+b)+c = a+(b+c)$. This means that when you add up many things, you can basically omit the parentheses.

This in particular means that if you repeatedly add something to itself, like $b+b+b+\dots+b$, all that matters is how many $b$s you've got in that sum. Therefore we again introduce a new multiplication, this time with a positive integer $n$: $$nb = \underbrace{b+b+\dots+b}_{n\text{ terms}}$$ Note that this could also be seen as interpreting the integer $n$ as a function that takes an argument $b$ and returns a sum of $n$ $b$s. That is, we have functions \begin{align} 1&:x\mapsto x\\ 2&:x\mapsto x+x\\ 3&:x\mapsto x+x+x\\ &\vdots \end{align} Now we can ask: Are these "number functions" functions that respect the addition structure? Well, let's try e.g. with $3$: \begin{align} 3(x+y) &= (x+y)+(x+y)+(x+y) && \text{Definition of multiplication with $3$}\\ &= x+y+x+y+x+y && \text{because we require associativity}\\ &= x+x+x+y+y+y && \text{because we earlier required *commutativity* ($x+y=y+x$)}\\ &= 3x + 3y && \text{again, definition of multiplication} \end{align} The same works of course with any $n$ (a strict mathematical proof is slightly more involved). So we have: $$n(x+y) = nx + ny$$ Again, a distributive law.
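
As a sanity check (a sketch of mine, not from the original answer), one can define multiplication by a positive integer purely as repeated addition and verify the law on an example:

```python
# Multiplication by a positive integer n, defined as repeated addition
# in any structure with a commutative, associative addition.
def times(n, b, add):
    """b + b + ... + b with n terms, using the supplied addition."""
    result = b
    for _ in range(n - 1):
        result = add(result, b)
    return result

add = lambda u, v: (u[0] + v[0], u[1] + v[1])  # componentwise + on pairs
x, y = (1, 2), (3, 4)
# n(x + y) == nx + ny, as derived above for n = 3:
assert times(3, add(x, y), add) == add(times(3, x, add), times(3, y, add))
```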

Now by interpreting numbers as functions as above, we get an addition of those functions, namely pointwise addition. And of course we have the normal addition of numbers. But it is not hard to check that those two additions give identical results, that is, they can be regarded as the same addition.

But for pointwise addition, we already know that there is a distributive law, which therefore carries over to multiplication by numbers: $$(m+n)x = mx + nx$$
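
Using the same `times()` and `add` as in the previous sketch (still my example), this carried-over law can be checked directly:

```python
# Pointwise addition of the "number functions" m and n agrees with
# ordinary number addition, so (m + n)x == mx + nx.
def times(n, b, add):
    result = b
    for _ in range(n - 1):
        result = add(result, b)
    return result

add = lambda u, v: (u[0] + v[0], u[1] + v[1])
m, n, x = 2, 3, (1, 2)
assert times(m + n, x, add) == add(times(m, x, add), times(n, x, add))
```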

OK, now let's consider the case where $X$ is actually a set of numbers itself, and $+$ is the normal addition of numbers. Then quite obviously, you recover the usual multiplication, and the corresponding distributive law.

So distributive laws occur quite naturally in many contexts: by requiring only that a binary operation exists, without any further requirements, and asking about operations that respect that operation; and, for operations involving numbers, with only the additional requirements that addition be associative and commutative (two things we commonly demand of operations we call "addition").


The distributive property is just a compatibility condition between operations. If you have two operations, it is logical to ask for a condition that will allow these operations to fit smoothly together: if you have an addition and a multiplication, it is quite natural to require that $$(a+b)v=av+bv.$$ Clearly this is not the only possible way to combine two operations, but it is quite natural. We could also not require them to be compatible at all, but the smooth combination of two different structures allows wonderful properties to arise (look at the compatibility requirement between the group structure and the manifold structure that led to the study of Lie groups). So it is logical to impose a compatibility condition and study what happens when it is satisfied.

An extremely efficient way to see this specific issue of distributivity is through the relation between modules and representations, but I'm not sure if you're familiar with this language. If so, I can edit the post to explain.


EDIT: I will explain how to see this compatibility using modules, for completeness' sake. Modules are a generalization of vector spaces in which the scalars form a ring instead of a field (meaning the scalars need not have multiplicative inverses); for our purposes you can think of them simply as vector spaces, and everything will work out anyway. As you may guess, being a generalization of vector spaces, they are a very fundamental structure in mathematics. So believe me, we are not talking about some exotic, forgotten, esoteric construction; we are just talking, in a different language, about something so basic that it even includes vector spaces as a special case.

I will give you the definition and then analyze where the distributivity comes in (a small sketch after the list below checks the axioms on a concrete example). A module consists of a ring $R$ (which will be our scalar field) acting on an abelian group $M$, which is called the module. The action has to satisfy these laws:

  1. $r\left(m+n\right)=rm+rn$;
  2. $r\left(sm\right)=\left(rs\right)m$;
  3. $\left(r+s\right)m=rm+sm$;
  4. $1m=m$.
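
As promised, here is a minimal Python sketch of mine (not part of the original answer): take $R=\mathbb{Z}$ and $M=\mathbb{Z}/7\mathbb{Z}$ with the action $r\cdot m = rm \bmod 7$, and spot-check all four axioms.

```python
# Z acting on the abelian group Z/7Z makes Z/7Z a Z-module.
import itertools

MOD = 7  # the abelian group M = Z/7Z under addition mod 7

def act(r, m):
    """Action of the ring element r (an integer) on m in Z/7Z."""
    return (r * m) % MOD

for r, s, m, n in itertools.product(range(-3, 4), repeat=4):
    m, n = m % MOD, n % MOD
    assert act(r, (m + n) % MOD) == (act(r, m) + act(r, n)) % MOD  # axiom 1
    assert act(r, act(s, m)) == act(r * s, m)                      # axiom 2
    assert act(r + s, m) == (act(r, m) + act(s, m)) % MOD          # axiom 3
    assert act(1, m) == m                                          # axiom 4

print("all four module axioms hold on the sampled values")
```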

Why did I say this is an efficient way to see distributivity at work? Because if you look closely, there are two distributivity relations here (namely 1 and 3), and it becomes evident that they are compatibility relations between two different structures: the ring which acts and the abelian group which becomes the module. In other words, it makes explicit the different roles of things that are usually identified with each other in other examples. To understand this even better, we can switch to representation notation, making the action of a ring element on a module element explicit, i.e. I will write $\rho(r)m$ instead of $rm$: $$\rho\left(r\right)m=rm.$$ Then the first relation $r\left(m+n\right)=rm+rn$ (which is the linearity condition in vector spaces) becomes $$\rho(r)(m+n)=\rho(r)m+\rho(r)n,$$ exhibiting it as a compatibility condition between the action of $R$ and the group operation on $M$. The third condition $\left(r+s\right)m=rm+sm$ now reads $$\rho(r+s)(m) =\rho(r)(m)+\rho(s)(m),$$ which is just the requirement that the action be a homomorphism of abelian groups (you want something that behaves in $M$ like $R$ does). So, as you can see, this notation brings out the compatibility conditions between the two different structures that stay hidden when you simply ask for the distributive law. Unfortunately, if you're not used to this language, you probably won't fully appreciate the naturality of the construction.
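
Continuing my earlier sketch, the same action can be repackaged as a representation $\rho$ sending each ring element to a map on $M$, making relations 1 and 3 checkable as stated:

```python
# rho: R -> End(M), sending each ring element to a function on M = Z/7Z.
MOD = 7

def rho(r):
    """rho(r) is the endomorphism m -> r*m of the abelian group Z/7Z."""
    return lambda m: (r * m) % MOD

# Relation 1: each rho(r) respects the group operation on M.
assert all(rho(3)((m + n) % MOD) == (rho(3)(m) + rho(3)(n)) % MOD
           for m in range(MOD) for n in range(MOD))

# Relation 3: rho(r + s) equals the pointwise sum of rho(r) and rho(s).
assert all(rho(2 + 5)(m) == (rho(2)(m) + rho(5)(m)) % MOD
           for m in range(MOD))
```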

One last thought to confuse you for good: any ring $R$ can be thought of as a module over itself, just as the complex line can be thought of as a vector space of dimension 1 over the complex field. And that's exactly what you should think of when you think about the distributive law. The point is that any ring contains two radically different structures (multiplicative and additive), and you're trying to glue them together. If you substitute the ring $R$ itself for the module $M$ and repeat the steps I've done, you will notice the difference between these two structures and recognize the compatibility conditions 1 and 3 as exactly what you usually call the distributive laws. Of course the whole thing here is just illustrative, since you tautologically already started with a ring, but it is pedagogically useful for illustrating the different roles of the two parts of the structure.
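
To close the loop with one more sketch of mine: take $R = M = \mathbb{Z}/7\mathbb{Z}$, so the action is the ring's own multiplication, and axioms 1 and 3 become exactly the ring's two distributive laws.

```python
# The ring R = Z/7Z viewed as a module over itself.
MOD = 7

def act(r, m):
    return (r * m) % MOD  # ring multiplication doubling as the action

for r in range(MOD):
    for s in range(MOD):
        for m in range(MOD):
            # axiom 1 <-> left distributivity: r(s + m) = rs + rm
            assert act(r, (s + m) % MOD) == (act(r, s) + act(r, m)) % MOD
            # axiom 3 <-> right distributivity: (r + s)m = rm + sm
            assert act((r + s) % MOD, m) == (act(r, m) + act(s, m)) % MOD
```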