Is there a deeper reason why exponentiation is not associative?
Solution 1:
I've tried to omit most of the technical details and go for a more intuitive explanation, since I believe one should always try to explain elementary topics like this one in an accessible way. I also mostly used set-theoretic terminology for the sake of familiarity, but the underlying concepts fundamentally come from category/type theory.
The operations of addition, multiplication and exponentiation on natural numbers are important and useful not because they form part of an infinite sequence, but because they mimic natural constructions we can apply to collections of things (after all, natural numbers were originally used for counting). In particular, given two sets $A$ and $B$, we can construct:
-
The disjoint union $A\sqcup B$ containing labeled elements $(1,a)$ and $(2,b)$ where $a\in A, b\in B$. The labels $1$ and $2$ tell us which set the element came from, to avoid conflict in case $A$ and $B$ have nonempty intersection.
-
The Cartesian product $A\times B$, whose elements are all pairs $(a,b)$ where $a\in A, b\in B$.
-
The function set $A\to B$, whose elements are all functions $f$ from $A$ to $B$.
If we denote by $|X|$ the number of elements of a set $X$, we have $|A\sqcup B|=|A|+|B|$, $|A\times B| = |A|\cdot |B|$ and $|A\to B| = |B|^{|A|}$, so these are indeed the counterparts of addition, multiplication and exponentiation respectively. In fact function sets are frequently denoted by $B^A$, and sometimes disjoint union is denoted by $+$; I deliberately chose different notations here to distinguish between the arithmetic and set-theoretic settings. (I am excluding the operation of succession because it is fundamentally different from the others, as it is a unary operation instead of binary, so it doesn't make much sense to talk about its associativity or commutativity; however, it does have a counterpart in set theory, namely the disjoint union with a prescribed one-element set).
In this set-theoretic language, arithmetic identities such as $a^{b+c} = a^b \cdot a^c$ turn into isomorphisms $(B\sqcup C)\to A \simeq (B\to A) \times (C\to A)$ between the underlying sets.
There are two main phenomena at play:
Repetition
What does it mean for a set construction to be another construction "repeated"? The following is enough for the purposes of this answer: if we have two constructions $\star$ and $\odot$, we will say that $\star$ is repeated $\odot$ if it is possible to define an operator $\bigodot_{x\in A} B(x)$ taking as inputs a family of sets $B(x)$ indexed by elements of another set $A$, such that the two operations are limiting cases of it, that is,
$$\bigodot_{x \in \{1,2,\ldots,n\}} \: B(x) \simeq ((B(1) \odot B(2)) \odot \ldots) \odot B(n),$$
and
$$\bigodot_{x \in A} \: B \simeq A \star B$$
whenever $B$ does not depend on $x$. (Note that for the big operator to be defined independently of the ordering of $A$, $\odot$ already needs to be commutative and associative up to isomorphism, so addition and multiplication are the only operations in the hyperoperation sequence whose set-theoretic counterparts can be "repeated", at least under this definition of repetition).
In the first case we have an obvious candidate, the generalized disjoint union $\bigsqcup_{x \in A} \: B(x)$ whose elements are pairs $(a,b)$ such that $a \in A$ and $b \in B(a)$. We can check by induction that the two above isomorphisms hold, so we can say that $\times$ is repeated $\sqcup$.
For the second case, we need to use an equivalent$^{(*)}$ definition of the Cartesian product:
- The Cartesian product $A\times B$ can be equivalently defined as the set of functions $f$ from $\{1, 2\}$ to $A \cup B$ such that $f(1) \in A$ and $f(2) \in B$.
We can think of $f$ as the pair $(f(1), f(2))$ encoded as a function of its two slots. Defining the generalized Cartesian product $\prod_{x \in A} \: B(x)$ whose elements are functions $f$ from $A$ to $\bigcup_{x\in A} B(x)$ such that $f(a) \in B(a)$, we can again check that the above properties hold, so $\to$ is repeated $\times$.
Duality
The phenomenon of duality is found everywhere in category theory; each concept has a corresponding opposite or co-concept. For example, the set-theoretic Cartesian product (under the second definition) is an example of what in category theory is called a product, and the disjoint union is the corresponding coproduct. Similarly, the function set corresponds to an exponential object, which is dual to the product in one of its arguments (categories having all three constructions are called bicartesian closed). Actually, to properly express the universal properties of exponential objects (and products under the first definition) would require something stronger than a category, called a Cartesian multicategory (see e.g. this PDF, or this recent MathOverflow answer). Category theory can even handle the big operators defined above, though their definition is more complicated.
To avoid introducing so much new terminology and state the dualities in the most obvious way possible, let me recapitulate what the elements on each construction do, in terms of the information they give you:
-
Labeled element: "I give you $1$ and I give you an element of $A$, or I give you $2$ and I give you an element of $B$".
-
Pair (definition I): "I give you an element of $A$ and I give you an element of $B$".
-
Pair (definition II): "You give me $1$, then I give you an element of $A$, or you give me $2$, then I give you an element of $B$".
-
Function: "You give me an element of $A$, then I give you an element of $B$".
Categorical duality in this case corresponds to replacing one or more of the outputs by inputs or viceversa, that is, swapping some "I give you... and" by "you give me... then". Hence there is an obvious duality between $A\sqcup B$ and $A\times B$ (under definition II), and another partial duality between $A\times B$ (under definition I) and $A\to B$, which affects only the first argument. I don't want to go into much detail, but this duality is manifested in category theory itself by interchanging source and target in the relevant morphisms (which in this context are just functions between sets, or multivariable functions in the case of multicategories).
Categorical duality is a powerful notion, and lets us translate properties of one construction to properties of the other. For example, to prove that $X\sqcup Y \simeq Y\sqcup X$, we note that if we swap the labels $1\leftrightarrow 2$ in the element of $A\sqcup B$ described by "You give me $1$, then I give you an $a$, or you give me $2$, then I give you a $b$", we get an element of $B\sqcup A$, and if we do it again we recover the original element, so we have an isomorphism. Now, applying duality, we immediately find an isomorphism $X\times Y \simeq Y\times X$. It's easy to see that the same argument allows us to translate any equational property of $\sqcup$ alone to a corresponding property of $\times$.
In the case of exponentiation, the partial duality means that not all morphisms are reversed, so in some sense we can only translate arithmetic properties that involve exclusively the exponents or exclusively the bases: defining $g_a(b,c)=(a^b)^c$, we have commutativity $g_a(b,c)=g_a(c,b)$ and something reminiscent of associativity, namely that $g_a(b,c,d,\ldots)$ exists and is well-defined for any number of arguments (but note that $g_a(g_a(b,c),d)\neq g_a(b,g_a(c,d))$). Intuitively, in a multivariable function described by "You give me a $c$, then you give me a $b$, then I give you an $a$", the elements $b$ and $c$ can freely switch places without modifying the overall meaning, but we can't do the same with $a$.
The question remains, what goes wrong if we try to dualize $A\times B$ with respect to both arguments? Taking the cardinality of the resulting construction, we do indeed obtain an operation which has all desirable properties: it is commutative, associative and distributes over $\times$. What is this mysterious operation?
Sadly, the operation turns out to be just $(1^a)^b=1$. The issue is that the corresponding construction receives two inputs but gives no output, and any function that gives no output must be isomorphic to the empty function, the unique member of the singleton set $\mathbf{1}$.
$(*)$: The fact that there are two equivalent definitions of Cartesian product is important, because there are situations where these definitions are not equivalent!
For example, in settings such as intuitionistic linear type theory (whose category-theoretic counterpart would be something like monoidal closed categories, or rather multicategories, as I mentioned above), there are two versions of the product, the tensor product $\otimes$ (definition I) and Cartesian product $\&$ (definition II), such that $\otimes$ is repeated $\sqcup$ and $\to$ is repeated $\&$, but $\otimes \neq \&$. This is because in this setting, the ambient logic (called linear logic) only allows you to call a function exactly once, so given a pair in function form (an element of $\&$) you will never be able to recover both coordinates, only one of them. Thus $\otimes$ needs not always share all the properties of $\sqcup$, and can be, for example, noncommutative (and indeed there is a noncommutative version of linear logic).
This might seem like a strange setting, but it has found applications in fields such as quantum mechanics (because of the so-called no-cloning theorem) and computer science (to handle situations where the availability of resources matters).