Why are the order-of-operations conventions good?

Children are sometimes taught silly mnemonics like "PEMDAS" to remember conventions on order of operations. (I never heard of "PEMDAS" until long after graduating from college, as far as I can recall. I think it means (1) parentheses, (2) exponentiation, (3) multiplication and division, and (4) addition and subtraction.)

I think it would be better to help them understand why those particular conventions, rather than some others, are a good thing. Maybe even demonstrably optimal by some precisely definable desiderata?

How would one make the case for the usual conventions against possible alternatives?

I think I might say that if one operation distributes over another, it should come first, and we go from left to right because we normally read that way. Maybe I'll post my own answer to this if I'm so inspired at some point.

Solution 1:

Well, since parentheses exist precisely to specify the intended order of operations in case the usual default rules don't cut it, it makes sense that they come first.

As for exponentiation, I'd say that this is a consequence of using superscripts to indicate exponentiation, since those (via font size) provide a natural grouping. It'd certainly be very weird if $a^b + c$ meant $a^{(b+c)}$ instead of $(a^b) + c$, since the different font sizes of $b$ and $c$ indicate that they're somehow on different levels.

As MJD pointed out, though, this argument only applies to the exponent. Font size alone doesn't explain why $a + b^c$ means $a + (b^c)$ and not $(a + b)^c$, nor why $a\cdot b^c$ means $a\cdot(b^c)$ and not $(a\cdot b)^c$. For these, I'd argue that it's also a matter of visual grouping. In both $a\cdot b^c$ and $a + b^c$, the exponent is written extremely close to the $b$, with no symbol separating the two. On the other hand, $a$ and $b$ are separated by either a $+$ or a $\cdot$. Now, for multiplication the dot may be omitted, but it doesn't have to be; since $ab$ and $a\cdot b$ are equivalent, one naturally wants $ab^c$ and $a\cdot b^c$ to be equivalent too.

For multiplication, division, addition, and subtraction, I always felt that the choice is somewhat arbitrary. Having said that, one reason that does speak in favour of having multiplication take precedence over addition is that one is allowed to leave out the dot and simply write $ab$ instead of $a\cdot b$. Since this isn't allowed for addition, in a lot of cases the terms which are multiplied will be closer together than those which are added, so most people will probably recognize them as "belonging together".

You may then ask "how come we're allowed to leave out the dot, but not the plus sign?" This, I believe, is a leftover from times when equations were stated in natural language. In most languages, you say something like "three apples" to indicate, well, three apples. In other words, you simply prefix a thing by a number to indicate multiple instances of that thing. This property of natural languages is mimicked in equations by allowing one to write $3x$ with the understanding that it means "3 of whatever $x$ is".

Solution 2:

I suggest there is a fundamental reason we do exponentiation before multiplication before addition: we do the "most powerful" operation first. I don't have any clear evidence to cite, though in pre-SE days Dr Math agreed with me that this is the key.

To clarify what I mean by "powerful": the hyperoperation sequence is a sequence of arithmetic operations, starting with the most basic one, finding the successor. For instance, the successor of 5 is 6. If I want to add 3 to 5, I have to find that 5's successor is 6, that 6's successor is 7, and that 7's successor is 8. In other words, addition of 3 is just succession repeated (iterated) 3 times. So addition is the next operation in the sequence.

If I do iterated addition, I perform a multiplication (e.g. $3 \times 5 = 5+5+5$), so that's next on the list. And iterated multiplication is exponentiation (e.g. $5^3=5\times 5\times 5$). These are considered the "elementary operations". Of course the hyperoperation sequence doesn't stop there: iterated exponentiation is tetration, e.g. ${}^{3}2 = 2^{2^{2}}=2^4=16$ (had to pick one with small numbers as they get very big very quickly!). Next come pentation (iterated tetration), hexation (iterated pentation)... Knuth invented a lovely system of up-arrow notation to represent these hugely powerful operations in a neat way. Learn it, and you can now win all those "who can write down the biggest number" games that kids play!
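As a rough illustration of the iteration just described, here is a minimal Python sketch in which each operation is built by repeating the previous one. The function names (`succ`, `add`, `mul`, `power`, `tetrate`) are made up for this example, and it only makes sense for non-negative integers.

```python
def succ(n):
    """The most basic operation: the successor of n."""
    return n + 1


def add(a, b):
    """a + b as succession repeated b times, starting from a."""
    result = a
    for _ in range(b):
        result = succ(result)
    return result


def mul(a, b):
    """a * b as 'add a' repeated b times, starting from 0."""
    result = 0
    for _ in range(b):
        result = add(result, a)
    return result


def power(a, b):
    """a ** b as 'multiply by a' repeated b times, starting from 1."""
    result = 1
    for _ in range(b):
        result = mul(result, a)
    return result


def tetrate(a, b):
    """A tower of b copies of a, built from the top down, starting from 1."""
    result = 1
    for _ in range(b):
        result = power(a, result)
    return result


print(add(5, 3))      # 8
print(mul(5, 3))      # 15
print(power(5, 3))    # 125
print(tetrate(2, 3))  # 16
```

Note that each loop has to start from a different value (`a`, then 0, then 1), which is only a feature of this particular way of writing the iteration.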

So what's my point? There really is a clear and well-defined sense in which addition, multiplication and exponentiation belong on a sequence of increasing power. Our order of operations is defined so that we do the most powerful first, unless parentheses tell us to do things differently. It makes intuitive sense to me that more powerful operations should have priority, although if on some other world the least important ones get done first, people used to that system may see it as intuitive too! At any rate, this seems more convincing to me than typesetting convention (a circular argument, since different orders of operation may have led to different notation?) or compliance with certain practical examples (e.g. those in which we multiply first and then add, while ignoring the equally abundant practical examples where we add first and then multiply, and so have to resort to parentheses to express the order correctly).

There are many other interesting ways to order arithmetic around. For anybody who hasn't done so, have a play with Reverse Polish notation sometime! I found this really clarified the importance of order of operations to me (you need to think carefully what to type in first), as well as "how a computer/calculator thinks".
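For readers who have not met Reverse Polish notation, the following is a small Python sketch of a stack-based evaluator. It is only an assumption about how one might implement the idea, not a description of any particular calculator; the name `eval_rpn` and the space-separated input format are invented for this example.

```python
import operator

# Binary operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def eval_rpn(expression):
    """Evaluate a space-separated RPN string such as '2 3 4 * +'."""
    stack = []
    for token in expression.split():
        if token in OPS:
            # An operator consumes the two most recent values.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(eval_rpn("2 3 4 * +"))  # 14.0, i.e. 2 + (3 * 4)
print(eval_rpn("2 3 + 4 *"))  # 20.0, i.e. (2 + 3) * 4
```

The order in which things are typed fully determines the grouping, so no precedence rules and no parentheses are needed.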

Final thought: why do I find it intuitive to do most powerful first, other than being accustomed to it? A consequence of my earlier answer is that the more powerful operations are defined by iteratively applying less powerful operations, so maybe "more to less" is more natural. Someone who tries to sort out the less powerful operations first, and "move up" to the more powerful operations, will still end up breaking down the higher operations back into lower ones. In that sense "less to more" doesn't work so well. In fact if you really want to, you can break all the operations down to the successor function $\operatorname{succ}(n)=n+1$, its inverse, the predecessor function $\operatorname{pred}(n)=n-1$, and $H_n(a,b)$ defined by:

$$H_n(a, b) = \begin{cases} \operatorname{succ}(b) & \text{if } n = 0 \\ a & \text{if } n = 1, b = 0 \\ 0 & \text{if } n = 2, b = 0 \\ 1 & \text{if } n \ge 3, b = 0 \\ H_{\operatorname{pred}(n)}(a, H_n(a, \operatorname{pred}(b))) & \text{otherwise} \end{cases}$$

Setting $n=0,1,2,3,4,\ldots$ gives succession, addition, multiplication, exponentiation, tetration... the whole hyperoperation sequence! In particular $H_0(a, b) = \operatorname{succ}(b)$, $H_1(a, b) = a + b$, $H_2(a, b) = a \times b$, $H_3(a, b) = a^{b} = a\uparrow{b}$ (in Knuth's notation), $H_4(a, b)={}^{b}a=a\uparrow\uparrow{b}$ and so on. Try expanding by hand some of the examples in my answer: $H_0(7,5)= \operatorname{succ}(5)$, $H_1(5,3)=5+3$, $H_2(5,3)=5\times 3$, $H_3(5,3)=5^3$, $H_4(2,3)=2\uparrow\uparrow{3}$. It's quite instructive (once you get on to $H_2$ it's less tedious if you use the fact that $H_1$ means addition, and so on for higher $n$) - one gets to see how the operations all follow from iterating successor and predecessor, and in what sense each is an extension of the previous in the sequence. You'll notice how $b$ acts as a "counter" that ticks down to zero, and understand why you need the $a$, 0 and 1 for the cases $n=1,n=2,n\ge 3$. (Roughly it's to handle inconsistencies in the manner each operation repeats the previous one. When I say $5+3$ is succession repeated 3 times, I start applying those successions to the 5. When I say $5 \times 3$ is adding 5 on, 3 times, it is being added on to zero. And $5^3$ is only "multiplying by 5, 3 times" if I start at 1! I have found these inconsistencies to be a source of confusion to high school students.)
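If expanding by hand gets tedious, the definition of $H_n$ can be transcribed almost line by line into Python. This is only a sketch: the name `hyper` is invented here, the inputs are assumed to be non-negative integers, and the naive recursion is far too slow and too deep for anything but small arguments.

```python
def hyper(n, a, b):
    """Direct transcription of the recursive definition of H_n(a, b)."""
    if n == 0:
        return b + 1               # succ(b)
    if b == 0:
        if n == 1:
            return a               # a + 0 = a
        if n == 2:
            return 0               # a * 0 = 0
        return 1                   # n >= 3: a^0 = 1, and so on
    # Otherwise: apply the previous operation once more,
    # with the counter b ticking down towards zero.
    return hyper(n - 1, a, hyper(n, a, b - 1))


print(hyper(0, 7, 5))  # 6   = succ(5)
print(hyper(1, 5, 3))  # 8   = 5 + 3
print(hyper(2, 5, 3))  # 15  = 5 * 3
print(hyper(3, 5, 3))  # 125 = 5^3
print(hyper(4, 2, 3))  # 16  = 2 tetrated to 3
```

Everything here bottoms out in `b + 1` and `b - 1`, which is the sense in which all the operations reduce to successor and predecessor.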

Solution 3:

I think that the conventions now in use are not necessarily better than any other possible convention; that they are what they are simply is a codification of historical usage; and the way to “make the case for the usual conventions against possible alternatives” is to observe that to change them would cause an untold amount of unnecessary difficulty, frustration, and anger.

Solution 4:

Order of operations conventions are rooted in reality. At the heart of it all, mathematics tries to model our everyday experience, and our order of operations reflects that.

First of all, let's agree that addition and subtraction are really the same thing, and multiplication and division (when defined) are also really the same thing, so I can just say "addition" when I mean "addition and subtraction" and "multiplication" when I mean "multiplication and division."

Why should multiplication take precedence over addition? Suppose Alice were to give me $5$ bags of $8$ apples, and Bob were to give me $3$ bags of $4$ apples. How many apples do I have?

A good guess is that the number of apples I have is equal to the number that Alice gave me plus the number that Bob gave me. Alice gave me $5 \cdot 8=40$ apples, and Bob gave me $3\cdot 4=12$ apples; therefore I should have received $40+12=52$ apples. This is an observation from everyday experience: if I get $x$ things from Alice and $y$ things from Bob, I received $x+y$ things in total.

Let's look at how varying the order of operations plays out! The number is given by the expression $5\cdot 8+3\cdot 4$. If multiplication comes first, we get $52$ apples; if addition comes first, then we get $220$ apples; and if neither takes precedence and we proceed left to right, we get $172$ apples. Via experiment, we have obtained evidence suggesting that doing multiplication first is wiser than the other options.
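The three results above can be checked with a short Python sketch. The helper `evaluate` and its way of describing a convention (a list of operator groups, tightest-binding first) are assumptions made for this illustration, not a standard API.

```python
import operator

APPLY = {"+": operator.add, "*": operator.mul}


def evaluate(tokens, precedence):
    """Evaluate a flat list like [5, '*', 8, '+', 3, '*', 4].

    `precedence` lists groups of operators, tightest-binding first.
    Operators in the same group are applied left to right.
    """
    tokens = list(tokens)
    for group in precedence:
        reduced = [tokens[0]]
        # Walk over (operator, operand) pairs from left to right.
        for op, operand in zip(tokens[1::2], tokens[2::2]):
            if op in group:
                reduced[-1] = APPLY[op](reduced[-1], operand)
            else:
                reduced.extend([op, operand])
        tokens = reduced
    if len(tokens) != 1:
        raise ValueError("some operator is missing from `precedence`")
    return tokens[0]


apples = [5, "*", 8, "+", 3, "*", 4]
print(evaluate(apples, [{"*"}, {"+"}]))  # 52:  multiplication first
print(evaluate(apples, [{"+"}, {"*"}]))  # 220: addition first
print(evaluate(apples, [{"+", "*"}]))    # 172: strictly left to right
```

Only the first convention reproduces the count obtained by reasoning about the bags directly.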

The reasoning for exponentiation is similar. Why should $3\cdot 2^3$ be $3\cdot 8$ instead of $6^3$? Imagine I had $3$ identical cubes of side length $2$ cm. What is the total volume of the cubes? Volume is additive, so I should add together the volumes of each cube. Each cube has volume $2^3~\text{cm}^3$, and there are 3 cubes; thus the total volume is $24~\text{cm}^3$. I would be very surprised if you told me these 3 cubes took up $6^3=216$ cubic centimeters - they certainly didn't look that big when I first got them!

I'll let you imagine a reason in the same vein for exponentiation before addition. For parentheses, fgp has given the essential answer - the entire purpose of parentheses is to group things and make sure operations inside them happen independently of the rest of the expression. And you can make a similar real-life analogy here too.

Of course, once we start talking about real numbers things get a little more delicate. However, historically these operations started out defined for just the natural numbers (no zero or negative numbers yet), then became generalized. And when they were generalized, the generalization happened in a way that preserved these properties. (Michael Spivak gives a charming explanation of this sort when he defines the exponential function in Calculus, 4th edition.) And this is why the group and field axioms are what they are!

Moral: Mathematics is an experimental science at heart.

Solution 5:

The "order of operations" is just a (more or less) arbitrary convention, mostly used to simplify writing.

It is definitely not universal. For example, programming languages have subtly different takes on this, to the eternal "fun" of the programmers who get bitten. There are also extreme examples, like APL, where all operations are done strictly right-to-left unless parentheses say otherwise; i.e., in APL a * b + c and a * (b + c) are the same. In LISP and its many derivatives, every operation has to be expressed with parentheses, like (+ (* a b) c).
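To make the right-to-left rule concrete, here is a small Python sketch that mimics it on a flat token list. It is not APL, just an imitation of the rule described above; the name `eval_right_to_left` is invented, and `*` stands for multiplication as in the text.

```python
import operator

APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_right_to_left(tokens):
    """Evaluate [a, op, b, op, c, ...] so that every operator takes
    everything to its right as its right-hand operand."""
    result = tokens[-1]
    # Step backwards over the operators, from the rightmost to the leftmost.
    for i in range(len(tokens) - 2, 0, -2):
        op, left = tokens[i], tokens[i - 1]
        result = APPLY[op](left, result)
    return result


print(eval_right_to_left([2, "*", 3, "+", 4]))  # 14, i.e. 2 * (3 + 4)
print(2 * 3 + 4)                                # 10 under Python's own rules
```

The same symbols in the same order give 14 under one convention and 10 under the other, which is exactly the kind of discrepancy that catches people out.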
