What does the term "reason about" mean in computer science?
While learning functional programming, I keep coming across the term "reason about", especially in the context of pure functions and/or referential transparency. Can somebody explain what exactly this means?
Typically, when writing a program, your job doesn't end with merely writing the code; you also want to know what properties your code exhibits. You can arrive at these properties in two ways: by logical analysis or by empirical observation.
Examples of such properties include:
- correctness (does the program do what it is supposed to)
- performance (how long does it take)
- scalability (how performance changes as the input grows)
- security (can the algorithm be maliciously misused)
When you measure these properties empirically, you get results with limited precision. Mathematically proving these properties is therefore far superior; however, it is not always easy to do. Functional languages typically have, as one of their design goals, making mathematical proofs of these properties more tractable. This is what is typically meant by reasoning about programs.
For individual functions or smaller units, the above applies as well, but sometimes the author simply means thinking about or designing the algorithm. It depends on the particular usage.
As an aside, some examples of how one might reason about some of these things and how empirical observation can be made:
Correctness: We can prove that code is correct if we can show equationally that it does what it is supposed to do. So for a sort function, if we can show that any list it returns has the property of being sorted, we know our code is correct. Empirically, we can create a unit test suite where we give our code example inputs and check that it produces the desired outputs.
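To make that concrete, the "sorted" property can be written down as an executable predicate and spot-checked on example inputs (a tiny empirical test suite). `isSorted` is a helper name introduced here for illustration, not a standard library function; a proof would instead show the property holds for every possible input:

```haskell
import Data.List (sort)

-- The "sorted" property as a predicate: every adjacent pair
-- of elements is in order. Empty and singleton lists are
-- trivially sorted because zipWith produces no pairs for them.
isSorted :: Ord a => [a] -> Bool
isSorted xs = and (zipWith (<=) xs (drop 1 xs))

-- Empirical check on a handful of examples:
--   all (isSorted . sort) [[3,1,2], [], [5,5,1]]  evaluates to True
```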
Performance & Scalability: We can analyze our code and prove performance bounds of the algorithm, so that we know how the time it takes depends on the size of the input. Empirically, we can benchmark our code and see how quickly it actually runs on a particular machine. We can also perform load testing to see how much input our machine/algorithm can handle before it becomes impractical.
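A crude sketch of such an empirical measurement might look like the following (`timeSort` is a name invented here; a real benchmark would use a dedicated library such as criterion, which handles laziness and measurement noise properly):

```haskell
import Data.List (sort)
import System.CPUTime (getCPUTime)

-- CPU time, in seconds, taken to sort a descending list of length n.
-- Forcing `last sorted` makes sure the whole sort actually runs
-- between the two clock readings, so laziness doesn't make it look free.
timeSort :: Int -> IO Double
timeSort n = do
  start <- getCPUTime
  let sorted = sort [n, n - 1 .. 1]
  last sorted `seq` return ()
  end <- getCPUTime
  return (fromIntegral (end - start) / 1e12)  -- picoseconds -> seconds
```

Running this at several input sizes and comparing the timings is the empirical counterpart of proving an asymptotic bound.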
Reasoning about code, in the loosest sense of the phrase, means thinking about your code and what it really does (not what you think it should do). It means
- being aware of the way your code behaves when you throw data at it,
- knowing what things you can refactor without breaking it, and
- keeping tabs on what optimisations can be performed,
among other things. For me, the reasoning part plays the biggest role when I'm debugging or refactoring.
To use an example you mentioned: referential transparency helps me a lot when I'm trying to figure out what's wrong with a function. Referential transparency guarantees that as I'm poking around with the function, giving it different arguments, it will react the same way it does inside my program. It doesn't depend on anything other than its arguments. That makes the function easier to reason about, as opposed to an imperative language where the function may depend on some external variable that can change under my nose.
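As a sketch of that difference (`scale` and `scaleImpure` are names invented here for illustration): a referentially transparent function's result depends only on its arguments, while a function reading mutable state can answer differently from one call to the next:

```haskell
import Data.IORef

-- Referentially transparent: the result depends only on the
-- arguments, so every call with the same arguments behaves identically.
scale :: Int -> Int -> Int
scale factor x = factor * x

-- Not referentially transparent: the result also depends on hidden
-- mutable state, so the "same" call can give different answers over time.
scaleImpure :: IORef Int -> Int -> IO Int
scaleImpure ref x = do
  factor <- readIORef ref
  return (factor * x)
```

`scale 3 10` is 30 at every call site, forever; `scaleImpure ref 10` changes its answer whenever something else writes to `ref`, which is exactly the "variable changing under my nose" problem.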
Another way of looking at it (and this is more helpful when refactoring) is that the more you know your code satisfies certain properties, the easier it becomes to reason about. I know, for example, that
map f (map g xs) === map (f . g) xs
This is a useful property I can just apply directly when I'm refactoring. The fact that I can state such properties of Haskell code makes it easier to reason about. I could try to state this property in a Python program, but I would be much, much less confident of it, because if I'm unlucky in my choices of f and g, the results may vary wildly.
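That map-fusion law can itself be stated as an executable predicate and spot-checked (`fusionHolds` is a name introduced here for illustration; in Haskell the real guarantee comes from purity, not from the spot check):

```haskell
-- The map fusion law used when refactoring:
--   map f (map g xs) == map (f . g) xs
-- Purity means this holds for all (total) f, g, and xs, which is
-- what makes applying it during a refactor safe.
fusionHolds :: Eq c => (b -> c) -> (a -> b) -> [a] -> Bool
fusionHolds f g xs = map f (map g xs) == map (f . g) xs
```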
Informally, it means "being able to tell what a program will do just by looking at the code." This can be surprisingly difficult in most languages, due to side effects, casting, implicit conversions, overloaded functions and operators, etc. In other words, when you can't reason about code using just your brain, you have to run it to see what it will do for a given input.
Usually when people say "reasoning" they mean "equational reasoning", which means proving properties about your code without ever running it.
These properties can be very simple. For example, given the following definitions of (.) and id:
id :: a -> a
id x = x
(.) :: (b -> c) -> (a -> b) -> (a -> c)
(f . g) = \x -> f (g x)
... we might then want to prove that:
f . id = f
This is easy to prove because:
(f . id) = \x -> f (id x) = \x -> f x = f
Notice how I've proven this for all f. This means the property is always true no matter what, so I no longer need to test it in some sort of unit test suite: I know it can never fail.
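For contrast, here is what the (now redundant) empirical check would look like: a spot check over some sample arguments, which can only ever cover finitely many cases, while the equational proof above covers every f and every argument at once (`rightIdentityHolds` is a name invented here for illustration):

```haskell
-- Spot-check the right identity law  f . id = f  pointwise,
-- on a finite list of sample arguments. The proof makes this
-- unnecessary; a test like this could only ever sample the law.
rightIdentityHolds :: Eq b => (a -> b) -> [a] -> Bool
rightIdentityHolds f xs = all (\x -> (f . id) x == f x) xs
```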
"Reasoning about a program" is just "analyzing a program to see what it does".
The idea is that purity simplifies understanding, both by a human changing a program and by a machine compiling a program or analyzing it for broken corner cases.