Technical terminology for a non-technical audience

While working on documentation for a non-technical audience, I was asked how to handle the term "regular expression."

My colleague asked whether "RegEx", "Regex" or "regex" would be preferable. I recommended "regular expression" on first usage, then "expression" or "pattern" on subsequent usages.

Has anyone else had to tackle this question?

I fear my AP style guide is silent on this topic.


You should call them a pattern. Tell them I said so.


Edit

Apparently my drive-by downvoter didn’t care for “tell ’em I said so”. However, I quite assure you that it is germane, and indeed, a proper reference.

In particular, I said so in the Glossary of Programming Perl [O’Reilly Media], in its 2nd, 3rd, and 4th editions, published in 1995, 2000, and 2012 respectively:

pattern: A template used in pattern matching.

pattern matching: Taking a pattern, usually a regular expression, and trying the pattern various ways on a string to see whether there’s any way to make it fit. Often used to pick interesting tidbits out of a file.

regex: See regular expression.

regular expression: A single entity with various interpretations, like an elephant. To a computer scientist, it’s a grammar for a little language in which some strings are legal and others aren’t. To normal people, it’s a pattern you can use to find what you’re looking for when it varies from case to case. Perl’s regular expressions are far from regular in the theoretical sense, but in regular use they work quite well. Here’s a regular expression: /Oh s.*t./. This will match strings like “Oh say can you see by the dawn's early light” and “Oh sit!”. See Chapter 5.

The referenced Chapter 5 is titled “Pattern Matching”. It is not titled “Regular Expressions”. There are several reasons for that. One is that it is a mouthful. Another, as alluded to in the glossary entries above, is that in formal computational theory, a regular expression is an expression that describes a particular type of formal language: a regular one. And what “regular” means is quite specialized.

But before we can get into that, understand that even “language” here means a specific, technical thing that is not what users of regular English mean by “language”. It means “a possibly infinite set of strings”.

Moreover, a regular language is a language (that is, a set of strings) that can be parsed without requiring any auxiliary space proportionate to the size of the input set. Another way of saying it is that if the language is regular, then it is one that can be parsed using a deterministic finite automaton, or DFA for short.

In theory, there are formal requirements for determining whether a language is regular or not. You can read the link for how to do so, because lamentably enough, I cannot use TeX for the equations needed here in the ELU StackExchange site.

Examples of languages that cannot be parsed by a DFA include such things as palindromes and nested parentheses. Formally put, even something so simple as aⁿbⁿ cannot be solved (parsed) by a DFA, where that means n repetitions of some string a followed by the same number n of repetitions of some other string b.
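As a rough illustration of the aⁿbⁿ point, here is a minimal Python sketch. The names LOOSE and is_anbn are hypothetical, chosen only for this example. The pattern a*b* is regular in the formal sense, and it cannot insist that the two counts be equal; the hand-written check can, but only because it works with a count n that grows with the input.

```python
import re

# A formally regular pattern: any number of a's, then any number of b's.
# Nothing in it can require the two counts to be equal.
LOOSE = re.compile(r"a*b*")


def is_anbn(s: str) -> bool:
    """Return True if s is n a's followed by exactly n b's (n >= 0).

    Hypothetical helper for illustration: it relies on a count n that
    depends on the input length, which is what a DFA cannot keep.
    """
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


if __name__ == "__main__":
    for candidate in ["aabb", "aaabb", "ab", "ba"]:
        loose_ok = LOOSE.fullmatch(candidate) is not None
        print(candidate, loose_ok, is_anbn(candidate))
```

Here "aaabb" is accepted by the loose pattern yet rejected by the counting check, which is the gap the theory describes.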

However, formal definitions and formal theory aside, the patterns used by modern programming languages and tools can often enough indeed match such things. Indeed, as soon as you write a pattern as simple as (a+).*\1, you have violated the rules for what a DFA can solve, because you just required unbounded auxiliary storage to solve the back reference.
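For the curious, the back reference can be tried out directly. This is only a sketch using Python's re module rather than Perl, and the name BACKREF is made up for the example; the pattern itself is the one quoted above.

```python
import re

# The pattern from the text: a run of a's, then anything, then the
# very same run of a's again. The engine has to remember what group 1
# captured, however long it was, in order to check the \1 at the end.
BACKREF = re.compile(r"(a+).*\1")

hit = BACKREF.fullmatch("aa-b-aa")   # the run "aa" appears again at the end
miss = BACKREF.fullmatch("aa-b")     # no repeated run closes the string

if __name__ == "__main__":
    print(hit.group(1) if hit else None)
    print(miss)
```

The first string matches with the captured run being "aa"; the second does not match at all, because nothing after the captured run repeats it at the end of the string.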

So theory gets in the way of practice here. The practice is that regular users (for the regular sense of regular, not the irregular one) continue to use in their daily language (here for the first time in this article at last meaning the regular thing most people mean by “language”) the term “regular expressions” long after those patterns had ceased to be such.

Plus no matter how you define it, a “regular expression” is a mouthful. It’s six syllables, and neither word means what people think it means. Both “regex” and “pattern” address the length concern, but only “pattern” alone addresses the formal concern.

It is for these reasons, amongst others, that I prefer the term “pattern”. Indeed, all told, the term “regular expression” occurs only 197 times in the 4th Edition of Programming Perl, while “pattern” occurs 541 times. Furthermore, “regex” occurs 156 times. Replacing the shorter terms with the longer one would not only have been technically inaccurate, it would have required more tree-killing and back-breaking madness, and I might have had to resort to a different binding technology, as the book is already at risk of being bigger than a breadbox.

And that is why I said, “tell them I said so”. I really did say so, and from an actual position of authority on the matter. The laconic response that my downvoter seems to have confused for cavalier flippancy was nothing more, nor one whit less, than genuine compassion for the reader. I was being compassionate in my brevity. Apparently compassion is not here universally valued, nor, by some, brevity; and so, with this brief edit, do I set both such matters aright.

Or you can just tell ’em I said so. Your choice.


There are several shortened versions of the term "regular expression", such as "regex" and "regexp".

The regular expression itself is nothing more than a special character string representing a non-standard pattern of matching terms. Its exact behavior depends on the regex library used to implement the matching.

Regular expressions by design are technical. You must be capable of understanding sequences, groups and pattern matching. Otherwise, educating a reader on the subject of regular expressions is pointless.

As a result, I see no reason why your readers could not handle the following.

"A regular expression (regex or regexp for short) is a special text string for describing a search pattern."

Then continue to refer to the term as regex. Children might find that difficult to follow in text, but who would write a children's book on regular expressions?