Why doesn't [01-12] range work as expected?
Solution 1:
You seem to have misunderstood how character classes definition works in regex.
To match any of the strings 01
, 02
, 03
, 04
, 05
, 06
, 07
, 08
, 09
, 10
, 11
, or 12
, something like this works:
0[1-9]|1[0-2]
References
-
regular-expressions.info/Character Classes
- Numeric Ranges (have many examples on matching strings interpreted as numeric ranges)
Explanation
A character class, by itself, attempts to match one and exactly one character from the input string. [01-12]
actually defines [012]
, a character class that matches one character from the input against any of the 3 characters 0
, 1
, or 2
.
The -
range definition goes from 1
to 1
, which includes just 1
. On the other hand, something like [1-9]
includes 1
, 2
, 3
, 4
, 5
, 6
, 7
, 8
, 9
.
Beginners often make the mistakes of defining things like [this|that]
. This doesn't "work". This character definition defines [this|a]
, i.e. it matches one character from the input against any of 6 characters in t
, h
, i
, s
, |
or a
. More than likely (this|that)
is what is intended.
References
- regular-expressions.info/Brackets for Grouping and Alternation with the vertical bar
How ranges are defined
So it's obvious now that a pattern like between [24-48] hours
doesn't "work". The character class in this case is equivalent to [248]
.
That is, -
in a character class definition doesn't define numeric range in the pattern. Regex engines doesn't really "understand" numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5}
matches between 3 and 5 a
).
Range definition instead uses ASCII/Unicode encoding of the characters to define ranges. The character 0
is encoded in ASCII as decimal 48; 9
is 57. Thus, the character definition [0-9]
includes all character whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0
, 1
, ..., 9
.
See also
- Wikipedia/ASCII
Another example: A to Z
Let's take a look at another common character class definition [a-zA-Z]
In ASCII:
-
A
= 65,Z
= 90 -
a
= 97,z
= 122
This means that:
-
[a-zA-Z]
and[A-Za-z]
are equivalent - In most flavors,
[a-Z]
is likely to be an illegal character range- because
a
(97) is "greater than" thanZ
(90)
- because
-
[A-z]
is legal, but also includes these six characters:-
[
(91),\
(92),]
(93),^
(94),_
(95),`
(96)
-
Related questions
- is the regex [a-Z] valid and if yes then is it the same as [a-zA-Z]
Solution 2:
A character class in regular expressions, denoted by the [...]
syntax, specifies the rules to match a single character in the input. As such, everything you write between the brackets specify how to match a single character.
Your pattern, [01-12]
is thus broken down as follows:
- 0 - match the single digit 0
- or, 1-1, match a single digit in the range of 1 through 1
- or, 2, match a single digit 2
So basically all you're matching is 0, 1 or 2.
In order to do the matching you want, matching two digits, ranging from 01-12 as numbers, you need to think about how they will look as text.
You have:
- 01-09 (ie. first digit is 0, second digit is 1-9)
- 10-12 (ie. first digit is 1, second digit is 0-2)
You will then have to write a regular expression for that, which can look like this:
+-- a 0 followed by 1-9
|
| +-- a 1 followed by 0-2
| |
<-+--> <-+-->
0[1-9]|1[0-2]
^
|
+-- vertical bar, this roughly means "OR" in this context
Note that trying to combine them in order to get a shorter expression will fail, by giving false positive matches for invalid input.
For instance, the pattern [0-1][0-9]
would basically match the numbers 00-19, which is a bit more than what you want.
I tried finding a definite source for more information about character classes, but for now all I can give you is this Google Query for Regex Character Classes. Hopefully you'll be able to find some more information there to help you.
Solution 3:
This also works:
^([1-9]|[0-1][0-2])$
[1-9]
matches single digits between 1 and 9
[0-1][0-2]
matches double digits between 10 and 12
There are some good examples here
Solution 4:
The []
s in a regex denote a character class. If no ranges are specified, it implicitly ors every character within it together. Thus, [abcde]
is the same as (a|b|c|d|e)
, except that it doesn't capture anything; it will match any one of a
, b
, c
, d
, or e
. All a range indicates is a set of characters; [ac-eg]
says "match any one of: a
; any character between c
and e
; or g
". Thus, your match says "match any one of: 0
; any character between 1
and 1
(i.e., just 1
); or 2
.
Your goal is evidently to specify a number range: any number between 01
and 12
written with two digits. In this specific case, you can match it with 0[1-9]|1[0-2]
: either a 0
followed by any digit between 1
and 9
, or a 1
followed by any digit between 0
and 2
. In general, you can transform any number range into a valid regex in a similar manner. There may be a better option than regular expressions, however, or an existing function or module which can construct the regex for you. It depends on your language.
Solution 5:
Use this:
0?[1-9]|1[012]
- 07: valid
- 7: valid
- 0: not match
- 00 : not match
- 13 : not match
- 21 : not match
To test a pattern as 07/2018 use this:
/^(0?[1-9]|1[012])\/([2-9][0-9]{3})$/
(Date range between 01/2000 to 12/9999 )