Regular expression help - comma delimited string
I don't write many regular expressions so I'm going to need some help on the one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
-
123, 4A67, GGG, 767
would be valid. -
12333, 78787&*, GH778
would be invalid -
fghkjhfdg8797<
would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
Posix allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+
):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+
means match 1 or more of these symbols
The ,?
means match 0 or 1 commas (basically, the comma is optional)
The \s*
handles 1 or more spaces after the comma
and finally the outer +
says match 1 or more of the pattern.
This will also match
123 123 abc
(no commas) which might be a problem
This will also match 123,
(ends with a comma) which might be a problem.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x [ "123, $a67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
... print re.match( r, s )
...
<_sre.SRE_Match object at 0xb75c8218>
None
>>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ]
part, but this is probably easier to understand.
Analyzing the highlights:
-
[a-zA-Z0-9 ]+
: capture one or more (but not zero) of the listed ranges, and space. -
(?:[...]+,)*
: In non-capturing parenthesis, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Capturing zero times allows for no comma. -
[...]+
: capture at least one of these. This does not include a comma. This is to ensure that it does not accept a trailing comma. If a trailing comma is acceptable, then the expression is easier:^[a-zA-Z0-9 ,]+
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF
, you have to repeat $LONGSTUFF
:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF
is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h']
type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
- test
- test, test
- test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.