Regular expression help - comma delimited string

I don't write many regular expressions so I'm going to need some help on this one.

I need a regular expression that can validate that a string is an alphanumeric comma delimited string.

Examples:

  • 123, 4A67, GGG, 767 would be valid.
  • 12333, 78787&*, GH778 would be invalid
  • fghkjhfdg8797< would be invalid

This is what I have so far, but it isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$

Any suggestions?


Sounds like you need an expression like this:

^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$

POSIX allows for the more self-descriptive version:

^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$  // allow whitespace

If you're willing to admit underscores, too, search for entire words (\w+):

^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$  // allow whitespace around the comma
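
As a quick check, here is a minimal Python sketch that runs the strict pattern and the whitespace-tolerant variant against the examples from the question. Python's re module does not support POSIX classes such as [[:alnum:]], so the explicit ranges are used instead; the names STRICT, LENIENT and samples are made up for this illustration.

```python
import re

# Strict: items separated by a bare comma, no whitespace allowed.
STRICT = re.compile(r"^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$")
# Lenient: optional whitespace on either side of each comma.
LENIENT = re.compile(r"^[0-9a-zA-Z]+(\s*,\s*[0-9a-zA-Z]+)*$")

samples = [
    "123, 4A67, GGG, 767",    # valid in the question (note the spaces)
    "123,4A67,GGG,767",       # same items without spaces
    "12333, 78787&*, GH778",  # invalid: contains & and *
    "fghkjhfdg8797<",         # invalid: contains <
]

for s in samples:
    print(repr(s), bool(STRICT.match(s)), bool(LENIENT.match(s)))
```

Because the valid example in the question has a space after each comma, only the lenient form accepts it as written. Also note that in Python 3, \w matches non-ASCII letters and digits unless the re.ASCII flag is passed.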

Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$

I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.

The [a-zA-Z0-9]+ means match 1 or more of these symbols. The ,? means match 0 or 1 commas (basically, the comma is optional). The \s* handles 0 or more whitespace characters after the comma, and finally the outer + says match 1 or more repetitions of the whole group.

Note that this will also match 123 123 abc (no commas) and 123, (ends with a comma), either of which might be a problem.
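
A short Python sketch of those caveats, assuming the pattern is used unchanged (the names PATTERN, cases and results are only for illustration):

```python
import re

PATTERN = re.compile(r"^([a-zA-Z0-9]+,?\s*)+$")

cases = [
    "123",                    # single item
    "123, 4A67, GGG, 767",    # valid example from the question
    "12333, 78787&*, GH778",  # invalid example from the question
    "123 123 abc",            # no commas at all
    "123,",                   # trailing comma
]

# Map each input to whether the pattern accepts it.
results = {s: bool(PATTERN.match(s)) for s in cases}
for s, ok in results.items():
    print(repr(s), ok)
```

Only the input containing & and * is rejected; the comma-free and trailing-comma inputs are both accepted, because the comma is optional in every repetition.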


You seem to be lacking repetition. How about:

^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$

I'm not sure how you'd express that in VB.Net, but in Python:

>>> import re
>>> x = ["123, 4A67, GGG, 767", "12333, 78787&*, GH778"]
>>> r = r'^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<re.Match object; span=(0, 19), match='123, 4A67, GGG, 767'>
None
>>>

You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.

Analyzing the highlights:

  • [a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or a space.
  • (?:[...]+,)* : in non-capturing parentheses, match one or more of those characters followed by a comma. Match such sequences zero or more times; matching zero times allows for a list with no comma.
  • [...]+ : match at least one of these characters. This class does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, then the expression is simpler: ^[a-zA-Z0-9 ,]+$ (which also accepts leading and repeated commas).
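
To make the difference concrete, here is a small Python sketch comparing the two forms. The simpler expression is anchored with a trailing $ here so that the whole string is checked; the names STRICT, LOOSE and cases are only for illustration.

```python
import re

STRICT = re.compile(r"^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$")
LOOSE = re.compile(r"^[a-zA-Z0-9 ,]+$")  # commas allowed anywhere

cases = ["123", "123, 4A67", "123,", ",123", "123, ,456"]

strict_results = {s: bool(STRICT.match(s)) for s in cases}
loose_results = {s: bool(LOOSE.match(s)) for s in cases}

for s in cases:
    print(repr(s), strict_results[s], loose_results[s])
```

The strict form rejects the leading and trailing commas, while the loose form accepts every case. Both accept "123, ,456", since a lone space counts as an item when the space is part of the character class.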

Yes, when you want to match comma-separated things where a trailing comma is not legal, and each thing matches $LONGSTUFF, you have to repeat $LONGSTUFF:

$LONGSTUFF(,$LONGSTUFF)*
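
One way to avoid typing the item twice is to let a small helper do the repetition. The following is only a sketch: comma_list, ALNUM and PAIR are hypothetical names, and the key=value item is just an example of a longer $LONGSTUFF.

```python
import re

def comma_list(item: str) -> str:
    """Return a pattern for one or more `item`s separated by commas."""
    # Wrap the item in a non-capturing group so that an alternation
    # inside it cannot leak into the surrounding expression.
    return r"(?:{0})(?:,(?:{0}))*".format(item)

ALNUM = r"[0-9a-zA-Z]+"
PAIR = ALNUM + "=" + ALNUM  # hypothetical key=value item

LIST_RE = re.compile(comma_list(ALNUM))
PAIRS_RE = re.compile(comma_list(PAIR))

print(bool(LIST_RE.fullmatch("123,4A67,GGG")))  # plain list
print(bool(LIST_RE.fullmatch("123,4A67,")))     # trailing comma
print(bool(PAIRS_RE.fullmatch("a=b,c=d")))      # list of pairs
```

fullmatch is used instead of ^ and $ so that the generated pattern can also be embedded in a larger expression.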

If $LONGSTUFF is really long and itself contains comma-separated items, it might be a good idea not to build the regexp by hand and instead let the computer do it for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a Xen configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill (whitespace notwithstanding!):

import re

xend_fudge_item_re = r"""
  e[a-d]x=          #register of the call return value to fudge
  (
    0x[0-9A-F]+ |   #either hardcode the reply
    [10xks]{32}     #or edit the bitfield directly
  )
"""
xend_string_item_re = r"""
  (0x)?[0-9A-F]+:   #leafnum (the contents of EAX before the call)
  %s                #one fudge
  (,%s)*            #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
  \[                #a list of
   '%s'             #string elements
   (,'%s')*         #repeated multiple times
  \]
  $                 #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)

Try the following expression:

/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i

This will work for:

  1. test
  2. test, test
  3. test123,Test 123,test

I would strongly suggest trimming the whitespace at the beginning and end of each item in the comma-separated list.
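
A minimal Python sketch of that trimming step, assuming each item must be purely alphanumeric as in the original question (parse_list and ITEM are hypothetical names):

```python
import re

ITEM = re.compile(r"[a-zA-Z0-9]+")

def parse_list(text):
    """Return the trimmed items, or None if any item is not alphanumeric."""
    # Split on commas first, then strip whitespace around each item.
    items = [part.strip() for part in text.split(",")]
    if all(ITEM.fullmatch(item) for item in items):
        return items
    return None

print(parse_list(" test123 , Test123,test "))
print(parse_list("test,"))
print(parse_list("12333, 78787&*, GH778"))
```

Splitting and trimming first keeps the per-item pattern simple, and a trailing comma is rejected automatically because it produces an empty item.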