Regular expression for letters, numbers and - _

I'm having trouble checking in PHP if a value is is any of the following combinations

  • letters (upper or lowercase)
  • numbers (0-9)
  • underscore (_)
  • dash (-)
  • point (.)
  • no spaces! or other characters

a few examples:

  • OK: "screen123.css"
  • OK: "screen-new-file.css"
  • OK: "screen_new.js"
  • NOT OK: "screen new file.css"

I guess I need a regex for this, since I need to throw an error when a give string has other characters in it than the ones mentioned above.


The pattern you want is something like (see it on rubular.com):

^[a-zA-Z0-9_.-]*$

Explanation:

  • ^ is the beginning of the line anchor
  • $ is the end of the line anchor
  • [...] is a character class definition
  • * is "zero-or-more" repetition

Note that the literal dash - is the last character in the character class definition, otherwise it has a different meaning (i.e. range). The . also has a different meaning outside character class definitions, but inside, it's just a literal .

References

  • regular-expressions.info/Anchors, Character Classes and Repetition

In PHP

Here's a snippet to show how you can use this pattern:

<?php

$arr = array(
  'screen123.css',
  'screen-new-file.css',
  'screen_new.js',
  'screen new file.css'
);

foreach ($arr as $s) {
  if (preg_match('/^[\w.-]*$/', $s)) {
    print "$s is a match\n";
  } else {
    print "$s is NO match!!!\n";
  };
}

?>

The above prints (as seen on ideone.com):

screen123.css is a match
screen-new-file.css is a match
screen_new.js is a match
screen new file.css is NO match!!!

Note that the pattern is slightly different, using \w instead. This is the character class for "word character".

API references

  • preg_match

Note on specification

This seems to follow your specification, but note that this will match things like ....., etc, which may or may not be what you desire. If you can be more specific what pattern you want to match, the regex will be slightly more complicated.

The above regex also matches the empty string. If you need at least one character, then use + (one-or-more) instead of * (zero-or-more) for repetition.

In any case, you can further clarify your specification (always helps when asking regex question), but hopefully you can also learn how to write the pattern yourself given the above information.


you can use

^[\w.-]+$

the + is to make sure it has at least 1 character. Need the ^ and $ to denote the begin and end, otherwise if the string has a match in the middle, such as @@@@xyz%%%% then it is still a match.

\w already includes alphabets (upper and lower case), numbers, and underscore. So the rest ., -, are just put into the "class" to match. The + means 1 occurrence or more.

P.S. thanks for the note in the comment about preventing - to denote a range.


To actually cover your pattern, i.e, valid file names according to your rules, I think that you need a little more. Note this doesn't match legal file names from a system perspective. That would be system dependent and more liberal in what it accepts. This is intended to match your acceptable patterns.

^([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+$

Explanation:

  • ^ Match the start of a string. This (plus the end match) forces the string to conform to the exact expression, not merely contain a substring matching the expression.
  • ([a-zA-Z0-9]+[_-])* Zero or more occurrences of one or more letters or numbers followed by an underscore or dash. This causes all names that contain a dash or underscore to have letters or numbers between them.
  • [a-zA-Z0-9]+ One or more letters or numbers. This covers all names that do not contain an underscore or a dash.
  • \. A literal period (dot). Forces the file name to have an extension and, by exclusion from the rest of the pattern, only allow the period to be used between the name and the extension. If you want more than one extension that could be handled as well using the same technique as for the dash/underscore, just at the end.
  • [a-zA-Z0-9]+ One or more letters or numbers. The extension must be at least one character long and must contain only letters and numbers. This is typical, but if you wanted allow underscores, that could be addressed as well. You could also supply a length range {2,3} instead of the one or more + matcher, if that were more appropriate.
  • $ Match the end of the string. See the starting character.

This is the pattern you are looking for

/^[\w-_.]*$/

What this means:

  • ^ Start of string
  • [...] Match characters inside
  • \w Any word character so 0-9 a-z A-Z
  • -_. Match - and _ and .
  • * Zero or more of pattern or unlimited
  • $ End of string

If you want to limit the amount of characters:

/^[\w-_.]{0,5}$/

{0,5} Means 0-5 characters