How to match accented characters with a regex?
Instead of \w, use the POSIX bracket expression [:alpha:]:
"blåbær dèjá vu".scan(/[[:alpha:]]+/) # => ["blåbær", "dèjá", "vu"]
"blåbær dèjá vu".scan(/\w+/)          # => ["bl", "b", "r", "d", "j", "vu"]
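To make the difference a bit more concrete, here is a small sketch (the sample string is made up for illustration). In Ruby 1.9 and later, \w only covers ASCII letters, digits and the underscore, while [[:alpha:]] is Unicode-aware on UTF-8 strings but covers letters only. \p{L} is another way to ask for "any letter" and should behave the same on this input:

```ruby
# Hypothetical sample text: accented words, digits, and an underscore.
text = "blåbær dèjá vu 42 foo_bar"

# \w is ASCII-only, so accented letters split the words,
# but digits and underscores are kept.
ascii_words = text.scan(/\w+/)

# [[:alpha:]] matches Unicode letters, but not digits or underscores.
letters = text.scan(/[[:alpha:]]+/)

# \p{L} (Unicode "Letter" property) gives the same result for this input.
unicode_l = text.scan(/\p{L}+/)

p ascii_words # => ["bl", "b", "r", "d", "j", "vu", "42", "foo_bar"]
p letters     # => ["blåbær", "dèjá", "vu", "foo", "bar"]
p unicode_l   # => ["blåbær", "dèjá", "vu", "foo", "bar"]
```

So if you relied on \w to also accept digits or underscores, you need to add those to the character class explicitly.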
In your particular case, change the regex to this:
NAME_REGEX = /^[[:alpha:]\s'"\-_&@!?()\[\]-]*$/u
This matches much more than just accented characters, which is a good thing. Make sure you read this blog entry about common misconceptions regarding names in software applications.
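A quick way to sanity-check the pattern is to run it against a few inputs. The sketch below uses made-up sample names, and STRICT_NAME_REGEX is a hypothetical variant that is not part of the regex above. It is included because in Ruby ^ and $ anchor to each line rather than to the whole string, so a multi-line input can slip through; \A and \z are usually the safer choice for validation:

```ruby
# The pattern suggested above, unchanged.
NAME_REGEX = /^[[:alpha:]\s'"\-_&@!?()\[\]-]*$/u

# Hypothetical stricter variant (an assumption, not from the original):
# \A and \z anchor to the start and end of the whole string.
STRICT_NAME_REGEX = /\A[[:alpha:]\s'"\-_&@!?()\[\]]*\z/u

# Made-up sample inputs for illustration.
samples = [
  "Zoë O'Brien-Smith",   # accented letter, apostrophe, hyphen
  "José (Pepe) & Ana",   # parentheses and ampersand
  "R2-D2",               # digits are not in the character class
  "Zoë\n<script>"        # second line contains disallowed characters
]

results = samples.map do |name|
  [name, NAME_REGEX.match?(name), STRICT_NAME_REGEX.match?(name)]
end

results.each do |name, loose, strict|
  puts "#{name.inspect}: ^...$ => #{loose}, \\A...\\z => #{strict}"
end
```

The first two names pass both patterns and "R2-D2" fails both. The multi-line input passes the ^...$ version (its first line is valid on its own) but is rejected by the \A...\z version.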