Matching accented characters with JavaScript regexes

Solution 1:

This worked for me:

/^[a-z\u00E0-\u00FC]+$/i

With help from here
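As a quick check of what this pattern accepts, here is a minimal sketch. The sample strings and the names `lettersOnly` and `check` are made up for illustration and are not part of the original answer. Accented letters are written as `\u` escapes so the example does not depend on how the file is saved.

```javascript
// Solution 1's pattern: ASCII letters plus U+00E0..U+00FC, case-insensitive.
const lettersOnly = /^[a-z\u00E0-\u00FC]+$/i;

// Hypothetical helper, only here to keep the sample calls short.
const check = (s) => lettersOnly.test(s);

console.log(check("caf\u00E9"));   // "café": é is U+00E9, inside the range
console.log(check("\u00DCnal"));   // "Ünal": uppercase Ü matches through the i flag
console.log(check("abc123"));      // digits are not in the class
console.log(check("\u00FF"));      // "ÿ" (U+00FF) lies just past the end of the range
console.log(check("\u00F7"));      // "÷" (U+00F7) is inside the range even though it is not a letter
console.log(check("cafe\u0301"));  // "e" + combining acute accent: U+0301 is not in the class
```

The range is a contiguous block of code points rather than a list of letters, so it lets the division sign through and misses a few letters at the edges. A string that stores accents as combining marks can be passed through `String.prototype.normalize("NFC")` before testing.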

Solution 2:

/\bà/.test("à") doesn't match because "à" is not a word character. The escape sequence \b matches only at a boundary between a word character and a non-word character. /\ba/.test("a") matches because "a" is a word character, so there is a boundary between the beginning of the string (which does not count as a word character) and the letter "a".

Word characters in JavaScript regexes are defined as [a-zA-Z0-9_].
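The behaviour described above can be reproduced with a few one-liners. This is a small sketch; the variable names are chosen here for illustration, and "à" is written as `\u00E0`.

```javascript
// \w only covers [A-Za-z0-9_], so an accented letter is not a word character.
const plainIsWord = /\w/.test("a");
const accentedIsWord = /\w/.test("\u00E0");

// \b needs a word character on exactly one side of the position.
const boundaryBeforePlain = /\ba/.test("a");                // start of string | "a"
const boundaryBeforeAccented = /\b\u00E0/.test("\u00E0");   // start of string | "à": no word character on either side

// The boundary appears again when a word character sits directly before "à".
const boundaryInsideWord = /\b\u00E0/.test("a\u00E0");

console.log({ plainIsWord, accentedIsWord, boundaryBeforePlain, boundaryBeforeAccented, boundaryInsideWord });
```

The last case shows why \b is unreliable for accented text: it reports a "word boundary" in the middle of "aà".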

To match an accented character at the start of a string, just use the ^ anchor at the beginning of the regex (e.g. /^à/). It matches only at the beginning of the string (unlike \b, which matches at any word boundary within the string). It is one of the most basic and standard regular expression constructs, so it's definitely not over the top.
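A short sketch of the ^ approach follows. The second pattern is not from this answer: it is one possible way to get a Unicode-aware "start of word" inside a longer string, assuming an engine that supports lookbehind and the u flag with `\p{...}` property escapes (ES2018 or later). The names `atStart` and `startsWord` are illustrative.

```javascript
// Anchor to the start of the string, as the answer suggests.
const atStart = /^\u00E0/;
console.log(atStart.test("\u00E0 la carte")); // "à la carte"
console.log(atStart.test("voil\u00E0"));      // "voilà": the à is not at the start

// Assumed alternative: "à" not preceded by a letter, digit, or underscore.
const startsWord = /(?<![\p{L}\p{N}_])\u00E0/u;
console.log(startsWord.test("\u00E0"));        // start of string counts as a word start
console.log(startsWord.test("voici \u00E0"));  // preceded by a space
console.log(startsWord.test("voil\u00E0"));    // preceded by the letter "l"
```

The lookbehind version is only needed when the accented character may start a word in the middle of the string; for the start-of-string case, ^ is enough.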

Solution 3:

Stack Overflow also had an issue with non-ASCII characters in regexes; you can find it here. That discussion does not deal with word boundaries, but it may still give you useful hints.

There is another page, but the asker there wants to match strings, not words.

I don't know of an anchor that solves your problem, and I could not find one just now. But seeing the monster regexes used in my first link, the group you want to avoid is not over the top, and in my opinion it is your solution.

Solution 4:

const regex = /^[\-/A-Za-z\u00C0-\u017F ]+$/;
const test1 = regex.test("à");
const test2 = regex.test("Martinez-Cortez");
const test3 = regex.test("Leonardo da vinci");
const test4 = regex.test("ï");

console.log('test1', test1);
console.log('test2', test2);
console.log('test3', test3);
console.log('test4', test4);

Building off of Wak's and Cœur's answer:

/^[\-/A-Za-z\u00C0-\u017F ]+$/

Works for spaces and dashes too. Note that the character class also contains a forward slash, so "/" is accepted as well.

Example: Leonardo da vinci, Martinez-Cortez
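To see where the limits of this character class are, the sketch below probes a few edge cases. The stricter pattern at the end is not from the answer: it is an assumed alternative using Unicode property escapes (`\p{L}` with the u flag), and the names `nameLike` and `lettersSpacesHyphens` are illustrative.

```javascript
// Pattern from the answer above.
const nameLike = /^[\-/A-Za-z\u00C0-\u017F ]+$/;

console.log(nameLike.test("Martinez-Cortez")); // hyphen is in the class
console.log(nameLike.test("a/b"));             // so is the forward slash
console.log(nameLike.test("\u00D7"));          // "×" (U+00D7) falls inside \u00C0-\u017F
console.log(nameLike.test("O'Brien"));         // apostrophe is not in the class

// Assumed alternative: any Unicode letter, plus space and hyphen only.
const lettersSpacesHyphens = /^[\p{L} \-]+$/u;

console.log(lettersSpacesHyphens.test("Leonardo da vinci"));
console.log(lettersSpacesHyphens.test("\u00D7")); // "×" is a math symbol, not a letter
console.log(lettersSpacesHyphens.test("a/b"));    // slash is no longer allowed
```

Whether to allow characters such as the apostrophe depends on the names you expect; either class can be extended by adding them explicitly.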