How to match Cyrillic characters with a regular expression
How do I match French and Russian (Cyrillic) alphabet characters with a regular expression? I only want to match alphabetic characters, not numbers or special characters. Right now I have:
[A-Za-z]
Solution 1:
If your regex flavor supports Unicode blocks ([\p{IsCyrillic}]), you can match Cyrillic characters with:
[\p{IsCyrillic}] or [\p{Cyrillic}]
Otherwise, use an explicit range covering the Cyrillic block (U+0400–U+04FF):
[\u0400-\u04FF]
For PHP, use the following (the pattern needs the u modifier so that code points above U+00FF are accepted):
[\x{0400}-\x{04FF}]
Explanation:
[\p{IsCyrillic}] matches a single character from the Unicode block "Cyrillic" (U+0400–U+04FF).
Note: see the Unicode character list and numeric HTML entities for the range U+0400–U+04FF.
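As an illustration of the explicit-range approach, here is a minimal Python sketch. It assumes the standard-library re module, which has no \p{...} support, so the block is written as a range. The names CYRILLIC_WORD and cyrillic_words are made up for this example.

```python
import re

# Cyrillic block U+0400-U+04FF as an explicit range, because the
# standard-library re module does not understand \p{IsCyrillic}.
CYRILLIC_WORD = re.compile(r"[\u0400-\u04FF]+")


def cyrillic_words(text):
    """Return every run of characters from the Cyrillic block."""
    return CYRILLIC_WORD.findall(text)


print(cyrillic_words("Привет, world! Їжак 42"))
# ['Привет', 'Їжак']
```

Digits, punctuation and Latin letters are skipped because they fall outside the range.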
Solution 2:
It depends on your regex flavor. If it supports Unicode character classes (as .NET does, for instance), \p{L} matches a letter in any script.
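Python's standard re module does not support \p{L}, but a common workaround is the class [^\W\d_], which takes the Unicode-aware \w and removes digits and the underscore. The sketch below is only an approximation of \p{L}: a few numeric characters such as fractions may still get through. LETTERS and letter_runs are names invented for this example.

```python
import re

# \w matches Unicode word characters in str patterns; excluding \d and "_"
# leaves (approximately) letters, including accented Latin and Cyrillic.
LETTERS = re.compile(r"[^\W\d_]+")


def letter_runs(text):
    """Return runs of letters, ignoring digits, underscores and punctuation."""
    return LETTERS.findall(text)


print(letter_runs("Caf\u00e9 5, \u0401\u043b\u043a\u0430_2024"))
```

This covers both the French and the Russian letters from the question with a single pattern.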
Solution 3:
To match only Russian Cyrillic characters, use:
[\u0401\u0451\u0410-\u044f]
which is equivalent to:
[ЁёА-я]
where А is the Cyrillic letter, not the Latin one. (They look the same but have different code points.)
\p{IsCyrillic}, \p{Cyrillic} and [\u0400-\u04FF], which others suggested, match all variants of Cyrillic, not only Russian.
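The difference between the Russian-only class and the whole Cyrillic block can be checked with a short Python sketch. It assumes the standard re module. The helper names is_russian_word and is_cyrillic_word are hypothetical.

```python
import re

# Russian alphabet only: Ё (U+0401), ё (U+0451) and А-я (U+0410-U+044F).
RUSSIAN = re.compile(r"[\u0401\u0451\u0410-\u044F]+")
# Whole Cyrillic block, which also covers Ukrainian, Serbian, etc.
ANY_CYRILLIC = re.compile(r"[\u0400-\u04FF]+")


def is_russian_word(word):
    """True if every character is a letter of the Russian alphabet."""
    return RUSSIAN.fullmatch(word) is not None


def is_cyrillic_word(word):
    """True if every character lies in the Cyrillic block."""
    return ANY_CYRILLIC.fullmatch(word) is not None


# Ukrainian "Їжак" starts with Ї (U+0407), which is not a Russian letter.
print(is_russian_word("\u0407\u0436\u0430\u043a"), is_cyrillic_word("\u0407\u0436\u0430\u043a"))
# False True
```

A Latin "A" (U+0041) fails both checks, while the Cyrillic "А" (U+0410) passes both.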