Are Java and C# regular expressions compatible?
Both languages claim to use Perl-style regular expressions. If I have one language test a regular expression for validity, will it work in the other? Where do the regular expression syntaxes differ?
The use case here is a C# (.NET) UI talking to an eventual Java back end implementation that will use the regex to match data.
Note that I only need to worry about matching, not about extracting portions of the matched data.
There are quite a lot of differences.
Character Class
- Character class subtraction `[abc-[cde]]`
  - .NET: YES (2.0)
  - Java: Emulated via character class intersection and negation: `[abc&&[^cde]]`
- Character class intersection `[abc&&[cde]]`
  - .NET: Emulated via character class subtraction and negation: `[abc-[^cde]]`
  - Java: YES
- `\p{Alpha}` POSIX character class
  - .NET: NO
  - Java: YES (US-ASCII)
- Under `(?x)` mode (`COMMENTS` / `IgnorePatternWhitespace`), space (U+0020) in a character class is significant
  - .NET: YES
  - Java: NO
- Unicode Category (L, M, N, P, S, Z, C)
  - .NET: YES, `\p{L}` form only
  - Java: YES
    - From Java 5: `\pL`, `\p{L}`, `\p{IsL}`
    - From Java 7: `\p{general_category=L}`, `\p{gc=L}`
- Unicode Category (Lu, Ll, Lt, ...)
  - .NET: YES, `\p{Lu}` form only
  - Java: YES
    - From Java 5: `\p{Lu}`, `\p{IsLu}`
    - From Java 7: `\p{general_category=Lu}`, `\p{gc=Lu}`
- Unicode Block
  - .NET: YES, `\p{IsBasicLatin}` form only (see "Supported Named Blocks")
  - Java: YES (the block name is case-insensitive)
    - From Java 5: `\p{InBasicLatin}`
    - From Java 7: `\p{block=BasicLatin}`, `\p{blk=BasicLatin}`
- Spaces and underscores allowed in all long block names (e.g. `BasicLatin` can be written as `Basic_Latin` or `Basic Latin`)
  - .NET: NO
  - Java: YES (Java 5)
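To make the subtraction entry concrete, here is a minimal Java sketch that emulates the .NET pattern `[a-z-[aeiou]]` (lowercase consonants) with intersection and negation. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

class CharClassSubtractionDemo {
    // .NET would write this as [a-z-[aeiou]]; Java needs an intersection
    // with a negated class to get the same set of characters.
    private static final Pattern LOWERCASE_CONSONANT =
            Pattern.compile("[a-z&&[^aeiou]]");

    public static boolean isLowercaseConsonant(String s) {
        return LOWERCASE_CONSONANT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseConsonant("b")); // consonant
        System.out.println(isLowercaseConsonant("e")); // vowel, excluded by [^aeiou]
    }
}
```

Since the two syntaxes are not interchangeable, a C# UI and a Java back end would have to agree on one form or translate between them.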
Quantifier
- `?+`, `*+`, `++` and `{m,n}+` (possessive quantifiers)
  - .NET: NO
  - Java: YES
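If a pattern has to run in both engines, an atomic group `(?>...)` is one possible stand-in for a possessive quantifier, since both flavors accept that syntax. The sketch below compares the three forms in Java; the class and helper names are only illustrative.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Full-string match, as String.matches would do.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy: a+ gives back one 'a' so the trailing 'a' can match.
        System.out.println(fullMatch("a+a", "aaa"));
        // Possessive (Java only): a++ keeps every 'a', so the trailing 'a' fails.
        System.out.println(fullMatch("a++a", "aaa"));
        // Atomic group: same behaviour as the possessive form.
        System.out.println(fullMatch("(?>a+)a", "aaa"));
    }
}
```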
Quotation
- `\Q...\E` escapes a string of metacharacters
  - .NET: NO
  - Java: YES
- `\Q...\E` escapes a string of character class metacharacters (in character sets)
  - .NET: NO
  - Java: YES
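A short Java sketch of the quoting difference: `\Q...\E` (which is also what `Pattern.quote` produces) works on the Java side only, while escaping each metacharacter with a backslash is a form both engines should accept. The class and method names are assumptions for illustration.

```java
import java.util.regex.Pattern;

class QuotingDemo {
    // Treats 'literal' as plain text by wrapping it in \Q...\E via Pattern.quote.
    public static boolean matchesLiteral(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unquoted: '+' is a quantifier, so the literal text "1+1=2" does not match.
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));
        // Java-only quoting.
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2"));
        // Backslash escaping, written without \Q...\E.
        System.out.println(Pattern.matches("1\\+1=2", "1+1=2"));
        System.out.println(matchesLiteral("1+1=2", "1+1=2"));
    }
}
```

On the .NET side, `Regex.Escape` plays a similar role to `Pattern.quote`, but it emits backslash escapes rather than `\Q...\E`.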
Matching construct
- Conditional matching `(?(?=regex)then|else)`, `(?(regex)then|else)`, `(?(1)then|else)` or `(?(group)then|else)`
  - .NET: YES
  - Java: NO
- Named capturing group and named backreference
  - .NET: YES
    - Capturing group: `(?<name>regex)` or `(?'name'regex)`
    - Backreference: `\k<name>` or `\k'name'`
  - Java: YES (Java 7)
    - Capturing group: `(?<name>regex)`
    - Backreference: `\k<name>`
- Multiple capturing groups can have the same name
  - .NET: YES
  - Java: NO (Java 7)
- Balancing group definition `(?<name1-name2>regex)` or `(?'name1-name2'subexpression)`
  - .NET: YES
  - Java: NO
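The angle-bracket form of named groups and named backreferences is the part of this section the two flavors have in common. A minimal Java 7+ sketch, with a made-up class name and pattern:

```java
import java.util.regex.Pattern;

class NamedGroupDemo {
    // (?<quote>...) captures the opening quote character;
    // \k<quote> requires the same character as the closing quote.
    private static final Pattern QUOTED =
            Pattern.compile("(?<quote>['\"])[^'\"]*\\k<quote>");

    public static boolean isQuoted(String s) {
        return QUOTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isQuoted("'hi'"));   // matching single quotes
        System.out.println(isQuoted("\"hi\"")); // matching double quotes
        System.out.println(isQuoted("\"hi'"));  // mismatched quotes
    }
}
```

One caveat when sharing such patterns: Java group names are limited to ASCII letters and digits and must start with a letter, so a name containing an underscore is rejected on the Java side.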
Assertions
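- `(?<=text)` (positive lookbehind)
  - .NET: Variable-width
  - Java: Obvious width (the lookbehind pattern must have an obvious maximum length)
- `(?<!text)` (negative lookbehind)
  - .NET: Variable-width
  - Java: Obvious width (the lookbehind pattern must have an obvious maximum length)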
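As a sketch of what "obvious width" allows in Java, the lookbehind below uses a bounded quantifier (`{0,3}`), so its maximum length is known up front. The class name, method name and pattern are invented for this example.

```java
import java.util.regex.Pattern;

class LookbehindDemo {
    // Digits preceded by '$' and at most three whitespace characters.
    // The bounded {0,3} gives the lookbehind a known maximum length.
    private static final Pattern DOLLAR_AMOUNT =
            Pattern.compile("(?<=\\$\\s{0,3})\\d+");

    public static boolean hasDollarAmount(String s) {
        return DOLLAR_AMOUNT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDollarAmount("$  42"));  // '$' plus two spaces
        System.out.println(hasDollarAmount("$42"));    // '$' directly before the digits
        System.out.println(hasDollarAmount("EUR 42")); // no '$' before the digits
    }
}
```

Sticking to bounded quantifiers inside lookbehind is probably the safest choice for patterns that have to behave the same in both engines.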
Mode Options/Flags
- `ExplicitCapture` option `(?n)`
  - .NET: YES
  - Java: NO
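For the use case in the question, this means a pattern that the .NET UI accepts can still be rejected by the Java back end. The sketch below checks that on the Java side and shows non-capturing groups `(?:...)` as a possible replacement for `(?n)`; the helper names are assumptions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ExplicitCaptureDemo {
    // Returns true if java.util.regex accepts the pattern.
    public static boolean compilesInJava(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Number of capturing groups the compiled pattern defines.
    public static int captureCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(compilesInJava("(?n)(ab)+(cd)"));   // (?n) is not a Java flag
        System.out.println(compilesInJava("(?:ab)+(?:cd)"));   // accepted by Java
        System.out.println(captureCount("(?:ab)+(?:cd)"));     // no capturing groups
    }
}
```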
Miscellaneous
- `(?#comment)` inline comments
  - .NET: YES
  - Java: NO
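Java has no `(?#comment)`, but `#` end-of-line comments under `Pattern.COMMENTS` (the `(?x)` mode) can document a pattern in a similar way. A minimal sketch with an invented pattern and class name:

```java
import java.util.regex.Pattern;

class CommentsModeDemo {
    // In COMMENTS mode, whitespace is ignored and '#' starts a comment
    // that runs to the end of the line.
    private static final Pattern ZIP = Pattern.compile(
            "\\d{5}        # five-digit ZIP code\n"
          + "(?:-\\d{4})?  # optional four-digit extension\n",
            Pattern.COMMENTS);

    public static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isZip("12345"));
        System.out.println(isZip("12345-6789"));
        System.out.println(isZip("1234"));
    }
}
```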
References
- regular-expressions.info - Comparison of Different Regex Flavors
- MSDN Library Reference - .NET Framework 4.5 - Regular Expression Language
- Pattern (Java Platform SE 7)
Check out: http://www.regular-expressions.info/refflavors.html
There is plenty of regex info on that site, including a nice chart that details the differences between Java and .NET.
C# regex has its own convention for named groups: `(?<name>)`. I don't know of any other differences.