Working regex fails when using Scala pattern matching
In the following code, the same pattern matches when the Java API is used, but not when Scala pattern matching is used.
import java.util.regex.Pattern

object Main extends App {
  val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"
  val statePatternString = """\/.*\?.*state=([^&\?]*)"""
  val statePattern = statePatternString.r
  val statePatternJ = Pattern.compile(statePatternString)

  // Java API: find() searches for a match anywhere in the input
  val sj = statePatternJ.matcher(text)
  val sjMatch = if (sj.find()) sj.group(1) else ""
  println(s"Java match $sjMatch")

  // Scala extractor called directly
  val ss = statePattern.unapplySeq(text)
  println(s"Scala unapplySeq $ss")

  val sm = statePattern.findFirstIn(text)
  println(s"Scala findFirstIn $sm")

  // Scala pattern matching (uses unapplySeq under the hood)
  text match {
    case statePattern(s) =>
      println(s"Scala matching $s")
    case _ =>
      println("Scala not matching")
  }
}
The app output is:
Java match abcde
Scala unapplySeq None
Scala findFirstIn Some(/oAuth.html?state=abcde)
Scala not matching
When using the extractor syntax val statePattern(se) = text, the error is scala.MatchError.
What is causing the Scala regex unapplySeq to fail?
Solution 1:
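When a Scala Regex is used as an extractor (in pattern matching or via unapplySeq), it is anchored by default, i.e. it requires the entire string to match, while your Java sj.find() looks for a match anywhere inside the string. That is also why findFirstIn succeeds: it searches for a partial match. Add .unanchored to the Scala regex so that the extractor allows partial matches as well: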
val statePattern = statePatternString.r.unanchored
                                       ^^^^^^^^^^^
See IDEONE demo
From the UnanchoredRegex reference:
def unanchored: UnanchoredRegex
Create a new Regex with the same pattern, but no requirement that the entire String matches in extractor patterns.
Normally, matching on date behaves as though the pattern were enclosed in anchors, ^pattern$. The unanchored Regex behaves as though those anchors were removed.
Note that this method does not actually strip any matchers from the pattern.
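To make the difference concrete, here is a minimal sketch that runs the same pattern through an anchored and an unanchored extractor. The object and method names (UnanchoredDemo, stateAnchored, statePartial) are made up for illustration; the pattern is the one from the question with the redundant escapes dropped.

```scala
import scala.util.matching.Regex

object UnanchoredDemo {
  // Same pattern as in the question, minus the unnecessary escapes
  val statePattern: Regex = """/.*\?.*state=([^&?]*)""".r

  // Unanchored view of the same regex: the extractor accepts partial matches
  val statePatternPartial = statePattern.unanchored

  // Anchored extractor: succeeds only if the WHOLE input matches the pattern
  def stateAnchored(text: String): Option[String] = text match {
    case statePattern(s) => Some(s)
    case _               => None
  }

  // Unanchored extractor: succeeds if the pattern is found anywhere in the input
  def statePartial(text: String): Option[String] = text match {
    case statePatternPartial(s) => Some(s)
    case _                      => None
  }

  def main(args: Array[String]): Unit = {
    val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"
    println(stateAnchored(text)) // None: "&code=..." is left over after the group
    println(statePartial(text))  // Some(abcde)
  }
}
```

The anchored version fails because the capture group [^&?]* stops at the first &, so the rest of the query string is never consumed and the full-string match cannot succeed.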
An alternative solution is to append .* to the end of the pattern, but remember that a dot does not match a newline by default. If the solution should be generic, specify the (?s) DOTALL modifier at the beginning of the pattern so that the whole string, including any newline sequences, is matched.
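The trailing .* variant could look roughly like the sketch below, which keeps the regex anchored and compares the pattern with and without (?s). The names TrailingWildcardDemo, withoutDotAll, withDotAll and extract are hypothetical, and the multi-line input is an invented example rather than data from the question.

```scala
import scala.util.matching.Regex

object TrailingWildcardDemo {
  // Trailing .* lets the anchored pattern consume the rest of the input,
  // but '.' stops at a line break unless DOTALL is enabled
  val withoutDotAll: Regex = """/.*\?.*state=([^&?]*).*""".r

  // (?s) turns on DOTALL so '.' also matches newline characters
  val withDotAll: Regex = """(?s)/.*\?.*state=([^&?]*).*""".r

  // Runs the default (anchored) extractor and returns the captured state, if any
  def extract(pattern: Regex, text: String): Option[String] =
    pattern.unapplySeq(text).flatMap(_.headOption)

  def main(args: Array[String]): Unit = {
    val singleLine = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"
    val multiLine  = singleLine + "\nsecond line"

    println(extract(withoutDotAll, singleLine)) // Some(abcde)
    println(extract(withoutDotAll, multiLine))  // None: '.' cannot cross the newline
    println(extract(withDotAll, multiLine))     // Some(abcde)
  }
}
```

Compared with .unanchored, this approach changes the pattern itself, so it also affects any other code that reuses the same pattern string.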