Working regex fails when using Scala pattern matching

In the following code, the same pattern matches when the Java API is used, but not when Scala pattern matching is used.

import java.util.regex.Pattern

object Main extends App {
  val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"

  val statePatternString = """\/.*\?.*state=([^&\?]*)"""
  val statePattern = statePatternString.r
  val statePatternJ = Pattern.compile(statePatternString)

  val sj = statePatternJ.matcher(text)
  val sjMatch = if (sj.find()) sj.group(1) else ""
  println(s"Java match $sjMatch")

  val ss = statePattern.unapplySeq(text)
  println(s"Scala unapplySeq $ss")
  val sm = statePattern.findFirstIn(text)
  println(s"Scala findFirstIn $sm")

  text match {
    case statePattern(s) =>
      println(s"Scala matching $s")
    case _ =>
      println("Scala not matching")
  }

}

The app output is:

Java match abcde
Scala unapplySeq None
Scala findFirstIn Some(/oAuth.html?state=abcde)
Scala not matching

With the extractor syntax val statePattern(se) = text, a scala.MatchError is thrown instead.

What is causing the Scala regex unapplySeq to fail?


Solution 1:

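A Scala Regex is anchored by default when it is used as an extractor (in pattern matching or through unapplySeq), which means it requires the whole string to match. Your Java sj.find(), by contrast, looks for a match anywhere inside the string. That is also why findFirstIn succeeds: like find(), it searches for a partial match. Add .unanchored to the Scala regex so that extractors allow partial matches as well: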

val statePattern = statePatternString.r.unanchored
                                       ^^^^^^^^^^^
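
To make the difference concrete, here is a minimal sketch that runs the same pattern string through both an anchored and an unanchored Regex. The object name UnanchoredDemo and the helper extractState are made up for this illustration and are not part of the question's code.

```scala
import scala.util.matching.Regex

object UnanchoredDemo {
  val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"
  val patternString = """\/.*\?.*state=([^&\?]*)"""

  // Default: when used as an extractor, the whole input must match.
  val anchored: Regex = patternString.r
  // Same pattern, but the extractor accepts a match anywhere in the input.
  val unanchored: Regex = patternString.r.unanchored

  // Hypothetical helper: pulls the state parameter out via pattern matching.
  def extractState(input: String): Option[String] = input match {
    case unanchored(state) => Some(state)
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    // The trailing "&code=..." is not covered by the pattern, so this fails.
    println(anchored.unapplySeq(text))   // None
    println(unanchored.unapplySeq(text)) // Some(List(abcde))
    println(extractState(text))          // Some(abcde)
  }
}
```

The sketch uses a plain main method rather than extends App so that the vals are initialized as soon as the object is first accessed.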

See IDEONE demo

From the Scala documentation for unanchored:

def unanchored: UnanchoredRegex

Create a new Regex with the same pattern, but no requirement that the entire String matches in extractor patterns.

Normally, matching on date behaves as though the pattern were enclosed in anchors, ^pattern$.

The unanchored Regex behaves as though those anchors were removed.

Note that this method does not actually strip any matchers from the pattern.

AN ALTERNATIVE SOLUTION is to append .* to the end of the pattern so that it consumes the rest of the string. Remember, though, that a dot does not match a newline by default. If the solution needs to be generic, put the (?s) DOTALL modifier at the beginning of the pattern so that the whole string, including any newline sequences, is matched.
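
As a rough sketch of this alternative, the example below keeps the regex anchored and appends .* to it, with and without (?s). The object name, the helper state, and the multi-line input are assumptions made up for the illustration.

```scala
import scala.util.matching.Regex

object TrailingWildcardDemo {
  // Anchored patterns that consume the rest of the input with a trailing .*
  val withTail: Regex       = """\/.*\?.*state=([^&\?]*).*""".r
  val withTailDotAll: Regex = """(?s)\/.*\?.*state=([^&\?]*).*""".r

  val oneLine   = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"
  // Hypothetical input with a newline after the query string.
  val multiLine = oneLine + "\nextra"

  // Hypothetical helper: applies the given (anchored) regex as an extractor.
  def state(regex: Regex, input: String): Option[String] = input match {
    case regex(s) => Some(s)
    case _        => None
  }

  def main(args: Array[String]): Unit = {
    println(state(withTail, oneLine))         // Some(abcde)
    // Without (?s), the trailing .* stops at the newline, so the full match fails.
    println(state(withTail, multiLine))       // None
    println(state(withTailDotAll, multiLine)) // Some(abcde)
  }
}
```

Compared with .unanchored, this variant changes the pattern itself, so the same string would also behave differently if it were passed to Pattern.compile on the Java side.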