Scala capture group using regex

Let's say I have this code:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).foreach(println)

I expected findAllIn to only return 483, but instead, it returned two483three. I know I could use unapply to extract only that part, but I'd have to have a pattern for the entire string, something like:

 val pattern = """one.*two(\d+)three""".r
 val pattern(aMatch) = string
 println(aMatch) // prints 483

Is there another way of achieving this, without using the classes from java.util directly, and without using unapply?


Here's an example of how you can access group(1) of each match:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
   m => println(m.group(1))
}

This prints "483" (as seen on ideone.com).


The lookaround option

Depending on the complexity of the pattern, you can also use lookarounds to only match the portion you want. It'll look something like this:

val string = "one493two483three"
val pattern = """(?<=two)\d+(?=three)""".r
pattern.findAllIn(string).foreach(println)

The above also prints "483" (as seen on ideone.com).

References

  • regular-expressions.info/Lookarounds

val string = "one493two483three"
val pattern = """.*two(\d+)three.*""".r

string match {
  case pattern(a483) => println(a483) //matched group(1) assigned to variable a483
  case _ => // no match
}

Starting Scala 2.13, as an alternative to regex solutions, it's also possible to pattern match a String by unapplying a string interpolator:

"one493two483three" match { case s"${x}two${y}three" => y }
// String = "483"

Or even:

val s"${x}two${y}three" = "one493two483three"
// x: String = one493
// y: String = 483

If you expect non matching input, you can add a default pattern guard:

"one493deux483three" match {
  case s"${x}two${y}three" => y
  case _                   => "no match"
}
// String = "no match"

You want to look at group(1), you're currently looking at group(0), which is "the entire matched string".

See this regex tutorial.


def extractFileNameFromHttpFilePathExpression(expr: String) = {
//define regex
val regex = "http4.*\\/(\\w+.(xlsx|xls|zip))$".r
// findFirstMatchIn/findAllMatchIn returns Option[Match] and Match has methods to access capture groups.
regex.findFirstMatchIn(expr) match {
  case Some(i) => i.group(1)
  case None => "regex_error"
}
}
extractFileNameFromHttpFilePathExpression(
    "http4://testing.bbmkl.com/document/sth1234.zip")