How to get capturing group functionality in Go regular expressions

Solution 1:

how should I re-write these expressions?

Add some Ps. Go's syntax for a named capture group is (?P<name>re):

(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})

Cross-reference the capture group names with r.SubexpNames(): the name at index i belongs to the submatch at index i.

And use as follows:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    r := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
    fmt.Printf("%#v\n", r.FindStringSubmatch(`2015-05-27`))
    fmt.Printf("%#v\n", r.SubexpNames())
}

Solution 2:

I wrote a function for handling URL expressions, but it suits your needs too. It works like this:

// getParams matches url against the regular expression regEx and returns
// the values of its capture groups, keyed by group name.
func getParams(regEx, url string) (paramsMap map[string]string) {

    var compRegEx = regexp.MustCompile(regEx)
    match := compRegEx.FindStringSubmatch(url)

    paramsMap = make(map[string]string)
    for i, name := range compRegEx.SubexpNames() {
        // Index 0 is the whole match; match is nil when nothing matched.
        if i > 0 && i < len(match) {
            paramsMap[name] = match[i]
        }
    }
    return paramsMap
}

You can use this function like:

params := getParams(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`, `2015-05-27`)
fmt.Println(params)

and the output will be (since Go 1.12, fmt prints maps sorted by key):

map[Day:27 Month:05 Year:2015]
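
The helper above returns an empty map when nothing matches, so the caller cannot tell "no match" from "no groups", and it recompiles the pattern on every call. A minimal sketch of a variant that reports whether a match occurred and skips unnamed groups; the names namedGroups and dateRe are hypothetical, and compiling the pattern once at package level is an assumption about how it would be used, not part of the original answer:

```go
package main

import (
    "fmt"
    "regexp"
)

// dateRe is compiled once at package level (illustrative name).
var dateRe = regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)

// namedGroups returns the named submatches of re in s.
// ok is false when s does not match at all.
func namedGroups(re *regexp.Regexp, s string) (groups map[string]string, ok bool) {
    match := re.FindStringSubmatch(s)
    if match == nil {
        return nil, false
    }
    groups = make(map[string]string)
    for i, name := range re.SubexpNames() {
        if i == 0 || name == "" {
            continue // skip the whole match and any unnamed groups
        }
        groups[name] = match[i]
    }
    return groups, true
}

func main() {
    if g, ok := namedGroups(dateRe, "2015-05-27"); ok {
        fmt.Println(g["Year"], g["Month"], g["Day"])
    }
    _, ok := namedGroups(dateRe, "not a date")
    fmt.Println(ok)
}
```

Returning a second boolean follows the usual Go "comma ok" convention, so callers can branch on the match without inspecting the map.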

Solution 3:

To reduce RAM and CPU usage, avoid calling anonymous functions inside a loop and avoid growing slices with append inside a loop.

The next example extracts several subgroups from multiline text without concatenating strings with '+' and without nesting one for loop inside another (unlike other examples posted here):

txt := `2001-01-20
2009-03-22
2018-02-25
2018-06-07`

// MustCompile returns a *regexp.Regexp; use the pointer directly instead of copying the value.
regex := regexp.MustCompile(`(?s)(\d{4})-(\d{2})-(\d{2})`)
res := regex.FindAllStringSubmatch(txt, -1)
for i := range res {
    // like Java: match.group(1), match.group(2), etc.
    fmt.Printf("year: %s, month: %s, day: %s\n", res[i][1], res[i][2], res[i][3])
}

Output:

year: 2001, month: 01, day: 20
year: 2009, month: 03, day: 22
year: 2018, month: 02, day: 25
year: 2018, month: 06, day: 07

Note: res[i][0] holds the whole match, like match.group(0) in Java.

If you want to store this information use a struct type:

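type date struct {
    y, m, d int
}
...
func main() {
    ...
    // Allocate the slice at its full length so it can be indexed directly.
    dates := make([]date, len(res))
    for i := range res {
        // The groups are strings; convert them with strconv.Atoi (import "strconv").
        // The pattern only matches digits, so the errors are ignored here.
        y, _ := strconv.Atoi(res[i][1])
        m, _ := strconv.Atoi(res[i][2])
        d, _ := strconv.Atoi(res[i][3])
        dates[i] = date{y: y, m: m, d: d}
    }
}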

When you only read groups by index, unnamed groups are enough and keep the pattern shorter; any performance gain from them is likely small.

Using the "ReplaceAllGroupFunc" helper posted on GitHub is a bad idea because:

  1. it uses a loop inside a loop
  2. it calls an anonymous function inside a loop
  3. it has a lot of code
  4. it calls append inside a loop, which can be costly: whenever the slice's capacity is exceeded, append copies the backing array to a new memory location
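
Point 4 depends on capacity: append only has to copy the backing array when the slice has run out of room. A minimal sketch, reusing the date pattern from this answer, that collects results with append into a slice whose capacity is reserved up front; the names collectYears and dateRe are hypothetical:

```go
package main

import (
    "fmt"
    "regexp"
)

// dateRe uses unnamed groups, as in the example above (illustrative name).
var dateRe = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

// collectYears gathers the first group of every match in txt.
// The capacity is reserved before the loop, so append never needs to
// grow the slice while it is being filled.
func collectYears(txt string) []string {
    res := dateRe.FindAllStringSubmatch(txt, -1)
    years := make([]string, 0, len(res))
    for _, m := range res {
        years = append(years, m[1])
    }
    return years
}

func main() {
    years := collectYears("2001-01-20\n2009-03-22\n2018-02-25")
    fmt.Println(years, len(years), cap(years))
}
```

Because the number of matches is known before the loop starts, the length and capacity end up equal and no reallocation happens along the way.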

Solution 4:

As of Go 1.15, you can simplify the process by using Regexp.SubexpIndex. You can check the release notes at https://golang.org/doc/go1.15#regexp.

Based on your example, you'd have something like the following:

re := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
matches := re.FindStringSubmatch("Some random date: 2001-01-20")
yearIndex := re.SubexpIndex("Year")
fmt.Println(matches[yearIndex])

You can check and execute this example at https://play.golang.org/p/ImJ7i_ZQ3Hu.
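
Note that the snippet above panics if the input contains no date, because FindStringSubmatch then returns nil, and that SubexpIndex returns -1 for a group name the pattern does not define. A hedged sketch that guards both cases; the names groupValue and dateRe are hypothetical, and the code needs Go 1.15 or later:

```go
package main

import (
    "fmt"
    "regexp"
)

// dateRe is the pattern from the question (illustrative variable name).
var dateRe = regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)

// groupValue returns the text captured by the named group in s.
// ok is false if s does not match or the group name is unknown.
func groupValue(re *regexp.Regexp, s, name string) (value string, ok bool) {
    match := re.FindStringSubmatch(s)
    idx := re.SubexpIndex(name) // -1 when no group has this name
    if match == nil || idx < 0 {
        return "", false
    }
    return match[idx], true
}

func main() {
    year, ok := groupValue(dateRe, "Some random date: 2001-01-20", "Year")
    fmt.Println(year, ok)

    _, ok = groupValue(dateRe, "no date here", "Year")
    fmt.Println(ok)
}
```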