Extracting named groups of regular expression in LibreOffice Calc

Solution 1:

In both Excel and Calc, the cleanest solution is to create a general-purpose regular expression macro. To do this in Calc, go to Tools -> Macros -> Organize Macros -> LibreOffice Basic and add the following code to Module1:

Function ReFind(findIn, patt, Optional group_param As Integer,  _
                Optional ignoreCase_param As Boolean)
    ' findIn - string or cell to search in
    ' patt - regexp string or cell containing regexp string
    ' group - which group to grab - analogy to \n in regexp syntax
    ' ignoreCase - false for case sensitive matches
    If IsMissing (group_param) Then
        group = 0
    Else
        group = group_param
    End If
    If IsMissing (ignoreCase_param) Then
        ignoreCase = False
    Else
        ignoreCase = ignoreCase_param
    End If
    oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
    oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
    oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
    If ignoreCase Then
        oOptions.transliterateFlags = _
            com.sun.star.i18n.TransliterationModules.IGNORE_CASE
    End If
    oOptions.searchString = patt
    oTextSearch.setOptions(oOptions)
    oFound = oTextSearch.searchForward(findIn, 0, Len(findIn))
    If oFound.subRegExpressions = 0 Then
        ReFind = "No results"
        MsgBox "No results"
        Exit Function
    ElseIf group >= oFound.subRegExpressions Then 
         ReFind = "No result for that group"
         MsgBox "No result for that group"
         Exit Function
    Else
         nStart = oFound.startOffset()
         nEnd = oFound.endOffset()
         ReFind = Mid(findIn, nStart(group) + 1, nEnd(group) - nStart(group))
    End If
End Function

Now you can use ReFind for any regular expressions needed in the spreadsheet. For example, in cell A1 enter 12345. In cell B1, enter the formula =REFIND($A$1,"(\d\d)(\d)",2). This will retrieve the third number, which is 3.

The code was adapted from https://forum.openoffice.org/en/forum/viewtopic.php?t=30502.

Note: It would be better to create the regular expression in python or java using an add-in. However that requires XML declaration files in an extension, which takes more time to set up.

Solution 2:

AFAIK you can't use named groups in LO Calc Search/Replace or formulas, but you can use numerical references to pattern groups:

  • In the search field, you can use \1 to reference the first pattern group, \2 for the second and so on.
  • In the replace expression, use $1 as reference for the first search pattern group, $2 for the second and so on.

Search example

Assuming the four strings blue bluefish, black blackfish, blue blackfish and black bluefish, you can replace every string where the same color appears twice (strings 1 and 2) using the search pattern: (blue|black) \1fish. The \1 will reference the matching group, matching the whole string only if the color matched in the regex group (blue|black) appears before fish, too. (Example based on the OOo Wiki Documentation).

Replace Example

To transform the string 100/200/300/400 to 300/100/400/200 (with regex enabled in the search options), search for the pattern (\d+)/(\d+)/(\d+)/(\d+) and replace with $3/$1/$4/$2.