Extracting named groups of regular expression in LibreOffice Calc
Solution 1:
In both Excel and Calc, the cleanest solution is to create a general-purpose regular expression macro. To do this in Calc, go to Tools -> Macros -> Organize Macros -> LibreOffice Basic
and add the following code to Module1:
Function ReFind(findIn, patt, Optional group_param As Integer, _
Optional ignoreCase_param As Boolean)
' findIn - string or cell to search in
' patt - regexp string or cell containing regexp string
' group - which group to grab - analogy to \n in regexp syntax
' ignoreCase - false for case sensitive matches
If IsMissing (group_param) Then
group = 0
Else
group = group_param
End If
If IsMissing (ignoreCase_param) Then
ignoreCase = False
Else
ignoreCase = ignoreCase_param
End If
oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
If ignoreCase Then
oOptions.transliterateFlags = _
com.sun.star.i18n.TransliterationModules.IGNORE_CASE
End If
oOptions.searchString = patt
oTextSearch.setOptions(oOptions)
oFound = oTextSearch.searchForward(findIn, 0, Len(findIn))
If oFound.subRegExpressions = 0 Then
ReFind = "No results"
MsgBox "No results"
Exit Function
ElseIf group >= oFound.subRegExpressions Then
ReFind = "No result for that group"
MsgBox "No result for that group"
Exit Function
Else
nStart = oFound.startOffset()
nEnd = oFound.endOffset()
ReFind = Mid(findIn, nStart(group) + 1, nEnd(group) - nStart(group))
End If
End Function
Now you can use ReFind
for any regular expressions needed in the spreadsheet. For example, in cell A1 enter 12345
. In cell B1, enter the formula =REFIND($A$1,"(\d\d)(\d)",2)
. This will retrieve the third number, which is 3.
The code was adapted from https://forum.openoffice.org/en/forum/viewtopic.php?t=30502.
Note: It would be better to create the regular expression in python or java using an add-in. However that requires XML declaration files in an extension, which takes more time to set up.
Solution 2:
AFAIK you can't use named groups in LO Calc Search/Replace or formulas, but you can use numerical references to pattern groups:
- In the search field, you can use
\1
to reference the first pattern group,\2
for the second and so on. - In the replace expression, use
$1
as reference for the first search pattern group,$2
for the second and so on.
Search example
Assuming the four strings blue bluefish
, black blackfish
, blue blackfish
and black bluefish
, you can replace every string where the same color appears twice (strings 1 and 2) using the search pattern: (blue|black) \1fish
. The \1
will reference the matching group, matching the whole string only if the color matched in the regex group (blue|black)
appears before fish
, too. (Example based on the OOo Wiki Documentation).
Replace Example
To transform the string 100/200/300/400
to 300/100/400/200
(with regex enabled in the search options), search for the pattern (\d+)/(\d+)/(\d+)/(\d+)
and replace with $3/$1/$4/$2
.