regex optional word match
I'm trying to create a regex for extracting singers and lyricists. I was wondering how to make the lyricists part of the search optional.
Sample Multiline String:
Fireworks Singer: Katy Perry
Vogue Singers: Madonna, Karen Lyricist: Madonna
Regex: /Singers?:(.*)\s?Lyricists?:(.*)/
This matches the second line correctly and extracts Singers (Madonna, Karen) and Lyricists (Madonna).
But it does not work with the first line, when there are no Lyricists.
How do I make the Lyricists search optional?
You can enclose the part you want to make optional in a non-capturing group: (?:...). It can then be treated as a single unit in the regex, and you can put a ? after it to make it optional. Example:
/Singers?:(.*)\s?(?:Lyricists?:(.*))?/
Note that the \s? here is useless, since .* greedily consumes every character up to the end of the line and no backtracking is needed. For the same reason, the optional (?:Lyricists?:(.*))? part never captures anything: it always matches the empty string. To fix this, use the non-greedy version of .*, which is .*?, together with the $ anchor:
/Singers?:(.*?)\s*(?:Lyricists?:(.*))?$/
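To see the difference between the greedy and the non-greedy version, here is a small sketch. The question doesn't name a language, so Python's re module is an assumption; the patterns themselves are the ones above.

```python
import re

line = "Vogue Singers: Madonna, Karen Lyricist: Madonna"

# Greedy: the first (.*) runs to the end of the line, so the optional
# group matches the empty string and its capture stays None.
greedy = re.search(r"Singers?:(.*)\s?(?:Lyricists?:(.*))?", line)

# Non-greedy and anchored: (.*?) grows one character at a time, and $
# forces the overall match to reach the end of the line.
lazy = re.search(r"Singers?:(.*?)\s*(?:Lyricists?:(.*))?$", line)

print(greedy.groups())  # (' Madonna, Karen Lyricist: Madonna', None)
print(lazy.groups())    # (' Madonna, Karen', ' Madonna')
```

Note the leading space in both captures of the second result.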
Some extra whitespace still ends up in the captures; adding \s* after each colon removes it, giving a final regex of:
/Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$/
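As a rough usage sketch, again assuming Python, the final regex can be applied one line at a time. The helper name extract_credits is made up for illustration.

```python
import re

# Final pattern from the answer; group 2 is None when a line has no lyricist part.
CREDITS = re.compile(r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$")

def extract_credits(line):
    """Return (singers, lyricists) for one line, or None if no singer is listed."""
    match = CREDITS.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(extract_credits("Fireworks Singer: Katy Perry"))
# ('Katy Perry', None)
print(extract_credits("Vogue Singers: Madonna, Karen Lyricist: Madonna"))
# ('Madonna, Karen', 'Madonna')
```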
Just to add to Cameron's solution: if the source string has multiple lines, each containing both Singers and Lyricists, you'll probably need to add the 'm' (multi-line) modifier so that the '$' will match at the end of each line. (You didn't say what language you are using; you may want to add the 'i' modifier as well.)
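For example, assuming Python (where the 'm' and 'i' modifiers are spelled re.MULTILINE and re.IGNORECASE), the effect of the multi-line modifier on the sample string looks roughly like this:

```python
import re

text = "Fireworks Singer: Katy Perry\nVogue Singers: Madonna, Karen Lyricist: Madonna"
regex = r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$"

# Without the multi-line flag, $ only matches at the very end of the string,
# so the first line cannot match and only the last line is found.
single = [m.groups() for m in re.finditer(regex, text)]

# With re.MULTILINE, $ also matches at the end of every line.
multi = [m.groups() for m in re.finditer(regex, text, re.MULTILINE | re.IGNORECASE)]

print(single)  # [('Madonna, Karen', 'Madonna')]
print(multi)   # [('Katy Perry', None), ('Madonna, Karen', 'Madonna')]
```

One caveat: \s also matches a newline, so if a line with only singers were followed directly by a line starting with "Lyricist:", that next line would be attached to the previous match. Replacing \s* with [ \t]* is one way to avoid that.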