How can a regex capture everything before a keyword from a finite set, when the keyword is sometimes separated only by a single space?
Solution 1:
If I understand correctly, you want all content before the country (excluding spaces before the country). The country will always be present at the end of the line and comes from a list.
So you should be able to set the 'global' and 'multiline' options and then use the following regex:
^(.*?)(?=\s+(USA|Canada)\s*$)
Explanation:
^(.*?)
match as few characters as possible from the start of the line
(?=\s+(USA|Canada)\s*$)
look ahead for one or more spaces, followed by one of the country names, followed by zero or more spaces and end of line.
That should give you a list with all addresses.
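As a rough sketch of how this could look in code, here is the same pattern in Python, where `re.MULTILINE` plays the role of the 'multiline' option and `finditer` plays the role of 'global'. The `COUNTRIES` list, the `addresses` helper and the sample text are made up for illustration, and the inner group is written as non-capturing so that only the address is captured.

```python
import re

# Hypothetical country list; the answer above only uses USA and Canada.
COUNTRIES = ["USA", "Canada"]

# Build the alternation from the list, escaping each name.
country_alt = "|".join(re.escape(c) for c in COUNTRIES)

# Same pattern as above; re.MULTILINE makes ^ and $ work per line.
pattern = re.compile(rf"^(.*?)(?=\s+(?:{country_alt})\s*$)", re.MULTILINE)

def addresses(text):
    """Return the part of each line that precedes a trailing country."""
    return [m.group(1) for m in pattern.finditer(text)]

# Made-up sample input: one space, several spaces, and a line with no country.
sample = (
    "12 Main St, Springfield USA\n"
    "99 King Rd, Toronto    Canada  \n"
    "No country here"
)
print(addresses(sample))
# ['12 Main St, Springfield', '99 King Rd, Toronto']
```

Lines that do not end in one of the listed countries simply produce no match.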
Edit:
I have changed the first part to (.*?), making it non-greedy. That way the match will stop at the last letter before the country instead of including some spaces.
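To see the difference, here is a small Python comparison of the greedy and non-greedy versions on a made-up line with several spaces before the country. The variable names and the sample string are only for illustration.

```python
import re

# Made-up line with four spaces between the address and the country.
line = "Toronto    Canada"

greedy = re.compile(r"^(.*)(?=\s+(?:USA|Canada)\s*$)")
lazy = re.compile(r"^(.*?)(?=\s+(?:USA|Canada)\s*$)")

greedy_result = greedy.search(line).group(1)
lazy_result = lazy.search(line).group(1)

# The greedy version keeps all but one of the spaces, because \s+ only
# needs a single space for the lookahead to succeed.
print(repr(greedy_result))  # 'Toronto   '
# The non-greedy version stops at the last letter before the spaces.
print(repr(lazy_result))    # 'Toronto'
```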