How can a regex capture everything before a keyword from a finite set when the keyword is sometimes separated only by a single space?

Solution 1:

If I understand correctly, you want all content before the country (excluding spaces before the country). The country will always be present at the end of the line and comes from a list.

So you should be able to set the 'global' and 'multiline' options and then use the following regex:

^(.*?)(?=\s+(USA|Canada)\s*$)

Explanation:

^(.*?) match as few characters as possible from the start of the line (capture group 1).

(?=\s+(USA|Canada)\s*$) look ahead for one or more spaces, followed by one of the country names, followed by zero or more spaces and end of line.

That should give you a list with all addresses.
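As a quick check, here is a minimal sketch in Python. The language is my choice, since the question does not name one, and the sample addresses are made up for illustration. Python has no 'global' flag, so re.finditer plays that role here, and re.MULTILINE is the 'multiline' option.

```python
import re

# The regex from the answer; USA and Canada stand in for the full country list.
pattern = re.compile(r"^(.*?)(?=\s+(USA|Canada)\s*$)", re.MULTILINE)

# Hypothetical input: one address per line, separated from the country
# by one or more spaces, with optional trailing spaces.
text = (
    "123 Main St, Springfield IL   USA\n"
    "45 Queen St W, Toronto ON Canada\n"
    "1 Infinite Loop, Cupertino CA USA  \n"
)

# Group 1 holds everything before the spaces that precede the country.
addresses = [m.group(1) for m in pattern.finditer(text)]
print(addresses)
```

Each entry in addresses should be the part of the line before the country, without the separating spaces, whether the separator was one space or several.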

Edit:

I have changed the first part to: (.*?), making it non-greedy. That way the match stops at the last character before the separating spaces. The greedy version would include all but one of those spaces.
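To see the difference, here is a small sketch in Python (again my choice of language, with a made-up sample line) that runs the greedy and non-greedy versions against the same input:

```python
import re

# Hypothetical sample line with three spaces before the country.
line = "123 Main St, Springfield IL   USA"

# Greedy: .* grabs as much as it can, so the lookahead's \s+ keeps only one space.
greedy = re.search(r"^(.*)(?=\s+(USA|Canada)\s*$)", line, re.MULTILINE)

# Non-greedy: .*? stops at the first position where the lookahead succeeds.
lazy = re.search(r"^(.*?)(?=\s+(USA|Canada)\s*$)", line, re.MULTILINE)

print(repr(greedy.group(1)))  # keeps two trailing spaces
print(repr(lazy.group(1)))    # no trailing spaces
```

If you only have the greedy version available, stripping trailing whitespace from group 1 afterwards gives the same result.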