Regex: Select everything up to | but not | (between <title></title>)
I have this example:
<title>Square Meters | Dragon White (en)</title>
I want to use a regex to select everything up to the |, but not the | itself (between <title>...</title>).
Both of my regexes also select the |, which is why I need a better pattern that leaves the | out:
SEARCH: \w+.*\|
or \w+.*?[\s\S]\|
This is the line from my Python code with the regex I need to adjust:
words = re.findall(r'\w+', new_filename)
Right now the result is square-meters-dragon-white-en.html
But the expected result should be: square-meters.html
This is the relevant part of the Python code:
new_filename = title.get_text()
new_filename = new_filename.lower()
words = re.findall(r'\w+', new_filename)
new_filename = '-'.join(words)
new_filename = new_filename + '.html'
print(new_filename)
I get very close if I change the regex to (?=\w+).*(?= \|)
words = re.findall(r'(?=\w+).*(?= \|)', new_filename)
and I get: square meters.html (but without the hyphen between the words).
Simply use: [^|]+
# One or more characters that are not a pipe; this also matches line breaks.
If you don't want to match line breaks, use: [^|\r\n]+
This will work in any text editor that supports regex.
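For the Python code in the question, here is a minimal sketch of how the pattern could be applied. It assumes the title text has already been extracted as a string; the function name title_to_filename is made up for illustration. Note that re.findall with [^|]+ returns every pipe-free run (both "square meters " and " dragon white (en)"), so the sketch uses re.match to keep only the part before the first pipe and then applies the original \w+ step to that part.

```python
import re

def title_to_filename(title_text):
    """Build a file name from the part of the title before the first pipe.

    Hypothetical helper for illustration; not part of the original code.
    """
    # [^|]+ matches one or more characters that are not a pipe,
    # so the match stops right before the first "|".
    match = re.match(r'[^|]+', title_text)
    head = match.group(0) if match else ''

    # Same steps as in the question: lowercase, split into words, join with "-".
    words = re.findall(r'\w+', head.lower())
    return '-'.join(words) + '.html'

print(title_to_filename('Square Meters | Dragon White (en)'))  # square-meters.html
```

The attempt with (?=\w+).*(?= \|) produced "square meters" without the hyphen because re.findall returned a single string containing both words, so '-'.join had nothing to join. Keeping the \w+ step as a second pass avoids that.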