Using ^ to match beginning of line in Python regex
re.findall(r'^PY (\d\d\d\d)', wosrecords, flags=re.MULTILINE)
should work
Use re.findall (or re.search, if only the first match is needed) with re.M:
import re
p = re.compile(r'^PY\s+(\d{4})', re.M)  # re.M lets ^ match at the start of every line
test_str = "PY123\nPY 2015\nPY 2017"
print(p.findall(test_str))  # ['2015', '2017']
EXPLANATION:
- ^ - Start of a line (due to re.M)
- PY - Literal PY
- \s+ - 1 or more whitespace
- (\d{4}) - Capture group holding 4 digits
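To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without re.MULTILINE. The sample text and the variable names are made up for illustration; they are not taken from the question's data.

```python
import re

# Hypothetical sample: four lines, two of which start with "PY" and a year.
sample = "TI Some title\nPY 2015\nAU Someone\nPY 2017"

pattern = r'^PY\s+(\d{4})'

# Without the flag, ^ matches only at the very start of the whole string.
without_flag = re.findall(pattern, sample)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.findall(pattern, sample, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['2015', '2017']
```

Because the sample begins with "TI", the unflagged call finds nothing, which is the usual symptom when ^ appears not to work on multi-line input.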
In this particular case there is no need for regular expressions: the search string is always 'PY' and is expected at the beginning of the line, so str.find can do the job. The find method returns the index at which the substring first occurs in the string, or -1 if it does not occur at all, so a match at the start of the string returns 0, i.e.:
In [12]: 'PY 2015'.find('PY')
Out[12]: 0
In [13]: ' PY 2015'.find('PY')
Out[13]: 1
It may be a good idea to strip the leading whitespace first, i.e.:
In [14]: '  PY 2015'.find('PY')
Out[14]: 2
In [15]: ' PY 2015'.strip().find('PY')
Out[15]: 0
Then, if only the year is of interest, it can be extracted with split, i.e.:
In [16]: ' PY 2015'.strip().split()[1]
Out[16]: '2015'
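Putting the strip, find and split steps together for a multi-line string could look roughly like the sketch below. The function name extract_years and the sample records are hypothetical, and the length check is an added safeguard for lines such as 'PY123' that have no second field.

```python
def extract_years(text):
    """Collect the field after 'PY' from every line that starts with 'PY'."""
    years = []
    for line in text.splitlines():
        stripped = line.strip()
        # find() returns 0 only when 'PY' sits at the very start of the line.
        if stripped.find('PY') == 0:
            fields = stripped.split()
            # Skip lines like 'PY123' where nothing follows the tag.
            if len(fields) > 1:
                years.append(fields[1])
    return years

# Hypothetical sample records, including one indented line.
records = "PY123\n PY 2015\nTI PY in title\nPY 2017"
years = extract_years(records)
print(years)  # ['2015', '2017']
```

Unlike the regex, this version accepts indented lines and does not check that the value is four digits, so it fits only when the input format is already known to be regular.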