Regex to find the HTTP response code number
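With r = str(sample_text) you turn the whole list into one single string, and that string now ends with '] instead of a digit.
That single string contains no newlines, so there is only one end-of-string position where $ can match,
and you get multiple matches because the lookahead is true at more than one position. See the matches here.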
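To make the difference concrete, here is a small sketch that compares str() of the list with a newline-joined string. It uses two of the log lines from the question and the $-anchored pattern from this answer (not the lookahead pattern from the question); the names lines, as_repr, as_text and pattern are only chosen for illustration.

```python
import re

# Two of the log lines from the question, kept short for the comparison.
lines = [
    '199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085',
    'burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0',
]

as_repr = str(lines)        # the list's repr: one string ending with ']
as_text = "\n".join(lines)  # one log entry per line

pattern = r'\bHTTP/\d\.\d"\s\d+\s(\d+)$'

print(as_repr.endswith("']"))               # the last characters are not digits
print(re.findall(pattern, as_repr, re.M))   # nothing: no digits directly before the end
print(re.findall(pattern, as_text))         # without re.M only the last line can match
print(re.findall(pattern, as_text, re.M))   # with re.M every line end is a candidate
```

The repr of the list has no line ends for $ to anchor on, which is why joining with a newline and passing re.M is needed.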
What you could do instead is, for example, join the lines with a newline, use a capture group (its value is what re.findall returns) and pass re.M
so that $ matches at the end of every line.
\bHTTP/\d\.\d"\s\d+\s(\d+)$
The pattern matches:
- \bHTTP/
  A word boundary, then match HTTP/
- \d\.\d"\s\d+\s
  Match a digit, a dot, a digit, a double quote, a whitespace char, 1+ digits and a whitespace char
- (\d+)
  Capture 1+ digits in group 1
- $
  End of string (with re.M, the end of each line)
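The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch of an alternative spelling, not something the original pattern requires; the names CONTENT_SIZE and text are made up for the example, and the two log lines are taken from the question.

```python
import re

# Same pattern as above, spread over several lines with comments.
CONTENT_SIZE = re.compile(r'''
    \bHTTP/     # word boundary, then the literal text HTTP/
    \d\.\d"     # version such as 1.0, followed by the closing double quote
    \s\d+\s     # whitespace, the status code, whitespace
    (\d+)       # group 1: the content size
    $           # end of the line (because of re.MULTILINE)
''', re.VERBOSE | re.MULTILINE)

text = "\n".join([
    '199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085',
    'burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0',
])

# findall returns the group 1 values, one per matching line.
print(CONTENT_SIZE.findall(text))
```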
See a Regex demo and a Python demo.
import re
sample_text = ['199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/HTTP/1.0" 200 6245',
'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/HTTP/1.0" 200 3985',
'199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085',
'burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0',
'199.120.110.21 - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0" 200 4179']
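def func():
    # Join the entries with newlines so that $ (with re.M) can match at the end of each line
    r = "\n".join(sample_text)
    regext = r'\bHTTP/\d\.\d"\s\d+\s(\d+)$'
    # re.findall returns the values of capture group 1
    content_size = re.findall(regext, r, re.M)
    print(content_size)

func()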
Output
['6245', '3985', '4085', '0', '4179']
Or using a list comprehension, searching each entry separately:
def func():
    return [m.group(1) for m in (re.search(r'\bHTTP/\d\.\d"\s\d+\s(\d+)$', s) for s in sample_text) if m]