Regex to find the HTTP response code number

You are creating a single string with r = str(sample_text), which is the repr of the whole list, so the string now ends with ']

As a result, $ can match at only one position (the end of that single string), and you get multiple matches because the lookahead is true at more than one position.

What you could do instead is, for example, join the lines with a newline, use a capture group (whose value re.findall returns), and pass re.M so that $ matches at the end of every line.
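To make the difference concrete, here is a minimal sketch. The two shortened log lines and the simple pattern \d+$ are assumptions chosen for illustration; they are not the pattern from the question.

```python
import re

# Hypothetical sample, shortened from the log format used in this answer.
lines = ['host - - [01/Jul/1995:00:00:01 -0400] "GET / HTTP/1.0" 200 6245',
         'host - - [01/Jul/1995:00:00:06 -0400] "GET / HTTP/1.0" 304 0']

as_repr = str(lines)        # repr of the list: one string that ends with ']
as_text = "\n".join(lines)  # one log entry per line

print(as_repr.endswith("']"))              # True
print(re.findall(r'\d+$', as_repr))        # [] because the string ends with ]
print(re.findall(r'\d+$', as_text))        # ['0'] only the very end matches
print(re.findall(r'\d+$', as_text, re.M))  # ['6245', '0'] one match per line
```

Without re.M, $ only matches at the end of the whole string, which is why joining with a newline is not enough on its own.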

\bHTTP/\d\.\d"\s\d+\s(\d+)$

The pattern matches:

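\bHTTP/ A word boundary, then match HTTP/
\d\.\d"\s\d+\s Match a digit, a dot, a digit, a double quote, a whitespace char, 1+ digits (the status code) and a whitespace char
(\d+) Capture 1+ digits in group 1
$ End of string (with re.M, the end of each line)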
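The same breakdown can be written into the pattern itself with re.VERBOSE. This is only a sketch: the name LOG_SIZE and the two shortened log lines are assumptions for illustration.

```python
import re

# Same pattern as above, annotated with re.VERBOSE (whitespace and
# comments inside the pattern are ignored). LOG_SIZE is a made-up name.
LOG_SIZE = re.compile(r'''
    \bHTTP/     # word boundary, then the literal HTTP/
    \d\.\d      # protocol version such as 1.0
    "           # closing double quote of the request
    \s\d+\s     # whitespace, 1+ digits (status code), whitespace
    (\d+)       # group 1: the final number on the line
    $           # end of line, because re.MULTILINE is set
''', re.VERBOSE | re.MULTILINE)

# Hypothetical shortened log lines in the same layout.
text = ('a - - [01/Jul/1995:00:00:09 -0400] "GET /a.html HTTP/1.0" 200 4085\n'
        'b - - [01/Jul/1995:00:00:11 -0400] "GET /b.html HTTP/1.0" 304 0')

print(LOG_SIZE.findall(text))  # ['4085', '0']
```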


import re

sample_text = ['199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/HTTP/1.0" 200 6245',
               'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/HTTP/1.0" 200 3985',
               '199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085',
               'burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0',
               '199.120.110.21 - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0" 200 4179']

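def func():
    # Join the log lines so that each entry sits on its own line.
    r = "\n".join(sample_text)
    regext = r'\bHTTP/\d\.\d"\s\d+\s(\d+)$'
    # With re.M, $ matches at the end of every line; findall returns group 1.
    content_size = re.findall(regext, r, re.M)
    print(content_size)

func()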

Output

['6245', '3985', '4085', '0', '4179']

Or, using a list comprehension that runs re.search on each line separately (so re.M is not needed):

def func():
    return [m.group(1) for m in (re.search(r'\bHTTP/\d\.\d"\s\d+\s(\d+)$', s) for s in sample_text) if m]
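If the status code is needed as well as the final number, the same pattern can capture both. This is a sketch under the assumption that every line follows the layout of sample_text; the function name status_and_size and the group names status and size are made up for this example.

```python
import re

# Assumption: same log layout as sample_text; group names are illustrative.
pattern = re.compile(r'\bHTTP/\d\.\d"\s(?P<status>\d+)\s(?P<size>\d+)$')

def status_and_size(lines):
    """Return (status, size) tuples as ints for the lines that match."""
    results = []
    for line in lines:
        m = pattern.search(line)
        if m:  # skip lines that do not follow the expected layout
            results.append((int(m.group("status")), int(m.group("size"))))
    return results

demo = ['burger.letters.com - - [01/Jul/1995:00:00:11 -0400] '
        '"GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0']
print(status_and_size(demo))  # [(304, 0)]
```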
