How to extract a slug from a URL with a regular expression in Python?

Solution 1:

Use a capturing group: put parentheses, (...), around the part of the regex that you want to capture. After a successful match, pass the group's number to m.group() to get its contents:

>>> import re
>>> url = 'http://www.example.com/this-2-me-4/123456-subj'
>>> m = re.search(r'/([0-9]+)-', url)
>>> m.group(1)
'123456'

From the docs:

(...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].
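Note that re.search returns None when the pattern does not occur, so calling m.group(1) directly raises AttributeError on a URL without an ID. A minimal sketch of a guard, assuming URLs shaped like the one in the question (extract_id is a hypothetical helper name, not part of the re module):

```python
import re

def extract_id(url):
    """Return the digits that follow a '/' and precede a '-', or None."""
    m = re.search(r'/([0-9]+)-', url)
    # re.search returns None when the pattern is not found anywhere in url
    return m.group(1) if m else None

print(extract_id('http://www.example.com/this-2-me-4/123456-subj'))
print(extract_id('http://www.example.com/no-id-here'))
```

The first call prints 123456 and the second prints None, so the caller can test the result instead of catching an exception.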

Solution 2:

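You may want to parse the URL with urllib.parse first and apply the capturing group to the path only. That way the pattern cannot accidentally match inside the host name or the query string.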

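import re
import urllib.parse

url = 'http://www.example.com/this-2-me-4/123456-subj'
# Keep only the path component: '/this-2-me-4/123456-subj'
path = urllib.parse.urlparse(url).path
# Capture the run of digits that follows a '/' and precedes a '-'
slug = re.search(r'/(\d+)-', path).group(1)
print(slug)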

Result:

123456

In Python 2, the same function lives in the urlparse module: import urlparse and call urlparse.urlparse(url) instead of urllib.parse.urlparse(url).
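If the ID always starts the last path segment, the match can be restricted to that segment so that digits earlier in the path are ignored. This is a sketch under that assumption, for Python 3; slug_id is a hypothetical name and the second URL is made up for illustration:

```python
import re
from urllib.parse import urlparse

def slug_id(url):
    """Return the leading digits of the last path segment, or None."""
    path = urlparse(url).path
    # Drop a trailing '/', then keep whatever follows the final '/'
    last_segment = path.rstrip('/').rsplit('/', 1)[-1]
    # re.match only matches at the start of the segment
    m = re.match(r'(\d+)-', last_segment)
    return m.group(1) if m else None

print(slug_id('http://www.example.com/this-2-me-4/123456-subj'))
print(slug_id('http://www.example.com/42-a/page?x=/7-'))
```

The second call returns None because the last path segment is 'page'; a plain re.search over the whole URL would have returned '42' there.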