How to get everything after last slash in a URL?
How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:
URL: http://www.test.com/TEST1
returns: TEST1
URL: http://www.test.com/page/TEST2
returns: TEST2
URL: http://www.test.com/page/page/12345
returns: 12345
I've tried urlparse, but that gives me the full path filename, such as page/page/12345
.
Solution 1:
You don't need fancy things, just see the string methods in the standard library and you can easily split your url between 'filename' part and the rest:
url.rsplit('/', 1)
So you can get the part you're interested in simply with:
url.rsplit('/', 1)[-1]
Solution 2:
One more (idio(ma)tic) way:
URL.split("/")[-1]
Solution 3:
rsplit
should be up to the task:
In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'
Solution 4:
You can do like this:
head, tail = os.path.split(url)
Where tail will be your file name.
Solution 5:
urlparse is fine to use if you want to (say, to get rid of any query string parameters).
import urllib.parse
urls = [
'http://www.test.com/TEST1',
'http://www.test.com/page/TEST2',
'http://www.test.com/page/page/12345',
'http://www.test.com/page/page/12345?abc=123'
]
for i in urls:
url_parts = urllib.parse.urlparse(i)
path_parts = url_parts[2].rpartition('/')
print('URL: {}\nreturns: {}\n'.format(i, path_parts[2]))
Output:
URL: http://www.test.com/TEST1
returns: TEST1
URL: http://www.test.com/page/TEST2
returns: TEST2
URL: http://www.test.com/page/page/12345
returns: 12345
URL: http://www.test.com/page/page/12345?abc=123
returns: 12345