How to extract a filename from a URL & append a word to it?
I have the following url:
url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
I would like to extract the file name in this url: 09-09-201315-47-571378756077.jpg
Once I get this file name, I'm going to save it with this name to the Desktop.
filename = **extracted file name from the url**
download_photo = urllib.urlretrieve(url, "/home/ubuntu/Desktop/%s.jpg" % (filename))
After this, I'm going to resize the photo. Once that is done, I'm going to save the resized version and append the word "_small" to the end of the filename.
downloadedphoto = Image.open("/home/ubuntu/Desktop/%s.jpg" % (filename))
resize_downloadedphoto = downloadedphoto.resize((300, 300), Image.ANTIALIAS)
resize_downloadedphoto.save("/home/ubuntu/Desktop/%s.jpg" % (filename + _small))
From this, what I am trying to achieve is to get two files, the original photo with the original name, then the resized photo with the modified name. Like so:
09-09-201315-47-571378756077.jpg
09-09-201315-47-571378756077_small.jpg
How can I go about doing this?
You can use urllib.parse.urlparse with os.path.basename:
import os
from urllib.parse import urlparse
url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
a = urlparse(url)
print(a.path) # Output: /kyle/09-09-201315-47-571378756077.jpg
print(os.path.basename(a.path)) # Output: 09-09-201315-47-571378756077.jpg
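For the second half of the question (appending "_small" before the extension), something along these lines should work. This is only a minimal sketch: the helper names filename_from_url and add_suffix are made up for illustration, and it assumes the filename has a normal extension that os.path.splitext can split off.

```python
import os
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of the URL (hypothetical helper)."""
    return os.path.basename(urlparse(url).path)


def add_suffix(filename, suffix="_small"):
    """Insert suffix between the base name and the extension (hypothetical helper)."""
    root, ext = os.path.splitext(filename)
    return root + suffix + ext


url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
filename = filename_from_url(url)
small_filename = add_suffix(filename)

print(filename)        # 09-09-201315-47-571378756077.jpg
print(small_filename)  # 09-09-201315-47-571378756077_small.jpg
```

Since the extracted name already ends in .jpg, you would pass it to the save path as-is (for example with os.path.join) rather than formatting it into "%s.jpg" again.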
os.path.basename(url)
Why try harder?
In [1]: os.path.basename("https://example.com/file.html")
Out[1]: 'file.html'
In [2]: os.path.basename("https://example.com/file")
Out[2]: 'file'
In [3]: os.path.basename("https://example.com/")
Out[3]: ''
In [4]: os.path.basename("https://example.com")
Out[4]: 'example.com'
Note 2020-12-20
Nobody has thus far provided a complete solution.
A URL can contain a ?[query-string] and/or a #[fragment identifier] (but only in that order: ref)
In [1]: from os import path
In [2]: def get_filename(url):
   ...:     fragment_removed = url.split("#")[0]  # keep to left of first #
   ...:     query_string_removed = fragment_removed.split("?")[0]
   ...:     scheme_removed = query_string_removed.split("://")[-1].split(":")[-1]
   ...:     if scheme_removed.find("/") == -1:
   ...:         return ""
   ...:     return path.basename(scheme_removed)
   ...:
In [3]: get_filename("a.com/b")
Out[3]: 'b'
In [4]: get_filename("a.com/")
Out[4]: ''
In [5]: get_filename("https://a.com/")
Out[5]: ''
In [6]: get_filename("https://a.com/b")
Out[6]: 'b'
In [7]: get_filename("https://a.com/b?c=d#e")
Out[7]: 'b'
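For comparison, here is a rough sketch of the same idea using the standard library's own splitting instead of manual string splits. The name get_filename_urlsplit is hypothetical, and the unquote step (decoding things like %20) is an extra assumption that the function above does not make. It gives the same results for the five inputs shown, but it is not a drop-in replacement: a bare host with no scheme and no slash, such as "a.com", is treated as a path by urlsplit and comes back as 'a.com' rather than ''.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def get_filename_urlsplit(url):
    """Hypothetical alternative to get_filename, built on urlsplit."""
    # urlsplit separates the query string and fragment from the path
    url_path = urlsplit(url).path
    # take the last component first, then decode percent-escapes in it
    return unquote(posixpath.basename(url_path))


for candidate in ["a.com/b", "a.com/", "https://a.com/",
                  "https://a.com/b", "https://a.com/b?c=d#e"]:
    print(repr(get_filename_urlsplit(candidate)))
```

posixpath is used rather than os.path so that the result does not depend on the operating system the script runs on.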