Download PDF using Selenium in Python + save each PDF with an assigned name
Solution 1:
As RJ Adriaansen pointed out, the page loads its data from a JSON file (visible in Developer Tools > Network > Fetch/XHR), so it can be scraped directly with requests and without Selenium:
import re

import requests

for year in range(2015, 2023):
    # JSON endpoint found in Developer Tools > Network > Fetch/XHR
    data_url = f'https://www1.hkexnews.hk/ncms/json/eds/app_{year}_sehk_e.json?_=1641899494829'
    data = requests.get(data_url).json()
    for company in data['app']:
        # Company name with characters that are unsafe in filenames replaced by '_'
        filename = re.sub(r'[^\w\-_ ]', '_', company['a']) + '.pdf'
        try:
            pdf_url = 'https://www1.hkexnews.hk/app/' + company['ls'][0]['u1']
        except (KeyError, IndexError, TypeError):
            # No document link for this entry, so skip it
            continue
        pdf_data = requests.get(pdf_url)
        print(f'Saving {filename}')
        with open(filename, 'wb') as file:
            file.write(pdf_data.content)
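
One thing to watch for: the filename is derived only from the company name, so if two entries happen to share a name (for example across different years), the later PDF would overwrite the earlier one. Below is a minimal sketch of one way to avoid that. The helper name `safe_pdf_filename` and the `used` set are hypothetical, not part of the original answer; it reuses the same regex as the loop above and appends a counter when a name repeats.

```python
import re


def safe_pdf_filename(name, used):
    """Return a '<name>.pdf' filename that is not already in `used`.

    Hypothetical helper: `used` is a set of filenames handed out so far.
    """
    # Same character filter as the loop above: keep word characters,
    # '-', '_' and spaces; replace everything else with '_'.
    base = re.sub(r'[^\w\-_ ]', '_', name).strip() or 'unnamed'
    candidate = f'{base}.pdf'
    counter = 2
    # Append _2, _3, ... until the name is unique within this run.
    while candidate in used:
        candidate = f'{base}_{counter}.pdf'
        counter += 1
    used.add(candidate)
    return candidate


# Made-up company names, for illustration only.
used_names = set()
first = safe_pdf_filename('ABC Holdings Ltd.', used_names)
second = safe_pdf_filename('ABC Holdings Ltd.', used_names)
third = safe_pdf_filename('A/B: Tech', used_names)
print(first, second, third, sep='\n')
```

In the download loop, this would replace the `filename = ...` line, with `used_names = set()` created once before the outer loop. Note that it only tracks names from the current run, not files already on disk.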