Download PDF using Selenium in Python + save each PDF with an assigned name

Solution 1:

As RJ Adriaansen pointed out, the page loads its data from a JSON file, visible in Developer Tools under Network > Fetch/XHR. That file can be requested directly, so Selenium is not needed:

import re
import requests

for year in range(2015, 2023):
    # JSON endpoint found in Developer Tools under Network > Fetch/XHR
    data_url = f'https://www1.hkexnews.hk/ncms/json/eds/app_{year}_sehk_e.json?_=1641899494829'
    data = requests.get(data_url).json()

    for company in data['app']:
        # build the filename from the company name, replacing unsafe characters with underscores
        filename = re.sub(r'[^\w\- ]', '_', company['a']) + '.pdf'
        try:
            pdf_url = 'https://www1.hkexnews.hk/app/' + company['ls'][0]['u1']
        except (KeyError, IndexError, TypeError):
            # skip entries that have no PDF link
            continue

        pdf_data = requests.get(pdf_url)

        print(f'Saving {filename}')
        with open(filename, 'wb') as file:
            file.write(pdf_data.content)
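
One thing to watch for: the loop above names each file after the company alone, so if the same company name appears more than once (for example in two different years), the later download silently overwrites the earlier one. A minimal sketch of one way around that follows. The helper name `safe_pdf_name` and the `used` set are hypothetical and not part of the original answer; the regex is the same one used above.

```python
import re

def safe_pdf_name(name, used):
    """Return a filesystem-safe, not-yet-used PDF filename for `name`.

    `used` is a set of filenames handed out so far; it is updated in place.
    (Hypothetical helper, for illustration only.)
    """
    # Same sanitising rule as in the loop above; fall back to a fixed
    # name if nothing usable is left after stripping whitespace.
    base = re.sub(r'[^\w\- ]', '_', name).strip() or 'unnamed'
    candidate = f'{base}.pdf'
    counter = 2
    # Append _2, _3, ... until the name no longer collides.
    while candidate in used:
        candidate = f'{base}_{counter}.pdf'
        counter += 1
    used.add(candidate)
    return candidate

used = set()
first = safe_pdf_name('A/B Ltd.', used)
second = safe_pdf_name('A/B Ltd.', used)
print(first)
print(second)
```

To use it, create `used = set()` once before the year loop and replace the `filename = ...` line with `filename = safe_pdf_name(company['a'], used)`. If a name should also record the year, prefixing it with `f'{year}_'` would be another option.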
