Python Selenium: get "Developer Tools" → Network → Media logs

Finally I have done it, all by myself, without anybody's help.

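The trick is simple, and once you know what to do, it isn't hard to achieve: enable Chrome's performance log (the goog:loggingPrefs capability set to {'performance': 'ALL'}), then read the DevTools Network events with get_log('performance').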

The message of each log entry is a JSON string, so we need the json module.

The entries vary, but their first-level keys are fixed: every entry returned by get_log('performance') has exactly three keys, level, message, and timestamp.

We need the message key. Its value is a JSON object packed into a string, so we need json.loads to unpack it.
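A minimal sketch of the unpacking step, assuming a hypothetical entry whose values are made up for illustration; only the key layout (level, message, timestamp, and the JSON string under message) follows the description above:

```python
import json

# Hypothetical performance-log entry; the values are invented for illustration.
# The outer dict has the three fixed keys: 'level', 'message', 'timestamp'.
entry = {
    'level': 'INFO',
    'message': json.dumps({'message': {'method': 'Network.responseReceived',
                                       'params': {}}}),
    'timestamp': 1600000000000,
}

def decode_entry(entry):
    """Unpack the JSON string stored under the entry's 'message' key."""
    return json.loads(entry['message'])

decoded = decode_entry(entry)
print(sorted(entry))                     # ['level', 'message', 'timestamp']
print(type(entry['message']).__name__)   # str
print(decoded['message']['method'])      # Network.responseReceived
```

The outer entry is already a Python dict; only the value under message needs json.loads.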

The structure of these unpacked objects varies a lot, but there is always a top-level message key, and inside it a method key.

Here we want to scrape the addresses of received media files. Long story short, the message → message → method value should equal 'Network.responseReceived'.
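The method check can be sketched as a small predicate. The two entries below are hypothetical (values invented); Network.requestWillBeSent is just another DevTools Network event used here as the non-matching case:

```python
import json

def is_response_received(entry):
    """True if a performance-log entry is a Network.responseReceived event."""
    inner = json.loads(entry['message'])['message']
    return inner['method'] == 'Network.responseReceived'

# Hypothetical entries: one response event and one request event.
entries = [
    {'level': 'INFO', 'timestamp': 1,
     'message': json.dumps({'message': {'method': 'Network.responseReceived',
                                        'params': {}}})},
    {'level': 'INFO', 'timestamp': 2,
     'message': json.dumps({'message': {'method': 'Network.requestWillBeSent',
                                        'params': {}}})},
]

responses = [e for e in entries if is_response_received(e)]
print(len(responses))  # 1
```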

If message → message → method equals 'Network.responseReceived', there will always be a message → message → params → response → mimeType key.

That key stores the MIME type of the resource. I will spare you the details: MP4 (from MPEG-4, the Moving Picture Experts Group standard) is usually associated with video, but it is a container format that can also hold audio alone, and here the media type we want is 'audio/mp4'.
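A sketch of the MIME-type test, assuming the entry has already been decoded and its method confirmed as 'Network.responseReceived'. The sample messages are hypothetical; .get() is a defensive choice of mine, not something the original relies on:

```python
def is_audio_mp4(inner):
    """Check a decoded Network.responseReceived message for an audio/mp4 response."""
    # .get() avoids a KeyError if an event lacks the expected nesting.
    response = inner.get('params', {}).get('response', {})
    return response.get('mimeType') == 'audio/mp4'

# Hypothetical decoded messages (values invented for illustration).
audio = {'method': 'Network.responseReceived',
         'params': {'response': {'mimeType': 'audio/mp4',
                                 'url': 'https://example.com/a/track.m4a'}}}
page = {'method': 'Network.responseReceived',
        'params': {'response': {'mimeType': 'text/html',
                                'url': 'https://example.com/'}}}

print(is_audio_mp4(audio), is_audio_mp4(page))  # True False
```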

If all the above criteria are satisfied, the address of the media file is the value of the message → message → params → response → url key.
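Putting the three criteria together, the filtering can be tested offline as a pure function. This is a sketch: media_urls and make_entry are hypothetical helpers, and the entries they build are invented data shaped like the performance log described above:

```python
import json

def media_urls(entries, mime_type='audio/mp4'):
    """Collect response URLs of the given MIME type from performance-log entries."""
    urls = []
    for entry in entries:
        inner = json.loads(entry['message'])['message']
        if inner.get('method') != 'Network.responseReceived':
            continue
        response = inner['params']['response']
        if response.get('mimeType') == mime_type:
            urls.append(response['url'])
    return urls

def make_entry(method, mime=None, url=None):
    """Build a hypothetical log entry for testing (not real browser output)."""
    params = {'response': {'mimeType': mime, 'url': url}} if mime else {}
    return {'level': 'INFO', 'timestamp': 0,
            'message': json.dumps({'message': {'method': method, 'params': params}})}

entries = [
    make_entry('Network.requestWillBeSent'),
    make_entry('Network.responseReceived', 'text/html', 'https://example.com/'),
    make_entry('Network.responseReceived', 'audio/mp4', 'https://example.com/a/track.m4a'),
]
print(media_urls(entries))  # ['https://example.com/a/track.m4a']
```

Keeping the parsing separate from the browser code makes it easy to check against saved log entries without launching Chrome.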

This is the final code:

import json
import os
import random
import sys
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Reuse the local Chrome profile (this path is Windows-only).
path = os.path.join(os.environ['LOCALAPPDATA'], 'Google', 'Chrome', 'User Data')

options = webdriver.ChromeOptions()
options.add_argument('--disable-gpu')
options.add_argument('--headless')
options.add_argument('--log-level=3')
options.add_argument('--mute-audio')
options.add_argument(f'--user-data-dir={path}')
# Enable the performance log, which carries the DevTools Network events.
# Selenium 4 dropped desired_capabilities, so set the capability on the options.
options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})

driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 5)

def getlink(addr):
    driver.get(addr)
    # The player sits inside an iframe, so switch into it first.
    iframe = driver.find_element(By.XPATH, '//iframe[@id="g_iframe"]')
    driver.switch_to.frame(iframe)
    # wait.until returns the element once it is visible.
    play = wait.until(EC.visibility_of_element_located((By.XPATH, '//div[2]/div/a[1]')))
    play.click()
    # Give the player time to request the audio file.
    time.sleep(5)
    logs = driver.get_log('performance')
    addresses = []
    for entry in logs:
        # entry['message'] is a JSON string; unpack it and take the inner message.
        log = json.loads(entry['message'])['message']
        if log['method'] == 'Network.responseReceived':
            response = log['params']['response']
            if response['mimeType'] == 'audio/mp4':
                addresses.append(response['url'])
    # Only return an address if all captured URLs end in the same file name.
    check = {url.split('/')[-1] for url in addresses}
    if len(check) == 1:
        return random.choice(addresses)
    return None

if __name__ == '__main__':
    try:
        print(getlink(sys.argv[1]))
    finally:
        # Always close the browser, even if scraping fails.
        driver.quit()
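The last step of getlink returns an address only when every captured URL ends in the same file name; my reading is that these are copies of one file served from different hosts, so any of them will do. Isolated as a hypothetical helper (pick_if_same_file is my name, and the URLs are invented), the check looks like this:

```python
import random

def pick_if_same_file(addresses):
    """Return one address if all share the same last path segment, else None."""
    names = {url.split('/')[-1] for url in addresses}
    if len(names) == 1:
        return random.choice(addresses)
    # No addresses, or addresses pointing to different files.
    return None

same = ['https://a.example.com/x/track.m4a', 'https://b.example.com/y/track.m4a']
mixed = ['https://a.example.com/x/one.m4a', 'https://a.example.com/x/two.m4a']
print(pick_if_same_file(same) in same)  # True
print(pick_if_same_file(mixed))         # None
```

Note that split('/')[-1] keeps any query string, so two copies of the same file with different query parameters would fail this check.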