Is it possible to load a page after its JavaScript has executed, using Python?
Here is the page I am reading:
<html>
<head>
<script type="text/javascript">
document.write("Hello World")
</script>
</head>
<body>
</body>
</html>
As you can see, the text Hello World is added to the HTML page by JavaScript. When I parse the page with an HTML parser such as BeautifulSoup, the parser can't see the Hello World text. Is it possible to parse the actual result, the way the client really sees it? Thanks.
I ran into a similar problem when writing web scrapers in Python, and I found Selenium WebDriver in combination with BeautifulSoup very useful. The code ends up looking something like this:
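from bs4 import BeautifulSoup
from selenium import webdriver
# Launch a real browser so the page's JavaScript actually runs
browser = webdriver.Firefox()
browser.get("http://www.yoursite.com")
# page_source comes from the live browser, so it typically includes script-generated content
soup = BeautifulSoup(browser.page_source, "html.parser")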
...
Selenium WebDriver also has functionality for "wait until a certain DOM element has loaded", which makes it easier to handle the timing of JavaScript-generated elements.
For a correct representation of what the DOM looks like after JavaScript manipulation, you'll have to actually execute the JavaScript. This has to be done by something that has a JavaScript engine and a DOM (rather than text/markup) representation of the document - typically, a browser.
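To see why a plain parser is not enough, here is a small sketch that feeds the page from the question to Python's standard-library html.parser (used here as a stand-in for any static parser; the class name TextOutsideScripts is just made up for this example). The parser treats the script body as opaque text and never runs it, so no Hello World text node ever appears in the document.

```python
from html.parser import HTMLParser

# The page from the question, as a static parser receives it.
PAGE = """<html>
<head>
<script type="text/javascript">
document.write("Hello World")
</script>
</head>
<body>
</body>
</html>"""


class TextOutsideScripts(HTMLParser):
    """Collect non-blank text nodes, ignoring anything inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script source is delivered as raw data; it is never executed.
        if not self.in_script and data.strip():
            self.text.append(data.strip())


parser = TextOutsideScripts()
parser.feed(PAGE)
parser.close()
visible_text = parser.text
print(visible_text)  # prints [] because the script was never run
```

The only place "Hello World" exists in the markup is inside the script source, which is exactly what BeautifulSoup sees as well. Only something that executes the script, such as a browser, turns it into page content.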