Get the content (text) of an URL after Javascript has run with PHP
Solution 1:
Update 2 Adds more details on how to use phantomjs
from PHP.
Update 1 (after clarification that javascript on target page need to run first)
Method 1:Use phantomjs(will execute javascript);
1. Download phantomjs and place the executable in a path that your PHP binary can reach.
2. Place the following 2 files in the same directory:
get-website.php
<?php
$phantom_script= dirname(__FILE__). '/get-website.js';
$response = exec ('phantomjs ' . $phantom_script);
echo htmlspecialchars($response);
?>
get-website.js
var webPage = require('webpage');
var page = webPage.create();
page.open('http://google.com/', function(status) {
console.log(page.content);
phantom.exit();
});
3. Browse to get-website.php
and the target site, http://google.com
contents will return after executing inline javascript. You can also call this from a command line using php /path/to/get-website.php
.
Method 2:Use Ajax with PHP (No phantomjs so won't run javascript);
/get-website.php
<?php
$html=file_get_contents('http://google.com');
echo $html;
?>
test.html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>on demo</title>
<style>
p {
color: red;
}
span {
color: blue;
}
</style>
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
</head>
<body>
<button id='click_me'>Click me</button>
<span style="display:none;"></span>
<script>
$( "#click_me" ).click(function () {
$.get("/get-website.php", function(data) {
var json = {
html: JSON.stringify(data),
delay: 1
};
alert(json.html);
});
});
</script>
</body>
</html>
Solution 2:
I found a fantastic page on this, it's an entire tutorial on how to process the DOM of a page in PHP which is entirely created using javascript.
https://www.jacobward.co.uk/using-php-to-scrape-javascript-jquery-json-websites/ "PhantomJS development is suspended until further notice" so that option isn't a good one.