Executing Javascript from Python
I have HTML webpages that I am crawling using xpath. The etree.tostring
of a certain node gives me this string:
<script>
<!--
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
document.write(a+c+b)
}
escramble_758()
//-->
</script>
I just need the output of escramble_758()
. I can write a regex to figure out the whole thing, but I want my code to remain tidy. What is the best alternative?
I am zipping through the following libraries, but I didnt see an exact solution. Most of them are trying to emulate browser, making things snail slow.
-
http://code.google.com/p/python-spidermonkey/ (clearly says
it's not yet possible to call a function defined in Javascript
) - http://code.google.com/p/webscraping/ (don't see anything for Javascript, I may be wrong)
- http://pypi.python.org/pypi/selenium (Emulating browser)
Edit: An example will be great.. (barebones will do)
You can also use Js2Py which is written in pure python and is able to both execute and translate javascript to python. Supports virtually whole JavaScript even labels, getters, setters and other rarely used features.
import js2py
js = """
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
document.write(a+c+b)
}
escramble_758()
""".replace("document.write", "return ")
result = js2py.eval_js(js) # executing JavaScript and converting the result to python string
Advantages of Js2Py include portability and extremely easy integration with python (since basically JavaScript is being translated to python).
To install:
pip install js2py
Using PyV8, I can do this. However, I have to replace document.write
with return
because there's no DOM and therefore no document
.
import PyV8
ctx = PyV8.JSContext()
ctx.enter()
js = """
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
document.write(a+c+b)
}
escramble_758()
"""
print ctx.eval(js.replace("document.write", "return "))
Or you could create a mock document object
class MockDocument(object):
def __init__(self):
self.value = ''
def write(self, *args):
self.value += ''.join(str(i) for i in args)
class Global(PyV8.JSClass):
def __init__(self):
self.document = MockDocument()
scope = Global()
ctx = PyV8.JSContext(scope)
ctx.enter()
ctx.eval(js)
print scope.document.value
One more solution as PyV8 seems to be unmaintained and dependent on the old version of libv8.
PyMiniRacer It's a wrapper around the v8 engine and it works with the new version and is actively maintained.
pip install py-mini-racer
from py_mini_racer import py_mini_racer
ctx = py_mini_racer.MiniRacer()
ctx.eval("""
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
return a+c+b;
}
""")
ctx.call("escramble_758")
And yes, you have to replace document.write
with return
as others suggested
You can use js2py context to execute your js code and get output from document.write with mock document object:
import js2py
js = """
var output;
document = {
write: function(value){
output = value;
}
}
""" + your_script
context = js2py.EvalJs()
context.execute(js)
print(context.output)
You can use requests-html which will download and use chromium underneath.
from requests_html import HTML
html = HTML(html="<a href='http://www.example.com/'>")
script = """
function escramble_758(){
var a,b,c
a='+1 '
b='84-'
a+='425-'
b+='7450'
c='9'
return a+c+b;
}
"""
val = html.render(script=script, reload=False)
print(val)
# +1 425-984-7450
More on this read here