JavaScript parser in Python [closed]

Nowadays, there is at least one better tool, called slimit:

SlimIt is a JavaScript minifier written in Python. It compiles JavaScript into more compact code so that it downloads and runs faster.

SlimIt also provides a library that includes a JavaScript parser, lexer, pretty printer and a tree visitor.

Demo:

Imagine we have the following javascript code:

$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: '[email protected]',
        phone: '9999999999',
        name: 'XYZ'
    }
});

And now we need to get email, phone and name values from the data object.

The idea here would be to instantiate a slimit parser, visit all nodes, filter all assignments and put them into the dictionary:

from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: '[email protected]',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

parser = Parser()
tree = parser.parse(data)
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
          for node in nodevisitor.visit(tree)
          if isinstance(node, ast.Assign)}

print fields

It prints:

{'name': "'XYZ'", 
 'url': "'http://www.example.com'", 
 'type': '"POST"', 
 'phone': "'9999999999'", 
 'data': '', 
 'email': "'[email protected]'"}

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

The ANTLR site provides many grammars, including one for JavaScript.

As it happens, there is a Python API available - so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).

I have translated esprima.js to Python:

https://github.com/PiotrDabkowski/pyjsparser

>>> from pyjsparser import parse
>>> parse('var $ = "Hello!"')
{
"type": "Program",
"body": [
    {
        "type": "VariableDeclaration",
        "declarations": [
            {
                "type": "VariableDeclarator",
                "id": {
                    "type": "Identifier",
                    "name": "$"
                },
                "init": {
                    "type": "Literal",
                    "value": "Hello!",
                    "raw": '"Hello!"'
                }
            }
        ],
        "kind": "var"
    }
  ]
}

It's a manual translation so its very fast, takes about 1 second to parse angular.js file (so 100k characters per second). It supports whole ECMAScript 5.1 and parts of version 6 - for example Arrow functions, const, let.

If you need support for all the newest JS6 features you can translate esprima on the fly with Js2Py:

import js2py
esprima = js2py.require("[email protected]")
esprima.parse("a = () => {return 11};")
# {'body': [{'expression': {'left': {'name': 'a', 'type': 'Identifier'}, 'operator': '=', 'right': {'async': False, 'body': {'body': [{'argument': {'raw': '11', 'type': 'Literal', 'value': 11}, 'type': 'ReturnStatement'}], 'type': 'BlockStatement'}, 'expression': False, 'generator': False, 'id': None, 'params': [], 'type': 'ArrowFunctionExpression'}, 'type': 'AssignmentExpression'}, 'type': 'ExpressionStatement'}], 'sourceType': 'script', 'type': 'Program'}

JavaScript parser in Python [closed]

Related

Recent Posts