JavaScript parser for Java [closed]

Solution 1:

From https://github.com/google/caja/blob/master/src/com/google/caja/parser/js/Parser.java

The grammar below is a context-free representation of the grammar this parser parses. It deviates from ECMAScript 262 Edition 3 (ES3) in the places where implementations themselves deviate from ES3. The rules for semicolon insertion, and the backtracking in expressions needed to handle it properly, are commented thoroughly in the code, since semicolon insertion requires information from both the lexer and the parser and cannot be determined with finite lookahead.

Noteworthy features

  1. Reports warnings on a queue where an error doesn't prevent any further errors, so that we can report multiple errors in a single compile pass instead of forcing developers to play whack-a-mole.
  2. Does not parse Firefox style catch (<Identifier> if <Expression>) since those don't work on IE and many other interpreters.
  3. Recognizes const since many interpreters do (not IE) but warns.
  4. Allows, but warns, on trailing commas in Array and Object constructors.
  5. Allows keywords as identifier names but warns since different interpreters have different keyword sets. This allows us to use an expansive keyword set.

To parse strict code, pass in a PedanticWarningMessageQueue that converts MessageLevel#WARNING and above to MessageLevel#FATAL_ERROR.
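To make the queue idea concrete, here is a minimal, self-contained sketch. The classes below (Level, Message, SimpleQueue, PedanticQueue, QueueDemo) are hypothetical stand-ins written for this illustration; they are not Caja's MessageQueue or PedanticWarningMessageQueue, whose real signatures you should take from the Caja source. The sketch only shows the two behaviours described above: messages accumulate instead of aborting the pass, and a strict queue promotes anything at WARNING or above to FATAL_ERROR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Caja's classes.
enum Level { LOG, WARNING, ERROR, FATAL_ERROR }

final class Message {
    final Level level;
    final String text;

    Message(Level level, String text) {
        this.level = level;
        this.text = text;
    }
}

/** Collects every message so a single pass can report several problems. */
class SimpleQueue {
    private final List<Message> messages = new ArrayList<>();

    void add(Level level, String text) {
        messages.add(new Message(level, text));
    }

    List<Message> getMessages() {
        return Collections.unmodifiableList(messages);
    }

    boolean hasFatal() {
        for (Message m : messages) {
            if (m.level == Level.FATAL_ERROR) {
                return true;
            }
        }
        return false;
    }
}

/** Strict variant: anything at WARNING or above is recorded as FATAL_ERROR. */
class PedanticQueue extends SimpleQueue {
    @Override
    void add(Level level, String text) {
        Level promoted = level.compareTo(Level.WARNING) >= 0 ? Level.FATAL_ERROR : level;
        super.add(promoted, text);
    }
}

final class QueueDemo {
    /** Stands in for a parser pass that reports two problems without stopping. */
    static void report(SimpleQueue mq) {
        mq.add(Level.WARNING, "trailing comma in array literal");
        mq.add(Level.WARNING, "keyword used as identifier");
    }

    public static void main(String[] args) {
        SimpleQueue lenient = new SimpleQueue();
        SimpleQueue strict = new PedanticQueue();
        report(lenient);
        report(strict);
        System.out.println("lenient has fatal: " + lenient.hasFatal()); // prints "lenient has fatal: false"
        System.out.println("strict has fatal: " + strict.hasFatal());   // prints "strict has fatal: true"
    }
}
```

The caller decides how strict to be simply by choosing which queue to hand to the parser; the code that reports problems stays the same.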


CajaTestCase.js shows how to set up a parser, and fromResource and fromString in the same class show how to get an input of the right kind.

Solution 2:

If you are on Java 1.8 (Java 8), there is a trick for parsing with the Nashorn implementation that ships out of the box. The unit tests in the OpenJDK source code show how to use the parser on its own, without running the compilation steps that normally follow.

// Configure Nashorn to stop after parsing.
Options options = new Options("nashorn");
options.set("anon.functions", true);
options.set("parse.only", true);
options.set("scripting", true);

ErrorManager errors = new ErrorManager();
Context context = new Context(options, errors, Thread.currentThread().getContextClassLoader());
// If this constructor is not accessible on your Java 8 update,
// try the factory method Source.sourceFor(name, content) instead.
Source source = new Source("test", "var a = 10; var b = a + 1;" +
            "function someFunction() { return b + 1; }  ");
Parser parser = new Parser(context.getEnv(), source, errors);
// The whole script is returned as one top-level function node.
FunctionNode functionNode = parser.parse();
Block block = functionNode.getBody();
List<Statement> statements = block.getStatements();

Once this code runs, the 'statements' list holds the Abstract Syntax Tree (AST) nodes for the three statements in the source.

You can then interpret or manipulate the tree to suit your needs.

The previous example works with the following imports:

import java.util.List;

import jdk.nashorn.internal.ir.Block;
import jdk.nashorn.internal.ir.FunctionNode;
import jdk.nashorn.internal.ir.Statement;
import jdk.nashorn.internal.parser.Parser;
import jdk.nashorn.internal.runtime.Context;
import jdk.nashorn.internal.runtime.ErrorManager;
import jdk.nashorn.internal.runtime.Source;
import jdk.nashorn.internal.runtime.options.Options;

Because these are internal JDK classes, you might need to add an access rule (in your IDE's build path settings, for example) to make jdk/nashorn/internal/** accessible.


In my case, I am using JavaScript as the expression language for my own Domain Specific Language (DSL), which I then compile to Java classes at runtime. The AST lets me generate Java code that captures the intent of the JavaScript expressions.
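As a rough idea of what generating Java code from an AST can look like, here is a minimal sketch. The node types (Expr, Num, Var, Binary) and the JavaEmitter class are hypothetical and invented for this illustration; Nashorn's real node classes under jdk.nashorn.internal.ir are richer and named differently, so treat this as the shape of the approach rather than the actual code.

```java
// Hypothetical miniature AST; not Nashorn's node classes.
interface Expr {}

final class Num implements Expr {
    final double value;

    Num(double value) {
        this.value = value;
    }
}

final class Var implements Expr {
    final String name;

    Var(String name) {
        this.name = name;
    }
}

final class Binary implements Expr {
    final String op;
    final Expr left;
    final Expr right;

    Binary(String op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
}

final class JavaEmitter {
    /** Recursively turns an expression tree into Java expression source text. */
    static String emit(Expr e) {
        if (e instanceof Num) {
            return Double.toString(((Num) e).value);
        }
        if (e instanceof Var) {
            return ((Var) e).name;
        }
        if (e instanceof Binary) {
            Binary b = (Binary) e;
            // Parenthesize every binary node so precedence never has to be tracked.
            return "(" + emit(b.left) + " " + b.op + " " + emit(b.right) + ")";
        }
        throw new IllegalArgumentException("Unsupported node: " + e);
    }

    public static void main(String[] args) {
        // Tree for the JavaScript expression: b + 1
        Expr tree = new Binary("+", new Var("b"), new Num(1));
        System.out.println(emit(tree)); // prints "(b + 1.0)"
    }
}
```

With a real parser you would walk its statement and expression nodes the same way, mapping each node kind to the Java source you want to emit.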


Nashorn is available with Java SE 8.

Instructions for getting the Nashorn source code are here: https://wiki.openjdk.java.net/display/Nashorn/Building+Nashorn