Intercept and modify DOM before page is displayed to user

I'm trying to create a Firefox add-on (using the Add-on SDK) that modifies how pages are displayed, mostly as a training/learning exercise.

For some tasks (like augmenting pages with new functionality), pageMod works perfectly well: the page loads, and then I run some JS to show/hide/add elements.

My problem is: can I modify the DOM (i.e. the HTML document returned by the server) before the page even starts displaying?

For example, suppose the server returns this page:

<html>
    <body>
        <table>
            <tr>
                <td>Item 1.1</td>
                <td>Item 1.2</td>
                <td>Item 1.3</td>
            </tr>
            <tr>
                <td>Item 2.1</td>
                <td>Item 2.2</td>
                <td>Item 2.3</td>
            </tr>
        </table>
    </body>
</html>

but I would like Firefox to render this instead:

<html>
    <body>
        <ul>
            <li>Item 1.1, Item 1.2, Item 1.3</li>
            <li>Item 2.1, Item 2.2, Item 2.3</li>
        </ul>
    </body>
</html>

Doing it after the page loads would first display the table, which would then quickly 'blink' into a list. That might be fast enough, but if I wanted to change <img> tags into <a> tags, for example to prevent unwanted image loads, it would not be sufficient, since the images would already have been requested.

I looked at using contentScriptWhen: "start" in pageMod and attaching listeners, but I just can't see how to actually modify the DOM 'on the fly' (or even prevent any kind of page display before the whole page has loaded).

I looked at the cloud-to-butt extension, since it does modify the page on the fly, but I wasn't even able to get it to work: when attached as a pageMod on start the code failed on:

 document.getElementById('appcontent').addEventListener('DOMContentLoaded', function(e)

because document.getElementById('appcontent') was returning null.

I would be immensely thankful for some pointers: is it possible at all, how should I attach the script, and how can I intercept the HTML and send it on its way after some modifications?

EDIT: OK, I think I'm now able to intercept the data:

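let { Cc, Ci, CC } = require('chrome');
let { on } = require('sdk/system/events');
// Constructor wrapper: new ScriptableInputStream(s) creates the component and calls init(s)
let ScriptableInputStream = CC("@mozilla.org/scriptableinputstream;1", "nsIScriptableInputStream", "init");

on('http-on-examine-response', function (event) {
    var httpChannel = event.subject.QueryInterface(Ci.nsIHttpChannel);
    // Only intercept responses from the site of interest (dot escaped in the regex)
    if (/example\.com/.test(httpChannel.URI.spec)) {
        var traceChannel = event.subject.QueryInterface(Ci.nsITraceableChannel);
        var listener = new MyListener();
        // setNewListener() returns the listener that would otherwise have received
        // the data; keep it so the data can be passed on later
        listener.originalListener = traceChannel.setNewListener(listener);
    }
}, true); // true: keep a strong reference to this observer

function MyListener() {
    this.data = [];
    this.originalListener = null;
}

MyListener.prototype = {
    onStartRequest: function(request, ctx) {
        this.data = [];
        // The original listener must still be told that the request started
        this.originalListener.onStartRequest(request, ctx);
    },

    // Called for every chunk of the response: buffer it instead of passing it on
    onDataAvailable : function(request, context, inputStream, offset, count) {
        var scriptStream = new ScriptableInputStream(inputStream);
        this.data.push(scriptStream.read(count));
        scriptStream.close();
    },

    // The complete response has arrived
    onStopRequest: function(request, ctx, status) {
        console.log(this.data.join(''));
    }
}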

Now, in onStopRequest, I'd like to modify the data and send it on to where it was originally going...

Note that this works on strings, not on the DOM, so it's not perfect, but it's a place to start :)
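Since the intercepted response is just a string at this point, the table-to-list rewrite from the question could be done with a naive string transform. This is a hypothetical sketch (`tableToList` is not part of any API) that only handles simple markup like the example: bare `<table>`, `<tr>` and `<td>` tags with no attributes and no nested tables.

```javascript
// Hypothetical helper: rewrites simple <table> markup into a <ul>, one <li> per
// row with the cell texts joined by ", ". Works on the raw response string and
// assumes attribute-free <table>/<tr>/<td> tags and no nested tables.
function tableToList(html) {
  return html.replace(/<table>([\s\S]*?)<\/table>/g, function (match, tableBody) {
    var items = [];
    var rowRe = /<tr>([\s\S]*?)<\/tr>/g;
    var row;
    while ((row = rowRe.exec(tableBody)) !== null) {
      var cells = [];
      // A fresh regex per row, so lastIndex starts at 0 for each row
      var cellRe = /<td>([\s\S]*?)<\/td>/g;
      var cell;
      while ((cell = cellRe.exec(row[1])) !== null) {
        cells.push(cell[1].trim());
      }
      items.push("<li>" + cells.join(", ") + "</li>");
    }
    return "<ul>" + items.join("") + "</ul>";
  });
}
```

Inside onStopRequest it would slot in as `var newPage = tableToList(this.data.join(''));`. Anything beyond this toy markup (attributes, nested tables, comments) really needs a proper parser, which is the price of working on strings rather than on the DOM.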

EDIT2:

Huh, I got it working, though I have a feeling I'm not really supposed to do it this way:

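onStopRequest: function(request, ctx, status) {
    // Assumes a module-level converter, e.g.:
    //   let converter = Cc["@mozilla.org/intl/scriptableunicodeconverter"]
    //                     .createInstance(Ci.nsIScriptableUnicodeConverter);
    //   converter.charset = "UTF-8";
    // and that this.originalListener holds what setNewListener() returned.
    //var newPage = this.data.join('');
    var newPage = "<html><body><h1>TEST!</h1></body></html>";
    // Encode the new page and wrap it in an input stream for the original listener
    var stream = converter.convertToInputStream(newPage);
    // convertToByteArray() stores the encoded length (in bytes) in count.value
    var count = {};
    converter.convertToByteArray(newPage, count);
    // Hand over the replacement content as a single chunk, then finish the request
    this.originalListener.onDataAvailable(request, ctx,
        stream, 0, count.value);

    this.originalListener.onStopRequest(request, ctx, status);
},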
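The interception in the two edits boils down to a small listener protocol: the original listener should still receive onStartRequest, every data chunk is swallowed and buffered, and at the end the original listener gets a single rewritten chunk followed by onStopRequest. The sketch below pulls that protocol out of the XPCOM plumbing so it can be read on its own. `RewritingListener`, `transform`, `readChunk` and `toStream` are hypothetical names; in the add-on, `readChunk` would wrap ScriptableInputStream and `toStream` the unicode converter.

```javascript
// Hypothetical sketch of the interception protocol, with the XPCOM parts
// injected: readChunk(inputStream, count) returns the chunk as a string, and
// toStream(text) returns { stream, length } with length in bytes.
function RewritingListener(originalListener, transform, readChunk, toStream) {
  this.originalListener = originalListener;
  this.transform = transform;
  this.readChunk = readChunk;
  this.toStream = toStream;
  this.data = [];
}

RewritingListener.prototype = {
  onStartRequest: function (request, ctx) {
    this.data = [];
    // The original listener still needs to see the request start.
    this.originalListener.onStartRequest(request, ctx);
  },

  onDataAvailable: function (request, ctx, inputStream, offset, count) {
    // Buffer the chunk instead of forwarding it.
    this.data.push(this.readChunk(inputStream, count));
  },

  onStopRequest: function (request, ctx, status) {
    var converted = this.toStream(this.transform(this.data.join("")));
    // One rewritten chunk at offset 0, then the original stop notification.
    this.originalListener.onDataAvailable(request, ctx, converted.stream, 0, converted.length);
    this.originalListener.onStopRequest(request, ctx, status);
  }
};
```

The length handed over must be the byte length of the encoded stream, which is why EDIT2 calls convertToByteArray() just to read count.value.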

Solution 1:

My problem is: can I modify the DOM (i.e. the HTML document returned by the server) before the page even starts displaying?

Yes. JavaScript execution starts before the page is rendered for the first time, and the HTML parser does notify mutation observers, so you can strip elements as soon as the parser adds them.

You can register mutation observers even in content scripts loaded with contentScriptWhen: "start", so they should be notified of every element added to the tree before it is rendered: observer notifications are delivered as microtasks, while rendering only happens later, between tasks on the macrotask queue.
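A minimal sketch of that approach, assuming a content script attached with contentScriptWhen: "start". The `stripAsParsed` helper and the `img` selector are only illustrations; the observer constructor is passed in so the logic can also be exercised outside a browser. It only removes matching elements from the tree; it does not by itself guarantee that the browser has not already started fetching, say, an image by the time the observer runs.

```javascript
// Content script, assumed to be attached with contentScriptWhen: "start".
// Removes every element matching `selector` as soon as it is added to the tree.
function stripAsParsed(root, selector, ObserverCtor) {
  var observer = new ObserverCtor(function (mutations) {
    mutations.forEach(function (mutation) {
      Array.prototype.forEach.call(mutation.addedNodes, function (node) {
        if (node.nodeType !== 1) {
          return; // skip text and comment nodes
        }
        if (node.matches(selector)) {
          node.remove();
        } else {
          // An added node may already carry descendants (e.g. a subtree built
          // elsewhere and inserted at once), so check those too.
          Array.prototype.forEach.call(node.querySelectorAll(selector), function (el) {
            el.remove();
          });
        }
      });
    });
  });
  // subtree: true so that insertions anywhere below `root` are reported.
  observer.observe(root, { childList: true, subtree: true });
  return observer;
}

// Browser-only wiring; skipped when the file is loaded outside a page.
if (typeof document !== "undefined" && typeof MutationObserver !== "undefined") {
  var imgStripper = stripAsParsed(document, "img", MutationObserver);
  document.addEventListener("DOMContentLoaded", function () {
    imgStripper.disconnect();
  });
}
```

Disconnecting on DOMContentLoaded is a design choice: it stops the observer once the parser is done, so elements added later by the page's own scripts are left alone. Drop that listener to keep stripping them too.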

but I wasn't even able to get it to work: when attached as a pageMod on start the code failed on: document.getElementById('appcontent').addEventListener('DOMContentLoaded', function(e)

Of course. You should not assume that any particular element, not even <body>, is already available that early during page load. You will have to wait for it to become available.

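And the DOMContentLoaded event can simply be registered on the document object; there is no need to go through some element. (The 'appcontent' lookup most likely comes from an old overlay-style extension, where it refers to an element of Firefox's own browser window rather than of the web page, so it will never exist in a content script's document.)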
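A small sketch of waiting for the DOM from a "start" content script by registering DOMContentLoaded on the document itself. `onDomReady` is a hypothetical helper; the readyState check covers the case where the script happens to run after parsing has already finished, in which case the event would never fire again.

```javascript
// Run `callback` once the DOM has been parsed. At contentScriptWhen: "start"
// document.readyState is normally still "loading", so listen on the document;
// if parsing has already finished, call back right away.
function onDomReady(doc, callback) {
  if (doc.readyState === "loading") {
    doc.addEventListener("DOMContentLoaded", function handler() {
      doc.removeEventListener("DOMContentLoaded", handler);
      callback();
    });
  } else {
    callback();
  }
}

// Browser-only usage; skipped when the file is loaded outside a page.
if (typeof document !== "undefined") {
  onDomReady(document, function () {
    // By now <body> exists, so it is safe to query and modify the page.
    console.log("body available:", document.body !== null);
  });
}
```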

(or even prevent any kind of page display before the whole page has loaded).

You don't really want that, because it would increase page load times and thus make the website less responsive.

Solution 2:

/*
 * contentScriptWhen: "start"
 *
 * "start": Load content scripts immediately after the document
 * element is inserted into the DOM, but before the DOM content
 * itself has been loaded
 */

/*
 * use an empty HTMLElement both as a placeholder
 * and a way to prevent the DOM content from loading
 */
document.replaceChild(
        document.createElement("html"), document.documentElement);
var rqst = new XMLHttpRequest();
// Fetch the same page again, this time as a parsed Document that is not rendered
rqst.open("GET", document.URL);
rqst.responseType = 'document';
rqst.onload = function(){
    if(this.status == 200) {
        /* edit the document (site-specific example: remove one particular script) */
        var target = this.response.body.querySelector(
                "#content-load + div + script");
        if (target) {
            target.remove();
        }

        /* replace the placeholder */
        document.replaceChild(
                document.adoptNode(this.response.documentElement),
                document.documentElement);

        // use_the_new_world();
    }
};
rqst.send();