Save an exact copy of a secure webpage in vector-graphics form

I'd like to save an exact replica of a webpage in vector-graphics form, so I can't use a screenshot (a screenshot stores the image in raster-graphics form).

I've tried 'Print to PDF' and 'Save as PDF' in Safari, Chrome, and Firefox. This works most of the time, but the saved PDF is not an exact replica for every webpage. For example, try saving this webpage as a PDF and note that the upvote/downvote icons are missing from the saved PDF.

I've also tried saving as a Web Archive in Safari. The problem there is that I need to crop the resulting file, and I don't know how to crop a Web Archive: Preview can't open it, and it simply opens in Safari (back to square one).

I've also tried browser plugins that offer a one-click way to save a webpage as a PDF (in vector-graphics form). These work better (the exact page is saved) and almost solve the problem, except that they work by sending the page URL to a cloud-based service, which then fetches and saves the page. This means the technique won't work for HTTPS sites that need my credentials to log in.

So I'm in a corner. I'm trying to save an exact vector-graphics replica of a webpage that needs my login credentials to view. How can I do this?


Solution 1:

When you print the page to PDF, you get a different result from what you see on screen.

This happens because the web page includes a print stylesheet (CSS loaded with media="print", or rules inside an @media print block) that changes how the page looks when it is printed.

This question will help you avoid that problem: How do I print with the screen stylesheet?

Follow the instructions there to print the page with the on-screen stylesheet.

Then you should be able to print to PDF and get the same result as you see on screen.
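If you'd rather not edit stylesheets by hand, the same idea can be sketched as a script pasted into the browser's developer console just before printing. This is a rough sketch, not a tested tool. It assumes the site's print styles come from <link>/<style> elements with a media attribute or from top-level @media blocks. It also skips cross-origin stylesheets, because browsers don't let scripts read their rules. The function names toScreenMedia and useScreenStylesForPrint are made up for this example.

```javascript
// Sketch: make on-screen styles apply when printing, and switch off print-only styles.
// toScreenMedia and useScreenStylesForPrint are hypothetical names, not a library API.

// Rewrite a media query list: drop "print" queries, widen "screen" queries to "all".
function toScreenMedia(mediaText) {
  if (mediaText.trim() === '') return 'all'; // an empty media list means "all"
  const kept = [];
  for (const raw of mediaText.split(',')) {
    const query = raw.trim();
    if (query === '') continue;
    // Skip print-only queries such as "print" or "print and (color)".
    if (/^(only\s+)?print\b/i.test(query)) continue;
    // "only screen and (min-width: 800px)" becomes "all and (min-width: 800px)".
    kept.push(query.replace(/^(only\s+)?screen\b/i, 'all'));
  }
  // "not all" is a valid query that never matches, so it disables the sheet or rule.
  return kept.length > 0 ? kept.join(', ') : 'not all';
}

// Apply toScreenMedia to stylesheet elements and to top-level @media rules.
function useScreenStylesForPrint(doc) {
  doc.querySelectorAll('link[rel~="stylesheet"][media], style[media]').forEach((el) => {
    el.media = toScreenMedia(el.media);
  });
  for (const sheet of Array.from(doc.styleSheets)) {
    let rules;
    try {
      rules = sheet.cssRules; // throws (or is null) for cross-origin stylesheets
    } catch (err) {
      continue;
    }
    if (!rules) continue;
    for (const rule of Array.from(rules)) {
      if (rule instanceof CSSMediaRule) {
        rule.media.mediaText = toScreenMedia(rule.media.mediaText);
      }
    }
  }
}

// Only touch the page when running in a browser.
if (typeof document !== 'undefined') {
  useScreenStylesForPrint(document);
}
```

After running it, use Print to PDF as usual. Reloading the page restores the original styles. The sketch does not handle @media rules nested inside @supports blocks or stylesheets pulled in with @import, so some print rules may survive.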

Solution 2:

If you're not afraid of a little scripting, you can try PhantomJS, a headless WebKit browser, which is available for OS X from http://phantomjs.org/

Then run the included phantomjs binary with the rasterize.js example script, using a command like:

phantomjs rasterize.js http://www.example.com/sitepage outfile.pdf "8.5in*11in"
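The paper-size argument is either a width*height pair or a named paper format. The helper below is a standalone sketch of how the rasterize.js example turns that argument into PhantomJS's page.paperSize setting. The function name paperSizeFromArg is hypothetical, and the margin values may differ in the copy of rasterize.js that ships with your PhantomJS version.

```javascript
// Sketch of how rasterize.js interprets its paper-size argument.
// paperSizeFromArg is a hypothetical name; check the margins in your copy of rasterize.js.
function paperSizeFromArg(arg) {
  const parts = arg.split('*');
  if (parts.length === 2) {
    // "8.5in*11in" -> explicit width and height, no margin
    return { width: parts[0], height: parts[1], margin: '0px' };
  }
  // Anything else ("Letter", "A4", ...) is treated as a named paper format.
  return { format: arg, orientation: 'portrait', margin: '1cm' };
}

// In rasterize.js the result is assigned to the page object, roughly:
//   page.paperSize = paperSizeFromArg(system.args[3]);
console.log(JSON.stringify(paperSizeFromArg('8.5in*11in')));
// prints {"width":"8.5in","height":"11in","margin":"0px"}
```

Quote the size on the command line, because the shell may otherwise try to expand the unquoted *.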

A couple of notes:

  • Despite the name 'rasterize.js', the text is saved into the PDF as actual (selectable) text, not as an image.

  • If the secure site uses Windows (HTTP) authentication, you can log in by adding a couple of lines to the rasterize.js script right after the page object is created:

var page = require('webpage').create(),
    system = require('system'),
    address, output, size;
page.settings.userName = "serviceUserName"; // I added these
page.settings.password = "servicePassword"; // 2 lines here

if (system.args.length < 3 || system.args.length > 5) {