Mirroring a web site behind a login form

Short version:

  • I'd like to snap an offline copy of a URL that includes the full HTML+CSS+JS+images, saved locally while keeping the structure and file content of the original site.
  • I'm having trouble with the tools I can find (e.g. "Save Complete" Firefox extension, HTTrack, wget, Teleport Pro) partly because the URL is behind a login form.

Longer version:

When working on my app I often want to snap an offline full HTML+CSS+JS+images version to send to the designer I work with, who makes modifications and sends it back. I then apply the changes to the app.

This has turned out to be much more efficient than having him/her navigate our code with a live app, but there's one snag - I can't find a mirroring app that's convenient.

Firefox extensions like "Save Complete" already have the login cookie, so they don't care that the pages are behind a login form, but they mangle the locally saved files, making it impossible to work with them.

Mirroring tools like wget or Teleport Pro don't support our login form.

HTTrack, though, is supposed to be able to run in proxy mode to detect the login info, but I could never get it to work. As a fallback it can accept cookies that I hard-wire into its cookies.txt file, but it always takes me hours to get that working reliably.
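
For reference, the hard-wiring amounts to hand-writing a Netscape-format cookies.txt entry something like the one below (the domain, cookie name and value are placeholders for whatever our app's session cookie actually is; the fields must be tab-separated):

    # Netscape HTTP Cookie File
    # domain         include-subdomains   path   secure   expiry       name        value
    .example.com     TRUE                 /      FALSE    1999999999   sessionid   abc123def456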

Any tools, browser extensions, etc. that could do this? Open source, commercial - anything. If I've been misusing HTTrack and it's actually trivial to do — that's a great answer as well.


Solution 1:

I've done this successfully with WinHTTrack. You can follow the normal procedure for capturing a website, with two minor settings tweaks:

  1. In Chrome, open Dev Tools, then log in to the website you need to capture. In the Network tab, click on the HTML page you requested to find your session cookie (its name will differ depending on the back-end framework used). Paste this into HTTrack under "Additional HTTP Headers" (see the example header line after this list).

  2. Also make sure HTTrack's user agent string matches your browser's, since some sites invalidate the session if the user agent changes.

    [Screenshot: session cookie entered into HTTrack]

  3. Start downloading the site. The result should be just as if you're logged in.
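
As a concrete example, the line pasted into HTTrack's "Additional HTTP Headers" field would look something like this (the cookie name and value here are made up; use whatever your DevTools request shows):

    Cookie: sessionid=abc123def456

The user agent is set separately (under the Browser ID options in WinHTTrack); copy the User-Agent value from that same request so the two match.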

Solution 2:

You can have HTTrack use a cookies.txt file when downloading. I've used this to successfully mirror a Moodle site.
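
If you use the command-line version, a minimal sketch looks something like this; it assumes you've exported the session cookie into a Netscape-format cookies.txt placed in the project directory before starting (the URL and user agent here are placeholders):

    # cookies.txt (Netscape format) sits in the project directory, ./mirror here
    httrack "https://app.example.com/" -O ./mirror -F "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

HTTrack should pick up the cookies.txt automatically and send the session cookie with each request.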

Solution 3:

Have you tried Offline Explorer?

As I recall, it lets you log in first, saves the cookies for subsequent requests, and does the rest. I'm not 100% sure, though, as it's been a long time since I used it.

Solution 4:

Teleport Pro allows for a login and password to be used.

As you go through the New Project Wizard you'll reach a point where it gives you that option (I think it's on the 3rd screen of options).

And even if you miss it, you can get to that option again later.

In the main window (after you've gone through the Project Wizard), right-click your project (the little folder icon displaying the URL you're trying to download, in the left pane) and choose the last option, Starting Address Properties. You'll be presented with an options screen where you can specify a user login and password to be used on that site.