Automate daily saving of webarchive?
Is it possible to automate saving a webpage (in the .webarchive format) using either Automator (as a background process) or Terminal?
Downloading & saving as webarchive
A command-line tool named webarchiver will download URLs and save them in .webarchive format. You can install this tool via MacPorts (alas, not Homebrew!) or compile it with Xcode. I am an Xcode dummy, but succeeded with the instructions found here.
How to operate:
webarchiver 0.5
Usage: webarchiver -url URL -output FILE
Example: webarchiver -url http://www.google.com -output google.webarchive
-url http:// or path to local file
-output File to write webarchive to
Nice file names
This lengthy one-liner for Terminal lets you configure the desired URL and will download a YYYY-MM-DD-prefixed webarchive file:
URL="www.nytimes.com"; ./webarchiver -url "http://$URL" -output "/Users/<your username>/Desktop/$(date +"%Y-%m-%d-$URL.webarchive")"
This will save a webarchive to your Desktop:
2014-02-10-www.nytimes.com.webarchive
If you are not sure what <your username> is, enter whoami in Terminal.app (and press Enter, of course).
Cron
I would rather use launchd, as "the use of cron on OS X is discouraged". There is a nice launchd editor named Lingon. Have fun!
The simple answer is yes, with either.
I am on my iPad at the moment, but you can use the Unix command curl to download the webpage and pipe it to the Unix command textutil, which can output it to a webarchive file.
If I get a chance I will post a fuller example.
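In the meantime, the basic pipeline would be something along these lines (just a sketch; the URL and output filename are placeholders):

# Sketch only: example.com and the output path are placeholders
curl -s "http://www.example.com" | textutil -format html -convert webarchive -stdin -output ~/Desktop/example.webarchive

Like the AppleScript version below, this only captures the page's HTML, not its images or style sheets.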
Here is a small, quick example of what I was thinking, written in AppleScript running do shell script commands.
-- A browser user agent string for curl to send, so the site serves a normal page
property agent : "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3"
property outPutFormat : "rtf"
property saveDIR : "/Users/USERNAME/Desktop/"
property fileName : "test2"

-- Fetch the raw HTML with curl
set theData to do shell script "curl -A" & space & quoted form of agent & space & "http://weather.yahoo.com/france/auvergne/france-29332634/"

-- Pipe the HTML through textutil, converting it to the chosen output format
do shell script "echo " & quoted form of theData & " | textutil -format html -convert" & space & outPutFormat & space & "-stdin -output" & space & saveDIR & fileName & "." & outPutFormat
Although this works, I am not very happy with the results, because curl and textutil only process the HTML code, not the page's resources (images, style sheets, and so on).
So I am working on something else that will save a webarchive in a much better way. I am 90% there, but it will take a little longer for me to write.