Download file via HTTP only if changed since last update

I need to download a file from an HTTP server, but only if it has changed since the last time I downloaded it (e.g. via the If-Modified-Since header). I also need to use a custom name for the file on my disk.

What tool can I use for this task on Linux?


wget -N alone is not enough, because -N cannot be combined with -O (which is what sets the custom output name).


Consider using curl instead of wget:

curl -o "$file" -z "$file" "$uri"

man curl says:

-z/--time-cond <date expression>

(HTTP/FTP) Request a file that has been modified later than the given time and date, or one that has been modified before that time. The date expression can be all sorts of date strings or if it doesn't match any internal ones, it tries to get the time from a given file name instead.

If $file doesn't necessarily pre-exist, you'll need to make the use of the -z flag conditional, using test -e "$file":

if test -e "$file"
then zflag="-z $file"
else zflag=
fi
curl -o "$file" $zflag "$uri"

(Note that we don't quote the expansion of $zflag here, as we want it to undergo word splitting into 0 or 2 arguments. This only works if $file contains no whitespace or glob characters; otherwise use the array version below.)
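As a quick illustration of that splitting (with a hypothetical filename containing a space), the unquoted expansion produces more words than intended:

```shell
file='my report.csv'   # hypothetical filename containing a space
zflag="-z $file"       # single string holding flag and filename

set -- $zflag          # unquoted expansion splits on every space
nwords=$#
echo "$nwords"         # 3 words (-z, my, report.csv) instead of the intended 2
```

curl would then look for a file named "my", which is why the array version is the robust way to build the argument list.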

If your shell supports arrays (e.g. Bash), there is a safer and cleaner version that copes with arbitrary file names:

if test -e "$file"
then zflag=(-z "$file")
else zflag=()
fi
curl -o "$file" "${zflag[@]}" "$uri"

The wget switch -N downloads the file only if it has changed, so another approach is to run plain wget -N, which fetches the file when needed but saves it under the server-side name. Then create a hard link with ln -P to give it the correct name; the linked file shares the same data and metadata as the original.

The only limitation is that hard links cannot cross file system boundaries.
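A sketch of that flow (the URL and names are hypothetical; the wget call is shown as a comment and replaced with a stub file so the linking step can be seen on its own):

```shell
uri='https://example.com/files/archive.tar.gz'  # hypothetical URL
file='my-archive.tar.gz'                        # the custom name we want
remote_name=${uri##*/}                          # wget -N saves under this basename

# wget -N "$uri"                                # real download step (fetches only if newer)
printf 'payload\n' > "$remote_name"             # stub standing in for the downloaded file

ln -f -P "$remote_name" "$file"                 # hard link under the custom name; -f replaces an old link
```

Both names now refer to the same inode, so the timestamp that wget -N relies on is shared; just remember that both must live on the same file system.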


A Python 3.5+ script wrapping the curl command:

import argparse
import pathlib

from subprocess import run
from itertools import chain

parser = argparse.ArgumentParser()
parser.add_argument('url')
parser.add_argument('filename', type=pathlib.Path)
args = parser.parse_args()

run(chain(
    ('curl', '-s', args.url),            # fetch quietly
    ('-o', str(args.filename)),          # save under the custom name
    # only send the time condition when a previous download exists
    ('-z', str(args.filename)) if args.filename.exists() else (),
))

A similar approach to the date check (curl --time-cond) is to compare file sizes, i.e. download only if the local file has a different size than the remote file.

This is useful, for example, when a download fails in the middle: the local file then has a newer date than the remote one, but is actually corrupted, and re-downloading is required:

local_file_size=$([[ -f ${FILE_NAME} ]] && wc -c < "${FILE_NAME}" || echo "0")
remote_file_size=$(curl -sI "${FILE_URL}" | awk 'tolower($1) == "content-length:" { print $2 }' | tr -d '\r')

if [[ "$local_file_size" -ne "$remote_file_size" ]]; then
    curl -o "${FILE_NAME}" "${FILE_URL}"
fi

(The awk pattern matches the header case-insensitively, since HTTP header names are case-insensitive and HTTP/2 servers send them in lowercase.)

The "curl -z / --time-cond" option (that was suggested in another answer) will not download the remote file in this case (cause the local file has a newer date), but this "size check" script will!