Can I stop wget creating duplicates?

Solution 1:

I suggest you use the -N option.

-N
--timestamping
    Turn on time-stamping.

It enables time-stamping, which re-downloads the file only if it is newer on the server than the downloaded version.

$ wget -N https://cdn.sstatic.net/askubuntu/img/logo.png
...
Saving to: ‘logo.png’
...

$ wget -N https://cdn.sstatic.net/askubuntu/img/logo.png
...
Server file no newer than local file ‘logo.png’ -- not retrieving.

Caveat (from αғsнιη's comment)

If the server is not configured properly, it may always report that the file is new and -N will always re-download the file. In this case, -nc is probably a better option.
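
For reference, here is a minimal sketch of the -nc (--no-clobber) route mentioned in the caveat. The wrapper name fetch_no_clobber is made up for this example; only the -nc flag itself comes from wget.

```shell
# Hypothetical wrapper: download a URL only if no local file with that
# name exists yet. With -nc (--no-clobber), wget skips the download
# instead of saving a numbered duplicate such as logo.png.1.
fetch_no_clobber() {
    wget -nc "$@"
}

# Usage (URL taken from the examples above):
# fetch_no_clobber https://cdn.sstatic.net/askubuntu/img/logo.png
```

Unlike -N, this never contacts the server to compare timestamps once the file exists, so an updated file on the server will not be picked up.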

Solution 2:

Yes, use the -c option.

--continue
    Continue getting a partially-downloaded file.  This is useful when you want to
    finish up a download started by a previous instance of Wget, or by another
    program.

If the local file is already as large as the one on the server, the second download attempt stops without downloading anything.

$ wget -c https://cdn.sstatic.net/askubuntu/img/logo.png
...
Saving to: ‘logo.png’
...

$ wget -c https://cdn.sstatic.net/askubuntu/img/logo.png
...
The file is already fully retrieved; nothing to do.

Caveats (from jofel's comments)

If the file has changed on the server, the -c option can give incorrect results.

With -c, wget simply asks the server for any data beyond the end of the already downloaded file, nothing else. It does not check whether the part that is already downloaded has changed. Thus, you could end up with a corrupted file that is a mixture of the old and new files.
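
The mixing effect can be simulated offline. The sketch below does not call wget at all. It only mimics the "fetch the bytes past my current size" behaviour described above, using tail. The function name and file names are made up for the illustration.

```shell
# Append to "$1" only the bytes of "$2" that lie past the current size
# of "$1". This imitates the range request that wget -c sends; it is a
# simulation, not what wget runs internally.
resume_like_wget_c() {
    local_file=$1
    server_file=$2
    local_size=$(wc -c < "$local_file")
    # tail -c +N starts output at byte N (1-based)
    tail -c +"$((local_size + 1))" "$server_file" >> "$local_file"
}

workdir=$(mktemp -d)
printf 'old-old-' > "$workdir/local.txt"        # 8 bytes downloaded earlier
printf 'NEW-NEW-NEW-' > "$workdir/server.txt"   # server copy changed and grew
resume_like_wget_c "$workdir/local.txt" "$workdir/server.txt"
cat "$workdir/local.txt"; echo                  # prints: old-old-NEW-
```

The result starts with the old content and ends with the tail of the new content, so it matches neither version of the file.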


Local test

You can test this by running a simple local web server as follows (thanks to @roadmr's answer):

Open a terminal window and type:

cd /path/to/parent-download-dir/
python3 -m http.server    # on Python 2: python -m SimpleHTTPServer

Now open another Terminal and do:

wget -c http://localhost:8000/filename-to-download

Note that filename-to-download is the file in /path/to/parent-download-dir/ that we want to download.

Now if you run the wget command several times, you will see:

The file is already fully retrieved; nothing to do.

OK, now go to the /path/to/parent-download-dir/ directory and add something to the source file; for example, if it is a text file, add an extra line and save it. Now run wget -c ... again. This time wget downloads again, even though you had already downloaded the file.

Reason: why does it download again?

Because the file on the server is now larger than the previously downloaded copy; wget -c checks nothing else.
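
Since both copies are on the same machine in this local test, you can check whether the resumed download still matches the source. This is a small sketch, assuming cmp is available (it is part of POSIX). The helper name same_content is made up for this example.

```shell
# Report whether two files are byte-for-byte identical.
# cmp -s prints nothing and signals the result through its exit status.
same_content() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Usage with the placeholder paths from the local test:
# same_content /path/to/parent-download-dir/filename-to-download ./filename-to-download
```

If you edited the middle of the source file rather than appending to it, this check should report "different" after wget -c, because only the size was compared.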