rsnapshot : will the initial backup be retained forever?
No, this is not correct. If a file has multiple hard links, it doesn't matter which one originally created the file; the data is only deleted when the last link to it is deleted (this is the difference between a hard link, as used by rsnapshot, and a symbolic link). In the case of rsnapshot, this means that every backup directory is self-contained: you can delete all other backup directories (including the initial one) and still have a full set of data.
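A minimal sketch of that behaviour, assuming a Unix-like filesystem that supports hard links. The file names are made up for illustration and have nothing to do with rsnapshot's real layout; the point is only that removing the "original" name leaves the data reachable through the other one.

```python
import os
import tempfile


def survives_original_deletion() -> str:
    """Create a file, hard-link it, delete the first name, read via the second."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "initial_backup.txt")   # hypothetical name
        second = os.path.join(root, "later_backup.txt")    # hypothetical name

        with open(first, "w") as fh:
            fh.write("payload")

        os.link(first, second)                  # second name for the same data
        assert os.stat(second).st_nlink == 2    # two names, one block of data

        os.remove(first)                        # drop the "original" name
        assert os.stat(second).st_nlink == 1    # data is still referenced

        with open(second) as fh:
            return fh.read()


print(survives_original_deletion())
```

Neither name is special: whichever link is removed last is the one that frees the data.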
Depending on how you configure rsnapshot, it will eventually delete the original backup set.
TL;DR: no.
It depends on what you define as an "initial backup".
You first create a backup (hourly.0), which has all the files from today.
On the next iteration, it "copies" the files (cp -al, which just creates new links to the same data) to the hourly.1 folder.
If all the files are the same as before, rsync won't write anything, so you have one block of data for a file (let's use myfile.jpg), and two links (hourly.0/myfile.jpg and hourly.1/myfile.jpg) pointing to the same file on the drive.
On the next iteration with no changes, you still have the same data, just another pointer (hourly.2/myfile.jpg) pointing to that data. If you have set it up to keep 3 backups, it will then delete hourly.2, move hourly.1 to hourly.2, move hourly.0 to hourly.1, "copy" (create hard links) from hourly.1 to create hourly.0, and then run rsync again.
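The rotation described above can be sketched in a few lines of Python. This is a toy model, not rsnapshot's actual code: link_tree stands in for the hard-link copy, sync is a much-simplified stand-in for rsync that only handles a flat directory, and all function names are invented for the example. It assumes a filesystem with hard-link support.

```python
import filecmp
import os
import shutil
import tempfile


def link_tree(src, dst):
    """Recreate src's directory layout in dst, hard-linking every file."""
    for dirpath, _dirs, files in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(dirpath, name), os.path.join(target, name))


def sync(source, snapshot):
    """Stand-in for rsync: replace only the files whose content differs."""
    os.makedirs(snapshot, exist_ok=True)
    for name in os.listdir(source):
        src = os.path.join(source, name)
        dst = os.path.join(snapshot, name)
        if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
            continue                      # unchanged: keep the shared link
        tmp = dst + ".tmp"
        shutil.copyfile(src, tmp)         # changed or new: write fresh data
        os.replace(tmp, dst)              # swap the name, old data untouched


def backup(source, root, keep=3):
    """One iteration: drop the oldest, shift the rest, link, then sync."""
    oldest = os.path.join(root, f"hourly.{keep - 1}")
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    for i in range(keep - 2, -1, -1):     # hourly.1 -> hourly.2, then 0 -> 1
        cur = os.path.join(root, f"hourly.{i}")
        if os.path.isdir(cur):
            os.rename(cur, os.path.join(root, f"hourly.{i + 1}"))
    newest = os.path.join(root, "hourly.0")
    previous = os.path.join(root, "hourly.1")
    if os.path.isdir(previous):
        link_tree(previous, newest)
    sync(source, newest)


def demo():
    """Run four backups of an unchanged file and record its link count."""
    with tempfile.TemporaryDirectory() as work:
        source = os.path.join(work, "source")
        root = os.path.join(work, "snapshots")
        os.makedirs(source)
        os.makedirs(root)
        with open(os.path.join(source, "myfile.jpg"), "w") as fh:
            fh.write("v1")
        counts = []
        for _ in range(4):
            backup(source, root)
            newest = os.path.join(root, "hourly.0", "myfile.jpg")
            counts.append(os.stat(newest).st_nlink)
        return counts


print(demo())
```

With keep=3 the link count of the unchanged file grows by one per run until three snapshots exist, then stays there, because each run removes one link (the oldest snapshot) and adds one (the newest).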
If the file changes, rsync will "remove" the file (just the link, actually) hourly.0/myfile.jpg (the data stays on the drive, since there are still two links pointing to it). Rsync will then create a new file (link + data) with the new myfile.jpg.
So now you have one block of data with one link for the new file, and one block with two links to it for the old version of the file.
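Here is a small sketch of that state, assuming the update is done the way rsync does it by default: write the new content to a temporary file, then rename it over the old name. The directory and file names follow the example above; the helper function itself is hypothetical.

```python
import os
import tempfile


def replace_breaks_link():
    """Update hourly.0/myfile.jpg and report what happens to the link counts."""
    with tempfile.TemporaryDirectory() as root:
        paths = {}
        for snap in ("hourly.0", "hourly.1", "hourly.2"):
            os.mkdir(os.path.join(root, snap))
            paths[snap] = os.path.join(root, snap, "myfile.jpg")

        # Old version: one block of data, three links.
        with open(paths["hourly.2"], "w") as fh:
            fh.write("old")
        os.link(paths["hourly.2"], paths["hourly.1"])
        os.link(paths["hourly.2"], paths["hourly.0"])

        # The file changed: write the new data elsewhere, then rename it
        # over hourly.0/myfile.jpg. Only that one link is replaced.
        tmp = paths["hourly.0"] + ".tmp"
        with open(tmp, "w") as fh:
            fh.write("new")
        os.replace(tmp, paths["hourly.0"])

        new_links = os.stat(paths["hourly.0"]).st_nlink
        old_links = os.stat(paths["hourly.1"]).st_nlink
        with open(paths["hourly.1"]) as fh:
            old_content = fh.read()
        return new_links, old_links, old_content


print(replace_breaks_link())
```

The older snapshots still read the old content. Note that overwriting the file in place (opening hourly.0/myfile.jpg for writing) would instead change the data behind all three links, which is why the rename step matters.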
On the next iteration, it deletes hourly.2 (one link less for the data of the old file) and "copies" (hard links) the new file (a new link for the new file). There are now two pointers to the new file's data, and one to the old version's data.
On the next iteration, it deletes the last link to the old version (data with no links pointing to it is considered free by the filesystem and will get overwritten when needed), and there are now three links to the new file's data.
As long as there is a link pointing to the data (no matter from which directory), the data stays on the drive. Only once you delete all the links can the data get overwritten.