rsync: Sync folders, but keep extra files in target
rsync has an option called --exclude-from, which allows you to create a file containing a list of any files which you would like to exclude. You can update this file whenever you want to add a new exclusion, or remove an old one.
If you create the exclude file at /home/user/rsync_exclude, the new command would be:
rsync -a --delete --exclude-from="/home/user/rsync_exclude" "${source_dir}" "${target_dir}"
When creating the exclude list file, you should put each exclusion rule on a separate line. The exclusions are relative to your source directory. Suppose your /home/user/rsync_exclude file contained the following rules:
secret_file
first_dir/subdir/*
second_dir/common_name.*
- Any file or directory called secret_file, anywhere in your source directory tree, will be excluded.
- Any files in ${source_dir}/first_dir/subdir will be excluded, but an empty version of subdir will be synced.
- Any files in ${source_dir}/second_dir with a prefix of common_name. will be ignored, so common_name.txt, common_name.jpg, etc.
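Since the exclude list is just a text file, it can also be generated rather than written by hand. The following is a minimal Python sketch, not part of the original answer: the helper names write_exclude_file and build_rsync_command are made up for illustration, and the source and target paths are placeholders. It adds rsync's --dry-run flag by default, so a first run only reports what would change.

```python
import shlex
import tempfile
from pathlib import Path


def write_exclude_file(path, rules):
    """Write one exclusion rule per line, as --exclude-from expects."""
    Path(path).write_text("".join(rule + "\n" for rule in rules))


def build_rsync_command(source_dir, target_dir, exclude_file, dry_run=True):
    """Return the rsync call as an argument list (hypothetical helper)."""
    command = ["rsync", "-a", "--delete", f"--exclude-from={exclude_file}"]
    if dry_run:
        # --dry-run makes rsync report its actions without changing anything
        command.append("--dry-run")
    return command + [source_dir, target_dir]


if __name__ == "__main__":
    rules = ["secret_file", "first_dir/subdir/*", "second_dir/common_name.*"]
    with tempfile.TemporaryDirectory() as tmp:
        exclude_file = Path(tmp) / "rsync_exclude"
        write_exclude_file(exclude_file, rules)
        # Placeholder paths; substitute your own source and target
        command = build_rsync_command("/path/to/source/", "/path/to/target/", exclude_file)
        # Only print the command here; pass the list to subprocess.run() to execute it
        print(shlex.join(command))
```

Building the command as a list avoids shell quoting problems with paths that contain spaces.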
Since you mentioned "I am not limited to rsync":
Script to maintain the mirror, allowing you to add extra files to the target
Below is a script that does exactly what you describe.
The script can be run in verbose mode (to be set in the script), which will output the progress of the backup (mirroring). Needless to say, this output can also be used to log the backups:
Verbose option
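One way to turn the verbose output into a log is to capture it from a small wrapper. This is only a sketch under assumptions: run_and_log is a hypothetical helper, the log location is arbitrary, and the demo runs a stand-in command rather than the mirror script itself.

```python
import os
import subprocess
import sys
import tempfile
from datetime import datetime


def run_and_log(command, log_path):
    """Run a command and append its output to a log file under a timestamp."""
    result = subprocess.run(command, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write(f"--- {datetime.now().isoformat(timespec='seconds')} ---\n")
        log.write(result.stdout)
        if result.stderr:
            log.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For a real backup, replace it
    # with [sys.executable, <path of the mirror script>, <source>, <target>].
    demo_command = [sys.executable, "-c", "print('Done.')"]
    log_path = os.path.join(tempfile.gettempdir(), "backup_demo.log")
    run_and_log(demo_command, log_path)
    print("Log written to", log_path)
```

Appending rather than overwriting keeps one continuous history across backup runs.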
The concept
1. On the first backup, the script:
   - creates a file, .recentfiles, in the target directory, in which all files and directories are listed;
   - creates an exact copy (mirror) of all files and directories in the target directory.
2. On each subsequent backup:
   - The script compares the directory structure and modification date(s) of the files. New and updated files and dirs in the source are copied to the mirror. At the same time, a second (temporary) file, .currentfiles, is created, listing the current files and dirs in the source directory.
   - Subsequently, .recentfiles (listing the situation at the previous backup) is compared to .currentfiles. Files listed in .recentfiles which are not in .currentfiles have evidently been removed from the source, and will be removed from the target.
   - Files you manually added to the target folder are not in any way "seen" by the script, and are left alone.
   - Finally, the temporary .currentfiles is renamed to .recentfiles to serve the next backup cycle, and so on.
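The two decisions in the steps above can be sketched in isolation. The function names needs_copy and removed_from_source are illustrative only and do not appear in the script; the paths in the demo are made up.

```python
import os
import tempfile


def needs_copy(source, target):
    """True if target is missing or older than source (by modification time)."""
    try:
        return os.path.getmtime(source) > os.path.getmtime(target)
    except FileNotFoundError:
        # Assumes source exists, so a missing target is what raised the error
        return True


def removed_from_source(recent_paths, current_paths):
    """Paths listed at the previous backup that are no longer listed now."""
    current = set(current_paths)
    return sorted(path for path in recent_paths if path not in current)


if __name__ == "__main__":
    recent = ["/mirror/a.txt", "/mirror/docs", "/mirror/docs/b.txt"]
    current = ["/mirror/a.txt"]
    # A file added to the mirror by hand is in neither list, so it is never selected
    print(removed_from_source(recent, current))

    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source.txt")
        open(source, "w").close()
        print(needs_copy(source, os.path.join(tmp, "missing.txt")))
```

Because only paths recorded in .recentfiles can ever be selected for removal, anything the script did not copy itself stays in the target.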
The script
#!/usr/bin/env python3
import os
import sys
import shutil

dr1 = sys.argv[1]; dr2 = sys.argv[2]

# --- choose verbose (or not)
verbose = True
# ---

recentfiles = os.path.join(dr2, ".recentfiles")
currentfiles = os.path.join(dr2, ".currentfiles")

if verbose:
    print("Counting items in source...")
    file_count = sum([len(files)+len(d) for r, d, files in os.walk(dr1)])
    print(file_count, "items in source")
    print("Reading directory & file structure...")
    # max(1, ...) prevents a division by zero if the source has fewer than 5 items
    done = 0; chunk = max(1, int(file_count/5)); full = chunk*5

def show_percentage(done):
    if done % chunk == 0:
        print(str(int(done/full*100))+"%...", end = " ")

def mirror_path(path):
    # map a path inside the source to the matching path inside the target
    return os.path.join(dr2, os.path.relpath(path, dr1))

os.makedirs(dr2, exist_ok=True)
# "w" starts a fresh list, also if an earlier run was interrupted
with open(currentfiles, "w") as listing:
    for root, dirs, files in os.walk(dr1):
        for dr in dirs:
            if verbose:
                if done == 0:
                    print("Updating mirror...")
                done = done + 1
                show_percentage(done)
            source = os.path.join(root, dr)
            target = mirror_path(source)
            listing.write(target+"\n")
            if not os.path.exists(target):
                shutil.copytree(source, target)
        for f in files:
            if verbose:
                done = done + 1
                show_percentage(done)
            source = os.path.join(root, f)
            target = mirror_path(source)
            listing.write(target+"\n")
            try:
                # copy only if the source is newer than the mirrored file
                if os.path.getmtime(source) > os.path.getmtime(target):
                    shutil.copy(source, target)
            except FileNotFoundError:
                # the file does not exist in the mirror yet
                shutil.copy(source, target)

if verbose:
    print("\nChecking for deleted files in source...")

if os.path.exists(recentfiles):
    with open(recentfiles) as previous:
        recent = [f.strip() for f in previous.readlines()]
    with open(currentfiles) as latest:
        current = set(f.strip() for f in latest.readlines())
    # listed at the previous backup, but no longer present in the source
    remove = set(f for f in recent if f not in current)
    for f in remove:
        try:
            os.remove(f)
        except IsADirectoryError:
            shutil.rmtree(f)
        except FileNotFoundError:
            pass
        if verbose:
            print("Removed:", os.path.basename(f))

if verbose:
    print("Done.")

shutil.move(currentfiles, recentfiles)
How to use
- Copy the script into an empty file and save it as backup_special.py.
- Change, if you want, the verbose option in the head of the script:
   # --- choose verbose (or not)
   verbose = True
   # ---
- Run it with source and target as arguments:
   python3 /path/to/backup_special.py <source_directory> <target_directory>
Speed
I tested the script on a 10 GB directory with some 40,000 files and dirs on my network drive (NAS); it made the backup in pretty much the same time as rsync.
Updating the whole directory took only a few seconds more than rsync on those 40,000 files, which is imo acceptable and no surprise, since the script needs to compare the source to the most recent backup.