rsync: Sync folders, but keep extra files in target

rsync has an option called --exclude-from, which lets you name a file containing a list of patterns for the files you would like to exclude. You can edit this file whenever you want to add a new exclusion or remove an old one.

If you create the exclude file at /home/user/rsync_exclude, the new command would be:

rsync -a --delete --exclude-from="/home/user/rsync_exclude" "${source_dir}" "${target_dir}"

When creating the exclude list file, you should put each exclusion rule on a separate line. The exclusions are relative to your source directory. Files in the target that match an exclusion are also skipped by --delete, so they stay in place. If your /home/user/rsync_exclude file contained the following patterns:

secret_file
first_dir/subdir/*
second_dir/common_name.*
  • Any file or directory called secret_file, at any depth in your source directory, will be excluded.
  • Any files in ${source_dir}/first_dir/subdir will be excluded, but an empty version of subdir will be synced.
  • Any files in ${source_dir}/second_dir whose names start with common_name. will be ignored, so common_name.txt, common_name.jpg etc.
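
Before running the command with --delete for real, it can be worth previewing what a set of rules will do. Below is a minimal Python sketch that writes the exclude file and assembles the same rsync call with rsync's --dry-run flag. It is not part of the original answer: the helper names write_exclude_file, build_rsync_command and run_sync are made up for illustration, and all paths are placeholders.

```python
import subprocess
from pathlib import Path


def write_exclude_file(path, patterns):
    """Write one rsync exclusion pattern per line."""
    Path(path).write_text("\n".join(patterns) + "\n")


def build_rsync_command(source_dir, target_dir, exclude_file, dry_run=True):
    """Assemble the rsync call shown above as an argument list."""
    cmd = ["rsync", "-a", "--delete", "--exclude-from=" + str(exclude_file)]
    if dry_run:
        # --dry-run only reports what would be transferred or deleted
        cmd += ["--dry-run", "--verbose"]
    cmd += [str(source_dir), str(target_dir)]
    return cmd


def run_sync(source_dir, target_dir, exclude_file, dry_run=True):
    """Run rsync; raises CalledProcessError if rsync exits with an error."""
    cmd = build_rsync_command(source_dir, target_dir, exclude_file, dry_run)
    subprocess.run(cmd, check=True)
```

Calling run_sync() with dry_run=True first, and only then with dry_run=False, shows which files would be copied or deleted before anything in the target is touched.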

Since you mentioned that you are not limited to rsync:

Script to maintain the mirror while allowing extra files in the target

Below is a script that does exactly what you describe.

The script can be run in verbose mode (to be set in the script), which will output the progress of the backup (mirroring). Needless to say, this output can also be used to log the backups:

[Image: output of the script with the verbose option enabled]


The concept

1. On first backup, the script:

  • creates a file named .recentfiles in the target directory, in which all files and directories are listed
  • creates an exact copy (mirror) of all files and directories in the target directory

2. On each subsequent backup

  • The script compares directory structure and modification date(s) of the files. New files and dirs in the source are copied to the mirror. At the same time a second (temporary) file, .currentfiles, is created, listing the current files and dirs in the source directory.
  • Subsequently, .recentfiles (listing the situation on the previous backup) is compared to .currentfiles. Items that are listed in .recentfiles but not in .currentfiles have evidently been removed from the source, and only those are removed from the target.
  • Files you manually added to the target folder are never listed in either file, so they are not "seen" by the script in any way and are left alone.
  • Finally, the temporary .currentfiles is renamed to .recentfiles to serve the next backup cycle and so on.
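
The comparison of the two listings in step 2 boils down to a set difference. Here is a minimal sketch of just that step, using throwaway listing files; the function names read_listing and removed_since_last_run are made up for illustration and do not appear in the script itself.

```python
import os
import tempfile


def read_listing(path):
    """Return the set of paths stored one per line in a listing file."""
    with open(path) as listing:
        return {line.strip() for line in listing if line.strip()}


def removed_since_last_run(recent_path, current_path):
    """Paths listed on the previous run but missing from the current one."""
    return read_listing(recent_path) - read_listing(current_path)


# Small demonstration with throwaway listing files
workdir = tempfile.mkdtemp()
recent = os.path.join(workdir, ".recentfiles")
current = os.path.join(workdir, ".currentfiles")
with open(recent, "w") as f:
    f.write("/mirror/a.txt\n/mirror/b.txt\n/mirror/docs\n")
with open(current, "w") as f:
    f.write("/mirror/a.txt\n/mirror/docs\n")

print(sorted(removed_since_last_run(recent, current)))  # ['/mirror/b.txt']
```

A file that was added to the mirror by hand, say /mirror/extra.txt, appears in neither listing, so it can never end up in the result and is never deleted.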

The script

#!/usr/bin/env python3
import os
import sys
import shutil

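# normalise the paths so that a trailing slash cannot break the path mapping below
dr1 = os.path.normpath(sys.argv[1]); dr2 = os.path.normpath(sys.argv[2])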

# --- choose verbose (or not)
verbose = True
# ---

recentfiles = os.path.join(dr2, ".recentfiles")
currentfiles = os.path.join(dr2, ".currentfiles")

if verbose:
    print("Counting items in source...")
    file_count = sum([len(files)+len(d) for r, d, files in os.walk(dr1)])
    print(file_count, "items in source")
    print("Reading directory & file structure...")
    # chunk must be at least 1, else "done % chunk" fails on sources with fewer than 5 items
    done = 0; chunk = max(1, file_count // 5); full = chunk*5

def show_percentage(done):
    if done % chunk == 0:
        print(str(int(done/full*100))+"%...", end = " ")

for root, dirs, files in os.walk(dr1):
    for dr in dirs:
        if verbose:
            if done == 0:
                print("Updating mirror...")
            done = done + 1
            show_percentage(done) 
        # replace only the leading occurrence of the source path
        target = os.path.join(root, dr).replace(dr1, dr2, 1)
        source = os.path.join(root, dr)
        with open(currentfiles, "a") as listing:
            listing.write(target+"\n")
        if not os.path.exists(target):
            shutil.copytree(source, target)
    for f in files:
        if verbose:
            done = done + 1
            show_percentage(done)
        # replace only the leading occurrence of the source path
        target = os.path.join(root, f).replace(dr1, dr2, 1)
        source = os.path.join(root, f)
        with open(currentfiles, "a") as listing:
            listing.write(target+"\n")
        try:
            if os.path.getmtime(source) > os.path.getmtime(target):
                shutil.copy(source, target)   
        except FileNotFoundError:
            shutil.copy(source, target)

if verbose:
    print("\nChecking for deleted files in source...")

if os.path.exists(recentfiles):
    with open(recentfiles) as listing:
        recent = [f.strip() for f in listing]
    with open(currentfiles) as listing:
        # a set keeps the membership test fast on large trees
        current = set(f.strip() for f in listing)
    remove = set(f for f in recent if f not in current)
    for f in remove:
        try:
            os.remove(f)
        except IsADirectoryError:
            shutil.rmtree(f)
        except FileNotFoundError:     
            pass
        if verbose:
            print("Removed:", f.split("/")[-1])

if verbose:
    print("Done.")

shutil.move(currentfiles, recentfiles)

How to use

  1. Copy the script into an empty file, save it as backup_special.py
  2. If you want, change the verbose option at the top of the script:

    # --- choose verbose (or not)
    verbose = True
    # ---
    
  3. Run it with source and target as arguments:

     python3 /path/to/backup_special.py <source_directory> <target_directory>
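
Since the verbose output goes to standard output, a run can also be logged. Below is a minimal sketch of a wrapper that runs the script and appends its output to a log file. The function name run_and_log, the log layout and any paths passed to it are assumptions made for illustration, not part of the script.

```python
import datetime
import subprocess
import sys


def run_and_log(script, source, target, logfile):
    """Run the backup script and append its output to logfile."""
    started = datetime.datetime.now().isoformat(timespec="seconds")
    result = subprocess.run(
        [sys.executable, script, source, target],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error messages in the same log
        text=True,
    )
    with open(logfile, "a") as log:
        log.write(f"--- {started} {source} -> {target} "
                  f"(exit code {result.returncode})\n")
        log.write(result.stdout)
    return result.returncode
```

For example, run_and_log("/path/to/backup_special.py", source, target, logfile) with your own source, target and log file paths adds one dated block per backup to the log.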
    

Speed

I tested the script on a 10 GB directory with some 40,000 files and dirs on my network drive (NAS). It made the backup in pretty much the same time as rsync.

Updating the whole directory took only a few seconds longer than rsync on 40,000 files, which is imo acceptable and no surprise, since the script needs to compare the content with the previous backup.