How to deduplicate iMovie files in "Original Media"?

With each project, iMovie copies the imported videos, images, and audio into its library (iMovie Library.imovielibrary). I want to keep my originals in a separate folder, alongside originals that I never imported into iMovie and others that I process with tools such as ffmpeg, while still being able to edit and export projects. At the moment I have the videos in two places, and the iMovie library is a bloated 300 GB on a 1 TB drive.

How can I avoid duplicating video or other files in an iMovie library and save disk space?


Solution 1:

First, a disclaimer: as you can see from miguelmorin's answer, some people have written scripts that replace the duplicate images and videos in the iMovie library with hard links or symlinks. Before going any further: I would avoid hard links. Symlinks seem to work fine with iMovie, whereas hard links can have weird side effects; for instance, Time Machine may back them up as separate files.
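
To make the difference between the two kinds of link concrete, here is a minimal Python sketch. It is only an illustration, not part of rdfind or iMovie, and the file names and the `describe_links` helper are made up for the example. It creates one file, one hard link, and one symlink in a temporary directory and reports what the filesystem says about each.

```python
import os
import tempfile


def describe_links(workdir):
    """Create a file, a hard link and a symlink to it, and report how they differ."""
    original = os.path.join(workdir, "original.mov")  # hypothetical file name
    hard = os.path.join(workdir, "hard.mov")
    soft = os.path.join(workdir, "soft.mov")

    with open(original, "wb") as fh:
        fh.write(b"video bytes")

    os.link(original, hard)     # a second directory entry for the same inode
    os.symlink(original, soft)  # a small entry that stores the path of the original

    return {
        "hard_is_symlink": os.path.islink(hard),
        "soft_is_symlink": os.path.islink(soft),
        "hard_same_inode": os.stat(hard).st_ino == os.stat(original).st_ino,
        "link_count": os.stat(original).st_nlink,
        "soft_target": os.readlink(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in describe_links(tmp).items():
            print(key, value)
```

A hard link looks exactly like a regular file to any tool that inspects it, while a symlink is visibly a pointer whose target can be read back with `os.readlink`.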

In my case, I used rdfind, which is an existing utility for cleaning up duplicate files and isn't specific to iMovie or even macOS.

  1. Install rdfind

    brew install rdfind
    
  2. Do a dry run

    e.g.

    rdfind -dryrun true -minsize 1048576 -makesymlinks true ~/Pictures/ ~/Movies/
    
    • -minsize is used to avoid touching any files that aren't image or video files. Adjust it as needed.
    • Replace ~/Pictures/ with the location(s) of the original image/video files. You can list as many directories as you want, but ~/Movies/ should be last because rdfind expects the locations of the original files to be listed first.

    Update: YMMV, but it looks like iMovie 10 puts all of the original images and videos in Original Media directories under ~/Movies/iMovie Library.imovielibrary. The following command finds only those directories and runs rdfind on each of them, in which case -minsize shouldn't be needed (as above, replace ~/Pictures/ as needed):

    find ~/Movies/ -type d -name "Original Media" -exec rdfind -dryrun true -makesymlinks true ~/Pictures/ {} \;
    
  3. Create the symlinks

    Once you're happy with the output of the command from the dry run, remove -dryrun true to replace the duplicate files with symlinks, e.g.

    rdfind -minsize 1048576 -makesymlinks true ~/Pictures/ ~/Movies/
    

    Or:

    find ~/Movies/ -type d -name "Original Media" -exec rdfind -makesymlinks true ~/Pictures/ {} \;
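
After the symlinks have been created, it may be worth checking that none of them dangle and seeing how much space regular files still take up. The following is a rough Python sketch for that, assuming the layout described above (media files sitting directly inside directories named "Original Media"). The function name `summarize_original_media` is invented for this example.

```python
import os
import sys


def summarize_original_media(library):
    """Count symlinks, broken symlinks and bytes held by regular files
    directly inside every "Original Media" directory under `library`."""
    summary = {"symlinks": 0, "broken": [], "regular_files": 0, "regular_bytes": 0}

    for dirpath, _dirnames, filenames in os.walk(library):
        if os.path.basename(dirpath) != "Original Media":
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                summary["symlinks"] += 1
                # os.path.exists follows the link, so False means the target is missing
                if not os.path.exists(path):
                    summary["broken"].append(path)
            else:
                summary["regular_files"] += 1
                summary["regular_bytes"] += os.path.getsize(path)

    return summary


if __name__ == "__main__" and len(sys.argv) == 2:
    print(summarize_original_media(sys.argv[1]))
```

Running it once before and once after the rdfind step gives a before/after comparison, and a non-empty `broken` list means some originals were moved or deleted.
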
    

Pros:

  • Dry run option to show you what it's going to do first
  • Will actually check the files to see if they're the same rather than just compare file names
  • Will find duplicates even if the filenames are different
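
To illustrate what comparing content rather than names means, here is a simplified Python sketch. It is not rdfind's actual implementation, only the general idea: group files by size first, then hash the candidates that share a size. The function names are made up for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large videos are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directories):
    """Return lists of paths with identical content, whatever their names.

    Directories are scanned in the given order, so within each group the
    first path comes from the earliest directory listed.
    """
    by_size = defaultdict(list)
    for directory in directories:
        for dirpath, _dirnames, filenames in os.walk(directory):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(group for group in by_digest.values() if len(group) > 1)
    return groups
```

Listing the originals directory first mirrors the argument order that rdfind expects, so the first entry of each group is the copy that would be kept.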

Cons:

  • There's no way to restrict it to only image and video files (worked around above either by using -minsize or by only running rdfind on the Original Media directories)

Solution 2:

This page suggests replacing the video files with links to the originals, which saves space. It has this gist in Ruby, and I coded this gist in Python, which is also below. The iMovie library went from 300 GB to 5 GB; it is not smaller still because I skipped two projects I was still working on.

Like the Ruby version:

  • it goes through an iMovie 10 library and replaces each file in Original Media with a link whenever it can find the corresponding original
  • it requires you to import into the library, quit iMovie, and then run the script.

Unlike the Ruby version:

  • it uses symlinks to the original media instead of hard links (I confirmed that it works just as well)
  • you can define the filetypes to replace (movie, audio, image)
  • you can list projects that you are still working on in the global variable PROJECTS_TO_SKIP so that their media is left untouched
  • it assumes that your iMovie library and originals folder are organized by the same event name, because in my case I had multiple DSC001.MOV and I use the event name to distinguish them
  • if the event names differ, e.g. because you created two events titled "movie" and iMovie renamed the second to "movie 1", you can adapt the global variable SHOW_NAME_CORRESPONDENCE to map the name of the iMovie event to the name of the folder with the original content.

import doctest
import glob
import os
import pathlib
import sys

FILE_SUFFIXES_LOWERCASE = [".mp4", ".mts", ".mov", ".jpg", ".jpeg", ".png"]

PROJECTS_TO_SKIP = []  # e.g., ["project 1", "project 2"]

SHOW_NAME_CORRESPONDENCE = {}  # e.g. {"movie": "movie 1"}

def skip(f):
    """Returns a boolean for whether to skip a file depending on suffix.
    >>> skip("abc.mp4")
    False
    >>> skip("ABC.JPEG")
    False
    >>> skip("abc.plist")
    True
    >>> skip("00114.MTS")
    False
    """
    suffix = pathlib.Path(f).suffix.lower()
    return suffix not in FILE_SUFFIXES_LOWERCASE

def get_show_and_name(f):
    """Return the lowercased (event name, file name) pair used to match files.

    >>> show, name = get_show_and_name("/Volumes/video/iMovie Library.imovielibrary/my great show/Original Media/00117.mts")
    >>> "my great show" == show
    True
    >>> "00117.mts" == name
    True
    >>> show, name = get_show_and_name("/Volumes/video/path/to/originals/my great show/00117.mts")
    >>> "my great show" == show
    True
    >>> "00117.mts" == name
    True
    """
    path = pathlib.Path(f)
    name = path.name.lower()

    dirname = str(path.parents[0])

    imovie = "iMovie Library.imovielibrary" in dirname
    parent_dir = str(path.parents[2 if imovie else 1])
    show = dirname.replace(parent_dir, "")

    if imovie:
        assert show.endswith("/Original Media"), f
        show = show.replace("/Original Media", "")

    assert show.startswith("/")
    show = show[1:].lower()

    if show in SHOW_NAME_CORRESPONDENCE:
        show = SHOW_NAME_CORRESPONDENCE[show]

    return show, name

def build_originals_dict(originals):
    """Go through the original directory to build a dictionary of filenames to paths."""
    originals_dic = dict()

    for f in glob.glob(os.path.join(originals, "**", "*.*"), recursive=True):
        if skip(f):
            continue

        show, name = get_show_and_name(f)

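        # Store an absolute path: a relative symlink target would be resolved
        # relative to the library folder that holds the link, not the current directory.
        originals_dic[(show, name)] = os.path.abspath(f)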

    return originals_dic

def replace_files_with_symlinks(library, originals):
    """Go through the iMovie library and find the replacements."""
    originals_dic = build_originals_dict(originals)

    # List files recursively
    for f in glob.glob(os.path.join(library, "**", "*.*"), recursive=True):
        if skip(f) or os.path.islink(f):
            continue

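        show, name = get_show_and_name(f)

        # Leave events listed in PROJECTS_TO_SKIP untouched (compared in lowercase).
        if show in [p.lower() for p in PROJECTS_TO_SKIP]:
            continue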

        if (show, name) in originals_dic:
            target = originals_dic[(show, name)]

            print("Replacing %s with %s" % (f, target))
            os.unlink(f)
            os.symlink(target, f)
        else:
            print("No original found for %s" % f)

def main():
    args = sys.argv
    assert 3 == len(args), "Usage: %s <iMovie library> <originals folder>" % args[0]
    library = args[1]
    originals = args[2]

    replace_files_with_symlinks(library = library, originals = originals)

if "__main__" == __name__:
    r = doctest.testmod()
    assert 0 == r.failed, "Problem: doc-tests do not pass!"

    main()
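
Because the script deletes the copies inside the library, it can be useful to have a way back, for example before moving the library to another drive. The following is a hedged sketch of such an undo step, not something from the original gists: it replaces every symlink under the library with a copy of the file it points to. The function name `restore_symlinks` and the `.restoring` temporary suffix are assumptions made for this example.

```python
import os
import shutil


def restore_symlinks(library):
    """Replace each symlink under `library` with a copy of the file it points to.

    Returns the list of restored paths; dangling symlinks are left untouched.
    """
    restored = []
    for dirpath, _dirnames, filenames in os.walk(library):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if not os.path.isfile(target):
                print("Skipping dangling link %s" % path)
                continue
            # Copy next to the link first, so a failed copy never loses the link.
            temporary = path + ".restoring"
            shutil.copy2(target, temporary)
            os.replace(temporary, path)  # swaps the link itself for the copy
            restored.append(path)
    return restored
```

As with the main script, quit iMovie before running anything like this, and make sure the drive has room for the restored copies.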