What is an efficient way to do a video crossfade with FFmpeg?

TL;DR version:

This example handles video only and assumes both clips have the same resolution, frame rate, etc. It creates a 1-second crossfade between fadeoutclip and fadeinclip, assuming fadeoutclip is 10 seconds long. The command is split across lines (with backslash continuations) for clarity; it is really one line of code.

ffmpeg -i fadeoutclip.mp4 -i fadeinclip.mp4 -an \
-filter_complex "\
    [0:v]trim=start=0:end=9,setpts=PTS-STARTPTS[firstclip]; \
    [1:v]trim=start=1,setpts=PTS-STARTPTS[secondclip]; \
    [0:v]trim=start=9:end=10,setpts=PTS-STARTPTS[fadeoutsrc]; \
    [1:v]trim=start=0:end=1,setpts=PTS-STARTPTS[fadeinsrc]; \
    [fadeinsrc]format=pix_fmts=yuva420p, \
                fade=t=in:st=0:d=1:alpha=1[fadein]; \
    [fadeoutsrc]format=pix_fmts=yuva420p, \
                fade=t=out:st=0:d=1:alpha=1[fadeout]; \
    [fadein]fifo[fadeinfifo]; \
    [fadeout]fifo[fadeoutfifo]; \
    [fadeoutfifo][fadeinfifo]overlay[crossfade]; \
    [firstclip][crossfade][secondclip]concat=n=3[output] \
    " \
-map "[output]" <add in encoding part here>
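The TL;DR command hard-codes a 10-second first clip and a 1-second fade. Below is a minimal sketch of one way to parameterize it. It assumes a POSIX shell with awk and, for the duration lookup, ffprobe on the PATH. clip_duration and build_crossfade_graph are hypothetical helper names, not part of FFmpeg.

```sh
# Hypothetical helpers (not part of FFmpeg) that rebuild the TL;DR
# filtergraph for any fade-out clip length and fade duration.

# clip_duration FILE: print FILE's duration in seconds, as reported by ffprobe.
clip_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# build_crossfade_graph LEN FADE
#   LEN:  length of the fade-out clip in seconds
#   FADE: crossfade duration in seconds
build_crossfade_graph() {
  local len=$1 fade=$2 cut
  # The first clip's content ends, and the crossfade starts, at LEN - FADE.
  cut=$(awk -v l="$len" -v f="$fade" 'BEGIN { print l - f }')
  # printf '%s' with several arguments joins them with no separator.
  printf '%s' \
    "[0:v]trim=start=0:end=$cut,setpts=PTS-STARTPTS[firstclip];" \
    "[1:v]trim=start=$fade,setpts=PTS-STARTPTS[secondclip];" \
    "[0:v]trim=start=$cut:end=$len,setpts=PTS-STARTPTS[fadeoutsrc];" \
    "[1:v]trim=start=0:end=$fade,setpts=PTS-STARTPTS[fadeinsrc];" \
    "[fadeinsrc]format=pix_fmts=yuva420p,fade=t=in:st=0:d=$fade:alpha=1[fadein];" \
    "[fadeoutsrc]format=pix_fmts=yuva420p,fade=t=out:st=0:d=$fade:alpha=1[fadeout];" \
    "[fadein]fifo[fadeinfifo];" \
    "[fadeout]fifo[fadeoutfifo];" \
    "[fadeoutfifo][fadeinfifo]overlay[crossfade];" \
    "[firstclip][crossfade][secondclip]concat=n=3[output]"
}

# Example (needs ffmpeg/ffprobe and the two input files):
#   ffmpeg -i fadeoutclip.mp4 -i fadeinclip.mp4 -an \
#     -filter_complex "$(build_crossfade_graph "$(clip_duration fadeoutclip.mp4)" 1)" \
#     -map "[output]" output.mp4
```

With LEN=10 and FADE=1 this produces the same graph as the TL;DR, only without the whitespace.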

Full Version:

Here is an explanation of what each part does:

Input specification (self-explanatory; -an drops the audio, since this example is video only):

ffmpeg -i fadeoutclip.mp4 -i fadeinclip.mp4 -an

Next, create a filter_complex (this assumes you already understand filter complexes):

-filter_complex

First, we use the trim filter to split each of the two streams into two pieces: the content and the crossfade section. The fade-out clip is split into content followed by its fade section, while the fade-in clip is split into its fade section followed by content, for a total of four sections.

Note that, strictly speaking, we don't have to split out the crossfade sections: we COULD just specify the fade-out and fade-in times on the two full clips. However, by splitting them, we:

  • Follow the methodology typically used by GUI video editors
  • Avoid the frustrating complexity of the overlay filter's usage
  • Ensure that the solution is as general as possible (i.e. reusable code)
  • Allow us to pre-process and post-process the crossfade section as necessary (not done here)

Each of these four sections specifies a start time and an end time (in seconds), followed by the mysterious setpts=PTS-STARTPTS filter, which subtracts the timestamp of the section's first frame so that each subclip starts at 0 seconds. This will be vital when recompositing them.

Note that the start=0 specifiers are redundant, and so is the setpts filter on the sections that start at 0. However, both are included anyway so that you can change the start time from 0 without breaking the filter complex. Also, the second content clip runs to the end of its input, so no end= is specified.

    [0:v]trim=start=0:end=9,setpts=PTS-STARTPTS[firstclip];
    [1:v]trim=start=1,setpts=PTS-STARTPTS[secondclip];
    [0:v]trim=start=9:end=10,setpts=PTS-STARTPTS[fadeoutsrc];
    [1:v]trim=start=0:end=1,setpts=PTS-STARTPTS[fadeinsrc];
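To make the effect of setpts=PTS-STARTPTS concrete, the sketch below prints what happens to the first few frame timestamps of the fadeoutsrc section. It assumes a hypothetical 25 fps source and uses awk only as a calculator. rebase_timestamps is an illustrative name, and FFmpeg itself works in integer time-base units rather than seconds.

```sh
# Illustration only: timestamps of a trimmed section before and after
# setpts=PTS-STARTPTS. Arguments: section start (s), frame rate, frame count.
rebase_timestamps() {
  awk -v start="$1" -v fps="$2" -v n="$3" 'BEGIN {
    for (i = 0; i < n; i++) {
      pts = start + i / fps                       # trim keeps the original timestamp
      printf "%.2f -> %.2f\n", pts, pts - start   # setpts subtracts STARTPTS
    }
  }'
}

rebase_timestamps 9 25 3
# prints:
# 9.00 -> 0.00
# 9.04 -> 0.04
# 9.08 -> 0.08
```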

Next, we specify the fade in and fade out. First, we add an alpha (transparency) channel to both fade sections by converting them to the yuva420p pixel format. You can actually use any pixel format that has an alpha channel.

Then, in the same filter chain, we fade one section out and the other in. alpha=1 means that only the alpha (transparency) channel fades; the picture itself is not darkened. st is the start time and d the duration, both in seconds.

    [fadeinsrc]format=pix_fmts=yuva420p,      
                fade=t=in:st=0:d=1:alpha=1[fadein];
    [fadeoutsrc]format=pix_fmts=yuva420p,
                fade=t=out:st=0:d=1:alpha=1[fadeout];
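Because both fades run over the same interval, the fade-in section becomes more opaque at exactly the rate the fade-out section becomes transparent. The sketch below prints the idealized, linear opacity of each section at a given time. fade_alpha is a hypothetical name, and FFmpeg's fade filter computes the equivalent factor in integer steps, so real values can differ slightly.

```sh
# fade_alpha T D: idealized opacity of the two crossfade sections at time T
# (seconds) for a fade of duration D, both starting at st=0 as in the graph.
fade_alpha() {
  awk -v t="$1" -v d="$2" 'BEGIN {
    a = t / d                  # fade=t=in ramps alpha from 0 up to 1
    if (a < 0) a = 0
    if (a > 1) a = 1
    printf "in=%.2f out=%.2f\n", a, 1 - a   # fade=t=out is the mirror image
  }'
}

fade_alpha 0.25 1   # prints in=0.25 out=0.75
```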

What is this? The fifo filter buffers frames between filters, so that there is buffer space available in the filter complex. Amazingly, this is NOT the default. If you don't add it, the crossfade could fail when the stage above produces frames faster than the overlay filter below can consume them. Yeah, I know what you're thinking right now. It is indeed an FFmpeg bug.

    [fadein]fifo[fadeinfifo];
    [fadeout]fifo[fadeoutfifo];

Now, overlay the two fade sections. Because the two crossfade sections are the same size, we don't have to worry about the rather nasty options the overlay filter takes (so we ignore them here):

    [fadeoutfifo][fadeinfifo]overlay[crossfade];

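Finally, we line up our three segments using the concat filter. n=3 is the number of segments; concat's defaults of one video stream and no audio stream per segment (v=1, a=0) match this video-only graph.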

    [firstclip][crossfade][secondclip]concat=n=3[output]
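Since concat plays the three segments back to back, you can predict the output length and compare it with what ffprobe reports for the result. crossfade_output_length is a hypothetical helper; the arithmetic follows directly from the trim points (LEN1 - FADE, then FADE, then LEN2 - FADE).

```sh
# crossfade_output_length LEN1 LEN2 FADE
# Expected length of [firstclip][crossfade][secondclip]concat=n=3.
crossfade_output_length() {
  awk -v a="$1" -v b="$2" -v f="$3" 'BEGIN {
    # firstclip + crossfade + secondclip
    print (a - f) + f + (b - f)
  }'
}

crossfade_output_length 10 8 1   # prints 17
```

In other words, a crossfade of FADE seconds shortens the combined length by exactly FADE.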

And now, map the output pad ([output]) as your video source.

DO NOT FORGET to set the output pixel format TO WHAT YOU NORMALLY USE (typically yuv420p, e.g. with -pix_fmt yuv420p), because the crossfade section's format comes from the overlay filter, whose format option defaults to yuv420 since we didn't specify it. (You could instead set it through the overlay filter's format option.) Of course, if yuv420 is what you WANT, then you're fine :-)

-map "[output]" <add your normal encoding part here>
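As one possible way to fill in the encoding part, the sketch below places -pix_fmt yuv420p after the -map so it applies to the output stream. libx264 and -crf 18 are example choices, not part of the original answer. encode_crossfade is a hypothetical wrapper that only prints the command unless RUN=1 is set.

```sh
# encode_crossfade GRAPH OUTFILE
# Hypothetical wrapper: GRAPH is the filter_complex string, OUTFILE the result.
encode_crossfade() {
  local graph=$1 outfile=$2
  # Output options (-c:v, -crf, -pix_fmt) go after -map and before OUTFILE.
  set -- ffmpeg -i fadeoutclip.mp4 -i fadeinclip.mp4 -an \
    -filter_complex "$graph" -map "[output]" \
    -c:v libx264 -crf 18 -pix_fmt yuv420p "$outfile"
  if [ "${RUN:-0}" = 1 ]; then
    "$@"                  # actually run ffmpeg
  else
    printf '%s\n' "$*"    # dry run: show the command (quoting is not preserved)
  fi
}
```

Pass the full filter_complex string from the TL;DR as GRAPH; the dry run is handy for checking option order before encoding.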

You can then add the audio back in later (outside the scope of this Q&A).


FFmpeg now has an xfade (crossfade) filter, added in FFmpeg 4.3.

You can apply it to two inputs with a complex filtergraph (-filter_complex):

ffmpeg -i input1.mp4 -i input2.mp4 \
  -filter_complex "[0:v][1:v]xfade=transition=fade:duration=1:offset=2.5" \
  output.mp4

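Here, the two videos are joined: 2.5 seconds into the first one (offset=2.5), it starts fading over to the second one, with a fade duration of one second (duration=1). The output is therefore 2.5 seconds of the first input alone, a 1-second transition, then the remainder of the second input; anything in the first input after 3.5 seconds is discarded. Both inputs must have the same resolution, frame rate and timebase.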
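In practice you usually want the fade to finish exactly when the first input ends, so offset is the first input's length minus the fade duration. The helper below is a hypothetical sketch that builds the filter string from those two numbers (you could get the length from ffprobe, for example); transition defaults to fade.

```sh
# xfade_graph LEN1 DUR [TRANSITION]
# Hypothetical helper: crossfade that ends exactly when the first input ends.
xfade_graph() {
  local len1=$1 dur=$2 transition=${3:-fade} offset
  # The transition runs from OFFSET to OFFSET + DUR on the first input,
  # so OFFSET = LEN1 - DUR makes it finish at the first input's end.
  offset=$(awk -v l="$len1" -v d="$dur" 'BEGIN { print l - d }')
  printf '[0:v][1:v]xfade=transition=%s:duration=%s:offset=%s\n' \
    "$transition" "$dur" "$offset"
}

# Example (needs ffmpeg and the input files):
#   ffmpeg -i input1.mp4 -i input2.mp4 \
#     -filter_complex "$(xfade_graph 3.5 1)" output.mp4
```

With a 3.5-second first input and a 1-second fade, this reproduces the offset=2.5 example above.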

You can see examples of the available transitions (selected with transition=, e.g. wipeleft or dissolve) on the FFmpeg wiki.