What is the correct way to fix keyframes in FFmpeg for DASH?

When conditioning a stream for DASH playback, random access points must be at the exact same source stream time in all streams. The usual way to do this is to force a fixed frame rate and fixed GOP length (i.e. a keyframe every N frames).

In FFmpeg, fixed frame rate is easy (-r NUMBER).

But for fixed keyframe locations (GOP length), there are three methods...which one is "correct"? The FFmpeg documentation is frustratingly vague on this.

Method 1: messing with libx264's arguments

-c:v libx264 -x264opts keyint=GOPSIZE:min-keyint=GOPSIZE:scenecut=-1

There seems to be some debate if scenecut should be turned off or not, as it is unclear if the keyframe "counter" is restarted when a scene cut happens.

Method 2: setting a fixed GOP size:

-g GOP_LEN_IN_FRAMES

This is unfortunately only documented in passing in the FFmpeg documentation, so the effect of this argument is very unclear.

Method 3: insert a keyframe every N seconds (Maybe?):

-force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS)

This is explicitly documented. But it is still not immediately clear if the "time counter" restarts after every key frame. For instance, in an expected 5-second GOP, if there is a scenecut keyframe injected 3 seconds in by libx264, would the next keyframe be 5 seconds later or 2 seconds later?

In fact, the FFmpeg documentation differentiates between this and the -g option, but it doesn't really say how these two options above are the least bit different (obviously, -g is going to require a fixed frame rate).

Which is right?

It would seem that the -force_key_frames would be superior, as it would not require a fixed frame rate. However, this requires that

  • it conforms to GOP specifications in H.264 (if any)
  • it GUARANTEES that there would be a keyframe in fixed cadence, irrespective of libx264 scenecut keyframes.

It would also seem that -g could not work without forcing a fixed frame rate (-r), as there is no guarantee that multiple runs of ffmpeg with different codec arguments would provide the same instantaneous frame rate in each resolution. Fixed frame rates may reduce compression performance (IMPORTANT in a DASH scenario!).

Finally, the keyint method just seems like a hack. I hope against hope that this isn't the correct answer.

References:

An example using the -force_key_frames method

An example using the keyint method

FFmpeg advanced video options section


TL;DR

I would recommend the following:

  • libx264: -g X -keyint_min X (and optionally add -force_key_frames "expr:gte(t,n_forced*N)")
  • libx265: -x265-params "keyint=X:min-keyint=X"
  • libvpx-vp9: -g X

where X is the interval in frames and N is the interval in seconds. For example, for a 2-second interval with a 30fps video, X = 60 and N = 2.
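
As a rough illustration of the arithmetic above, here is a minimal Python sketch that derives X from the frame rate and N and builds the arguments from the list. The function name keyframe_args is hypothetical, and rounding X to the nearest whole frame (for rates such as 23.976) is my assumption, not something FFmpeg does for you.

```python
def keyframe_args(encoder: str, fps: float, seconds: float) -> list:
    """Return the keyframe-interval arguments recommended above for one encoder."""
    # X = interval in frames. Rounding to a whole frame is an assumption
    # made here for fractional rates such as 23.976.
    x = round(fps * seconds)
    if encoder == "libx264":
        return ["-g", str(x), "-keyint_min", str(x)]
    if encoder == "libx265":
        return ["-x265-params", f"keyint={x}:min-keyint={x}"]
    if encoder == "libvpx-vp9":
        return ["-g", str(x)]
    raise ValueError(f"no recommendation for encoder {encoder!r}")


# 2-second interval at 30 fps, as in the example above: X = 60
print(keyframe_args("libx264", 30, 2))
```

The returned list can be spliced into an ffmpeg command line after the -c:v option.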

A note about different frame types

In order to properly explain this topic, we first have to define the two types of I-frames / keyframes:

  • Instantaneous Decoder Refresh (IDR) frames: These allow independent decoding of the following frames, without access to frames previous to the IDR frame.
  • Non-IDR-frames: These require a previous IDR frame for the decoding to work. Non-IDR frames can be used for scene cuts in the middle of a GOP (group of pictures).

What is recommended for streaming?

For the streaming case, you want to:

  • Ensure that all IDR frames are at regular positions (e.g. at 2, 4, 6, … seconds) so that the video can be split up into segments of equal length.
  • Enable scene cut detection, so as to improve coding efficiency / quality. This means allowing I-frames to be placed in between IDR frames. You can still work with scene cut detection disabled (and this is part of many guides, still), but it's not necessary.
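
To make the first requirement concrete, here is a small Python sketch that checks whether a list of keyframe timestamps covers every segment boundary. The function name missing_boundaries is hypothetical, and it assumes you have already extracted the IDR timestamps yourself (for example with ffprobe) and pass them as exact values: integers, Fractions, or decimal strings. Extra keyframes between boundaries, such as scene cuts, are ignored.

```python
from fractions import Fraction


def missing_boundaries(keyframe_times, segment_seconds, duration):
    """Return the segment start times that have no keyframe exactly on them."""
    # Fraction keeps the comparison exact; pass ints, Fractions or strings like "3.5".
    have = {Fraction(t) for t in keyframe_times}
    missing = []
    k = 0
    while k * segment_seconds < duration:
        boundary = k * segment_seconds
        if Fraction(boundary) not in have:
            missing.append(boundary)
        k += 1
    return missing


# Two hypothetical renditions of an 8-second clip, 2-second segments.
print(missing_boundaries([0, 2, "3.5", 4, 6], 2, 8))  # extra scene-cut keyframe at 3.5 s
print(missing_boundaries([0, 2, 5, 6], 2, 8))         # keyframe drifted away from 4 s
```

Running the check on every rendition and requiring an empty result for each is one way to confirm that the segments can be cut at the same times in all streams.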

What do the parameters do?

In order to configure the encoder, we have to understand what the keyframe parameters do. I did some tests and discovered the following, for the three encoders libx264, libx265 and libvpx-vp9 in FFmpeg:

  • libx264:

    • -g sets the keyframe interval.
    • -keyint_min sets the minimum keyframe interval.
    • -x264-params "keyint=x:min-keyint=y" is the same as -g x -keyint_min y.
    • Note: When setting both to the same value, the minimum is internally set to half the maximum interval plus one, as seen in the x264 code:

      h->param.i_keyint_min = x264_clip3( h->param.i_keyint_min, 1, h->param.i_keyint_max/2+1 );
      
  • libx265:

    • -g is not implemented.
    • -x265-params "keyint=x:min-keyint=y" works.
  • libvpx-vp9:

    • -g sets the keyframe interval.
    • -keyint_min sets the minimum keyframe interval
    • Note: Due to how FFmpeg works, -keyint_min is only forwarded to the encoder when it is the same as -g. In the code from libvpxenc.c in FFmpeg we can find:

      if (avctx->keyint_min >= 0 && avctx->keyint_min == avctx->gop_size)
          enccfg.kf_min_dist = avctx->keyint_min;
      if (avctx->gop_size >= 0)
          enccfg.kf_max_dist = avctx->gop_size;
      

      This might be a bug (or lack of feature?), since libvpx definitely supports setting a different value for kf_min_dist.
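
The two notes above can be mirrored in a few lines of Python to see what values the encoders end up with. This is only a sketch of the quoted C snippets: the Python function names are mine, and None stands for "left at the encoder default", which is an assumption about how to represent an unset field.

```python
def x264_clip3(v: int, lo: int, hi: int) -> int:
    """Clamp v into [lo, hi], like the x264_clip3 helper in the quoted line."""
    return lo if v < lo else hi if v > hi else v


def x264_effective_keyint_min(keyint_min: int, keyint_max: int) -> int:
    # Mirrors: i_keyint_min = x264_clip3(i_keyint_min, 1, i_keyint_max/2+1)
    return x264_clip3(keyint_min, 1, keyint_max // 2 + 1)


def libvpx_kf_dist(keyint_min: int, gop_size: int) -> dict:
    """Mirror the quoted libvpxenc.c conditions."""
    cfg = {"kf_min_dist": None, "kf_max_dist": None}  # None = not forwarded
    if keyint_min >= 0 and keyint_min == gop_size:
        cfg["kf_min_dist"] = keyint_min
    if gop_size >= 0:
        cfg["kf_max_dist"] = gop_size
    return cfg


print(x264_effective_keyint_min(60, 60))  # min-keyint=60 is clamped to 60/2+1
print(libvpx_kf_dist(30, 60))             # -keyint_min 30 is not forwarded
print(libvpx_kf_dist(60, 60))             # forwarded only when equal to -g
```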

Should you use -force_key_frames?

The -force_key_frames option forcibly inserts keyframes at the given interval (expression). This works for all encoders, but it might mess with the rate control mechanism. Especially for VP9, I've noticed severe quality fluctuations, so I cannot recommend using it in this case.


Here are my two cents on this.

Method 1:

messing with libx264's arguments

-c:v libx264 -x264opts keyint=GOPSIZE:min-keyint=GOPSIZE:scenecut=-1

This generates I-frames only at the desired intervals.

Example 1:

ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-x264opts "keyint=48:min-keyint=48:no-scenecut" \
-c:a copy \
-y test_keyint_48.mp4

This generates I-frames as expected:

Iframes     Seconds
1           0
49          2
97          4
145         6
193         8
241         10
289         12
337         14
385         16
433         18
481         20
529         22
577         24
625         26
673         28
721         30
769         32
817         34
865         36
913         38
961         40
1009        42
1057        44
1105        46
1153        48
1201        50
1249        52
1297        54
1345        56
1393        58

Method 2 is deprecated. Omitted.

Method 3:

insert a keyframe every N seconds (MAYBE):

-force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS)

Example 2

ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-force_key_frames "expr:gte(t,n_forced*2)" \
-c:a copy \
-y test_fkf_2.mp4

This generates I-frames in a slightly different way:

Iframes     Seconds
1           0
49          2
97          4
145         6
193         8
241         10
289         12
337         14
385         16
433         18
481         20
519         21.58333333
529         22
577         24
625         26
673         28
721         30
769         32
817         34
865         36
913         38
931         38.75
941         39.16666667
961         40
1008        42
1056        44
1104        46
1152        48
1200        50
1248        52
1296        54
1305        54.375
1344        56
1367        56.95833333
1392        58
1430        59.58333333
1440        60
1475        61.45833333
1488        62
1536        64
1544        64.33333333
1584        66
1591        66.29166667
1632        68
1680        70
1728        72
1765        73.54166667
1776        74
1811        75.45833333
1824        75.95833333
1853        77.16666667
1872        77.95833333
1896        78.95833333
1920        79.95833333
1939        80.75
1968        81.95833333

As you can see, it places I-frames every 2 seconds AND on scene cuts (the timestamps with a fractional part), which in my opinion is important for handling video stream complexity.
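
To show why the 2-second cadence survives the extra scene-cut frames, here is a minimal Python simulation of the expression gte(t,n_forced*N). It is a sketch under two assumptions: a constant frame rate, and n_forced counting only the frames forced by the expression itself (which is how I read the FFmpeg documentation, and which matches the table above). The function name forced_keyframes is hypothetical.

```python
from fractions import Fraction


def forced_keyframes(fps, interval_seconds, frame_count):
    """Return the 0-based frame indices where gte(t, n_forced*interval) is true."""
    fps = Fraction(fps)
    forced = []
    n_forced = 0  # counts only frames forced by the expression
    for n in range(frame_count):
        t = n / fps  # timestamp of frame n, assuming a constant frame rate
        if t >= n_forced * interval_seconds:
            forced.append(n)
            n_forced += 1
    return forced


print(forced_keyframes(24, 2, 145))
print(forced_keyframes(Fraction(24000, 1001), 2, 100))
```

Because a scene-cut I-frame chosen by the encoder never increments n_forced in this model, the next forced keyframe still lands on the next multiple of N seconds. Note that the indices here are 0-based, while the table above counts frames from 1.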

The generated file sizes are nearly the same. Strangely, even with more keyframes, Method 3 sometimes produces a smaller file than the standard x264 algorithm.

For generating multiple-bitrate files for an HLS stream, we chose Method 3. The chunks align perfectly at 2-second intervals, each chunk starts with an I-frame, and complex scenes get additional I-frames, which gives a better experience to users with outdated devices that cannot play back x264 high profiles.

Hope it helps someone.


The answer therefore seems to be:

  • Method 1 is verified to work, but is libx264-specific, and comes at the cost of eliminating the very useful scenecut option in libx264.
  • Method 3 works as of the FFmpeg version of April 2015, but you should verify your results with the script included at the bottom of this post, as the FFmpeg documentation is unclear as to the effect of the option. If it works, it is the superior of the two options.
  • DO NOT USE Method 2: -g appears to be deprecated. It does not appear to work, it is not explicitly defined in the documentation, it is not found in the help, and it does not appear to be used in the code. Code inspection shows that the -g option is likely meant for MPEG-2 streams (there are even code stanzas referring to PAL and NTSC!).

Also:

  • Files generated with Method 3 may be slightly larger than Method 1, as interstitial I frames (keyframes) are allowed.
  • You should explicitly set the "-r" flag in both cases, even though Method 3 places an I-frame at the next frame slot on or after the time specified. Failure to set the "-r" flag places you at the mercy of the source file, possibly with a variable frame rate. Incompatible DASH transitions may result.
  • Despite the warnings in the FFmpeg documentation, Method 3 is NOT less efficient than the others. In fact, tests show that it might be slightly MORE efficient than Method 1.

Script for the -force_key_frames option

Here is a short Perl program I used to verify I-frame cadence based on the output of slhck's ffprobe suggestion. It seems to verify that the -force_key_frames method will also work, and has the added benefit of allowing for scenecut frames. I have absolutely no idea how FFmpeg makes this work, or if I just lucked out somehow because my streams happen to be well-conditioned.

In my case, I encoded at 30fps with an expected GOP size of 6 seconds, or 180 frames. With 180 as the gopsize argument, this program verified an I-frame at each multiple of 180, but setting it to 181 (or any other number not a multiple of 180) made it complain.

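#!/usr/bin/perl
# Verify that an I-frame occurs at every multiple of GOPSIZE frames.
use strict;
use warnings;

my $gopsize = shift(@ARGV);
my $file = shift(@ARGV);
die "Usage: $0 GOPSIZE FILE\n" unless defined $gopsize && defined $file;
print "GOPSIZE = $gopsize\n";

my $linenum = 0;   # index of the current frame (0-based)
my $expected = 0;  # frame index where the next I-frame must appear

# List-form pipe open: the file name is not passed through the shell.
open(my $pipe, '-|', 'ffprobe', '-i', $file, '-select_streams', 'v',
     '-show_frames', '-of', 'csv', '-show_entries', 'frame=pict_type')
        or die "Cannot run ffprobe: $!";
while (<$pipe>) {
  if ($linenum > $expected) {
    # Won't catch all the misses. But even one is good enough to fail.
    print "Missed IFrame at $expected\n";
    $expected = (int($linenum/$gopsize) + 1)*$gopsize;
  }
  if (m/,I\s*$/) {
    if ($linenum < $expected) {
      # Don't care term, just an extra I frame. Snore.
      #print "Free IFrame at $linenum\n";
    } else {
      #print "IFrame HIT at $expected\n";
      $expected += $gopsize;
    }
  }
  $linenum += 1;
}
close($pipe);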
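
For anyone who wants to test the checking logic without running ffprobe, here is a rough Python sketch of the same idea operating on lines that have already been captured. It is not a line-for-line port: it is a stricter variant that reports every missed multiple of the GOP size, and it assumes each input line looks like "frame,I", which is what the regular expression in the Perl script expects. The function name missed_keyframes is hypothetical.

```python
def missed_keyframes(csv_lines, gopsize):
    """Return the 0-based frame indices at multiples of gopsize that are not I-frames."""
    missed = []
    for index, line in enumerate(csv_lines):
        # Assumed line format: "frame,<pict_type>", e.g. "frame,I" or "frame,P"
        is_iframe = line.strip().endswith(",I")
        if index % gopsize == 0 and not is_iframe:
            missed.append(index)
    return missed


# Tiny hand-made sample: I-frames at frames 0 and 3, extra frames in between.
sample = ["frame,I", "frame,P", "frame,B", "frame,I", "frame,P"]
print(missed_keyframes(sample, 3))
print(missed_keyframes(sample, 2))
```

As in the Perl version, extra I-frames between the expected positions are ignored, so scenecut frames do not cause a failure.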