What is the correct way to fix keyframes in FFmpeg for DASH?
When conditioning a stream for DASH playback, random access points must be at the exact same source stream time in all streams. The usual way to do this is to force a fixed frame rate and fixed GOP length (i.e. a keyframe every N frames).
In FFmpeg, fixed frame rate is easy (-r NUMBER).
But for fixed keyframe locations (GOP length), there are three methods...which one is "correct"? The FFmpeg documentation is frustratingly vague on this.
Method 1: messing with libx264's arguments
-c:v libx264 -x264opts keyint=GOPSIZE:min-keyint=GOPSIZE:scenecut=-1
There seems to be some debate if scenecut should be turned off or not, as it is unclear if the keyframe "counter" is restarted when a scene cut happens.
Method 2: setting a fixed GOP size:
-g GOP_LEN_IN_FRAMES
Unfortunately, this is only documented in passing in the FFmpeg documentation, so the effect of this argument is very unclear.
Method 3: insert a keyframe every N seconds (Maybe?):
-force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS)
This is explicitly documented. But it is still not immediately clear if the "time counter" restarts after every keyframe. For instance, in an expected 5-second GOP, if there is a scenecut keyframe injected 3 seconds in by libx264, would the next keyframe be 5 seconds later or 2 seconds later?
In fact, the FFmpeg documentation differentiates between this and the -g option, but it doesn't really say how the two options differ in the least (obviously, -g is going to require a fixed frame rate).
Which is right?
It would seem that -force_key_frames would be superior, as it would not require a fixed frame rate. However, this requires that
- it conforms to GOP specifications in H.264 (if any)
- it GUARANTEES that there would be a keyframe in fixed cadence, irrespective of libx264 scenecut keyframes.
It would also seem that -g could not work without forcing a fixed frame rate (-r), as there is no guarantee that multiple runs of ffmpeg with different codec arguments would provide the same instantaneous frame rate in each resolution. Fixed frame rates may reduce compression performance (IMPORTANT in a DASH scenario!).
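To illustrate why a frame-count GOP depends on the frame rate, here is a minimal sketch. The helper name keyframe_times is hypothetical, and it assumes a perfectly constant frame rate; it only does the arithmetic and does not call ffmpeg.

```shell
#!/bin/sh
# Hypothetical helper: print the timestamps (in seconds) of the first
# COUNT keyframes for a fixed GOP of G frames at a constant frame rate FPS.
# Usage: keyframe_times G FPS COUNT
keyframe_times() {
    awk -v g="$1" -v fps="$2" -v count="$3" 'BEGIN {
        for (k = 0; k < count; k++) printf "%.3f\n", k * g / fps
    }'
}

keyframe_times 60 30 3   # a 30 fps rendition
keyframe_times 60 25 3   # a 25 fps rendition: same -g, different times
```

With the same -g value, the two renditions only agree on keyframe times if their frame rates agree, which is the concern raised above.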
Finally, the keyint method just seems like a hack. I hope against hope that this isn't the correct answer.
References:
- An example using the -force_key_frames method
- An example using the keyint method
- FFmpeg advanced video options section
TL;DR
I would recommend the following:
- libx264: -g X -keyint_min X (and optionally add -force_key_frames "expr:gte(t,n_forced*N)")
- libx265: -x265-params "keyint=X:min-keyint=X"
- libvpx-vp9: -g X
where X is the interval in frames and N is the interval in seconds. For example, for a 2-second interval with a 30fps video, X = 60 and N = 2.
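As a rough sketch of how X follows from N, the snippet below derives the frame count and prints the libx264 options from the list above. The helper name gop_frames is hypothetical, and it assumes an integer, constant frame rate.

```shell
#!/bin/sh
# Hypothetical helper: GOP length in frames for an integer frame rate
# and an integer interval in seconds.
# Usage: gop_frames FPS SECONDS
gop_frames() {
    fps=$1
    seconds=$2
    echo $(( fps * seconds ))
}

N=2                        # interval in seconds
X=$(gop_frames 30 "$N")    # interval in frames at 30 fps
# Print the libx264 options recommended above (nothing is encoded here).
echo "-c:v libx264 -g $X -keyint_min $X -force_key_frames expr:gte(t,n_forced*$N)"
```

For fractional rates such as 23.976 the product is not an integer, so the frame count has to be chosen by hand.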
A note about different frame types
In order to properly explain this topic, we first have to define the two types of I-frames / keyframes:
- Instantaneous Decoder Refresh (IDR) frames: These allow independent decoding of the following frames, without access to frames previous to the IDR frame.
- Non-IDR-frames: These require a previous IDR frame for the decoding to work. Non-IDR frames can be used for scene cuts in the middle of a GOP (group of pictures).
What is recommended for streaming?
For the streaming case, you want to:
- Ensure that all IDR frames are at regular positions (e.g. at 2, 4, 6, … seconds) so that the video can be split up into segments of equal length.
- Enable scene cut detection, so as to improve coding efficiency / quality. This means allowing I-frames to be placed in between IDR frames. You can still work with scene cut detection disabled (and this is part of many guides, still), but it's not necessary.
What do the parameters do?
In order to configure the encoder, we have to understand what the keyframe parameters do. I did some tests and discovered the following, for the three encoders libx264, libx265 and libvpx-vp9 in FFmpeg:
- libx264:
  - -g sets the keyframe interval.
  - -keyint_min sets the minimum keyframe interval.
  - -x264-params "keyint=x:min-keyint=y" is the same as -g x -keyint_min y.
  - Note: When setting both to the same value, the minimum is internally set to half the maximum interval plus one, as seen in the x264 code:
    h->param.i_keyint_min = x264_clip3( h->param.i_keyint_min, 1, h->param.i_keyint_max/2+1 );
- libx265:
  - -g is not implemented.
  - -x265-params "keyint=x:min-keyint=y" works.
- libvpx-vp9:
  - -g sets the keyframe interval.
  - -keyint_min sets the minimum keyframe interval.
  - Note: Due to how FFmpeg works, -keyint_min is only forwarded to the encoder when it is the same as -g. In the code from libvpxenc.c in FFmpeg we can find:
    if (avctx->keyint_min >= 0 && avctx->keyint_min == avctx->gop_size)
        enccfg.kf_min_dist = avctx->keyint_min;
    if (avctx->gop_size >= 0)
        enccfg.kf_max_dist = avctx->gop_size;
    This might be a bug (or lack of feature?), since libvpx definitely supports setting a different value for kf_min_dist.
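The two notes above can be re-stated as plain arithmetic. This is only a sketch that mirrors the quoted C snippets in shell; the function names are hypothetical and nothing here calls x264 or libvpx.

```shell
#!/bin/sh
# clip3 V LO HI: clamp V into the range [LO, HI], like x264_clip3.
clip3() {
    if [ "$1" -lt "$2" ]; then echo "$2"
    elif [ "$1" -gt "$3" ]; then echo "$3"
    else echo "$1"
    fi
}

# x264_min_keyint KEYINT_MAX KEYINT_MIN: effective minimum interval
# after the clamp quoted in the x264 note.
x264_min_keyint() {
    clip3 "$2" 1 $(( $1 / 2 + 1 ))
}

# vpx_kf_min_dist GOP_SIZE KEYINT_MIN: value forwarded as kf_min_dist,
# or "unset" when the quoted condition leaves the encoder default alone.
vpx_kf_min_dist() {
    if [ "$2" -ge 0 ] && [ "$2" -eq "$1" ]; then echo "$2"
    else echo "unset"
    fi
}

x264_min_keyint 60 60    # prints 31
vpx_kf_min_dist 60 60    # prints 60
vpx_kf_min_dist 60 30    # prints unset
```

So for libx264 with -g 60 -keyint_min 60, the minimum interval the encoder actually uses is 31 frames, not 60.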
Should you use -force_key_frames?
The -force_key_frames option forcibly inserts keyframes at the given interval (expression). This works for all encoders, but it might mess with the rate control mechanism. Especially for VP9, I've noticed severe quality fluctuations, so I cannot recommend using it in this case.
Here are my two cents on the matter.
Method 1: messing with libx264's arguments
-c:v libx264 -x264opts keyint=GOPSIZE:min-keyint=GOPSIZE:scenecut=-1
This generates I-frames only at the desired intervals.
Example 1:
ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-x264opts "keyint=48:min-keyint=48:no-scenecut" \
-c:a copy \
-y test_keyint_48.mp4
This generates I-frames as expected:
Iframes Seconds
1 0
49 2
97 4
145 6
193 8
241 10
289 12
337 14
385 16
433 18
481 20
529 22
577 24
625 26
673 28
721 30
769 32
817 34
865 36
913 38
961 40
1009 42
1057 44
1105 46
1153 48
1201 50
1249 52
1297 54
1345 56
1393 58
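A table like the one above can be produced from ffprobe's per-frame output. The following is a sketch, not the exact procedure used for these numbers: list_iframes is a hypothetical helper, it assumes a constant frame rate, and it expects CSV lines of the form frame,I on standard input (the ffprobe options in the usage comment are the same ones used by the Perl script later in this thread; input.mp4 is a placeholder).

```shell
#!/bin/sh
# Hypothetical helper: read "frame,<pict_type>" CSV lines on stdin and
# print the 1-based frame number and timestamp of every I-frame.
# Usage: ... | list_iframes FPS
list_iframes() {
    awk -F, -v fps="$1" '
        $1 == "frame" {
            n++                                  # count every frame line
            if ($2 ~ /^I[[:space:]]*$/)          # keep only I-frames
                printf "%d %.2f\n", n, (n - 1) / fps
        }'
}

# Example usage (requires ffprobe; file name is a placeholder):
#   ffprobe -i input.mp4 -select_streams v -show_frames -of csv \
#       -show_entries frame=pict_type | list_iframes 24
```

The timestamps are derived from the frame index, so they are only meaningful when the stream really has the frame rate passed in.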
Method 2 is deprecated. Omitted.
Method 3: insert a keyframe every N seconds (MAYBE):
-force_key_frames expr:gte(t,n_forced*GOP_LEN_IN_SECONDS)
Example 2:
ffmpeg -i test.mp4 -codec:v libx264 \
-r 23.976 \
-force_key_frames "expr:gte(t,n_forced*2)" \
-c:a copy \
-y test_fkf_2.mp4
This generates I-frames in a slightly different way:
Iframes Seconds
1 0
49 2
97 4
145 6
193 8
241 10
289 12
337 14
385 16
433 18
481 20
519 21.58333333
529 22
577 24
625 26
673 28
721 30
769 32
817 34
865 36
913 38
931 38.75
941 39.16666667
961 40
1008 42
1056 44
1104 46
1152 48
1200 50
1248 52
1296 54
1305 54.375
1344 56
1367 56.95833333
1392 58
1430 59.58333333
1440 60
1475 61.45833333
1488 62
1536 64
1544 64.33333333
1584 66
1591 66.29166667
1632 68
1680 70
1728 72
1765 73.54166667
1776 74
1811 75.45833333
1824 75.95833333
1853 77.16666667
1872 77.95833333
1896 78.95833333
1920 79.95833333
1939 80.75
1968 81.95833333
As you can see, it places I-frames every 2 seconds AND on scene cuts (the timestamps with a fractional part), which in my opinion is important for handling video stream complexity.
The generated file sizes are pretty much the same. Strangely, even with more keyframes, Method 3 sometimes produces a smaller file than the standard x264 algorithm.
For generating multiple-bitrate files for an HLS stream we chose Method 3. The output is perfectly aligned with 2 seconds between chunks, each chunk starts with an I-frame, and there are additional I-frames on complex scenes, which provides a better experience for users who have outdated devices and cannot play back x264 high profiles.
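The regular 2-second cadence in the table is consistent with how the expression is evaluated. Here is a small simulation, assuming (as the FFmpeg documentation describes n_forced) that only forced keyframes increment n_forced, so scene-cut keyframes chosen by the encoder do not shift the schedule. The helper forced_keyframes is hypothetical, uses 0-based frame indices, and assumes a constant frame rate.

```shell
#!/bin/sh
# Hypothetical simulation of -force_key_frames "expr:gte(t,n_forced*N)".
# Prints the 0-based indices of the frames that would be forced.
# Usage: forced_keyframes FPS N TOTAL_FRAMES
forced_keyframes() {
    awk -v fps="$1" -v n="$2" -v total="$3" 'BEGIN {
        n_forced = 0
        for (i = 0; i < total; i++) {
            t = i / fps                   # timestamp of frame i
            if (t >= n_forced * n) {      # gte(t, n_forced*N)
                print i
                n_forced++                # only forced frames count
            }
        }
    }'
}

forced_keyframes 24 2 100    # prints 0, 48 and 96, one per line
```

Under that assumption, an extra scene-cut I-frame between two forced keyframes leaves the next forced keyframe where it was.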
Hope it helps someone.
The answer therefore seems to be:
- Method 1 is verified to work, but is libx264-specific, and comes at the cost of eliminating the very useful scenecut option in libx264.
- Method 3 works as of the FFmpeg version of April 2015, but you should verify your results with the script included at the bottom of this post, as the FFmpeg documentation is unclear as to the effect of the option. If it works, it is the superior of the two options.
- DO NOT USE Method 2; -g appears to be deprecated. It neither appears to work, nor is it explicitly defined in the documentation, nor is it found in the help, nor does it appear to be used in the code. Code inspection shows that the -g option is likely meant for MPEG-2 streams (there are even code stanzas referring to PAL and NTSC!).
Also:
- Files generated with Method 3 may be slightly larger than Method 1, as interstitial I frames (keyframes) are allowed.
- You should explicitly set the "-r" flag in both cases, even though Method 3 places an I frame at the next frameslot on or after the time specified. Failure to set the "-r" flag places you at the mercy of the source file, possibly with a variable frame rate. Incompatible DASH transitions may result.
- Despite the warnings in the FFMPEG documentation, method 3 is NOT less efficient than others. In fact, tests show that it might be slightly MORE efficient than method 1.
Script for the -force_key_frames option
Here is a short Perl program I used to verify I-frame cadence based on the output of slhck's ffprobe suggestion. It seems to verify that the -force_key_frames method will also work, and has the added benefit of allowing for scenecut frames. I have absolutely no idea how FFmpeg makes this work, or if I just lucked out somehow because my streams happen to be well-conditioned.
In my case, I encoded at 30fps with an expected GOP size of 6 seconds, or 180 frames. With 180 as the gopsize argument, this program verified an I-frame at each multiple of 180, but setting it to 181 (or any other number not a multiple of 180) made it complain.
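#!/usr/bin/perl
# Verify that an I-frame occurs at every multiple of the GOP size.
# Arguments: GOPSIZE (in frames), then the file to inspect.
use strict;
use warnings;

my $gopsize = shift(@ARGV);
my $file = shift(@ARGV);
print "GOPSIZE = $gopsize\n";

my $linenum = 0;     # index of the current frame
my $expected = 0;    # frame index at which the next I-frame is required

# The list form of open avoids passing the file name through the shell.
open my $pipe, '-|', 'ffprobe', '-i', $file, '-select_streams', 'v',
    '-show_frames', '-of', 'csv', '-show_entries', 'frame=pict_type'
    or die "Cannot run ffprobe: $!";

while (<$pipe>) {
    if ($linenum > $expected) {
        # Won't catch all the misses. But even one is good enough to fail.
        print "Missed IFrame at $expected\n";
        $expected = (int($linenum/$gopsize) + 1)*$gopsize;
    }
    if (m/,I\s*$/) {
        if ($linenum < $expected) {
            # Don't care term, just an extra I frame. Snore.
            #print "Free IFrame at $linenum\n";
        } else {
            #print "IFrame HIT at $expected\n";
            $expected += $gopsize;
        }
    }
    $linenum += 1;
}
close $pipe;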