Convert ffmpeg encoding from libx264 to h264_nvenc
Solution 1:
Try these commands with the following assumptions:
i. The default GPU selected for NVENC is 0, and that GPU is NVENC capable. See this for more information on NVENC capabilities, including the hardware acceleration infrastructure available to FFmpeg on capable NVIDIA hardware. GPU selection for NVDEC hardware acceleration is set via the input option -hwaccel_device 0 and, for the encoder, via h264_nvenc's private codec option -gpu 0. On a multi-GPU system, select a valid GPU as listed by nvidia-smi.
ii. Scaling operations are done with either scale_npp OR scale_cuda, as these run purely on the GPU. The availability of these filters depends on how FFmpeg was configured, as explained below. For their usage, please see ffmpeg -h filter=scale_npp and ffmpeg -h filter=scale_cuda respectively.
iii. Hardware acceleration support for specific codecs is platform- and driver-dependent. It can be verified with the nvdecinfo program from Philip Langdale's nv-video-info project, as documented in this answer, if you're on Linux. Note that not all video formats are supported by NVDEC; for those, commands using software-based decoding are provided as a fallback.
iv. For rate control, I've selected variable bitrate mode (set via the private codec option -rc:v vbr) with the constant quality target (set via -cq:v) at a value of 21. Valid values for NVENC are from 0 to 51. The bitrate is explicitly unset via -b:v 0 to comply with the constraints of the selected rate control mode in NVENC. However, the maximum bitrate and buffer size remain set to match your settings in libx264. See this answer on why these parameters were selected: https://superuser.com/a/1630606/473795.
v. Depending on the specific version of FFmpeg built and the version of the ffnvcodec header package in use (i.e. either a release version OR git master), the following parameters will need to change:
(i). Older release builds of FFmpeg, configured with older release versions of the ffnvcodec header package, only need the rate control option set to the high-quality variable bitrate mode via -rc:v vbr_hq. The preset names remain unchanged.
(ii). For current FFmpeg builds from git, matched with the current git version of the ffnvcodec header package, rate control must be set to variable bitrate mode (via -rc:v vbr), paired with a named preset (via -preset:v p{1-7}), where the preset ranges from p1 (fastest) to p7 (slowest), and a valid tuning option (set via -tune:v), which takes one of the following values:
hq 1 E..V...... High quality
ll 2 E..V...... Low latency
ull 3 E..V...... Ultra low latency
lossless 4 E..V...... Lossless
You can specify the tuning value either by its name (hq through lossless) in FFmpeg's command lines or by the number identifying it (1 through 4), as shown above.
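One way to tell which of the two cases in (v) applies to a given binary is to check whether the encoder help lists a -tune option, since only current builds print one. The following is a minimal sketch, assuming the help text is piped in on stdin. The function name nvenc_preset_args is hypothetical, and the values it prints (ll with p1, or fast) are examples rather than recommendations:

```shell
# Hypothetical helper: read `ffmpeg -h encoder=h264_nvenc` output on stdin and
# print preset/tuning arguments that match the build.
nvenc_preset_args() {
  # Current builds list a top-level "-tune" option; older builds do not.
  if grep -Eq '^[[:space:]]*-tune[[:space:]]'; then
    printf '%s\n' '-tune:v ll -preset:v p1'
  else
    printf '%s\n' '-preset:v fast'
  fi
}

# Usage (requires an ffmpeg binary on PATH):
#   ffmpeg -hide_banner -h encoder=h264_nvenc | nvenc_preset_args
```

The printed arguments can then be pasted into the encoder portion of the command line.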
For your reference, I'll include the parameters that the h264_nvenc encoder wrapper's help option prints out on:
- Current FFmpeg builds:
ffmpeg -h encoder=h264_nvenc
Encoder h264_nvenc [NVIDIA NVENC H.264 encoder]:
General capabilities: delay hardware
Threading capabilities: none
Supported hardware devices: cuda cuda
Supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 rgb0 cuda
h264_nvenc AVOptions:
-preset <int> E..V...... Set the encoding preset (from 0 to 18) (default p4)
default 0 E..V......
slow 1 E..V...... hq 2 passes
medium 2 E..V...... hq 1 pass
fast 3 E..V...... hp 1 pass
hp 4 E..V......
hq 5 E..V......
bd 6 E..V......
ll 7 E..V...... low latency
llhq 8 E..V...... low latency hq
llhp 9 E..V...... low latency hp
lossless 10 E..V......
losslesshp 11 E..V......
p1 12 E..V...... fastest (lowest quality)
p2 13 E..V...... faster (lower quality)
p3 14 E..V...... fast (low quality)
p4 15 E..V...... medium (default)
p5 16 E..V...... slow (good quality)
p6 17 E..V...... slower (better quality)
p7 18 E..V...... slowest (best quality)
-tune <int> E..V...... Set the encoding tuning info (from 1 to 4) (default hq)
hq 1 E..V...... High quality
ll 2 E..V...... Low latency
ull 3 E..V...... Ultra low latency
lossless 4 E..V...... Lossless
-profile <int> E..V...... Set the encoding profile (from 0 to 3) (default main)
baseline 0 E..V......
main 1 E..V......
high 2 E..V......
high444p 3 E..V......
-level <int> E..V...... Set the encoding level restriction (from 0 to 62) (default auto)
auto 0 E..V......
1 10 E..V......
1.0 10 E..V......
1b 9 E..V......
1.0b 9 E..V......
1.1 11 E..V......
1.2 12 E..V......
1.3 13 E..V......
2 20 E..V......
2.0 20 E..V......
2.1 21 E..V......
2.2 22 E..V......
3 30 E..V......
3.0 30 E..V......
3.1 31 E..V......
3.2 32 E..V......
4 40 E..V......
4.0 40 E..V......
4.1 41 E..V......
4.2 42 E..V......
5 50 E..V......
5.0 50 E..V......
5.1 51 E..V......
5.2 52 E..V......
6.0 60 E..V......
6.1 61 E..V......
6.2 62 E..V......
-rc <int> E..V...... Override the preset rate-control (from -1 to INT_MAX) (default -1)
constqp 0 E..V...... Constant QP mode
vbr 1 E..V...... Variable bitrate mode
cbr 2 E..V...... Constant bitrate mode
vbr_minqp 8388612 E..V...... Variable bitrate mode with MinQP (deprecated)
ll_2pass_quality 8388616 E..V...... Multi-pass optimized for image quality (deprecated)
ll_2pass_size 8388624 E..V...... Multi-pass optimized for constant frame size (deprecated)
vbr_2pass 8388640 E..V...... Multi-pass variable bitrate mode (deprecated)
cbr_ld_hq 8388616 E..V...... Constant bitrate low delay high quality mode
cbr_hq 8388624 E..V...... Constant bitrate high quality mode
vbr_hq 8388640 E..V...... Variable bitrate high quality mode
-rc-lookahead <int> E..V...... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
-surfaces <int> E..V...... Number of concurrent surfaces (from 0 to 64) (default 0)
-cbr <boolean> E..V...... Use cbr encoding mode (default false)
-2pass <boolean> E..V...... Use 2pass encoding mode (default auto)
-gpu <int> E..V...... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
any -1 E..V...... Pick the first device available
list -2 E..V...... List the available devices
-delay <int> E..V...... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
-no-scenecut <boolean> E..V...... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
-forced-idr <boolean> E..V...... If forcing keyframes, force them as IDR frames. (default false)
-b_adapt <boolean> E..V...... When lookahead is enabled, set this to 0 to disable adaptive B-frame decision (default true)
-spatial-aq <boolean> E..V...... set to 1 to enable Spatial AQ (default false)
-spatial_aq <boolean> E..V...... set to 1 to enable Spatial AQ (default false)
-temporal-aq <boolean> E..V...... set to 1 to enable Temporal AQ (default false)
-temporal_aq <boolean> E..V...... set to 1 to enable Temporal AQ (default false)
-zerolatency <boolean> E..V...... Set 1 to indicate zero latency operation (no reordering delay) (default false)
-nonref_p <boolean> E..V...... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
-strict_gop <boolean> E..V...... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
-aq-strength <int> E..V...... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
-cq <float> E..V...... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
-aud <boolean> E..V...... Use access unit delimiters (default false)
-bluray-compat <boolean> E..V...... Bluray compatibility workarounds (default false)
-init_qpP <int> E..V...... Initial QP value for P frame (from -1 to 51) (default -1)
-init_qpB <int> E..V...... Initial QP value for B frame (from -1 to 51) (default -1)
-init_qpI <int> E..V...... Initial QP value for I frame (from -1 to 51) (default -1)
-qp <int> E..V...... Constant quantization parameter rate control method (from -1 to 51) (default -1)
-weighted_pred <int> E..V...... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
-coder <int> E..V...... Coder type (from -1 to 2) (default default)
default -1 E..V......
auto 0 E..V......
cabac 1 E..V......
cavlc 2 E..V......
ac 1 E..V......
vlc 2 E..V......
-b_ref_mode <int> E..V...... Use B frames as references (from 0 to 2) (default disabled)
disabled 0 E..V...... B frames will not be used for reference
each 1 E..V...... Each B frame will be used for reference
middle 2 E..V...... Only (number of B frames)/2 will be used for reference
-a53cc <boolean> E..V...... Use A53 Closed Captions (if available) (default true)
-dpb_size <int> E..V...... Specifies the DPB size used for encoding (0 means automatic) (from 0 to INT_MAX) (default 0)
-multipass <int> E..V...... Set the multipass encoding (from 0 to 2) (default disabled)
disabled 0 E..V...... Single Pass
qres 1 E..V...... Two Pass encoding is enabled where first Pass is quarter resolution
fullres 2 E..V...... Two Pass encoding is enabled where first Pass is full resolution
-ldkfs <int> E..V...... Low delay key frame scale; Specifies the Scene Change frame size increase allowed in case of single frame VBV and CBR (from 0 to 255) (default 0)
- Older FFmpeg builds:
ffmpeg -h encoder=h264_nvenc
Encoder h264_nvenc [NVIDIA NVENC H.264 encoder]:
General capabilities: delay
Threading capabilities: none
Supported pixel formats: yuv420p nv12 p010le yuv444p yuv444p16le bgr0 rgb0 cuda
h264_nvenc AVOptions:
-preset <int> E..V.... Set the encoding preset (from 0 to 11) (default medium)
default E..V....
slow E..V.... hq 2 passes
medium E..V.... hq 1 pass
fast E..V.... hp 1 pass
hp E..V....
hq E..V....
bd E..V....
ll E..V.... low latency
llhq E..V.... low latency hq
llhp E..V.... low latency hp
lossless E..V....
losslesshp E..V....
-profile <int> E..V.... Set the encoding profile (from 0 to 3) (default main)
baseline E..V....
main E..V....
high E..V....
high444p E..V....
-level <int> E..V.... Set the encoding level restriction (from 0 to 51) (default auto)
auto E..V....
1 E..V....
1.0 E..V....
1b E..V....
1.0b E..V....
1.1 E..V....
1.2 E..V....
1.3 E..V....
2 E..V....
2.0 E..V....
2.1 E..V....
2.2 E..V....
3 E..V....
3.0 E..V....
3.1 E..V....
3.2 E..V....
4 E..V....
4.0 E..V....
4.1 E..V....
4.2 E..V....
5 E..V....
5.0 E..V....
5.1 E..V....
-rc <int> E..V.... Override the preset rate-control (from -1 to INT_MAX) (default -1)
constqp E..V.... Constant QP mode
vbr E..V.... Variable bitrate mode
cbr E..V.... Constant bitrate mode
vbr_minqp E..V.... Variable bitrate mode with MinQP (deprecated)
ll_2pass_quality E..V.... Multi-pass optimized for image quality (deprecated)
ll_2pass_size E..V.... Multi-pass optimized for constant frame size (deprecated)
vbr_2pass E..V.... Multi-pass variable bitrate mode (deprecated)
cbr_ld_hq E..V.... Constant bitrate low delay high quality mode
cbr_hq E..V.... Constant bitrate high quality mode
vbr_hq E..V.... Variable bitrate high quality mode
-rc-lookahead <int> E..V.... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
-surfaces <int> E..V.... Number of concurrent surfaces (from 0 to 64) (default 0)
-cbr <boolean> E..V.... Use cbr encoding mode (default false)
-2pass <boolean> E..V.... Use 2pass encoding mode (default auto)
-gpu <int> E..V.... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
any E..V.... Pick the first device available
list E..V.... List the available devices
-delay <int> E..V.... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
-no-scenecut <boolean> E..V.... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
-forced-idr <boolean> E..V.... If forcing keyframes, force them as IDR frames. (default false)
-b_adapt <boolean> E..V.... When lookahead is enabled, set this to 0 to disable adaptive B-frame decision (default true)
-spatial-aq <boolean> E..V.... set to 1 to enable Spatial AQ (default false)
-temporal-aq <boolean> E..V.... set to 1 to enable Temporal AQ (default false)
-zerolatency <boolean> E..V.... Set 1 to indicate zero latency operation (no reordering delay) (default false)
-nonref_p <boolean> E..V.... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
-strict_gop <boolean> E..V.... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
-aq-strength <int> E..V.... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
-cq <float> E..V.... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
-aud <boolean> E..V.... Use access unit delimiters (default false)
-bluray-compat <boolean> E..V.... Bluray compatibility workarounds (default false)
-init_qpP <int> E..V.... Initial QP value for P frame (from -1 to 51) (default -1)
-init_qpB <int> E..V.... Initial QP value for B frame (from -1 to 51) (default -1)
-init_qpI <int> E..V.... Initial QP value for I frame (from -1 to 51) (default -1)
-qp <int> E..V.... Constant quantization parameter rate control method (from -1 to 51) (default -1)
-weighted_pred <int> E..V.... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
-coder <int> E..V.... Coder type (from -1 to 2) (default default)
default E..V....
auto E..V....
cabac E..V....
cavlc E..V....
ac E..V....
vlc E..V....
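To see which of the two listings a local build resembles, the help text can be reduced to the handful of options this answer relies on. A rough sketch, assuming the help text arrives on stdin; nvenc_key_options is a hypothetical name:

```shell
# Hypothetical helper: keep only the top-level option lines for preset, tune,
# rc, cq and gpu from `ffmpeg -h encoder=h264_nvenc` output read on stdin.
# Options such as -rc-lookahead are skipped because the name must be
# followed by whitespace.
nvenc_key_options() {
  grep -E '^[[:space:]]*-(preset|tune|rc|cq|gpu)[[:space:]]'
}

# Usage (requires an ffmpeg binary on PATH):
#   ffmpeg -hide_banner -h encoder=h264_nvenc | nvenc_key_options
```

If the result contains a -tune line and a preset range of 0 to 18, the build matches the current listing; otherwise treat it as an older build.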
And now to the parameters in use:
1. Using FFmpeg with full hardware-accelerated decoding via nvdec:
(a). Scaling done with the scale_npp filter, available when FFmpeg is built with the proprietary CUDA SDK (when the flags --enable-nonfree --enable-cuda-nvcc --enable-libnpp --nvccflags="-gencode arch=compute_52,code=sm_52 -O2" are passed to ./configure at build time):
Older builds:
ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_npp=w=1920:h=1080:interp_algo=lanczos' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
Current builds:
ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_npp=w=1920:h=1080:interp_algo=lanczos' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
(b). Scaling done with the scale_cuda filter, available when FFmpeg is built with clang (from LLVM) as the CUDA compiler in place of nvcc (when the flags --enable-cuda-llvm --nvccflags="--cuda-gpu-arch=sm_52 -O2" are passed to ./configure at build time):
Older builds:
ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_cuda=w=1920:h=1080' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
Current builds:
ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_cuda=w=1920:h=1080' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
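Which scaler variant applies depends on the filters compiled into the binary at hand. The sketch below assumes the usual `ffmpeg -filters` layout, where the filter name is the second whitespace-separated column. pick_scale_filter is a hypothetical helper, and preferring scale_npp over scale_cuda is an arbitrary choice made for illustration:

```shell
# Hypothetical helper: read `ffmpeg -filters` output on stdin and print the
# first GPU scaler found, preferring scale_npp over scale_cuda.
pick_scale_filter() {
  awk '
    $2 == "scale_npp"  { npp = 1 }
    $2 == "scale_cuda" { cuda = 1 }
    END {
      if (npp)       print "scale_npp"
      else if (cuda) print "scale_cuda"
      else           exit 1   # neither filter is built in
    }'
}

# Usage (requires an ffmpeg binary on PATH):
#   ffmpeg -hide_banner -filters | pick_scale_filter
```

A non-zero exit status means neither filter is present, in which case FFmpeg needs to be rebuilt with one of the configurations described above.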
2. Using software-based decode fallback:
Note that GPU allocation is done via the filter hwupload_cuda=0, which initializes a CUDA HWContext bound to GPU 0 for all scaling operations. For the h264_nvenc encoder wrapper, the private option -gpu:v 0 then selects the same GPU.
(a). Using the scale_npp filter:
Older builds:
ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_npp=w=1920:h=1080:interp_algo=lanczos' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
Current builds:
ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_npp=w=1920:h=1080:interp_algo=lanczos' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
(b). Using the scale_cuda filter:
Older builds:
ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_cuda=w=1920:h=1080' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
Current builds:
ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_cuda=w=1920:h=1080' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
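The current-build variants above differ only in where decoding happens and which scaler is used, so they can be generated from one template. The sketch below only prints a command line for inspection and does not run FFmpeg. build_nvenc_cmd and its arguments are hypothetical, and it hard-codes the file names, resolution and rate control values from the commands above:

```shell
# Hypothetical helper: print (do not run) an ffmpeg command line.
#   $1 = decode mode: "hw" (nvdec) or "sw" (software decode + hwupload_cuda)
#   $2 = scaler:      "scale_npp" or "scale_cuda"
#   $3 = GPU index as listed by nvidia-smi (default 0)
build_nvenc_cmd() {
  mode=$1 scaler=$2 gpu=${3:-0}
  case $scaler in
    scale_npp)  vf='scale_npp=w=1920:h=1080:interp_algo=lanczos' ;;
    scale_cuda) vf='scale_cuda=w=1920:h=1080' ;;
    *) echo "unknown scaler: $scaler" >&2; return 1 ;;
  esac
  case $mode in
    hw) input="-threads 1 -hwaccel nvdec -hwaccel_device $gpu -hwaccel_output_format cuda -i input.avi" ;;
    # Software decode: frames must be uploaded to the GPU before scaling.
    sw) input="-threads 2 -i input.avi"; vf="hwupload_cuda=$gpu,$vf" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "ffmpeg $input -vf '$vf' -c:v h264_nvenc -gpu:v $gpu" \
       "-cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 -b:v 0" \
       "-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k" \
       "-movflags +faststart -f mp4 output.mp4"
}
```

For example, `build_nvenc_cmd sw scale_cuda` prints the current-build command from 2(b), while `build_nvenc_cmd hw scale_npp 1` targets the second GPU for both decoding and encoding.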
Kindly test and report back with your findings.