Convert ffmpeg encoding from libx264 to h264_nvenc


Try these commands with the following assumptions:

i. The GPU used for NVENC is GPU 0 (the default), and that GPU is NVENC-capable. See this for more information on NVENC capabilities, including the hardware acceleration infrastructure available to FFmpeg on capable NVIDIA hardware. With NVDEC hardware acceleration, the decoding GPU is selected via the input option -hwaccel_device 0; for the encoder, it is selected via h264_nvenc's private codec option -gpu 0. On a multi-GPU system, select a valid GPU as listed by nvidia-smi.

ii. Scaling is done with either scale_npp or scale_cuda, as both run entirely on the GPU. The availability of these filters depends on how FFmpeg was configured, as explained below. For their usage, see ffmpeg -h filter=scale_npp and ffmpeg -h filter=scale_cuda respectively.

iii. Hardware-accelerated decoding support for specific codecs is platform- and driver-dependent. On Linux, it can be verified with the nvdecinfo program from Philip Langdale's nv-video-info project, as documented in this answer. Not all video formats are supported by NVDEC, so commands using software-based decoding are also provided below as a fallback.

iv. For rate control, I've selected variable bitrate mode (set via the private codec option -rc:v vbr) with a constant quality target of 21 (set via -cq:v 21). Valid -cq values for NVENC range from 0 to 51, where 0 means automatic. The bitrate is explicitly unset via -b:v 0 so that, in this rate control mode, NVENC targets the quality level rather than an average bitrate. However, the maximum bitrate and buffer size (-maxrate:v and -bufsize:v) remain set to match your libx264 settings. See this answer on why these parameters were selected: https://superuser.com/a/1630606/473795.

v. Depending on the specific FFmpeg build and the version of the ffnvcodec header package in use (i.e. a release version or git master), the following parameters need to change:

(i). For older release builds of FFmpeg configured with older release versions of the ffnvcodec headers, the only change needed is the rate control option, which can be set to high-quality variable bitrate mode via -rc:v vbr_hq (the older-build commands below use plain -rc:v vbr, which these builds also accept). The preset names remain unchanged.

(ii). Current FFmpeg builds from git, paired with the current git version of the ffnvcodec headers, use variable bitrate rate control (-rc:v vbr) together with a named preset (-preset:v p1 through p7, where p1 is the fastest and p7 the slowest) and a valid tuning option (-tune:v), which accepts the following values:

hq              1            E..V...... High quality
ll              2            E..V...... Low latency
ull             3            E..V...... Ultra low latency
lossless        4            E..V...... Lossless

You can specify the tuning value either by its full name (hq through lossless) or by the number identifying it (1 through 4), as shown above.
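The checks behind assumptions i to iii can be scripted before running anything. The following is a minimal bash sketch, not part of the original answer: the function names (valid_gpu_index, has_filter, has_hwaccel, has_encoder) and the FFMPEG/NVIDIA_SMI override variables are hypothetical, and it assumes that nvidia-smi -L prints one "GPU <n>: ..." line per device and that ffmpeg -hwaccels lists NVDEC under the device name cuda.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for assumptions i-iii. FFMPEG and NVIDIA_SMI
# may name alternative binaries (or stubs); they default to the ones on PATH.

# i. Succeed if $1 is a GPU index reported by `nvidia-smi -L`
#    (assumed format: one "GPU <n>: <name> (UUID: ...)" line per device).
valid_gpu_index() {
  local idx="$1" count
  [[ "$idx" =~ ^[0-9]+$ ]] || return 1
  count=$("${NVIDIA_SMI:-nvidia-smi}" -L 2>/dev/null | grep -c '^GPU ')
  (( 10#$idx < count ))
}

# ii. Succeed if this FFmpeg build lists filter $1 (e.g. scale_npp, scale_cuda).
#     `ffmpeg -filters` rows look like " T.C scale_cuda  V->V  ...";
#     the filter name is the second whitespace-separated column.
has_filter() {
  "${FFMPEG:-ffmpeg}" -hide_banner -filters 2>/dev/null \
    | awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# iii. Succeed if the build offers hwaccel $1 (NVDEC is assumed to be "cuda").
#      `ffmpeg -hwaccels` prints a header line, then one method per line.
has_hwaccel() {
  "${FFMPEG:-ffmpeg}" -hide_banner -hwaccels 2>/dev/null | grep -qxF "$1"
}

# Succeed if the build includes encoder $1 (e.g. h264_nvenc).
has_encoder() {
  "${FFMPEG:-ffmpeg}" -hide_banner -encoders 2>/dev/null \
    | awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}
```

For example: valid_gpu_index 0 && has_hwaccel cuda && has_filter scale_cuda && has_encoder h264_nvenc && echo ready. These checks only confirm what the build and driver advertise; they do not replace nvdecinfo's per-codec information.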

For reference, here is what the h264_nvenc encoder's help option prints on:

  1. Current FFmpeg builds:
ffmpeg -h encoder=h264_nvenc

Encoder h264_nvenc [NVIDIA NVENC H.264 encoder]:
    General capabilities: delay hardware 
    Threading capabilities: none
    Supported hardware devices: cuda cuda 
    Supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 rgb0 cuda
h264_nvenc AVOptions:
  -preset            <int>        E..V...... Set the encoding preset (from 0 to 18) (default p4)
     default         0            E..V...... 
     slow            1            E..V...... hq 2 passes
     medium          2            E..V...... hq 1 pass
     fast            3            E..V...... hp 1 pass
     hp              4            E..V...... 
     hq              5            E..V...... 
     bd              6            E..V...... 
     ll              7            E..V...... low latency
     llhq            8            E..V...... low latency hq
     llhp            9            E..V...... low latency hp
     lossless        10           E..V...... 
     losslesshp      11           E..V...... 
     p1              12           E..V...... fastest (lowest quality)
     p2              13           E..V...... faster (lower quality)
     p3              14           E..V...... fast (low quality)
     p4              15           E..V...... medium (default)
     p5              16           E..V...... slow (good quality)
     p6              17           E..V...... slower (better quality)
     p7              18           E..V...... slowest (best quality)
  -tune              <int>        E..V...... Set the encoding tuning info (from 1 to 4) (default hq)
     hq              1            E..V...... High quality
     ll              2            E..V...... Low latency
     ull             3            E..V...... Ultra low latency
     lossless        4            E..V...... Lossless
  -profile           <int>        E..V...... Set the encoding profile (from 0 to 3) (default main)
     baseline        0            E..V...... 
     main            1            E..V...... 
     high            2            E..V...... 
     high444p        3            E..V...... 
  -level             <int>        E..V...... Set the encoding level restriction (from 0 to 62) (default auto)
     auto            0            E..V...... 
     1               10           E..V...... 
     1.0             10           E..V...... 
     1b              9            E..V...... 
     1.0b            9            E..V...... 
     1.1             11           E..V...... 
     1.2             12           E..V...... 
     1.3             13           E..V...... 
     2               20           E..V...... 
     2.0             20           E..V...... 
     2.1             21           E..V...... 
     2.2             22           E..V...... 
     3               30           E..V...... 
     3.0             30           E..V...... 
     3.1             31           E..V...... 
     3.2             32           E..V...... 
     4               40           E..V...... 
     4.0             40           E..V...... 
     4.1             41           E..V...... 
     4.2             42           E..V...... 
     5               50           E..V...... 
     5.0             50           E..V...... 
     5.1             51           E..V...... 
     5.2             52           E..V...... 
     6.0             60           E..V...... 
     6.1             61           E..V...... 
     6.2             62           E..V...... 
  -rc                <int>        E..V...... Override the preset rate-control (from -1 to INT_MAX) (default -1)
     constqp         0            E..V...... Constant QP mode
     vbr             1            E..V...... Variable bitrate mode
     cbr             2            E..V...... Constant bitrate mode
     vbr_minqp       8388612      E..V...... Variable bitrate mode with MinQP (deprecated)
     ll_2pass_quality 8388616      E..V...... Multi-pass optimized for image quality (deprecated)
     ll_2pass_size   8388624      E..V...... Multi-pass optimized for constant frame size (deprecated)
     vbr_2pass       8388640      E..V...... Multi-pass variable bitrate mode (deprecated)
     cbr_ld_hq       8388616      E..V...... Constant bitrate low delay high quality mode
     cbr_hq          8388624      E..V...... Constant bitrate high quality mode
     vbr_hq          8388640      E..V...... Variable bitrate high quality mode
  -rc-lookahead      <int>        E..V...... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
  -surfaces          <int>        E..V...... Number of concurrent surfaces (from 0 to 64) (default 0)
  -cbr               <boolean>    E..V...... Use cbr encoding mode (default false)
  -2pass             <boolean>    E..V...... Use 2pass encoding mode (default auto)
  -gpu               <int>        E..V...... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
     any             -1           E..V...... Pick the first device available
     list            -2           E..V...... List the available devices
  -delay             <int>        E..V...... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
  -no-scenecut       <boolean>    E..V...... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
  -forced-idr        <boolean>    E..V...... If forcing keyframes, force them as IDR frames. (default false)
  -b_adapt           <boolean>    E..V...... When lookahead is enabled, set this to 0 to disable adaptive B-frame decision (default true)
  -spatial-aq        <boolean>    E..V...... set to 1 to enable Spatial AQ (default false)
  -spatial_aq        <boolean>    E..V...... set to 1 to enable Spatial AQ (default false)
  -temporal-aq       <boolean>    E..V...... set to 1 to enable Temporal AQ (default false)
  -temporal_aq       <boolean>    E..V...... set to 1 to enable Temporal AQ (default false)
  -zerolatency       <boolean>    E..V...... Set 1 to indicate zero latency operation (no reordering delay) (default false)
  -nonref_p          <boolean>    E..V...... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
  -strict_gop        <boolean>    E..V...... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
  -aq-strength       <int>        E..V...... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
  -cq                <float>      E..V...... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
  -aud               <boolean>    E..V...... Use access unit delimiters (default false)
  -bluray-compat     <boolean>    E..V...... Bluray compatibility workarounds (default false)
  -init_qpP          <int>        E..V...... Initial QP value for P frame (from -1 to 51) (default -1)
  -init_qpB          <int>        E..V...... Initial QP value for B frame (from -1 to 51) (default -1)
  -init_qpI          <int>        E..V...... Initial QP value for I frame (from -1 to 51) (default -1)
  -qp                <int>        E..V...... Constant quantization parameter rate control method (from -1 to 51) (default -1)
  -weighted_pred     <int>        E..V...... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
  -coder             <int>        E..V...... Coder type (from -1 to 2) (default default)
     default         -1           E..V...... 
     auto            0            E..V...... 
     cabac           1            E..V...... 
     cavlc           2            E..V...... 
     ac              1            E..V...... 
     vlc             2            E..V...... 
  -b_ref_mode        <int>        E..V...... Use B frames as references (from 0 to 2) (default disabled)
     disabled        0            E..V...... B frames will not be used for reference
     each            1            E..V...... Each B frame will be used for reference
     middle          2            E..V...... Only (number of B frames)/2 will be used for reference
  -a53cc             <boolean>    E..V...... Use A53 Closed Captions (if available) (default true)
  -dpb_size          <int>        E..V...... Specifies the DPB size used for encoding (0 means automatic) (from 0 to INT_MAX) (default 0)
  -multipass         <int>        E..V...... Set the multipass encoding (from 0 to 2) (default disabled)
     disabled        0            E..V...... Single Pass
     qres            1            E..V...... Two Pass encoding is enabled where first Pass is quarter resolution
     fullres         2            E..V...... Two Pass encoding is enabled where first Pass is full resolution
  -ldkfs             <int>        E..V...... Low delay key frame scale; Specifies the Scene Change frame size increase allowed in case of single frame VBV and CBR (from 0 to 255) (default 0)

  2. Older FFmpeg builds:
ffmpeg -h encoder=h264_nvenc

Encoder h264_nvenc [NVIDIA NVENC H.264 encoder]:
    General capabilities: delay 
    Threading capabilities: none
    Supported pixel formats: yuv420p nv12 p010le yuv444p yuv444p16le bgr0 rgb0 cuda
h264_nvenc AVOptions:
  -preset            <int>        E..V.... Set the encoding preset (from 0 to 11) (default medium)
     default                      E..V.... 
     slow                         E..V.... hq 2 passes
     medium                       E..V.... hq 1 pass
     fast                         E..V.... hp 1 pass
     hp                           E..V.... 
     hq                           E..V.... 
     bd                           E..V.... 
     ll                           E..V.... low latency
     llhq                         E..V.... low latency hq
     llhp                         E..V.... low latency hp
     lossless                     E..V.... 
     losslesshp                   E..V.... 
  -profile           <int>        E..V.... Set the encoding profile (from 0 to 3) (default main)
     baseline                     E..V.... 
     main                         E..V.... 
     high                         E..V.... 
     high444p                     E..V.... 
  -level             <int>        E..V.... Set the encoding level restriction (from 0 to 51) (default auto)
     auto                         E..V.... 
     1                            E..V.... 
     1.0                          E..V.... 
     1b                           E..V.... 
     1.0b                         E..V.... 
     1.1                          E..V.... 
     1.2                          E..V.... 
     1.3                          E..V.... 
     2                            E..V.... 
     2.0                          E..V.... 
     2.1                          E..V.... 
     2.2                          E..V.... 
     3                            E..V.... 
     3.0                          E..V.... 
     3.1                          E..V.... 
     3.2                          E..V.... 
     4                            E..V.... 
     4.0                          E..V.... 
     4.1                          E..V.... 
     4.2                          E..V.... 
     5                            E..V.... 
     5.0                          E..V.... 
     5.1                          E..V.... 
  -rc                <int>        E..V.... Override the preset rate-control (from -1 to INT_MAX) (default -1)
     constqp                      E..V.... Constant QP mode
     vbr                          E..V.... Variable bitrate mode
     cbr                          E..V.... Constant bitrate mode
     vbr_minqp                    E..V.... Variable bitrate mode with MinQP (deprecated)
     ll_2pass_quality              E..V.... Multi-pass optimized for image quality (deprecated)
     ll_2pass_size                E..V.... Multi-pass optimized for constant frame size (deprecated)
     vbr_2pass                    E..V.... Multi-pass variable bitrate mode (deprecated)
     cbr_ld_hq                    E..V.... Constant bitrate low delay high quality mode
     cbr_hq                       E..V.... Constant bitrate high quality mode
     vbr_hq                       E..V.... Variable bitrate high quality mode
  -rc-lookahead      <int>        E..V.... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
  -surfaces          <int>        E..V.... Number of concurrent surfaces (from 0 to 64) (default 0)
  -cbr               <boolean>    E..V.... Use cbr encoding mode (default false)
  -2pass             <boolean>    E..V.... Use 2pass encoding mode (default auto)
  -gpu               <int>        E..V.... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
     any                          E..V.... Pick the first device available
     list                         E..V.... List the available devices
  -delay             <int>        E..V.... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
  -no-scenecut       <boolean>    E..V.... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
  -forced-idr        <boolean>    E..V.... If forcing keyframes, force them as IDR frames. (default false)
  -b_adapt           <boolean>    E..V.... When lookahead is enabled, set this to 0 to disable adaptive B-frame decision (default true)
  -spatial-aq        <boolean>    E..V.... set to 1 to enable Spatial AQ (default false)
  -temporal-aq       <boolean>    E..V.... set to 1 to enable Temporal AQ (default false)
  -zerolatency       <boolean>    E..V.... Set 1 to indicate zero latency operation (no reordering delay) (default false)
  -nonref_p          <boolean>    E..V.... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
  -strict_gop        <boolean>    E..V.... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
  -aq-strength       <int>        E..V.... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
  -cq                <float>      E..V.... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
  -aud               <boolean>    E..V.... Use access unit delimiters (default false)
  -bluray-compat     <boolean>    E..V.... Bluray compatibility workarounds (default false)
  -init_qpP          <int>        E..V.... Initial QP value for P frame (from -1 to 51) (default -1)
  -init_qpB          <int>        E..V.... Initial QP value for B frame (from -1 to 51) (default -1)
  -init_qpI          <int>        E..V.... Initial QP value for I frame (from -1 to 51) (default -1)
  -qp                <int>        E..V.... Constant quantization parameter rate control method (from -1 to 51) (default -1)
  -weighted_pred     <int>        E..V.... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
  -coder             <int>        E..V.... Coder type (from -1 to 2) (default default)
     default                      E..V.... 
     auto                         E..V.... 
     cabac                        E..V.... 
     cavlc                        E..V.... 
     ac                           E..V.... 
     vlc                          E..V.... 
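Comparing the two help outputs, the p1 to p7 presets and the -tune option only appear in current builds. The following bash sketch (a hypothetical helper, not part of FFmpeg; nvenc_rc_args and FFMPEG are illustrative names) uses that difference to print the rate-control options from item iv together with the matching preset style from item v. It keeps -rc:v vbr on older builds to match the older-build commands; -rc:v vbr_hq from item v.(i) would be the alternative there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the h264_nvenc rate-control/preset options used in
# this answer, picking the preset style from `ffmpeg -h encoder=h264_nvenc`.
# FFMPEG may name another ffmpeg binary (or a stub); defaults to ffmpeg on PATH.
# Usage: nvenc_rc_args CQ MAXRATE BUFSIZE
nvenc_rc_args() {
  local cq="$1" maxrate="$2" bufsize="$3" help
  local -a args
  # -cq accepts 0..51 (0 = automatic); this sketch only accepts integers.
  if ! [[ "$cq" =~ ^[0-9]+$ ]] || (( 10#$cq > 51 )); then
    echo "cq must be an integer from 0 to 51" >&2
    return 1
  fi
  help=$("${FFMPEG:-ffmpeg}" -hide_banner -h encoder=h264_nvenc 2>/dev/null)
  # Current builds list the p1..p7 presets and a -tune option; older ones do not.
  if grep -qE '^[[:space:]]+p7[[:space:]]' <<<"$help" &&
     grep -qE '^[[:space:]]+-tune[[:space:]]' <<<"$help"; then
    args=(-rc:v vbr -tune:v ll -preset:v p1)
  else
    # Older build: legacy preset names (vbr_hq is the higher-quality option).
    args=(-rc:v vbr -preset:v fast)
  fi
  # -b:v 0 leaves the average bitrate unset so -cq drives quality, while
  # -maxrate/-bufsize still cap peaks as in the libx264 settings.
  args+=(-cq:v "$cq" -b:v 0 -maxrate:v "$maxrate" -bufsize:v "$bufsize")
  printf '%s\n' "${args[*]}"
}
```

The output is meant to be spliced unquoted into a command line, e.g. ffmpeg ... -c:v h264_nvenc -gpu:v 0 $(nvenc_rc_args 21 5000K 5000K) ..., which works here because none of the values contain spaces.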

And now to the parameters in use:

1. Using FFmpeg with full hardware-accelerated decoding via NVDEC:

(a). Scaling done with the scale_npp filter, available when FFmpeg is built against the proprietary CUDA SDK and its NPP libraries (when the flags --enable-nonfree --enable-cuda-nvcc --enable-libnpp --nvccflags="-gencode arch=compute_52,code=sm_52 -O2" are passed to ./configure at build time):

Older builds:

ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_npp=w=1920:h=1080:interp_algo=lanczos' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4

Current builds:

ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_npp=w=1920:h=1080:interp_algo=lanczos' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4

(b). Scaling done with the scale_cuda filter, available when FFmpeg is built with clang (LLVM) compiling the CUDA kernels in place of nvcc (when the flags --enable-cuda-llvm --nvccflags="--cuda-gpu-arch=sm_52 -O2" are passed to ./configure at build time). A build configured with --enable-cuda-nvcc, as in (a), also provides scale_cuda:

Older builds:

ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_cuda=w=1920:h=1080' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4

Current builds:

ffmpeg -threads 1 -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda -i input.avi \
-vf 'scale_cuda=w=1920:h=1080' -c:v h264_nvenc \
-gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 \
-b:v 0 -maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
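To reproduce the two build variants, the configure flags quoted in (a) and (b) can be kept in a small bash helper. This is a sketch under assumptions: cuda_configure_flags is a hypothetical name, --enable-libnpp is included because the scale_npp filter depends on libnpp, and depending on where CUDA and the ffnvcodec headers are installed you may still need --extra-cflags/--extra-ldflags pointing at them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ./configure flags for the two CUDA build
# variants in (a) and (b), one flag per line so embedded spaces survive.
# Usage: cuda_configure_flags npp|llvm
cuda_configure_flags() {
  local -a flags
  case "$1" in
    npp)
      # (a) scale_npp: nvcc from the CUDA SDK, plus NVIDIA's libnpp.
      flags=(--enable-nonfree --enable-cuda-nvcc --enable-libnpp
             '--nvccflags=-gencode arch=compute_52,code=sm_52 -O2') ;;
    llvm)
      # (b) scale_cuda: CUDA kernels compiled by clang/LLVM instead of nvcc.
      flags=(--enable-cuda-llvm '--nvccflags=--cuda-gpu-arch=sm_52 -O2') ;;
    *)
      echo "usage: cuda_configure_flags npp|llvm" >&2
      return 1 ;;
  esac
  printf '%s\n' "${flags[@]}"
}
```

Printing one flag per line lets mapfile keep the space-containing --nvccflags value intact: mapfile -t flags < <(cuda_configure_flags npp) && ./configure "${flags[@]}".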

2. Using software-based decode fallback:

Note that here the GPU is selected via the filter hwupload_cuda=0, which initializes a CUDA hardware device context bound to GPU 0 for all scaling operations; the h264_nvenc encoder is pointed at the same GPU via its private option -gpu:v 0.

(a). Using the scale_npp filter:

Older builds:

ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_npp=w=1920:h=1080:interp_algo=lanczos' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4

Current builds:

ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_npp=w=1920:h=1080:interp_algo=lanczos' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4

(b). Using the scale_cuda filter:

Older builds:

ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_cuda=w=1920:h=1080' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -preset:v fast -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4

Current builds:

ffmpeg -threads 2 -i input.avi -vf 'hwupload_cuda=0,scale_cuda=w=1920:h=1080' \
-c:v h264_nvenc -gpu:v 0 -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1 -b:v 0 \
-maxrate:v 5000K -bufsize:v 5000K -c:a aac -b:a 160k -movflags +faststart -f mp4 output.mp4
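Sections 1 and 2 can also be combined so the software-decode commands only run when the full hardware path fails. A minimal bash sketch, assuming a current build (so -tune:v ll and -preset:v p1 are accepted); transcode_nvenc and FFMPEG are hypothetical names, -y is added (it is not in the original commands) so the retry can overwrite a partial output file, and failure is detected solely from FFmpeg's exit status.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper combining sections 1 and 2 ("current builds" variants):
# try full NVDEC decoding first; if FFmpeg fails, retry with software decoding
# plus hwupload_cuda. FFMPEG may name another ffmpeg binary (or a stub).
# Usage: transcode_nvenc INPUT OUTPUT [GPU] [SCALER]   SCALER: scale_cuda|scale_npp
transcode_nvenc() {
  local input="$1" output="$2" gpu="${3:-0}" scaler="${4:-scale_cuda}"
  local ff="${FFMPEG:-ffmpeg}" scale
  case "$scaler" in
    scale_npp)  scale="scale_npp=w=1920:h=1080:interp_algo=lanczos" ;;
    scale_cuda) scale="scale_cuda=w=1920:h=1080" ;;
    *) echo "unknown scaler: $scaler" >&2; return 2 ;;
  esac
  # Encoder, audio and muxer options shared by both attempts.
  local -a enc=(-c:v h264_nvenc -gpu:v "$gpu" -cq:v 21 -rc:v vbr -tune:v ll -preset:v p1
                -b:v 0 -maxrate:v 5000K -bufsize:v 5000K
                -c:a aac -b:a 160k -movflags +faststart -f mp4 "$output")
  # 1) Full hardware path: NVDEC decode -> GPU scaler -> NVENC.
  "$ff" -y -threads 1 -hwaccel nvdec -hwaccel_device "$gpu" \
      -hwaccel_output_format cuda -i "$input" -vf "$scale" "${enc[@]}" && return 0
  echo "hardware decode path failed; retrying with software decoding" >&2
  # 2) Software decode, then upload frames to the same GPU for scaling/encoding.
  "$ff" -y -threads 2 -i "$input" -vf "hwupload_cuda=$gpu,$scale" "${enc[@]}"
}
```

For example: transcode_nvenc input.avi output.mp4 0 scale_npp.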

Kindly test and report back with your findings.