Understanding pixel format and profile when encoding 10-bit video in ffmpeg with nvenc

...even though it supports main10, no 10-bit pixel format is supported:

The hardware HEVC encoder accepts the pixel formats p010le and p016le for 10-bit output. Both are yuv 4:2:0 formats (p016le carries 16-bit samples that are reduced to 10 bits); for yuv 4:4:4 the ffmpeg encoder takes yuv444p16le instead.
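To see which pixel formats your own ffmpeg build accepts for the encoder, you can look at the "Supported pixel formats" line of the encoder help. A minimal sketch: has_pix_fmt is a made-up helper name, not part of ffmpeg, and the list you get depends on your ffmpeg version and build.

```shell
# has_pix_fmt FORMAT: reads the output of `ffmpeg -h encoder=...` on stdin
# and succeeds if FORMAT appears on the "Supported pixel formats" line.
# (has_pix_fmt is a hypothetical helper name used only for illustration.)
has_pix_fmt() {
    grep 'Supported pixel formats:' | tr ' ' '\n' | grep -xF -- "$1" >/dev/null
}
```

Usage would be along the lines of: ffmpeg -hide_banner -h encoder=hevc_nvenc | has_pix_fmt p010le && echo "p010le is accepted"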

If I disable the fallback to p010le, the output would be Bit depth : 8 bits but Format profile : Main 10@L5@Main. What does it mean?

The profile and level state the minimum capabilities a device needs in order to play the video. For the encoder it works the other way round: they set the maximum values it may use when encoding the video.

This means that a video marked Main 10@L5@Main can be played on any TV that supports the 10-bit format and can decode at least 25 Mbps. It does not tell the encoder how to actually encode the video; it only says that the video may use at most 10 bits per sample and that the bitrate must not exceed 25 Mbps. So if the encoder produces an 8-bit video at 5 Mbps, it still fulfills these conditions and can mark the video as Main 10@L5@Main.
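The "still fits within the level" reasoning can be sketched as a small shell check. This is only an illustration: fits_main_tier is a hypothetical helper, it hard-codes just the two Main-tier limits quoted in this answer (25 Mbps for L5, 60 Mbps for L6), and it ignores the other things a level constrains, such as resolution and frame rate.

```shell
# fits_main_tier LEVEL KBPS: succeed if a peak bitrate of KBPS kbit/s is
# within the HEVC Main-tier limit for LEVEL.
# Hypothetical helper; only levels 5 and 6 are covered.
fits_main_tier() {
    case "$1" in
        5) max_kbps=25000 ;;
        6) max_kbps=60000 ;;
        *) echo "unknown level: $1" >&2; return 2 ;;
    esac
    [ "$2" -le "$max_kbps" ]
}

# A 5 Mbps stream is well inside the L5 limit, whatever its bit depth.
fits_main_tier 5 5000 && echo "5 Mbps fits L5"
# 30 Mbps exceeds the L5 Main-tier limit but fits L6.
fits_main_tier 5 30000 || echo "30 Mbps does not fit L5"
fits_main_tier 6 30000 && echo "30 Mbps fits L6"
```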

If you want to tell the encoder what color depth and bitrate it should use, you must specify it by other parameters (see below).


Here is a command I use to convert videos from AVC to 10-bit HEVC using the Pascal encoder (GTX 10x0 cards):

ffmpeg -y -hide_banner -hwaccel nvdec -hwaccel_device 0 -vsync 0 -i "input.mp4" -c copy -c:v:0 hevc_nvenc -profile:v main10 -pix_fmt p010le -rc:v:0 vbr_hq -rc-lookahead 32 -cq 21 -qmin 1 -qmax 51 -b:v:0 10M -maxrate:v:0 20M -gpu 0 "output.mkv"

A similar command that can be used on the newer Turing (RTX 20x0) and Ampere (RTX 30x0) encoders:

ffmpeg -y -hide_banner -vsync 0 -hwaccel cuda -hwaccel_output_format cuda  -hwaccel_device 0 -c:v:0 h264_cuvid -i "input.mp4" -vf "hwdownload,format=nv12" -c copy -c:v:0 hevc_nvenc -profile:v main10 -pix_fmt p010le -rc:v:0 vbr -tune hq -preset p5 -multipass 1 -bf 4 -b_ref_mode 1 -nonref_p 1 -rc-lookahead 75 -spatial-aq 1 -aq-strength 8 -temporal-aq 1 -cq 21 -qmin 1 -qmax 99 -b:v:0 10M -maxrate:v:0 20M -gpu 0 "output.mkv"

Params explanation:

  • -pix_fmt p010le converts 8-bit input to 10-bit. The conversion is done on the CPU, so it makes encoding slower, but it produces better-quality video and, in CRF mode, a lower bitrate (smaller file). With the CUDA decoder it must be used together with -vf "hwdownload,format=nv12" (or -vf "hwdownload,format=p010le" for 10-bit input video) to copy the decoded frames from CUDA to the CPU for conversion; the NVDEC decoder sends frames to the CPU automatically. Specifying -profile:v main10 is required to allow 10-bit encoding but does not actually affect how the encoder encodes the video: the encoder itself does not change the bit depth of the input!
  • -rc:v:0 vbr_hq -cq 21 -qmin 1 -qmax 99 is needed to fully enable CRF mode (NVENC calls it constant quality, hence -cq). Increase qmin to reduce bitrate peaks; lower qmax to prevent low-quality frames (recommended when encoding without AQ). On Turing and Ampere use -rc:v:0 vbr -tune hq instead of vbr_hq for the same result. By the way, the recommended quality for HEVC is -cq 28 (or -cq 30 with AQ enabled).
  • -b:v:0 10M -maxrate:v:0 20M specifies the recommended (target) and maximum bitrate supported by the target device. For Main tier at L5 the maximum is 25 Mbps; at L6 it is 60 Mbps (for 30 fps video). The hardware encoder also needs these values to calculate the QP value in CRF mode. I use 10M/20M for videos stored on a NAS and played on a TV over LAN.
  • -preset slow enables 2-pass processing and other advanced optimizations. Since the hardware encoder is faster than software encoders, you can use the slow preset and still get much faster processing than from the CPU on a faster preset. On Ampere you must use -preset p5 -multipass 2, which is equivalent to the slow preset. You can go up to p7, which corresponds to very slow, but in most cases it has almost no additional effect on file size; you can use -multipass 1 for a 4-times faster first pass.
  • -hwaccel enables the hardware decoder and -hwaccel_device specifies which device decodes the video (if you have SLI). Depending on your CPU speed, you can test which option is best for you. NVDEC can decode any MPEG video but is slower; for the faster CUDA decoder you must specify whether the source is AVC, HEVC or AV1. For DivX, Xvid and non-MPEG input, remove these options completely to switch to software decoding on the CPU.
  • -bf 4 -b_ref_mode 1 -nonref_p 1 enables improved B-frames processing on Turing and Ampere (note that it's not supported by h264_nvenc).
  • Alternatively, you can use -bf 0 -weighted_pred 1 to use weighted prediction instead of B-frames if your source has uneven lighting (flickering lights or lots of fade-ins and fade-outs); this gives better quality and a smaller file. For other sources with stable lighting, however, disabling B-frames increases file size.
  • -rc-lookahead 75 -spatial-aq 1 -aq-strength 8 -temporal-aq 1 enables adaptive quantization (AQ), supported on Turing and Ampere. This improves video quality at the same or a lower bitrate in CRF mode. Lower rc-lookahead for more speed or raise it for better quality. Increase aq-strength if you see artifacts in very dark colors.
  • -gpu 0 specifies which device encodes the video if you have SLI or an on-board (Intel/AMD) card.
  • In addition, with the CUDA decoder you can add -resize WIDTHxHEIGHT and/or -crop TOPxBOTTOMxLEFTxRIGHT (before the -i parameter) to scale or crop the input in the hardware decoder. This is faster than using -vf scale and -vf crop, which are done on the CPU.
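The per-generation differences in the list above can be collected in a small wrapper so that a script picks the right flags. A rough sketch only: nvenc_rc_flags and the generation names are hypothetical, the flag sets are simply the ones used in the two commands in this answer, and whether a given flag is accepted depends on your ffmpeg build and driver.

```shell
# nvenc_rc_flags GENERATION: print the rate-control flags used in this
# answer for the given GPU generation.
# (Hypothetical helper; the generation names are made up for illustration.)
nvenc_rc_flags() {
    case "$1" in
        pascal)
            echo "-rc:v:0 vbr_hq -rc-lookahead 32" ;;
        turing|ampere)
            echo "-rc:v:0 vbr -tune hq -preset p5 -multipass 1 -rc-lookahead 75 -spatial-aq 1 -aq-strength 8 -temporal-aq 1" ;;
        *)
            echo "unknown generation: $1" >&2
            return 1 ;;
    esac
}

# Print the flags that would be spliced into the ffmpeg command line.
nvenc_rc_flags pascal
nvenc_rc_flags ampere
```

In a script the result would be expanded unquoted so that it splits into separate arguments, for example: ffmpeg -i "input.mp4" -c:v hevc_nvenc -profile:v main10 -pix_fmt p010le $(nvenc_rc_flags ampere) -cq 21 "output.mkv"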

According to this reddit post by @Anton1699:

p010le is equivalent to yuv420p10le (it's 10-bit video with 4:2:0 subsampling = 15-bit per pixel).

I have yet to find a more authoritative source in documentation.
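The 15-bit figure in the quote follows from simple arithmetic: every pixel has its own luma sample, while each of the two chroma samples is shared by a 2x2 block of pixels. A small sketch of that calculation (bits_per_pixel is a made-up helper; it counts information bits, not memory size, since p010le stores each 10-bit sample in a 16-bit word):

```shell
# bits_per_pixel DEPTH HSUB VSUB: average bits per pixel for a YUV format
# with DEPTH bits per sample and chroma subsampled by HSUB x VSUB.
# (Hypothetical helper for illustration.)
bits_per_pixel() {
    # one luma sample + two chroma samples shared by HSUB*VSUB pixels
    echo $(( $1 + 2 * $1 / ($2 * $3) ))
}

bits_per_pixel 10 2 2   # 10-bit 4:2:0, prints 15
bits_per_pixel 8 2 2    # 8-bit 4:2:0, prints 12
bits_per_pixel 10 1 1   # 10-bit 4:4:4, prints 30
```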

p010le is supported by nvenc. The output log also indicates that this works. As a result, I put this up as a tentative answer. Sample command:

ffmpeg -i input.mkv -pix_fmt p010le -c:v hevc_nvenc -profile:v main10 -cq 21 out.mkv
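To confirm what the encoder actually wrote, rather than relying on the profile label alone, you can ask ffprobe for the stream's pixel format and profile. A minimal sketch; probe_video is a hypothetical wrapper name and out.mkv is the file from the command above:

```shell
# probe_video FILE: print profile, pixel format and level of the first
# video stream. (probe_video is a hypothetical wrapper around ffprobe.)
probe_video() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=profile,pix_fmt,level \
        -of default=noprint_wrappers=1 "$1"
}
```

Running probe_video out.mkv on a real 10-bit encode should report a 10-bit pixel format such as yuv420p10le. An 8-bit format such as yuv420p next to a Main 10 profile is the situation described in the question.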