Checking for GPU and H264 hardware decoding support
I want to run a program on a backend server that decodes video, applies filters to it using OpenGL, and then encodes the video to a new file using H.264. To make sure the process is as fast as possible, I want to verify that I have access to the GPU and to hardware decoding/encoding capabilities, even if the program runs in a Docker container.
Is there some way to verify that I have access to the actual hardware, instead of software emulation?
Solution 1:
Judging by the tag applied to your question and the mention of containerization through Docker, I'm assuming that you intend to run this on Ubuntu.
Since your question doesn't mention specific hardware, I'll cover how to check for hardware-accelerated encode and decode capabilities on common GPUs via the platform-specific acceleration APIs: NVIDIA's NVENC and NVDEC (for encoding and decoding respectively), AMD's AMF, and Intel's QuickSync and VAAPI implementations.
1. On NVIDIA's NVENC and NVDEC:
Compile and run Philip Langdale's nv-video-info project, which contains the nvencinfo and nvdecinfo programs. For usage, see this blog post by Jaroslav Svoboda for a primer. The project depends on the nv-codec-headers package, which you can fetch and build from here.
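A rough build sketch follows (assumptions: the upstream repository URLs below and a plain make-based build; check each project's README for the authoritative steps):

# FFmpeg's nv-codec-headers (build dependency)
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers && make && sudo make install && cd ..

# nv-video-info, which provides nvencinfo and nvdecinfo
git clone https://github.com/philipl/nv-video-info.git
cd nv-video-info && make    # build steps may differ; see the README
./nvencinfo
./nvdecinfo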
The nvencinfo program will show you the NVENC encode capabilities of your NVIDIA GPU; the information reported will depend on the GPU's architecture and the presence of a supported GPU driver. Not all NVENC capabilities may be shown if you're using an older device driver.
See an example below from a system running an RTX 2080 (the laptop I'm typing this on):
Loaded Nvenc version 9.1
Nvenc initialized successfully
Device 0: GeForce RTX 2080
==========================================================
Codec | H264 | HEVC |
==========================================================
Input Buffer Formats | | |
----------------------------------------------------------
NV12 | x | x |
YV12 | x | x |
IYUV | x | x |
YUV444 | x | x |
P010 | . | x |
YUV444P10 | . | x |
ARGB | x | x |
ARGB10 | x | x |
AYUV | x | x |
ABGR | x | x |
ABGR10 | x | x |
----------------------------------------------------------
Limits | | |
----------------------------------------------------------
Maximum Width | 4096 | 8192 |
Maximum Hight | 4096 | 8192 |
Maximum Macroblocks/frame | 65536 | 262144 |
Maximum Macroblocks/second | 983040 | 983040 |
Max Encoding Level | 51 | 62 |
Min Encoding Level | 1 | 1 |
Max No. of B-Frames | 4 | 5 |
Maxmimum LT Reference Frames | 8 | 7 |
----------------------------------------------------------
Capabilities | | |
----------------------------------------------------------
Supported Rate-Control Modes | 63 | 63 |
Supports Field-Encoding | 0 | 0 |
Supports Monochrome | 0 | 0 |
Supports FMO | 0 | 0 |
Supports QPEL Motion Estimation | 1 | 1 |
Supports BDirect Mode | 1 | 0 |
Supports CABAC | 1 | 1 |
Supports Adaptive Transform | 1 | 0 |
Supports Temporal Layers | 0 | 0 |
Supports Hierarchical P-Frames | 0 | 0 |
Supports Hierarchical B-Frames | 0 | 0 |
Supports Separate Colour Planes | 1 | 0 |
Supports Temporal SVC | 0 | 0 |
Supports Dynamic Resolution Change | 1 | 1 |
Supports Dynamic Bitrate Change | 1 | 1 |
Supports Dynamic Force Const-QP | 1 | 1 |
Supports Dynamic RC-Mode Change | 0 | 0 |
Supports Sub-Frame Read-back | 1 | 1 |
Supports Constrained Encoding | 1 | 0 |
Supports Intra Refresh | 1 | 1 |
Supports Custom VBV Buffer Size | 1 | 1 |
Supports Dynamic Slice Mode | 1 | 1 |
Supports Ref Pic Invalidation | 1 | 1 |
Supports PreProcessing | 0 | 0 |
Supports Async Encoding | 0 | 0 |
Supports YUV444 Encoding | 1 | 1 |
Supports Lossless Encoding | 1 | 1 |
Supports SAO | 0 | 1 |
Supports ME-Only Mode | 1 | 1 |
Supports Lookahead Encoding | 1 | 1 |
Supports Temporal AQ | 1 | 1 |
Supports 10-bit Encoding | 0 | 1 |
Supports Weighted Prediction | 1 | 1 |
Supports B-Frames as References | 2 | 3 |
Supports Emphasis Level Map | 1 | 0 |
----------------------------------------------------------
Profiles | | |
----------------------------------------------------------
| Baseline | Auto |
| Main | Main |
| High | Main10 |
| MVC | Main444 |
| High444 | |
| Auto | |
----------------------------------------------------------
Presets | | |
----------------------------------------------------------
| default | default |
| ll | ll |
| hp | hp |
| hq | hq |
| bluray | bluray |
| llhq | llhq |
| llhp | llhp |
| Unknown | Unknown |
| lossless | lossless |
==========================================================
Likewise, nvdecinfo will show you the decode capabilities available on an NVDEC-capable GPU. The same caveat on available features mentioned above applies here.
Here's sample output from nvdecinfo on an RTX 2080 GPU:
Device 0: GeForce RTX 2080
-----------------------------------------------
Codec | Chroma | Depth | Max Width | Max Height
-----------------------------------------------
MPEG1 | 420 | 8 | 4080 | 4080
MPEG2 | 420 | 8 | 4080 | 4080
MPEG4 | 420 | 8 | 2032 | 2032
VC1 | 420 | 8 | 2032 | 2032
H264 | 420 | 8 | 4096 | 4096
MJPEG | 400 | 8 | 32768 | 16384
MJPEG | 420 | 8 | 32768 | 16384
MJPEG | 422 | 8 | 32768 | 16384
MJPEG | 444 | 8 | 32768 | 16384
HEVC | 420 | 8 | 8192 | 8192
HEVC | 420 | 10 | 8192 | 8192
HEVC | 420 | 12 | 8192 | 8192
HEVC | 444 | 8 | 8192 | 8192
HEVC | 444 | 10 | 8192 | 8192
HEVC | 444 | 12 | 8192 | 8192
VP8 | 420 | 8 | 4096 | 4096
VP9 | 420 | 8 | 8192 | 8192
VP9 | 420 | 10 | 8192 | 8192
VP9 | 420 | 12 | 8192 | 8192
-----------------------------------------------
2. On Intel's VAAPI and QuickSync feature support:
VAAPI's capabilities are trivially queried via the vainfo utility, included as part of libva-utils. Codec support is typically constrained by the GPU's generation, with only Ice Lake (ICL) and newer exposing the full feature set for advanced codecs such as HEVC and VP9 encoding at both standard and high bit depths.
Here's a sample output of vainfo on an older Skylake testbed:
libva info: VA-API version 0.40.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/local/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_40
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.40 (libva 1.7.3)
vainfo: Driver version: Intel i965 driver for Intel(R) Skylake - 1.8.4.pre1 (glk-alpha-71-gc3110dc)
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Simple : VAEntrypointEncSlice
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointEncSlice
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointEncSliceLP
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointEncSliceLP
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointEncSlice
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointEncSlice
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileNone : VAEntrypointVideoProc
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileJPEGBaseline : VAEntrypointEncPicture
VAProfileVP8Version0_3 : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileVP9Profile0 : VAEntrypointVLD
To translate: encode capabilities are tied to the EncSlice entry points, whereas decode capabilities are tied to the VLD entry points. Low-power encoding support is exposed via the EncSliceLP entry points.
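As a quick, scriptable check (a minimal sketch; the package name varies by distribution, and the grep pattern simply matches the H.264 entry points shown above):

sudo apt-get install -y vainfo    # or libva-utils, depending on the distro
# Any EncSlice hit means hardware H.264 encode; VLD means hardware decode
vainfo 2>/dev/null | grep -E 'H264.*(EncSlice|VLD)'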
The same information can also be mapped to QuickSync's capabilities, but with some caveats: hardware older than Ice Lake (ICL) does not expose all encoder capabilities, such as (official) VP9 encoding support. To confirm feature support, see the documentation provided with the media-driver package.
Note that QuickSync support requires both the libmfx runtime and the media-driver package to be installed.
For Intel hardware, both the i965 and iHD (from Intel's media-driver package) drivers can be used as the VAAPI driver. The driver to use can be selected via the LIBVA_DRIVER_NAME environment variable, either globally or at application launch time.
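For example (a minimal sketch; the iHD driver must actually be installed for this to work):

export LIBVA_DRIVER_NAME=iHD   # or i965
vainfo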
FFmpeg can also select the VAAPI driver via the -init_hw_device directive. For example, to use the iHD driver with FFmpeg, pass -init_hw_device vaapi=va:/dev/dri/renderD128,driver=iHD at hardware device initialization. This creates a VAAPI device named va, bound to the DRM render node /dev/dri/renderD128, which can then be referenced further down the line via -filter_hw_device va. More documentation is available on the FFmpeg wiki.
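As an end-to-end sanity check, the following is a sketch of a full hardware decode-plus-encode pass (input.mp4 and output.mp4 are placeholders, and the render node path may differ on your system); if it completes without falling back to software decoding, the VAAPI path is working:

ffmpeg -v verbose \
  -init_hw_device vaapi=va:/dev/dri/renderD128,driver=iHD \
  -hwaccel vaapi -hwaccel_device va -hwaccel_output_format vaapi \
  -i input.mp4 \
  -c:v h264_vaapi output.mp4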
The hwmap filter in ffmpeg can also be used to derive a QuickSync device from a VAAPI context, as documented on the wiki.
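A sketch of that pattern, based on the wiki's examples (filenames are placeholders): decode via the VAAPI hwaccel, then map the frames to a derived QuickSync device before encoding with h264_qsv:

ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 \
  -hwaccel vaapi -hwaccel_device va -hwaccel_output_format vaapi \
  -i input.mp4 \
  -vf 'hwmap=derive_device=qsv,format=qsv' \
  -c:v h264_qsv output.mp4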
For AMD VCE (and VCN) capable hardware, VAAPI is available only when using the standard Mesa driver(s), i.e. radeon(si) and the amdgpu open-source driver. The AMDGPU-PRO proprietary driver packages only expose support for AMF (via a Vulkan interop).
3. AMD's case: VAAPI and AMF:
Any VCE (and VCN) capable AMD GPU will expose VAAPI support provided that the Mesa drivers are in use, as explained above. Likewise, feature support can be printed out via libva-utils' vainfo.
AMF explicitly requires the amdgpu-pro driver to be installed and loaded, as it uses a Vulkan interop provided only by the proprietary driver.
For feature support on AMD AMF, see this wiki for more information. Unlike NVIDIA and Intel, AMD's VCE (and VCN) implementations do not always improve from one hardware generation to the next.
Case in point: On Polaris, the addition of HEVC encoding support crippled the H.264/AVC encoder by stripping support for B-frames. And to date, weighted prediction is not implemented on their encoders.
Notes on Docker:
NVIDIA hardware is supported directly on the current release of docker-ce via the nvidia-container-toolkit package. A prior (and now deprecated) nvidia-docker2 implementation also exists. See this project for more details.
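A quick way to confirm that the GPU is visible inside a container (a sketch; the CUDA image tag is only an example and requires nvidia-container-toolkit on the host):

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi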
For VAAPI (and potentially QuickSync support), you'll need to pass the DRI device nodes in a privileged manner, as shown in this example.
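A minimal sketch of the DRI pass-through (my-transcode-image is a hypothetical image with vainfo/ffmpeg installed; some setups also need matching group permissions on the render node):

docker run --rm --device /dev/dri:/dev/dri my-transcode-image vainfo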