Checking for GPU and H264 hardware decoding support

I want to run a program on a backend server that decodes video, applies filters to it using OpenGL, and then encodes the result to a new file using H.264. To make the process as fast as possible, I want to be sure that I have access to the GPU and to hardware decoding/encoding capabilities, even if the program runs in a Docker container.

Is there some way to verify that I have access to the actual hardware, instead of software emulation?


Solution 1:

Judging by the tag applied to your question and the mention of containerization through Docker, I'm assuming that you intend to run this on Ubuntu.

Your query omits details on specific hardware, so I'll cover how to check for hardware-accelerated encode and decode capabilities on common GPUs through their platform-specific acceleration APIs: NVIDIA's NVENC and NVDEC (for encoding and decoding respectively), AMD's AMF, and Intel's QuickSync and VAAPI implementations.
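
As a quick first check before the vendor-specific tools, you can ask FFmpeg which hardware H.264 encoders it was built with. The sketch below assumes ffmpeg is on the PATH and that `ffmpeg -encoders` prints one encoder per row as "flags name description"; the function names are my own. Keep in mind that an encoder appearing in this list only means it was compiled in, not that the hardware is actually reachable.

```python
import shutil
import subprocess

# Hardware H.264 encoder names as registered in FFmpeg.
HW_H264_ENCODERS = ("h264_nvenc", "h264_qsv", "h264_vaapi", "h264_amf")


def hw_encoders_in_listing(listing):
    """Return the hardware H.264 encoders named in `ffmpeg -encoders` output."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        # Encoder rows look like: "<flags> <name> <description...>"
        if len(fields) >= 2 and fields[1] in HW_H264_ENCODERS:
            found.append(fields[1])
    return found


def query_ffmpeg_encoders():
    """Run ffmpeg and return its encoder listing ('' if ffmpeg is absent)."""
    if shutil.which("ffmpeg") is None:
        return ""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Calling hw_encoders_in_listing(query_ffmpeg_encoders()) inside the container gives the list of candidates worth testing further with the tools below.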

1. On NVIDIA's NVENC and NVDEC:

Compile and run Philip Langdale's nv-video-info project, which contains the nvencinfo and nvdecinfo programs. For a usage primer, see this blog post by Jaroslav Svoboda. The project depends on the nv-codec-headers package, which you can fetch and build from here.

The nvencinfo program shows the NVENC encode capabilities of your NVIDIA GPU. What it reports depends on the GPU's architecture and on the installed driver: an older device driver may not expose every NVENC capability.

See an example below from a system running an RTX 2080 (the laptop I'm typing this on):

Loaded Nvenc version 9.1
Nvenc initialized successfully
Device 0: GeForce RTX 2080
==========================================================
                             Codec |   H264   |   HEVC   |
==========================================================
        Input Buffer Formats       |          |          |
----------------------------------------------------------
                              NV12 |        x |        x |
                              YV12 |        x |        x |
                              IYUV |        x |        x |
                            YUV444 |        x |        x |
                              P010 |        . |        x |
                         YUV444P10 |        . |        x |
                              ARGB |        x |        x |
                            ARGB10 |        x |        x |
                              AYUV |        x |        x |
                              ABGR |        x |        x |
                            ABGR10 |        x |        x |
----------------------------------------------------------
               Limits              |          |          |
----------------------------------------------------------
                     Maximum Width |     4096 |     8192 |
                     Maximum Hight |     4096 |     8192 |
         Maximum Macroblocks/frame |    65536 |   262144 |
        Maximum Macroblocks/second |   983040 |   983040 |
                Max Encoding Level |       51 |       62 |
                Min Encoding Level |        1 |        1 |
               Max No. of B-Frames |        4 |        5 |
      Maxmimum LT Reference Frames |        8 |        7 |
----------------------------------------------------------
            Capabilities           |          |          |
----------------------------------------------------------
      Supported Rate-Control Modes |       63 |       63 |
           Supports Field-Encoding |        0 |        0 |
               Supports Monochrome |        0 |        0 |
                      Supports FMO |        0 |        0 |
   Supports QPEL Motion Estimation |        1 |        1 |
             Supports BDirect Mode |        1 |        0 |
                    Supports CABAC |        1 |        1 |
       Supports Adaptive Transform |        1 |        0 |
          Supports Temporal Layers |        0 |        0 |
    Supports Hierarchical P-Frames |        0 |        0 |
    Supports Hierarchical B-Frames |        0 |        0 |
   Supports Separate Colour Planes |        1 |        0 |
             Supports Temporal SVC |        0 |        0 |
Supports Dynamic Resolution Change |        1 |        1 |
   Supports Dynamic Bitrate Change |        1 |        1 |
   Supports Dynamic Force Const-QP |        1 |        1 |
   Supports Dynamic RC-Mode Change |        0 |        0 |
      Supports Sub-Frame Read-back |        1 |        1 |
     Supports Constrained Encoding |        1 |        0 |
            Supports Intra Refresh |        1 |        1 |
   Supports Custom VBV Buffer Size |        1 |        1 |
       Supports Dynamic Slice Mode |        1 |        1 |
     Supports Ref Pic Invalidation |        1 |        1 |
            Supports PreProcessing |        0 |        0 |
           Supports Async Encoding |        0 |        0 |
          Supports YUV444 Encoding |        1 |        1 |
        Supports Lossless Encoding |        1 |        1 |
                      Supports SAO |        0 |        1 |
             Supports ME-Only Mode |        1 |        1 |
       Supports Lookahead Encoding |        1 |        1 |
              Supports Temporal AQ |        1 |        1 |
          Supports 10-bit Encoding |        0 |        1 |
      Supports Weighted Prediction |        1 |        1 |
   Supports B-Frames as References |        2 |        3 |
       Supports Emphasis Level Map |        1 |        0 |
----------------------------------------------------------
              Profiles             |          |          |
----------------------------------------------------------
                                   | Baseline |     Auto |
                                   |     Main |     Main |
                                   |     High |   Main10 |
                                   |      MVC |  Main444 |
                                   |  High444 |          |
                                   |     Auto |          |
----------------------------------------------------------
               Presets             |          |          |
----------------------------------------------------------
                                   |  default |  default |
                                   |       ll |       ll |
                                   |       hp |       hp |
                                   |       hq |       hq |
                                   |   bluray |   bluray |
                                   |     llhq |     llhq |
                                   |     llhp |     llhp |
                                   |  Unknown |  Unknown |
                                   | lossless | lossless |
==========================================================
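
If you want to check a report like this from a script, the table is simple enough to parse. This is a minimal sketch that assumes the three-column "label | H264 | HEVC |" layout shown above; other nv-video-info versions may print something different, and the helper names are hypothetical.

```python
def parse_capability_table(report):
    """Map each row label to its {"H264": value, "HEVC": value} cells."""
    rows = {}
    for line in report.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Data rows have the shape "label | h264 | hevc |", which splits
        # into four cells (the last one is empty).
        if len(cells) != 4 or not cells[0]:
            continue  # separators, device line, unlabeled profile/preset rows
        if cells[1] == "" and cells[2] == "":
            continue  # section heading such as "Limits"
        rows[cells[0]] = {"H264": cells[1], "HEVC": cells[2]}
    return rows


def h264_supports(rows, capability):
    """True if the H264 column holds anything other than '0', '.' or blank."""
    value = rows.get(capability, {}).get("H264", "0")
    return value not in ("0", ".", "")
```

For example, h264_supports(rows, "Supports Lossless Encoding") reads the H264 column of that row, so a deployment script can fail early when a capability it needs is missing.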

Likewise, nvdecinfo will show you decode capabilities available on an NVDEC-capable GPU. The same caveat on available features as mentioned above applies here.

Here's sample output from nvdecinfo on an RTX 2080 GPU:

Device 0: GeForce RTX 2080
-----------------------------------------------
Codec | Chroma | Depth | Max Width | Max Height
-----------------------------------------------
MPEG1 |    420 |     8 |      4080 |       4080
MPEG2 |    420 |     8 |      4080 |       4080
MPEG4 |    420 |     8 |      2032 |       2032
  VC1 |    420 |     8 |      2032 |       2032
 H264 |    420 |     8 |      4096 |       4096
MJPEG |    400 |     8 |     32768 |      16384
MJPEG |    420 |     8 |     32768 |      16384
MJPEG |    422 |     8 |     32768 |      16384
MJPEG |    444 |     8 |     32768 |      16384
 HEVC |    420 |     8 |      8192 |       8192
 HEVC |    420 |    10 |      8192 |       8192
 HEVC |    420 |    12 |      8192 |       8192
 HEVC |    444 |     8 |      8192 |       8192
 HEVC |    444 |    10 |      8192 |       8192
 HEVC |    444 |    12 |      8192 |       8192
  VP8 |    420 |     8 |      4096 |       4096
  VP9 |    420 |     8 |      8192 |       8192
  VP9 |    420 |    10 |      8192 |       8192
  VP9 |    420 |    12 |      8192 |       8192
-----------------------------------------------
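
The decode table can be checked the same way. The sketch below assumes the five-column layout shown above and uses hypothetical function names; it answers questions such as "can this GPU decode 8-bit 4:2:0 H264 at my input resolution?".

```python
def parse_nvdecinfo(report):
    """Return one dict per codec row of an nvdecinfo-style table."""
    entries = []
    for line in report.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Data rows have five cells with a numeric bit depth in the third;
        # this skips the header row, the separators and the device line.
        if len(cells) != 5 or not cells[2].isdigit():
            continue
        codec, chroma, depth, width, height = cells
        entries.append({
            "codec": codec,
            "chroma": chroma,
            "depth": int(depth),
            "max_width": int(width),
            "max_height": int(height),
        })
    return entries


def can_decode(entries, codec, width, height, depth=8, chroma="420"):
    """True if some row covers the codec, format and frame size requested."""
    return any(
        e["codec"] == codec
        and e["chroma"] == chroma
        and e["depth"] == depth
        and width <= e["max_width"]
        and height <= e["max_height"]
        for e in entries
    )
```

Against the table above, a 1920x1080 8-bit H264 stream would pass, while 10-bit H264 would not, since the only H264 row listed is 8-bit 4:2:0.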

2. On Intel's VAAPI and QuickSync feature support:

VAAPI's capabilities are trivially queried via the vainfo utility included as part of libva-utils. Codec support is typically constrained by the GPU's generation: only Ice Lake and newer (ICL+) expose the entire feature set for advanced codecs, such as HEVC and VP9 encoding at both normal and high bit depths.

Here's a sample output of vainfo on an older Skylake testbed:

libva info: VA-API version 0.40.0
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/local/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_0_40
libva info: va_openDriver() returns 0
vainfo: VA-API version: 0.40 (libva 1.7.3)
vainfo: Driver version: Intel i965 driver for Intel(R) Skylake - 1.8.4.pre1 (glk-alpha-71-gc3110dc)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointEncSlice
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointEncSlice
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileVP9Profile0            : VAEntrypointVLD

To translate: decode capabilities are tied to the VLD entry points (VAEntrypointVLD), whereas encode capabilities are tied to the EncSlice entry points (and EncPicture for JPEG). Low-power encoding support is exposed via the EncSliceLP entry points.
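
That mapping from entry points to capabilities is easy to automate. Here is a minimal sketch, assuming vainfo prints "VAProfile... : VAEntrypoint..." pairs as in the sample above; the function name and the grouping keys are my own.

```python
def parse_vainfo(report):
    """Group VA-API profiles by what their entry points allow."""
    caps = {"decode": set(), "encode": set(), "encode_low_power": set()}
    for line in report.splitlines():
        profile, sep, entrypoint = line.partition(":")
        profile, entrypoint = profile.strip(), entrypoint.strip()
        # Ignore the libva/vainfo banner lines.
        if not sep or not profile.startswith("VAProfile"):
            continue
        if entrypoint == "VAEntrypointVLD":
            caps["decode"].add(profile)
        elif entrypoint == "VAEntrypointEncSliceLP":
            caps["encode_low_power"].add(profile)
        elif entrypoint in ("VAEntrypointEncSlice", "VAEntrypointEncPicture"):
            caps["encode"].add(profile)
    return caps
```

For the H.264 pipeline in the question, you would want to see a profile such as VAProfileH264High in both caps["decode"] and caps["encode"].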

The same information can also be mapped to QuickSync's capabilities, with one caveat: hardware older than Ice Lake does not expose all encoder capabilities, such as (official) VP9 encoding support. To confirm feature support, see the documentation provided in the media-driver package.

Note that QuickSync support requires both the libmfx runtime and the media-driver package to be installed.

For Intel hardware, both the i965 and iHD (from the media-driver package) drivers can be used as the VAAPI driver. You select one with the LIBVA_DRIVER_NAME environment variable, which can be set either globally or at application launch time.
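
When launching a tool from a script, setting the variable per process avoids changing it globally. A small sketch, with a hypothetical helper name:

```python
import os


def env_with_va_driver(driver, base_env=None):
    """Copy an environment and select a VA-API driver in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBVA_DRIVER_NAME"] = driver  # e.g. "iHD" or "i965"
    return env
```

You would then pass the result as env=env_with_va_driver("iHD") to subprocess.run(["vainfo"], ...) and compare the reported driver line against a run with "i965".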

FFmpeg can also select the driver via the -init_hw_device option. For example, to use the iHD driver, pass -init_hw_device vaapi=va:/dev/dri/renderD128,driver=iHD. This creates a VAAPI device named va, bound to the DRM render node /dev/dri/renderD128; filters can then use that device via -filter_hw_device va. More documentation is available on the wiki.
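
Put together, a full command line could be assembled as in the sketch below. The device options are the ones described above; the rest (a software decode, then format=nv12,hwupload feeding the h264_vaapi encoder) follows the usual FFmpeg VAAPI encode pattern and is my assumption about what you want to test, as are the function name and file names.

```python
def vaapi_h264_command(src, dst, render_node="/dev/dri/renderD128",
                       driver="iHD"):
    """Build an ffmpeg argument list for a VAAPI H.264 test encode."""
    return [
        "ffmpeg",
        # Create a VAAPI device named "va" on the given render node.
        "-init_hw_device", f"vaapi=va:{render_node},driver={driver}",
        # Make that device available to the filter graph.
        "-filter_hw_device", "va",
        "-i", src,
        # Convert to NV12 and upload frames to the GPU for the encoder.
        "-vf", "format=nv12,hwupload",
        "-c:v", "h264_vaapi",
        dst,
    ]
```

Running the list with subprocess.run(vaapi_h264_command("in.mp4", "out.mp4"), check=True) on a short clip is a practical test: if the device or driver is not usable, ffmpeg exits with an error rather than silently falling back to software.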

The hwmap filter in ffmpeg can also be used to derive a QuickSync device from a VAAPI context, as documented on the wiki.

For AMD VCE (and VCN) capable hardware, VAAPI is available only when using the standard mesa driver(s), i.e. radeon(si) on top of the open-source amdgpu driver. The AMDGPU-PRO proprietary driver packages only expose support for AMF (via a Vulkan interop).

3. AMD's case: VAAPI and AMF:

Any VCE (and VCN) capable AMD GPU will expose VAAPI support provided that the mesa drivers are in use, as explained above. Likewise, feature support can be printed out via vainfo from libva-utils.

AMF explicitly requires the amdgpu-pro driver to be installed and loaded, as it uses a Vulkan interop that only the proprietary driver provides.

For feature support on AMD AMF, see this wiki for more information. Unlike NVIDIA and Intel, AMD's VCE (and VCN) implementations are not always an upgrade from one hardware generation to the next.

Case in point: On Polaris, the addition of HEVC encoding support crippled the H.264/AVC encoder by stripping support for B-frames. And to date, weighted prediction is not implemented on their encoders.

Notes on Docker:

NVIDIA hardware is supported directly on current releases of docker-ce via the nvidia-container-toolkit package. An older (and now deprecated) implementation, nvidia-docker2, also exists. See this project for more details.

For VAAPI (and potentially QuickSync) support, you'll need to pass the DRI device nodes into the container in a privileged manner, as shown in this example.
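
Once the container is up, it is worth confirming from inside it that the nodes actually arrived and are usable. A minimal sketch, assuming the render nodes follow the usual /dev/dri/renderD* naming; the function name is hypothetical.

```python
import glob
import os


def accessible_render_nodes(dri_dir="/dev/dri"):
    """Return the render nodes this process can open for read and write."""
    nodes = sorted(glob.glob(os.path.join(dri_dir, "renderD*")))
    return [n for n in nodes if os.access(n, os.R_OK | os.W_OK)]
```

An empty list inside the container means the devices were not passed through (or the user lacks permission on them), in which case vainfo and FFmpeg's VAAPI device initialization will fail as well.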