Weird graphics artifacts / monitor off with Ubuntu 20.04 and Nvidia RTX 5000

I am trying to migrate from Windows to Linux. I hibernated Windows and installed a fresh Ubuntu 20.04 on a new disk (LUKS, with an increased swap size for hibernation). However, I get sporadic screen corruption and weird graphics artifacts:

  • Random artifacts 1
  • Random artifacts 2
  • Sometimes no artifacts

How can I solve it?

System configuration

  • Dell Precision 15" 7540, Xeon [email protected], 128 GB ECC RAM, NVIDIA Quadro RTX 5000 16GB VRAM

  • Docking stations at home and at the offices: WD19DC (240W)

  • Monitors used for Linux migration: 2x 4K U4320Q, 42.5" @ 96dpi font / No scaling

  • There are no problems/artifacts with the graphics card in Windows (three months of uptime
    until a planned restart). During that time I changed the monitor layout daily (home 2x 4K 42", office 1 2x 38", office 2 3x Full HD 24"), ran daily sleep/resume cycles, and hibernated occasionally when charging was not possible while traveling.

Ubuntu 20.04 LTS:

  • Artifacts occur after start, reboot, resume from sleep and hibernate

  • Artifacts are rectangular and appear at random locations.

  • Sometimes a monitor blanks out for one second for no apparent reason (hard to reproduce)

  • The NVIDIA driver version 460, which Ubuntu installs automatically, causes artifacts

  • I reinstalled Ubuntu from scratch and installed the driver currently offered for download on nvidia.com (see below)

  • The artifacts occur with both GNOME and KDE. They start as soon as I open a terminal window.

  • In KDE: I tried kwin --replace & but the problem remains

  • In KDE: When I press Alt+Tab to switch apps, the artifacts disappear. When I take a screenshot, the image itself has no artifacts.

Last installed versions:

  • Kernel:
user01@earth2:~$ uname -a
Linux earth2 5.8.0-41-generic #46~20.04.1-Ubuntu SMP Mon Jan 18 17:52:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • KDE version:
user01@earth2:~$ kf5-config --version
Qt: 5.12.8
KDE Frameworks: 5.68.0
kf5-config: 1.0
  • NVIDIA:
#NVIDIA-Linux-x86_64-450.102.04.run
user01@earth2:~$ nvidia-smi 
Sun Feb  7 18:18:18 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P0    31W /  N/A |   1227MiB / 16091MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
  • lspci
user01@earth2:~/Downloads$ lspci
00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #7 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a30e (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 5000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
03:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
39:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
6e:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5260 PCI Express Card Reader (rev 01)
6f:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
70:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983

user01@earth2:~/Downloads$ free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       3,5Gi       118Gi       584Mi       3,6Gi       120Gi
Swap:         405Gi          0B       405Gi
user01@earth2:~/Downloads$


The following solves the screen corruption / artifact problem by disabling the EDID query. However, it is not a viable solution if you move between workplaces with different monitors. It may damage your monitor, so do not copy it 1:1!

  • Save the Xorg configuration with the nvidia-settings tool
  • Save the EDID file for the monitor
  • Get the modelines
parse-edid < edid_from_nvidia.bin
  • Modify these sections:
Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "LGD"
    HorizSync      30-140
    VertRefresh    29-76
    # Maximum pixel clock is 600MHz
    # Extension block found. Parsing...
    Modeline       "Mode 13" 533.25 3840 3888 3920 4000 2160 2163 2168 2222 +hsync -vsync
    Option         "PreferredMode" "Mode 13"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Quadro RTX 5000"
    Option         "UseEDID" "FALSE"    
EndSection
  • After systemctl restart gdm, the monitors switch to 24bit -> no more screen corruption
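
Because EDID is switched off in this configuration, the driver relies entirely on the hand-written Modeline, so it is worth checking that the line really describes a 60 Hz mode. A minimal sketch, assuming the usual Modeline field order (pixel clock in MHz, four horizontal timings, four vertical timings); the function name modeline_refresh_hz is made up for this illustration:

```python
def modeline_refresh_hz(modeline: str) -> float:
    """Return the vertical refresh rate implied by an Xorg Modeline string."""
    # Splitting on the quotes gives: keyword, mode name, numbers and flags.
    fields = modeline.split('"')
    numbers = fields[2].split()
    clock_mhz = float(numbers[0])  # pixel clock in MHz
    htotal = int(numbers[4])       # total pixels per line, including blanking
    vtotal = int(numbers[8])       # total lines per frame, including blanking
    return clock_mhz * 1_000_000 / (htotal * vtotal)


# The Modeline from the Monitor section above.
line = 'Modeline "Mode 13" 533.25 3840 3888 3920 4000 2160 2163 2168 2222 +hsync -vsync'
print(round(modeline_refresh_hz(line), 3))
```

For the Modeline above this gives about 59.997 Hz, which fits the 29-76 VertRefresh range and the 600MHz maximum pixel clock stated in the same section.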

The monitor's on-screen display (OSD) was helpful and became my starting point for further research:

  • In the BIOS, during boot, and when Windows starts, the OSD shows 3840x2160 60Hz @24bit
  • As soon as Linux boots, even before the login screen appears, the OSD shows 3840x2160 60Hz @30bit, although nvidia-settings and the Xorg log report 24bit.
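
To put the two OSD readings into numbers, the sketch below computes the raw pixel data rate for the 533.25 MHz pixel clock of the Modeline above at 24 and 30 bits per pixel. It is only arithmetic: it ignores link encoding and MST overhead, and it does not show that bandwidth is the cause (the notes below say the corruption remains at any resolution while the OSD shows 30bit). The function name is made up for this illustration.

```python
def pixel_data_rate_gbps(pixel_clock_mhz: float, bits_per_pixel: int) -> float:
    """Raw pixel data rate in Gbit/s, without any link or protocol overhead."""
    return pixel_clock_mhz * 1e6 * bits_per_pixel / 1e9


PIXEL_CLOCK_MHZ = 533.25  # taken from the Modeline in the Monitor section

for bpp in (24, 30):
    per_monitor = pixel_data_rate_gbps(PIXEL_CLOCK_MHZ, bpp)
    print(f"{bpp} bit: {per_monitor:.1f} Gbit/s per monitor, "
          f"{2 * per_monitor:.1f} Gbit/s for two monitors")
```

At the same timing, 30bit needs 25% more data than 24bit: roughly 16.0 instead of 12.8 Gbit/s per monitor.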

Notes:

  • In Linux: Screen corruption remains at any resolution and refresh rate as long as the OSD shows 30bit.

  • No screen corruption when connecting directly to the laptop; 30bit then works reliably

  • The HDMI port of the docking station outputs @24bit -> No screen corruption

  • Firmware updates with fwupdmgr have not helped either

  • The corruption also occurred on my dual U3818DW, 37.5" at work, because of the switch to 30bit.

  • I came across 'https://wiki.ubuntu.com/DeepColourDepthSupportPlan', but reducing to 8 bpc did not work: the command " xrandr --output {your screen} --set "max bpc" 8" returns errors.

  • The EDID of both monitors reports that 30bit (10 bits per color channel) is supported

  • Recent Versions:

    VMM5331 in Dell dock: Device ID:
    Summary: Multi Stream Transport controller Current version: 05.04.06

    NVIDIA Version: 460.73.01

    Ubuntu 20.04 kernel version: 5.8.0-55-generic
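
The note about the EDID advertising 10 bits can be checked directly on a saved EDID file such as edid_from_nvidia.bin. A minimal sketch, assuming an EDID 1.4 base block, where byte 20 (video input definition) holds the color depth in bits 6-4 for digital inputs; the helper names are made up, and the synthetic block exists only to make the example runnable without a monitor:

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])

# EDID 1.4, byte 20, bits 6-4: bits per primary color for digital inputs.
BITS_PER_COLOR = {0b001: 6, 0b010: 8, 0b011: 10, 0b100: 12, 0b101: 14, 0b110: 16}


def edid_bits_per_color(edid: bytes):
    """Return the bits per color channel from an EDID base block, or None if undefined."""
    if len(edid) < 128 or edid[:8] != EDID_HEADER:
        raise ValueError("not an EDID base block")
    if sum(edid[:128]) % 256 != 0:
        raise ValueError("EDID checksum mismatch")
    version, revision = edid[18], edid[19]
    video_input = edid[20]
    is_digital = bool(video_input & 0x80)
    if not is_digital or (version, revision) < (1, 4):
        return None  # the depth field is only defined for digital EDID 1.4
    return BITS_PER_COLOR.get((video_input >> 4) & 0b111)


def make_synthetic_edid(video_input: int) -> bytes:
    """Build a mostly empty EDID 1.4 base block for demonstration purposes only."""
    block = bytearray(128)
    block[:8] = EDID_HEADER
    block[18], block[19] = 1, 4
    block[20] = video_input
    block[127] = (-sum(block[:127])) % 256  # all 128 bytes must sum to 0 mod 256
    return bytes(block)


# 0xB5 = digital input, 10 bits per color, DisplayPort interface.
print(edid_bits_per_color(make_synthetic_edid(0xB5)))
```

For a real monitor, read the saved file with open("edid_from_nvidia.bin", "rb") and pass its contents to edid_bits_per_color. Extension blocks can carry additional deep color information, which this sketch does not look at.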

Summary:

  • It seems that the docking station is the culprit. Updating the firmware (via fwupdmgr) does not solve the problem (everything is already at the most recent version)

  • Windows somehow stays at 24bit and is reliable, so I will stay with Windows for a while, as 24bit is good enough for me.

  • I do not know how to tell Linux/NVIDIA to stay at 24bit without complicated hacking of xorg.conf or shimming a custom EDID file. For me, creating an xorg.conf is a bad option because I cannot use the laptop at different places with different monitors without a restart.
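
One possible reason the xrandr "max bpc" command from the notes fails is that the output does not expose a property with that name: RandR output properties are provided by the driver, and not every driver offers this one. The sketch below parses the text printed by xrandr --props and reports, per connected output, whether "max bpc" is present. The function names are made up for this illustration, and the parsing assumes the usual xrandr layout (output names start in column 0, properties are indented).

```python
import subprocess


def outputs_with_max_bpc(xrandr_props: str) -> dict:
    """Map each connected output to its 'max bpc' value, or None if not exposed."""
    result = {}
    current = None
    for line in xrandr_props.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new screen or output section.
            parts = line.split()
            is_connected = len(parts) > 1 and parts[1] == "connected"
            current = parts[0] if is_connected else None
            if current:
                result[current] = None
        elif current and line.strip().startswith("max bpc:"):
            result[current] = int(line.split(":", 1)[1].strip())
    return result


def read_xrandr_props() -> str:
    """Run 'xrandr --props' and return its output (needs a running X session)."""
    completed = subprocess.run(
        ["xrandr", "--props"], capture_output=True, text=True, check=True
    )
    return completed.stdout
```

Calling outputs_with_max_bpc(read_xrandr_props()) inside the X session returns a dictionary keyed by output name. An output mapped to None has no "max bpc" property, so the --set call cannot work for it and the depth has to be limited some other way.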

Update: I just solved my Linux 30bit problem by using a DisplayPort to HDMI converter[1]. Linux is now somehow forced to use 4K 60Hz at only 24bit. No more screen corruption and no EDID file hacking needed.

[1] https://www.amazon.de/gp/product/B017BQ8I54/ref=ppx_yo_dt_b_asin_title_o00_s01?ie=UTF8&psc=1