Weird graphics artifacts / monitor off with Ubuntu 20.04 and Nvidia RTX 5000
I am trying to migrate from Windows to Linux. I hibernated Windows and installed a fresh Ubuntu 20.04 on a new disk (LUKS, with an increased swap size for hibernation), but I get sporadic screen corruption and weird graphics artifacts:
- Random artifacts 1
- Random artifacts 2
- Sometimes no artifacts
How can I solve it?
System configuration
- Dell Precision 15" 7540, Xeon [email protected], 128 GB ECC RAM, NVIDIA Quadro RTX 5000 16GB VRAM
- Docking stations at home and at the offices: WD19DC (240W)
- Monitors used for the Linux migration: 2x 4K U4320Q, 42.5" @ 96dpi font / no scaling
- There are no problems/artifacts with the graphics card in Windows (three months of uptime until a planned restart). During that time I changed the monitor layout daily (home: 2x 4K 42", office 1: 2x 38", office 2: 3x Full HD 24"), ran daily sleep/resume cycles, and hibernated occasionally when charging was not possible while traveling.
Ubuntu 20.04 LTS:
- Artifacts occur after start, reboot, resume from sleep, and resume from hibernate.
- Artifacts appear as rectangles at random locations.
- Sometimes a monitor blanks out for one second for no apparent reason (hard to reproduce).
- The NVIDIA driver version 460 that Ubuntu auto-installs causes artifacts.
- I reinstalled Ubuntu from scratch and installed the driver currently offered for download on nvidia.com (see below).
- The artifacts occur with both GNOME and KDE. They start as soon as I open a terminal window.
- In KDE: I tried kwin --replace & but the problem remains.
- In KDE: when I press Alt+Tab to switch apps, the artifacts disappear. When I take a screenshot, the image itself has no artifacts.
Last installed versions:
- Kernel:
user01@earth2:~$ uname -a
Linux earth2 5.8.0-41-generic #46~20.04.1-Ubuntu SMP Mon Jan 18 17:52:23 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- KDE Version:
user01@earth2:~$ kf5-config --version
Qt: 5.12.8
KDE Frameworks: 5.68.0
kf5-config: 1.0
- NVIDIA:
#NVIDIA-Linux-x86_64-450.102.04.run
user01@earth2:~$ nvidia-smi
Sun Feb 7 18:18:18 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P0    31W /  N/A |   1227MiB / 16091MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
- lspci
user01@earth2:~/Downloads$ lspci
00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #7 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a30e (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 5000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
03:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
04:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
39:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
6e:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5260 PCI Express Card Reader (rev 01)
6f:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
70:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
user01@earth2:~/Downloads$ free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       3,5Gi       118Gi       584Mi       3,6Gi       120Gi
Swap:         405Gi          0B       405Gi
user01@earth2:~/Downloads$
The following solves the screen corruption / artifact problem by disabling the EDID query. It is not a viable solution if you change workplaces with different monitors, and it may damage your monitor, so do not copy it 1:1!
- Save the Xorg configuration from the nvidia-settings tool
- Save the EDID file for the monitor
- Get modelines
parse-edid < edid_from_nvidia.bin
- Modify these sections:
Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "LGD"
    Horizsync 30-140
    VertRefresh 29-76
    # Maximum pixel clock is 600MHz
    #Extension block found. Parsing...
    Modeline "Mode 13" 533.25 3840 3888 3920 4000 2160 2163 2168 2222 +hsync -vsync
    Option "PreferredMode" "Mode 13"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "Quadro RTX 5000"
    Option "UseEDID" "FALSE"
EndSection
- After systemctl restart gdm, the monitors switch to 24bit and the screen corruption is gone
The monitor's on-screen display (OSD) was helpful and became my starting point for further research:
- In the BIOS, during startup, and when Windows starts, the OSD shows 3840x2160 60Hz @24bit
- As soon as Linux boots, even before the login screen appears, the OSD shows 3840x2160 60Hz @30bit, but nvidia-settings and the Xorg log report 24bit.
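As a rough plausibility check of why the bit depth might matter on the dock, the uncompressed video data rate of the mode can be estimated for 24bit and 30bit. This is a minimal sketch, not a confirmed diagnosis: the pixel clock is taken from the Modeline "Mode 13" in this post, the DisplayPort payload figures are the standard 4-lane HBR2/HBR3 values after 8b/10b coding, and which link rate the dock actually provides per monitor is an assumption, since the post does not state it.

```python
# Rough data-rate estimate for the 3840x2160 60Hz mode at two colour depths.
PIXEL_CLOCK_MHZ = 533.25  # from the Modeline "Mode 13" in this post

# Usable payload of a 4-lane DisplayPort link after 8b/10b coding, in Gbit/s.
# Which of these (if any) the dock offers to the monitors is an assumption.
DP_PAYLOAD_GBPS = {"HBR2": 4 * 5.4 * 0.8, "HBR3": 4 * 8.1 * 0.8}


def video_rate_gbps(pixel_clock_mhz: float, bits_per_pixel: int) -> float:
    """Uncompressed video data rate in Gbit/s, blanking intervals included."""
    return pixel_clock_mhz * 1e6 * bits_per_pixel / 1e9


if __name__ == "__main__":
    for bpp in (24, 30):
        one = video_rate_gbps(PIXEL_CLOCK_MHZ, bpp)
        print(f"{bpp}bit: one monitor {one:.1f} Gbit/s, two monitors {2 * one:.1f} Gbit/s")
    for name, payload in DP_PAYLOAD_GBPS.items():
        print(f"{name} x4 payload: {payload:.2f} Gbit/s")
```

Under these assumptions, one monitor needs about 12.8 Gbit/s at 24bit and about 16.0 Gbit/s at 30bit. Two monitors at 24bit would just fit into a single HBR3 link, while two at 30bit would not fit without compression. Since the corruption is reported at any resolution while the OSD shows 30bit, bandwidth alone may not explain it.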
Notes:
- In Linux: screen corruption remains at any resolution and refresh rate as long as the OSD shows 30bit.
- No screen corruption when connecting the monitor directly to the laptop; 30bit then works reliably.
- The HDMI port of the docking station outputs 24bit -> no screen corruption.
- Firmware updates via fwupdmgr did not help either.
- The corruption also occurred on my dual U3818DW (37.5") monitors at work, due to the switch to 30bit.
- I came across 'https://wiki.ubuntu.com/DeepColourDepthSupportPlan', but reducing to 8 bpc did not work: the command xrandr --output {your screen} --set "max bpc" 8 fails with errors.
- The EDIDs of the two monitors report that 30bit (10 bits per color channel) is supported.
- Recent versions:
  VMM5331 in Dell dock:
    Device ID:
    Summary: Multi Stream Transport controller
    Current version: 05.04.06
  NVIDIA Version: 460.73.01
  Ubuntu 20.04 kernel version: 5.8.0-55-generic
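The colour depth a monitor declares can be read directly from a saved EDID file. The sketch below decodes the "video input parameters" byte (byte 20) of the 128-byte EDID base block, which in EDID 1.4 carries the bits per colour channel for digital inputs. The function names are illustrative, and the sketch only inspects the base block: deep-colour flags that HDMI sinks announce in an extension block are not covered.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])

# EDID 1.4, byte 20, bits 6-4 (digital inputs only): bits per colour channel.
DEPTH_CODES = {0b001: 6, 0b010: 8, 0b011: 10, 0b100: 12, 0b101: 14, 0b110: 16}


def edid_bits_per_colour(edid: bytes):
    """Return the declared bits per colour channel, or None if undeclared."""
    if len(edid) < 128 or edid[:8] != EDID_HEADER:
        raise ValueError("not an EDID base block")
    version, revision = edid[18], edid[19]
    video_input = edid[20]
    is_digital = bool(video_input & 0x80)
    if not is_digital or (version, revision) < (1, 4):
        return None  # the depth field is only defined for digital EDID 1.4
    return DEPTH_CODES.get((video_input >> 4) & 0b111)


def edid_file_bits_per_colour(path: str):
    """Convenience wrapper (hypothetical helper): decode an EDID file on disk."""
    with open(path, "rb") as f:
        return edid_bits_per_colour(f.read())
```

For example, edid_file_bits_per_colour("edid_from_nvidia.bin") on the file saved in the workaround would be expected to return 10 for a monitor that declares 30bit support in its base block.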
Summary:
- It seems that the docking station is the culprit. Updating the firmware (fwupdmgr updates) doesn't solve the problem (everything is already at the most recent version).
- Windows somehow stays at 24bit and is reliable, so I will stay with Windows for a while, as 24bit is good enough for me.
- I do not know how to tell Linux/NVIDIA to stay at 24bit without complicated hacking of xorg.conf or shimming a custom EDID file. For me, creating an xorg.conf is very bad, as I cannot use the laptop at different places with different monitors without a restart.
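Regarding the failed xrandr "max bpc" attempt from the notes: one possible reason (an assumption, not verified in this post) is that the driver in use does not expose that RandR output property at all. The sketch below checks this by parsing the text printed by xrandr --prop. It assumes the usual layout of that output, where output names start at column 0 and property names are indented by a single tab; the function names are illustrative.

```python
import subprocess


def outputs_with_property(xrandr_prop_text: str, prop: str = "max bpc") -> dict:
    """Map each output name to True/False: does it list the given property?"""
    result = {}
    current = None
    for line in xrandr_prop_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines are "Screen N: ..." or "<output> connected ..."
            first = line.split()[0]
            current = None if first == "Screen" else first
            if current is not None:
                result[current] = False
        elif current and line.startswith("\t") and not line.startswith("\t\t"):
            # Property lines look like "<tab>name: value"; mode lines use spaces.
            if line.strip().split(":", 1)[0] == prop:
                result[current] = True
    return result


def current_outputs_with_property(prop: str = "max bpc") -> dict:
    """Run xrandr --prop (needs a running X session) and parse its output."""
    text = subprocess.run(
        ["xrandr", "--prop"], capture_output=True, text=True, check=True
    ).stdout
    return outputs_with_property(text, prop)
```

If current_outputs_with_property() reports False for the connected outputs, the property is not available there and setting it through xrandr cannot succeed on that driver.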
Update: I just solved my Linux 30bit problem by using a DisplayPort to HDMI converter [1]. Now Linux is somehow forced to use 4K 60Hz at only 24bit. No more screen corruption and no EDID file hacking needed.
[1] https://www.amazon.de/gp/product/B017BQ8I54/ref=ppx_yo_dt_b_asin_title_o00_s01?ie=UTF8&psc=1