Odyssey of Ubuntu 20.04 on Dell XPS13 laptop / eGPU Razer Core X / Nvidia GTX 1660 SUPER / external Dell display

Solution 1:

After long battles I was finally able to solve my issue, based mostly on this comment: https://forums.developer.nvidia.com/t/nvidia-xconfig-doesnt-do-what-i-want-it-to-nor-does-nvidia-settings/107883/7

So, I think it is vital to understand that xorg.conf cannot help you in this context. No matter what I did, I got no results while an xorg.conf was present.

What worked for me was:

Remove all nvidia packages you might have tried: sudo apt --purge remove 'nvidia-*'
Download the latest Nvidia driver from the Nvidia website and make it executable.
Reboot in recovery mode (or without an X server running) and run the driver installer, even if it says that no GPU was found on your system
Delete any /etc/X11/xorg.conf you may have
Reboot normally
Install nvidia-prime if it is not installed yet
sudo prime-select nvidia
Update /usr/share/X11/xorg.conf.d/10-amdgpu.conf, replacing the driver with modesetting:

Section "OutputClass"
        Identifier "AMDgpu"
        MatchDriver "amdgpu"
        Driver "modesetting"
EndSection

Create or update /usr/share/X11/xorg.conf.d/10-nvidia.conf with something like:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg"
    Option "PrimaryGPU" "Yes"
    Option "AllowExternalGpus" "True"
EndSection
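
A quick sanity check of the X configuration can save a reboot cycle. The sketch below is my own assumption about how one might verify it, not part of the original steps: outputclass_driver and no_xorg_conf are hypothetical helper names, and the only thing they rely on is the Section/Driver layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every Driver line in an
# xorg.conf.d file. MatchDriver lines are skipped because the pattern is
# anchored to the start of the line (after optional indentation).
outputclass_driver() {
    sed -n 's/^[[:space:]]*Driver[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical helper: succeed only if no monolithic xorg.conf is left
# in the given directory (default: /etc/X11).
no_xorg_conf() {
    [ ! -e "${1:-/etc/X11}/xorg.conf" ]
}
```

For example, outputclass_driver /usr/share/X11/xorg.conf.d/10-amdgpu.conf should print modesetting after the edit, and no_xorg_conf || echo "xorg.conf still present" flags a leftover file.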

Create two files named optimus.desktop, one in /etc/xdg/autostart/ and one in /usr/share/gdm/greeter/autostart/, containing:

[Desktop Entry]
Type=Application
Name=Optimus
Exec=sh -c "xrandr --setprovideroutputsource modesetting 0; xrandr --auto"
NoDisplay=true
X-GNOME-Autostart-Phase=DisplayServer

(@generix suggests modesetting NVIDIA-0 there, but that never worked for me. It does work with modesetting 0.)
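
If neither variant works on your machine, it may help to look at what xrandr actually reports, since the provider names and indices differ between setups. The following is a minimal sketch: provider_names is a hypothetical helper, and it assumes that xrandr --listproviders prints lines of the form "Provider N: ... name:XYZ", which is what I have seen but may vary between xrandr versions.

```shell
#!/bin/sh
# Hypothetical helper: read `xrandr --listproviders` output on stdin and
# print "<index> <name>" for each provider line. Lines that do not start
# with "Provider <number>:" (such as the summary line) are ignored.
provider_names() {
    sed -n 's/^Provider \([0-9][0-9]*\):.*name:[[:space:]]*\(.*\)$/\1 \2/p'
}
```

Run it as xrandr --listproviders | provider_names inside the X session and use whichever index or name shows up for the NVIDIA provider in the Exec line.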

Reboot
Test that everything is good by running: __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor
If it doesn't return lines with nvidia, something is still wrong. In my case, I get:

server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation

Another check: running nvidia-smi should list at least some processes.
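
The glxinfo check can be turned into a pass/fail test, which is handy when retrying after each change. This is only a sketch: renders_on_nvidia is a hypothetical helper name, and it keys on the "OpenGL vendor string" line shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `glxinfo` output on stdin and succeed only if
# the OpenGL vendor string names NVIDIA.
renders_on_nvidia() {
    grep -q '^OpenGL vendor string: NVIDIA'
}
```

Usage: __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | renders_on_nvidia && echo "rendering on NVIDIA"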

And I get a signal out of the nvidia GPU on an external display, as I wanted :)

Thanks ;)

Solution 2:

I also found the solution by @bluehipy very helpful in getting my Acer Predator Helios 300 running Ubuntu 20.04 to work with an external monitor, and in getting the NVIDIA/CUDA stack installed properly for deep learning work, which had been causing issues.

I only found this thread while considering returning the Acer Predator Helios 300 and checking whether a Dell XPS 13 with an eGPU could work for a "thin client" type of workflow: debugging machine learning / data science models locally and doing the actual training in the cloud.

So I might as well post my small tweaks to the original instructions, in case someone else is struggling to make their laptop work.

Prerequisites:

sudo apt install gcc make mesa-utils mpich

Install NVIDIA Driver

What worked for me was:

Remove all nvidia packages you might have tried: sudo apt --purge remove 'nvidia-*'
The original instructions say to download the latest driver, but you may prefer the driver version bundled with the latest CUDA toolkit, so check what that is before installing. The latest NVIDIA driver might work too. You will most likely need to look through the older driver downloads to find the version matching the CUDA toolkit; at the time of writing it was 470.57.02 (NVIDIA-Linux-x86_64-470.57.02.run).
Reboot in recovery mode (or without an X server running) and run the driver installer, even if it says that no GPU was found on your system (drop to root, then e.g. cd /home/username/Downloads and ./NVIDIA-Linux-x86_64-470.57.02.run)
Delete any /etc/X11/xorg.conf you may have
Reboot (press e on the Ubuntu entry in the grub menu and add nomodeset at the end of the line that starts with linux)
Install nvidia-prime if it is not installed yet
sudo prime-select nvidia
Update /usr/share/X11/xorg.conf.d/10-amdgpu.conf, replacing the driver with modesetting:

Section "OutputClass" 
    Identifier "AMDgpu" 
    MatchDriver "amdgpu" 
    Driver "modesetting"
EndSection

Create the nvidia config file (sudo gedit /usr/share/X11/xorg.conf.d/10-nvidia.conf) with something like:

Section "OutputClass" 
    Identifier "nvidia" 
    MatchDriver "nvidia-drm" 
    Driver "nvidia" 
    Option "AllowEmptyInitialConfiguration" 
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg" 
    Option "PrimaryGPU" "Yes" 
    Option "AllowExternalGpus" "True"
EndSection

Create two files optimus.desktop in /etc/xdg/autostart/ and /usr/share/gdm/greeter/autostart/ containing:

[Desktop Entry]
Type=Application
Name=Optimus
Exec=sh -c "xrandr --setprovideroutputsource modesetting 0; xrandr --auto"
NoDisplay=true
X-GNOME-Autostart-Phase=DisplayServer

Modify the grub config so that nomodeset is applied on every boot: sudo gedit /etc/default/grub, add nomodeset to GRUB_CMDLINE_LINUX_DEFAULT, then run sudo update-grub
Reboot
Test that everything is good by running: __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep vendor
Check that nvidia-smi lists at least some processes:

| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P8    14W /  N/A |    264MiB /  5946MiB |      1%      Default |
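
For the grub step in the list above, editing /etc/default/grub by hand is easy to get wrong, so here is a sketch of doing it non-interactively. with_nomodeset is a hypothetical helper of mine, not something from the original instructions. It only prints the modified file, so you can review the result before copying it into place, and it assumes the usual GRUB_CMDLINE_LINUX_DEFAULT="..." line with double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of a grub defaults file with
# "nomodeset" appended to GRUB_CMDLINE_LINUX_DEFAULT, unless that line
# already contains it. The file itself is not modified.
with_nomodeset() {
    sed '/^GRUB_CMDLINE_LINUX_DEFAULT=/{
        /nomodeset/!s/"[[:space:]]*$/ nomodeset"/
    }' "$1"
}
```

Usage: with_nomodeset /etc/default/grub > /tmp/grub.new, check /tmp/grub.new, then sudo cp /tmp/grub.new /etc/default/grub and sudo update-grub. Running the helper a second time leaves the line unchanged.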

Install CUDA Toolkit

The latest CUDA toolkit at the time of writing was cuda_11.4.2_470.57.02_linux.run, so I installed that without re-installing the NVIDIA driver (deselect the Driver entry in the installer menu):

wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda_11.4.2_470.57.02_linux.run
sudo sh cuda_11.4.2_470.57.02_linux.run
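
Since the whole point of picking 470.57.02 earlier was to match the toolkit, it is worth confirming that the installed driver and the runfile agree. A minimal sketch, assuming the runfile name follows the cuda_<toolkit>_<driver>_linux.run pattern seen above (runfile_driver_version is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: extract the bundled driver version from a CUDA
# runfile name of the form cuda_<toolkit>_<driver>_linux.run.
runfile_driver_version() {
    name=$(basename "$1")
    name=${name%_linux.run}        # strip the fixed suffix
    printf '%s\n' "${name##*_}"    # keep what follows the last underscore
}
```

Compare its output for cuda_11.4.2_470.57.02_linux.run with what nvidia-smi --query-gpu=driver_version --format=csv,noheader prints; if they differ, the toolkit may still work, but that is the first thing I would check when something breaks.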

CUDA toolkit installation

Verify CUDA installation

See https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#install-samples
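
Before the samples will build, nvcc and the CUDA libraries have to be visible to the shell. The lines below follow the post-installation section of the same NVIDIA guide. The path /usr/local/cuda-11.4 is an assumption (it is the runfile default for this toolkit version), so adjust it if you chose another install location.

```shell
#!/bin/sh
# Assumed install location: the runfile default for CUDA 11.4.
cuda_dir=/usr/local/cuda-11.4

# Prepend the CUDA directories; the ${VAR:+...} form avoids a trailing
# colon when the variable was previously empty or unset.
export PATH="$cuda_dir/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="$cuda_dir/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

Put the two export lines in ~/.bashrc to make them permanent; nvcc --version should then report release 11.4.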

Prerequisites (if you want all the samples to compile properly):

sudo apt-get install g++ freeglut3-dev build-essential libx11-dev \
    libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev

For example, ./deviceQuery returns:

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060 Laptop GPU"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 5947 MBytes (6235422720 bytes)
  (030) Multiprocessors, (128) CUDA Cores/MP:    3840 CUDA Cores
  GPU Max Clock rate:                            1425 MHz (1.42 GHz)

...

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS

cuDNN installation

See guide from https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html

Download cuDNN v8.2.4 (September 2nd, 2021), for CUDA 11.4

-> cuDNN Library for Linux (x86_64), e.g. cudnn-11.4-linux-x64-v8.2.4.15.tgz
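
Once the archive contents have been copied into the CUDA directory as the guide describes, the installed cuDNN version can be read back from its header. This is a sketch under assumptions: cudnn_version is a hypothetical helper, and the header location /usr/local/cuda/include/cudnn_version.h depends on where you copied the files.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn_version.h
# file by reading its three "#define CUDNN_*" lines.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}
```

Usage: cudnn_version /usr/local/cuda/include/cudnn_version.h, which for the archive above should report 8.2.4.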