Please help configuring NVIDIA-SMI Ubuntu 20.04 on WSL 2

Following this announcement, and trying as best I could to follow this confusing thread, I

installed Windows Version 10.0.20150 Build 20150
installed NVIDIA driver version 455.51
installed Ubuntu 20.04 LTS from the Windows Store

I started Ubuntu and tried to run nvidia-smi. The shell reported that the command was not found, but that I could install it with one of these options:

Command 'nvidia-smi' not found, but can be installed with:

sudo apt install nvidia-340        # version 340.108-0ubuntu2, or
sudo apt install nvidia-utils-390  # version 390.132-0ubuntu2
sudo apt install nvidia-utils-435  # version 435.21-0ubuntu7
sudo apt install nvidia-utils-440  # version 440.82+really.440.64-0ubuntu6

Note that there is no nvidia-utils-450 option to match my 455.51 driver, which the NVIDIA thread above says (somewhere) is required for this to work. I then ran

sudo apt install nvidia-utils-440
nvidia-smi

and it said "No devices found".
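One thing worth ruling out at this point is whether the shell is really running on a WSL kernel with the GPU exposed to it. The following is a minimal sketch, not a definitive check. It assumes that WSL kernel release strings contain "microsoft" and that GPU support in WSL 2 appears as the /dev/dxg device node; the helper name is_wsl_kernel is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel release string looks like a WSL kernel.
# Assumption: WSL kernel releases contain "microsoft" (any letter case).
is_wsl_kernel() {
    printf '%s\n' "$1" | grep -qi 'microsoft'
}

if is_wsl_kernel "$(uname -r)"; then
    echo "WSL kernel detected: $(uname -r)"
    # Assumption: GPU support in WSL 2 shows up as the /dev/dxg device node.
    if [ -e /dev/dxg ]; then
        echo "/dev/dxg is present"
    else
        echo "/dev/dxg is missing; the GPU is probably not exposed to this distro"
    fi
else
    echo "This does not look like a WSL kernel"
fi
```

If /dev/dxg is missing, installing more nvidia-utils packages inside Ubuntu is unlikely to help, since the driver lives on the Windows side.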

Then I found this guide. I uninstalled Ubuntu 20.04 and then followed the guide. The guide asked me to

install a vanilla Ubuntu (no release number), which I did instead of 20.04. (This turns out to give me 20.04).
install Windows Terminal (I chose the Preview version)
check to receive updates for related Windows programs
update the kernel to 4.19.121
install NVIDIA CUDA drivers on Windows 10 (I had already installed 455; I still have to check the CUDA release)
install Docker
install NVIDIA Container Toolkit
test
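The kernel step in the list above can be checked from inside Ubuntu before moving on. This is a rough sketch that compares the running kernel against a minimum version passed as the first argument. The helper name kernel_at_least is hypothetical, and the comparison relies on GNU sort -V, which Ubuntu ships.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if kernel release $1 is at least version $2.
# Relies on GNU sort -V (version sort) from coreutils.
kernel_at_least() {
    current=${1%%-*}    # strip a suffix such as "-microsoft-standard"
    required=$2
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# Usage: pass the minimum version the guide asks for as the first argument.
if [ -n "${1:-}" ]; then
    if kernel_at_least "$(uname -r)" "$1"; then
        echo "kernel $(uname -r) is new enough"
    else
        echo "kernel $(uname -r) is older than $1"
    fi
fi
```

Version sort matters here because a plain string comparison would order 4.9 after 4.19.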

The "install Docker" part of that guide seems to be buggy: I couldn't get the Docker service to start. So I uninstalled Ubuntu again and repeated the steps up to that point, without touching Docker. From the Docker step onward, my version of the steps is as follows (for the Docker part, I am following these instructions):

sudo apt-get update
sudo apt-get upgrade
sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
sudo apt update
apt-cache policy docker-ce
sudo apt install docker-ce
sudo systemctl status docker

The last step fails. I get this message:

$ sudo systemctl status docker
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
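The message says that PID 1 in this distro is not systemd, so systemctl has nothing to talk to. A small sketch for checking that and picking a start command accordingly follows. The helper name docker_start_cmd is hypothetical, and the fallback assumes the docker-ce package installed a SysV init script at /etc/init.d/docker.

```shell
#!/bin/sh
# Hypothetical helper: print the command to start Docker for a given PID 1 name.
docker_start_cmd() {
    case "$1" in
        systemd) echo "sudo systemctl start docker" ;;
        # Assumption: docker-ce installed a SysV init script at /etc/init.d/docker.
        *)       echo "sudo service docker start" ;;
    esac
}

# /proc/1/comm holds the name of the process running as PID 1 on Linux.
init_name=$(cat /proc/1/comm 2>/dev/null || echo unknown)
echo "PID 1 is: $init_name"
echo "Suggested start command: $(docker_start_cmd "$init_name")"
```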

That led me here, where the 4th (and almost lowest-scored) answer seems to work, except that dockerd needs to be run in the background:

sudo dockerd &
sudo usermod -aG docker your-user
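Because dockerd is started in the background, the next docker command can race with daemon startup. The sketch below polls until the daemon answers; the helper name retry and the log path in the usage comment are hypothetical. Note also that the group change made by usermod only takes effect in a new login session.

```shell
#!/bin/sh
# Hypothetical helper: retry a command up to $1 times, sleeping $2 seconds
# between attempts; succeeds as soon as the command succeeds.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (not run here): start the daemon, then wait until it answers.
#   sudo dockerd > /tmp/dockerd.log 2>&1 &
#   retry 30 1 docker info && echo "Docker daemon is ready"
```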

Then I go back to the guide post-Docker install step and resume with

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

and this fails with

ERRO[2020-06-23T07:28:28.582848400-04:00] 5cd9b9d7011ba20f72971dd27900b23b2c0f6be656b0bd53b9e178944fe4eba6 cleanup: failed to delete container from containerd: no such container
ERRO[2020-06-23T07:28:28.582946600-04:00] Handler for POST /v1.40/containers/5cd9b9d7011ba20f72971dd27900b23b2c0f6be656b0bd53b9e178944fe4eba6/start returned error: could not select device driver "" with capabilities: [[gpu]]
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0018] error waiting for container: context canceled
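This error is consistent with the NVIDIA container packages not being installed yet at this point, so Docker has nothing to hand the --gpus request to. A quick way to check is to look for the relevant binaries on PATH. In this sketch the helper name missing_cmds is hypothetical, and the two binary names in the usage comment are assumptions about what the NVIDIA packages install.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# Usage (binary names are assumptions about what the NVIDIA packages provide):
#   missing=$(missing_cmds nvidia-container-cli nvidia-container-runtime-hook)
#   [ -z "$missing" ] || echo "not installed: $missing"
```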

Finally, I went back to the NVIDIA announcement and did these steps:

sudo apt-get update
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo dockerd &
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
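For anyone wondering what the distribution= line in the steps above does: it sources /etc/os-release and concatenates the ID and VERSION_ID fields, and the result selects the repository list URL. Here is a small sketch that prints the value before anything is written to /etc/apt; the helper name distribution_from is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print ID followed by VERSION_ID from an os-release file.
# Sourcing happens in a subshell so the variables do not leak into the caller.
distribution_from() {
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

if [ -r /etc/os-release ]; then
    distribution=$(distribution_from /etc/os-release)
    echo "distribution=$distribution"
    echo "repo list: https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
fi
```

On Ubuntu 20.04 this should yield ubuntu20.04. If the value comes out empty or unexpected, the curl commands will fetch an error page instead of a repository list.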

SUCCESS: this worked, and I got a happy result:

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Quadro M500M" with compute capability 5.0

> Compute 5.0 CUDA device: [Quadro M500M]
3072 bodies, total time for 10 iterations: 3.817 ms
= 24.724 billion interactions per second
= 494.487 single-precision GFLOP/s at 20 flops per interaction

HOWEVER, per the answer below, nvidia-smi is still not available because of known NVIDIA limitations.

FURTHER NOTE: The Docker container test above works in the Ubuntu shell. It does not work in the Ubuntu tab of Windows PowerShell Preview.
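Since nvidia-smi is not available here, the nbody output itself can serve as the GPU check. This is a minimal sketch that looks for the "GPU Device N:" line shown in the output above; the helper name gpu_reported is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads nbody output on stdin and succeeds if it reports
# at least one device line such as: GPU Device 0: "Quadro M500M" ...
gpu_reported() {
    grep -Eq '^GPU Device [0-9]+:'
}

# Usage (requires the working Docker setup from the steps above):
#   docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark \
#       | gpu_reported && echo "GPU is usable from containers"
```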

If nbody works, then you have everything configured correctly. The problem is a limitation of the NVIDIA drivers. See https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations :

NVIDIA Management Library (NVML) APIs are not supported.

nvidia-smi is built on top of the NVIDIA Management Library (NVML).

An update to @onoma's answer. From https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations :

6. nvidia-smi is not yet packaged for CUDA on WSL 2.

Hopefully NVIDIA will solve this in the future.
