Please help configuring nvidia-smi on Ubuntu 20.04 under WSL 2
Following this announcement, and trying as best I could to follow this confusing thread, I
- installed Windows 10 Insider Preview, version 10.0.20150 (Build 20150)
- installed NVIDIA driver version 455.51
- installed Ubuntu 20.04 LTS from the Windows Store
I started Ubuntu and tried to run nvidia-smi. It told me the command wasn't found but could be installed with one of these options:
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-340 # version 340.108-0ubuntu2, or
sudo apt install nvidia-utils-390 # version 390.132-0ubuntu2
sudo apt install nvidia-utils-435 # version 435.21-0ubuntu7
sudo apt install nvidia-utils-440 # version 440.82+really.440.64-0ubuntu6
Note that there is no nvidia-utils-450 option corresponding to my 455.51 driver, which the NVIDIA thread above says somewhere is required for this to work. I then ran
sudo apt install nvidia-utils-440
nvidia-smi
and it said "No devices found".
Then I found this guide. I uninstalled Ubuntu 20.04 and followed the guide, which asked me to
- install the plain "Ubuntu" app (no release number) instead of "Ubuntu 20.04 LTS", which I did (it turns out to install 20.04 anyway)
- install Windows Terminal (I chose the Preview version)
- enable "Receive updates for other Microsoft products" in Windows Update
- update the WSL 2 kernel to 4.19.121
- install the NVIDIA CUDA driver for WSL on Windows 10 (I had already installed 455.51; I still need to check which CUDA release it supports)
- install Docker
- install the NVIDIA Container Toolkit
- test
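The kernel requirement in that list can be checked from the Ubuntu shell before going further. This is a minimal sketch: `kernel_at_least` is a hypothetical helper name, it assumes GNU `sort -V` for version ordering, and the 4.19.121 threshold is an assumption about what the guide requires. It strips the `-microsoft-standard` style suffix from `uname -r` before comparing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
kernel_at_least() {
  have=${1%%-*}   # "4.19.121-microsoft-standard" -> "4.19.121"
  need=$2
  # The lower version sorts first; if that is $need, then $have >= $need.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Assumed minimum WSL 2 kernel for GPU support; adjust to what your guide says.
min_kernel=4.19.121
if kernel_at_least "$(uname -r)" "$min_kernel"; then
  echo "kernel $(uname -r) is new enough"
else
  echo "kernel $(uname -r) is older than $min_kernel; check Windows Update for a newer WSL 2 kernel"
fi
```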
The "install Docker" part of that guide seems to be buggy: I couldn't get the docker service to start. So I uninstalled Ubuntu and repeated the steps up to that point without touching Docker. From the Docker step onward, my version is as follows (for the Docker part I am following these instructions):
sudo apt-get update
sudo apt-get upgrade
sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
sudo apt update
apt-cache policy docker-ce
sudo apt install docker-ce
sudo systemctl status docker
The last step fails. I get this message:
$ sudo systemctl status docker
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
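The error says that PID 1 is not systemd, so `systemctl` has nothing to talk to. A quick way to confirm what is running as PID 1 is to read `/proc/1/comm`. In this sketch `init_name` is a hypothetical helper, and the suggested `sudo service docker start` assumes the docker-ce package installed its SysV init script.

```shell
#!/bin/sh
# Print the name of the process running as PID 1 (read from /proc, so it
# works even where `ps` is not installed).
init_name() {
  cat /proc/1/comm
}

if [ "$(init_name)" = "systemd" ]; then
  echo "PID 1 is systemd: 'sudo systemctl start docker' should work"
else
  echo "PID 1 is '$(init_name)', not systemd: start the daemon by hand," \
       "e.g. 'sudo service docker start' or 'sudo dockerd &'"
fi
```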
That led me here, and the fourth (and almost lowest-scored) answer seems to work, except that dockerd has to be run in the background (replace your-user with your Linux user name):
sudo dockerd &
sudo usermod -aG docker your-user
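Running `sudo dockerd &` leaves the daemon printing its log to the same terminal, which is presumably where the interleaved `ERRO[...]` lines further down come from. Also note that the `usermod` group change only takes effect in a new login session. The sketch below is a hypothetical `start_dockerd` helper that starts the daemon only if it is not already running, sends its output to a log file (the path `/var/log/dockerd.log` is an assumption), and waits until `sudo docker info` succeeds.

```shell
#!/bin/sh
# Hypothetical helper: start dockerd in the background unless it is already
# running, then wait until the client can talk to it.
# The log path is an assumption; any file root can write to works.
DOCKERD_LOG=/var/log/dockerd.log

start_dockerd() {
  if pgrep -x dockerd >/dev/null 2>&1; then
    echo "dockerd is already running"
    return 0
  fi
  # Redirect the daemon's output so it does not interleave with your shell.
  sudo sh -c "nohup dockerd </dev/null >>'$DOCKERD_LOG' 2>&1 &"
  # Poll for up to ~30 seconds until the daemon answers.
  i=0
  while [ "$i" -lt 30 ]; do
    if sudo docker info >/dev/null 2>&1; then
      echo "dockerd is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "dockerd did not come up; see $DOCKERD_LOG" >&2
  return 1
}
```

Calling `start_dockerd` twice is harmless: the second call sees the running daemon and returns immediately.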
Then I went back to the guide at the step after the Docker install and resumed with
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
and this fails with
ERRO[2020-06-23T07:28:28.582848400-04:00] 5cd9b9d7011ba20f72971dd27900b23b2c0f6be656b0bd53b9e178944fe4eba6 cleanup: failed to delete container from containerd: no such container
ERRO[2020-06-23T07:28:28.582946600-04:00] Handler for POST /v1.40/containers/5cd9b9d7011ba20f72971dd27900b23b2c0f6be656b0bd53b9e178944fe4eba6/start returned error: could not select device driver "" with capabilities: [[gpu]]
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0018] error waiting for container: context canceled
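My reading of `could not select device driver "" with capabilities: [[gpu]]` is that Docker had no GPU device driver registered at all. As far as I know, Docker's `--gpus` support looks for the `nvidia-container-runtime-hook` executable from the NVIDIA Container Toolkit on its PATH; treat that hook name as an assumption to verify against your installed toolkit. A hypothetical `has_gpu_hook` check:

```shell
#!/bin/sh
# Succeed if the NVIDIA hook that Docker's --gpus flag relies on
# (assumed to be nvidia-container-runtime-hook) is on PATH.
has_gpu_hook() {
  command -v nvidia-container-runtime-hook >/dev/null 2>&1
}

if has_gpu_hook; then
  echo "GPU hook found: $(command -v nvidia-container-runtime-hook)"
else
  echo "GPU hook missing: install the NVIDIA Container Toolkit (e.g. nvidia-docker2)," \
       "then restart dockerd"
fi
```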
Finally, I went back to the NVIDIA announcement and ran these steps:
sudo apt-get update
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo dockerd &
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
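As far as I can tell, dockerd checks for the NVIDIA Container Toolkit only when it starts, so a daemon launched before `nvidia-docker2` was installed keeps failing until it is restarted (which is presumably why `sudo dockerd &` appears again in these steps). Without systemd there is no `systemctl restart docker`; the sketch below is a hypothetical `restart_dockerd` helper, and its log path is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: stop a hand-started dockerd (if any) and start a fresh
# one, so it picks up a newly installed NVIDIA Container Toolkit.
restart_dockerd() {
  if pgrep -x dockerd >/dev/null 2>&1; then
    sudo pkill -x dockerd          # SIGTERM: lets dockerd shut down cleanly
    # Give the old daemon up to ~15 seconds to exit and remove its pid file.
    i=0
    while pgrep -x dockerd >/dev/null 2>&1 && [ "$i" -lt 15 ]; do
      i=$((i + 1))
      sleep 1
    done
  fi
  # Start in the background with output going to a log file (path assumed).
  sudo sh -c 'nohup dockerd </dev/null >>/var/log/dockerd.log 2>&1 &'
}
```

Run `restart_dockerd` after `sudo apt-get install -y nvidia-docker2`, then rerun the `docker run --gpus all ...` test.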
Success! I got a happy result:
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Quadro M500M" with compute capability 5.0
> Compute 5.0 CUDA device: [Quadro M500M]
3072 bodies, total time for 10 iterations: 3.817 ms
= 24.724 billion interactions per second
= 494.487 single-precision GFLOP/s at 20 flops per interaction
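The three benchmark figures are consistent with each other. Assuming the nbody sample counts every body against every other body in each iteration (which the printed numbers bear out), with $N = 3072$ bodies, $k = 10$ iterations and $t = 3.817\,\text{ms}$:

```latex
\begin{aligned}
\text{interactions/s} &= \frac{N^2 k}{t}
  = \frac{3072^2 \times 10}{3.817 \times 10^{-3}\,\text{s}}
  = \frac{94{,}371{,}840}{3.817 \times 10^{-3}\,\text{s}}
  \approx 24.724 \times 10^{9}\,\text{s}^{-1} \\
\text{GFLOP/s} &\approx 24.724 \times 20 \approx 494.5
\end{aligned}
```

The small gap to the printed 494.487 comes from the total time being rounded to 3.817 ms.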
However, as the answer below explains, nvidia-smi does not work yet; this is a known NVIDIA limitation.
Further note: the Docker container test above works in the Ubuntu shell. It does not work in Windows PowerShell Preview with the Ubuntu tab.
If nbody works, then everything is configured correctly. The problem is a limitation of the NVIDIA driver: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations
NVIDIA Management Library (NVML) APIs are not supported.
nvidia-smi is built on top of the NVIDIA Management Library (NVML).
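Because nvidia-smi depends on NVML, a missing or failing nvidia-smi under WSL 2 says nothing about whether CUDA itself works; the nbody container is the better test. The sketch below is a hypothetical `nvidia_smi_status` helper that distinguishes the two failure modes seen in the question: not installed, and installed but failing (such as "No devices found", presumably with a non-zero exit status).

```shell
#!/bin/sh
# Hypothetical helper: classify the state of nvidia-smi without treating a
# failure as proof that the GPU is unusable (NVML is unsupported on WSL 2).
nvidia_smi_status() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "missing"
  elif nvidia-smi >/dev/null 2>&1; then
    echo "ok"
  else
    echo "failing"
  fi
}

case "$(nvidia_smi_status)" in
  ok)      echo "nvidia-smi works" ;;
  missing) echo "nvidia-smi is not installed" ;;
  failing) echo "nvidia-smi is installed but fails; NVML is not supported on WSL 2," \
                "so test CUDA with the nbody container instead" ;;
esac
```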
An update to @onoma's answer. From https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations :
6. nvidia-smi is not yet packaged for CUDA on WSL 2.
Hopefully NVIDIA will solve this in the future.