NVIDIA NVML Driver/library version mismatch [closed]

When I run nvidia-smi I get the following message:

Failed to initialize NVML: Driver/library version mismatch

An hour ago I received the same message, uninstalled my CUDA library, and was then able to run nvidia-smi, getting the following result:

[screenshot: nvidia-smi output]

After this I downloaded cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from the official NVIDIA page and then simply ran:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

Now I have cuda installed, but I get the mentioned mismatch error.


Some potentially useful information:

Running cat /proc/driver/nvidia/version I get:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  378.13  Tue Feb  7 20:10:06 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

I'm running Ubuntu 16.04.2 LTS.

Kernel release is: 4.4.0-66-generic.

Thanks!


Solution 1:

Surprise surprise, rebooting solved the issue (I thought I had already tried that).

The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it's pretty similar to what I did to solve the issue the first time I had it.

Solution 2:

As @etal said, rebooting can solve this problem, but I think a procedure without rebooting will help.

A Chinese version (中文版) of this answer is available on my blog.

The error message

NVML: Driver/library version mismatch

tells us that the loaded NVIDIA kernel module (kmod) has the wrong version, so we should unload it and then load the correct version.

How do we do that?

First, we should know which drivers are loaded.

lsmod | grep nvidia

You may get:

nvidia_uvm            634880  8
nvidia_drm             53248  0
nvidia_modeset        790528  1 nvidia_drm
nvidia              12312576  86 nvidia_modeset,nvidia_uvm

Our final goal is to unload the nvidia module, so we should first unload the modules that depend on it:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm

Then unload nvidia:

sudo rmmod nvidia
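The module names and their order can differ between driver versions, so the sequence above may not match every machine. The following is a minimal sketch that derives a removal order from the "Used by" column of lsmod and prints the rmmod commands without running them. The function name nvidia_unload_order is made up for illustration, and the sketch assumes the lsmod column layout shown above.

```shell
#!/bin/sh
# Read lsmod output on stdin and print the nvidia* modules in an order in
# which they can be removed: a module is printed only after every nvidia
# module listed in its "Used by" column has been printed.
nvidia_unload_order() {
  awk '
    $1 ~ /^nvidia/ { name[++n] = $1; users[$1] = $4; pending[$1] = 1 }
    END {
      left = n
      while (left > 0) {
        progressed = 0
        for (i = 1; i <= n; i++) {
          m = name[i]
          if (!(m in pending)) continue
          blocked = 0
          k = split(users[m], u, ",")
          for (j = 1; j <= k; j++)
            if (u[j] in pending) blocked = 1
          if (!blocked) { print m; delete pending[m]; left--; progressed = 1 }
        }
        if (!progressed) break   # dependency cycle: give up
      }
    }'
}

# Dry run: only print the commands, do not execute them.
if command -v lsmod >/dev/null 2>&1; then
  lsmod 2>/dev/null | nvidia_unload_order | sed 's/^/sudo rmmod /'
fi
```

For the lsmod output shown earlier this prints nvidia_uvm, nvidia_drm, nvidia_modeset and finally nvidia, which is one valid order. Modules used by non-NVIDIA modules or by running processes will still refuse to unload.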

Troubleshooting

If you get an error like rmmod: ERROR: Module nvidia is in use, the kernel module is still in use, so you should kill the processes that are using it. First list them:

sudo lsof /dev/nvidia*

Kill those processes, then continue unloading the kmods.
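The raw lsof listing usually repeats the same process once per open device node. As a rough sketch, the filter below reduces it to one line per process. The function name holders_from_lsof is hypothetical, and the parsing assumes the default lsof column order (COMMAND, PID, USER, ...).

```shell
#!/bin/sh
# Read default lsof output on stdin and print one "PID COMMAND" line per
# distinct process. Assumes the usual column order: COMMAND PID USER ...
holders_from_lsof() {
  awk 'NR > 1 && !seen[$2]++ { print $2, $1 }'
}

# Only query lsof when it is installed and NVIDIA device nodes exist.
# Run this as root to see processes owned by every user.
if command -v lsof >/dev/null 2>&1 && ls /dev/nvidia* >/dev/null 2>&1; then
  lsof /dev/nvidia* 2>/dev/null | holders_from_lsof
fi
```

Each listed PID can then be stopped with kill before retrying rmmod.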

Test

Confirm that you successfully unloaded those kmods:

lsmod | grep nvidia

You should get no output. Then confirm that the correct driver loads:

nvidia-smi

You should now get the normal nvidia-smi output.
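To double-check the result, it can help to compare the version of the module that is loaded with the version of the module file on disk. The sketch below reads /proc/driver/nvidia/version (the file quoted in the question) and asks modinfo for the on-disk version. The function name nvrm_version is made up for illustration, and the parsing assumes the version is the first dotted number on the NVRM line and that the module resolves under the name nvidia.

```shell
#!/bin/sh
# Print the driver version found on the "NVRM version:" line read from stdin.
# Assumes the version is the first dotted number on that line.
nvrm_version() {
  sed -n '/^NVRM version:/p' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Version of the kernel module that is currently loaded.
if [ -r /proc/driver/nvidia/version ]; then
  echo "loaded module:  $(nvrm_version < /proc/driver/nvidia/version)"
fi

# Version of the module file that modprobe would load next.
if command -v modinfo >/dev/null 2>&1; then
  echo "on-disk module: $(modinfo -F version nvidia 2>/dev/null)"
fi
```

If the two versions differ, that suggests the old module is still loaded and the unload steps (or a reboot) are still needed.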

Solution 3:

I was having this problem, and none of the other remedies worked. The error message was opaque, but checking dmesg was key:

[   10.118255] NVRM: API mismatch: the client has the version 410.79, but
           NVRM: this kernel module has the version 384.130.  Please
           NVRM: make sure that this kernel module and all NVIDIA driver
           NVRM: components have the same version.
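To pull the two version numbers out of a message like this one, a small filter along the following lines may help. It is only a sketch: the function name nvrm_api_mismatch is hypothetical, and the pattern assumes the exact message wording shown above.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "client=<v> module=<v>" for an
# NVRM API mismatch message (the last one, if several are present).
# The message spans several lines, so newlines are folded into spaces first.
nvrm_api_mismatch() {
  { tr '\n' ' '; echo; } |
    sed -n 's/.*the client has the version \([0-9.]*[0-9]\), but.*this kernel module has the version \([0-9.]*[0-9]\)\..*/client=\1 module=\2/p'
}

# Reading the kernel log may require root on some systems.
if command -v dmesg >/dev/null 2>&1; then
  dmesg 2>/dev/null | nvrm_api_mismatch
fi
```

Here "client" is the user-space library version and "module" is the version of the kernel module that is loaded.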

However, I had completely removed the 384 version, along with any remaining nvidia-384* kernel drivers, and even after a reboot I was still getting this. That suggested the boot image still contained the 384 module, while everything else on the system was 410. So I rebuilt the initramfs for my kernel (this regenerates the initial RAM disk; it does not recompile the kernel itself):

# uname -a    # find the running kernel
Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# update-initramfs -c -k 4.13.0-43-generic    # rebuild the initramfs for that kernel
# reboot

And then it worked.

After removing 384, I still had 384 files in:

/var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
/lib/modules/4.13.0-43-generic/kernel/drivers
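One way to look for such leftovers is sketched below. It lists NVIDIA module files under the running kernel's module directory and under /var/lib/dkms together with the version modinfo reports for each, then lists any NVIDIA modules packed into the initramfs. The function name nvidia_module_files is made up for illustration, and the file name pattern and the /boot/initrd.img-<release> path are assumptions that fit Ubuntu but may not fit other distributions.

```shell
#!/bin/sh
# Filter a list of file paths (one per line on stdin) down to NVIDIA kernel
# module files such as nvidia.ko, nvidia_384.ko or nvidia-uvm.ko.xz.
# Unrelated modules like nvidiafb.ko are deliberately not matched.
nvidia_module_files() {
  grep -E '/nvidia([_-][^/]*)?\.ko(\.[a-z0-9]+)?$' || true
}

# Module files on disk, each prefixed with the version modinfo reports.
kdir="/lib/modules/$(uname -r)"
if [ -d "$kdir" ]; then
  find "$kdir" /var/lib/dkms -type f 2>/dev/null | nvidia_module_files |
  while IFS= read -r f; do
    printf '%s\t%s\n' "$(modinfo -F version "$f" 2>/dev/null)" "$f"
  done
fi

# Modules packed into the initramfs (lsinitramfs ships with initramfs-tools).
img="/boot/initrd.img-$(uname -r)"
if command -v lsinitramfs >/dev/null 2>&1 && [ -r "$img" ]; then
  lsinitramfs "$img" 2>/dev/null | nvidia_module_files
fi
```

Any file that reports the old version is a candidate for the stale module that keeps getting loaded at boot.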

I recommend using the locate command (not installed by default) rather than searching the filesystem every time.

Solution 4:

The top two answers did not solve my problem. I found a solution on the official NVIDIA forum that did. The error below may be caused by installing two different versions of the driver through different methods, for example installing the NVIDIA driver both via apt and via the official installer.

Failed to initialize NVML: Driver/library version mismatch

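To solve this problem, you only need to run one of the following two commands (the first removes packages installed via apt; the second removes a driver installed by the official installer).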

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
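Before removing anything, it may be worth checking which installation route actually left a driver behind. The sketch below lists installed packages with nvidia in their name and checks whether the uninstaller from the official installer is present. The function name installed_nvidia_packages is hypothetical, and the parsing assumes the usual dpkg -l column layout.

```shell
#!/bin/sh
# Read dpkg -l output on stdin and print the names of installed ("ii")
# packages whose name contains "nvidia".
installed_nvidia_packages() {
  awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Drivers installed through apt/dpkg.
if command -v dpkg >/dev/null 2>&1; then
  echo "Installed via apt/dpkg:"
  dpkg -l 2>/dev/null | installed_nvidia_packages
fi

# The official installer leaves its own uninstaller behind.
if [ -x /usr/bin/nvidia-uninstall ]; then
  echo "Installed via the official installer (nvidia-uninstall is present)"
fi
```

If both checks report a driver, that matches the mixed-install situation described above.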