lspci returns "Cannot open /sys/bus/pci/devices/xxxxx/resource: No such file or directory"
My Ubuntu 16.10 server VM in MS Azure (NV6 series) suddenly had a hiccup for unknown reasons (none of my doing). I had to restart it, and when it came back online I was no longer able to use the GPU on the machine.
The nvidia-smi application freezes.
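For what it's worth, the running kernel and any driver errors after the reboot can be checked with something like the following (standard commands, nothing Azure-specific):
# which kernel the VM came back up on
uname -r
# look for NVIDIA or PCI errors logged since boot
dmesg | grep -iE 'nvidia|nvrm|pci'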
The command lspci yields
lspci: Cannot open /sys/bus/pci/devices/7ec1:00:00.0/resource: No such file or directory
And of course, that path does not exist (anymore?). What does exist is
$: ls /sys/bus/pci/devices/
0000:00:00.0/ 0000:00:07.0/ 0000:00:07.1/ 0000:00:07.3/ 0000:00:08.0/ b717ec1:00:00.0/
Some googling turned up a few questions similar to mine, many of which have been asked in the last 24 hours, like this one.
This might be due to Ubuntu or Azure; I have no idea which is the source of the problem or how to solve it.
Anyone have any ideas?
I was having the same problem (using Azure NC24 instances), and after working at it for a few hours I found this post and decided to submit a support request to Microsoft. Here's what they told me:
Canonical appears to have recently released kernel 4.4.0-75 for Ubuntu 16.04 and this is having an adverse effect on Tesla GPUs on NC-series VMs. Installation of the 4.4.0-75 breaks the 8.0.61-1 version of the NVIDIA CUDA driver that’s currently recommended for use on these systems, resulting in nvidia-smi not showing the adapters and lspci returning an error similar to the following:
root@pd-nvtest2:~# lspci
lspci: Cannot open /sys/bus/pci/devices/2baf:00:00.0/resource: No such file or directory
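Before removing anything, it's worth double-checking that you're actually booted into the offending kernel; something along these lines should do it (plain uname/dpkg, adjust the pattern if your version differs):
# currently running kernel
uname -r
# installed kernel images (the "ii" lines are installed)
dpkg -l 'linux-image-4.4.0-*' | grep ^ii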
They suggest backing up the OS drive, running
apt-get remove linux-image-4.4.0-75-generic
and then
update-grub
Reboot and it should work! At the very least, doing that fixed the lspci output for me; I still needed to fix some CUDA stuff, but that was left over from earlier debugging attempts.
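For anyone else hitting this, the whole thing boils down to roughly the following; the verification commands at the end are just my own sanity check, not part of Microsoft's instructions:
# after backing up the OS disk in the Azure portal (run as root or with sudo):
apt-get remove linux-image-4.4.0-75-generic
update-grub
reboot
# once the VM is back up, confirm the GPU is visible again
lspci | grep -i nvidia
nvidia-smi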