lspci returns "Cannot open /sys/bus/pci/devices/xxxxx/resource: No such file or directory"
My Ubuntu 16.10 server VM in MS Azure (NV6 series) suddenly had a hiccup for unknown reasons (none of my doing). I had to restart it, and when it came back online I was no longer able to use the GPU on the machine.
The nvidia-smi application freezes.
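For what it's worth, the running kernel and any driver errors after the reboot can be checked with something like the following (standard commands, nothing Azure-specific):
# which kernel the VM came back up on
uname -r
# look for NVIDIA or PCI errors logged since boot
dmesg | grep -iE 'nvidia|nvrm|pci'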
The command lspci yields
lspci: Cannot open /sys/bus/pci/devices/7ec1:00:00.0/resource: No such file or directory
And of course, that path does not exist (anymore?). What does exist is
$: ls /sys/bus/pci/devices/
0000:00:00.0/ 0000:00:07.0/ 0000:00:07.1/ 0000:00:07.3/ 0000:00:08.0/ b717ec1:00:00.0/
Some googling turned up a few questions similar to mine, many of which have been asked in the last 24 hours, like this one.
This might be due to Ubuntu or Azure; I have no idea which is the source of the problem or how to solve it.
Anyone have any ideas?
I was having the same problem (using Azure NC24 instances), and after working at it for a few hours I found this post and decided to submit a support request to Microsoft. Here's what they told me:
Canonical appears to have recently released kernel 4.4.0-75 for Ubuntu 16.04 and this is having an adverse effect on Tesla GPUs on NC-series VMs. Installation of the 4.4.0-75 breaks the 8.0.61-1 version of the NVIDIA CUDA driver that’s currently recommended for use on these systems, resulting in nvidia-smi not showing the adapters and lspci returning an error similar to the following:
root@pd-nvtest2:~# lspci
lspci: Cannot open /sys/bus/pci/devices/2baf:00:00.0/resource: No such file or directory
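Before removing anything, it's worth double-checking that you're actually booted into the offending kernel; something along these lines should do it (plain uname/dpkg, adjust the pattern if your version differs):
# currently running kernel
uname -r
# installed kernel images (the "ii" lines are installed)
dpkg -l 'linux-image-4.4.0-*' | grep ^ii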
They suggest backing up the OS drive, running
apt-get remove linux-image-4.4.0-75-generic
and then
update-grub
Reboot and it should work! At the very least, doing that fixed the lspci output for me; I still needed to fix some CUDA stuff, but that was left over from earlier debugging attempts.
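For anyone else hitting this, the whole thing boils down to roughly the following; the verification commands at the end are just my own sanity check, not part of Microsoft's instructions:
# after backing up the OS disk in the Azure portal (run as root or with sudo):
apt-get remove linux-image-4.4.0-75-generic
update-grub
reboot
# once the VM is back up, confirm the GPU is visible again
lspci | grep -i nvidia
nvidia-smi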