Need a newer GLIBC on CentOS 6 to use PyTorch on GPUs
I am using a supercomputer facility that runs CentOS 6. The node I want to use has three Tesla V100 GPUs. The problem is that the version of GLIBC installed on that node is not compatible with the latest versions of PyTorch, which I must use.
I do not have root. Hence, I need a way to use another GLIBC at user level. I can talk to the sysadmins and have things done as root (such as using Docker, or something like that), but I cannot reinstall the OS or GLIBC globally. I have tried to install GLIBC by myself without root, but could not do it right; it did not work. It takes too much time, and I cannot find a tutorial to do it right.
I have some ideas in mind, such as running a container that can access the node and use another OS; when my scheduled computing time ends, I can leave the node the same way I got it for the next user of the supercomputer facility.
I was also thinking about chroot: download an ISO of CentOS 8 and chroot into it. But I do not know whether it would use the GLIBC of the host OS or the CentOS 8 GLIBC.
What do you recommend I do? Do you think Docker would suit my needs? Or another containerization solution? Or should I keep trying to install another GLIBC?
Solution 1:
"I have tried to install GLIBC by myself without root, but could not do it right; it did not work. It takes too much time, and I cannot find a tutorial to do it right."
It is far too difficult to install a different libc in parallel with the existing one. It is not something I would attempt, especially on a shared system.
Containers are chroots but better, with more isolation. Use containers instead of chroots so you can take advantage of tooling and pre-existing images.
However, there are kernel requirements for the container host (or virtual machine). You will need NVIDIA's Linux kernel driver. Recent distros are supported, like EL 7 and EL 8 via DKMS.
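To make the split between userland and kernel concrete: a container (or chroot) brings its own C library, but it still runs on the host's kernel. A minimal sketch, assuming Python 3 is available in whatever environment you test, that prints both; the function name `describe_runtime` is just an illustration:

```python
import platform


def describe_runtime():
    """Return the C library and the kernel the current process runs on."""
    # libc_ver() inspects the running Python binary; it returns empty
    # strings when it cannot identify the C library.
    libc_name, libc_version = platform.libc_ver()
    return {
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
        "kernel": platform.release(),
    }


if __name__ == "__main__":
    info = describe_runtime()
    print("C library:", info["libc"])
    print("Kernel:   ", info["kernel"])
```

Run it once on the host and once inside the container or chroot. I would expect the C library line to change with the image while the kernel line stays the same, which is why the host kernel and its drivers still matter.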
Yes, it's possible to use a GPU from a container. See "How to enable NVIDIA GPUs in containers on bare metal in RHEL 8" and the shell scripts extracted from it. That shows you how to install kmod-nvidia-latest-dkms and nvidia-container-toolkit, plus some SELinux policy to keep containers isolated.
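Once a container is running with the GPU passed through, a quick check from inside it can confirm that PyTorch loads and sees the devices. This is only a sketch, assuming PyTorch is installed in the image; `gpu_report` is a hypothetical helper name, not part of PyTorch:

```python
def gpu_report():
    """Report whether PyTorch imports and whether it can see any CUDA GPU."""
    try:
        import torch
    except (ImportError, OSError):
        # A missing package or an incompatible C library both end up here.
        return {"torch_installed": False, "cuda_available": False, "device_count": 0}

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "cuda_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
    }


if __name__ == "__main__":
    print(gpu_report())
```

If `cuda_available` comes back false inside the container, the usual suspects are the host kernel driver or the container toolkit setup rather than PyTorch itself.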
I don't think you can escape an upgrade of the host OS. CentOS 6 is end of life and no longer receives security updates. Kernel drivers you would want for this GPU are only supported on later OSes. Red Hat's favorite container tooling, podman, doesn't exist on the older OS either.