Why are packages installed rather than just linked to a specific environment?
I've noticed that when Python packages are installed with the usual package managers, they end up in /home/user/anaconda3/envs/env_name/ when using conda, and in /home/user/anaconda3/envs/env_name/lib/python3.6/site-packages/ when using pip inside a conda environment.
But conda caches all the recently downloaded packages too.
So, my question is: why doesn't conda install all packages in one central location and, when a package is installed into a specific environment, create a link to that location rather than installing a copy there?
I've noticed that environments grow quite big and that this method would probably be able to save a bit of space.
Conda already does this. However, because it leverages hardlinks, it is easy to overestimate the space really being used, especially if one only looks at the size of a single env at a time.
To illustrate, let's use du to inspect the real disk usage. First, if I count each environment directory individually, I get the uncorrected per-env usage:
$ for d in envs/*; do du -sh "$d"; done
2.4G envs/pymc36
1.7G envs/pymc3_27
1.4G envs/r-keras
1.7G envs/stan
1.2G envs/velocyto
which is what it might look like from a GUI.
Instead, if I let du count them together (i.e., correcting for the hardlinks), we get:
$ du -sh envs/*
2.4G envs/pymc36
326M envs/pymc3_27
820M envs/r-keras
927M envs/stan
548M envs/velocyto
One can see that a significant amount of space is already being saved here.
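The same effect can be reproduced in isolation. The following is only a minimal sketch: the directory names pkgs and envs/demo and the file lib.bin are stand-ins invented for illustration, not conda's actual layout, and the sizes du reports will vary by filesystem.

```shell
set -eu

# Work in a throwaway directory; "pkgs" and "envs/demo" are stand-in names.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/pkgs" "$tmp/envs/demo"

# Create a 1 MiB file in the stand-in package cache.
dd if=/dev/zero of="$tmp/pkgs/lib.bin" bs=1024 count=1024 2>/dev/null

# Hardlink it into the stand-in environment (no data is copied).
ln "$tmp/pkgs/lib.bin" "$tmp/envs/demo/lib.bin"

# Both names refer to the same inode, so -ef treats them as the same file.
if [ "$tmp/pkgs/lib.bin" -ef "$tmp/envs/demo/lib.bin" ]; then
    same_file=yes
else
    same_file=no
fi
echo "same file: $same_file"

# Measured alone, the env appears to hold the whole file...
du -sk "$tmp/envs/demo"
# ...but measured together, du counts the shared blocks only once.
du -sk "$tmp/pkgs" "$tmp/envs/demo"
```

In the second du call, the file is attributed to whichever directory is listed first, which is why the order of arguments matters in the listings here.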
Most of the hardlinks go back to the pkgs directory, so if we include that as well:
$ du -sh pkgs envs/*
8.2G pkgs
400M envs/pymc36
116M envs/pymc3_27
92M envs/r-keras
62M envs/stan
162M envs/velocyto
one can see that, outside of the shared packages, the envs are fairly light. If you're concerned about the size of my pkgs, note that I have never run conda clean on this system, so my pkgs directory is full of tarballs and superseded packages, plus some infrastructure I keep in base (e.g., Jupyter, Git, etc.).
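To get a rough sense of how much of an env is shared rather than private, one option is to count the regular files whose link count is above one. This is a sketch, not something conda provides: count_shared_files is a hypothetical helper name, and a link count above one only shows that a file has another name somewhere on the filesystem, not that the other name lives in pkgs.

```shell
# Hypothetical helper: count regular files under a directory
# that have more than one hardlink (i.e., are shared with another path).
count_shared_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Companion helper: count all regular files, for comparison.
count_all_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

For example, running count_shared_files envs/pymc36 next to count_all_files envs/pymc36 would show what fraction of that env's files are hardlinks.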