Relative paths in config.yaml for Snakefile
How can I use relative paths in my configuration file so that users do not need to change USER
in the paths for output directories?
I have this:
config.yml
proj_name: H1N1_rhesus
contact:
  email: user.edu
  person: user
01-preprocess: /home/user/2022-h1n1/01-preprocess/
02-salmon: /home/user/2022-h1n1/02-salmon/
raw-data: /tmp/H1N1_rhesus/
reference: /tmp/
Snakefile
#----SET VARIABLES----#
PROJ = config["proj_name"]
INPUTDIR = config["raw-data"]
PREPROCESS = config["01-preprocess"]
SALMON = config["02-salmon"]
REFERENCE = config["reference"]
But would like to do something like this:
proj_name: H1N1_rhesus
contact:
  email: user.edu
  person: user
01-preprocess: /home/$(USER)/2022-h1n1/01-preprocess/
02-salmon: /home/$(USER)/2022-h1n1/02-salmon/
raw-data: /tmp/H1N1_rhesus/
reference: /tmp/
Or this:
proj_name: H1N1_rhesus
contact:
  email: user.edu
  person: user
01-preprocess: /home/$(PWD)/01-preprocess/
02-salmon: /home/$(PWD)/02-salmon/
raw-data: /tmp/H1N1_rhesus/
reference: /tmp/
But none of the methods I tried worked.
Solution 1:
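One option is to put a {user} placeholder in the paths and fill it in with str.format inside the Snakefile
(this is str.format, not an f-string, because the template is read from a file at runtime). So the .yaml
could contain: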
proj_name: H1N1_rhesus
paths:
  01-preprocess: /home/{user}/2022-h1n1/01-preprocess/
  02-salmon: /home/{user}/2022-h1n1/02-salmon/
  raw-data: /tmp/H1N1_rhesus/
  reference: /tmp/
And inside the Snakefile you would have:
configfile: 'config.yaml'
# to identify the user, see comments: https://stackoverflow.com/a/842096/10693596
import getpass
paths = {k: v.format(user=getpass.getuser()) for k,v in config['paths'].items()}
The paths object is a dictionary with the formatted paths.
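To check the formatting step outside of Snakemake, here is a minimal sketch in plain Python. The example_paths dictionary stands in for config['paths'], and format_paths is a hypothetical helper name, not part of Snakemake. It assumes no path contains literal curly braces other than the {user} placeholder, since str.format would try to interpret them.

```python
import getpass


def format_paths(paths, user=None):
    """Fill the {user} placeholder in every path template.

    Hypothetical helper; falls back to the current login name
    when no user is given.
    """
    if user is None:
        user = getpass.getuser()
    return {name: template.format(user=user) for name, template in paths.items()}


# Stand-in for config['paths'] as loaded from config.yaml
example_paths = {
    "01-preprocess": "/home/{user}/2022-h1n1/01-preprocess/",
    "02-salmon": "/home/{user}/2022-h1n1/02-salmon/",
    "raw-data": "/tmp/H1N1_rhesus/",
    "reference": "/tmp/",
}

# A fixed user name is passed here only to make the output predictable
paths = format_paths(example_paths, user="alice")
print(paths["01-preprocess"])  # /home/alice/2022-h1n1/01-preprocess/
print(paths["reference"])      # /tmp/
```

Paths without a placeholder pass through unchanged, so the same dictionary can mix per-user and shared locations.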
Solution 2:
Another option is to use intake for defining catalogues of data. This allows references to environment variables, for example:
sources:
  01-preprocess:
    args:
      url: "/home/{{env(USER)}}/2022-h1n1/01-preprocess/"
Inside the Snakefile, you would have:
import intake
cat = intake.open_catalog('config.yml')
data = cat['01-preprocess'].urlpath