Relative paths in config.yaml for Snakefile

How can I use relative paths in my configuration file so that users do not need to change USER in the paths for output directories?

I have this:

config.yml

proj_name: H1N1_rhesus
contact:
  email: user.edu
  person: user
01-preprocess: /home/user/2022-h1n1/01-preprocess/
02-salmon: /home/user/2022-h1n1/02-salmon/
raw-data: /tmp/H1N1_rhesus/
reference: /tmp/

Snakefile

#----SET VARIABLES----#
PROJ = config["proj_name"]
INPUTDIR = config["raw-data"]
PREPROCESS = config["01-preprocess"]
SALMON = config["02-salmon"]
REFERENCE = config["reference"]

But would like to do something like this:

proj_name: H1N1_rhesus
contact:
  email: user.edu
  person: user
01-preprocess: /home/$(USER)/2022-h1n1/01-preprocess/
02-salmon: /home/$(USER)/2022-h1n1/02-salmon/
raw-data: /tmp/H1N1_rhesus/
reference: /tmp/

Or this:

proj_name: H1N1_rhesus
contact:
  email: user.edu
  person: user
01-preprocess: /home/$(PWD)/01-preprocess/
02-salmon: /home/$(PWD)/02-salmon/
raw-data: /tmp/H1N1_rhesus/
reference: /tmp/

But none of the methods I tried worked.


Solution 1:

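One option is to put str.format-style placeholders in the config and fill them in inside the Snakefile (note that this uses str.format, not an f-string, because the template is loaded at runtime). So the .yaml could contain: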

proj_name: H1N1_rhesus
paths:
   01-preprocess: /home/{user}/2022-h1n1/01-preprocess/
   02-salmon: /home/{user}/2022-h1n1/02-salmon/
   raw-data: /tmp/H1N1_rhesus/
   reference: /tmp/

And inside Snakefile you would have:

configfile: 'config.yaml'

# to identify the user, see comments: https://stackoverflow.com/a/842096/10693596
import getpass

# fill the {user} placeholder in every path template from the config
paths = {k: v.format(user=getpass.getuser()) for k, v in config['paths'].items()}

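The paths object is a dictionary that maps each key from the config to its formatted path, so paths['01-preprocess'] can be used where the Snakefile previously read config["01-preprocess"].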
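
To check the substitution outside of Snakemake, here is a minimal sketch in plain Python. It assumes a hard-coded dictionary standing in for config['paths'], and the helper name format_paths and the user name "alice" are made up for illustration:

```python
import getpass


def format_paths(raw_paths, user=None):
    """Fill the {user} placeholder in every path template.

    If no user is given, fall back to the login name of the current user.
    """
    if user is None:
        user = getpass.getuser()
    return {key: template.format(user=user) for key, template in raw_paths.items()}


# Stand-in for config['paths'] as it would be loaded from the YAML above.
raw_paths = {
    "01-preprocess": "/home/{user}/2022-h1n1/01-preprocess/",
    "02-salmon": "/home/{user}/2022-h1n1/02-salmon/",
    "raw-data": "/tmp/H1N1_rhesus/",
    "reference": "/tmp/",
}

# Pass an explicit user here so the result does not depend on who runs it.
paths = format_paths(raw_paths, user="alice")
print(paths["01-preprocess"])  # /home/alice/2022-h1n1/01-preprocess/
```

Paths without a {user} placeholder pass through unchanged. One caveat: str.format treats every pair of curly braces as a placeholder, so a path containing literal braces would need them doubled ({{ and }}).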

Solution 2:

Another option is to use the intake package for defining catalogues of data. Its catalogue files can reference environment variables, for example:

sources:
  01-preprocess:
    args:
      url: "/home/{{env(USER)}}/2022-h1n1/01-preprocess/"

Inside Snakefile, you would have:

import intake
cat = intake.open_catalog('config.yml')
data = cat['01-preprocess'].urlpath
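
If pulling in intake only for environment variables feels heavy, a similar effect may be possible with the standard library alone. This is a different technique from the catalogue approach above: a rough sketch using os.path.expandvars and os.path.expanduser, assuming the placeholder is written as $USER or ${USER} rather than $(USER), since expandvars only recognizes the first two forms. The helper name expand_env_paths and the sample dictionary are made up for illustration:

```python
import os


def expand_env_paths(raw_paths):
    """Expand $VAR / ${VAR} references and a leading ~ in each path."""
    return {
        key: os.path.expanduser(os.path.expandvars(value))
        for key, value in raw_paths.items()
    }


# Stand-in for the path entries loaded from the config file.
raw_paths = {
    "01-preprocess": "/home/${USER}/2022-h1n1/01-preprocess/",
    "02-salmon": "~/2022-h1n1/02-salmon/",
    "raw-data": "/tmp/H1N1_rhesus/",
}

paths = expand_env_paths(raw_paths)
for name, path in paths.items():
    print(name, path)
```

Variables that are not set in the environment are left in the string as written, so it may be worth checking the result for a leftover $ before using it.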