Nextflow: publishDir, output channels, and output subdirectories
Solution 1:
Nextflow processes are designed to run isolated from each other, but this can be circumvented somewhat when the command-line input and/or outputs are specified using params
. Using params like this can be problematic because if, for example, a params variable specifies an absolute path but your output declaration expects files in the Nextflow working directory (e.g. ./work/fc/0249e72585c03d08e31ce154b6d873
), you will get the 'Missing output file(s) expected by process' error you're seeing.
The solution is to ensure your inputs are localized in the working directory using an input declaration block and that the outputs are also written to the work dir. Note that only files specified in the output declaration block can be published using the publishDir directive.
Also, best to avoid calling Singularity manually in your script block. Instead just add singularity.enabled = true
to your nextflow.config
. This should also work nicely with the beforeScript process directive to initialize your environment:
params.publishDir = './results'
input_dir = file( params.input_dir )
guppy_config = file( params.guppy_config )
ref_genome = file( params.ref_genome )
process GuppyBasecaller {
publishDir(
path: "${params.publishDir}/GuppyBasecaller",
mode: 'copy',
saveAs: { fn -> fn.substring(fn.lastIndexOf('/')+1) },
)
beforeScript 'module load cuda/11.4.2; export SINGULARITY_NV=1'
container '/path/to/guppy_basecaller.img'
input:
path input_dir
path guppy_config
path ref_genome
output:
path "outdir/pass/*.bam" into bams_ch
"""
mkdir outdir
guppy_basecaller \\
--config "${guppy_config}" \\
--device "cuda:0" \\
--bam_out \\
--recursive \\
--compress \\
--align_ref "${ref_genome}" \\
-i "${input_dir}" \\
-s outdir \\
--gpu_runners_per_device "${params.guppy_gpu_runners}" \\
--num_callers "${params.guppy_callers}"
"""
}