Nextflow: coding your own flow
==============================

.. role:: bash(code)
   :language: bash

This page provides an overview of how a Nextflow pipeline is organized, without covering every aspect. `Nextflow's Read the doc `_ details all the elements. See also a very good tutorial from Evan Floden `here `_.

The best way to get an example is to refer to the flows available from SCIL, in particular :ref:`ref_tractoflow` (or see their `web page `_), which covers most of the aspects you'll need. See also `Rbx_flow `_ for a simpler example of a Nextflow pipeline.

Overview of nextflow files
--------------------------

In general, a Nextflow pipeline consists of 3 different files:

- USAGE

  | It provides information on how to use the pipeline, regarding inputs and options.
  | It corresponds somewhat to the argparse arguments (help) in scilpy scripts.

- nextflow.config

  It provides all the configurations required to run main.nf. In most cases, all options used in main.nf must be present in nextflow.config. You may refer to tractoflow or rbxflow for an overview of its structure, which is quite similar from one pipeline to another.

- main.nf

  It contains the pipeline itself: its parameters, channels and processes.

Organization of the main.nf file
--------------------------------

The main.nf file has three main elements, each detailed in the sections below.

1. Setup parameters

   They provide parameters, or links to nextflow.config parameters, required by the processes. They can also provide information about the run (e.g. script duration, start, end, pipeline name, …).

2. Channels

   They are used to define inputs. They can take several forms.

3. Processes

   They are the steps of the flow; each one executes a script (command).

1. Setup parameters
********************

These are general flow directives. They can be used to import parameters defined in nextflow.config, to define global CPU or memory usage parameters, or to provide pipeline information (pipeline name, start, end, duration, etc.).

Example:

.. code-block:: groovy

    // Default values; they can be overridden in nextflow.config or on the command line
    params.input = false
    params.help = false

    // Print the USAGE file, filled in with the current parameter values, then exit
    if(params.help) {
        usage = file("$baseDir/USAGE")
        cpu_count = Runtime.runtime.availableProcessors()
        bindings = ["param1":"$params.param1", // params are set in nextflow.config
                    "param2":"$params.param2",
                    "param3":"$params.param3",
                    // …
                    "paramN":"$params.paramN"]
        engine = new groovy.text.SimpleTemplateEngine()
        template = engine.createTemplate(usage.text).make(bindings)
        print template.toString()
        return
    }

    // Information printed when the pipeline starts
    log.info "Name of pipeline"
    log.info "Start time: $workflow.start"

    // Information printed once the run ends
    workflow.onComplete {
        log.info "Pipeline completed at: $workflow.complete"
        log.info "Execution status: ${ workflow.success ? 'OK' : 'failed' }"
        log.info "Execution duration: $workflow.duration"
    }

2. Channels
***********

For more information, see `Nextflow's doc `_.

There are several ways of giving inputs to a channel, depending on your requirements. The two most commonly used factory methods both accept glob patterns, similar to the glob function:

.. code-block:: groovy

    // a single file
    Channel.fromPath("path/to/one/file")

    // several files, grouped by a shared name and emitted as [name, [file1, file2]]
    // (hypothetical pattern)
    Channel.fromFilePairs("path/to/*_{1,2}.nii.gz")

There are also `operators `_ to transform channel items and store them in specific variables. The operators ``map``, ``set`` and ``into`` are the most often used. Example:

.. code-block:: groovy

    // Store the items in a single channel variable
    Channel.fromPath("path/to/fa.nii.gz", maxDepth:1)
        .map{[it.parent.name, it]}
        .set{fa_for_process1}

    // OR duplicate them into several channel variables
    Channel.fromPath("path/to/fa.nii.gz", maxDepth:1)
        .map{[it.parent.name, it]}
        .into{fa_for_process1; fa_for_process2}

Each chained operator starts on a new line with a dot; aligning the dots is a readability convention. Here, ``.map{[it.parent.name, it]}`` turns each file into a pair made of its parent folder name and the file itself.

It is advisable to name variables according to the processes in which they will be used. Note that ``.set`` is used when the input(s) are stored in a single variable.

When a channel is needed only once, you can also assign it directly, without ``.set`` (this form is more often used for a directory):

.. code-block:: groovy

    fa_for_process1 = Channel.fromPath("path/to/fa.nii.gz")

WARNING! A channel variable cannot be used several times: each process must have its own input channel.
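
A minimal sketch of this rule, assuming DSL1 (the syntax used throughout this page) and a hypothetical layout with one folder per subject containing fa.nii.gz. The channel is duplicated with ``.into`` so that each of the two hypothetical processes, ``Stats`` and ``Copy``, gets its own input variable. The ``nextflow.enable.dsl=1`` line is an assumption about your Nextflow version: it is only meaningful on releases that still support DSL1 but default to DSL2.

.. code-block:: groovy

    #!/usr/bin/env nextflow
    // Hypothetical DSL1 pipeline: paths, process names and commands are illustrative.
    nextflow.enable.dsl=1

    // Assumed layout: path/to/<subject>/fa.nii.gz
    // The same items are duplicated into two channels, one per process.
    Channel.fromPath("path/to/*/fa.nii.gz")
        .map{[it.parent.name, it]}
        .into{fa_for_stats; fa_for_copy}

    process Stats {
        input:
        tuple val(sid), file(fa) from fa_for_stats

        output:
        file "${sid}__fa_bytes.txt" into fa_bytes

        script:
        """
        wc -c < ${fa} > ${sid}__fa_bytes.txt
        """
    }

    process Copy {
        input:
        tuple val(sid), file(fa) from fa_for_copy

        output:
        file "${sid}__fa_copy.nii.gz" into fa_copies

        script:
        """
        cp ${fa} ${sid}__fa_copy.nii.gz
        """
    }

Feeding ``fa_for_stats`` to both processes instead should make Nextflow stop with an error, since a DSL1 channel can be consumed by only one process.
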
If you want to use an input in several processes, it must be stored in as many variables as there are processes. This is what ``.into`` is for: it stores the input(s) into several variables. The inputs are not divided between these variables but duplicated. For example, with ``.into`` above, fa.nii.gz is stored in both ``fa_for_process1`` and ``fa_for_process2``, because Nextflow will use it in process 1 and in process 2.

3. Processes
************

A `process `_ consists of four parts: directives, input, output and the script. Each process is independent of the others, so processes can run in parallel.

Example of a process structure:

.. code-block:: groovy

    inputs_variables

    process < process name > {
        directives

        input:

        output:
        [,