Part 4: Make an nf-core module¶
In this fourth part of the Hello nf-core training course, we show you how to create an nf-core module by applying the key conventions that make modules portable and maintainable.
The nf-core project provides a command (nf-core modules create) that generates properly structured module templates automatically, similar to what we used for the workflow in Part 2.
However, for teaching purposes, we're going to start by doing it manually: transforming the local cowpy module in your core-hello pipeline into an nf-core-style module step-by-step.
After that, we'll show you how to use the template-based module creation to work more efficiently in the future.
Note
This section assumes you have completed Part 3: Use an nf-core module and have integrated the CAT_CAT module into your pipeline.
If you did not complete Part 3 or want to start fresh for this part, you can use the core-hello-part3 solution as your starting point.
Run these commands from inside the hello-nf-core/ directory:
This gives you a pipeline with the CAT_CAT module already integrated.
1. Transform cowpy into an nf-core module¶
In this section, we'll apply nf-core conventions to the local cowpy module in your core-hello pipeline, transforming it into a module that follows community standards.
We'll apply the following nf-core conventions incrementally:
- Update `cowpy` to use metadata tuples to propagate sample metadata through the workflow.
- Centralize tool argument configuration with `ext.args` to increase module versatility while keeping the interface minimal.
- Standardize output naming with `ext.prefix` to promote consistency.
- Centralize the publishing configuration to promote consistency.
After each step, we'll run the pipeline to test that everything works as expected.
Working directory
Make sure you're in the core-hello directory (your pipeline root) for all the commands and file edits in this section.
1.1. Update cowpy to use metadata tuples¶
In the current version of the core-hello pipeline, we're extracting the file from CAT_CAT's output tuple to pass to cowpy.
It would be better to have cowpy accept metadata tuples directly, allowing metadata to flow through the workflow.
To that end, we'll need to make the following changes:
- Update the input and output definitions
- Update the process call in the workflow
- Update the emit block in the workflow
Once we've done all that, we'll run the pipeline to test that everything still works as before.
1.1.1. Update the input and output definitions¶
Let's get started!
Open the cowpy.nf module file (under core-hello/modules/local/) and modify it to accept metadata tuples as shown below.
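Here's a minimal sketch of the result (assuming your local module pipes the input file into `cowpy`, as in the earlier Hello Nextflow material; the unchanged `container` and `publishDir` lines are omitted here for brevity):

```groovy
process cowpy {

    input:
    tuple val(meta), path(input_file)   // metadata tuple replaces the bare path input
    val character

    output:
    tuple val(meta), path("cowpy-${input_file}"), emit: cowpy_output

    script:
    """
    cat $input_file | cowpy -c "$character" > cowpy-${input_file}
    """
}
```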
As you can see, we changed both the main input and the output to a tuple that follows the tuple val(meta), path(input_file) pattern introduced in Part 3 of this training.
For the output, we also took this opportunity to add emit: cowpy_output in order to give the output channel a descriptive name.
Now that we've changed what the process expects, we need to update what we provide to it in the process call.
1.1.2. Update the process call in the workflow¶
The good news is that this change will simplify the process call.
Now that the output of CAT_CAT and the input of cowpy are the same 'shape', i.e. they both consist of a tuple val(meta), path(input_file) structure, we can simply connect them directly instead of having to extract the file explicitly from the output of the CAT_CAT process.
Open the hello.nf workflow file (under core-hello/workflows/) and update the call to cowpy as shown below.
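A sketch of the updated call (placed where the `ch_for_cowpy` channel used to be constructed):

```groovy
// generate ASCII art with cowpy, passing the metadata tuple straight through
cowpy(CAT_CAT.out.file_out, params.character)
```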
We now call cowpy on CAT_CAT.out.file_out directly.
As a result, we no longer need to construct the ch_for_cowpy channel, so that line (and its comment line) can be deleted entirely.
1.1.3. Update the emit block in the workflow¶
Since cowpy now emits a named output, cowpy_output, we can update the hello.nf workflow's emit: block to use that.
This is technically not required, but it's good practice to refer to named outputs whenever possible.
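For example, the relevant line of the `emit:` block would now look something like this (the name on the left-hand side is whatever your workflow already declares; `cowpy_output` is just used here for illustration):

```groovy
emit:
cowpy_output = cowpy.out.cowpy_output
```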
1.1.4. Run the pipeline to test it¶
Let's run the workflow to test that everything is working correctly after these changes.
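Assuming the same invocation we use later in this section (add `--character` if your configuration doesn't already set a default):

```bash
nextflow run . --outdir core-hello-results -profile test,docker --validate_params false
```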
The pipeline should run successfully, with metadata now flowing from CAT_CAT through cowpy:
```console
executor > local (8)
[b2/4cf633] CORE_HELLO:HELLO:sayHello (2)       [100%] 3 of 3 ✔
[ed/ef4d69] CORE_HELLO:HELLO:convertToUpper (3) [100%] 3 of 3 ✔
[2d/32c93e] CORE_HELLO:HELLO:CAT_CAT (test)     [100%] 1 of 1 ✔
[da/6f3246] CORE_HELLO:HELLO:cowpy              [100%] 1 of 1 ✔
-[core/hello] Pipeline completed successfully-
```
That completes what we needed to do to make cowpy handle metadata tuples.
Now, let's look at what else we can do to take advantage of nf-core module patterns.
1.2. Centralize tool argument configuration with ext.args¶
In its current state, the cowpy process expects to receive a value for the character parameter.
As a result, we have to provide a value every time we call the process, even if we'd be happy with the defaults set by the tool.
For cowpy this is admittedly not a big problem, but for tools with many optional parameters, it can get quite cumbersome.
The nf-core project recommends using a Nextflow feature called ext.args to manage tool arguments more conveniently via configuration files.
Instead of declaring process inputs for every tool option, you write the module to reference ext.args in the construction of its command line.
Then it's just a matter of setting up the ext.args variable to hold the arguments and values you want to use in the modules.config file, which consolidates configuration details for all modules.
Nextflow will add those arguments with their values into the tool command line at runtime.
Let's apply this approach to the cowpy module.
We're going to need to make the following changes:
- Update the `cowpy` module
- Configure `ext.args` in the `modules.config` file
- Update the `hello.nf` workflow
Once we've done all that, we'll run the pipeline to test that everything still works as before.
1.2.1. Update the cowpy module¶
Let's do it.
Open the cowpy.nf module file (under core-hello/modules/local/) and modify it to reference ext.args as shown below.
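Here's a sketch of the updated module (again omitting the unchanged `container` and `publishDir` lines, and assuming the piped `cowpy` command from before):

```groovy
process cowpy {

    input:
    tuple val(meta), path(input_file)   // the 'val character' input is gone

    output:
    tuple val(meta), path("cowpy-${input_file}"), emit: cowpy_output

    script:
    def args = task.ext.args ?: ''      // pull optional tool arguments from the configuration
    """
    cat $input_file | cowpy $args > cowpy-${input_file}
    """
}
```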
You can see we made three changes.
- In the `input:` block, we removed the `val character` input. Going forward, we'll supply that argument via the `ext.args` configuration as described further below.
- In the `script:` block, we added the line `def args = task.ext.args ?: ''`. That line uses the `?:` operator to determine the value of the `args` variable: the content of `task.ext.args` if it is not empty, or an empty string if it is. Note that while we generally refer to `ext.args`, this code must reference `task.ext.args` to pull out the module-level `ext.args` configuration.
- In the command line, we replaced `-c "$character"` with `$args`. This is where Nextflow will inject any tool arguments set in `ext.args` in the `modules.config` file.
As a result, the module interface is now simpler: it only expects the essential metadata and file inputs.
Note
The ?: operator is often called the 'Elvis operator' because it looks like a sideways Elvis Presley face, with the ? character symbolizing the wave in his hair.
1.2.2. Configure ext.args in the modules.config file¶
Now that we've taken the character declaration out of the module, we've got to add it to ext.args in the modules.config configuration file.
Specifically, we're going to add this little chunk of code to the process {} block:
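Here is that chunk, sketched from the description that follows:

```groovy
withName: 'cowpy' {
    ext.args = { "-c ${params.character}" }
}
```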
The withName: syntax assigns this configuration to the cowpy process only, and ext.args = { "-c ${params.character}" } simply composes a string that will include the value of the character parameter.
Note the use of curly braces, which tell Nextflow to evaluate the value of the parameter at runtime.
Make sense? Let's add it in.
Open conf/modules.config and add the configuration code inside the process {} block as shown below.
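In context, the addition sits inside the existing `process {}` block, roughly like this (the comment stands in for whatever configuration the template already placed there):

```groovy
process {

    // ... existing configuration from the nf-core template ...

    withName: 'cowpy' {
        ext.args = { "-c ${params.character}" }
    }
}
```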
Hopefully you can imagine having every module in a pipeline specify its `ext.args` in this file, with the following benefits:
- The module interface stays simple - It only accepts the essential metadata and file inputs
- The pipeline still exposes `params.character` - End-users can still configure it as before
- The module is now portable - It can be reused in other pipelines without expecting a specific parameter name
- The configuration is centralized in `modules.config`, keeping workflow logic clean
By using the modules.config file as the place where all pipelines centralize per-module configuration, we make our modules more reusable across different pipelines.
1.2.3. Update the hello.nf workflow¶
Since the cowpy module no longer requires the character parameter as an input, we need to update the workflow call accordingly.
Open the hello.nf workflow file (under core-hello/workflows/) and update the call to cowpy as shown below.
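The call now needs only the channel of metadata tuples:

```groovy
// generate ASCII art with cowpy (the character is now supplied via ext.args)
cowpy(CAT_CAT.out.file_out)
```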
The workflow code is now cleaner: we don't need to pass params.character directly to the process.
The module interface is kept minimal, making it more portable, while the pipeline still provides the explicit option through configuration.
1.2.4. Run the pipeline to test it¶
Let's test that the workflow still works as expected, specifying a different character to verify that the ext.args configuration is working.
Run this command using kosh, one of the more... enigmatic options:
```bash
nextflow run . --outdir core-hello-results -profile test,docker --validate_params false --character kosh
```
The pipeline should run successfully. In the output, look for the cowpy process execution line, which will show something like this:
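For example (the task hash will differ on your machine; `bd/0abaf8` is the one reused in the example below):

```console
[bd/0abaf8] CORE_HELLO:HELLO:cowpy [100%] 1 of 1 ✔
```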
So it ran successfully, great!
Now let's verify that the ext.args configuration worked by checking the output.
Find the output in the file browser or use the task hash (the bd/0abaf8 part in the example above) to look at the output file:
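For example, substituting your own hash for `bd/0abaf8`:

```bash
cat work/bd/0abaf8*/cowpy-*
```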
Output
You should see the ASCII art displayed with the kosh character, confirming that the ext.args configuration worked!
Optional: Inspect the command file
If you want to see exactly how the configuration was applied, you can inspect the .command.sh file:
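For example, again substituting your own task hash:

```bash
cat work/bd/0abaf8*/.command.sh
```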
You'll see the cowpy command with the `-c kosh` argument injected into it.
This shows that the .command.sh file was generated correctly based on the ext.args configuration.
Take a moment to think about what we achieved here. This approach keeps the module interface focused on essential data (files, metadata, and any mandatory per-sample parameters), while options that control the behavior of the tool are handled separately through configuration.
This may seem unnecessary for a simple tool like cowpy, but it can make a big difference for data analysis tools that have a lot of optional arguments.
To summarize the benefits of this approach:
- Clean interface: The module focuses on essential data inputs (metadata and files)
- Flexibility: Users can specify tool arguments via configuration, including sample-specific values
- Consistency: All nf-core modules follow this pattern
- Portability: Modules can be reused without hardcoded tool options
- No workflow changes: Adding or changing tool options doesn't require updating workflow code
Note
The ext.args system has powerful additional capabilities not covered here, including switching argument values dynamically based on metadata. See the nf-core module specifications for more details.
1.3. Standardize output naming with ext.prefix¶
Now that we've given the cowpy process access to the metamap, we can start taking advantage of another useful nf-core pattern: naming output files based on metadata.
Here we're going to use a Nextflow feature called ext.prefix that will allow us to standardize output file naming across modules using meta.id (the identifier included in the metamap), while still being able to configure modules individually if desired.
This will be similar to what we did with ext.args, with a few differences that we'll detail as we go.
Let's apply this approach to the cowpy module.
We're going to need to make the following changes:
- Update the `cowpy` module
- Configure `ext.prefix` in the `modules.config` file
(No changes are needed to the workflow.)
Once we've done that, we'll run the pipeline to test that everything still works as before.
1.3.1. Update the cowpy module¶
Let's do it.
Open the cowpy.nf module file (under core-hello/modules/local/) and modify it to reference ext.prefix as shown below.
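Here's a sketch of the updated module (same assumptions as before; unchanged directives omitted):

```groovy
process cowpy {

    input:
    tuple val(meta), path(input_file)

    output:
    tuple val(meta), path("${prefix}.txt"), emit: cowpy_output

    script:
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"   // default to the sample ID from the metamap
    """
    cat $input_file | cowpy $args > ${prefix}.txt
    """
}
```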
You can see we made three changes.
- In the `script:` block, we added the line `prefix = task.ext.prefix ?: "${meta.id}"`. That line uses the `?:` operator to determine the value of the `prefix` variable: the content of `task.ext.prefix` if it is not empty, or the identifier from the metamap (`meta.id`) if it is. Note that while we generally refer to `ext.prefix`, this code must reference `task.ext.prefix` to pull out the module-level `ext.prefix` configuration.
- In the command line, we replaced `cowpy-${input_file}` with `${prefix}.txt`. This is where Nextflow will inject the value of `prefix` determined by the line above.
- In the `output:` block, we replaced `path("cowpy-${input_file}")` with `path("${prefix}.txt")`. This simply reiterates what the file path will be according to what is written in the command line.
As a result, the output file name is now constructed using a sensible default (the identifier from the metamap) combined with the appropriate file format extension.
1.3.2. Configure ext.prefix in the modules.config file¶
In this case the sensible default is not sufficiently expressive for our taste; we want to use a custom naming pattern that includes the tool name, cowpy-<id>.txt, like we had before.
We'll do that by configuring ext.prefix in modules.config, just like we did for the character parameter with ext.args, except this time the withName: 'cowpy' {} block already exists, and we just need to add the following line:
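That line, sketched from the naming pattern we want (note that the prefix omits the `.txt` extension, which the module adds itself):

```groovy
ext.prefix = { "cowpy-${meta.id}" }
```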
This will compose the string we want.
Note that once again we use curly braces, this time to tell Nextflow to evaluate the value of meta.id at runtime.
Let's add it in.
Open conf/modules.config and add the configuration code inside the process {} block as shown below.
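The `withName: 'cowpy'` block then ends up looking roughly like this:

```groovy
process {

    // ... existing configuration from the nf-core template ...

    withName: 'cowpy' {
        ext.args   = { "-c ${params.character}" }
        ext.prefix = { "cowpy-${meta.id}" }
    }
}
```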
In case you're wondering, the ext.prefix closure has access to the correct piece of metadata because the configuration is evaluated in the context of the process execution, where metadata is available.
1.3.3. Run the pipeline to test it¶
Let's test that the workflow still works as expected.
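Run the same command as before:

```bash
nextflow run . --outdir core-hello-results -profile test,docker --validate_params false --character kosh
```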
Check the outputs:
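At this stage the local `cowpy` module still publishes to the `results` directory via its own `publishDir` directive (we'll centralize that in the next section), so for example:

```bash
ls results/
```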
You should see the cowpy output file with the same naming as before: cowpy-test.txt, based on the default batch name.
Feel free to change the ext.prefix configuration to satisfy yourself that you can change the naming pattern without having to make any changes to the module or workflow code.
Alternatively, you can also try running this again with a different --batch parameter specified on the command line to satisfy yourself that that part is still customizable on the fly.
This demonstrates how ext.prefix allows you to maintain your preferred naming convention while keeping the module interface flexible.
To summarize the benefits of this approach:
- Standardized naming: Output files are typically named using sample IDs from metadata
- Configurable: Users can override the default naming if needed
- Consistent: All nf-core modules follow this pattern
- Predictable: Easy to know what output files will be called
Pretty good, right? Well, there's one more important change we need to make to improve our module to fit the nf-core guidelines.
1.4. Centralize the publishing configuration¶
You may have noticed that we've been publishing outputs to two different directories:
- `results` — The original output directory we've been using from the beginning for our local modules, set individually using per-module `publishDir` directives;
- `core-hello-results` — The output directory set with `--outdir` on the command line, which has been receiving the nf-core logs and the results published by `CAT_CAT`.
This is messy and suboptimal; it would be better to have one location for everything.
Of course, we could go into each of our local modules and update the publishDir directive manually to use the core-hello-results directory, but what about next time we decide to change the output directory?
Having individual modules make publishing decisions is clearly not the way to go, especially in a world where the same module might be used in a lot of different pipelines, by people who have different needs or preferences. We want to be able to control where outputs get published at the level of the workflow configuration.
"Hey," you might say, "CAT_CAT is sending its outputs to the --outdir. Maybe we should copy its publishDir directive?"
Yes, that's a great idea.
Except it doesn't have a publishDir directive. (Go ahead, look at the module code.)
That's because nf-core pipelines centralize control at the workflow level by configuring publishDir in conf/modules.config rather than in individual modules.
Specifically, the nf-core template declares a default publishDir directive (with a predefined directory structure) that applies to all modules unless an overriding directive is provided.
Doesn't that sound awesome? Could it be that to take advantage of this default directive, all we need to do is remove the current publishDir directive from our local modules?
Let's try that out on cowpy to see what happens, then we'll look at the code for the default configuration to understand how it works.
Finally, we'll demonstrate how to override the default behavior if desired.
1.4.1. Remove the publishDir directive from cowpy¶
Let's do this.
Open the cowpy.nf module file (under core-hello/modules/local/) and remove the publishDir directive as shown below.
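If your module still contains a line along these lines (the exact directory and mode may differ in your copy), just delete it:

```groovy
publishDir 'results', mode: 'copy'
```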
That's it!
1.4.2. Run the pipeline to see what happens¶
Let's have a look at what happens if we run the pipeline now.
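Use the same command as in the previous sections:

```bash
nextflow run . --outdir core-hello-results -profile test,docker --validate_params false --character kosh
```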
Have a look at your current working directory.
Now the core-hello-results also contains the outputs of the cowpy module.
You can see that Nextflow created this hierarchy of directories based on the names of the workflow and of the module.
The code responsible lives in the conf/modules.config file.
This is the default publishDir configuration that is part of the nf-core template and applies to all processes.
```groovy
process {
    publishDir = [
        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
        mode: params.publish_dir_mode,
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    ]
}
```
This may look complicated, so let's look at each of the three components:
- `path:` Determines the output directory based on the process name. The full name of a process contained in `task.process` includes the hierarchy of workflow and module imports (such as `CORE_HELLO:HELLO:CAT_CAT`). The `tokenize` operations strip away that hierarchy to get just the process name, then take the first part before any underscore (if applicable), and convert it to lowercase. This is what determines that the results of `CAT_CAT` get published to `${params.outdir}/cat/`.
- `mode:` Controls how files are published (copy, symlink, etc.). This is configurable via the `params.publish_dir_mode` parameter.
- `saveAs:` Filters which files to publish. This example excludes `versions.yml` files by returning `null` for them, preventing them from being published.
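As a concrete example, here is how the `path` closure resolves for the `CAT_CAT` task (a quick sketch you can trace by hand):

```groovy
def processName = 'CORE_HELLO:HELLO:CAT_CAT'                // value of task.process at runtime
def lastPart    = processName.tokenize(':')[-1]             // 'CAT_CAT'
def dirName     = lastPart.tokenize('_')[0].toLowerCase()   // 'cat'
// outputs are therefore published under "${params.outdir}/cat/"
```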
This provides a consistent logic for organizing outputs.
The output looks even better when all the modules in a pipeline adopt this convention, so feel free to go delete the publishDir directives from the other modules in your pipeline.
This default will be applied even to modules that we didn't explicitly modify to follow nf-core guidelines.
That being said, you may decide you want to organize your outputs differently, and the good news is that it's easy to do so.
1.4.3. Override the default¶
To override the default publishDir directive, you can simply add your own directives to the conf/modules.config file.
For example, you could override the default for a single process using the withName: selector, as in this example where we add a custom publishDir directive for the 'cowpy' process.
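A sketch of what such an override could look like in `conf/modules.config` (the `ascii_art` directory name is just an invented example):

```groovy
process {
    withName: 'cowpy' {
        publishDir = [
            path: { "${params.outdir}/ascii_art" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }
}
```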
We're not actually going to make that change, but feel free to play with this and see what logic you can implement.
The point is that this system gives you the best of both worlds: consistency by default and the flexibility to customize the configuration on demand.
To summarize, you get:
- Single source of truth: All publishing configuration lives in `modules.config`
- Useful default: Processes work out-of-the-box without per-module configuration
- Easy customization: Override publishing behavior in config, not in module code
- Portable modules: Modules don't hardcode output locations
This completes the set of nf-core module features you should absolutely learn to use, but there are others which you can read about in the nf-core modules specifications.
Takeaway¶
You now know how to adapt local modules to follow nf-core conventions:
- Design your modules to accept and propagate metadata tuples;
- Use `ext.args` to keep module interfaces minimal and portable;
- Use `ext.prefix` for configurable, standardized output file naming;
- Adopt the default centralized `publishDir` directive for a consistent results directory structure.
What's next?¶
Learn how to use nf-core's built-in template-based tools to create modules the easy way.
2. Generate modules with nf-core tools¶
Now that you've learned the nf-core module patterns by applying them manually, let's look at how you'd create modules in practice.
The nf-core project provides the nf-core modules create command that generates properly structured module templates with all these patterns built in from the start.
2.1. Using nf-core modules create¶
The nf-core modules create command generates a module template that already follows all the conventions you've learned.
For example, to create the cowpy module with a minimal template:
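```bash
nf-core modules create cowpy --empty-template
```

(Run this from the root of your pipeline repository.)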
The --empty-template flag creates a clean starter template without extra code, making it easier to see the essential structure.
The command runs interactively, guiding you through the setup. It automatically looks up tool information from package repositories like Bioconda and bio.tools to pre-populate metadata.
You'll be prompted for several configuration options:
- Author information: Your GitHub username for attribution
- Resource label: A predefined set of computational requirements. The nf-core project provides standard labels like `process_single` for lightweight tools and `process_high` for demanding ones. These labels help manage resource allocation across different execution environments.
- Metadata requirement: Whether the module needs sample-specific information via a metamap (usually yes for data processing modules).
The tool handles the complexity of finding package information and setting up the structure, allowing you to focus on implementing the tool's specific logic.
2.2. What gets generated¶
The tool creates a complete module structure in modules/local/ (or modules/nf-core/ if you're in the nf-core/modules repository):
Directory contents
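The layout looks roughly like this (sketched from the file list below; the exact contents of the `tests/` directory may vary with the tools version):

```console
modules/local/cowpy/
├── main.nf
├── meta.yml
├── environment.yml
└── tests/
    └── main.nf.test
```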
Each file serves a specific purpose:
- `main.nf`: Process definition with all the nf-core patterns built in
- `meta.yml`: Module documentation describing inputs, outputs, and the tool
- `environment.yml`: Conda environment specification for dependencies
- `tests/main.nf.test`: nf-test test cases to validate the module works
Learn more about testing
The generated test file uses nf-test, a testing framework for Nextflow pipelines and modules. To learn how to write and run these tests, see the nf-test side quest.
The generated main.nf includes all the patterns you just learned, plus some additional features:
```groovy
process COWPY {
    tag "$meta.id"
    label 'process_single'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/YOUR-TOOL-HERE':
        'biocontainers/YOUR-TOOL-HERE' }"

    input:
    tuple val(meta), path(input)    // Pattern 1: Metadata tuples ✓

    output:
    tuple val(meta), path("*"), emit: output
    path "versions.yml"       , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''                  // Pattern 2: ext.args ✓
    def prefix = task.ext.prefix ?: "${meta.id}"    // Pattern 3: ext.prefix ✓
    """
    // Add your tool command here

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        cowpy: \$(cowpy --version)
    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    echo $args
    touch ${prefix}.txt

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        cowpy: \$(cowpy --version)
    END_VERSIONS
    """
}
```
Notice how all the patterns you applied manually above are already there! The template also includes several additional nf-core conventions. Some of these work out of the box, while others are placeholders we'll need to fill in, as described below.
Features that work as-is:
- `tag "$meta.id"`: Adds sample ID to process names in logs for easier tracking
- `label 'process_single'`: Resource label for configuring CPU/memory requirements
- `when:` block: Allows conditional execution via `task.ext.when` configuration
These features are already functional and make modules more maintainable.
Placeholders we'll customize below:
- `input:` and `output:` blocks: Generic declarations we'll update to match our tool
- `script:` block: Contains a comment where we'll add the cowpy command
- `stub:` block: Template we'll update to produce the correct outputs
- Container and environment: Placeholders we'll fill with package information
The next sections walk through completing these customizations.
2.3. Completing the environment and container setup¶
In the case of cowpy, the tool warned that it couldn't find the package in Bioconda (the primary channel for bioinformatics tools).
However, cowpy is available in conda-forge, so you would complete the environment.yml like this:
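A sketch of what that could look like (the version pin is illustrative; check conda-forge for the version you actually want to use):

```yaml
channels:
  - conda-forge
  - bioconda
dependencies:
  - conda-forge::cowpy=1.1.5
```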
For the container, you can use Seqera Containers to automatically build a container from any Conda package, including conda-forge packages:
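The container directive would then point at the image addresses generated by the Seqera Containers service, along these lines (the `<version--build-hash>` placeholders are not real addresses; copy the actual URIs from the Seqera Containers website):

```groovy
container "${ workflow.containerEngine == 'singularity' ?
    'oras://community.wave.seqera.io/library/cowpy:<version--build-hash>' :
    'community.wave.seqera.io/library/cowpy:<version--build-hash>' }"
```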
Bioconda vs conda-forge packages
- Bioconda packages: Automatically get BioContainers built, providing ready-to-use containers
- conda-forge packages: Can use Seqera Containers to build containers on-demand from the Conda recipe
Most bioinformatics tools are in Bioconda, but for conda-forge tools, Seqera Containers provides an easy solution for containerization.
2.4. Defining inputs and outputs¶
The generated template includes generic input and output declarations that you'll need to customize for your specific tool.
Looking back at our manual cowpy module from section 1, we can use that as a guide.
Update the input and output blocks:
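Sketched from our manual module (the `versions.yml` output line comes from the generated template and should be kept):

```groovy
input:
tuple val(meta), path(input_file)

output:
tuple val(meta), path("${prefix}.txt"), emit: cowpy_output
path "versions.yml"                   , emit: versions
```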
This specifies:
- The input file parameter name (`input_file` instead of generic `input`)
- The output filename using the configurable prefix pattern (`${prefix}.txt` instead of wildcard `*`)
- A descriptive emit name (`cowpy_output` instead of generic `output`)
2.5. Writing the script block¶
The template provides a comment placeholder where you add the actual tool command. We can reference our manual module from earlier for the command logic:
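A sketch of the completed `script:` block, assuming the same piped `cowpy` command as the manual module (the `versions.yml` section is kept from the template):

```groovy
script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
"""
cat $input_file | cowpy $args > ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cowpy: \$(cowpy --version)
END_VERSIONS
"""
```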
Key changes:
- Change `def prefix` to just `prefix` (without `def`) so it's accessible in the output block
- Replace the comment with the actual cowpy command that uses both `$args` and `${prefix}.txt`
2.6. Implementing the stub block¶
The stub block provides a fast mock implementation for testing pipeline logic without running the actual tool. It must produce the same output files as the script block:
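A sketch of the corresponding `stub:` block:

```groovy
stub:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cowpy: \$(cowpy --version)
END_VERSIONS
"""
```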
Key changes:
- Change `def prefix` to just `prefix` to match the script block
- Remove the `echo $args` line (which was just template placeholder code)
- The stub creates an empty `${prefix}.txt` file matching what the script block produces
This allows you to test workflow logic and file handling without waiting for the actual tool to run.
Once you've completed the environment setup (section 2.3), inputs/outputs (section 2.4), script block (section 2.5), and stub block (section 2.6), the module is ready to test!
Takeaway¶
You now know how to use the built-in nf-core tooling to create modules efficiently using templates rather than writing everything from scratch.
What's next?¶
Learn about the benefits of contributing modules to nf-core and the main steps and requirements involved.
3. Contributing modules back to nf-core¶
The nf-core/modules repository welcomes contributions of well-tested, standardized modules.
3.1. Why contribute?¶
Contributing your modules to nf-core:
- Makes your tools available to the entire nf-core community through the modules catalog at nf-co.re/modules
- Ensures ongoing community maintenance and improvements
- Provides quality assurance through code review and automated testing
- Gives your work visibility and recognition
3.2. Contributor's checklist¶
To contribute a module to nf-core, you will need to go through the following steps:
- Check if it already exists at nf-co.re/modules
- Fork the nf-core/modules repository
- Use `nf-core modules create` to generate the template
- Fill in the module logic and tests
- Test with `nf-core modules test tool/subtool`
- Lint with `nf-core modules lint tool/subtool`
- Submit a pull request
For detailed instructions, see the nf-core components tutorial.
3.3. Resources¶
- Components tutorial: Complete guide to creating and contributing modules
- Module specifications: Technical requirements and guidelines
- Community support: nf-core Slack - Join the `#modules` channel
Takeaway¶
You now know how to create nf-core modules! You learned the four key patterns that make modules portable and maintainable:
- Metadata tuples propagate metadata through the workflow
- `ext.args` simplifies module interfaces by handling optional arguments via configuration
- `ext.prefix` standardizes output file naming
- Centralized publishing via `publishDir` configured in `modules.config` rather than hardcoded in modules
By transforming cowpy step-by-step, you developed a deep understanding of these patterns, equipping you to work with, debug, and create nf-core modules.
In practice, you'll use nf-core modules create to generate properly structured modules with these patterns built in from the start.
Finally, you learned how to contribute modules to the nf-core community, making tools available to researchers worldwide while benefiting from ongoing community maintenance.
What's next?¶
When you're ready, continue to Part 5: Input validation to learn how to add schema-based input validation to your pipeline.