Snakemake-node
We have a dedicated compute node for starting Snakemake pipelines in order to reduce the load on the login node. Please run
ssh snakemake-node
from the login node to go there.
Snakemake profile
With Snakemake v9.16.3, we have released an optimized configuration profile that ensures the best compatibility of Snakemake with our cluster. The profile:
- ensures that Snakemake rules are executed as individual jobs on our cluster instead of inside the Snakemake main process, which allows you to
- use combined computing power of all cluster nodes instead of just a single one, and
- create smaller jobs with fine-grained resource requirements that will start sooner instead of one big job
- writes error and console output of individual rules to files inside your Snakemake directory
- handles dependencies on software modules on the cluster automatically for individual rules
- checks the status of jobs across the cluster, which recognizes finished and failed rules earlier
- provides an automatic jobscript template so you don't have to provide one
- integrates Singularity, the containerization framework, into Snakemake, so you can achieve optimal performance from rules executing in containers
- integrates Conda (via miniforge) for managing your virtual environments within Snakemake
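To give an impression of how these defaults are expressed, here is a hypothetical sketch of what such a profile's config.yaml could contain; the actual profile in /software/Snakemake/hhu-profile-snakemake9 is authoritative, and the executor and values shown here are illustrative assumptions:

```yaml
# Hypothetical sketch of a Snakemake v9 profile config.yaml.
executor: cluster-generic        # submit each rule as its own cluster job
jobs: 100                        # cap on concurrently submitted jobs
default-resources:               # matches the defaults described below
  mem_mb: 1000
  walltime: "01:00:00"
software-deployment-method:      # Conda, Singularity/Apptainer and modules
  - conda
  - apptainer
  - env-modules
```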
Execution
This makes running Snakemake workflows on the cluster much easier, as you don't need to compose Snakemake's lengthy command line arguments yourself. After logging in to our cluster using SSH (and VPN if you are outside the campus network):
- Log in on our dedicated Snakemake node:
ssh snakemake-node
- One time only: create a directory snakemake_logs in your scratch directory:
mkdir /gpfs/scratch/$USER/snakemake_logs
Snakemake does not create this directory itself and will send you the notification "Post job file processing error" if it does not exist.
- cd to the directory containing your Snakemake workflow
- Load the Snakemake module:
module load Snakemake/9.16.3
- Execute Snakemake:
snakemake --profile /software/Snakemake/hhu-profile-snakemake9
That's it! No more composing lengthy Snakemake command lines. Of course, you can take a look at the profile and the defaults chosen in our university's GitLab or in /software/Snakemake/hhu-profile-snakemake9 on the cluster. Any defaults can still be overridden on the command line using the appropriate command line argument, or even for individual rules in your Snakefile. See the Snakemake reference for the full specification, including guidance on how to write your own workflows. Some examples are listed below.
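For example, overriding some of the profile's defaults on the command line could look like the following sketch; the flag values are purely illustrative:

```shell
# Run with the cluster profile, but allow up to 50 concurrent jobs
# and raise the default memory for all rules to 2 GB.
snakemake --profile /software/Snakemake/hhu-profile-snakemake9 \
    --jobs 50 \
    --default-resources mem_mb=2000
```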
Managing your environment in Snakemake
Below is an example showcasing two different ways to manage your environment in Snakemake: Conda and Environment Modules. The first rule uses Conda to create a new Python 3.14 environment, while the second uses one of the preinstalled modules on our cluster to load GCC. Both then simply write the loaded version of the program to a file. To run them, make sure to replace YourProjectName with the name of a project which allows you to submit jobs.
rule condaenv:
    conda: "envs/py314.yaml"
    resources:
        project="YourProjectName"
    shell: "python -c 'import sys; print(sys.version)' > pythonver.txt"

rule moduleenv:
    envmodules:
        "gcc/12.3.0"
    resources:
        project="YourProjectName"
    shell: "gcc --version > gccver.txt"
With the Conda environment file py314.yaml in the subdirectory envs of your Snakemake directory:
name: py314
channels:
  - conda-forge
dependencies:
  - python=3.14
Note that this requires you to set up Conda properly beforehand, as described in the Conda article. Alternatively, you can also use pip, again with the proper setup beforehand.
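If some of your dependencies are only available from PyPI, pip packages can also be listed inside the Conda environment file itself. A sketch, where the requests package is just a placeholder for your own dependency:

```yaml
name: py314
channels:
  - conda-forge
dependencies:
  - python=3.14
  - pip          # make pip available inside the environment
  - pip:         # packages below are installed via pip, not Conda
      - requests
```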
Resource configuration
Just like any other job on the cluster, Snakemake jobs need to be submitted with clearly defined resources in terms of number of nodes, number of CPU cores per node, number of GPUs per node, memory per node, and walltime. To ensure optimal use of the cluster for everyone and to make your own jobs start as early as possible, it is important to set these values as tightly as possible for each job. The default profile /software/Snakemake/hhu-profile-snakemake9 chooses a single node with one CPU core, no GPUs, 1 GB of memory, and 1 hour of walltime. To override these values, you can enter them in the resources section of each rule in your Snakefile. Example:
rule myrule:
    [...]
    resources:
        project="YourProjectName",
        cpus=2,
        gpus=1,
        mem_mb=4000,
        walltime="00:10:00"
    [...]
You can even assign resources dynamically, e.g. giving your job more or less memory depending on the size of your input dataset:
rule myrule:
    [...]
    resources:
        mem_mb=lambda wildcards, input: max(2 * input.size_mb, 500)
    [...]
This makes use of a Python lambda function to compute a value for mem_mb after your input has been read: in this example, twice the size of the input files, but at least 500 MB. The value input.size_mb is automatically calculated by Snakemake as the total size of all input files for the rule.
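The arithmetic of this rule of thumb (twice the input size, but never below the 500 MB floor) can be sketched as a plain Python function; the function name is made up for illustration, and inside a Snakefile Snakemake would pass the rule's wildcards and input to the lambda instead:

```python
def dynamic_mem_mb(input_size_mb: float, floor_mb: int = 500) -> int:
    """Return the memory request in MB: twice the total input size,
    but never below the floor (mirrors the lambda above)."""
    return int(max(2 * input_size_mb, floor_mb))

print(dynamic_mem_mb(100))  # small input: the 500 MB floor wins -> 500
print(dynamic_mem_mb(400))  # large input: twice the size wins -> 800
```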
Older versions
Snakemake has introduced breaking changes over the years, so certain settings are incompatible between, for example, Snakemake 6.4.0 and 9.16.3. We strongly recommend using the latest available version, which should be the most stable, feature-rich and well-supported. Of course, older versions and their profile in /software/Snakemake/hhu-profile will remain available on the cluster if you absolutely need them.
Resource configuration in older versions (< 9.16.3)
To specify the resources allocated to each rule, you need to create a cluster.yaml file inside your Snakemake directory. As for regular jobs on the cluster, be sure to calculate your requirements as tightly as possible. Otherwise resources will be wasted and unavailable to other users, and your jobs will take longer to start.
An example might look like this:
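A minimal, hypothetical sketch of such a cluster.yaml is shown below; the rule name is made up, and the exact keys depend on the jobscript used by the older profile:

```yaml
__default__:             # fallback resources applied to all rules
  cpus: 1
  mem: 1G
  walltime: "01:00:00"
bigrule:                 # override for one specific (hypothetical) rule
  cpus: 8
  mem: 16G
  walltime: "04:00:00"
```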
Troubleshooting
Error: "/bin/bash: unbound variable"
When using shell blocks, Snakemake adds set -u (among others) during execution. This instruction causes bash to abort on uninitialized (unbound) variables, which can indicate a bug in your shell script. However, using uninitialized variables is often intentional in bash, for example when extending PATH variables. To allow this, explicitly add set +u at the beginning of your shell block.
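The following bash snippet sketches the effect; the variable name EXTRA_TOOLS is purely illustrative:

```shell
#!/usr/bin/env bash
set -u   # what Snakemake enables inside shell blocks
set +u   # explicitly allow unbound variables again

# EXTRA_TOOLS may be unset; under 'set -u' the next line would abort
# with "EXTRA_TOOLS: unbound variable" instead of expanding to empty.
export PATH="${EXTRA_TOOLS}:${PATH}"
echo "still running"
```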
