Versionen im Vergleich

Schlüssel

  • Diese Zeile wurde hinzugefügt.
  • Diese Zeile wurde entfernt.
  • Formatierung wurde geändert.

...

Snakemake greatly supports execution of a pipeline on a grid compute system with very little overhead. However, it will help reading Snakemake's own documentation to get familiar with with basic concepts and commands: https://snakemake.readthedocs.io/en/stable/executable.html#cluster-execution


There is also an official Snakemake module and a matching Miniconda module that skips initialisation to avoid clashes between the SM module's Python and the conda Python.

Compose the Snakemake master command

...


The flag `-p` will print the (shell) commands to be executed. Useful for debugging. The flag `-r` reports why a specific rule for a specific input need to be (re)executed. Using both aids debugging and understanding the flow of your pipeline.

Executing


  1.   You should start a new screen https://linuxize.com/post/how-to-use-linux-screen/ session: `screen -S spike`.

  2.  
    1. Since the login node has very limited resources, you should start an new interactive gird job, with medium memory, one node and one core, but relatively long runtime:

      Codeblock
      languagebash
      qsub -I -A ngsukdkohi -l mem=10GB -l walltime="40:59:50,nodes=1:ppn=1"


    2.  Alternatively you can run your master job on hilbert238 (but make sure that you are not actually causing workload there!)
  3.   navigate to your `spike` clone and make sure you configure `spike` correctly (docs to be come), specifically `snupy` credentials and selection of samples to be processed.
  4.   Trigger a dry run of the pipeline by using the `-n` flag of snakemake and check that everything looks good:

    Codeblock
    languagebash
    snakemake --cluster-config cluster.json --cluster "qsub -A {cluster.account} -q {cluster.queue} -l select={cluster.nodes}:ncpus{cluster.ppn}:mem={cluster.mem} -l walltime={cluster.time}" -j 100 --latency-wait 900 --use-conda --cluster-status scripts/barnacle_status.py --max-status-checks-per-second 1 --keep-going -p -r -n


  5.   (During development of `spike` it happened to me that `snakemake` wanted to re-execute long running programs because of changes in the Snakefiles, but it would produce identical results. To avoid the waste of compute, I am sometimes "touching" output files to update the time stamp such that snakemake will not reinvoke execution. This harbours the risk of computing with outdated or even incomplete intermediate results! Be careful: replace `-n` with `--touch`.
  6.   Trigger actual snakemake run, by removing `-n` (and `--touch`) from the above command.

...