Creating Pipelines with Builders
Builders create the pipeline configuration that is later used to run the pipeline. The configuration includes the library used, the input and output filenames, and the run parameters for each algorithm.
The individual file readers and builders are described below:
1. File Readers
FastqReader
You can bundle paired FASTQ files in a list:
BamReader
When running COSAP without Dockerization, relative file paths passed to Readers are resolved relative to the directory from which you run the Python command.
The workdir option in the pipeline builder only affects where intermediate and final files will be created.
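To see which file a relative path will actually point to, you can resolve it the same way before passing it to a Reader. This is a minimal sketch using only the Python standard library; `resolve_input` is a hypothetical helper and not part of COSAP, and the file name is a placeholder.

```python
from pathlib import Path


def resolve_input(path_str: str) -> Path:
    """Return the absolute path that an input path refers to.

    Relative paths are anchored at the current working directory, i.e. the
    directory from which the Python command was started. Absolute paths are
    returned unchanged.
    """
    path = Path(path_str)
    return path if path.is_absolute() else Path.cwd() / path


# Placeholder file name, for illustration only.
print(resolve_input("data/normal_1.fastq"))
```

Printing the resolved path before building the pipeline is a quick way to catch a script that was started from an unexpected directory.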
2. Builders
Trimmer
Trimmer builder for adapter trimming and quality control. Takes the list of paired FastqReaders as input. Uses fastp.
Mapper
Mapper builder for read mapping. Takes a Trimmer or FastqReader as input. Currently, the following libraries are supported:
Parabricks fq2bam (Integrated into the "bwa" library. Pipeline runner device must be "gpu")
MarkDuplicates Builder
Duplicate read tagger and remover builder. Takes Mapper as input.
BaseRecalibrator
GATK BaseRecalibrator builder. Takes Mapper or MDUP as input.
Elprep Preprocessing Tool
Elprep is a high-performance preprocessing tool. It combines the functionality of the duplicate remover and the base recalibrator. It can require up to 200GB of memory, so it is only recommended for capable workstations and servers.
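Before choosing Elprep, it can help to check how much memory the machine has. The sketch below is one way to do that with the standard library on Linux-like systems; `total_memory_gb` and `can_run_elprep` are hypothetical helpers, not part of COSAP, and the 200GB threshold is simply the upper bound quoted above.

```python
import os

ELPREP_MEMORY_GB = 200  # upper bound quoted in the text above


def total_memory_gb() -> float:
    """Total physical memory in GB, or 0.0 if the platform cannot report it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on every platform.
        return 0.0
    if page_size <= 0 or page_count <= 0:
        return 0.0
    return page_size * page_count / 1e9


def can_run_elprep(available_gb: float, required_gb: float = ELPREP_MEMORY_GB) -> bool:
    """True if the available memory meets the required amount."""
    return available_gb >= required_gb


print(can_run_elprep(total_memory_gb()))
```

If the check fails, the separate MarkDuplicates and BaseRecalibrator builders cover the same functionality.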
VariantCaller
Variant caller builder for variant detection tools. Takes Mapper, MDUP, Recalibrator, or Elprep of both normal and tumor samples as input. Currently the following libraries are supported:
If a sample name is provided in the Mapper as a read group, it must also be provided in the VariantCaller params.
For Strelka2, Manta, and VarNet, COSAP requires Docker to be installed on the system.
On some systems, Mutect2 may cause crashes when used with multithreading. To turn off multithreading in COSAP, set COSAP_THREADS_PER_JOB to 1.
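One way to set this variable from a Python script is through `os.environ`. This is a sketch under an assumption: the text above does not say whether COSAP reads the variable at import time or at run time, so the variable is set first, before anything from COSAP is imported.

```python
import os

# Environment variable values must be strings.
# Set this before importing or running COSAP so the value is in place
# regardless of when it is read (an assumption, see the note above).
os.environ["COSAP_THREADS_PER_JOB"] = "1"

print(os.environ["COSAP_THREADS_PER_JOB"])
```

Exporting the variable in the shell before starting Python has the same effect.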
VariantAnnotator
Variant annotator builder.
Currently, the following libraries are supported:
For Ensembl-vep, COSAP requires Docker to be installed on the system.
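Since some libraries depend on Docker, a quick check before building the pipeline can save a failed run. The sketch below uses only the standard library; `docker_available` is a hypothetical helper and not part of COSAP.

```python
import shutil


def docker_available() -> bool:
    """True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


if not docker_available():
    print("Docker was not found on PATH; Docker-dependent libraries will not run.")
```

This only confirms that the executable exists. It does not verify that the Docker daemon is running or that the current user is allowed to use it.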
Building Pipeline Config
After creating individual pipeline steps, it is time to gather them under a pipeline. To do this you can simply create a Pipeline instance and add the previously created steps to it.
Stacking steps into a pipeline is as easy as calling .add():
You must add every step you want to run to the pipeline.
To create the configuration file:
This will create a YAML file in the workdir that you specified.
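The chaining pattern can be illustrated with a small, self-contained toy model. `ToyStep` and `ToyPipeline` are hypothetical stand-ins written for this sketch and are not COSAP classes. They only show how each step points at its input step, how `.add()` calls can be chained, and why every step has to be added. Raising an error for a missing step is this sketch's own choice; how COSAP itself reacts is not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyStep:
    """Stand-in for a builder: a name plus the step it takes as input."""
    name: str
    input_step: Optional["ToyStep"] = None


@dataclass
class ToyPipeline:
    """Stand-in for a pipeline that collects steps and emits a config."""
    steps: List[ToyStep] = field(default_factory=list)

    def add(self, step: ToyStep) -> "ToyPipeline":
        self.steps.append(step)
        return self  # returning self allows chained .add() calls

    def build(self) -> dict:
        added = {step.name for step in self.steps}
        for step in self.steps:
            parent = step.input_step
            if parent is not None and parent.name not in added:
                raise ValueError(
                    f"{step.name} needs {parent.name}, which was not added"
                )
        return {
            "steps": [
                {"name": s.name, "input": s.input_step.name if s.input_step else None}
                for s in self.steps
            ]
        }


# Each step takes the previous one as input, mirroring the builders above.
trimmer = ToyStep("trimmer")
mapper = ToyStep("mapper", input_step=trimmer)
mdup = ToyStep("mdup", input_step=mapper)

config = ToyPipeline().add(trimmer).add(mapper).add(mdup).build()
print(config)
```

In this toy model, leaving out the trimmer while adding the mapper makes `build()` fail, which mirrors the rule that every step you want to run must be added explicitly.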