Builders are responsible for creating the pipeline configuration that is later used to run the pipeline. This configuration includes the library used, the input/output filenames, and the run parameters for the related algorithm.
Here are individual file readers and builders:
1. File Readers
FastqReader
from cosap import FastqReader
sample_fastq = FastqReader("/path/to/fastq.fastq", name="normal_sample")
BamReader
from cosap import BamReader
sample_bam = BamReader("/path/to/sample.mdup.bam")
When running COSAP without Dockerization, relative file paths passed to Readers are resolved relative to the directory from which you run the Python command.
The workdir option in the pipeline builder only affects where intermediate and final files will be created.
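To make the path rule concrete, here is a minimal sketch in plain Python of what "relative to the directory from which you run the Python command" means. The helper name `absolute_input_path` is hypothetical and not part of COSAP; it only shows how you might turn a relative path into an absolute one before handing it to a Reader, so the result no longer depends on where the script is launched.

```python
from pathlib import Path


def absolute_input_path(path_str: str) -> Path:
    """Return an absolute version of path_str (hypothetical helper, not a COSAP API).

    A relative path is interpreted against the current working directory,
    which is the directory the Python command was started from unless the
    process has changed directory since then.
    """
    path = Path(path_str)
    if path.is_absolute():
        return path
    return Path.cwd() / path


# A relative path is anchored at the current working directory.
print(absolute_input_path("data/normal_sample.fastq"))
```

Passing the absolute form to a Reader is one way to keep input locations independent of the launch directory; the workdir option still controls only where output files are written.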
2. Builders
Trimmer
from cosap import Trimmer
trimmer_germline = Trimmer(input_step=germline_fastqs)
Duplicate read tagger and remover builder. Takes Mapper as input.
from cosap import MDUP
mdup_germline = MDUP(input_step=mapper_germline_bwa)
# By default, this removes all duplicates.
# If you only want to mark them, use the duplicate_handling_method argument.
mdup_germline = MDUP(input_step=mapper_germline_bwa, duplicate_handling_method="mark")
BaseRecalibrator
GATK BaseRecalibrator builder. Takes Mapper or MDUP as input.
from cosap import Recalibrator
recalibrator_germline = Recalibrator(input_step=mdup_germline)
Elprep Preprocessing Tool
from cosap import Elprep
elprep_recalibrator_germline = Elprep(input_step=mapper_germline_bwa)
VariantCaller
Variant caller builder for variant detection tools. Takes Mapper, MDUP, Recalibrator, or Elprep of both normal and tumor samples as input. Currently the following libraries are supported:
If a sample name is provided in the Mapper as a read group, it must also be provided in the VariantCaller params.
On some systems, Mutect2 may cause crashes when used with multithreading. To turn off multithreading in COSAP, set COSAP_THREADS_PER_JOB to 1.
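One way to apply this setting from Python is sketched below. It assumes that COSAP reads `COSAP_THREADS_PER_JOB` from the process environment at some point after the variable is set; the text above does not say when it is read, so the safest ordering is to set it before importing or running anything from COSAP. Exporting the variable in the shell before starting Python should have the same effect.

```python
import os

# Set the variable first, so that any code importing or running COSAP
# afterwards in this process sees the single-thread setting.
# (Assumption: COSAP picks the value up from the environment.)
os.environ["COSAP_THREADS_PER_JOB"] = "1"

# Child processes started from this interpreter inherit the value too.
print(os.environ["COSAP_THREADS_PER_JOB"])
```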
VariantAnnotator
Variant annotator builder.
Currently, the following libraries are supported:
from cosap import Annotator
annotator = Annotator(library="vep", input_step=mutect_caller)
Building Pipeline Config
After creating the individual pipeline steps, it is time to gather them under a pipeline. To do this, create a Pipeline instance and add the previously created steps to it.
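The idea of chaining steps through `input_step` and then collecting them can be illustrated with a small, self-contained model. The classes `ToyStep` and `ToyPipeline` and their `add` and `build` methods are hypothetical stand-ins written for this sketch; they are not COSAP's classes, and COSAP's actual Pipeline interface may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyStep:
    """Hypothetical stand-in for a builder such as Trimmer or MDUP."""

    name: str
    input_step: Optional["ToyStep"] = None


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline that collects steps."""

    steps: List[ToyStep] = field(default_factory=list)

    def add(self, step: ToyStep) -> "ToyPipeline":
        # Returning self allows calls to be chained.
        self.steps.append(step)
        return self

    def build(self) -> List[dict]:
        # One config entry per step, recording which step feeds it.
        return [
            {"step": s.name, "input": s.input_step.name if s.input_step else None}
            for s in self.steps
        ]


# Each step names the step whose output it consumes.
mapper = ToyStep("mapper")
mdup = ToyStep("mdup", input_step=mapper)
recalibrator = ToyStep("recalibrator", input_step=mdup)

config = ToyPipeline().add(mapper).add(mdup).add(recalibrator).build()
for entry in config:
    print(entry)
```

The point of the sketch is only the shape of the data: each step records its upstream step, and the pipeline turns the collected steps into a configuration that a runner can execute later.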
Trimmer builder for adapter trimming and quality control. Takes the list of paired FastqReader instances as input.
Mapper builder for read mapping. Takes the FastqReader or Trimmer as input. Currently, the following libraries are supported:
(Integrated into the "bwa" library. Pipeline runner device must be "gpu")
Elprep is a high-performance tool for preprocessing. Its functionality is the same as that of the duplicate remover and base recalibrator combined. This tool requires up to 200 GB of memory and is therefore recommended only for capable workstations and servers.
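The trade-off above can be written as a simple rule of thumb. The sketch below is an assumption-laden illustration, not COSAP behavior: the function name `preprocessing_choice` is hypothetical, and treating the quoted 200 GB upper bound as a hard threshold is a deliberately conservative choice made for this example.

```python
# Upper bound on Elprep memory use quoted in the text above, in gigabytes.
ELPREP_MEMORY_GB = 200


def preprocessing_choice(available_memory_gb: float) -> str:
    """Suggest a preprocessing route for a machine (hypothetical helper).

    Returns "elprep" when the machine has at least the quoted upper bound
    of memory, and otherwise the separate duplicate-removal and
    base-recalibration steps that Elprep combines.
    """
    if available_memory_gb >= ELPREP_MEMORY_GB:
        return "elprep"
    return "mdup+recalibrator"


print(preprocessing_choice(64))
print(preprocessing_choice(512))
```

A machine with less memory may still run Elprep on smaller inputs, since the figure is an upper bound; the threshold here simply errs on the side of the separate steps.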
For Strelka2, Manta, and VarNet, COSAP requires to be installed on the system.
For Ensembl-vep, COSAP requires to be installed on the system.