COSAP Docs

Creating Pipelines with Builders

Builders are responsible for creating the pipeline configuration that is later used to run the pipeline. The configuration includes the library used, the input/output filenames, and the run parameters for the related algorithm.

The individual file readers and builders are described below:

1. File Readers

FastqReader

from cosap import FastqReader

sample_fastq = FastqReader("/path/to/fastq.fastq", name="normal_sample")

You can bundle paired FASTQ files in a list:

germline_fastqs = [
    FastqReader("/path/to/fastq_1.fastq", name="normal_sample", read=1),
    FastqReader("/path/to/fastq_2.fastq", name="normal_sample", read=2)
]

tumor_fastqs = [
    FastqReader("/path/to/fastq_1.fastq", name="tumor_sample", read=1),
    FastqReader("/path/to/fastq_2.fastq", name="tumor_sample", read=2)
]
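
The paired-reader pattern above repeats for every sample, so it can be wrapped in a small helper. The sketch below is not part of COSAP: make_paired_readers is a hypothetical name, and the reader class is passed in as an argument, assuming it accepts (path, name=..., read=...) as FastqReader does in the examples above.

```python
from typing import Any, Callable, List


def make_paired_readers(
    reader_cls: Callable[..., Any], r1_path: str, r2_path: str, name: str
) -> List[Any]:
    """Build the two-element reader list for one paired-end sample.

    reader_cls is assumed to accept (path, name=..., read=...),
    mirroring the FastqReader calls shown above.
    """
    return [
        reader_cls(r1_path, name=name, read=1),  # first mate
        reader_cls(r2_path, name=name, read=2),  # second mate
    ]
```

With COSAP installed, a call such as make_paired_readers(FastqReader, "/path/to/fastq_1.fastq", "/path/to/fastq_2.fastq", "tumor_sample") would produce the same list as tumor_fastqs above.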

BamReader

from cosap import BamReader

sample_bam = BamReader("/path/to/sample.mdup.bam")

When running COSAP without Dockerization, relative file paths passed to Readers are resolved relative to the directory from which you run the Python command.

The workdir option in the pipeline builder only affects where intermediate and final files will be created.
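
Because relative paths depend on where Python is launched, one way to avoid surprises is to convert them to absolute paths before passing them to a Reader. This is a minimal sketch using only the standard library; to_absolute is a hypothetical helper, not a COSAP function.

```python
from pathlib import Path


def to_absolute(path_str: str) -> str:
    """Return path_str as an absolute path.

    Relative paths are resolved against the current working directory,
    which is the directory the Python command was run from.
    """
    return str(Path(path_str).expanduser().resolve())
```

The result can then be handed to a Reader, for example FastqReader(to_absolute("data/sample.fastq"), name="normal_sample"), so the path no longer depends on the launch directory.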

2. Builders

Trimmer

from cosap import Trimmer

trimmer_germline = Trimmer(input_step=germline_fastqs)

Mapper

from cosap import Mapper

mapper_germline_params = {
    "read_groups": {
        "ID": "H0164.2",
        "SM": "Pt28N",
        "PU": "0",
        "PL": "illumina",
        "LB": "Solexa-272222"
    }
}

mapper_germline_bwa = Mapper(
    library="bwa2",
    input_step=trimmer_germline,
    params=mapper_germline_params
)

mapper_germline_bowtie = Mapper(
    library="bowtie",
    input_step=trimmer_germline,
    params=mapper_germline_params
)
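
The params dictionary above can be built with a small helper that catches missing read group fields early. This is only a sketch: make_mapper_params and EXPECTED_READ_GROUP_KEYS are hypothetical names, and treating all five keys from the example as mandatory is an assumption, not a documented COSAP requirement.

```python
from typing import Dict

# Keys taken from the example above. Whether COSAP requires all of them
# is an assumption made for this sketch.
EXPECTED_READ_GROUP_KEYS = ("ID", "SM", "PU", "PL", "LB")


def make_mapper_params(**read_group_fields: str) -> Dict[str, Dict[str, str]]:
    """Return a Mapper params dict with a "read_groups" entry."""
    missing = [k for k in EXPECTED_READ_GROUP_KEYS if k not in read_group_fields]
    if missing:
        raise ValueError(f"missing read group fields: {', '.join(missing)}")
    return {"read_groups": dict(read_group_fields)}
```

Calling make_mapper_params(ID="H0164.2", SM="Pt28N", PU="0", PL="illumina", LB="Solexa-272222") returns the same dictionary as mapper_germline_params.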

MarkDuplicates Builder

Builder for marking and removing duplicate reads. Takes a Mapper as input.

from cosap import MDUP

mdup_germline = MDUP(input_step=mapper_germline_bwa)

# By default, this removes all duplicates.
# If you only want to mark them, use the duplicate_handling_method argument.
mdup_germline = MDUP(input_step=mapper_germline_bwa, duplicate_handling_method="mark")

BaseRecalibrator

GATK BaseRecalibrator builder. Takes Mapper or MDUP as input.

from cosap import Recalibrator

recalibrator_germline = Recalibrator(input_step=mdup_germline)

Elprep Preprocessing Tool

from cosap import Elprep

elprep_recalibrator_germline = Elprep(input_step=mapper_germline_bwa)

VariantCaller

Variant caller builder for variant detection tools. Takes Mapper, MDUP, Recalibrator, or Elprep of both normal and tumor samples as input. Currently the following libraries are supported:

from cosap import VariantCaller

sample_params = {"germline_sample_name": "Pt28N"}

# recalibrator_tumor is assumed to be built for the tumor sample
# in the same way as recalibrator_germline.
mutect_caller = VariantCaller(
    library="mutect",
    germline=recalibrator_germline,
    tumor=recalibrator_tumor,
    params=sample_params
)

strelka_caller = VariantCaller(
    library="strelka",
    germline=recalibrator_germline,
    tumor=recalibrator_tumor,
    params=sample_params
)

If a sample name is provided to the Mapper in the read group, it must also be provided in the VariantCaller params.
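
One way to keep the two sample names in sync is to derive the VariantCaller params from the Mapper params instead of typing the name twice. The sketch below assumes the sample name lives under the "SM" read group key, as in the Mapper example; caller_params_from_mapper is a hypothetical helper, not a COSAP function.

```python
from typing import Dict


def caller_params_from_mapper(mapper_params: dict) -> Dict[str, str]:
    """Copy the germline sample name from Mapper params into caller params.

    Returns an empty dict when the Mapper params carry no "SM" read group,
    in which case there is no name to propagate.
    """
    sample_name = mapper_params.get("read_groups", {}).get("SM")
    if sample_name is None:
        return {}
    return {"germline_sample_name": sample_name}
```

With the Mapper example above, caller_params_from_mapper(mapper_germline_params) yields the same dictionary as sample_params.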

On some systems, Mutect2 may cause crashes when used with multithreading. To turn off multithreading in COSAP, set COSAP_THREADS_PER_JOB to 1.
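
As a rough illustration, the setting can be applied from Python before COSAP is imported. This assumes COSAP_THREADS_PER_JOB is read from the process environment, which this page does not state explicitly; exporting the variable in the shell before starting Python would be the equivalent.

```python
import os

# Assumption: COSAP reads COSAP_THREADS_PER_JOB from the environment.
# Set it before importing cosap so the value is already in place
# when the package loads its configuration.
os.environ["COSAP_THREADS_PER_JOB"] = "1"
```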

VariantAnnotator

Variant annotator builder.

Currently, the following libraries are supported:

from cosap import Annotator

annotator = Annotator(library="vep", input_step=mutect_caller)

Building Pipeline Config

After creating individual pipeline steps, it is time to gather them under a pipeline. To do this you can simply create a Pipeline instance and add the previously created steps to it.

from cosap import Pipeline

pipeline = Pipeline()

Adding steps to the pipeline is as easy as calling .add():

pipeline.add(trimmer_germline)
pipeline.add(trimmer_tumor)
pipeline.add(mapper_germline_bwa)
pipeline.add(mapper_tumor_bwa)
pipeline.add(mdup_germline)
pipeline.add(mdup_tumor)
pipeline.add(recalibrator_germline)
pipeline.add(recalibrator_tumor)
pipeline.add(mutect_caller)
pipeline.add(annotator)

You must add every step you want to run to the pipeline.
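
Since every step has to be added explicitly, a short loop can replace the repeated .add() calls. This is a sketch under the assumption that the pipeline object only needs an .add(step) method, as used above; add_all is a hypothetical helper name.

```python
from typing import Any, Iterable


def add_all(pipeline: Any, steps: Iterable[Any]) -> Any:
    """Add each step to the pipeline in the order given and return the pipeline."""
    for step in steps:
        pipeline.add(step)
    return pipeline
```

For example, add_all(pipeline, [trimmer_germline, mapper_germline_bwa, mdup_germline, recalibrator_germline]) adds the germline steps in one call. Keeping the steps in a single list also makes it easier to see at a glance whether one was left out.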

To create the configuration file:

pipeline_config = pipeline.build(workdir="/path/to/pipeline/workdir")

This will create a YAML file in the workdir that you specified.
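
To check what was written, the workdir can be scanned for YAML files. The sketch below assumes the configuration file sits directly in the workdir and ends in .yaml or .yml; the actual file name is not given on this page, and find_yaml_configs is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def find_yaml_configs(workdir: str) -> List[Path]:
    """Return the YAML files located directly in workdir, sorted by path."""
    root = Path(workdir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )
```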


Last updated 9 months ago

Trimmer is the builder for adapter trimming and quality control. It takes the list of paired FastqReaders as input and uses fastp.

Mapper is the builder for read mapping. It takes a Trimmer or FastqReader as input. Currently, the following libraries are supported:
  • BWA
  • BWA-MEM2
  • Bowtie2
  • Parabricks fq2bam (integrated into the "bwa" library; the pipeline runner device must be "gpu")

Elprep is a high-performance tool for preprocessing. Its functionality is the same as that of the duplicate remover and base recalibrator combined. The tool requires up to 200GB of memory, so it is recommended only for capable workstations and servers.

VariantCaller currently supports the following libraries:
  • Mutect2
  • Varscan2
  • Strelka2
  • Octopus
  • MuSe
  • VarDict
  • SomaticSniper
  • VarNet
  • DeepVariant
  • HaplotypeCaller
  • Manta

For Strelka2, Manta, and VarNet, COSAP requires Docker to be installed on the system.

VariantAnnotator currently supports the following libraries:
  • Ensembl-vep
  • Annovar
  • SnpEff
  • InterVar
  • CancerVar
  • PharmGKB
  • Annotsv

For Ensembl-vep, COSAP requires Docker to be installed on the system.