Variation Analysis
Package Overview
Goal
Identification of sequence variants in germline and somatic samples from whole genomes and exomes
Requirements
- Paired-end sequencing with a read length of 100 bp or more
- Germline:
  - 40x technical coverage (number of read pairs * 2 * read length / length of genome or exome)
  - Unique molecular identifiers (UMIs) or amplification-free library preparation recommended
- Somatic:
  - Paired germline controls highly recommended
  - UMIs or amplification-free library preparation highly recommended
  - Unique dual indexing of samples
  - Exomes: 100x technical coverage for somatic samples, 40x for germline controls
  - Whole genomes: 60x technical coverage for somatic samples, 40x for germline controls
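Technical coverage is the number of read pairs * 2 * read length divided by the length of the genome or exome, so the coverage targets above translate directly into a required sequencing depth. A minimal Python sketch, assuming coverage is counted over all sequenced bases of both mates (before duplicate removal) and using illustrative target sizes of about 3.1 Gb for a human genome and 50 Mb for an exome capture (real exome targets depend on the kit). `technical_coverage`, `required_read_pairs` and `COVERAGE_TARGETS` are hypothetical helper names.

```python
import math

# Coverage targets from the requirements (x-fold technical coverage).
# Germline controls in somatic projects use the germline value (40x).
COVERAGE_TARGETS = {
    ("germline", "exome"): 40,
    ("germline", "genome"): 40,
    ("somatic", "exome"): 100,
    ("somatic", "genome"): 60,
}

MIN_READ_LENGTH = 100  # paired-end reads of 100 bp or more


def technical_coverage(read_pairs: int, read_length: int, target_bp: int) -> float:
    """Technical coverage = read pairs * 2 * read length / target length."""
    return read_pairs * 2 * read_length / target_bp


def required_read_pairs(coverage: int, read_length: int, target_bp: int) -> int:
    """Smallest number of read pairs that reaches `coverage` over `target_bp`."""
    if read_length < MIN_READ_LENGTH:
        raise ValueError(f"read length {read_length} bp is below {MIN_READ_LENGTH} bp")
    bases_per_pair = 2 * read_length
    # Round up so the estimate never falls short of the target coverage.
    return math.ceil(coverage * target_bp / bases_per_pair)


# Illustrative target sizes (assumptions): ~3.1 Gb human genome, ~50 Mb exome capture.
GENOME_BP = 3_100_000_000
EXOME_BP = 50_000_000

for (kind, target), cov in COVERAGE_TARGETS.items():
    size = GENOME_BP if target == "genome" else EXOME_BP
    pairs = required_read_pairs(cov, 150, size)
    print(f"{kind:8} {target:6} {cov:>3}x -> {pairs:,} read pairs (2x150 bp)")
```

Rounding up with `math.ceil` keeps the estimate on the safe side; in practice, extra depth is usually budgeted for duplicates and off-target reads, which this sketch ignores.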
Analysis
- Processing with the nf-core/sarek pipeline:
  - Trimming of adapters and low-quality bases; processing of UMIs
  - Mapping to a reference genome
  - Identification and genotyping of SNVs and small indels
  - Functional annotation of variants based on a defined set of databases
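A sarek run is driven by a CSV samplesheet and a handful of command-line parameters. The sketch below writes a samplesheet for a hypothetical tumor/normal pair and assembles a `nextflow run nf-core/sarek` argument list without executing it. The column names (`patient`, `sex`, `status`, `sample`, `lane`, `fastq_1`, `fastq_2`, with `status` 0 for normal and 1 for tumor) and the parameters `--input`, `--outdir`, `--genome`, `--tools`, `--wes` and `--intervals` follow the sarek 3.x documentation as far as known here and should be checked against the release in use. The `docker` profile, the `GATK.GRCh38` reference key, the tool selection and all file names are assumptions; trimming and UMI handling are controlled by further sarek parameters not shown.

```python
import csv
import io
from typing import List, Optional

# Samplesheet columns for FASTQ input to nf-core/sarek (3.x);
# status 0 = normal/germline sample, 1 = tumor sample.
SAMPLESHEET_COLUMNS = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]


def samplesheet_csv(rows: List[dict]) -> str:
    """Render samplesheet rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SAMPLESHEET_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def sarek_command(samplesheet: str, outdir: str, tools: List[str],
                  exome_bed: Optional[str] = None) -> List[str]:
    """Build a `nextflow run nf-core/sarek` argument list (not executed here)."""
    cmd = [
        "nextflow", "run", "nf-core/sarek",
        "-profile", "docker",          # assumption: Docker is available
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", "GATK.GRCh38",     # assumption: human GRCh38 reference
        "--tools", ",".join(tools),
    ]
    if exome_bed is not None:
        # Exomes: flag as WES and restrict processing to the capture targets.
        cmd += ["--wes", "--intervals", exome_bed]
    return cmd


# Hypothetical tumor/normal pair for one patient (file names are placeholders).
rows = [
    {"patient": "P1", "sex": "XX", "status": 0, "sample": "P1_normal", "lane": "L001",
     "fastq_1": "P1_normal_R1.fastq.gz", "fastq_2": "P1_normal_R2.fastq.gz"},
    {"patient": "P1", "sex": "XX", "status": 1, "sample": "P1_tumor", "lane": "L001",
     "fastq_1": "P1_tumor_R1.fastq.gz", "fastq_2": "P1_tumor_R2.fastq.gz"},
]
sheet = samplesheet_csv(rows)
cmd = sarek_command("samplesheet.csv", "results", ["strelka", "mutect2", "vep"],
                    exome_bed="targets.bed")
print(sheet)
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would launch the run once Nextflow and the container engine are installed. Keeping the command as a list avoids shell-quoting issues with paths.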
Output
- Mapped reads for visual inspection
- Annotated variant tables with genotype information (germline only)
- HTML report
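Variant tables of this kind combine per-variant fields with one genotype column per sample. A minimal sketch of deriving such a table from a VCF file, assuming uncompressed VCF text with a `GT` key in the FORMAT column; `vcf_genotype_table` is a hypothetical helper, and annotation fields (for example VEP's `CSQ` entry in INFO) are not parsed. For real data, `bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n'` or a dedicated VCF library is the more robust choice.

```python
from typing import Dict, List, Union


def vcf_genotype_table(vcf_text: str) -> List[Dict[str, Union[str, int]]]:
    """Flatten VCF records into rows: CHROM, POS, REF, ALT plus one GT column per sample."""
    samples: List[str] = []
    rows = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        row = {"CHROM": fields[0], "POS": int(fields[1]), "REF": fields[3], "ALT": fields[4]}
        format_keys = fields[8].split(":") if len(fields) > 8 else []
        for name, sample_field in zip(samples, fields[9:]):
            values = dict(zip(format_keys, sample_field.split(":")))
            row[name] = values.get("GT", ".")  # "." when no genotype is given
        rows.append(row)
    return rows


# Tiny two-sample example (hypothetical data).
example_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1", "S2"]),
    "\t".join(["chr1", "12345", ".", "A", "G", "50", "PASS", ".", "GT:DP", "0/1:30", "1/1:25"]),
])
for row in vcf_genotype_table(example_vcf):
    print(row)
```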
Advanced Analysis
- Analysis of large sample sets (exomes: ≥ 100 samples; genomes: ≥ 20 samples)
- Identification of copy-number variants including functional annotation
- Identification of structural variants including functional annotation