Mapping and Counting
Output Description
Depending on the type of experiment, a different pipeline was used to analyze the data.
Result files were produced with an in-house pipeline.
Multiqc summary report
- Location: multiqc/*html
- MultiQC (Ewels et al. 2016, see download) HTML summary reports with a general summary of the experiment
Basic sequence quality control
- Location: fastqc/*html
- FastQC (Andrews et al. 2010) HTML report containing quality control metrics for each FastQ file
- If sequence data were pre-processed, there will be reports for raw and pre-processed FastQ files
- Software and detailed documentation can be found here
Pre-processed FastQ files
- Location: preprocess/*fastq.gz
- FastQ files after pre-processing with:
- cutadapt (Martin 2011) for trimming adapters and low-quality bases (download)
- umi-tools (Smith, Heger, and Sudbery 2017) for extracting and filtering UMIs (download)
- seqtk (Li 2016) for down-sampling (download)
- xengsort (Zentgraf and Rahmann 2021) for xenograft sorting (download)
- Only available if pre-processing was done
- Steps depend on the selected pre-processing protocol
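To see how many reads survived pre-processing, the number of records in a FastQ file can be counted directly. The following is a minimal sketch that assumes a standard four-line-per-record, gzip-compressed FastQ file; the function name `count_fastq_reads` is hypothetical and not part of the pipeline.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FastQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A well-formed FastQ file has exactly four lines per read.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling the function on a raw file and on the matching file under preprocess/ gives the number of reads before and after filtering or down-sampling.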
Mappings
- Location: bwa/*bam and bwa/*bai for DNA-seq data; star/*bam and star/*bai for RNA-seq data
- Read alignments in BAM format as well as their BAI indices
- Mappers are BWA (Li and Durbin 2009, see download) for DNA-seq data and STAR (Dobin and Gingeras 2015, see download) for RNA-seq data
- Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
- Duplicate reads (originating from the same PCR template) are tagged with Picard Tools (“Picard Toolkit” 2019, see download)
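In SAM/BAM files, duplicate tagging is recorded in the FLAG field: bit 0x400 (decimal 1024) marks a read as a PCR or optical duplicate. The sketch below shows how that bit can be tested on SAM text lines, for example as produced by `samtools view -h`. The helper names are hypothetical and the code only illustrates the flag logic, not how the pipeline itself evaluates duplicates.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def is_duplicate(flag):
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def duplicate_fraction(sam_lines):
    """Fraction of alignment records tagged as duplicates.

    `sam_lines` is any iterable of SAM text lines; header lines
    (starting with '@') are skipped.
    """
    total = 0
    duplicates = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        total += 1
        duplicates += is_duplicate(flag)
    return duplicates / total if total else 0.0
```

Because duplicates are only tagged and not removed, downstream tools decide for themselves whether to ignore records with this bit set.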
Gene counts
- Location:
- for standard RNA-seq: genecounts/bfx*tsv.gz and genecounts/bfx*annotated.xlsx
- for RNA-seq with UMI: genecounts/bfx*reads.tsv.gz, genecounts/bfx*umi.tsv.gz and genecounts/bfx*annotated.xlsx
- Table of counts per sample and gene obtained with featureCounts (Liao, Smyth, and Shi 2013) for standard RNA-seq (download) or umi-tools for RNA-seq with UMI
- Only available for RNA-seq data
- Raw counts
- Used for DEG tests
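The count tables can be read with standard tools. The sketch below uses only the Python standard library and assumes a layout that is not confirmed by this description: a header row, the gene identifier in the first column, and one integer count column per sample. If the actual files contain additional annotation columns, the parsing has to be adapted. The function names are hypothetical.

```python
import csv
import gzip


def read_count_table(path):
    """Read a gzipped, tab-separated count table.

    Assumed layout (hypothetical): header row, gene ID in column 1,
    one integer count column per sample after that.
    Returns (sample_names, {gene_id: [counts per sample]}).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        counts = {row[0]: [int(value) for value in row[1:]] for row in reader if row}
    return samples, counts


def library_sizes(samples, counts):
    """Total raw count per sample, a common first sanity check."""
    totals = [0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))
```

Raw counts like these are not normalized, so library sizes can differ considerably between samples.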
TPM (transcript-per-million) counts
- Location: kallisto/bfx*tpm_per_transcript.tsv.gz, kallisto/bfx*tpm_per_transcript.xlsx, kallisto/bfx*tpm_per_gene.tsv.gz and kallisto/bfx*tpm_per_gene.xlsx
- TPM counts per transcript and gene obtained with kallisto (Bray et al. 2016, see download)
- Only available for RNA-seq data
- UMIs are not considered in the TPM calculation
- TPM normalization accounts for transcript length and sequencing depth
- Use for within-sample comparisons between genes
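The way TPM accounts for transcript length and sequencing depth can be illustrated with a short calculation: each count is first divided by the transcript length, and the resulting rates are then scaled so that they sum to one million within the sample. This is a simplified sketch of the general TPM definition; kallisto itself works with estimated counts and effective lengths, which are not modeled here. The function name `tpm` is hypothetical.

```python
def tpm(counts, lengths):
    """Compute TPM values for one sample.

    counts:  read counts per transcript
    lengths: transcript lengths in bases (same order as counts)
    """
    if any(length <= 0 for length in lengths):
        raise ValueError("transcript lengths must be positive")
    # Step 1: length normalization (reads per base).
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: depth normalization, so that values sum to one million.
    return [rate / total * 1e6 for rate in rates]
```

For example, two transcripts with the same count but lengths of 1000 and 2000 bases receive TPM values in a 2:1 ratio, because the longer transcript collects the same number of reads over twice the length.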
RNA-SeQC report
- Location: rnaseqc/report/bfx*html
- RNA-SeQC (Graubert et al. 2021, see download) HTML report containing RNA-seq library quality metrics for each sample
- Only available for RNA-seq data
Result files were produced with the Cell Ranger pipeline (Zheng et al. 2017, see download).
Summary report
Counts
- Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 or count/sample_filtered_feature_bc_matrix, count/sample_filtered_feature_bc_matrix.h5
- Matrix of counts per gene and cell in Market Exchange Format (MEX) or HDF5 format
- Primary input for single-cell analysis
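Most single-cell toolkits load these matrices directly, but the MEX directory can also be read by hand to understand its structure. The sketch below assumes the directory contains the three files barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz, as written by recent Cell Ranger versions, and that the matrix is stored in Matrix Market coordinate format with features as rows and barcodes as columns. The function name `read_mex` is hypothetical.

```python
import gzip
import os


def read_mex(directory):
    """Read a MEX directory into (features, barcodes, {(feature, barcode): count})."""

    def first_column(name):
        with gzip.open(os.path.join(directory, name), "rt") as handle:
            return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]

    barcodes = first_column("barcodes.tsv.gz")
    features = first_column("features.tsv.gz")  # first column: feature ID

    counts = {}
    with gzip.open(os.path.join(directory, "matrix.mtx.gz"), "rt") as handle:
        # Skip the banner and comment lines, which start with '%'.
        for line in handle:
            if not line.startswith("%"):
                break
        n_rows, n_cols, _n_entries = (int(x) for x in line.split())
        if n_rows != len(features) or n_cols != len(barcodes):
            raise ValueError("matrix dimensions do not match features/barcodes")
        for line in handle:
            if not line.strip():
                continue
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            counts[(features[int(row) - 1], barcodes[int(col) - 1])] = int(value)
    return features, barcodes, counts
```

Only non-zero entries are stored, which is why the format stays compact even for tens of thousands of cells.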
Mappings
- Location: possorted_genome_bam.bam, possorted_genome_bam.bam.bai or count/sample_alignments.bam, count/sample_alignments.bam.bai
- Read alignments in BAM format as well as their BAI indices
- Not available by default for fixed RNA profiling
- Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
Loupe Browser file
- Location: cloupe.cloupe or count/sample_cloupe.cloupe
- File for visualization of the secondary analysis results produced by Cell Ranger in the Loupe Browser (Zheng et al. 2017, see download)
- Extensive tutorials can be found here
Other Cell Ranger files
- A full and detailed description can be found here
Other files produced by us
- Velocyto:
- Location: velocyto/*.loom
- Output files for RNA velocity analysis done with Velocyto (La Manno et al. 2018, see download)
- Doublet scores:
- Location: doublets/scdblFinder.csv
- Doublet scores for each cell calculated by scDblFinder (Germain et al. 2021, see download)
Result files were produced with the Cell Ranger ATAC pipeline (Satpathy et al. 2019, see download).
Summary report
- Location: web_summary.html
- Interactive summary HTML file with quality metrics and automated secondary analysis results
- Detailed descriptions can be found here
Counts
- Location: filtered_peak_bc_matrix, filtered_peak_bc_matrix.h5 and filtered_tf_bc_matrix, filtered_tf_bc_matrix.h5
- Matrices of counts per peak and cell, and per transcription factor and cell, in Market Exchange Format (MEX) or HDF5 format
- Primary input for single-cell analysis
Fragments
- Location: fragments.tsv.gz, fragments.tsv.gz.tbi
- ATAC fragment positions in BED format
- Often used as well for further single-cell analysis
- Can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
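The fragments file can be inspected with a few lines of code. The sketch below assumes the usual 10x layout, which is not spelled out in this description: a tab-separated, BED-like file with optional header lines starting with '#', and the columns chromosome, start, end, cell barcode and read support. It counts fragments per barcode; the function name is hypothetical.

```python
import gzip
from collections import Counter


def fragments_per_barcode(path):
    """Count ATAC fragments per cell barcode in a gzipped fragments file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip header/comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            _chrom, _start, _end, barcode, *_rest = line.rstrip("\n").split("\t")
            counts[barcode] += 1
    return counts
```

The per-barcode totals give a quick impression of how fragment depth is distributed across cells before any further analysis.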
Mappings
- Location:
possorted_bam.bam, possorted_bam.bam.bai
- Read alignments in BAM format as well as their BAI indices
- Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
Loupe Browser file
- Location: cloupe.cloupe
- File for visualization of the secondary analysis results produced by Cell Ranger ATAC in the Loupe Browser (Zheng et al. 2017, see download)
- Extensive tutorials can be found here
- Also have a look at these notes
Other Cell Ranger ATAC files
- A full and detailed description can be found here
Result files were produced with the Cell Ranger ARC pipeline (Zheng et al. 2017, see download).
Summary report
- Location: web_summary.html
- Interactive summary HTML file with quality metrics and automated secondary analysis results
- Detailed descriptions can be found here
Counts
- Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5
- Matrix of counts per gene/peak/transcription factor and cell in Market Exchange Format (MEX) or HDF5 format
- Primary input for single-cell analysis
Fragments
- Location: fragments.tsv.gz, fragments.tsv.gz.tbi
- ATAC fragment positions in BED format
- Often used as well for single-cell analysis
Mappings
- Location: gex_possorted_bam.bam, gex_possorted_bam.bam.bai and atac_possorted_bam.bam, atac_possorted_bam.bam.bai
- Gene expression and ATAC read alignments in BAM format as well as their BAI indices
- Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
Loupe Browser file
- Location: cloupe.cloupe
- File for visualization of the secondary analysis results produced by Cell Ranger ARC in the Loupe Browser (Zheng et al. 2017, see download)
- Extensive tutorials can be found here
Other Cell Ranger ARC files
- A full and detailed description can be found here
Result files were produced with the Space Ranger pipeline (Zheng et al. 2017, see download).
Summary report
- Location: web_summary.html
- Interactive summary HTML file with quality metrics and automated secondary analysis results
- Detailed descriptions can be found here
Counts
- Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 (Visium) or binned_outputs/square_*/filtered_feature_bc_matrix, binned_outputs/square_*/filtered_feature_bc_matrix.h5 (Visium HD)
- Matrix of counts per gene and cell in Market Exchange Format (MEX) or HDF5 format
- For Visium HD, the counts are aggregated at different resolutions (squares)
- Primary input for single-cell analysis
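For Visium HD, each resolution lives in its own square_* directory. The sketch below lists the available bin sizes so that the desired resolution can be picked programmatically. It assumes that the directories are named with the bin size in micrometers, such as square_008um; this naming is an assumption about the Space Ranger output and is not stated above. The function name is hypothetical.

```python
import re
from pathlib import Path

# Assumed directory naming: "square_" + zero-padded bin size + "um".
BIN_PATTERN = re.compile(r"^square_(\d+)um$")


def list_bin_sizes(binned_outputs):
    """Map bin size in micrometers to its directory, sorted by size."""
    sizes = {}
    for child in Path(binned_outputs).iterdir():
        match = BIN_PATTERN.match(child.name)
        if child.is_dir() and match:
            sizes[int(match.group(1))] = child
    return dict(sorted(sizes.items()))
```

Smaller bins give higher spatial resolution but fewer counts per bin, so the choice of directory depends on the planned analysis.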
Images
- Location: spatial/*tiff (Visium) or binned_outputs/square_*/spatial/*tiff and spatial/*tiff (Visium HD)
- TIFF images of the tissue sections with spots and gene expression data
- For Visium HD, the images are shown at different resolutions (squares)
- Includes additionally the original microscopy images
- Can be viewed in Fiji (Schindelin et al. 2012, see download)
- Primary input for single-cell analysis
Mappings
- Location: possorted_genome_bam.bam, possorted_genome_bam.bam.bai (Visium)
- Read alignments in BAM format as well as their BAI indices
- Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
- Not available by default for Visium HD
Loupe Browser file
- Location: cloupe.cloupe (Visium) or binned_outputs/square_*/cloupe.cloupe (Visium HD)
- File for visualization of the secondary analysis results produced by Space Ranger in the Loupe Browser (Zheng et al. 2017, see download)
- Extensive tutorials can be found here
Other Space Ranger files
- A full and detailed description can be found here
Result files were produced with the Xenium Onboard Analysis/Xenium Ranger pipeline (Janesick et al. 2023, see download).
Summary report
- Location: analysis_summary.html
- Interactive summary HTML file with quality metrics and automated secondary analysis results
- Detailed descriptions can be found here
Counts
- Location: cell_feature_matrix, cell_feature_matrix.h5
- Matrix of counts per gene and cell in Market Exchange Format (MEX) or HDF5 format
- Counts are based on segmentations of transcripts into cells
- Primary input for spatial analysis
Segmentations
Images
- Location: morphology.ome.tif and morphology_focus/morphology_focus_000[0-3].ome.tif
- Full 3D Z-stack of the DAPI image at different resolutions (image pyramid) in TIFF format
- 2D autofocus projection images in TIFF format for the nuclei DAPI stain, the boundary ATP1A1/E-Cadherin/CD45 stain, the interior RNA (18S) stain and the interior protein alphaSMA/Vimentin
- Can be viewed in Fiji (Schindelin et al. 2012, see download)
- Sometimes used as input for spatial analysis
Transcripts
Xenium Explorer files
- Location: experiment.xenium
- File for visualization of the secondary analysis results produced by the Xenium Onboard Analysis/Xenium Ranger pipeline (Janesick et al. 2023, see download)
- Requires the following additional files: analysis_summary.html, morphology.ome.tif, morphology_focus/*tif, cells.zarr.zip, transcripts.zarr.zip, cell_feature_matrix.zarr.zip, analysis.zarr.zip
- Extensive tutorials can be found here
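Before opening experiment.xenium, it can be useful to check that all additional files listed above were copied along with it. The following sketch performs that check for a given output directory. The file list is taken from this description; the function name and the assumption that all files sit relative to one output directory are hypothetical.

```python
from pathlib import Path

# Files named in this description as required by the Xenium Explorer.
REQUIRED_FILES = [
    "experiment.xenium",
    "analysis_summary.html",
    "morphology.ome.tif",
    "cells.zarr.zip",
    "transcripts.zarr.zip",
    "cell_feature_matrix.zarr.zip",
    "analysis.zarr.zip",
]
# Patterns for which at least one matching file must exist.
REQUIRED_PATTERNS = ["morphology_focus/*tif"]


def missing_explorer_files(output_dir):
    """Return the required files or patterns that are absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [pattern for pattern in REQUIRED_PATTERNS if not any(root.glob(pattern))]
    return missing
```

An empty list means that every file named above is present; otherwise the returned names show what still needs to be transferred.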
Other Xenium Onboard Analysis/Xenium Ranger files
- A full and detailed description can be found here
References
Andrews, Simon, Felix Krueger, Anne Segonds-Pichon, Laura Biggins, Christel Krueger, and Steven Wingett. 2010. “FastQC.” Babraham Institute. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Bray, Nicolas L, Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-seq Quantification.” Nat. Biotechnol. 34 (5): 525–27. https://www.nature.com/articles/nbt.3519.
Dobin, Alexander, and Thomas R Gingeras. 2015. “Mapping RNA-seq Reads with STAR.” Curr. Protoc. Bioinformatics 51 (1): 11.14.1–19. https://currentprotocols.onlinelibrary.wiley.com/doi/10.1002/0471250953.bi1114s51.
Ewels, Philip, Måns Magnusson, Sverker Lundin, and Max Käller. 2016. “MultiQC: summarize analysis results for multiple tools and samples in a single report.” Bioinformatics 32 (19): 3047–48. https://doi.org/10.1093/bioinformatics/btw354.
Germain, Pierre-Luc, Aaron Lun, Carlos Garcia Meixide, Will Macnair, and Mark D Robinson. 2021. “Doublet Identification in Single-Cell Sequencing Data Using scDblFinder.” F1000Res. 10 (September): 979. https://f1000research.com/articles/10-979/v2.
Graubert, Aaron, François Aguet, Arvind Ravi, Kristin G Ardlie, and Gad Getz. 2021. “RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts.” Bioinformatics 37 (18): 3048–50. https://doi.org/10.1093/bioinformatics/btab135.
Janesick, Amanda, Robert Shelansky, Andrew D Gottscho, Florian Wagner, Stephen R Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nat. Commun. 14 (1): 8353. https://www.nature.com/articles/s41467-023-43458-x.
La Manno, Gioele, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, et al. 2018. “RNA Velocity of Single Cells.” Nature 560 (7719): 494–98. https://www.nature.com/articles/s41586-018-0414-6.
Li, Heng. 2016. “Seqtk.” GitHub. https://github.com/lh3/seqtk.
Li, Heng, and Richard Durbin. 2009. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics 25 (14): 1754–60. https://academic.oup.com/bioinformatics/article/25/14/1754/225615.
Liao, Yang, Gordon K. Smyth, and Wei Shi. 2013. “featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.” Bioinformatics 30 (7): 923–30. https://doi.org/10.1093/bioinformatics/btt656.
Martin, Marcel. 2011. “Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads.” EMBnet.journal 17 (1): 10–12. https://doi.org/10.14806/ej.17.1.200.
“Picard Toolkit.” 2019. Broad Institute, GitHub Repository. https://broadinstitute.github.io/picard/.
Satpathy, Ansuman T, Jeffrey M Granja, Kathryn E Yost, Yanyan Qi, Francesca Meschi, Geoffrey P McDermott, Brett N Olsen, et al. 2019. “Massively Parallel Single-Cell Chromatin Landscapes of Human Immune Cell Development and Intratumoral T Cell Exhaustion.” Nat. Biotechnol. 37 (8): 925–36. https://www.nature.com/articles/s41587-019-0206-z.
Schindelin, Johannes, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, Tobias Pietzsch, Stephan Preibisch, et al. 2012. “Fiji: An Open-Source Platform for Biological-Image Analysis.” Nat. Methods 9 (7): 676–82. https://www.nature.com/articles/nmeth.2019.
Smith, Tom, Andreas Heger, and Ian Sudbery. 2017. “UMI-tools: Modeling Sequencing Errors in Unique Molecular Identifiers to Improve Quantification Accuracy.” Genome Res. 27 (3): 491–99. http://genome.cshlp.org/cgi/pmidlookup?view=long&pmid=28100584.
Thorvaldsdóttir, Helga, James T Robinson, and Jill P Mesirov. 2013. “Integrative Genomics Viewer (IGV): High-Performance Genomics Data Visualization and Exploration.” Brief. Bioinform. 14 (2): 178–92. https://academic.oup.com/bib/article/14/2/178/208453.
Zentgraf, Jens, and Sven Rahmann. 2021. “Fast Lightweight Accurate Xenograft Sorting.” Algorithms Mol. Biol. 16 (1): 2. https://almob.biomedcentral.com/articles/10.1186/s13015-021-00181-w.
Zheng, Grace X Y, Jessica M Terry, Phillip Belgrader, Paul Ryvkin, Zachary W Bent, Ryan Wilson, Solongo B Ziraldo, et al. 2017. “Massively Parallel Digital Transcriptional Profiling of Single Cells.” Nat. Commun. 8 (1): 14049. https://www.nature.com/articles/ncomms14049.