Mapping and Counting

Output Description

Depending on the type of experiment, a different pipeline was used to analyze the data.

Result files were produced with an in-house pipeline.

MultiQC summary report

Location: multiqc/*html
MultiQC (Ewels et al. 2016, see download) HTML summary reports giving a general overview of the experiment

Basic sequence quality control

Location: fastqc/*html
FastQC (Andrews et al. 2010) HTML report containing quality control metrics for each FastQ file
If the sequence data were pre-processed, reports are provided for both the raw and the pre-processed FastQ files
Software and detailed documentation can be found here

Pre-processed FastQ files

Location: preprocess/*fastq.gz
FastQ files after pre-processing with:
- cutadapt (Martin 2011) for trimming of adapter and low-quality bases (download)
- umi-tools (Smith, Heger, and Sudbery 2017) for extracting and filtering UMIs (download)
- seqtk (Li 2016) for down-sampling (download)
- xengsort (Zentgraf and Rahmann 2021) for xenograft sorting (download)
Only available if pre-processing was done
Steps depend on the selected pre-processing protocol
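To compare read numbers before and after pre-processing (e.g. after down-sampling with seqtk or xenograft sorting), the reads in a FastQ file can be counted directly. A minimal Python sketch, assuming standard four-line FastQ records as produced by Illumina sequencers; the function name is hypothetical:

```python
import gzip


def count_fastq_reads(path):
    """Count the records in a plain or gzip-compressed FastQ file.

    Each record spans exactly four lines (header, sequence, separator,
    qualities); multi-line sequences are assumed not to occur.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_lines = 0
    with opener(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For example, `count_fastq_reads("preprocess/sample.fastq.gz")` (file name illustrative) returns the number of reads. The FastQC reports also list the total number of sequences, so this is mainly useful for quick checks on the command line.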

Mappings

Location:
- bwa/*bam and bwa/*bai for DNA-seq data
- star/*bam and star/*bai for RNA-seq data
Read alignments in BAM format as well as their BAI indices
Mappers are BWA (Li and Durbin 2009, see download) for DNA-seq data and STAR (Dobin and Gingeras 2015, see download) for RNA-seq data
Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
Duplicate reads (originating from the same PCR template) are tagged with Picard Tools (“Picard Toolkit” 2019, see download)
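Picard MarkDuplicates marks duplicates by setting bit 0x400 of the SAM FLAG field; by default the reads are kept in the BAM file rather than removed. The Python sketch below (function names are hypothetical) computes the fraction of primary alignments flagged as duplicates from SAM text lines, e.g. the output of `samtools view`:

```python
DUPLICATE_FLAG = 0x400      # SAM FLAG bit: PCR or optical duplicate
SECONDARY_FLAG = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY_FLAG = 0x800  # SAM FLAG bit: supplementary alignment


def is_duplicate(flag):
    """Return True if the SAM FLAG marks the read as a duplicate."""
    return bool(flag & DUPLICATE_FLAG)


def duplicate_fraction(sam_lines):
    """Fraction of primary alignments flagged as duplicates.

    Header lines (starting with '@') as well as secondary and
    supplementary alignments are skipped.
    """
    total = dup = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd column
        if flag & (SECONDARY_FLAG | SUPPLEMENTARY_FLAG):
            continue
        total += 1
        dup += is_duplicate(flag)
    return dup / total if total else 0.0
```

The lines of `samtools view bwa/sample.bam` (file name illustrative) can be passed in directly. To exclude duplicates instead, `samtools view -F 0x400` filters out reads carrying this flag.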

Gene counts

Location:
- for standard RNA-seq: genecounts/bfx*tsv.gz and genecounts/bfx*annotated.xlsx
- for RNA-seq with UMI: genecounts/bfx*reads.tsv.gz, genecounts/bfx*umi.tsv.gz and genecounts/bfx*annotated.xlsx
Table of counts per sample and gene obtained with featureCounts (Liao, Smyth, and Shi 2013) for standard RNA-seq (download) or umi-tools for RNA-seq with UMI
Only available for RNA-seq data
Raw (not normalized) counts
Use for differential expression (DEG) tests
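DEG tools such as DESeq2 and edgeR expect raw integer counts, because they model the count distribution and apply their own normalization. As a quick sanity check before such tests, the sketch below reads a count table and computes the total counts per sample (library size). It assumes a hypothetical layout (tab-separated, header row, gene identifier in the first column, one count column per sample); the actual files may contain additional annotation columns, so check the header first.

```python
import csv
import gzip


def read_count_table(path):
    """Read a gzip-compressed gene count table.

    Assumed (hypothetical) layout: tab-separated, one header row, gene
    identifier in the first column, raw counts of one sample per column.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]
        counts = {}
        for row in reader:
            counts[row[0]] = [int(float(x)) for x in row[1:]]
    return samples, counts


def library_sizes(samples, counts):
    """Total number of counted reads (or UMIs) per sample."""
    totals = [0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))
```

Large differences in library size between samples are expected and are handled by the normalization of the DEG tool; very small libraries may point to a failed sample.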

TPM (transcripts per million) values

Location: kallisto/bfx*tpm_per_transcript.tsv.gz, kallisto/bfx*tpm_per_transcript.xlsx, kallisto/bfx*tpm_per_gene.tsv.gz and kallisto/bfx*tpm_per_gene.xlsx
TPM values per transcript and gene obtained with kallisto (Bray et al. 2016, see download)
Only available for RNA-seq data
UMIs are not considered in the TPM calculation
TPM normalization accounts for transcript length and sequencing depth
Use for within-sample comparisons between genes
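For a transcript i with read count c_i and length l_i, TPM_i = 10^6 · (c_i / l_i) / Σ_j (c_j / l_j): counts are first divided by length and then scaled so that all values of a sample sum to one million. kallisto uses estimated counts and effective transcript lengths; the Python sketch below takes both as given inputs and only shows the arithmetic. Gene-level values are assumed here to be the sum over a gene's transcripts; function names are hypothetical.

```python
def tpm(counts, lengths):
    """Transcripts per million from read counts and (effective) lengths.

    TPM_i = 1e6 * (c_i / l_i) / sum_j (c_j / l_j)
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per base
    total = sum(rates)
    return [1e6 * r / total for r in rates]


def gene_tpm(transcript_tpm, transcript_to_gene):
    """Sum transcript-level TPM values per gene."""
    result = {}
    for transcript, value in transcript_tpm.items():
        gene = transcript_to_gene[transcript]
        result[gene] = result.get(gene, 0.0) + value
    return result
```

For counts [100, 100, 200] and lengths [1000, 2000, 2000], `tpm` returns 400,000, 200,000 and 400,000 (up to floating-point rounding): the first two transcripts have the same count, but the longer one receives half the TPM. Because each sample sums to one million, TPM values are suited to comparing genes within a sample, while DEG tests should use the raw counts.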

RNA-SeQC report

Location: rnaseqc/report/bfx*html
RNA-SeQC (Graubert et al. 2021, see download) HTML report containing RNA-seq library quality metrics for each sample
Only available for RNA-seq data

Result files were produced with the Cell Ranger pipeline (Zheng et al. 2017, see download).

Summary report

Location: web_summary.html
Interactive summary HTML file with quality metrics and automated secondary analysis results
Detailed descriptions can be found here
Help on interpreting the different metrics is in this tech note
If issues were detected, the report shows corresponding alerts (more info)

Counts

Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 or count/sample_filtered_feature_bc_matrix, count/sample_filtered_feature_bc_matrix.h5
Matrix of counts per gene and cell in Market Exchange Format (MEX) or HDF5 format
Primary input for single-cell analysis
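In MEX format, Cell Ranger writes a directory with three files: matrix.mtx.gz (features x barcodes in MatrixMarket coordinate format, 1-based indices), features.tsv.gz and barcodes.tsv.gz. In practice these are loaded with Seurat (`Read10X`) or Scanpy (`scanpy.read_10x_mtx`); the dependency-free Python sketch below only illustrates the format by computing the total counts per barcode. It assumes the Cell Ranger v3+ file names and integer values; the function name is hypothetical.

```python
import gzip
import os


def read_mex_totals(matrix_dir):
    """Total counts per barcode from a Cell Ranger MEX directory.

    Assumes matrix.mtx.gz (features x barcodes, MatrixMarket coordinate
    format with integer values) and barcodes.tsv.gz inside matrix_dir.
    """
    with gzip.open(os.path.join(matrix_dir, "barcodes.tsv.gz"), "rt") as fh:
        barcodes = [line.rstrip("\n") for line in fh]
    totals = [0] * len(barcodes)
    size_line_seen = False
    with gzip.open(os.path.join(matrix_dir, "matrix.mtx.gz"), "rt") as fh:
        for line in fh:
            fields = line.split()
            if not fields or line.startswith("%"):
                continue  # skip banner, comments and empty lines
            if not size_line_seen:
                # First data line: number of rows, columns and non-zeros
                _n_rows, n_cols, _nnz = map(int, fields[:3])
                if n_cols != len(barcodes):
                    raise ValueError("column count does not match barcodes")
                size_line_seen = True
                continue
            col, value = int(fields[1]), int(fields[2])
            totals[col - 1] += value  # MatrixMarket indices are 1-based
    return dict(zip(barcodes, totals))
```

For example, `read_mex_totals("filtered_feature_bc_matrix")` maps each barcode to its total count, which for gene expression data corresponds to the number of UMIs per cell.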

Mappings

Location: possorted_genome_bam.bam, possorted_genome_bam.bam.bai or count/sample_alignments.bam, count/sample_alignments.bam.bai
Read alignments in BAM format as well as their BAI indices
Not available by default for fixed RNA profiling
Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)

Loupe Browser file

Location: cloupe.cloupe or count/sample_cloupe.cloupe
File for visualization of the secondary analysis results produced by Cell Ranger in the Loupe Browser (Zheng et al. 2017, see download)
Extensive tutorials can be found here

Other Cell Ranger files

A full and detailed description can be found here

Other files produced by us

Velocyto:
- Location: velocyto/*.loom
- Output files for RNA velocity analysis done with Velocyto (La Manno et al. 2018, see download)
Doublet scores:
- Location: doublets/scdblFinder.csv
- Doublet scores for each cell calculated by scDblFinder (Germain et al. 2021, see download)

Result files were produced with the Cell Ranger ATAC pipeline (Satpathy et al. 2019, see download).

Summary report

Location: web_summary.html
Interactive summary HTML file with quality metrics and automated secondary analysis results
Detailed descriptions can be found here

Counts

Location: filtered_peak_bc_matrix, filtered_peak_bc_matrix.h5 and filtered_tf_bc_matrix, filtered_tf_bc_matrix.h5
Matrices of counts per peak and cell, and per transcription factor and cell, in Market Exchange Format (MEX) or HDF5 format
Primary input for single-cell analysis

Fragments

Location: fragments.tsv.gz, fragments.tsv.gz.tbi
ATAC fragment positions in BED format
Also often used for downstream single-cell analysis
Can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
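The fragments file is block-gzip compressed and tabix-indexed, so individual regions can be extracted with e.g. `tabix fragments.tsv.gz chr1:1-1000000`. Each line holds chromosome, start, end, cell barcode and the number of read pairs supporting the fragment; comment lines start with #. A minimal Python sketch (hypothetical function name) that counts fragments per barcode, optionally only those at or above a length threshold:

```python
import gzip
from collections import Counter


def fragments_per_barcode(path, min_length=0):
    """Count fragments per cell barcode in a 10x ATAC fragments file.

    Expected tab-separated columns: chrom, start, end, barcode, read
    support. Start is 0-based and end exclusive, as in BED.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:  # gzip can also read bgzip files
        for line in handle:
            if line.startswith("#"):
                continue  # header comments
            _chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
            if int(end) - int(start) >= min_length:
                counts[barcode] += 1
    return counts
```

The number of fragments per barcode is a common quality metric for ATAC cells; reading the whole file in Python is slow for large data sets, where dedicated tools are preferable.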

Mappings

Location: possorted_bam.bam, possorted_bam.bam.bai
Read alignments in BAM format as well as their BAI indices
Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)

Loupe Browser file

Location: cloupe.cloupe
File for visualization of the secondary analysis results produced by Cell Ranger ATAC in the Loupe Browser (Zheng et al. 2017, see download)
Extensive tutorials can be found here
Also have a look at these notes

Other Cell Ranger ATAC files

A full and detailed description can be found here

Result files were produced with the Cell Ranger ARC pipeline (Zheng et al. 2017, see download).

Summary report

Location: web_summary.html
Interactive summary HTML file with quality metrics and automated secondary analysis results
Detailed descriptions can be found here

Counts

Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 and
Matrix of counts per gene/peak/transcription factor and cell in Market Exchange Format (MEX) or HDF5 format
Primary input for single-cell analysis

Fragments

Location: fragments.tsv.gz, fragments.tsv.gz.tbi
ATAC fragment positions in BED format
Also often used for downstream single-cell analysis

Mappings

Location: gex_possorted_bam.bam, gex_possorted_bam.bam.bai and atac_possorted_bam.bam, atac_possorted_bam.bam.bai
Gene expression and ATAC read alignments in BAM format as well as their BAI indices
Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)

Loupe Browser file

Location: cloupe.cloupe
File for visualization of the secondary analysis results produced by Cell Ranger ARC in the Loupe Browser (Zheng et al. 2017, see download)
Extensive tutorials can be found here

Other Cell Ranger ARC files

A full and detailed description can be found here

Result files were produced with the Space Ranger pipeline (Zheng et al. 2017, see download).

Summary report

Location: web_summary.html
Interactive summary HTML file with quality metrics and automated secondary analysis results
Detailed descriptions can be found here

Counts

Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 (Visium) or binned_outputs/square_*/filtered_feature_bc_matrix, binned_outputs/square_*/filtered_feature_bc_matrix.h5 (Visium HD)
Matrix of counts per gene and spot (Visium) or bin (Visium HD) in Market Exchange Format (MEX) or HDF5 format
For Visium HD, counts are aggregated into square bins of different sizes (one square_* directory per bin size)
Primary input for single-cell analysis
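The simplified Python sketch below illustrates how aggregation into coarser squares works in principle: counts of fine bins on a regular grid are summed into coarse bins that each cover factor x factor fine bins. It is not Space Ranger's implementation; the (row, col, gene) keys and the function name are assumptions for illustration. Visium HD outputs are typically provided for 2, 8 and 16 µm squares.

```python
from collections import defaultdict


def aggregate_bins(bin_counts, factor):
    """Sum counts of fine square bins into coarser square bins.

    bin_counts maps (row, col, gene) on the fine grid to a count; each
    coarse bin covers factor x factor fine bins (e.g. factor=4 turns
    2 um bins into 8 um bins). Simplified illustration only.
    """
    coarse = defaultdict(int)
    for (row, col, gene), count in bin_counts.items():
        coarse[(row // factor, col // factor, gene)] += count
    return dict(coarse)
```

With factor=4, the fine bins (0, 0) and (3, 3) both fall into the coarse bin (0, 0), while (4, 0) falls into (1, 0). Larger squares trade spatial resolution for more counts per bin.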

Images

Location: spatial/*tiff (Visium) or binned_outputs/square_*/spatial/*tiff and spatial/*tiff (Visium HD)
TIFF images of the tissue sections with spots and gene expression data
For Visium HD, the images are provided for each bin size (squares)
Additionally includes the original microscopy images
Can be viewed in Fiji (Schindelin et al. 2012, see download)
Primary input for single-cell analysis

Mappings

Location: possorted_genome_bam.bam, possorted_genome_bam.bam.bai (Visium)
Read alignments in BAM format as well as their BAI indices
Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)
Not available by default for Visium HD

Loupe Browser file

Location: cloupe.cloupe (Visium) or binned_outputs/square_*/cloupe.cloupe (Visium HD)
File for visualization of the secondary analysis results produced by Space Ranger in the Loupe Browser (Zheng et al. 2017, see download)
Extensive tutorials can be found here

Other Space Ranger files

A full and detailed description can be found here

Result files were produced with the Xenium Onboard Analysis/Xenium Ranger pipeline (Janesick et al. 2023, see download).

Summary report

Location: analysis_summary.html
Interactive summary HTML file with quality metrics and automated secondary analysis results
Detailed descriptions can be found here

Counts

Location: cell_feature_matrix, cell_feature_matrix.h5
Matrix of counts per gene and cell in Market Exchange Format (MEX) or HDF5 format
Counts are based on the assignment of transcripts to segmented cells
Primary input for spatial analysis

Segmentations

Location: nucleus_boundaries.csv.gz, nucleus_boundaries.parquet and cell_boundaries.csv.gz, cell_boundaries.parquet
Boundaries of nuclei and cells in CSV format or Parquet format
Primary input for spatial analysis
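Each cell or nucleus boundary is stored as a polygon with one row per vertex. As an example of working with these files, the Python sketch below computes the area of each polygon with the shoelace formula. The column names cell_id, vertex_x and vertex_y are assumed from the Xenium output format and should be checked against the file header; coordinates are assumed to be in µm, giving areas in µm². Function names are hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    A repeated closing vertex (last == first) adds a zero term.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def boundary_areas(path):
    """Area per cell from a boundaries CSV (assumed columns: cell_id,
    vertex_x, vertex_y; vertices listed in polygon order)."""
    vertices = defaultdict(list)
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            vertices[row["cell_id"]].append(
                (float(row["vertex_x"]), float(row["vertex_y"]))
            )
    return {cell: polygon_area(v) for cell, v in vertices.items()}
```

Comparing nucleus and cell areas per cell_id can help to spot implausible segmentations, e.g. very large cells produced by boundary expansion.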

Images

Location: morphology.ome.tif and morphology_focus/morphology_focus_000[0-3].ome.tif
Full 3D Z-stack of the DAPI image at different resolutions (image pyramid) in TIFF format
2D autofocus projection images in TIFF format for the nuclear DAPI stain, the boundary stain (ATP1A1/E-Cadherin/CD45), the interior RNA stain (18S) and the interior protein stain (alphaSMA/Vimentin)
Can be viewed in Fiji (Schindelin et al. 2012, see download)
Sometimes used as input for spatial analysis

Transcripts

Location: transcripts.csv.gz, transcripts.parquet
Transcript coordinates in CSV format or Parquet format
Primary input for segmentations, sometimes used for spatial analysis
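Each row of the transcripts file is one decoded transcript with its coordinates, the cell it was assigned to and a quality value (qv). The Python sketch below rebuilds a cell-by-gene count table from it. The column names (cell_id, feature_name, qv), the markers for unassigned transcripts and the default threshold of qv ≥ 20 are assumptions based on the 10x documentation and may differ between software versions; control probes and codewords are not filtered out here. The function name is hypothetical.

```python
import csv
import gzip
from collections import Counter

# Assumed markers for transcripts outside segmented cells; these differ
# between Xenium software versions, so check your own file.
UNASSIGNED = {"UNASSIGNED", "-1", ""}


def transcripts_to_counts(path, min_qv=20.0):
    """Build (cell_id, gene) -> count from a Xenium transcripts.csv.gz.

    Transcripts below min_qv and transcripts not assigned to a cell
    are skipped.
    """
    counts = Counter()
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            if float(row["qv"]) < min_qv:
                continue  # low-quality decoding
            if row["cell_id"] in UNASSIGNED:
                continue  # not inside any segmented cell
            counts[(row["cell_id"], row["feature_name"])] += 1
    return counts
```

This is the same principle by which counts depend on the segmentation: after re-segmentation, transcripts are assigned to different cells and the count matrix changes accordingly.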

Xenium Explorer files

Location: experiment.xenium
File for visualization of the secondary analysis results produced by Xenium Onboard Analysis/Xenium Ranger pipeline (Janesick et al. 2023, see download)
Requires the following additional files: analysis_summary.html, morphology.ome.tif, morphology_focus/*tif, cells.zarr.zip, transcripts.zarr.zip, cell_feature_matrix.zarr.zip, analysis.zarr.zip
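Before opening experiment.xenium in Xenium Explorer, one can check that the required files listed here are present in the same directory. A minimal Python sketch (hypothetical function name) that returns the missing ones:

```python
from pathlib import Path

# experiment.xenium plus the additional files it requires.
REQUIRED = [
    "experiment.xenium",
    "analysis_summary.html",
    "morphology.ome.tif",
    "cells.zarr.zip",
    "transcripts.zarr.zip",
    "cell_feature_matrix.zarr.zip",
    "analysis.zarr.zip",
]


def missing_explorer_files(output_dir):
    """Return the required Xenium Explorer files absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    focus = root / "morphology_focus"
    if not (focus.is_dir() and any(focus.glob("*tif"))):
        missing.append("morphology_focus/*tif")
    return missing
```

An empty list means all required files were found; the check does not validate file contents.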
Extensive tutorials can be found here

Other Xenium Onboard Analysis/Xenium Ranger files

A full and detailed description can be found here

References

Andrews, Simon, Felix Krueger, Anne Segonds-Pichon, Laura Biggins, Christel Krueger, and Steven Wingett. 2010. “FastQC.” Babraham Institute.

Bray, Nicolas L, Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-seq Quantification.” Nature Biotechnology 34 (5): 525–27.

Dobin, Alexander, and Thomas R Gingeras. 2015. “Mapping RNA-seq Reads with STAR.” Current Protocols in Bioinformatics 51 (1): 11.14.1–19.

Ewels, Philip, Måns Magnusson, Sverker Lundin, and Max Käller. 2016. “MultiQC: Summarize Analysis Results for Multiple Tools and Samples in a Single Report.” Bioinformatics (Oxford, England) 32 (19): 3047–48. https://doi.org/10.1093/bioinformatics/btw354.

Germain, Pierre-Luc, Aaron Lun, Carlos Garcia Meixide, Will Macnair, and Mark D Robinson. 2021. “Doublet Identification in Single-Cell Sequencing Data Using scDblFinder.” F1000Research 10 (September): 979.

Graubert, Aaron, François Aguet, Arvind Ravi, Kristin G Ardlie, and Gad Getz. 2021. “RNA-SeQC 2: Efficient RNA-seq Quality Control and Quantification for Large Cohorts.” Bioinformatics (Oxford, England) 37 (18): 3048–50. https://doi.org/10.1093/bioinformatics/btab135.

Janesick, Amanda, Robert Shelansky, Andrew D Gottscho, Florian Wagner, Stephen R Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nature Communications 14 (1): 8353.

La Manno, Gioele, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, et al. 2018. “RNA Velocity of Single Cells.” Nature 560 (7719): 494–98.

Li, Heng. 2016. “Seqtk.” GitHub.

Li, Heng, and Richard Durbin. 2009. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics (Oxford, England) 25 (14): 1754–60.

Liao, Yang, Gordon K. Smyth, and Wei Shi. 2013. “featureCounts: An Efficient General Purpose Program for Assigning Sequence Reads to Genomic Features.” Bioinformatics (Oxford, England) 30 (7): 923–30. https://doi.org/10.1093/bioinformatics/btt656.

Martin, Marcel. 2011. “Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads.” EMBnet.journal 17 (1): 10–12. https://doi.org/10.14806/ej.17.1.200.

“Picard Toolkit.” 2019. Broad Institute.

Satpathy, Ansuman T, Jeffrey M Granja, Kathryn E Yost, Yanyan Qi, Francesca Meschi, Geoffrey P McDermott, Brett N Olsen, et al. 2019. “Massively Parallel Single-Cell Chromatin Landscapes of Human Immune Cell Development and Intratumoral T Cell Exhaustion.” Nature Biotechnology 37 (8): 925–36.

Schindelin, Johannes, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, Tobias Pietzsch, Stephan Preibisch, et al. 2012. “Fiji: An Open-Source Platform for Biological-Image Analysis.” Nature Methods 9 (7): 676–82.

Smith, Tom, Andreas Heger, and Ian Sudbery. 2017. “UMI-tools: Modeling Sequencing Errors in Unique Molecular Identifiers to Improve Quantification Accuracy.” Genome Research 27 (3): 491–99.

Thorvaldsdóttir, Helga, James T Robinson, and Jill P Mesirov. 2013. “Integrative Genomics Viewer (IGV): High-Performance Genomics Data Visualization and Exploration.” Briefings in Bioinformatics 14 (2): 178–92.

Zentgraf, Jens, and Sven Rahmann. 2021. “Fast Lightweight Accurate Xenograft Sorting.” Algorithms for Molecular Biology 16 (1): 2.

Zheng, Grace X Y, Jessica M Terry, Phillip Belgrader, Paul Ryvkin, Zachary W Bent, Ryan Wilson, Solongo B Ziraldo, et al. 2017. “Massively Parallel Digital Transcriptional Profiling of Single Cells.” Nature Communications 8 (1): 14049.