Mapping and Counting

Output Description

Depending on the type of experiment, a different pipeline was used to analyze the data.

Result files were produced with an in-house pipeline.

Multiqc summary report

  • Location: multiqc/*html
  • MultiQC (Ewels et al. 2016, see download) HTML summary reports with a general summary of the experiment

Basic sequence quality control

  • Location: fastqc/*html
  • FastQC (Andrews et al. 2010) HTML report containing quality control metrics for each FastQ file
  • If sequence data were pre-processed, there will be reports for raw and pre-processed FastQ files
  • Software and detailed documentation can be found here

Pre-processed FastQ files

Mappings

Gene counts

  • Location:
    • for standard RNA-seq: genecounts/bfx*tsv.gz and genecounts/bfx*annotated.xlsx
    • for RNA-seq with UMI: genecounts/bfx*reads.tsv.gz, genecounts/bfx*umi.tsv.gz and genecounts/bfx*annotated.xlsx
  • Table of counts per sample and gene obtained with featureCounts (Liao, Smyth, and Shi 2013, see download) for standard RNA-seq or UMI-tools (Smith, Heger, and Sudbery 2017) for RNA-seq with UMI
  • Only available for RNA-seq data
  • Raw counts
  • Used for DEG tests

TPM (transcript-per-million) counts

  • Location: kallisto/bfx*tpm_per_transcript.tsv.gz, kallisto/bfx*tpm_per_transcript.xlsx, kallisto/bfx*tpm_per_gene.tsv.gz and kallisto/bfx*tpm_per_gene.xlsx
  • TPM counts per transcript and gene obtained with kallisto (Bray et al. 2016, see download)
  • Only available for RNA-seq data
  • UMIs are not considered in the TPM calculation
  • TPM normalization accounts for transcript length and sequencing depth
  • Use for within-sample comparisons between genes
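
The length and depth scaling behind TPM can be illustrated with a small sketch. The function name and the toy numbers are hypothetical, and this is not how kallisto derives its estimates; it only shows why two transcripts with equal counts but different lengths end up with different TPM values.

```python
def tpm(counts, lengths_bp):
    """Convert raw counts to TPM given transcript lengths in base pairs."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Step 1: length normalization -> reads per kilobase (RPK).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: depth normalization -> rescale so the values sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0 for _ in rpk]
    return [r / total * 1e6 for r in rpk]


# Toy example (made-up numbers): equal counts, second transcript twice as long.
values = tpm([100, 100], [1000, 2000])
```

Because every sample is rescaled to the same total, TPM values are comparable between genes within one sample, which matches the intended use stated above.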

RNA-SeQC report

  • Location: rnaseqc/report/bfx*html
  • RNA-SeQC (Graubert et al. 2021, see download) HTML report containing RNA-seq library quality metrics for each sample
  • Only available for RNA-seq data

Result files were produced with the Cell Ranger pipeline (Zheng et al. 2017, see download).

Summary report

  • Location: web_summary.html
  • Interactive summary HTML file with quality metrics and automated secondary analysis results
  • Detailed descriptions can be found here
  • Help on interpreting the different metrics is in this tech note
  • If issues were detected, the report shows corresponding alerts (more info)

Counts

  • Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 or count/sample_filtered_feature_bc_matrix, count/sample_filtered_feature_bc_matrix.h5
  • Matrix of counts per gene and cell in Market Exchange Format (MEX) or HDF5 format
  • Primary input for single-cell analysis
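
As a rough illustration of the MEX layout, the sketch below parses Matrix Market coordinate text with the Python standard library only. The helper names are hypothetical, and the three file names inside the matrix directory (matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz) are an assumption that should be checked against the delivered files. In practice, dedicated single-cell packages are the usual way to load these matrices.

```python
import gzip
import os


def parse_matrix_market(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    entries maps 0-based (row, col) index pairs to integer counts.
    """
    entries = {}
    shape = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines, the banner and comment lines
        fields = line.split()
        if shape is None:
            # First data line: number of rows, columns and non-zero entries.
            shape = (int(fields[0]), int(fields[1]))
            continue
        row, col, value = int(fields[0]), int(fields[1]), int(float(fields[2]))
        entries[(row - 1, col - 1)] = value  # indices in the file are 1-based
    if shape is None:
        raise ValueError("no size line found")
    return shape[0], shape[1], entries


def read_mex_dir(path):
    """Read a MEX directory; the file names used here are assumptions."""
    def read_lines(name):
        with gzip.open(os.path.join(path, name), "rt") as handle:
            return [line.rstrip("\n") for line in handle]

    features = [line.split("\t") for line in read_lines("features.tsv.gz")]
    barcodes = read_lines("barcodes.tsv.gz")
    shape_and_entries = parse_matrix_market(read_lines("matrix.mtx.gz"))
    return features, barcodes, shape_and_entries


# Tiny made-up matrix: 3 features (rows) x 2 barcodes (columns), 3 non-zeros.
example = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 5
3 1 2
2 2 7
""".splitlines()
n_features, n_barcodes, counts = parse_matrix_market(example)
```

Only non-zero entries are stored, which is why this sparse format stays compact for single-cell data where most gene and cell combinations have a count of zero.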

Mappings

  • Location: possorted_genome_bam.bam, possorted_genome_bam.bam.bai or count/sample_alignments.bam, count/sample_alignments.bam.bai
  • Read alignments in BAM format as well as their BAI indices
  • Not available by default for fixed RNA profiling
  • Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)

Loupe Browser file

  • Location: cloupe.cloupe or count/sample_cloupe.cloupe
  • File for visualization of the secondary analysis results produced by Cell Ranger in the Loupe Browser (Zheng et al. 2017, see download)
  • Extensive tutorials can be found here

Other Cell Ranger files

  • A full and detailed description can be found here

Other files produced by us

Result files were produced with the Cell Ranger ATAC pipeline (Satpathy et al. 2019, see download).

Summary report

  • Location: web_summary.html
  • Interactive summary HTML file with quality metrics and automated secondary analysis results
  • Detailed descriptions can be found here

Counts

  • Location: filtered_peak_bc_matrix, filtered_peak_bc_matrix.h5 and filtered_tf_bc_matrix, filtered_tf_bc_matrix.h5
  • Matrix of counts per peak and cell, as well as per transcription factor and cell, in Market Exchange Format (MEX) or HDF5 format
  • Primary input for single-cell analysis

Fragments

Mappings

Loupe Browser file

  • Location: cloupe.cloupe
  • File for visualization of the secondary analysis results produced by Cell Ranger ATAC in the Loupe Browser (Zheng et al. 2017, see download)
  • Extensive tutorials can be found here
  • Also have a look at these notes

Other Cell Ranger ATAC files

  • A full and detailed description can be found here

Result files were produced with the Cell Ranger ARC pipeline (Zheng et al. 2017, see download).

Summary report

  • Location: web_summary.html
  • Interactive summary HTML file with quality metrics and automated secondary analysis results
  • Detailed descriptions can be found here

Counts

  • Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 and
  • Matrix of counts per gene/peak/transcription factor and cell in Market Exchange Format (MEX) or HDF5 format
  • Primary input for single-cell analysis

Fragments

  • Location: fragments.tsv.gz, fragments.tsv.gz.tbi
  • ATAC fragment positions in BED format
  • Often also used for single-cell analysis
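
A minimal sketch of how the fragments file can be summarized per cell barcode is given below. It assumes tab-separated lines with chromosome, start, end and barcode in the first four columns and header lines starting with "#"; the function name and the example records are made up for illustration.

```python
from collections import Counter


def fragments_per_barcode(lines):
    """Count fragments per cell barcode from fragment-file lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        if int(end) <= int(start):
            raise ValueError(f"invalid interval: {line!r}")
        counts[barcode] += 1
    return counts


# Made-up records; real data would be streamed from the gzipped file.
example = [
    "# id=hypothetical_sample",
    "chr1\t100\t250\tAAAC-1\t2",
    "chr1\t300\t420\tAAAC-1\t1",
    "chr2\t50\t180\tTTTG-1\t1",
]
per_barcode = fragments_per_barcode(example)
```

For the delivered file, the same function could be fed from `gzip.open("fragments.tsv.gz", "rt")`, since it only iterates over lines.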

Mappings

  • Location: gex_possorted_bam.bam, gex_possorted_bam.bam.bai and atac_possorted_bam.bam, atac_possorted_bam.bam.bai
  • Gene expression and ATAC read alignments in BAM format as well as their BAI indices
  • Alignments can be viewed in IGV (Thorvaldsdóttir, Robinson, and Mesirov 2013, see download)

Loupe Browser file

  • Location: cloupe.cloupe
  • File for visualization of the secondary analysis results produced by Cell Ranger ARC in the Loupe Browser (Zheng et al. 2017, see download)
  • Extensive tutorials can be found here

Other Cell Ranger ARC files

  • A full and detailed description can be found here

Result files were produced with the Space Ranger pipeline (Zheng et al. 2017, see download).

Summary report

  • Location: web_summary.html
  • Interactive summary HTML file with quality metrics and automated secondary analysis results
  • Detailed descriptions can be found here

Counts

  • Location: filtered_feature_bc_matrix, filtered_feature_bc_matrix.h5 (Visium) or binned_outputs/square_*/filtered_feature_bc_matrix, binned_outputs/square_*/filtered_feature_bc_matrix.h5 (Visium HD)
  • Matrix of counts per gene and spot (Visium) or bin (Visium HD) in Market Exchange Format (MEX) or HDF5 format
  • For Visium HD, the counts are aggregated at different resolutions (squares)
  • Primary input for spatial analysis
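
The idea of aggregating counts into larger squares can be sketched as follows. This is only an illustration under assumed inputs (counts keyed by the row and column of a fine grid, and a hypothetical aggregation factor); it is not Space Ranger's implementation.

```python
from collections import defaultdict


def aggregate_bins(counts, factor):
    """Sum counts of fine square bins into coarser squares.

    counts maps (row, col) of a fine bin to its count; factor is the number
    of fine bins along one side of a coarse square.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    coarse = defaultdict(int)
    for (row, col), value in counts.items():
        # Integer division assigns each fine bin to its enclosing square.
        coarse[(row // factor, col // factor)] += value
    return dict(coarse)


# Made-up counts for one gene on a fine grid, merged 4 x 4 into coarse squares.
fine = {(0, 0): 1, (0, 1): 2, (1, 1): 3, (0, 4): 5, (5, 5): 7}
coarse = aggregate_bins(fine, 4)
```

Coarser squares contain more counts per bin at the cost of spatial resolution, which is the trade-off behind providing several resolutions.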

Images

  • Location: spatial/*tiff (Visium) or binned_outputs/square_*/spatial/*tiff and spatial/*tiff (Visium HD)
  • TIFF images of the tissue sections with spots and gene expression data
  • For Visium HD, the images are shown at different resolutions (squares)
  • Additionally includes the original microscopy images
  • Can be viewed in Fiji (Schindelin et al. 2012, see download)
  • Primary input for spatial analysis

Mappings

Loupe Browser file

  • Location: cloupe.cloupe (Visium) or binned_outputs/square_*/cloupe.cloupe (Visium HD)
  • File for visualization of the secondary analysis results produced by Space Ranger in the Loupe Browser (Zheng et al. 2017, see download)
  • Extensive tutorials can be found here

Other Space Ranger files

  • A full and detailed description can be found here

Result files were produced with the Xenium Onboard Analysis/Xenium Ranger pipeline (Janesick et al. 2023, see download).

Summary report

  • Location: analysis_summary.html
  • Interactive summary HTML file with quality metrics and automated secondary analysis results
  • Detailed descriptions can be found here

Counts

  • Location: cell_feature_matrix, cell_feature_matrix.h5
  • Matrix of counts per gene and cell in Market Exchange Format (MEX) or HDF5 format
  • Counts are based on segmentations of transcripts into cells
  • Primary input for spatial analysis
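
How per-cell counts follow from transcripts that were assigned to segmented cells can be sketched as below. The column names (cell_id, feature_name, qv), the markers for unassigned transcripts and the quality threshold are assumptions for illustration and should be checked against the delivered transcript table; the example rows are made up.

```python
import csv
import io
from collections import defaultdict


def count_matrix(rows, unassigned=("UNASSIGNED", "-1"), min_qv=20.0):
    """Tally transcripts per cell and gene from assigned transcript records.

    rows yields dicts with the (assumed) keys cell_id, feature_name and qv.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if row["cell_id"] in unassigned:
            continue  # transcript does not fall into any segmented cell
        if float(row["qv"]) < min_qv:
            continue  # assumed quality filter; the threshold is hypothetical
        counts[row["cell_id"]][row["feature_name"]] += 1
    return {cell: dict(genes) for cell, genes in counts.items()}


# Made-up transcript table with only the columns used above.
example_csv = """cell_id,feature_name,qv
cell_a,EPCAM,40
cell_a,EPCAM,35
cell_a,CD3E,10
cell_b,CD3E,30
UNASSIGNED,EPCAM,40
"""
matrix = count_matrix(csv.DictReader(io.StringIO(example_csv)))
```

A different segmentation changes which cell each transcript belongs to, and therefore changes the resulting count matrix.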

Segmentations

  • Location: nucleus_boundaries.csv.gz, nucleus_boundaries.parquet and cell_boundaries.csv.gz, cell_boundaries.parquet
  • Boundaries of nuclei and cells in CSV format or Parquet format
  • Primary input for spatial analysis
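
As an example of working with the boundary tables, the sketch below computes the polygon area per cell with the shoelace formula. It assumes one row per polygon vertex with the columns cell_id, vertex_x and vertex_y, listed in order around the polygon; these column names and the example coordinates are assumptions, not taken from the delivered files.

```python
import csv
import io
from collections import defaultdict


def polygon_areas(rows):
    """Compute the area of each cell polygon with the shoelace formula."""
    vertices = defaultdict(list)
    for row in rows:
        point = (float(row["vertex_x"]), float(row["vertex_y"]))
        vertices[row["cell_id"]].append(point)
    areas = {}
    for cell_id, points in vertices.items():
        twice_area = 0.0
        # Pair each vertex with its successor, wrapping around to the first.
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
            twice_area += x1 * y2 - x2 * y1
        areas[cell_id] = abs(twice_area) / 2.0
    return areas


# Made-up square of side 2; repeating the first vertex at the end is harmless.
example_csv = """cell_id,vertex_x,vertex_y
cell_a,0,0
cell_a,2,0
cell_a,2,2
cell_a,0,2
cell_a,0,0
"""
areas = polygon_areas(csv.DictReader(io.StringIO(example_csv)))
```

The resulting area is in squared units of whatever coordinate system the boundary file uses.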

Images

  • Location: morphology.ome.tif and morphology_focus/morphology_focus_000[0-3].ome.tif
  • Full 3D Z-stack of the DAPI image at different resolutions (image pyramid) in TIFF format
  • 2D autofocus projection images in TIFF format for the nuclei DAPI stain, the boundary ATP1A1/E-Cadherin/CD45 stain, the interior RNA (18S) stain and the interior protein alphaSMA/Vimentin
  • Can be viewed in Fiji (Schindelin et al. 2012, see download)
  • Sometimes used as input for spatial analysis

Transcripts

  • Location: transcripts.csv.gz, transcripts.parquet
  • Transcript coordinates in CSV format or Parquet format
  • Primary input for segmentations, sometimes used for spatial analysis

Xenium Explorer files

  • Location: experiment.xenium
  • File for visualization in Xenium Explorer of the secondary analysis results produced by the Xenium Onboard Analysis/Xenium Ranger pipeline (Janesick et al. 2023, see download)
  • Requires the following additional files: analysis_summary.html, morphology.ome.tif, morphology_focus/*tif, cells.zarr.zip, transcripts.zarr.zip, cell_feature_matrix.zarr.zip, analysis.zarr.zip
  • Extensive tutorials can be found here

Other Xenium Onboard Analysis/Xenium Ranger files

  • A full and detailed description can be found here

References

Andrews, Simon, Felix Krueger, Anne Segonds-Pichon, Laura Biggins, Christel Krueger, and Steven Wingett. 2010. “FastQC.” Babraham Institute. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Bray, Nicolas L, Harold Pimentel, Páll Melsted, and Lior Pachter. 2016. “Near-Optimal Probabilistic RNA-seq Quantification.” Nat. Biotechnol. 34 (5): 525–27. https://www.nature.com/articles/nbt.3519.
Dobin, Alexander, and Thomas R Gingeras. 2015. “Mapping RNA-seq Reads with STAR.” Curr. Protoc. Bioinformatics 51 (1): 11.14.1–19. https://currentprotocols.onlinelibrary.wiley.com/doi/10.1002/0471250953.bi1114s51.
Ewels, Philip, Måns Magnusson, Sverker Lundin, and Max Käller. 2016. “MultiQC: Summarize Analysis Results for Multiple Tools and Samples in a Single Report.” Bioinformatics 32 (19): 3047–48. https://doi.org/10.1093/bioinformatics/btw354.
Germain, Pierre-Luc, Aaron Lun, Carlos Garcia Meixide, Will Macnair, and Mark D Robinson. 2021. “Doublet Identification in Single-Cell Sequencing Data Using scDblFinder.” F1000Res. 10 (September): 979. https://f1000research.com/articles/10-979/v2.
Graubert, Aaron, François Aguet, Arvind Ravi, Kristin G Ardlie, and Gad Getz. 2021. “RNA-SeQC 2: Efficient RNA-seq Quality Control and Quantification for Large Cohorts.” Bioinformatics 37 (18): 3048–50. https://doi.org/10.1093/bioinformatics/btab135.
Janesick, Amanda, Robert Shelansky, Andrew D Gottscho, Florian Wagner, Stephen R Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nat. Commun. 14 (1): 8353. https://www.nature.com/articles/s41467-023-43458-x.
La Manno, Gioele, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, et al. 2018. “RNA Velocity of Single Cells.” Nature 560 (7719): 494–98. https://www.nature.com/articles/s41586-018-0414-6.
Li, Heng. 2016. “Seqtk.” GitHub. https://github.com/lh3/seqtk.
Li, Heng, and Richard Durbin. 2009. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics 25 (14): 1754–60. https://academic.oup.com/bioinformatics/article/25/14/1754/225615.
Liao, Yang, Gordon K Smyth, and Wei Shi. 2013. “featureCounts: An Efficient General Purpose Program for Assigning Sequence Reads to Genomic Features.” Bioinformatics 30 (7): 923–30. https://doi.org/10.1093/bioinformatics/btt656.
Martin, Marcel. 2011. “Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads.” EMBnet.journal 17 (1): 10–12. https://doi.org/10.14806/ej.17.1.200.
“Picard Toolkit.” 2019. Broad Institute, GitHub Repository. https://broadinstitute.github.io/picard/.
Satpathy, Ansuman T, Jeffrey M Granja, Kathryn E Yost, Yanyan Qi, Francesca Meschi, Geoffrey P McDermott, Brett N Olsen, et al. 2019. “Massively Parallel Single-Cell Chromatin Landscapes of Human Immune Cell Development and Intratumoral T Cell Exhaustion.” Nat. Biotechnol. 37 (8): 925–36. https://www.nature.com/articles/s41587-019-0206-z.
Schindelin, Johannes, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, Tobias Pietzsch, Stephan Preibisch, et al. 2012. “Fiji: An Open-Source Platform for Biological-Image Analysis.” Nat. Methods 9 (7): 676–82. https://www.nature.com/articles/nmeth.2019.
Smith, Tom, Andreas Heger, and Ian Sudbery. 2017. “UMI-tools: Modeling Sequencing Errors in Unique Molecular Identifiers to Improve Quantification Accuracy.” Genome Res. 27 (3): 491–99. http://genome.cshlp.org/cgi/pmidlookup?view=long&pmid=28100584.
Thorvaldsdóttir, Helga, James T Robinson, and Jill P Mesirov. 2013. “Integrative Genomics Viewer (IGV): High-Performance Genomics Data Visualization and Exploration.” Brief. Bioinform. 14 (2): 178–92. https://academic.oup.com/bib/article/14/2/178/208453.
Zentgraf, Jens, and Sven Rahmann. 2021. “Fast Lightweight Accurate Xenograft Sorting.” Algorithms Mol. Biol. 16 (1): 2. https://almob.biomedcentral.com/articles/10.1186/s13015-021-00181-w.
Zheng, Grace X Y, Jessica M Terry, Phillip Belgrader, Paul Ryvkin, Zachary W Bent, Ryan Wilson, Solongo B Ziraldo, et al. 2017. “Massively Parallel Digital Transcriptional Profiling of Single Cells.” Nat. Commun. 8 (1): 14049. https://www.nature.com/articles/ncomms14049.