2  Gene expression

Single-cell RNA-Seq experiments and analyses
Published

27-Aug-2024

2.1 Overview

Current best practices in scRNA-Seq

  • Perform QC by finding outlier peaks in the number of genes, the count depth and the fraction of mitochondrial reads. Consider these covariates jointly instead of separately.
  • Be as permissive of QC thresholding as possible, and revisit QC if downstream clustering cannot be interpreted.
  • If the distributions of QC covariates differ between samples, QC thresholds should be determined separately for each sample to account for differences in sample quality.
  • We recommend scran for normalization of non-full-length datasets. An alternative is to evaluate normalization approaches via scone especially for plate-based datasets. Full-length scRNA-seq protocols can be corrected for gene length using bulk methods.
  • There is no consensus on scaling genes to 0 mean and unit variance. We prefer not to scale gene expression.
  • Normalized data should be log(x+1)-transformed for use with downstream analysis methods that assume data are normally distributed.
  • Regress out biological covariates only for trajectory inference and if other biological processes of interest are not masked by the regressed out biological covariate.
  • Regress out technical and biological covariates jointly rather than serially.
  • Plate-based dataset pre-processing may require regressing out counts, normalization via non-linear normalization methods or downsampling.
  • We recommend performing batch correction via ComBat when cell type and state compositions between batches are consistent.
  • Data integration and batch correction should be performed by different methods. Data integration tools may over-correct simple batch effects.
  • Users should be cautious of signals found only after expression recovery. Exploratory analysis may be best performed without this step.
  • We recommend selecting between 1,000 and 5,000 highly variable genes depending on dataset complexity.
  • Feature selection methods that use gene expression means and variances cannot be used when gene expression values have been normalized to zero mean and unit variance, or when residuals from model fitting are used as normalized expression values. Thus, one must consider what pre-processing to perform before selecting HVGs.
  • Dimensionality reduction methods should be considered separately for summarization and visualization.
  • We recommend UMAP for exploratory visualization; PCA for general purpose summarization; and diffusion maps as an alternative to PCA for trajectory inference summarization.
  • PAGA with UMAP is a suitable alternative to visualize particularly complex datasets.
  • Use measured data for statistical testing, corrected data for visual comparison of data and reduced data for other downstream analysis based on finding the underlying biological data manifold.
  • We recommend clustering by Louvain community detection on a single-cell KNN graph.
  • Clustering does not have to be performed at a single resolution. Subclustering particular cell clusters is a valid approach to focus on more detailed substructures in a dataset.
  • Do not use marker gene P-values to validate a cell-identity cluster, especially when the detected marker genes do not help to annotate the community. P-values may be inflated.
  • Note that marker genes for the same cell-identity cluster may differ between datasets purely due to dataset cell type and state compositions.
  • If relevant reference atlases exist, we recommend using automated cluster annotation combined with data-derived marker-gene-based manual annotation to annotate clusters.
  • Consider that statistical tests over changes in the proportion of a cell-identity cluster between samples are dependent on one another.
  • Inferred trajectories do not have to represent a biological process. Further sources of evidence should be collected to interpret a trajectory.
  • DE testing should not be performed on corrected data (denoised, batch corrected, etc.), but instead on measured data with technical covariates included in the model.
  • Users should not rely on DE testing tools to correct models with confounded covariates. Model specification should be performed carefully ensuring a full-rank design matrix.
  • We recommend using MAST or limma for DE testing.
  • Users should be wary of uncertainty in the inferred regulatory relationships. Modules of genes that are enriched for regulatory relationships will be more reliable than individual edges.

Luecken & Theis (2019)
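The QC and normalization recommendations above can be sketched end to end. This is a minimal numpy illustration on toy data, not a replacement for scran/scanpy: the counts, the mitochondrial gene positions and the thresholds are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy counts matrix: 100 cells x 50 genes; the last 5 columns stand in for
# mitochondrial genes (hypothetical data, for illustration only).
counts = rng.poisson(2.0, size=(100, 50))

# QC covariates, considered jointly rather than one at a time
count_depth = counts.sum(axis=1)                 # total counts per cell
n_genes = (counts > 0).sum(axis=1)               # detected genes per cell
mito_frac = counts[:, -5:].sum(axis=1) / np.maximum(count_depth, 1)

keep = (
    (count_depth > np.percentile(count_depth, 5))
    & (n_genes > np.percentile(n_genes, 5))
    & (mito_frac < 0.2)                          # assumed permissive cutoff
)

# Depth normalization (size factors relative to the median depth),
# then log(x+1) for downstream methods that assume near-normal data
filtered = counts[keep]
size_factors = filtered.sum(axis=1) / np.median(filtered.sum(axis=1))
lognorm = np.log1p(filtered / size_factors[:, None])
```

In practice the thresholds would be chosen from the joint distribution of the covariates per sample, and scran's pooling-based size factors would replace the simple median scaling shown here.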

Best practices for single-cell analysis across modalities

Heumos et al. (2023)

What information should be included in an scRNA-Seq publication?

Füllgrabe et al. (2020)

2.2 Experimental design

Experimental Considerations for Single-Cell RNA Sequencing Approaches

Overview of step-wise approach to designing single-cell analysis workflows. RNA integrity number (RIN); Reads per cell (RPC).


Nguyen et al. (2018)

How many reads are needed per cell (sequencing depth)?

Given a fixed budget, sequencing as many cells as possible at approximately one read per cell per gene is optimal, both theoretically and experimentally.

Zhang et al. (2020)
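Under this heuristic, a fixed budget translates directly into a cell number. A back-of-the-envelope sketch (the budget and gene count below are assumed values, not from the source):

```python
# Split a fixed sequencing budget at ~1 read per cell per gene
# (heuristic from Zhang et al. 2020; numbers are illustrative).
budget_reads = 1_000_000_000        # total reads the budget buys (assumed)
n_genes = 20_000                    # genes to cover (assumed)
reads_per_cell = n_genes            # ~1 read per cell per gene
n_cells = budget_reads // reads_per_cell
print(n_cells)  # 50000 cells at ~20,000 reads per cell
```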

2.2.1 Batch design, number of cells

Avoid technical biases.

Experimental design examples. In the confounded design, cells are isolated from each sample onto separate plates, processed at potentially different times and the two groups (indicated by different colors) are sequenced on separate lanes of the sequencer. In the balanced design on the right, all samples are evenly distributed across all stages of the experiment, thus reducing the sources of technical variation in the experiment.


Deciding appropriate cell numbers

Estimate of cells required for experiments with various parameters. (A) The plot shows the log10(#Cells) required to capture at least 50 cell types based on the parameters on the X- and Y-axes. (B) The plot shows the log10(#Cells) required to capture the number of cells on the Y-axis if the population consists of 20 cell types.


Baran-Gale et al. (2018)
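The cell-number estimates in the figure follow from a simple binomial model: how many cells must be sequenced so that, with high probability, at least k cells of a type with population frequency f are captured. A stdlib sketch of that calculation (the function name and defaults are my own):

```python
from math import comb

def cells_needed(freq, k, conf=0.95, n_max=100_000):
    """Smallest number of cells to sequence such that, with probability
    >= conf, at least k cells of a type present at frequency `freq` are
    captured; the captured count is modeled as Binomial(n, freq)."""
    for n in range(k, n_max):
        # P(fewer than k cells of this type among n sequenced cells)
        p_fewer = sum(comb(n, i) * freq**i * (1 - freq)**(n - i)
                      for i in range(k))
        if 1 - p_fewer >= conf:
            return n
    raise ValueError("increase n_max")

# e.g. to see at least one cell of a 50%-abundant type with 95% confidence:
print(cells_needed(0.5, 1))  # 5
```

Rare populations dominate the requirement: capturing at least 10 cells of a 1%-abundant type already pushes the total well past a thousand cells.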

2.2.2 Sequencing depth

While 250,000 reads per cell are sufficient for accurate quantification, 1 million reads per cell is a good target for saturated gene detection.

Svensson et al. (2017)

2.3 Mapping and Quantification

2.3.1 CellRanger

  • Process chromium data

  • BCL to FASTQ

  • FASTQ to cellxgene counts

  • Feature barcoding

  • CellRanger
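The two core steps listed above look roughly as follows on the command line. This is an illustrative sketch only: the run folder, sample sheet, sample name and reference paths are placeholders, not values from the source.

```bash
# BCL -> FASTQ (demultiplexing)
cellranger mkfastq --run=/data/bcl_run --csv=samplesheet.csv

# FASTQ -> filtered cell-by-gene count matrix
cellranger count \
  --id=sample1 \
  --transcriptome=/refs/refdata-gex-GRCh38 \
  --fastqs=/data/fastq \
  --sample=sample1
```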

2.3.2 Kallisto Bustools

  • 10x, inDrop and Dropseq

  • Generate cellxgene, cellxtranscript matrix

  • RNA velocity data

  • Feature barcoding

  • QC reports

  • BUSTools

Melsted et al. (2019)
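A typical kb-python (kallisto|bustools wrapper) invocation covering the steps above; file names and the 10x v3 chemistry flag are illustrative placeholders.

```bash
# Build a kallisto index and transcript-to-gene map from a genome + GTF
kb ref -i index.idx -g t2g.txt -f1 cdna.fa genome.fa genes.gtf

# Pseudoalign reads and generate the cell-by-gene matrix (10x v3 chemistry)
kb count -i index.idx -g t2g.txt -x 10xv3 -o out/ R1.fastq.gz R2.fastq.gz
```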

2.3.3 Salmon Alevin

  • Drop-seq, 10x-Chromium v1/2/3, inDropV2, CELSeq 1/2, Quartz-Seq2, sci-RNA-seq3
  • Generate cellxgene matrix
  • Alevin

2.3.4 Nextflow nf-core rnaseq

  • Bulk RNA-Seq, SMART-Seq
  • QC, trimming, UMI demultiplexing, mapping, quantification
  • cellxgene matrix
  • nf-core scrnaseq

2.4 Doublet detection

Summary of doublet detection tools.


The methods include doubletCells, Scrublet, cxds, bcds, hybrid, Solo, DoubletDetection, DoubletFinder, and DoubletDecon. Evaluation was conducted using 16 real scRNA-seq datasets with experimentally annotated doublets and 112 synthetic datasets.

  • Evaluation Metrics
    • Detection Accuracy: Assessed using the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC).
    • Impact on Downstream Analyses: Effects on differential expression (DE) gene analysis, highly variable gene identification, cell clustering, and cell trajectory inference.
    • Computational Efficiency: Considered aspects such as speed, scalability, stability, and usability.
  • Key Findings
    • Detection Accuracy: DoubletFinder achieved the highest detection accuracy among the methods.
    • Downstream Analyses: Removal of doublets generally improved the accuracy of downstream analyses, with varying degrees of improvement depending on the method.
    • Computational Efficiency: cxds was found to be the most computationally efficient method, particularly excelling in speed and scalability.
  • Performance Factors
    • Doublet Rate: Higher doublet rates improved the accuracy of all methods.
    • Sequencing Depth: Greater sequencing depth led to better performance.
    • Number of Cell Types: More cell types generally resulted in better detection accuracy, except for cxds, bcds, and hybrid methods.
    • Cell-Type Heterogeneity: Higher heterogeneity between cell types improved the detection accuracy for most methods.

Overall Conclusion: DoubletFinder is recommended for its high detection accuracy and significant improvement in downstream analyses, while cxds is highlighted for its computational efficiency.

Xi & Li (2021)
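Several of the benchmarked tools (e.g. Scrublet, DoubletFinder) share one core idea: simulate artificial doublets by summing random pairs of observed cells, then score real cells by their proximity to the simulated ones. A minimal numpy sketch of that simulation step on toy data (not any tool's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy cells x genes count matrix (hypothetical data)
counts = rng.poisson(1.0, size=(200, 30))

# Simulate doublets by summing the profiles of random cell pairs;
# a detector would then embed real and simulated cells jointly and
# score each real cell by its simulated-doublet neighbour fraction.
n_sim = 100
pairs = rng.integers(0, counts.shape[0], size=(n_sim, 2))
simulated_doublets = counts[pairs[:, 0]] + counts[pairs[:, 1]]

# Sanity check: simulated doublets carry roughly twice the count depth
depth_ratio = simulated_doublets.sum(axis=1).mean() / counts.sum(axis=1).mean()
```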

2.5 Cell type prediction

Performance comparison of supervised classifiers for cell identification using different scRNA-seq datasets. Heatmap of (a) the median F1-scores and (b) the percentage of unlabeled cells across all cell populations, per classifier (rows) and per dataset (columns). Gray boxes indicate that the corresponding method could not be tested on the corresponding dataset.

Abdelaal et al. (2019)

Identification of cell types can be completely automated (by comparing to reference data/databases) or semi-automated (reference data + marker genes).

Summary of performance of the automatic cell-type identification methods. Bar graphs of the automatic cell-type identification methods with six evaluation criteria indicated. For each evaluation criterion, the length of the bar shows the performance of the automatic method: poor, medium or good. The automatic methods are sorted by their mean performance across the evaluation criteria. No bar: not evaluated.

Xie et al. (2021), Sun et al. (2022)

2.6 Differential expression

  • Comparison of DGE tools for single-cell data

All of the tools perform well when there is little or no multimodality, and all perform better when sparsity (the fraction of zero counts) is lower. For data with a high level of multimodality, methods that consider the behavior of each individual gene, such as DESeq2, EMDomics, Monocle2, DEsingle, and SigEMD, show better TPRs. If the level of multimodality is low, however, SCDE, MAST, and edgeR can provide higher precision.

In general, the non-parametric methods that can capture multimodality perform better than model-based methods designed for handling zero counts. However, a model-based method that models drop-out events well can perform better in terms of true positive and false positive rates. Methods developed specifically for scRNA-seq data do not show significantly better performance than methods designed for bulk RNA-seq data, and methods that consider the behavior of each individual gene (rather than all genes) when calling DE genes outperform the other tools.

Effect of sample size (number of cells) on detecting DE genes. Sample size is on the horizontal axis, from 10 to 400 cells in each condition. Effect of sample size on (a) TPR, (b) FPR, (c) accuracy (=(TP + TN)/(TP + FP + TN + FN)) and (d) precision (=TP/(TP + FP)). A threshold of 0.05 is used for the FDR or adjusted p-value.

Wang et al. (2019)
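The metrics used in the sample-size comparison above reduce to the standard confusion-matrix definitions:

```python
def confusion_metrics(tp, fp, tn, fn):
    """TPR, FPR, accuracy and precision from confusion-matrix counts,
    matching the definitions used in the figure caption above."""
    tpr = tp / (tp + fn)                        # true positive rate (sensitivity)
    fpr = fp / (fp + tn)                        # false positive rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    return tpr, fpr, accuracy, precision

# e.g. 50 true DE genes found, 50 missed, 10 false calls, 90 true negatives
print(confusion_metrics(50, 10, 90, 50))
```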

2.7 Data Integration

  • Single-cell data integration challenges

Overview of common data integration methods classified according to their anchor choice.


a–c, Depending on the anchor choice, three types of data integration strategies can be considered: horizontal integration with features as the anchors (a), vertical integration with cells as the anchors (b) and diagonal integration with no anchors in high-dimensional space (c). The left column shows the data modalities extracted, while the right column illustrates the resulting data matrices to be integrated, depending on the anchor choice.


Mosaic integration. a, Overview of an experimental design where different data modalities (each block in the rows) are profiled in different subsets of cells (each block in the columns). Transparent matrices denote missing information. b, Resulting data matrices after applying a mosaic integration approach aimed at imputing missing data modalities.


Argelaguet et al. (2021)

  • Comparison of data integration methods

a, Overview of top and bottom ranked methods by overall score for the human immune cell task. Metrics are divided into batch correction (blue) and bio-conservation (pink) categories. Overall scores are computed using a 40/60 weighted mean of these category scores (see Methods for further visualization details and Supplementary Fig. 2 for the full plot). b,c, Visualization of the four best performers on the human immune cell integration task colored by cell identity (b) and batch annotation (c). The plots show uniform manifold approximation and projection layouts for the unintegrated data (left) and the top four performers (right).


a, Scatter plot of the mean overall batch correction score against mean overall bio-conservation score for the selected methods on RNA tasks. Error bars indicate the standard error across tasks on which the methods ran. b, The overall scores for the best performing method, preprocessing and output combinations on each task as well as their usability and scalability. Methods that failed to run for a particular task were assigned the unintegrated ranking for that task. An asterisk after the method name (scANVI and scGen) indicates that, in addition, cell identity information was passed to this method. For ComBat and MNN, usability and scalability scores corresponding to the Python implementation of the methods are reported (Scanpy and mnnpy, respectively).


Luecken et al. (2022)
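The overall ranking score in this benchmark is simply a 40/60 weighted mean of the two category scores:

```python
def overall_score(batch_correction, bio_conservation):
    """40/60 weighted mean of the batch-correction and bio-conservation
    category scores, as used for the benchmark rankings described above."""
    return 0.4 * batch_correction + 0.6 * bio_conservation

# Bio-conservation is weighted more heavily: a method that only mixes
# batches well cannot outrank one that also preserves biological signal.
print(overall_score(1.0, 0.0))  # 0.4
```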

Qualitative evaluation of 14 batch-effect correction methods using UMAP visualization for dataset 9 of human cell atlas. The 14 methods are organized into two panels, with the top panel showing UMAP plots of raw data, Seurat 2, Seurat 3, Harmony, fastMNN, MNN Correct, ComBat, and limma outputs, while the bottom panel shows the UMAP plots of scGen, Scanorama, MMD-ResNet, ZINB-WaVE, scMerge, LIGER, and BBKNN outputs. Cells are colored by batch.


We tested 14 state-of-the-art batch correction algorithms designed to handle single-cell transcriptomic data. Each batch-effect removal method has its advantages and limitations, with no clearly superior method. Based on our results, LIGER, Harmony, and Seurat 3 are the top batch mixing methods. Harmony performed well on datasets with common cell types and also across different technologies; its comparatively low runtime also makes it suitable for initial exploration of large datasets. LIGER likewise performed well, especially on datasets with non-identical cell types; its main drawback is a longer runtime than Harmony, though this is acceptable given its performance. Seurat 3 can also handle large datasets, albeit with a 20–50% longer runtime than LIGER, and its good batch mixing results with multiple batches make it recommended for such scenarios. To improve recovery of DEGs in batch-corrected data, we recommend scMerge for batch correction.

Tran et al. (2020)

2.8 Trajectory

Fig 2.1: Comparison of trajectory inference methods.

Saelens et al. (2019)

  • Tempora: trajectory inference for time-series data

2.9 RNA velocity

2.10 Databases

2.10.1 Data

Single-cell data repositories.

2.10.2 Markers

Curated list of marker genes by organism, tissue and cell type.

2.11 Interactive visualisation frameworks

Ouyang et al. (2021)

Cakir et al. (2020)

2.12 Learning

References

Abdelaal, T., Michielsen, L., Cats, D., Hoogduin, D., Mei, H., Reinders, M. J., & Mahfouz, A. (2019). A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biology, 20, 1–19. https://link.springer.com/article/10.1186/s13059-019-1795-z
Andreatta, M., Hérault, L., Gueguen, P., Gfeller, D., Berenstein, A. J., & Carmona, S. J. (2024). Semi-supervised integration of single-cell transcriptomics data. Nature Communications, 15(1), 872. https://www.nature.com/articles/s41467-024-45240-z
Argelaguet, R., Cuomo, A. S., Stegle, O., & Marioni, J. C. (2021). Computational principles and challenges in single-cell data integration. Nature Biotechnology, 39(10), 1202–1215.
Baran-Gale, J., Chandra, T., & Kirschner, K. (2018). Experimental design for single-cell RNA sequencing. Briefings in Functional Genomics, 17(4), 233–239.
Cakir, B., Prete, M., Huang, N., Van Dongen, S., Pir, P., & Kiselev, V. Y. (2020). Comparison of visualization tools for single-cell RNAseq data. NAR Genomics and Bioinformatics, 2(3), lqaa052.
Füllgrabe, A., George, N., Green, M., Nejad, P., Aronow, B., Fexova, S. K., Fischer, C., Freeberg, M. A., Huerta, L., Morrison, N., et al. (2020). Guidelines for reporting single-cell RNA-seq experiments. Nature Biotechnology, 38(12), 1384–1386.
Heumos, L., Schaar, A. C., Lance, C., Litinetskaya, A., Drost, F., Zappia, L., Lücken, M. D., Strobl, D. C., Henao, J., Curion, F., et al. (2023). Best practices for single-cell analysis across modalities. Nature Reviews Genetics, 1–23.
Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Müller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M., et al. (2022). Benchmarking atlas-level data integration in single-cell genomics. Nature Methods, 19(1), 41–50.
Luecken, M. D., & Theis, F. J. (2019). Current best practices in single-cell RNA-seq analysis: A tutorial. Molecular Systems Biology, 15(6), e8746.
Melsted, P., Booeshaghi, A. S., Gao, F., Beltrame, E., Lu, L., Hjorleifsson, K. E., Gehring, J., & Pachter, L. (2019). Modular and efficient pre-processing of single-cell RNA-seq. BioRxiv, 673285.
Nguyen, Q. H., Pervolarakis, N., Nee, K., & Kessenbrock, K. (2018). Experimental considerations for single-cell RNA sequencing approaches. Frontiers in Cell and Developmental Biology, 6, 108.
Ouyang, J. F., Kamaraj, U. S., Cao, E. Y., & Rackham, O. J. (2021). ShinyCell: Simple and sharable visualization of single-cell gene expression data. Bioinformatics, 37(19), 3374–3376.
Saelens, W., Cannoodt, R., Todorov, H., & Saeys, Y. (2019). A comparison of single-cell trajectory inference methods. Nature Biotechnology, 37(5), 547–554.
Sun, X., Lin, X., Li, Z., & Wu, H. (2022). A comprehensive comparison of supervised and unsupervised methods for cell type identification in single-cell RNA-seq. Briefings in Bioinformatics, 23(2), bbab567.
Svensson, V., Natarajan, K. N., Ly, L.-H., Miragaia, R. J., Labalette, C., Macaulay, I. C., Cvejic, A., & Teichmann, S. A. (2017). Power analysis of single-cell RNA-sequencing experiments. Nature Methods, 14(4), 381–387.
Tran, H. T. N., Ang, K. S., Chevrier, M., Zhang, X., Lee, N. Y. S., Goh, M., & Chen, J. (2020). A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biology, 21, 1–32. https://link.springer.com/article/10.1186/s13059-019-1850-9
Wang, T., Li, B., Nelson, C. E., & Nabavi, S. (2019). Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinformatics, 20(1), 1–16.
Xi, N. M., & Li, J. J. (2021). Benchmarking computational doublet-detection methods for single-cell RNA sequencing data. Cell Systems, 12(2), 176–194.
Xie, B., Jiang, Q., Mora, A., & Li, X. (2021). Automatic cell type identification methods for single-cell RNA sequencing. Computational and Structural Biotechnology Journal, 19, 5874–5887.
Zhang, M. J., Ntranos, V., & Tse, D. (2020). Determining sequencing depth in a single-cell RNA-seq experiment. Nature Communications, 11(1), 774.