5  ATAC-Seq

Quantifying chromatin accessibility
Published: 27-Aug-2024

5.1 Overview

ATAC-seq provides a simple and scalable way to assay the regions of the genome that are bound by transcription factors (TFs), and to compare how these landscapes change between particular contexts or perturbations. This is accomplished using in vitro transposition of sequencing adapters into native chromatin. Each unique transposition event, termed an ‘insertion’, marks a location in the genome where a Tn5 transposase dimer can access DNA and perform a cut-and-paste reaction. The transposase simultaneously fragments the DNA and inserts sequence handles that are then used for amplification during library preparation.

Grandi et al. (2022), Yan et al. (2020)
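To make the notion of an ‘insertion’ concrete, the sketch below converts aligned fragments into the two insertion positions at their ends. It assumes the widely used convention of shifting the plus-strand end by +4 bp and the minus-strand end by -5 bp to account for the Tn5 dimer footprint, and it assumes the input has not already been shifted (many pipelines write pre-shifted fragment files). The function name `insertion_sites` and the tuple layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Conventional Tn5 offsets; an assumption that the input coordinates
# come straight from the aligner and have not been corrected yet.
PLUS_SHIFT = 4
MINUS_SHIFT = -5


def insertion_sites(fragments: List[Tuple[str, int, int]]) -> List[Tuple[str, int]]:
    """Return the two Tn5 insertion positions for each fragment.

    Each fragment is (chrom, start, end), where start is the leftmost
    aligned base of the plus-strand read and end is the rightmost
    coordinate of the minus-strand read.
    """
    sites = []
    for chrom, start, end in fragments:
        sites.append((chrom, start + PLUS_SHIFT))   # insertion at the plus-strand end
        sites.append((chrom, end + MINUS_SHIFT))    # insertion at the minus-strand end
    return sites


if __name__ == "__main__":
    print(insertion_sites([("chr1", 100, 300)]))
```

Each fragment therefore contributes two insertions, which is why downstream tools often count fragment ends rather than whole fragments.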

Our benchmarking results highlight SnapATAC, cisTopic, and Cusanovich2018 as the top performing scATAC-seq data analysis methods to perform clustering across all datasets and different metrics. Methods that preserve information at the peak level (cisTopic, Cusanovich2018, Scasat) or bin level (SnapATAC) generally outperform those that summarize accessible chromatin regions at the motif/k-mer level (chromVAR, BROCKMAN, SCRAT) or over the gene body (Cicero, Gene Scoring). In addition, methods that implement a dimensionality reduction step (BROCKMAN, cisTopic, Cusanovich2018, Scasat, SnapATAC) generally show advantages over the other methods without this important step. SnapATAC is the most scalable method; it was the only method capable of processing more than 80,000 cells. Cusanovich2018 is the method that best balances analysis performance and running time.

Chen et al. (2019)
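As an illustration of what a dimensionality reduction step on a peak-level matrix can look like, here is a minimal sketch of latent semantic indexing (binarization, TF-IDF weighting, then a truncated SVD). This is a generic formulation written with NumPy, not the reference implementation of any benchmarked method; the function name `lsi_embedding` and the exact TF-IDF variant are assumptions.

```python
import numpy as np


def lsi_embedding(counts: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed cells with TF-IDF followed by SVD.

    counts is a cells x features (peaks or bins) matrix.
    Returns a cells x n_components matrix of latent coordinates.
    """
    # Binarize: a feature is either accessible in a cell or not.
    binary = (counts > 0).astype(float)

    # Term frequency: scale each cell by its number of accessible features.
    cell_totals = binary.sum(axis=1, keepdims=True)
    tf = binary / np.maximum(cell_totals, 1.0)

    # Inverse document frequency: down-weight features open in many cells.
    n_cells = binary.shape[0]
    feature_counts = binary.sum(axis=0)
    idf = np.log(1.0 + n_cells / np.maximum(feature_counts, 1.0))

    tfidf = tf * idf

    # Keep the leading singular vectors, scaled by their singular values.
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    toy = np.array([[1, 1, 0, 0],
                    [2, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 3, 1]])
    print(lsi_embedding(toy, n_components=2))
```

In the toy matrix, cells with the same accessibility pattern land on the same latent coordinates regardless of read depth, because the matrix is binarized before weighting. Real datasets would use sparse matrices and a truncated solver instead of a dense SVD.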

5.2 Feature selection

This benchmark compares methods across datasets that differ in cell-type structure and size. For simple datasets with distinct cell types, all methods are effective. For datasets with small cell classes, or with hierarchical structure and similar subtypes, SnapATAC and SnapATAC2 are preferred. SnapATAC is not memory-efficient for large datasets (over 20,000 cells); in such cases, SnapATAC2 is better. Signac outperforms ArchR, but ArchR is more memory-efficient. Adding aggregation steps to Signac does not significantly increase time or memory usage. Feature engineering choices such as using peaks versus bins do not substantially affect performance, so users can choose based on preference. Recommended latent space dimensions vary by method: 10-30 for SnapATAC/SnapATAC2, 10-50 for Signac/ArchR, and even larger for aggregation methods.

De Rop et al. (2024)
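For readers unfamiliar with bin-level features, the sketch below shows one simple way to build them: each fragment end is assigned to a fixed-width genomic bin and counted per cell barcode. The 5 kb default bin size, the function name `bin_counts`, and the `(chrom, start, end, barcode)` layout are assumptions made for illustration and do not reproduce any particular tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Fragment = Tuple[str, int, int, str]  # (chrom, start, end, barcode), half-open


def bin_counts(fragments: Iterable[Fragment],
               bin_size: int = 5000) -> Dict[Tuple[str, str, int], int]:
    """Count fragment ends per (barcode, chrom, bin index).

    Both ends of a fragment are counted, since each end corresponds
    to one transposition event.
    """
    counts: Counter = Counter()
    for chrom, start, end, barcode in fragments:
        # end is exclusive, so the last covered base is end - 1.
        for pos in (start, end - 1):
            counts[(barcode, chrom, pos // bin_size)] += 1
    return dict(counts)


if __name__ == "__main__":
    frags = [
        ("chr1", 100, 300, "AAAC"),
        ("chr1", 4900, 5100, "AAAC"),
        ("chr1", 5200, 5400, "TTTG"),
    ]
    for key, value in sorted(bin_counts(frags).items()):
        print(key, value)
```

A peak-based matrix would be built the same way, except that positions are looked up in a set of called peak intervals instead of being divided by a fixed bin size.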

5.3 Celltyping

Performance of label transfer methods on single-cell data from selected mouse and human tissues. (A) Overall metrics considering performance on all scATAC-seq cells.

Here, we evaluated the performance of five scATAC-seq annotation methods on both their classification accuracy and scalability using publicly available single-cell datasets from mouse and human tissues including brain, lung, kidney, PBMC, and BMMC. Using the BMMC data as a basis, we further investigated the performance of these methods across different data sizes, mislabeling rates, sequencing depths and the number of cell types unique to scATAC-seq. Bridge integration, which is the only method that requires additional multimodal data and does not need gene activity calculation, was overall the best method and robust to changes in data size, mislabeling rate and sequencing depth. Conos was the most time- and memory-efficient method but performed the worst in terms of prediction accuracy. scJoint tended to assign cells to similar cell types and performed relatively poorly for complex datasets with deep annotations but performed better for datasets with only major label annotations. The performance of scGCN and Seurat v3 was moderate, but scGCN was the most time-consuming method and had the most similar performance to random classifiers for cell types unique to scATAC-seq.

Wang et al. (2022)
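When comparing label transfer results on your own data, it can help to look at more than overall accuracy, because a method that collapses rare or similar cell types into a major one can still score well overall. The sketch below computes overall accuracy and a macro-averaged F1 score in plain Python. These two metrics are chosen here for illustration and are not claimed to be the exact metrics used in the benchmark; the function names are hypothetical.

```python
from typing import Sequence


def accuracy(true: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of cells whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def macro_f1(true: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the reference cell types."""
    f1_scores = []
    for label in sorted(set(true)):
        tp = sum(t == label and p == label for t, p in zip(true, pred))
        fp = sum(t != label and p == label for t, p in zip(true, pred))
        fn = sum(t == label and p != label for t, p in zip(true, pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly gets an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    reference = ["B", "B", "T", "T", "NK"]
    predicted = ["B", "T", "T", "T", "T"]
    print(accuracy(reference, predicted))
    print(macro_f1(reference, predicted))
```

In this toy example the classifier over-assigns cells to the T label: overall accuracy is 0.6, while the macro F1 drops to about 0.44 because the NK class is missed entirely.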

5.4 Tools

Comparison of toolkits: Granja et al. (2021)

Stuart et al. (2021)

5.5 Tutorials

References

Chen, H., Lareau, C., Andreani, T., Vinyard, M. E., Garcia, S. P., Clement, K., Andrade-Navarro, M. A., Buenrostro, J. D., & Pinello, L. (2019). Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biology, 20(1), 1–25. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1854-5
De Rop, F. V., Hulselmans, G., Flerin, C., Soler-Vila, P., Rafels, A., Christiaens, V., González-Blas, C. B., Marchese, D., Caratu, G., Poovathingal, S., et al. (2024). Systematic benchmarking of single-cell ATAC-sequencing protocols. Nature Biotechnology, 42(6), 916–926. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03356-x
Grandi, F. C., Modi, H., Kampman, L., & Corces, M. R. (2022). Chromatin accessibility profiling by ATAC-seq. Nature Protocols, 17(6), 1518–1552. https://www.nature.com/articles/s41596-022-00692-9
Granja, J. M., Corces, M. R., Pierce, S. E., Bagdatli, S. T., Choudhry, H., Chang, H. Y., & Greenleaf, W. J. (2021). ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nature Genetics, 53(3), 403–411. https://www.nature.com/articles/s41588-021-00790-6
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A., & Satija, R. (2021). Single-cell chromatin state analysis with Signac. Nature Methods, 18(11), 1333–1341. https://www.nature.com/articles/s41592-021-01282-5
Wang, Y., Sun, X., & Zhao, H. (2022). Benchmarking automated cell type annotation tools for single-cell ATAC-seq data. Frontiers in Genetics, 13, 1063233. https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.1063233/full
Yan, F., Powell, D. R., Curtis, D. J., & Wong, N. C. (2020). From reads to insight: A hitchhiker’s guide to ATAC-seq data analysis. Genome Biology, 21, 1–16. https://link.springer.com/article/10.1186/s13059-020-1929-3