5 ATAC-Seq – SeqHub

5.1 Overview

5.1.1 Bulk ATAC-Seq

ATAC-seq provides a simple and scalable way to assay the regions of the genome that are bound by TFs, and to compare how these landscapes change between particular contexts or perturbations. This is accomplished using in vitro transposition of sequencing adapters into native chromatin. Each unique transposition event, termed an ‘insertion’, marks a location in the genome where a Tn5 transposase dimer is able to access DNA and perform a cut-and-paste reaction. The transposase simultaneously fragments the DNA and inserts sequence handles that are then used for amplification during library preparation.

Grandi et al. (2022), Yan et al. (2020)

5.1.2 Single-cell ATAC-Seq

ATAC-Seq quantifies DNA and can be applied to frozen/fixed tisues where nuclei can be isolated

Sample-level quality
- Cell viability must exceed 80%
- Accurate assessment of cell number
Library-level quality
- DNA fragment distribution (multiples of 200bp)

Counts QC
- Unique nuclear fragments (>1000)
- Fraction of transposition events in the peaks (>0.3)
- Transcription start sites (TSS) enrichment score (>5)
- Ratio of mononucleosomal to nucleosome-free fragments
- Apply filters separately for each sample
Generate features (peaks/target regions)
- Various methods exist
Dimensionality reduction
Cells with similar accessibility profiles are organized into clusters
There are two approaches to cell identity annotation: the cell type-specific peaks-based method and the scRNA-seq-based method
- Enhancers can be used to accurately annotate cell types as distal cis-regulatory elements specific to particular cell types and states.
- Cell type-specific gene expression is predicted based on their accessibility and used to annotate cells

Epigenomic profiling from scATAC-seq data.

Cell type-specific chromatin architecture
- Profiling the regulatory elements for each cluster/cell type
- Identifying differentially accessible regions between different clusters/cell types
- Uncovering key factors that contribute to the altered chromatin accessibility
- Linking promoter–enhancer interactions
Three main strategies are used to identify TFs of interest
- Searching for overrepresented motifs in cell type-specific accessible regions
- Comparing motif activity between cell types
- Detecting foot-printing for TF occupancy

Shi et al. (2022)

Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation

Baek & Lee (2020)

5.2 Feature selection

The performance of various methods for analyzing datasets with different cell structures and sizes is discussed. For simple datasets with distinct cell types, all methods are effective. For datasets with small cell classes or with hierarchical clustering and similar subtypes, SnapATAC and SnapATAC2 are preferred. SnapATAC is not memory-efficient for large datasets (over 20,000 cells); in such cases, SnapATAC2 is better. Signac outperforms ArchR, but ArchR is more memory-efficient. Adding aggregation steps to Signac does not significantly increase time or memory usage. Feature engineering choices like peak versus bins calling do not majorly affect performance, so users can choose based on preference. Recommended latent space dimensions vary by method: 10-30 for SnapATAC/SnapATAC2, 10-50 for Signac/ArchR, and even larger for aggregation methods.

De Rop et al. (2024)

5.3 Celltyping

Performance of label transfer methods on single-cell data from selected mouse and human tissues. (A) Overall metrics considering performance on all scATAC-seq cells.

Here, we evaluated the performance of five scATAC-seq annotation methods on both their classification accuracy and scalability using publicly available single-cell datasets from mouse and human tissues including brain, lung, kidney, PBMC, and BMMC. Using the BMMC data as basis, we further investigated the performance of these methods across different data sizes, mislabeling rates, sequencing depths and the number of cell types unique to scATAC-seq. Bridge integration, which is the only method that requires additional multimodal data and does not need gene activity calculation, was overall the best method and robust to changes in data size, mislabeling rate and sequencing depth. Conos was the most time and memory efficient method but performed the worst in terms of prediction accuracy. scJoint tended to assign cells to similar cell types and performed relatively poorly for complex datasets with deep annotations but performed better for datasets only with major label annotations. The performance of scGCN and Seurat v3 was moderate, but scGCN was the most time-consuming method and had the most similar performance to random classifiers for cell types unique to scATAC-seq.

Wang et al. (2022)

5.4 Benchmarking tools

Our benchmarking results highlight SnapATAC, cisTopic, and Cusanovich2018 as the top performing scATAC-seq data analysis methods to perform clustering across all datasets and different metrics. Methods that preserve information at the peak level (cisTopic, Cusanovich2018, Scasat) or bin level (SnapATAC) generally outperform those that summarize accessible chromatin regions at the motif/k-mer level (chromVAR, BROCKMAN, SCRAT) or over the gene body (Cicero, Gene Scoring). In addition, methods that implement a dimensionality reduction step (BROCKMAN, cisTopic, Cusanovich2018, Scasat, SnapATAC) generally show advantages over the other methods without this important step. SnapATAC is the most scalable method; it was the only method capable of processing more than 80,000 cells. Cusanovich2018 is the method that best balances analysis performance and running time.

Chen et al. (2019)

Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. With large datasets, SnapATAC2 and ArchR are most scalable.

Luo et al. (2024)

5.5 Tools

Granja et al. (2021)

Signac (R) Stuart et al. (2021)
ArchR (R) Granja et al. (2021)
SnapATAC (R) Fang et al. (2021)
pycisTopic (Python)
Scasat (Bash,Python)

5.6 Tutorials

References

Baek, S., & Lee, I. (2020). Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation. Computational and Structural Biotechnology Journal, 18, 1429–1439. https://www.sciencedirect.com/science/article/pii/S2001037020303019

Chen, H., Lareau, C., Andreani, T., Vinyard, M. E., Garcia, S. P., Clement, K., Andrade-Navarro, M. A., Buenrostro, J. D., & Pinello, L. (2019). Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biology, 20(1), 1–25. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1854-5

De Rop, F. V., Hulselmans, G., Flerin, C., Soler-Vila, P., Rafels, A., Christiaens, V., González-Blas, C. B., Marchese, D., Caratu, G., Poovathingal, S., et al. (2024). Systematic benchmarking of single-cell ATAC-sequencing protocols. Nature Biotechnology, 42(6), 916–926. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03356-x

Fang, R., Preissl, S., Li, Y., Hou, X., Lucero, J., Wang, X., Motamedi, A., Shiau, A. K., Zhou, X., Xie, F., et al. (2021). Comprehensive analysis of single cell ATAC-seq data with SnapATAC. Nature Communications, 12(1), 1337. https://www.nature.com/articles/s41467-021-21583-9

Grandi, F. C., Modi, H., Kampman, L., & Corces, M. R. (2022). Chromatin accessibility profiling by ATAC-seq. Nature Protocols, 17(6), 1518–1552. https://www.nature.com/articles/s41596-022-00692-9

Granja, J. M., Corces, M. R., Pierce, S. E., Bagdatli, S. T., Choudhry, H., Chang, H. Y., & Greenleaf, W. J. (2021). ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nature Genetics, 53(3), 403–411. https://www.nature.com/articles/s41588-021-00790-6

Luo, S., Germain, P.-L., Robinson, M. D., & Meyenn, F. von. (2024). Benchmarking computational methods for single-cell chromatin data analysis. Genome Biology, 25(1), 225. https://link.springer.com/article/10.1186/s13059-024-03356-x

Shi, P., Nie, Y., Yang, J., Zhang, W., Tang, Z., & Xu, J. (2022). Fundamental and practical approaches for single-cell ATAC-seq analysis. Abiotech, 3(3), 212–223. https://link.springer.com/article/10.1007/s42994-022-00082-5

Stuart, T., Srivastava, A., Madad, S., Lareau, C. A., & Satija, R. (2021). Single-cell chromatin state analysis with signac. Nature Methods, 18(11), 1333–1341. https://www.nature.com/articles/s41592-021-01282-5

Wang, Y., Sun, X., & Zhao, H. (2022). Benchmarking automated cell type annotation tools for single-cell ATAC-seq data. Frontiers in Genetics, 13, 1063233. https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.1063233/full

Yan, F., Powell, D. R., Curtis, D. J., & Wong, N. C. (2020). From reads to insight: A hitchhiker’s guide to ATAC-seq data analysis. Genome Biology, 21, 1–16. https://link.springer.com/article/10.1186/s13059-020-1929-3