maxATAC: Genome-scale Transcription-factor Binding Prediction from ATAC-seq with Deep Neural Networks


Abstract:


Transcription factors (TFs) read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods. Yet, while Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq datasets grow exponentially, suboptimal motif scanning is commonly used for TFBS prediction from ATAC-seq. Here, we present “maxATAC”, a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the largest collection of state-of-the-art TFBS models to date. maxATAC performance extends to primary cells and single-cell ATAC-seq, enabling state-of-the-art TFBS prediction in vivo. We demonstrate maxATAC’s capabilities by identifying TFBS associated with allele-dependent chromatin accessibility at atopic dermatitis genetic risk loci.

Reference:

T. A. Cazares, F. W. Rizvi, B. Iyer, X. Chen, M. Kotliar, J. A. Wayman, A. Bejjani, O. Donmez, B. Wronowski, S. Parameswaran, L. C. Kottyan, A. Barski, M. T. Weirauch, V. B. S. Prasath, E. R. Miraldi. maxATAC: genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLOS Computational Biology, 19(1), e1010863, January 2023. doi:10.1371/journal.pcbi.1010863

bioRxiv version: doi:10.1101/2022.01.28.478235. Code, PyPI