Abstract:
Gene regulation is fundamental to shaping morphological diversity and is driven by cis-regulatory regions whose transcription factor motifs orchestrate precise gene expression. While protein-coding genes are highly conserved in sequence, cis-regulatory regions are subject to strong sequence divergence. How enhancer sequences change while maintaining their function is not clear. Recent research has shown that chromatin accessibility depends on motif cooperativity that follows a flexible motif syntax and includes low-affinity motifs, providing a possible avenue by which motifs arise de novo and diverge over time. To test whether evolutionary selection occurs at the level of chromatin accessibility, we used Drosophila trichome development as a model system. We comprehensively mapped the chromatin accessibility landscape across several Drosophila species by performing ATAC-seq on D. melanogaster, D. erecta, D. ananassae, and D. mojavensis embryos at the appropriate stage. This revealed that, despite considerable sequence divergence, the level of chromatin accessibility in regulatory regions is highly conserved across species, consistent with evolutionary selection at this level. To identify precisely and without bias which motifs and motif cooperativity rules drive the levels of chromatin accessibility in each species, we trained BPReveal deep learning models to predict bias-free accessibility profiles from DNA sequence. Interpreting these models revealed that strong sequence divergence between species is associated with a high turnover of individual motif instances across orthologous regions, as well as changes in motif affinity. Nevertheless, the types of motifs and their syntax rules are largely conserved across species, suggesting that the trans-environment of transcription factors is conserved.
Consistent with this, models trained on one species perform well in predicting the ATAC-seq data from another species, with only small losses in performance at larger evolutionary distances. Together, these findings suggest that cis-regulatory regions are not only subject to strong sequence divergence, but also change in the way they encode chromatin accessibility over evolutionary time. Since chromatin accessibility levels are under strong evolutionary selection, these results suggest that cis-regulatory regions diverge rapidly because sequence changes have a relatively high probability of producing similar amounts of chromatin accessibility through an alternative sequence encoding. Taken together, our data support the hypothesis that the highly flexible sequence rules of chromatin accessibility facilitate cis-regulatory sequence evolution.