Abstract:
Specific interactions between DNA and proteins mediate many crucial processes in biological systems. The basis for this specificity is known only for certain protein families, such as zinc fingers and transcription activator-like effectors, and remains unknown for the vast majority of families.
We believe that common molecular interactions mediate specificity in all DNA-protein complexes. We found that hydrogen bonds are the predominant interactions imparting specificity to the DNA-protein interface. We therefore hypothesized that a similar arrangement of hydrogen bond donor and acceptor atoms is key to specific recognition between DNA and protein molecules, regardless of their overall structure. We compared the hydrogen bonds in diverse structures of DNA-protein complexes to find common recognition patterns, and used these data to generate probabilistic rules that describe the specific association of DNA with proteins. By employing these rules, we developed a method to predict the amino acids that would specifically bind a given DNA sequence. We validated the method using multiple strategies and also attempted to verify its predictions using in vitro assays.
Separately, we established a method to predict DNA-binding residues from the structure of a protein. This method can also identify the putative DNA sequence that the protein binds. We validated it on different benchmark datasets, where it outperformed existing methods both in predicting DNA-binding residues and in predicting the DNA sequence that a protein is likely to bind.
Our findings provide previously unknown insights into the specificity-determining factors in DNA-protein complexes.