Abstract:
Starting with the problem of structure prediction, we leveraged machine learning to predict DNA conformation from its sequence accurately. We developed an end-to-end data-driven approach using machine learning and free energy calculations to offer a fresh perspective on this long-standing problem. Besides accurately predicting the DNA conformation, our model also explains why certain sequences adopt a particular conformation.
Transitioning from the DNA to the world of proteins, we employed unsupervised learning (called hierarchical clustering) and our algebraic fitting algorithm to study the surface curvature of protein surfaces. We later used surface curvature to assess the shape complementarity among the interacting biomolecules, intending to devise a scoring algorithm for the fast selection of binders with complimentary curvature for a particular active site.
To find out the binding mechanism at the molecular level, one needs to identify the appropriate reaction coordinate. Therefore, our next endeavour was to devised a novel approach based on regularized sparse autoencoders – an energy-based model, to predict a useful and physically intuitive set of reaction coordinates.
Although finding strong binders is the first step towards finding a drug, it is not the most crucial step since all the binders to a receptor can not be characterized as drugs, which have to satisfy certain conditions called ADME condition. Therefore, finally, we tried to address this significant problem – “what makes a molecule a putative drug ?”. We used representation learning in conjunction with modern graph neural network architectures to learn and predict crucial attributes behind the prospective drug-like activity. Overall, the goal of the studies carried out in the thesis is to find a fast selection of putative drugs.