Title: Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake
Authors: GUPTA, ABHIJIT
Kulkarni, Mandar
MUKHERJEE, ARNAB
Dept. of Chemistry
Keywords: Machine learning
LightGBM
DNA sequence
DNA conformation
Nested cross-validation
Genome
2021
Issue Date: Sep-2021
Publisher: Elsevier B.V.
Citation: Patterns, 2(9), 100329.
Abstract: DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used the nested cross-validation strategy to address the issues of “overfitting” and selection bias. This simultaneously provides an unbiased estimate of the generalization performance of a machine learning algorithm and allows us to tune the hyperparameters optimally. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers crucial insight into model interpretability. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working on small datasets, which is often the case in biological and medical data.
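The nested cross-validation idea in the abstract is easier to see in code. The following is a minimal, stdlib-only sketch, not the authors' pipeline. It assumes a hypothetical k-nearest-neighbours classifier as a stand-in for LightGBM, a hypothetical hyperparameter grid over `k`, and synthetic two-feature toy data labelled "A" or "B". All function names are illustrative. The inner loop tunes the hyperparameter using only the outer-training data. The outer loop then scores the tuned model on a fold it never saw during tuning, which is what keeps the performance estimate free of selection bias.

```python
import random
from statistics import mean

def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 deterministically and deal them into n_folds folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def split(X, y, test_idx):
    """Return (train_X, train_y, test_X, test_y) for the given held-out indices."""
    held = set(test_idx)
    tr = [i for i in range(len(X)) if i not in held]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

def knn_predict(train_X, train_y, x, k):
    """Hypothetical stand-in model: majority vote of the k nearest points."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
                   for xi, yi in zip(train_X, train_y))
    votes = [label for _, label in dists[:k]]
    # sorted() makes tie-breaking deterministic (first label alphabetically).
    return max(sorted(set(votes)), key=votes.count)

def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)

def cv_score(X, y, k, n_folds, seed):
    """Plain k-fold CV: mean accuracy of hyperparameter k on (X, y)."""
    return mean(accuracy(*split(X, y, fold), k)
                for fold in k_fold_indices(len(X), n_folds, seed))

def nested_cv(X, y, grid, n_outer=5, n_inner=3, seed=0):
    """Outer loop estimates generalization; inner loop selects the hyperparameter."""
    outer_scores, chosen = [], []
    for fold in k_fold_indices(len(X), n_outer, seed):
        X_tr, y_tr, X_te, y_te = split(X, y, fold)
        # Inner loop: tune k on the outer-training data only.
        best_k = max(grid, key=lambda k: cv_score(X_tr, y_tr, k, n_inner, seed + 1))
        chosen.append(best_k)
        # Outer loop: evaluate the tuned model on the untouched outer fold.
        outer_scores.append(accuracy(X_tr, y_tr, X_te, y_te, best_k))
    return mean(outer_scores), outer_scores, chosen

def make_toy_data(n_per_class=20, seed=42):
    """Synthetic, well-separated toy data (hypothetical, not DNA features)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in (("A", (0.8, 0.2)), ("B", (0.2, 0.8))):
        for _ in range(n_per_class):
            X.append((cx + rng.uniform(-0.05, 0.05), cy + rng.uniform(-0.05, 0.05)))
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    mean_acc, per_fold, chosen = nested_cv(X, y, grid=[1, 3, 5])
    print(f"outer-fold accuracies: {per_fold}")
    print(f"selected k per outer fold: {chosen}")
    print(f"nested-CV estimate: {mean_acc:.3f}")
```

In a real setting, the toy model and grid would be replaced by the actual learner (for example a gradient boosting classifier) and its hyperparameter search space. The structure of two nested loops, with tuning confined to the inner one, stays the same. Because each outer fold may select a different hyperparameter, the outer mean estimates the performance of the whole model-building procedure rather than of one fixed configuration.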
URI: https://doi.org/10.1016/j.patter.2021.100329
http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7036
ISSN: 2666-3899
Appears in Collections: JOURNAL ARTICLES
