Please use this identifier to cite or link to this item:
http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7036
Title: | Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake |
Authors: | GUPTA, ABHIJIT; Kulkarni, Mandar; MUKHERJEE, ARNAB; Dept. of Chemistry |
Keywords: | Machine learning; LightGBM; DNA sequence; DNA conformation; Nested cross-validation; Genome; 2021 |
Issue Date: | Sep-2021 |
Publisher: | Elsevier B.V. |
Citation: | Patterns, 2(9), 100329. |
Abstract: | DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used the nested cross-validation strategy to address the issues of “overfitting” and selection bias. This simultaneously provides an unbiased estimate of the generalization performance of a machine learning algorithm and allows us to tune the hyperparameters optimally. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers crucial insight into model interpretability. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working on small datasets, which is often the case in biological and medical data. |
URI: | https://doi.org/10.1016/j.patter.2021.100329 http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7036 |
ISSN: | 2666-3899 |
Appears in Collections: | JOURNAL ARTICLES |
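The nested cross-validation strategy named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: a small k-nearest-neighbour classifier stands in for LightGBM so that the example needs only the Python standard library, the two-cluster dataset is synthetic, and every name here (`make_folds`, `knn_predict`, `accuracy`, `nested_cv`, `k_grid`) is hypothetical. The inner loop selects a hyperparameter using only the outer training data, and the outer loop scores that choice on data the tuning step never saw.

```python
import random
from statistics import mean


def make_folds(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k):
    """Majority vote (labels 0/1) among the k nearest training points."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(X, y, train_idx, test_idx, k):
    """Fit on train_idx (k-NN just stores the points) and score on test_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, k_grid, outer_k=5, inner_k=3, seed=0):
    """Return one held-out accuracy and one chosen hyperparameter per outer fold."""
    rng = random.Random(seed)
    outer_scores, chosen = [], []
    for outer_test in make_folds(len(X), outer_k, rng):
        held_out = set(outer_test)
        outer_train = [i for i in range(len(X)) if i not in held_out]

        # Inner loop: tune k using the outer training data only.
        inner_folds = make_folds(len(outer_train), inner_k, rng)

        def inner_score(k):
            scores = []
            for fold in inner_folds:
                val = [outer_train[j] for j in fold]
                val_set = set(val)
                trn = [i for i in outer_train if i not in val_set]
                scores.append(accuracy(X, y, trn, val, k))
            return mean(scores)

        best_k = max(k_grid, key=inner_score)
        chosen.append(best_k)

        # Outer loop: score the tuned model on the untouched outer test fold.
        outer_scores.append(accuracy(X, y, outer_train, outer_test, best_k))
    return outer_scores, chosen


# Synthetic data (an assumption for illustration): two Gaussian clusters.
data_rng = random.Random(42)
X = [[data_rng.gauss(c, 1.0), data_rng.gauss(c, 1.0)]
     for c in (0.0, 2.0) for _ in range(30)]
y = [0] * 30 + [1] * 30

scores, chosen = nested_cv(X, y, k_grid=[1, 3, 5, 7])
print("mean outer accuracy:", round(mean(scores), 3))
print("k chosen per outer fold:", chosen)
```

Because the outer test folds play no part in choosing `k`, the mean of `scores` estimates generalization performance without the selection bias that arises when the same folds are used for both tuning and evaluation.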