Digital Repository

Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake


dc.contributor.author GUPTA, ABHIJIT en_US
dc.contributor.author Kulkarni, Mandar en_US
dc.contributor.author MUKHERJEE, ARNAB en_US
dc.date.accessioned 2022-06-13T04:29:20Z
dc.date.available 2022-06-13T04:29:20Z
dc.date.issued 2021-09 en_US
dc.identifier.citation Patterns, 2(9), 100329. en_US
dc.identifier.issn 2666-3899 en_US
dc.identifier.uri https://doi.org/10.1016/j.patter.2021.100329 en_US
dc.identifier.uri http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7036
dc.description.abstract DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used the nested cross-validation strategy to address the issues of “overfitting” and selection bias. This simultaneously provides an unbiased estimate of the generalization performance of a machine learning algorithm and allows us to tune the hyperparameters optimally. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers crucial insight into model interpretability. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working on small datasets, which is often the case in biological and medical data. en_US
dc.language.iso en en_US
dc.publisher Elsevier B.V. en_US
dc.subject Machine learning en_US
dc.subject Light GBM en_US
dc.subject DNA sequence en_US
dc.subject DNA conformation en_US
dc.subject Nested cross-validation en_US
dc.subject Genome en_US
dc.subject 2021
dc.title Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake en_US
dc.type Article en_US
dc.contributor.department Dept. of Chemistry en_US
dc.identifier.sourcetitle Patterns en_US
dc.publication.originofpublisher Foreign en_US
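
The nested cross-validation strategy named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn, uses a synthetic dataset in place of the DNA sequence data, and swaps in scikit-learn's GradientBoostingClassifier as a stand-in for LightGBM. The parameter grid and fold counts are hypothetical. The inner loop tunes hyperparameters, and the outer loop scores the tuned model on folds the tuning never saw, which is what keeps the performance estimate free of selection bias.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for a small labelled dataset (NOT the paper's data).
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=4, random_state=0
)

# Hypothetical search grid, chosen only to keep the example fast.
param_grid = {"n_estimators": [25, 50], "max_depth": [2, 3]}

# Inner loop: picks hyperparameters using only the outer-training portion.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# Outer loop: estimates generalization on folds never seen during tuning.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Gradient boosting stand-in; the paper itself reports using LightGBM.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=inner_cv,
    scoring="accuracy",
)

# Each outer fold refits the whole grid search on its own training split.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(f"nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Reporting the mean and spread of the outer-fold scores, rather than the best inner-loop score, is the design choice that matters on small datasets, where a single train/test split can give an optimistic estimate.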


