Please use this identifier to cite or link to this item:
http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7036
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | GUPTA, ABHIJIT | en_US |
dc.contributor.author | Kulkarni, Mandar | en_US |
dc.contributor.author | MUKHERJEE, ARNAB | en_US |
dc.date.accessioned | 2022-06-13T04:29:20Z | |
dc.date.available | 2022-06-13T04:29:20Z | |
dc.date.issued | 2021-09 | en_US |
dc.identifier.citation | Patterns, 2(9), 100329. | en_US |
dc.identifier.issn | 2666-3899 | en_US |
dc.identifier.uri | https://doi.org/10.1016/j.patter.2021.100329 | en_US |
dc.identifier.uri | http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/7036 | |
dc.description.abstract | DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used the nested cross-validation strategy to address the issues of “overfitting” and selection bias. This simultaneously provides an unbiased estimate of the generalization performance of a machine learning algorithm and allows us to tune the hyperparameters optimally. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers crucial insight into model interpretability. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working on small datasets, which is often the case in biological and medical data. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Elsevier B.V. | en_US |
dc.subject | Machine learning | en_US |
dc.subject | LightGBM | en_US |
dc.subject | DNA sequence | en_US |
dc.subject | DNA conformation | en_US |
dc.subject | Nested cross-validation | en_US |
dc.subject | Genome | en_US |
dc.subject | 2021 | |
dc.title | Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake | en_US |
dc.type | Article | en_US |
dc.contributor.department | Dept. of Chemistry | en_US |
dc.identifier.sourcetitle | Patterns | en_US |
dc.publication.originofpublisher | Foreign | en_US |
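
The abstract above names nested cross-validation as the way the authors both tune hyperparameters and obtain an unbiased performance estimate. The following is a minimal sketch of that general idea only, not the authors' pipeline: it swaps in a toy k-nearest-neighbour classifier for LightGBM and uses synthetic two-cluster data in place of the DNA dataset. All function names (`k_folds`, `knn_predict`, `accuracy`, `nested_cv`) and the hyperparameter grid are hypothetical, chosen for illustration.

```python
import random
from statistics import mean


def k_folds(indices, n_folds):
    """Yield (train, test) index lists for n_folds roughly equal folds."""
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, folds[i]


def knn_predict(X, y, train, point, k):
    """Majority vote (binary labels 0/1) among the k nearest training points."""
    nearest = sorted(
        train, key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], point))
    )[:k]
    votes = sum(y[j] for j in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(X, y, train, test, k):
    """Fraction of test points classified correctly using the train points."""
    return mean(1.0 if knn_predict(X, y, train, X[i], k) == y[i] else 0.0 for i in test)


def nested_cv(X, y, k_grid, outer_folds=5, inner_folds=3, seed=0):
    """Return per-outer-fold accuracies and the hyperparameter chosen in each fold."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    outer_scores, chosen = [], []
    for outer_train, outer_test in k_folds(indices, outer_folds):
        # Inner loop: tune k using ONLY the outer training data, so the
        # outer test fold never influences hyperparameter selection.
        def inner_score(k):
            return mean(
                accuracy(X, y, tr, va, k)
                for tr, va in k_folds(outer_train, inner_folds)
            )

        best_k = max(k_grid, key=inner_score)
        chosen.append(best_k)
        # Outer loop: score the tuned model on data it has never seen.
        outer_scores.append(accuracy(X, y, outer_train, outer_test, best_k))
    return outer_scores, chosen


# Synthetic data (an assumption for illustration): two Gaussian clusters in 2-D.
rng = random.Random(42)
X = [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for c in (0.0, 3.0) for _ in range(30)]
y = [0] * 30 + [1] * 30

scores, chosen_k = nested_cv(X, y, k_grid=[1, 3, 5])
print("outer-fold accuracies:", scores)
print("k chosen per outer fold:", chosen_k)
print("estimated generalization accuracy:", mean(scores))
```

The mean of the outer-fold scores is the generalization estimate; because each outer test fold is held out of its own tuning step, the estimate is not inflated by hyperparameter selection, which matters most on small datasets.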
Appears in Collections: | JOURNAL ARTICLES |