Developing a Biophysically Grounded Deep Learning Model for Gene Expression Prediction

SHIVHARE, PRARABDH

Please use this identifier to cite or link to this item: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10101

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Martinez-Corral, Rosa	-
dc.contributor.author	SHIVHARE, PRARABDH	-
dc.date.accessioned	2025-05-27T10:38:28Z	-
dc.date.available	2025-05-27T10:38:28Z	-
dc.date.issued	2025-05	-
dc.identifier.citation	58	en_US
dc.identifier.uri	http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10101	-
dc.description.abstract	In the past two decades of literature on modeling complex systems, there has been a balance, or rather a tension, between the predictive power and the interpretability of machine learning models trained on vast amounts of data. Biological complex systems are no different. The past decade has seen an astonishing increase in the amount of publicly available functional genomics data. While the adoption of deep learning techniques to determine the sequence patterns, syntax and grammar in DNA sequence elements that govern gene regulatory activity has been a natural consequence, most of these investigations have adopted a ‘black box’ approach, with model predictions that are hard to interpret mechanistically. Multiple attribution strategies, which seek to extract meaningful post-hoc interpretations from neural networks, have been proposed to address this problem. However, there remains a substantial gap in the literature between the outputs of such post-hoc methods and fully mechanistic models, specifically in the context of gene regulation. This problem can be at least partially overcome by including some level of mechanistic detail in the internal structure of deep learning algorithms. This can enable us to better understand the model's predictions and obtain mechanistic insight. Here, we use a cell-state-specific Massively Parallel Reporter Assay dataset from hematopoietic stem cells to model gene regulation. We use deep learning to predict transcription factor (TF) binding on DNA sequence, employing cell-state-specific ChIP-seq data, and graph-based representations of Markov processes to model the effects of bound TFs on different rate-limiting steps in the transcriptional cycle. Our model assumptions are grounded in recent biophysical findings in the literature.	en_US
dc.description.sponsorship	Dr. Rosa Martinez-Corral, Dr. Lars Velten	en_US
dc.language.iso	en	en_US
dc.subject	Deep Learning	en_US
dc.subject	Biophysics	en_US
dc.title	Developing a Biophysically Grounded Deep Learning Model for Gene Expression Prediction	en_US
dc.type	Thesis	en_US
dc.description.embargo	One Year	en_US
dc.type.degree	BS-MS	en_US
dc.contributor.department	Dept. of Data Science	en_US
dc.contributor.registration	20201147	en_US
Appears in Collections:	MS THESES

Files in This Item:

File	Description	Size	Format
20201147_Prarabdh_Shivhare_MS_Thesis.pdf	MS Thesis	12.73 MB	Adobe PDF
