Digital Repository

Quantifying Secondary-Structure Binding Preferences of RNA-Binding Proteins via Interrogation of Deep Neural Networks

dc.contributor.advisor Marsico, Annalisa en_US
dc.contributor.author LONDHE, SHUBHANKAR en_US
dc.date.accessioned 2022-05-12T05:46:27Z
dc.date.available 2022-05-12T05:46:27Z
dc.date.issued 2022-05
dc.identifier.citation 66 en_US
dc.identifier.uri http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/6863
dc.description.abstract The pursuit of understanding the complex interplay between biological sequences (DNA, RNA and proteins) has driven biological research for many years. The interaction between RNA and RNA-binding proteins (RBPs) is integral to RNA function and cellular regulation. At least 1,500 of the more than 20,000 annotated human proteins are predicted to bind RNA. RBPs usually recognize RNA targets through a common local preference or sequence, referred to as a 'motif', which facilitates RNA-protein interactions. Recent high-throughput experimental approaches such as iCLIP, PAR-CLIP and eCLIP make it possible to profile the binding sites of a given RBP transcriptome-wide, thus providing insight into the RBP's binding preferences. While many studies have explored the sequence binding motifs of RBPs via deep learning methods, less is known about the RNA secondary structure preferences of RBPs. This project aims to explore the secondary structure binding preferences of a large set of RBPs using CLIP-seq datasets from the publicly available ENCODE database. We develop a deep learning model that incorporates sequence and structure information to outperform the sequence-only baseline. We then use model interpretation and feature attribution techniques to quantify the relative importance of sequence and secondary structure information for each RBP, thus identifying the primary binding modes of different RBPs. Through our explorations, we realised that in vivo RNA structure data has low coverage of the transcriptome, limiting the amount of information we can extract about structural binding motifs. Finally, we investigate how the choice of negative samples affects downstream model performance. We find that computationally determined RNA secondary structure provides no new information to the model and that experimentally derived RNA secondary structure improves model performance under certain negative sampling conditions. en_US
dc.language.iso en en_US
dc.subject Bioinformatics en_US
dc.subject RNA-protein binding en_US
dc.subject Computational RNA biology en_US
dc.subject Machine Learning en_US
dc.subject Explainable AI en_US
dc.title Quantifying Secondary-Structure Binding Preferences of RNA-Binding Proteins via Interrogation of Deep Neural Networks en_US
dc.type Thesis en_US
dc.type.degree BS-MS en_US
dc.contributor.department Dept. of Biology en_US
dc.contributor.registration 20171154 en_US


This item appears in the following Collection(s)

  • MS THESES [1705]
    Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
