Analysis of local neighbourhoods in proteins

S, MUKUNDAN

dc.contributor.advisor	MADHUSUDHAN, M. S.
dc.contributor.author	S, MUKUNDAN
dc.date.accessioned	2025-07-23T06:17:18Z
dc.date.available	2025-07-23T06:17:18Z
dc.date.issued	2025-07
dc.identifier.citation	163	en_US
dc.identifier.uri	http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10317
dc.description.abstract	Proteins, through their activity and interactions, are the predominant class of molecules enabling the functioning of the living cell. They are linear polymers of amino acids that fold based on covalent and non-covalent interactions (Anfinsen 1973). The amino acids in proteins form local structural and chemical neighbourhoods responsible for stabilising the proteins and for the myriad functions they execute. Analysing these local neighbourhoods should help us understand various aspects of proteins, such as their stability, functionality, and structural similarity. To this end, we have formulated an approach to arrange the three-dimensional data that describes these neighbourhoods as a tree database. This database clusters and superimposes the local neighbourhoods from a database of protein structures based only on their spatial similarity. Moreover, the data is indexed in the process, i.e., arranged for efficient search, even in large databases. We use this tree to create a topology-independent (sequence and sequence-order independent) structural database search program that is methodologically analogous to BLAST (Altschul et al. 1990). Our method mines structural data more effectively than state-of-the-art methods such as FoldSeek (van Kempen et al. 2024) or TM-align (Zhang and Skolnick 2005). We demonstrate this with an example query consisting of pockets of discontiguous small peptide fragments involved in binding phosphoinositide. We retrieve multiple high-quality structural matches using such queries, while FoldSeek and TM-align return zero matches. We can achieve this because our method only evaluates the similarity in the spatial arrangement of atoms. One application of the tree database described above is protein engineering, where we attempt to engineer new functions into existing proteins. 
We aim to achieve this by creating a program that transfers the local chemical and structural neighbourhoods responsible for the function by predicting mutations. Our program first checks whether the structural motifs required for the function exist in the query protein. It then tries to reproduce the chemical environment (interacting and catalytic residues) required for the function. Using the fluorescence of green fluorescent protein (Shimomura, Johnson, and Saiga 1962) as our function of interest, we searched the 200199 structures from the Protein Data Bank (Berman et al. 2002) and found 12573 hits. Each of these outputs consists of a mutational library, where a subset of the mutations would make the protein fluorescent. We filtered and selected three out of the 12573 hits for experiments based on size, number of mutations and MD simulations. The experiments on two of the three hits show no fluorescence; the experiments on the third protein are underway. During the process of protein engineering, we have also developed a dynamic programming approach to predict the optimal set of overhangs for efficient and accurate DNA assembly by the Golden Gate approach (Engler, Kandzia, and Marillonnet 2008; Pryor et al. 2020). The Golden Gate approach assembles DNA fragments in an order specified by four-base-pair overhangs. However, not all overhangs are equally efficient or accurate (fidelity) (Potapov et al. 2018). Thus, predicting high-efficiency and high-fidelity overhangs (fragmentation sites) in a DNA sequence is crucial for creating large mutational libraries. Our method simultaneously optimises the number of fragments and the fragmentation sites for both efficiency and fidelity of assembly. We then benchmarked our method against NEB SplitSet (Pryor et al. 2022), a popular overhang prediction method that optimises only fidelity. Our method outperforms SplitSet when only optimising fidelity. 
We match SplitSet on fidelity and outperform it on efficiency when optimising both fidelity and efficiency. In another study, we analysed the physical chemistry of biomolecular association reactions in situations where reactants in the system share the local environments involved in the interactions. We specifically explored cases where one such reactant has comparatively weaker binding strength. The strength of molecular interactions is characterised by their dissociation constants (KD). Only high-affinity interactions (KD ≤ 1e−8 M), which support binary on/off switches, are extensively investigated. However, such analyses have discounted the presence of low-affinity binders (KD > 1e−5 M) in the cellular environment that might share similar local environments with a high-affinity interactor. We assess the potential influence of such low-affinity binders on high-affinity interactions. By employing Gillespie stochastic simulations and continuous methods, we demonstrate that the presence of low-affinity binders can alter the kinetics and the steady state of high-affinity interactions. We refer to this effect as ‘herd regulation’ and have evaluated its possible impact in two different contexts: sex determination in Drosophila melanogaster and the modelling of signalling systems that employ molecular thresholds. We have also suggested experiments to validate herd regulation in vitro. We speculate that low-affinity binders are prevalent in biological contexts where the outcomes depend on molecular thresholds impacting homoeostatic regulation.	en_US
dc.language.iso	en	en_US
dc.subject	Biophysics	en_US
dc.subject	Computational structural biology	en_US
dc.subject	systems biology	en_US
dc.subject	protein engineering	en_US
dc.subject	Golden Gate assembly	en_US
dc.title	Analysis of local neighbourhoods in proteins	en_US
dc.type	Thesis	en_US
dc.description.embargo	1 Year	en_US
dc.type.degree	Ph.D	en_US
dc.contributor.department	Dept. of Biology	en_US
dc.contributor.registration	20193650	en_US