Abstract:
While the majority of proteins adhere to the traditional one-sequence/one-structure model, an increasing number of proteins have been found to adopt more than one structure and perform more than one function. These "fold-switching" or "metamorphic" proteins can undergo large structural changes under native conditions or in response to specific cellular stimuli. This ability to "switch folds" allows a protein to perform multiple functions, enabling cells to tightly regulate various biochemical processes. Despite the importance of these proteins in health and disease, few methods exist for identifying fold-switching proteins from sequence information. Here, we present a fragment-based approach to build a classifier that predicts metamorphic behaviour in protein sequences with an average accuracy of 84.7% and a Matthews correlation coefficient (MCC) of 0.698. To develop our classification algorithm, we use structural data from experimentally solved structures in the Protein Data Bank (PDB) and from the AlphaFold Protein Structure Database, a repository of computationally predicted structures. Our classifier works by analysing the diversity of structures that the fragments within a query protein adopt. The algorithm takes a query protein sequence as input, fragments the sequence using a sliding window, searches the database for each fragment, and gives a binary prediction using a support vector machine (SVM). We applied our algorithm to 57 proteomes comprising a total of 601,218 proteins (one sequence per gene). We found that about 10% of these proteins are predicted to have the ability to switch folds, significantly expanding the known reservoir of metamorphic proteins. Potential candidates with high confidence scores are shortlisted for further experimental evaluation.
Additionally, we have built a web server for our predictor, Morpheus, for cases where the user's protein of interest is absent from the published data (http://mbu.iisc.ac.in/~anand/morpheus). Our work is the first implementation of a proteome-level predictor of metamorphic proteins, and it helps in selecting candidate protein sequences whose metamorphic behaviour can then be evaluated through further experimentation.
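
The pipeline summarised in the abstract (sliding-window fragmentation, fragment lookup, structural diversity, binary call) can be illustrated with a minimal Python sketch. It is not the published implementation: the window length, the toy `fragment_db` dictionary, the use of Shannon entropy as the diversity measure, and the fixed `cutoff` that stands in for the trained SVM are all assumptions made for illustration.

```python
import math
from collections import Counter


def fragment(sequence, window=5, step=1):
    """Split a sequence into overlapping fragments with a sliding window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(sequence) - window
    return [sequence[i:i + window] for i in range(0, last_start + 1, step)]


def structural_diversity(observed_states):
    """Shannon entropy (in bits) of the structural states seen for a fragment.

    Entropy is an assumed stand-in for the diversity measure; the abstract
    does not say which measure is used.
    """
    counts = Counter(observed_states)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # fragment not found in the database
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def diversity_profile(sequence, fragment_db, window=5):
    """Look up every fragment and return its diversity score."""
    return [
        structural_diversity(fragment_db.get(frag, []))
        for frag in fragment(sequence, window)
    ]


def predict_fold_switching(sequence, fragment_db, window=5, cutoff=0.4):
    """Binary call from the mean diversity.

    A fixed cutoff replaces the SVM here purely to keep the sketch
    dependency-free; it is not the classifier described in the abstract.
    """
    profile = diversity_profile(sequence, fragment_db, window)
    if not profile:
        return False  # sequence shorter than the window
    return sum(profile) / len(profile) >= cutoff


# Hypothetical database: fragment -> secondary-structure states observed
# for that fragment in different structures (H = helix, E = strand).
toy_db = {
    "AKLVE": ["H", "H", "E", "E"],
    "KLVEA": ["H", "H", "H"],
}

print(fragment("AKLVEA"))
print(predict_fold_switching("AKLVEA", toy_db))
```

In this toy example the first fragment is seen as both helix and strand, so it contributes a non-zero diversity score, while the second is always helical and contributes zero. In a real setting the per-fragment scores would be turned into a feature vector and passed to a trained classifier rather than compared with a fixed cutoff.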