Abstract:
Antibody-antigen interactions are fundamental to immunological research and influence the development of vaccines, diagnostics, and therapeutics. However, existing computational models for predicting antibody-interacting residues (epitopes) have significant limitations, most notably a lack of species specificity. Because these models are trained on data pooled from diverse species, they fail to account for interspecies variation in antibody-antigen interactions. To address this limitation, this study presents a novel framework for species-specific B-cell epitope prediction. We first establish that interspecies differences in these interactions exist by comparing the amino acid composition of human-specific and mouse-specific epitopes, which highlights the need for species-specific models. Next, we develop HAIRpred (Human Antibody Interacting Residue predictor), a second-generation, human-specific ensemble model based on random forests. HAIRpred integrates information from position-specific scoring matrices (PSSM) and relative solvent accessibility (RSA), which were identified as key features determining antibody-antigen interaction. On the independent test dataset, the model achieves an area under the receiver operating characteristic curve (AUROC) of 0.72, surpassing the performance of current state-of-the-art models. Additionally, we use SHAP, an explainable AI (XAI) approach, to analyse how each residue position in the pattern contributes to the final prediction. Recognizing the need for mouse-specific prediction tools, we also extend our methodology to develop a mouse-specific predictor. However, data scarcity poses a significant challenge: only 290 antibody-antigen complexes are available, forming 94 clusters at a 70% sequence identity threshold. To address this, we implement a cluster-based data-splitting strategy that prevents overestimation of model performance.
Our findings highlight the need for species-specific models and set a new standard for epitope prediction tools. Because these models address the limitations of previous studies, we anticipate that they will be a valuable resource for immunological research.
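The following sketch illustrates the idea behind cluster-based data splitting. Whole sequence-identity clusters are assigned to cross-validation folds, so no cluster contributes complexes to both the training and the test side of a split. It is an illustration only, not the procedure used in this study. The function name `cluster_kfold`, the greedy size-balancing rule, and the toy complex and cluster IDs are all hypothetical. The clustering itself (for example, at the 70% sequence identity threshold mentioned above) is assumed to have been done beforehand.

```python
from collections import defaultdict


def cluster_kfold(cluster_of, k=5):
    """Hypothetical helper: split complexes into k folds without sharing clusters.

    cluster_of maps a complex ID to its (precomputed, assumed) cluster ID.
    Every member of a cluster lands in the same fold, so similar sequences
    cannot appear in both training and test data of any split.
    """
    members = defaultdict(list)
    for complex_id, cluster_id in cluster_of.items():
        members[cluster_id].append(complex_id)

    folds = [[] for _ in range(k)]
    # Greedy balancing (an assumption): place the largest clusters first,
    # each into whichever fold currently holds the fewest complexes.
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[cid])
    return [sorted(fold) for fold in folds]


if __name__ == "__main__":
    # Made-up toy data: 8 complexes grouped into 4 clusters.
    toy = {"c1": "A", "c2": "A", "c3": "A", "c4": "B",
           "c5": "B", "c6": "C", "c7": "D", "c8": "D"}
    folds = cluster_kfold(toy, k=2)
    for i, test in enumerate(folds):
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        # Clusters present on both sides would indicate data leakage.
        leaked = {toy[c] for c in train} & {toy[c] for c in test}
        print(f"fold {i}: test={test} leaked clusters={leaked or 'none'}")
```

Splitting at the cluster level, rather than at the level of individual complexes, is what keeps near-duplicate antigens out of the test folds. With only 94 clusters, fold sizes can be uneven, which is why the sketch balances folds by complex count.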