Abstract:
Several methods have been developed for predicting conformational B-cell epitopes in antigens, but none of them is specific to a particular host. Our primary analysis of antibody–antigen complexes indicated a need for host-specific B-cell epitope prediction. In this study, we present a novel approach to predict conformational B-cell epitopes specific to human hosts by focusing on human antibody interacting residues in antigens. We trained, tested, and evaluated our models on 277 human antibody–antigen complexes. Initially, we employed machine learning models based on the one-hot encoded sequence profile of antigens, achieving a maximum area under the receiver operating characteristic curve (AUROC) of 0.61. Replacing the one-hot encoding with evolutionary profiles improved the AUROC from 0.61 to 0.67. Models developed using embeddings from fine-tuned protein language models reached an AUROC of 0.61. Additionally, models utilizing predicted relative solvent accessibility achieved an AUROC of 0.67. Our ensemble model, which combined relative solvent accessibility with evolutionary profiles, achieved the best performance, with an AUROC of 0.72. All models in this study were trained using fivefold cross-validation on a training dataset and evaluated on an independent dataset not used for training or validation. Our method outperforms existing approaches on the independent dataset. Furthermore, we used the SHAP explainable AI (XAI) method to interpret the importance of individual feature elements contributing to the predictions made by our models. To support the scientific community, we have developed a standalone software package and web server, HAIRpred, for predicting human antibody interacting residues in proteins.
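As an illustration of the one-hot encoded sequence profile mentioned above, the following is a minimal sketch, not the HAIRpred implementation. It assumes that each residue is represented by the one-hot vectors of a fixed-size window centred on it; the window size, the padding symbol, and the function names are hypothetical choices made for this example.

```python
# Illustrative sketch only: window size, padding symbol, and function names
# are assumptions for this example, not details taken from the study.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def one_hot_residue(residue):
    """Return a 20-dimensional one-hot vector; padding or unknown residues map to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_one_hot(sequence, window=5):
    """Encode each residue by concatenating the one-hot vectors of a window centred on it."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    padded = PAD * half + sequence.upper() + PAD * half
    features = []
    for i in range(len(sequence)):
        vec = []
        for residue in padded[i:i + window]:
            vec.extend(one_hot_residue(residue))
        features.append(vec)
    return features


features = window_one_hot("MKTAYIAK", window=5)
print(len(features), len(features[0]))  # 8 100
```

Each row of `features` could then be passed to a per-residue classifier that predicts whether the central residue is antibody interacting.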