Abstract:
The 20 naturally occurring amino acids differ in their preferences for particular environments within protein structures. Environments in a protein can be classified by their proximity to solvent using the residue depth measure. Because amino acid frequencies differ across depth levels, substitution frequencies should also vary with depth. To quantify these substitution frequencies, we built depth-dependent substitution matrices. The dataset used to create the matrices consisted of 3696 high-quality, non-redundant pairwise protein structural alignments. One application of these matrices is predicting the tolerance of mutations in different protein environments. Using these substitution scores, we predicted which of 3500 mutations in T4 lysozyme and CcdB are deleterious. The technique achieves a Matthews Correlation Coefficient (MCC) of 0.48 on the CcdB testing set, whereas the best of the other tested methods achieves an MCC of 0.40. Further development of these substitution matrices could help improve structure-sequence alignment for protein 3D structure modeling.
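
For readers unfamiliar with the accuracy measure quoted above, the following is a minimal sketch of how an MCC can be computed from binary predictions (deleterious versus tolerated). The function names and the toy counts are illustrative assumptions and are not taken from the study's data or code.

```python
import math


def confusion_counts(predicted, observed):
    """Count (tp, tn, fp, fn) from two equal-length lists of booleans.

    True means "deleterious"; False means "tolerated".
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Return the MCC for a binary confusion matrix.

    MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).
    By convention, 0.0 is returned when the denominator is zero.
    """
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / math.sqrt(denom)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_mcc = matthews_corrcoef(tp=40, tn=40, fp=10, fn=10)
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). It is often preferred over plain accuracy when the two classes are unbalanced, as is typically the case for deleterious and tolerated mutations.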