Proteasomal cleavage site prediction of protein antigen using BP neural network based on a new set of amino acid descriptor

Wang, Yuanqiang; Lin, Yong; Shu, Mao; Wang, Rui; Hu, Yong; Lin, Zhihua

doi:10.1007/s00894-013-1827-7

Proteasomal cleavage site prediction of protein antigen using BP neural network based on a new set of amino acid descriptor

Original Paper
Published: 13 April 2013

Volume 19, pages 3045–3052, (2013)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Molecular Modeling Aims and scope Submit manuscript

Proteasomal cleavage site prediction of protein antigen using BP neural network based on a new set of amino acid descriptor

Download PDF

Yuanqiang Wang^1,2,
Yong Lin¹,
Mao Shu¹,
Rui Wang¹,
Yong Hu¹ &
…
Zhihua Lin^1,3

1677 Accesses
4 Citations
Explore all metrics

Abstract

The accurate identification of cytotoxic T lymphocyte epitopes is becoming increasingly important in peptide vaccine design. The ubiquitin–proteasome system plays a key role in processing and presenting major histocompatibility complex class I restricted epitopes by degrading the antigenic protein. To enhance the specificity and efficiency of epitope prediction and identification, the recognition mode between the ubiquitin–proteasome complex and the protein antigen must be considered. Hence, a model that accurately predicts proteasomal cleavage must be established. This study proposes a new set of parameters to characterize the cleavage window and uses a backpropagation neural network algorithm to build a model that accurately predicts proteasomal cleavage. The accuracy of the prediction model, which depends on the window sizes of the cleavage, reaches 95.454 % for the N-terminus and 95.011 % for the C-terminus. The results show that the identification of proteasomal cleavage sites depends on the sequence next to it and that the prediction performance of the C-terminus is better than that of the N-terminus on average. Thus, models based on the properties of amino acids can be highly reliable and reflect the structural features of interactions between proteasomes and peptide sequences.

Identification of CD8⁺ T cell epitopes through proteasome cleavage site predictions

Article Open access 14 December 2020

NetCleave: an open-source algorithm for predicting C-terminal antigen processing for MHC-I and MHC-II

Article Open access 23 June 2021

DRREP: deep ridge regressed epitope predictor

Article Open access 03 October 2017

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

With the development of tumor immunology, several studies have demonstrated that cytotoxic T lymphocytes (CTLs) play an important role in antitumor immunity. Given the tumor-specific killing ability of CTLs, they are used to treat malignant tumors [1]. As an integrity antigen, the peptide vaccine for CTL epitope poses no risk for infection and can be massively synthesized and oriented for antitumor and antivirus activities. Thus, the accurate identification of CTL epitopes has become increasingly important, and many studies are focused on developing advanced methods for CTL epitope screening [2–4].

The prediction of proteasomal cleavage sites and peptides transported by transporter associated protein (TAP) has not yet been considered. Thus, the current methods for CTL epitope prediction are not perfect, and most of them are constructed based only on the affinity between peptides of CTL epitopes and major histocompatibility complex (MHC) class I (MHC-I) molecules. Accordingly, an effective CTL epitope must undergo the following translocation mechanism in living cells. First, the CTL epitope is produced by proteasomes from the protein antigen and transported by TAP to the endoplasmic reticulum to assemble with the MHC-I complexes. Second, the product is excreted by the Golgi apparatus to the surface of the cell membrane and made available for interaction with T cell receptors. In the CTL response, these intermolecular recognition processes are connected like a chain, and any errors in them invalidate the whole response. Several studies have also found that some peptides with high binding affinities for MHC-I molecules do not elicit a strong CTL response [5]. Therefore, predicting the binding affinity between peptides and MHC-I molecules is insufficient; a prediction method that combines the prediction of proteasomal cleavage sites, TAP transport peptides, and binding affinity of epitope peptides must be established to improve the accuracy of epitope prediction and identification.

Proteasomes are intracellular multi-subunit proteases that catalyze selective proteolytic protein processing within various cell signal-transduction pathways, such as cell cycle control, transcriptional regulation, and antigen presentation [6–8]. Proteasomes exist in several forms. For example, the proteolytically active core complexes or 20S proteasomes, association with ATP-dependent 19S cap complexes yields the larger 26S proteasomes, which can recognize proteins and are marked by ubiquitin for proteasomal degradation [9, 10]. Eukaryotic eukaryotic 20S proteasomes consist of four stacked rings (overall stoichiometry α₇β₇β₇α₇) each consisting of seven different subunits [11]. Each of the two inner β-rings carries three catalytically active sites on its inner surface, and proteolytic specificities of active site have been described as chymotrypsin like [cleaving after large, hydrophobic amino acids (AAs)], trypsin like (cleaving after basic AAs), and peptidyl-glutamyl-peptide hydrolyzing (cleaving after acidic AAs) [12]. Proteasomes cleave ubiquitin-protein conjugates into smaller peptides and generate the correct C-terminus of MHC-I-binding peptides [13, 14].

Several computational approaches for elucidating the cleavage specificity of proteasomes have been presented. These approaches are all based on experimentally verified cleavage sites within protein substrates and the analysis of the flanking region of such sites. EpiJen is an integrated approach for predicting proteasomal cleavage sites, transporting TAP, and binding MHC-I based on quantitative matrices [15]. Pierre et al. [16] used a new matrix-based method to establish a proteasomal cleavage site prediction method based on experimentally verified proteasomal cleavage sites. Holzhutter et al. [17] used a statistical method based on the MAPPP prediction server to analyze cleavage sites found in a set of peptide substrates with lengths ranging from 22 AAs to 30 AAs. PAProC [18, 19] is a method based on the proteasomal degradation of enolase protein. Up to 10 flanking AAs around the verified cleavage sites are used by an evolutionary algorithm to create the network-based model used for prediction. Another method for proteasomal cleavage prediction is NetChop, which uses a neural network for prediction [20]. Two different sets of training data are used by NetChop: verified cleavage sites within proteins and naturally processed MHC ligands. Altuvia and Margalit [21] used MHC-I ligands to study proteasomal cleavage.

This study proposes a new set of parameters to characterize the structure of AAs and uses a backpropagation (BP) neural network (BP-NN) to build a model for accurately predicting proteasomal cleavage. The accuracy of the prediction model reaches 95.454 % for the N-terminus and 95.011 % for the C-terminus, which depends on the cleavage window sizes. The results show that the proteasomal cleavage site identification depends on the sequence next to it and that the predictive performance of the C-terminus is better than that of the N-terminus on average.

Materials and methods

Experimental data

A total of 175 epitopes associated with HLA-A*0201 molecules with 8 AAs to 11 AAs were collected from the AntiJen database and used as the positive samples of proteasomal cleavage. The negative samples were constructed according to literature [20]. Briefly, pseudo-epitopes with a N- or C-termini at the middle of the epitopes are randomly produced, which assume the presence of non-cleavage sites within the epitopes. The cleavage sites that produce the epitopes and pseudo-epitopes are defined as positive (+1) and negative (−1) samples, respectively. These samples are included in the appendix.

Structural descriptors and characterization

Three properties of AAs, including residue volume [22], electron-ion interaction potential values [23], and hydrophobicity [24], are selected to characterize the peptide. The values of the parameters are listed in Table 1, and their correlation matrix is shown in Table 2. Three descriptors are needed for each AA to characterize the peptide structures, and 3 × N parameters are needed to characterize the peptide with N AAs. If a cleavage window contains N AAs, the window can be characterized by a set of 3 × N descriptors. For example, a window constituting 4 AAs should be represented by 12 descriptors.

Table 1 Three descriptor scales for natural amino acids

Full size table

Table 2 Correlation matrix of parameters

Full size table

Neural network algorithm

Artificial neural networks have been developed as generalizations of mathematical models of biological nervous systems [25, 26]. The present study uses a BP-NN model with three layers. The nodes of the input layer are equal to the descriptors of AAs, and the node of the output layer is the ideal value, i.e., a negative or positive sample represented by −1 or +1, respectively. The nodes of the hidden layer are calculated by the following formula:

$$ {n_l}=\sqrt{{{n_i}+{n_o}}}+10, $$

where n _l, n _i, and n _o are the nodes of the hidden, input, and output layers, respectively. The weights connected to the input, hidden, and output layers are randomly generated and adjusted using standard BP (a gradient descent type) algorithm with the circular samples presented. Finally, the process is terminated when the minimal error or the maximal iteration is reached. For example, a cleavage window sequence containing 4 AAs should be represented by 12 descriptors, which should all be placed in the input layer. Thus, the BP-NN for the 4 AA cleavage window must have 12 neurons in the input layer.

Optimization of cleavage windows

The cleavage window is formed with several AAs at the cleavage site and defined as “P_m…P₂P₁| P₁′P₂′…P_n′”, where “|” is the cleavage site and m and n are 1 to 6, respectively. For example, the cleavage window has 5 AAs when m = 2 and n = 3, which means that the previous 2 and the next 3 AAs belong to the cleavage window. For each cleavage window, the network is trained for 100 times with all samples, and the statistical results are the average values. The shortest cleavage window, also called the optimized window, is formed after training for every window.

Evaluation of network performance

The network performance is evaluated using the correlation coefficient (CC) [27] as follows:

$$ CC=\frac{{TP\times TN-FN\times FP}}{{\sqrt{{\left( {TN+FN} \right)\left( {FN+TP} \right)\left( {TP+FP} \right)\left( {FP+TN} \right)}}}}, $$

where TP is the number of true positives (experimentally verified cleavage sites also the predicted cleavage sites), TN is the number of true negatives (experimentally verified non-cleavage sites also the predicted non-cleavage sites), FP is the number of false positives (experimentally verified non-cleavage sites predicted as cleavage sites), and FN is the number of false negatives (experimentally verified cleavage sites predicted as non-cleavage sites). The additional performance measurements used are as follows:

$$ \begin{array}{*{20}c} {SE=\frac{TP }{TP+FN}\times 100} & {SP=\frac{TN }{TN+FP}\times 100} & {AC=\frac{TP+TN }{TP+FP+TN+FN}\times 100} \\ \end{array}. $$

Results and discussion

The proteasomal cleavage window is important for the recognition of cleavage sites. Several studies have selected different windows [18, 20], and the maximum length of window is composed of 12 AAs in the present study. To determine the optimized window size, BP-NN models are built with different cleavage windows. The procedure is similar to window optimization and the final results are the average values of 50 times of training and testing.

To identify the motif recognized by proteasomes, 48 windows are constructed according to the combination of AAs before and/or after at N- and C-termini cleavage sites. The sequence for every window is characterized by the parameters and modeled by BP-NN. The BP-NN predictive model is then evaluated and validated with the following four statistical parameters: sensitivity (SN), specificity (SP), accuracy (AC), and CC. The AC and CC are comprehensive parameters, including SN and SP information.

A total of 96 sets of evaluation results are obtained through calculation of the BP-NN models. These evaluation results are divided into two groups, namely the N- and C-termini. As shown in Table 3, the windows for the N-terminus containing 0 or 1 AA at both sides of the cleavage site have unsatisfactory reliability, whereas the other windows have high predictive ability (AC of about 90 % and CC above 0.80). As shown in Table 4, the windows containing no AA before the cleavage site have low predictive ability, whereas the windows containing one or more AAs before the cleavage site have high reliability (AC of about 95 % and CC of about 0.90). In general, the predictive ability of the models based on the C-terminus is superior to that of the models based on the N-terminus.

Table 3 Statistical results of N-terminus

Full size table

Table 4 Statistical results of C-terminus

Full size table

To compare the performance of the predictive models clearly, the AC and CC statistical parameters are shown with histograms for the N-terminus (Fig. 1) and C-terminus (Fig. 2). As shown in Fig. 1, the performance of the model based on the N-terminus is not high, but the AAs after the cleavage site are less than 3, which indicate that the proteasome can at least recognize the cleavage site through the 3 AAs. Figure 1 shows the optimized window consisting of 3 AAs expressed as “|P₁′P₂′P₃′”. The windows show the 3 AAs motif at the N-terminus of epitope. Similarly, Fig. 2 shows the optimized window size consisting of 2 AAs expressed as “P₁|P₁′”. The BP-NN models show high performance based on one or more AAs before the cleavage site. The windows expressed as “P₁|” can be regarded as the most suitable motif for proteasome recognition. The optimized windows show that the epitope can be derived from antigen by proteasomes, which are recognized through AAs at the N-terminus and the C-terminus with sizes of 3 and 1, respectively. The motif located in the terminus is similar to the anchor residues of epitope and recognized by proteasomes instead of MHC.

Knowledge of the cleavage patterns is essential for understanding how the proteasome contributes to the generation of peptides presented by MHC class I molecules on the surface of vertebrate cells [27]. Compared with eukaryotic proteasomes, immunosubunits replace the constitutive proteolytically active subunits and significantly improve proteasome-dependent antigen processing. Groll et al. [28] reported that the conformation of MHC-I bound peptides is similar to that of the propeptides of proteasome active subunits, which suggests the co-evolution of MHC molecules and proteasomal ligand-binding sites. Conserved AAs should be present near the cleavage for recognition of proteasomes, which are analogous to the anchor residues of epitopes bound to MHC. In this study, the conserved analysis is performed by weblogo [29], and the sequence logo for the N- and C-termini is shown in Fig. 3. The motif for the N-terminus contains the conservative site at position 2 and the corresponding residues are Leu, Ile, and Met. Similarly, the motif for the C-terminus contains one residue and the conservative AAs are Val, Leu, and Ile. More than 75 % of peptides are nonamers and the conservative sites are consistent with nonamer epitopes, which are located at positions 2 and 9.

In eukaryotic cells, the ubiquitin-proteasome system plays a critical role in the cleavage of antigen protein in epitope processing and presentation in the MHC-I pathway. Ubiquitination occurs by the covalent attachment of multiple ubiquitin molecules onto the substrate, which is then transported and degraded in the 26S proteasome. Consequently, the weak interactions between the AAs closing the cleavage site and the proteasome depend on the AA properties, such as stericity, electrostatic properties, and hydrophobicity. The proteasome ligand-binding site is similar to MHC; thus, the interaction between the optimized motif and proteasome depends on that between MHC and epitope. The properties, residue volume, electron-ion interaction potential values, and hydrophobicity can reflect the interaction between motif and proteasome, and the BP-NN models show high reliability and predictive ability.

Numerous predictive models have been constructed with the machine learning algorithm [20, 30, 31], including the support vector machine and ANN. Table 5 lists the statistical parameters of the predictive model built for the C-terminus from the present and previous studies. The present predictive model is superior to those in literature. Although the datasets, characterization of sequence, and modeling method differ among the listed models in Table 5, the BP-NN model has high predictive ability because of the properties representing the profile of the interaction between proteasome and motif. Therefore, the modeling based on the properties of AAs can provide a good scheme predicting proteasomal cleavage sites.

Table 5 Comparison with literature (C-terminal)

Full size table

Conclusions

Sequences at both sides of a cleavage site are selected as windows for proteasome recognition. The optimized windows for the N- and C-termini are determined by BP-NN, wherein the optimization standard is the window with higher reliability and the shortest sequence. The prediction model from the BP-NN has high reliability that is also superior to those of previous models. The predicted CCs of BP-NN are 0.862 for N-terminus and 0.922 for C-terminus for the test dataset. The properties used in this work can build a good predictive model but cannot reflect the comprehensive structural features of cleavage sites. Therefore, more suitable parameters for characterizing cleavage sites are favorable for proteasome recognition at corresponding positions. The predicted BP-NN model can well calculate the cleavage site of the determinate and predict the novel cleavage site of the protein sequence.

References

Frahm N, Korber BT, Adams CM et al. (2004) Consistent cytotoxic-T-lymphocyte targeting of immuno-dominant regions in human immunodeficiency virus across multiple ethnicities. Virol 78(5):2187–2200
Article CAS Google Scholar
Takedatsu H, Shichijo S, Katagiri K et al. (2004) Identification of Peptide Vaccine Candidates Sharing among HLA-A3(+), −A11(+), −A31(+), and −A33(+). J Cancer Patients Clin Cancer Res 10(3):1112–1120
Article CAS Google Scholar
Wu YZ, Shi TD (1998) EBVD: a new approach to molecular design of vaccine. Immunologist 8(s):1487–1489
Google Scholar
Shi TD, Wu Y (1998) Molecular design and immunological investigations on the therapeutic peptide vaccines against viral hepatitis B. Immunologist 8(s):330–334
Google Scholar
Rosenberg SA, Yang JC, Schwartzentruber DJ et al. (1998) Immunologic and therapeutic evaluation of a synthetic peptide vaccine for the treatment of patients with metastatic melanoma. Nat Med 4(3):321–327
Article CAS Google Scholar
Ciechanover A (1994) The ubiquitin-proteasome proteolytic pathway. Cell 79(1):13–21
Article CAS Google Scholar
Kisselev AF, Tatos N et al. (1999) The sizes of peptides generated from protein by mammalian 26 and 20 S proteasomes. J Biol Chem 274(6):3363–3371
Article CAS Google Scholar
Coux O, Tanaka K, Goldberg AL (1996) Structure and functions of the 20S and 26S proteasomes. Annu Rev Biochem 65:801–847
Article CAS Google Scholar
Jentsch S, Schlenker S (1995) Selective protein degradation: a journey’s end within the proteasome. Cell 82(6):881–884
Article CAS Google Scholar
Hershko A, Ciechanover A (1998) The ubiquitin system. Annu Rev Biochem 67:425–479
Article CAS Google Scholar
Groll M, Ditzel L, Lowe J et al. (1997) Structure of 20S proteasome from yeast at 2.4A resolution. Nature 386:463–471
Article CAS Google Scholar
Uebel S, Tampe R (1999) Specificity of the proteasome and the TAP transporter. Current Opinion in Immunology. J Curr Opin Immunol 11(2):203–208
Article CAS Google Scholar
Craiu A, Akopian T, Goldberg A, Rock KL (1997) Two distinct proteolytic processes in the generation of a major histocompatibility complex class I-presented peptide. Proc Natl Acad Sci 94:10850–10855
Article CAS Google Scholar
Niedermann G, King G, Butz S et al. (1996) The proteolytic fragments generated by vertebrate proteasomes: structural relationships to major histocompatibility complex class I binding peptides. Proc Natl Acad Sci 93:8572–8577
Article CAS Google Scholar
Doytchinova IA, Guan PP, Flower DR (2006) EpiJen: a server for multistep T cell epitope prediction. BMC Bioinforma 7:131–141
Article Google Scholar
Pierre D, Oliver K (2005) Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Sci 14:2132–2140
Article Google Scholar
Holzhutter HG, Frommel C, Kloetzel PM (1999) A theoretical approach towards the identification of cleavage-determining amino acid motifs of the 20S proteasome. J Mol Biol 286:1251–1265
Article CAS Google Scholar
Kuttler C, Nussbaum AK, Dick TP et al. (2000) An algorithm for the prediction of proteasomal cleavages. J Mol Biol 298:417–429
Article CAS Google Scholar
Nussbaum AK, Kuttler C, Hadeler KP et al. (2001) PAProC: a prediction algorithm for proteasomal cleavages available on the WWW. Immunogenetics 53:87–94
Article CAS Google Scholar
Kesmir C, Nussbaum AK, Schild H et al. (2002) Prediction of proteasome cleavage motifs by neural networks. Protein Eng 15:287–296
Article CAS Google Scholar
Altuvia Y, Margalit H (2000) Sequence signals for generation of antigenic peptides by the proteasome: implications for proteasomal cleavage mechanism. J Mol Biol 295:879–890
Article CAS Google Scholar
Goldsack DE, Chalifoux RC (1973) Contribution of the free energy of mixing of hydrophobic side chains to the stability of the tertiary structure. J Theor Biol 39:645–651
Article CAS Google Scholar
Cosic I (1994) Macromolecular bioactivity: is it resonant interaction between macromolecules?–Theory and applications. J IEEE Trans Biomed Eng 41:1101–1114
Article CAS Google Scholar
Prabhakaran M (1990) The distribution of physical, chemical and conformational properties in signal and nascent peptides. J Biochem 269:691–696
CAS Google Scholar
Shihab K (2006) A backpropagation neural network for computer network security. J Comput Sci 2(9):710–715
Article Google Scholar
Abraham A (2005) Artificial neural networks. In: Sydenham PH, Thorn R (eds) Handbook of measuring system design. Wiley, New York, pp 901-908
Marques AJ, Palanimurugan R, Matias AC et al. (2009) Catalytic mechanism and assembly of the proteasome. Chem Rev 109(4):1509–1536
Article CAS Google Scholar
Groll M, Larionov OV, Huber R et al. (2006) Inhibitor-binding mode of homobelactosin C to proteasomes: new insights into class I MHC ligand generation. Proc Natl Acad Sci USA 103(12):4576–4579
Article CAS Google Scholar
Crooks GE, Hon G, Chandonia JM et al. (2004) WebLogo: a sequence logo generator. Genome Res 14(6):1188–1190
Article CAS Google Scholar
Saxova P, Buus S, Brunak S, Kesmir C (2003) Predicting proteasomal cleavage sites: a comparison of available methods. Int Immunol 15(7):781–787
Article CAS Google Scholar
Bhasin M, Raghava GP (2005) Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences. Nucleic Acids Res 33(Web Server issue):W202–W207
Article CAS Google Scholar

Download references

Acknowledgments

This study was supported by grants from the National Basic Research Program of China (No. 2012CB11460), National Natural Science Foundation of China (No. 81171508, 31170747) and Natural Science Foundation Project of CQ CSTC (No. cstc2013jjb10004).

Author information

Authors and Affiliations

College of Pharmacy and Bioengineering, Chongqing University of Technology, Chongqing, 400050, People’s Republic of China
Yuanqiang Wang, Yong Lin, Mao Shu, Rui Wang, Yong Hu & Zhihua Lin
State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400715, People’s Republic of China
Yuanqiang Wang
School of Chemistry and Chemical Engineering, Chongqing University, Chongqing, 400044, People’s Republic of China
Zhihua Lin

Authors

Yuanqiang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yong Lin
View author publications
You can also search for this author in PubMed Google Scholar
Mao Shu
View author publications
You can also search for this author in PubMed Google Scholar
Rui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yong Hu
View author publications
You can also search for this author in PubMed Google Scholar
Zhihua Lin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhihua Lin.

Electronic supplementary material

The supplementary lists the epitope, peso-epitope and their protein.

ESM 1

(XLS 207 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, Y., Lin, Y., Shu, M. et al. Proteasomal cleavage site prediction of protein antigen using BP neural network based on a new set of amino acid descriptor. J Mol Model 19, 3045–3052 (2013). https://doi.org/10.1007/s00894-013-1827-7

Download citation

Received: 20 November 2012
Accepted: 18 March 2013
Published: 13 April 2013
Issue Date: August 2013
DOI: https://doi.org/10.1007/s00894-013-1827-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Proteasomal cleavage site prediction of protein antigen using BP neural network based on a new set of amino acid descriptor

Abstract

Similar content being viewed by others

Identification of CD8⁺ T cell epitopes through proteasome cleavage site predictions

NetCleave: an open-source algorithm for predicting C-terminal antigen processing for MHC-I and MHC-II

DRREP: deep ridge regressed epitope predictor

Introduction