1 Introduction

T cells play a key role in fighting infectious agents such as pathogenic viruses, bacteria, and parasites, as well as in cancer immune surveillance, where they eliminate tumor cells. T cells respond to foreign peptide antigens (T-cell epitopes) bound to major histocompatibility complex (MHC) molecules expressed on the cell surface [1–4]. There are two main classes of MHC molecules, MHC class I (MHCI) and MHC class II (MHCII), which are recognized by CD8+ and CD4+ T cells, respectively [4]. In humans, MHC molecules are known as human leukocyte antigens (HLAs) and are extremely polymorphic [5]. HLA polymorphisms are the basis for the distinct peptide-binding specificities of HLA allelic variants [5].

The relevance of T-cell epitopes for understanding disease pathology [6] and for epitope-based vaccines [7–10] has led to the identification of thousands of epitopes and MHC-peptide ligands from all kinds of antigens. The availability of this vast amount of data has had two major intertwined consequences. On the one hand, it has given rise to comprehensive databases and resources to store the ever-increasing data. On the other hand, it has fueled the development of computational approaches for the prediction of T-cell epitopes.

Relevant examples of T-cell epitope and MHC-peptide ligand databases include SYFPEITHI [11], JenPep [12], MHCBN [13], TEPIDAS [14], the Immune Epitope Database [15], and EPIMHC [16]. These resources are instrumental for the analysis of peptide–MHC binding and T-cell epitope immunogenicity, primarily serving as sources of data but also providing analytic and predictive tools. All these resources are built on relational databases and share many capabilities, yet each also has unique features. Here we will work with EPIMHC [16], a highly curated database of T-cell epitopes and MHC-binding peptides that, unlike any related resource, enables tailored predictions of T-cell epitopes using custom-made peptide–MHC binding motif-profiles [17–19].

T-cell epitopes are determined by several molecular events [20–22], of which peptide–MHC binding is the most selective. Therefore, prediction of peptide–MHC binding is the main basis for anticipating T-cell epitopes [23]. Peptide–MHC binding can be predicted through a great variety of methods [23], including peptide–MHC binding motif-profiles [17–19], which rank among the most successful and popular [24]. These motif-profiles consist of weighted position-specific scoring matrices (PSSMs) [25] created from sets of aligned peptide sequences known to bind to the relevant MHC molecules.

Prediction of T-cell epitopes using a large set of MHC-specific motif-profiles is freely available for public use at our RANKPEP site ( http://imed.med.ucm.es/Tools/rankpep.html ). We generated the peptide–MHC binding profiles available in RANKPEP from the largest non-redundant set of peptides that we could identify. In computational cross-validations, RANKPEP profiles performed very well [17]. However, they are not necessarily the best choice for every prediction task. In fact, there is no general consensus on which peptides should be included when building peptide–MHC binding models. Therefore, in this chapter, we illustrate how to use EPIMHC to derive custom-made peptide–MHC binding motif-profiles and produce tailored T-cell epitope predictions.

2 Materials

2.1 EPIMHC Database and Query Form

EPIMHC is a database with comprehensive information on MHC-restricted ligands and T-cell epitopes that are observed in real proteins from a great variety of sources, including tumor antigens. Peptide data were collected from related databases [11, 26, 27] and the literature and were incorporated into the database upon computational curation (altered peptide ligands are not included). EPIMHC data are structured as a relational database with a set of related tables (Fig. 1). Entries in EPIMHC are unique with regard to the combination of two features: the sequence of the peptide and the MHC restriction element. The main annotations in EPIMHC include information on the ligand (sequence, length, MHC binding, T-cell activity, processing, protein source, protein name, and organism), the MHC restriction element (class, MHC molecule, and MHC source), and the publication reference. The “processing” field indicates whether the peptides are processed and presented from their protein sources in vivo (annotated as natural). The “MHC binding” and “immunogenicity level” fields follow a qualitative annotation with four values (high, moderate, little, unknown). The immunogenicity level only applies to peptides with reported “T-cell activity.” Immunogenicity and MHC-binding levels were obtained from the literature and translated into the indicated qualitative values as previously reported [26]. If no information on peptide–MHC binding and/or immunogenicity level was found, the corresponding fields were annotated as unknown.
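The uniqueness rule described above (one entry per combination of peptide sequence and MHC restriction element) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the actual EPIMHC schema is the one shown in Fig. 1, and the single table, its column names, and the helper `add_ligand` used here are hypothetical simplifications.

```python
import sqlite3

# Hypothetical, simplified single-table layout; the real EPIMHC schema
# (Fig. 1) spans several related tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ligand (
        id            INTEGER PRIMARY KEY,
        sequence      TEXT NOT NULL,
        mhc           TEXT NOT NULL,
        mhc_class     TEXT NOT NULL,
        binding_level TEXT NOT NULL DEFAULT 'unknown'
            CHECK (binding_level IN ('high', 'moderate', 'little', 'unknown')),
        UNIQUE (sequence, mhc)  -- one entry per peptide/MHC combination
    )
    """
)


def add_ligand(sequence, mhc, mhc_class, binding_level="unknown"):
    """Insert one record; return False if the (sequence, mhc) pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ligand (sequence, mhc, mhc_class, binding_level) "
                "VALUES (?, ?, ?, ?)",
                (sequence, mhc, mhc_class, binding_level),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, the same peptide can be stored once per MHC restriction element, while a second insert of an identical peptide/MHC pair is rejected.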

Fig. 1 EPIMHC database structure

The EPIMHC database is accessible online at http://imed.med.ucm.es/epimhc/ (Fig. 2) through an intuitive and user-friendly Web interface. Because queries are translated into SQL, the server supports complex searches that combine any annotation fields. For example, users can search for peptide ligands and/or epitopes that are restricted by one, several, or all MHC molecules (left side of the screen), and restrict the search according to various criteria such as the length and source of the peptide (right side of the screen). Any field of interest can also be included in the search output. The EPIMHC search output is described in detail in the Methods section in the context of generating custom-made profiles.

Fig. 2 EPIMHC database Web interface. EPIMHC resource is available for free public use at http://imed.med.ucm.es/epimhc/

2.2 Prediction of Peptide–MHC Binding and T-Cell Epitopes Using Profiles

As mentioned, motif-profiles consist of weighted PSSMs [25] created from a set of aligned peptide sequences known to bind to a given MHC molecule. To predict peptide–MHC binding and T-cell epitopes using motif-profiles, we use a search algorithm known as RANKPEP [17, 18]. The algorithm uses the profile coefficients to score every peptide fragment in a protein whose length equals the width of the PSSM, and ranks the fragments by score. The width of a PSSM is given by the number of residue sites in the multiple sequence alignment. Although rank per se is insufficient to assess whether a peptide is a potential binder, we have shown that T-cell epitopes score among the top 2 % of ranking peptides [17, 18]. Motif-profiles assume that peptide residues contribute independently to MHC binding. This assumption is well supported by experimental data, although there are reported instances in which the contribution of peptide residues to MHC binding is influenced by neighboring residues [28].
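The scoring and ranking procedure just described can be sketched as a sliding-window scan. This is a minimal illustration under stated assumptions, not the RANKPEP implementation: the PSSM is represented as a list of per-position dictionaries (a hypothetical layout), residues absent from a column score 0, and the number of peptides in a top percentage is rounded up.

```python
import math


def scan_protein(protein, pssm):
    """Score every window of len(pssm) residues and rank the windows.

    `pssm` is a list with one dict per position mapping residue -> coefficient
    (hypothetical layout). Because positions are assumed to contribute
    independently, a peptide score is the sum of its per-position coefficients.
    Returns (score, 1-based start position, peptide) tuples, best score first.
    """
    width = len(pssm)
    hits = []
    for start in range(len(protein) - width + 1):
        peptide = protein[start:start + width]
        # Assumption: residues missing from a column contribute 0.
        score = sum(col.get(res, 0.0) for col, res in zip(pssm, peptide))
        hits.append((score, start + 1, peptide))
    # Highest score first; ties are broken by position in the protein.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


def top_fraction(hits, fraction=0.02):
    """Keep the top-scoring fraction of peptides (rounded up, at least one)."""
    count = max(1, math.ceil(len(hits) * fraction))
    return hits[:count]
```

A protein of length L scanned with a profile of width w yields L - w + 1 candidate peptides, so the 2 % cut-off scales with protein length.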

RANKPEP is accessible online for public use at http://imed.med.ucm.es/Tools/rankpep.html (Fig. 3). Currently, 88 and 50 different MHCI and MHCII molecules, respectively, can be targeted for peptide binding predictions in RANKPEP using the relevant motif-profiles. The profiles available in RANKPEP have been derived from large sets of non-redundant peptide–MHC binders, without taking into consideration their T-cell activity or source. These sets can include self-peptides eluted from MHC molecules. The RANKPEP Web server is flexible and intuitive. Notably, users can upload their own motif-profiles, such as those generated using EPIMHC (see Subheading 3). A simplified version of the RANKPEP input form can also be launched from EPIMHC to facilitate tailored prediction of T-cell epitopes using custom-made profiles (see Subheading 3).

Fig. 3 RANKPEP Web server. The figure depicts a screenshot of the RANKPEP interface with the option of uploading custom-made profiles highlighted. RANKPEP is available for free public use at http://imed.med.ucm.es/Tools/rankpep.html

3 Methods

In this section, we show a step-by-step guide to derive a specific peptide–MHC binding motif-profile in EPIMHC and produce tailored T-cell epitope predictions. In particular, we will target the prediction of A*0201-restricted CD8 T-cell epitopes from SARS coronavirus nucleoprotein (GI: 30173007). This protein contains 7 experimentally identified A*0201-restricted CD8 T-cell epitopes (Table 1) and we will use that knowledge to assess the predictive accuracy of various peptide–MHC binding motif-profiles.

Table 1 Known A*0201-restricted CD8 T-cell epitopes in SARS nucleoprotein

3.1 Peptide Selection and Motif-Profile Building

We will build a motif-profile from all 9-mer peptides that are annotated in EPIMHC to bind with high affinity to the human MHC I molecule HLA-A*0201 (A*0201) (see Notes 1 6 ). To this end, we first do a search in EPIMHC with the following selection criteria, leaving the remaining fields with default settings:

  1. Select HLA-A*0201 in MHC SELECTION.

  2. Select 9 in LENGTH (see Note 2).

  3. Select high in PEPTIDE BINDING LEVEL.

Figure 4a shows a screen capture of the selection described. Upon submitting the search, we get 178 peptides (Fig. 4b). EPIMHC search results are presented as a table in which each row is a record fitting the search criteria. The table columns provide the information fields selected by the user in the query form. The default fields are those shown in Fig. 4b and include the MHC restriction element (MHC), the MHC class (I or II) (CLASS), the sequence of the peptide (SEQUENCE), the name of the source protein (PROTEIN SOURCE NAME), whether the peptide is an epitope (Yes or No) (EPITOPE), the T-cell activity level (EPITOPE LEVEL), the source organism of the peptide (PEPTIDE SOURCE ORGANISMS), and the length of the peptide (LENGTH). Clicking on a peptide sequence shows its location in the relevant source protein. One can also retrieve the source protein record in NCBI by clicking on the relevant protein source name. Users can select any record by clicking on its checkbox, or select all records by clicking on the check all option at the bottom of the result page, and download the data in a variety of text formats using the relevant action buttons (FASTA Sequence, Table format, or Full Record).
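For readers who prefer to work offline with downloaded records, the same selection (MHC, length, binding level) and a FASTA-style export can be sketched in a few lines. The record layout, the field names, and the FASTA header format below are hypothetical; they do not reproduce the files that EPIMHC actually generates.

```python
def select_peptides(records, mhc, length, binding_level):
    """Keep records matching the three selection criteria used in the text.

    Each record is a dict with the hypothetical keys
    'sequence', 'mhc', and 'binding_level'.
    """
    return [
        rec for rec in records
        if rec["mhc"] == mhc
        and len(rec["sequence"]) == length
        and rec["binding_level"] == binding_level
    ]


def to_fasta(records):
    """Write the selected peptides as FASTA text (illustrative header format)."""
    lines = []
    for number, rec in enumerate(records, start=1):
        lines.append(f">{rec['mhc']}|{number}")
        lines.append(rec["sequence"])
    return "\n".join(lines) + ("\n" if lines else "")


# Toy records with placeholder sequences, not taken from EPIMHC.
toy_records = [
    {"sequence": "AAAAAAAAA", "mhc": "HLA-A*0201", "binding_level": "high"},
    {"sequence": "CCCCCCCCCC", "mhc": "HLA-A*0201", "binding_level": "high"},
    {"sequence": "DDDDDDDDD", "mhc": "HLA-A*0201", "binding_level": "little"},
]
selected = select_peptides(toy_records, "HLA-A*0201", 9, "high")
```

Filtering on peptide length at this stage matters because profiles can only be built from peptides of equal length (see Note 2).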

Fig. 4 EPIMHC search example and output. The figure illustrates a search example in the EPIMHC resource for peptides binding to HLA-A*0201 with high affinity (a) and the result of that specific search (b)

Motif-profiles are built from peptide sequences selected in the output. Currently, EPIMHC can generate two types of profiles that are specified by clicking on either the p.mtx or the pwp checkboxes. The first one uses a branch proportional weighting method [29], while the second uses position-based weights [30]. To make a motif-profile incorporating position-based weights from all peptide records, we follow the next sequential steps (highlighted in Fig. 5a):

Fig. 5 Profile-building using peptides selected in the EPIMHC output. (a) The figure highlights the steps (labeled 1, 2, and 3) needed to make a motif-profile with position-based weights from all peptides in the EPIMHC search output. (b) RANKPEP form launched by EPIMHC with a ready-to-use custom-made profile (highlighted)

  1. Click on check all (all peptides will be selected) (see Note 6).

  2. Click the p.mtx check box.

  3. Click the create matrix action button.

Upon hitting the create matrix button, EPIMHC opens a simplified version of the RANKPEP Web interface that incorporates the custom-made motif-profile built from the selected peptides (Fig. 5b). The profile appears under the File field of the form and can be downloaded with a right mouse click (PC) or a mouse click-and-hold (Mac). The profile thus generated, shown in Fig. 6, has the format required by the MAST motif search algorithm [31] and can be uploaded to the original RANKPEP Web server to produce custom predictions of peptide–MHC binding and T-cell epitopes. However, the RANKPEP interface launched by EPIMHC offers a simpler and more direct way to achieve this (Fig. 5b). Under SET DISPLAY OPTIONS, users can choose between two ways of setting the number of peptides returned by the algorithm: as a fixed number of top scoring peptides or as a percentage of top scoring peptides. Users can also filter the peptides sorted by RANKPEP by molecular weight (MW), so that only peptides within a given MW window are returned. By default, MW filtering is not applied. The RANKPEP interface also provides three models for proteasomal cleavage predictions [17, 22]. By default, model one is selected. These models are applied regardless of the class of the MHC targeted for predictions, but the predictions are only meaningful for MHC I-restricted peptides (see Note 4).

Fig. 6 EPIMHC profile with position-based weights generated from 178 9-mer peptides binding to HLA-A*0201 with high affinity
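To give a sense of how a profile with position-based weights can be derived from aligned peptides, the sketch below implements the position-based sequence weighting scheme of ref. [30] and turns the weighted residue frequencies into log-odds coefficients. Everything beyond the weighting formula is an assumption made for illustration: the uniform background, the pseudocount value, and the function names are hypothetical, and the output is not in the MAST format produced by EPIMHC. The checks on peptide length and minimum number of peptides mirror Notes 2 and 6.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(peptides):
    """Position-based sequence weights, normalized to sum to 1.

    At each column, a peptide receives 1 / (k * n), where k is the number of
    distinct residues in the column and n the number of peptides sharing its
    residue, so peptides carrying rare residues get larger weights.
    """
    weights = [0.0] * len(peptides)
    for col in range(len(peptides[0])):
        counts = Counter(pep[col] for pep in peptides)
        distinct = len(counts)
        for i, pep in enumerate(peptides):
            weights[i] += 1.0 / (distinct * counts[pep[col]])
    total = sum(weights)
    return [w / total for w in weights]


def build_pssm(peptides, pseudocount=0.01, min_peptides=5):
    """Build a log-odds PSSM as a list of residue -> coefficient dicts.

    Assumptions: uniform background (1/20) and a small additive pseudocount.
    """
    if len(peptides) < min_peptides:
        raise ValueError(f"need at least {min_peptides} peptides")
    if len({len(pep) for pep in peptides}) != 1:
        raise ValueError("all peptides must have the same length")
    if any(res not in AMINO_ACIDS for pep in peptides for res in pep):
        raise ValueError("peptides must contain only the 20 standard residues")

    weights = position_based_weights(peptides)
    background = 1.0 / len(AMINO_ACIDS)
    denominator = 1.0 + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(peptides[0])):
        freq = dict.fromkeys(AMINO_ACIDS, 0.0)
        for weight, pep in zip(weights, peptides):
            freq[pep[col]] += weight  # weighted residue frequency
        pssm.append({
            aa: math.log2(((freq[aa] + pseudocount) / denominator) / background)
            for aa in AMINO_ACIDS
        })
    return pssm
```

Residues that are frequent at a position among the selected binders receive positive coefficients, and residues never observed there receive negative ones, which is what allows the profile to capture anchor preferences.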

3.2 Prediction of Peptide–MHC Binding and T-Cell Epitopes With EPIMHC Custom-Made Profiles

To target SARS nucleoprotein for T-cell epitope predictions using the RANKPEP form launched by EPIMHC with the custom-made profile, we proceed as follows:

  1. Set peptides to display to 2 % of top scoring peptides (see Note 5).

  2. Paste the SARS nucleoprotein sequence, in FASTA format, in the text box of the INPUT section.

  3. Click on the matrix check box.

  4. Click on the action button Run Rankpep.

The indicated steps are highlighted in Fig. 7a; the RANKPEP output (Fig. 7b) is described next.

Fig. 7 Tailored prediction of peptide–MHC binding using the RANKPEP form launched by EPIMHC. (a) The figure illustrates the steps to predict HLA-A*0201-restricted peptides from SARS nucleoprotein. (b) RANKPEP output showing the prediction results

The top part of the RANKPEP output shows the matrix (profile) used for the predictions, a consensus sequence that would reach the largest score, the optimal (largest) score, and a binding threshold (BT). The latter is an important feature. Large scores lead to top ranking peptides and are indicative of peptide–MHC binding. However, rank per se is insufficient to know whether a given peptide will bind to a particular MHC molecule, for example when scoring a single peptide. Therefore, EPIMHC provides a profile-specific BT that serves to identify the most confident peptide–MHC binders and T-cell epitopes as those with a score ≥ BT. The profile-specific BT provided by EPIMHC is obtained by scoring all the peptides used to make the relevant profile and corresponds to the 90 percentile value of all peptide scores [18]. The next part of the RANKPEP output consists of a list of peptides from the input protein ranked by the scores obtained with the relevant profile. In our case, RANKPEP shows only 9 peptides from SARS nucleoprotein because we chose to display only the top 2 % of scoring peptides. For every peptide, RANKPEP shows its rank (RANK), location in the protein sequence (POS), sequence (SEQUENCE), three N-terminal (N) and C-terminal (C) flanking residues, score (SCORE), and relative score, in percentage, with regard to the optimum score (%OPT). Peptides whose scores are equal to or greater than the BT are highlighted in red, and those whose C-terminal end is predicted to result from proteasomal cleavage are shown in violet.
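The summary values reported at the top of the output (consensus sequence, optimal score, BT, and %OPT) can be derived from a profile as sketched below. The PSSM layout (a list of residue-to-coefficient dictionaries) and all function names are hypothetical. In particular, the BT function encodes one possible reading of the 90 percentile rule, namely the score that 90 % of the profile-building peptides reach or exceed; this interpretation is an assumption and should be checked against ref. [18].

```python
import math


def consensus_and_optimum(pssm):
    """Return the highest-scoring sequence and its (optimal) score."""
    consensus = "".join(max(col, key=col.get) for col in pssm)
    optimum = sum(max(col.values()) for col in pssm)
    return consensus, optimum


def score_peptide(peptide, pssm):
    """Sum the per-position coefficients; unknown residues count as 0."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, peptide))


def binding_threshold(training_peptides, pssm, covered=0.90):
    """Score reached or exceeded by the fraction `covered` of training peptides.

    Assumed interpretation of the 90 percentile rule, not a confirmed one.
    """
    scores = sorted(
        (score_peptide(pep, pssm) for pep in training_peptides), reverse=True
    )
    # Small epsilon guards against floating-point overshoot in the product.
    index = max(0, math.ceil(covered * len(scores) - 1e-9) - 1)
    return scores[index]


def percent_opt(score, optimum):
    """Relative score with regard to the optimal score, in percent."""
    return 100.0 * score / optimum
```

Under this reading, a query peptide is flagged as a confident binder when `score_peptide(peptide, pssm) >= binding_threshold(training_peptides, pssm)`.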

As we made a profile from peptides binding with high affinity to A*0201, we are predicting peptides from SARS nucleoprotein that bind to A*0201 and hence potential A*0201-restricted CD8 T-cell epitopes. In fact, in the results shown in Fig. 7b, it is possible to identify 5 out of the 7 known A*0201-restricted CD8 T-cell epitopes from SARS nucleoprotein (Table 2).

Table 2 Description of custom-made profiles used in this study

3.3 Comparison of CD8 T-Cell Epitope Predictions Using Various Custom-Made Profiles

The quality of the peptide–MHC binding and T-cell epitope predictions provided by any predictive model, including motif-profiles, is tied to the data used for model building [32, 33]. To demonstrate this influence, here we compare the predictions of A*0201-restricted CD8 T-cell epitopes from SARS nucleoprotein obtained with 4 distinct motif-profiles, including that described in the previous section (hereafter profile #1). The specific peptide selections that give rise to the different profiles used in this section are detailed in Table 2. Briefly, all motif-profiles are generated from peptides binding with high affinity to A*0201. In addition, profiles #3 and #4 only include peptides from viruses, and profiles #2 and #4 only include peptides known to display T-cell activity (that is, epitopes). To evaluate the predictive performance of these profiles, we scored and ranked all peptides from SARS nucleoprotein and compared the ranks achieved by the known SARS nucleoprotein A*0201-restricted CD8 T-cell epitopes shown in Table 1. These results are summarized in Table 3. All four motif-profiles produce related results, ranking the known CD8 T-cell epitopes among the top scoring peptides of SARS nucleoprotein. This is expected, as A*0201-restricted CD8 T-cell epitopes and the peptides used for model building have in common the ability to bind to A*0201. However, there are also differences in the results. Only the profiles derived from viral peptides are capable of predicting the known A*0201-restricted CD8 T-cell epitopes from SARS nucleoprotein among the top 11 scoring peptides (Table 3). Judging from the dispersion of the ranks (Table 3), the best overall epitope predictions are obtained with a motif-profile derived from viral peptides with known T-cell activity (T-cell epitopes). It remains to be explored whether discarding peptide–MHC binders with no reported T-cell activity always improves the resulting T-cell epitope prediction models.
It is important to note that none of the known epitopes used in these analyses have been used to derive any of the profiles. In fact, the current version of EPIMHC does not contain any SARS peptides at all.
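The kind of comparison summarized in Table 3 can be reproduced for any set of profiles with a short script that looks up the rank of each known epitope in a ranked peptide list and summarizes the ranks. This is a sketch only: the function names are hypothetical, the choice of summary statistics (worst rank, mean, and sample standard deviation) is an assumption and may differ from those reported in Table 3, and no values from the chapter are reproduced here.

```python
from statistics import mean, stdev


def epitope_ranks(ranked_peptides, known_epitopes):
    """Map each known epitope to its 1-based rank (None if it is not listed).

    `ranked_peptides` is the list of peptide sequences sorted best score first.
    """
    position = {}
    for rank, peptide in enumerate(ranked_peptides, start=1):
        position.setdefault(peptide, rank)  # keep the best rank of duplicates
    return {epitope: position.get(epitope) for epitope in known_epitopes}


def summarize(ranks):
    """Summarize the ranks of the epitopes that were found."""
    found = [rank for rank in ranks.values() if rank is not None]
    if not found:
        raise ValueError("no known epitope is present in the ranked list")
    return {
        "found": len(found),
        "worst": max(found),  # lowest-ranked known epitope
        "mean": mean(found),
        "sd": stdev(found) if len(found) > 1 else 0.0,
    }
```

Running `summarize(epitope_ranks(...))` once per profile gives a compact basis for comparison: a smaller worst rank and a smaller dispersion indicate that the known epitopes are concentrated among the top scoring peptides.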

Table 3 Ranking and statistics of known SARS nucleoprotein A*0201-restricted CD8 T-cell epitopes using four custom-made motif-profiles

In conclusion, profiles are very powerful at capturing nontrivial motifs and the results shown here support that epitope predictions can be improved using customized peptide–MHC binding profiles. EPIMHC is the only available resource readily suitable for that task.

4 Notes

  1. In EPIMHC, users can make profiles from any peptide selection, but the peptides must be related to some extent (e.g., binding to the same MHC) to produce profiles yielding meaningful predictions (see Subheading 3.1).

  2. EPIMHC is better suited for making MHC I-specific profiles. Moreover, profiles can only be generated from peptides of the same length; otherwise EPIMHC returns an error (see Subheading 3.1). There are practical and structure-based reasons for this limitation, as discussed by Reche et al. [18].

  3. EPIMHC can also produce profiles from peptides that have been selected to bind to MHC II molecules, provided that they have the same length (see Subheading 3.1). However, as data availability is limited, we recommend using a motif discovery program such as MEME [31] for making peptide–MHC II binding profiles from peptides of any length, as described in previous reports [17, 18].

  4. The proteasomal cleavage predictions should not be taken into consideration when predicting peptide binding to MHC II molecules: the proteasome is not involved in class II antigen processing. We are working on correcting this inconsistency.

  5. For RANKPEP to return all peptides in a given protein sorted by score, users need to make the following selections in the RANKPEP input form: first set peptides to display by number and then select 990 from the pull-down menu.

  6. To generate profiles that are capable of capturing the relevant peptide–MHC binding features, we suggest using a minimum of five peptides.