1 Introduction

The term “immunity” arose from the observation that individuals who had recovered from certain infectious diseases were protected from the same disease when they encountered it again. This implies that these individuals possess an immune system, with associated biological processes, that is responsible for developing “immunity.” The role of an immune system is to protect against diseases by identifying and killing pathogens. An immune system includes innate and adaptive components. According to the traditional dogma of immunology, vertebrates have both innate and adaptive immune systems, whereas invertebrates possess only an innate immune system (Kimbrell and Beutler 2001; Tomar and De 2010).

An immune system may be considered a network of thousands of molecules that gives rise to many intertwined responses. It is structurally and functionally diverse, and this diversity varies both over time and across individuals. Immunologists have long used high-throughput experimental techniques, which have generated a vast amount of functional, clinical, and epidemiological data on immune systems. New computational approaches are therefore needed to store and analyze these data. Immunology-focused resources and software have recently emerged that help in understanding the properties of the whole immune system (Gardy et al. 2009). This has given rise to a new field called immunoinformatics. Immunogenomics, immunoproteomics, epitope prediction, and in silico vaccination are different areas of computational immunology research. More recently, Systems Biology approaches have been applied to investigate the dynamic behavior of immune system networks (De and Tomar 2014; Tomar and De 2010, 2014).

Using the information provided by the immunoinformatics domain, an immunologist can explore potential binding sites, which, in turn, can lead to the development of new vaccines. This methodology is termed “reverse vaccinology.” It analyzes a pathogen genome to identify potential antigenic proteins (Davies and Flower 2007). These tools also help to identify virulence genes and surface-associated proteins. Immunomics is itself a new discipline that uses high-throughput techniques to elucidate the mechanisms of the immune system (De Groot 2006; Grainger 2004), as shown in Fig. 9.1.

Fig. 9.1

Shows the workflow in immunomics

2 Data Sources

This section provides information on immune system-related datatypes and databases. The detailed version of information on data sources is available from our previously published articles and book on immunoinformatics (De and Tomar 2014; Tomar and De 2010, 2014).

2.1 Data from Lab Experiments

High-throughput molecular biology techniques have generated a vast amount of experimental data in immunology. These techniques help in determining the structure and function of immune genes and their products (Yates et al. 2001). Many immunological techniques are used to understand the underlying mechanisms of an immune system and its responses to various infections, diseases, and drugs. These include affinity chromatography (Kaplan et al. 1974), flow cytometry (Davey 2003), radioimmunoassay (Mari et al. 2006; Durkin et al. 1997), enzyme-linked immunosorbent assay (ELISA) (Durkin et al. 1997; Ma et al. 2006), competitive inhibition assay (Levine et al. 1980), and the Coombs test (Nishimaki et al. 1987).

Purification techniques such as affinity chromatography are used to isolate MHC-bound peptides from membrane MHC molecules. These peptides can then be analyzed by capillary high-pressure liquid chromatography electrospray ionization tandem mass spectrometry (Admon et al. 2003).

2.2 Immunomic Microarray Technology

A technology similar to the DNA microarray is used in functional immunomics and is referred to as the “immunomic microarray.” It includes dissociable antibody microarrays (Wang 2004), serum microarrays (Magdalena et al. 2005), and serological analysis of cDNA expression libraries (SEREX) (Sahin et al. 1997). An antibody microarray consists of antibody probes and antigen targets. It is used to measure the concentration of the antigens recognized by specific antibody probes. A peptide microarray uses antigen peptides as fixed probes and serum antibodies as targets. In the peptide-MHC microarray, or artificial antigen-presenting chip, recombinant peptide-MHC complexes and co-stimulatory molecules are immobilized on a surface. Each spot acts as an artificial antigen-presenting cell for T cells (Oelke et al. 2003) and contains defined MHC-restricted peptides. In an immunomic microarray, two or more signals determined by a single feature, i.e., an epitope, can be measured simultaneously (Braga-Neto and Marques 2006; Nahtman et al. 2007).

2.3 Immunomic Databases

Databases of epitope information, bioinformatics tools, and prediction algorithms are crucial for basic immunological studies, diagnosis, and vaccine research (Peters et al. 2005). InnateDB (Lynn et al. 2008) (www.innatedb.ca) was created to capture the complete network of pathways and interactions involved in innate immune responses. A related tool, Cerebral (Barsky et al. 2007), is a Java plug-in for the Cytoscape biomolecular interaction viewer version 2.8.2 (Shannon et al. 2003). It automatically generates layouts of biological pathways. Table 9.1 lists some of the databases for B cell epitopes, T cell epitopes, allergy prediction, and the evolution of immune system genes and proteins (Tomar and De 2010).

Table 9.1 Databases on B cell epitopes, T cell epitopes, allergen, and molecular evolution of immune system components (Tomar and De 2010, 2014)

2.4 B Cell Epitope Databases

Epitome (Schlessinger et al. 2006) (http://www.rostlab.org/services/epitome/) contains known antigen-antibody complex structures. More details are available in our previously published articles (Tomar and De 2010, 2014).

2.5 T Cell Epitope Databases

Recent investigations include the identification and mapping of potential epitopes, since epitope mapping guides the design of effective vaccines. The SYFPEITHI database (Rammensee et al. 1999) (www.syfpeithi.de) contains information on MHC class I and II anchor motifs and peptide binding. IEDB (Sathiamurthy et al. 2005) (http://www.immuneepitope.org/) and its associated ontology (http://ontology.iedb.org/) are specifically designed to provide intrinsic, chemical, and biochemical information on immune epitopes and their interactions with host molecules.

FRED (Feldhahn et al. 2009) (http://abi.inf.uni-tuebingen.de/Software/FRED) provides methods for data processing. It also provides methods for comparing the performance of prediction methods against experimental values. IMGT® (Lefranc et al. 2009), the international ImMunoGeneTics information system® (http://imgt.org), comprises 5 databases and 15 interactive online tools for sequence, genome, and 3D structure analysis. The IMGT/HLA Database (Robinson et al. 2011) (http://www.ebi.ac.uk/imgt/hla/) is a specialist database for sequences of the human major histocompatibility complex (HLA). It is part of the international ImMunoGeneTics project (IMGT). This information is also available in our previously published article (Tomar and De 2010).

3 Immunomic Tools and Algorithms

The main objective of B cell epitope prediction is to design a molecule that can replace an antigen in either antibody production or antibody detection. Such a molecule can be synthesized or, in the case of a protein, its gene can be cloned into an expression vector. Designed molecules are preferable because they are inexpensive and noninfectious. Viruses or bacteria, by contrast, may be harmful to a researcher or an experimental animal. Epitopes are important for disease understanding, host-pathogen interaction analyses, antimicrobial target discovery, and vaccine design. Because experimental techniques for epitope identification are difficult and time-consuming, several in silico methodologies have been developed and used to identify epitopes. Table 9.2 lists some of the tools for B and T cell epitope prediction, allergy prediction, and in silico vaccination. Here, we briefly describe different methodologies for epitope and allergy prediction and the process of in silico vaccination.

Table 9.2 Web servers and tools for prediction of B cell epitopes, T cell epitopes, allergy, and for in silico vaccination (De and Tomar 2014; Tomar and De 2010, 2014)

3.1 B Cell Epitope Prediction

B cell epitopes are classified as continuous (linear) or discontinuous (conformational). A synthetic peptide may correspond to a short continuous stretch of a protein sequence and bind an antibody raised against that protein. Such a peptide is called a continuous epitope of the protein. Both sequence- and structure-based prediction tools are available; however, fewer tools exist for predicting discontinuous B cell epitopes (Tong and Ren 2009; Saha et al. 2005).

3.2 Prediction of Continuous B Cell Epitopes

3.2.1 Sequence Based Methods

Most sequence-based methods assume that epitopes must be accessible for antibody binding and therefore use epitope properties related to surface exposure. These methods are limited to the prediction of continuous epitopes. Sequence-based methods have been tested on the prediction of two known protective epitopes in influenza A virus hemagglutinin HA1 (Bui et al. 2007). The first continuous epitope is the 91–108 epitope (SKAFSNCYPYDVPDYASL). It is a protective epitope in rabbits that elicits antibodies neutralizing the infectivity of influenza viruses (Muller et al. 1982). The second continuous epitope is the 127–133 epitope (WTGVTQN), which is protective against influenza strain A/Aichi/2/68 (H3N2) in mice (Naruse et al. 1994).
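
As an illustration of how a per-residue prediction might be checked against the two HA1 epitopes above, the Python sketch below computes one simple measure. This measure is the fraction of known epitope residues that a method flags (sensitivity). The choice of metric, the function names, and the "predicted" positions are assumptions made for illustration. Bui et al. (2007) may have used other evaluation criteria.

```python
def residue_positions(start, end):
    """Return the 1-based residue positions covered by an inclusive span."""
    return set(range(start, end + 1))


def epitope_sensitivity(predicted_positions, epitope_start, epitope_end):
    """Fraction of residues in a known epitope that were predicted as epitope."""
    known = residue_positions(epitope_start, epitope_end)
    hits = known & set(predicted_positions)
    return len(hits) / len(known)


# The two protective HA1 epitopes from the text (1-based, inclusive spans).
HA1_EPITOPES = {
    "91-108": (91, 108, "SKAFSNCYPYDVPDYASL"),
    "127-133": (127, 133, "WTGVTQN"),
}

# Consistency check: each span length must equal its peptide length.
for name, (start, end, peptide) in HA1_EPITOPES.items():
    assert end - start + 1 == len(peptide), name

# Hypothetical output of some predictor: residues 95-115 flagged as epitope.
predicted = range(95, 116)
for name, (start, end, _) in HA1_EPITOPES.items():
    print(name, round(epitope_sensitivity(predicted, start, end), 3))
# prints "91-108 0.778" and then "127-133 0.0"
```

A residue-level measure like this can be complemented by specificity or ROC analysis when negative (non-epitope) residues are also annotated.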

3.2.2 Prediction Using Amino Acid Propensity Scale

Amino acid scale-based methods use an amino acid scale to compute a score for each residue i in a given protein sequence. A window of odd size n is centered on residue i and includes the (n-1)/2 neighboring residues on each side of i. The final score for residue i is the average of the scale values of the n amino acids in the window. Amino acid propensity scales such as hydrophilicity and flexibility can be used to identify epitopes.
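
A minimal Python sketch of the window-averaging scheme described above follows. It assumes the Hopp-Woods hydrophilicity scale as the propensity scale. The values are given as commonly tabulated; check them against the original publication before use. The function name `window_scores` is hypothetical. Residues closer than (n-1)/2 positions to either end of the sequence have no full window and are left unscored (`None`).

```python
# Hopp-Woods hydrophilicity values (assumed scale; verify before use).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def window_scores(sequence, scale, n=7):
    """Score each residue i as the mean scale value over a window of size n
    centered on i, i.e. (n - 1) / 2 neighbors on each side."""
    if n < 1 or n % 2 == 0:
        raise ValueError("window size n must be a positive odd integer")
    half = (n - 1) // 2
    scores = [None] * len(sequence)  # edge residues stay unscored
    for i in range(half, len(sequence) - half):
        window = sequence[i - half:i + half + 1]
        scores[i] = sum(scale[aa] for aa in window) / n
    return scores


# Example: profile the HA1 91-108 epitope peptide with a 7-residue window.
profile = window_scores("SKAFSNCYPYDVPDYASL", HOPP_WOODS, n=7)
```

Residues whose averaged hydrophilicity exceeds a chosen threshold would then be reported as candidate epitope residues. The threshold itself is a tunable assumption.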

3.2.3 Prediction Using Machine Learning Methodologies

Several researchers have used machine learning algorithms to learn the characteristics of epitopes from training datasets. For example, Saha and Raghava used an ANN in ABCpred (Saha and Raghava 2006) (www.imtech.res.in/raghava/abcpred). Saha et al. (2005) used feed-forward and recurrent neural networks to predict continuous B cell epitopes. Sweredoski and Baldi (2009) presented COBEpro, a two-step system for predicting continuous B cell epitopes that uses an SVM. For BepiPred (Larsen et al. 2006) (http://www.cbs.dtu.dk/services/BepiPred), three datasets of linear B cell epitopes were constructed.
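
The sketch below illustrates the general idea shared by these tools. Peptide windows are encoded as numeric vectors, and a classifier is trained on labeled epitope and non-epitope examples. For brevity, the sketch swaps in a simple linear perceptron with one-hot (binary) encoding. It does not use the ANN of ABCpred or the SVM of COBEpro. The toy training peptides are invented, so this is not a reimplementation of any published tool.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide):
    """Encode a fixed-length peptide as a flat binary vector of 20 * len(peptide)."""
    vec = [0.0] * (20 * len(peptide))
    for pos, aa in enumerate(peptide):
        vec[20 * pos + INDEX[aa]] = 1.0
    return vec


def train_perceptron(peptides, labels, epochs=50):
    """Classic perceptron; labels are +1 (epitope) or -1 (non-epitope)."""
    w, b = [0.0] * (20 * len(peptides[0])), 0.0
    for _ in range(epochs):
        errors = 0
        for pep, y in zip(peptides, labels):
            x = one_hot(pep)
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified (or on the boundary): move the hyperplane toward y.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:  # training data separated; stop early
            break
    return w, b


def predict(w, b, peptide):
    x = one_hot(peptide)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Invented toy data: "epitopes" start with K, "non-epitopes" do not.
train_peptides = ["KAAA", "AAAA", "KGGG", "GGGG"]
train_labels = [1, -1, 1, -1]
w, b = train_perceptron(train_peptides, train_labels)
print([predict(w, b, p) for p in train_peptides])  # [1, -1, 1, -1]
```

Real predictors use longer windows, much larger curated datasets, and nonlinear models. However, the encode-then-classify structure is the same.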

3.2.4 Mimotope-Based Methodology

Phage display libraries contain a large number (more than 10⁹) of random peptides (Mayrose et al. 2007a, b). Peptides selected from these pools that mimic an epitope are called mimotopes (Moreau et al. 2006). The MIMOP tool (Moreau et al. 2006) predicts linear and conformational epitopes using two algorithms. MimAlign uses degenerate alignment analyses, and MimCons is based on consensus identification. MIMOX (Huang and Honda 2006) (http://web.kuicr.kyoto-u.ac.jp/~hjian/mimox) falls into the same category. It maps a single mimotope, or the consensus sequence of a set of mimotopes, onto the corresponding antigen structure.
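
The consensus step can be illustrated with a simple column-wise majority vote over pre-aligned mimotopes. This Python sketch is a deliberate simplification and not the MimCons algorithm itself. It assumes that the mimotopes have already been aligned to equal length, with "-" marking gaps. Ties are broken by taking the residue seen first.

```python
from collections import Counter


def column_consensus(aligned_mimotopes, gap="-"):
    """Majority residue per alignment column; ties go to the residue seen first."""
    length = len(aligned_mimotopes[0])
    if any(len(m) != length for m in aligned_mimotopes):
        raise ValueError("mimotopes must be pre-aligned to the same length")
    consensus = []
    for col in range(length):
        residues = [m[col] for m in aligned_mimotopes if m[col] != gap]
        # most_common keeps first-encountered order among equal counts.
        consensus.append(Counter(residues).most_common(1)[0][0] if residues else gap)
    return "".join(consensus)


print(column_consensus(["ACDE", "ACDF", "AGDE"]))  # ACDE
```

A consensus produced in this way is the kind of sequence that tools such as MIMOX then map onto an antigen structure.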

3.2.5 T Cell Epitope Prediction

There are several methodologies for predicting MHC binding peptides. They are based on quantitative matrices, hidden Markov models (HMM), artificial neural networks (ANN), support vector machines (SVM), and the structure of the peptide-MHC complex (De and Tomar 2014; Tomar and De 2010, 2014).
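
To make the quantitative-matrix idea concrete, the sketch below scores every nonamer in a protein by summing position-specific residue values. This additive model underlies matrix-based MHC binding predictors. The matrix values are purely illustrative and are not taken from any published matrix. They loosely mimic the P2/P9 anchor pattern of HLA-A*02:01. The function names are hypothetical.

```python
def score_peptide(peptide, matrix):
    """Additive score: sum of position-specific values (missing residues score 0)."""
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(peptide))


def scan_protein(sequence, matrix):
    """Score every window of len(matrix) residues; best-scoring windows first."""
    k = len(matrix)
    hits = []
    for i in range(len(sequence) - k + 1):
        peptide = sequence[i:i + k]
        hits.append((peptide, i + 1, score_peptide(peptide, matrix)))  # 1-based start
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical 9-position matrix: rewards L/M at P2 and V/L at P9, all else 0.
TOY_MATRIX = [{} for _ in range(9)]
TOY_MATRIX[1] = {"L": 2.0, "M": 1.5}
TOY_MATRIX[8] = {"V": 2.0, "L": 1.5}

print(scan_protein("GGALAAAAAAVGG", TOY_MATRIX)[0])  # ('ALAAAAAAV', 3, 4.0)
```

Published matrices assign a value to every residue at every position. They are derived from binding data rather than set by hand.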

3.2.6 Prediction Through Matrix-Driven Methods

Huang and Dai (2006) first investigated a new peptide encoding scheme that combines the BLOSUM matrix with amino acid indicator vectors for direct prediction of T cell epitopes. In this scheme, each nonzero entry in an amino acid indicator vector is replaced by the corresponding diagonal entry of the BLOSUM matrix. The MMBPred server (Bhasin and Raghava 2003) (www.imtech.res.in/raghava/mmbpred/) is another example of a matrix-driven method. It predicts mutated promiscuous and high-affinity MHC binding peptides.
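
The encoding described above can be sketched in a few lines of Python. The sketch assumes BLOSUM62 as the BLOSUM variant, since the text does not specify which matrix Huang and Dai used. The function name is hypothetical. The resulting vectors would then be fed to a classifier.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Diagonal (self-substitution) scores of BLOSUM62.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5,
    "G": 6, "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6,
    "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}


def blosum_indicator_encoding(peptide):
    """Concatenate one 20-dim indicator vector per residue, where the single
    nonzero entry holds that residue's BLOSUM62 diagonal value instead of 1."""
    encoding = []
    for aa in peptide:
        indicator = [0] * len(AMINO_ACIDS)
        indicator[AMINO_ACIDS.index(aa)] = BLOSUM62_DIAGONAL[aa]
        encoding.extend(indicator)
    return encoding


vec = blosum_indicator_encoding("WA")
print(len(vec), sum(vec))  # 40 15
```

Compared with a plain 0/1 indicator, this weighting lets residues that are rarely substituted (such as W or C) carry a larger input value.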

3.2.7 Prediction Through Hidden Markov Model

Zhang et al. developed PREDTAP (Zhang et al. 2006) for predicting peptide binding to the human transporter associated with antigen processing (hTAP). The tool is also described in Tomar and De (2010).

3.2.8 Prediction Through Artificial Neural Networks

MHC class I binding motifs are well defined; however, predicting MHC class II binding peptides is difficult for three reasons. Reported binding peptides vary in length, the binding core of each peptide is undetermined, and the number of amino acids serving as primary anchors is uncertain. Brusic et al. developed PERUN (Brusic et al. 1998), a hybrid method for predicting MHC class II binding peptides. This work used the PlaNet package version 5.6 (Miyata 1991) to design and train a three-layer, fully connected feed-forward artificial neural network. EpiJen (Doytchinova et al. 2006) (http://www.ddg-pharmfac.net/epijen/EpiJen/EpiJen.htm) models the whole process of MHC class I ligand degradation and presentation. It uses a multistep algorithm based on quantitative matrices.
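
For readers unfamiliar with the architecture, the sketch below shows the forward pass of a three-layer, fully connected feed-forward network. The three layers are input, hidden, and output. Here the network is applied to a one-hot encoded peptide. The weights are random and untrained, backpropagation training is omitted, and the class and parameter names are hypothetical. The sketch illustrates the network structure only and is neither PERUN nor the PlaNet package.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Flat binary encoding: 20 inputs per peptide position."""
    vec = [0.0] * (20 * len(peptide))
    for pos, aa in enumerate(peptide):
        vec[20 * pos + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)  # seeded so results are reproducible
        self.w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        # Every input connects to every hidden unit (fully connected).
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)


peptide = "ALAAAAAAV"  # hypothetical 9-mer
net = FeedForwardNet(n_inputs=20 * len(peptide), n_hidden=5, seed=0)
score = net.forward(one_hot(peptide))  # in (0, 1); meaningless until trained
```

In a real predictor, the weights would be fitted to binders and non-binders so that the output approximates a binding probability or affinity class.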

4 Applications of Immunoinformatics

This section focuses on applications of immunoinformatics, including cancer diagnosis and therapy. It also discusses the idea of integrating Systems Biology with immunoinformatics.

4.1 Immunoinformatics for Cancer Diagnosis and Therapy

Antigen presentation plays a central role in the immune response and, as a result, in immunotherapeutic methods such as antitumor vaccination. For the immunotherapy of cancer, antigens need to be screened rapidly, and specific types of expression constructs need to be designed. Competent immune responses to cancer are likely to be restricted to the immunome of a specific cancer, that is, the set of antigens that drive successful immune responses. However, it remains difficult to find this set of antigens, which varies between tumors. Antitumor vaccination takes advantage of in vivo processes and harnesses the full power of the immune system. This distinguishes it from the more artificial ex vivo expansion of T cells.

Informatics is supporting changes in cancer diagnosis and prevention (Hu et al. 2004). For example, the Cancer Biomedical Informatics Grid (caBIG) connects a network of 500 individuals and 50 institutions. These participants share data and analysis tools to speed up the development of innovative approaches for the prevention and treatment of cancer (Sanchez et al. 2004). The 2005 database issue of Nucleic Acids Research lists 14 cancer-related molecular databases, which mainly focus on cancer-related genes and gene expression (Galperin 2005). Listings of tumor antigens are also available (Novellino et al. 2005), and these include antigens with defined T cell epitopes. Tumor-associated antigens (TAA) have played a vital role in both the diagnosis and treatment of human carcinomas. One example is prostate-specific antigen (PSA) in the diagnosis of prostate cancer. Despite this, TAA identification has often been hampered by complicated laboratory procedures. To speed up tumor antigen discovery and improve the diagnosis and treatment of human carcinoma, the publicly available Human Potential Tumor Associated Antigen database (HPtaa) (http://www.hptaa.org) has been established (Wang et al. 2006). Systems Biology approaches aim to identify a small number of antigens expressed by cancer cells that are suitable targets of immune responses against cancer. For example, proteomic mapping of in vivo antibody targets in the lungs and in solid tumors of experimental animals identified aminopeptidase-P and annexin A1 as targets of anticancer immune responses (Oh et al. 2004). Informatic methods have also been used to classify tumors into subtypes, which supports decision-making in the selection of therapeutic approaches. However, such applications in cancer immunology are yet to come (Camp et al. 2004).

4.2 Vaccine Against Tumors

Reliable predictions of immunogenic T cell epitope peptides are crucial for rational vaccine design and represent a key problem in immunoinformatics. Computational approaches have been developed to facilitate the process of epitope detection and show potential applications to the immunotherapeutic treatment of cancer. Epitope-driven vaccine design employs these bioinformatics algorithms to identify potential targets of vaccines against cancer (Rosa et al. 2010). The development of epitope-based DNA vaccines and their antitumor effects in preclinical research against B-cell lymphoma has been described (Iurescia et al. 2012).

Most immunotherapeutic approaches rely on inducing antitumor CD8+ T cells. These cells exhibit cytolytic activity toward tumor cells expressing tumor-specific or tumor-associated antigens. However, immunization strategies that focus solely on CD8+ T cell immunity might prove insufficient because they may be unable to provide long-term protective immunity (Khanolkar et al. 2007). Peptides predicted to bind MHC have been shown to elicit a tumor-killing cytotoxic T lymphocyte (CTL) response (Lu and Celis 2000). Although CTLs are key players in generating antitumor therapeutic effects, the CTL response is sometimes suboptimal. CD4+ T cells are critical for generating and maintaining CTL responses. They act either by providing cytokines or through a major pathway known as dendritic cell licensing (Smith et al. 2004; Wan and Flavell 2009). Class II MHC-bound epitopes activate CD4+ T cells, which maintain an effective CTL response that plays an important role in antitumor immunity (Hung et al. 1998; Kalams and Walker 1998).

CD4+ T cells determine the functional status of both innate and adaptive immune responses; thus, the inclusion of appropriate CD4+ T cell epitopes may be essential for vaccine efficacy. The idiotypic immunoglobulin M (IgM) expressed by a B-cell lymphoma is a clonal marker and a tumor-specific antigen, so it can be used as an immune target. Specific immunogenic epitopes identified from such tumor antigens can be used as vaccines to activate an immune response against tumor cells (Houot and Levy 2009). In lymphoproliferative malignancies, a TTFrC (tetanus toxin fragment C) fusion vaccine activated anti-Id antibody responses and suppressed tumor growth in murine models (King et al. 1998; Thirdborough et al. 2002). It was also effective in inducing CD8+ CTLs in several tumor models (… et al. 2001).

4.3 Immunoinformatics and Systems Biology for Personalized Medicine

The aim of integrating immunoinformatics with Systems Biology approaches is to better understand immune-related diseases at various system levels. This integration can open the path to several translational studies for better clinical practice. The association between a disease and genetic variation is one of the most important aspects of pharmacogenomics and of the development of personalized medicine. Figure 9.2 shows how this integration leads to the development of personalized medicine. Information about the allele frequencies of immune molecules in a human population is especially important because it allows patient subgroups with different vaccine or drug responses to be identified (Yan 2010). For example, a SNP (S427T) in the innate immune gene interferon regulatory factor 3 (IRF3) has been associated with an increased risk of human papillomavirus (HPV) persistence and cervical cancer (Wang et al. 2009). Genomic variation databases such as HapMap (http://snp.cshl.org/) and dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) provide individual genotype data. The Allele Frequencies Database (http://www.allelefrequencies.net/) can be used to search polymorphic regions of histocompatibility and immunogenetic genes across various populations. These include polymorphisms in HLA, cytokine, and killer-cell immunoglobulin-like receptor (KIR) genes. Thus, integrating Systems Biology with immunoinformatics offers scope for developing optimized vaccines and drugs tailored to personalized prevention and treatment.
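
As a small worked example of the allele-frequency information discussed above, the sketch below estimates allele frequencies by direct gene counting from diploid genotypes. The genotypes are hypothetical. The sketch also assumes complete, unambiguous typing, whereas real HLA datasets often require handling of ambiguous or missing calls.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Direct gene counting: each diploid genotype contributes two alleles."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical HLA-A genotypes for three individuals.
genotypes = [
    ("A*02:01", "A*01:01"),
    ("A*02:01", "A*02:01"),
    ("A*03:01", "A*01:01"),
]
freqs = allele_frequencies(genotypes)
# A*02:01 -> 3/6 = 0.5, A*01:01 -> 2/6, A*03:01 -> 1/6
```

Comparing such frequencies across populations is what allows the patient subgroups mentioned above to be distinguished.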

Fig. 9.2

Shows an integration of immunoinformatics and Systems Biology and how it leads to the development of a personalized medicine (Idea inspired from Yan 2010)

5 Conclusions

The combination of high-throughput experimental techniques with immunoinformatics has driven explosive growth in immunology, much as genetics was transformed into genomics. Immunoinformatics can help reduce the time and cost of traditional immunology laboratory practice. This review covers online immunological databases, tools, and web servers, as well as applications of immunoinformatics.

Immunoinformatics models simulate the real behavior of immune system reactions and kinetics. They are engineered so that they can be interpreted and/or modified. Compared with lab experiments, these mathematical models can account for the uncertainty of the system. Their limitation, however, is that they cannot be directly compared with real biological data. Immune system modeling capabilities may move us toward designing drugs with fewer side effects. Therefore, integrating Systems Biology with immunoinformatics can lead to better clinical trials.