Introduction

The human epidermal growth factor receptor 2 (HER-2/neu/ErbB2) is a 185 kDa transmembrane glycoprotein (Tu et al. 2007). HER-2 is a member of the epidermal growth factor receptor (EGFR) family of tyrosine kinase receptors (Renard and Leach 2007), which also includes the HER-1, HER-3 and HER-4 receptors (Yarden and Sliwkowski 2001). Members of this family of transmembrane receptors (the ErbB family) are potent mediators of normal cell growth and development (Hynes and Lane 2005; Mendelsohn and Baselga 2003). ErbB2 amplification and overexpression have been reported in a number of human tumors, including 18–25% of human breast cancers depending on the diagnostic technique used and other factors (Slamon et al. 1987; Owens et al. 2004; Yaziji et al. 2004), as well as in subsets of patients with ovarian cancer (Vermeij et al. 2008), gastric carcinoma (Jaehne et al. 1992) and salivary gland tumours (Cornolti et al. 2007). Studies show that approximately 25% of breast cancer patients have HER-2 positive tumors (Morrow et al. 2009). HER-2 positive tumors tend to grow and spread more quickly than HER-2 negative tumors (Akhoon et al. 2010). Monitoring HER-2 is important because it is associated with very aggressive breast tumors and with the high risk of relapse that often accompanies these tumors (Lippman 2008; Riccio et al. 2009; Cho et al. 2003).

ErbB2 expression in tumor cells is usually retained after the development of trastuzumab resistance. Active vaccination aimed at initiating or enhancing endogenous ErbB2-specific immune responses may therefore offer a valuable alternative treatment. DNA vaccines are of particular value because additional DNA sequences that may improve T cell activation can be added to the vaccine vector, with minimal toxicity. Unlike passive immunotherapy with antibodies, antigen-specific vaccination has the potential to induce a broad spectrum of immune effector mechanisms, including CD4+ and CD8+ T cell responses (Prehn and Main 1957; Klein et al. 1960). Vaccination of patients with breast cancer could induce an expansion of CD8+ cytotoxic T lymphocytes (CTLs) capable of rejecting tumor cells by recognizing tumor-associated antigen (TAA) epitopes presented on the surface of cancer cells by human leukocyte antigen (HLA) class I molecules. The antigens used in breast cancer vaccination strategies can be whole tumor cells or dendritic cells (either allogeneic or autologous) or specific TAAs, delivered as DNA (naked or carried in recombinant viruses), RNA, protein or HLA class I/II restricted peptide epitopes (Nencioni et al. 2004). The immunogenicity of unique tumor antigens was recognized in several seminal studies, including animal transplant models (Prehn and Main 1957) and chemically or UV light–induced tumors (Klein et al. 1960; Kripke 1974). These antigens are of particular interest because they result from somatic mutations in individual tumors and are absent from normal tissues (Wortzel et al. 1983; Gilboa 1999), providing antitumor specificity without anticipated deleterious autoimmunity. Endogenous ErbB2-specific CD4+ T cells and antibodies have been detected in patients with different ErbB2-expressing cancers (Wortzel et al. 1983; Gilboa 1999; Bei et al. 1999), and in clinical trials ErbB2-specific CD4+ and CD8+ T cell responses could be induced by peptide vaccination (Disis et al. 2002; Knutson et al. 2002).

Antigen presenting cells (APCs) play a central role in immune responses against tumors. The transfer of tumor antigens from APCs to T lymphocytes is a key event in the initiation of lymphocyte-mediated immunity against tumors. APCs include dendritic cells (DCs), macrophages and B lymphocytes, and DCs are recognized as the most potent APCs for the uptake, processing and presentation of tumor antigens. DCs expressing activation markers have been reported in various human tumors (Scarpino et al. 2000; Lespagnard et al. 1999; Enk et al. 1997; Zeid and Muller 1993; Tsujitani et al. 1990), and most studies have found that tumor-infiltrating DCs are a positive prognostic indicator in cancer patients (Pinzon-Charry et al. 2005; Vicari et al. 2002; Yang and Carbone 2004).

Immunoinformatics has recently emerged as a critical field for accelerating immunology research. Although the field is still evolving, computational models now play instrumental roles, not only in directing the selection of key experiments but also in formulating new testable hypotheses through detailed analysis of complex immunologic data that could not be achieved with traditional approaches alone. For a vaccine to be effective it must evoke a strong response from both T cells and B cells; therefore, epitope mapping is a central issue in vaccine design (De Groot and Berzofsky 2004; De Groot 2006).

HER-2 is expected to be an excellent target for the development of vaccines specific for HER-2 overexpressing human cancers (Lee et al. 2003), as HER-2 specific antibodies and T cells have been detected in a number of breast cancer patients (Disis et al. 1994, 1997). In the present study, the HER-2 protein was targeted to design a potential DNA vaccine against breast cancer. We used in silico techniques to identify promiscuous, structurally discontinuous and sequentially continuous immunogenic B cell and T cell epitopes, with the aim of eliciting strong antigenic and immune responses. The final phase of the work was reverse translation and optimization of the antigenic sequence and its introduction into a suitable expression vector for cloning. An unguided experimental search for antigenic and immunogenic regions is inherently laborious and resource intensive. Computational approaches can speed up the process and have the potential to greatly simplify epitope identification. Hence, novel vaccine candidates can be rapidly identified in silico before being subjected to in vitro confirmatory studies (Gupta et al. 2010a).

Materials and methodology

Sequence retrieval and antigenicity check

The 624 amino acid residue sequence of HER-2 from Homo sapiens (gene id: 48425583) was retrieved from the Entrez protein database at NCBI (http://www.ncbi.nlm.nih.gov/) and submitted to the Immune Epitope Database and Analysis Resource (IEDB) to predict its antigenicity. A window size of 7 and a threshold value of 1 were used as input parameters.
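The antigenicity check above is a sliding-window average over a per-residue propensity scale. The Python sketch below illustrates that computation with the window size (7) and threshold (1.0) used here; it is not IEDB's code. The function names are hypothetical, and the caller must supply the published Kolaskar and Tongaonkar propensity values, which are not reproduced here; the two-letter scale in the usage note is a toy.

```python
# Sliding-window antigenicity scan in the style of Kolaskar and Tongaonkar (1990).
# ASSUMPTION: `scale` is supplied by the caller and must contain the published
# per-residue antigenic propensities; no real values are included here.


def window_scores(sequence, scale, window=7):
    """Mean propensity of every full window; entry i covers residues i..i+window-1."""
    if not 1 <= window <= len(sequence):
        raise ValueError("window must lie between 1 and the sequence length")
    values = [scale[residue] for residue in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def antigenic_regions(sequence, scale, window=7, threshold=1.0):
    """0-based inclusive (start, end) runs of window-centre residues scoring above threshold."""
    scores = window_scores(sequence, scale, window)
    offset = window // 2  # each score is assigned to the centre residue of its window
    regions, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i  # a high-scoring run begins
        elif score <= threshold and start is not None:
            regions.append((start + offset, i - 1 + offset))
            start = None
    if start is not None:  # close a run that reaches the last window
        regions.append((start + offset, len(scores) - 1 + offset))
    return regions
```

For example, `antigenic_regions("GGGGAAAAAAAGGGG", {"A": 1.2, "G": 0.8}, window=3)` returns `[(4, 10)]`, the stretch of high-propensity residues. Because scores are assigned to window centres, residues within `window // 2` of either terminus are never reported.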

In silico prediction of B cell epitopes

A B cell epitope is the discrete surface region of an antigenic protein that is bound by the variable domain of an antibody. The production of antibodies specific for a tumor can boost host immunity. B cell epitopes can be divided into continuous (linear) and discontinuous (conformational) epitopes. Linear epitopes are defined by the primary amino acid sequence of a particular region of a protein, so the residues that interact with the antibody lie next to each other in the sequence, whereas conformational epitopes consist of residues that are separated in the sequence but brought together in the folded protein to form a three-dimensional interface.

Discontinuous epitope prediction

DiscoTope (http://www.cbs.dtu.dk/services/DiscoTope/) was used to predict discontinuous B cell epitopes. It predicts discontinuous epitopes from protein 3D structures in PDB format, using amino acid statistics, spatial information and surface accessibility derived from a compiled data set of discontinuous epitopes determined by X-ray crystallography of antibody/antigen complexes. Because discontinuous epitope prediction requires a 3D structure as input, the PDB file of HER-2 (1S78|A) was downloaded from the Protein Data Bank (http://www.pdb.org/). Manual inspection of the coordinate file revealed that some residues present in the 1S78|A sequence were missing from chain A of the PDB file (1S78), so a 3D model of HER-2 was built. Comparative modeling was performed with two automated homology modeling servers, SWISS-MODEL (Schwede et al. 2003) and CPHmodels 3.0 (Nielsen et al. 2010), and with the standalone MODELLER 9v7 software (Sali and Blundell 1993). The models were then validated with Ramachandran plots obtained from PROCHECK (Laskowski et al. 1993), and energy minimization was performed to optimize the protein model (see supplementary material for details). The best HER-2 model was then used as input for discontinuous epitope prediction at the default threshold of −7.7, which corresponds to a specificity of 75%. To improve prediction accuracy, B cell epitopes were also predicted with the Epitopia server (http://epitopia.tau.ac.il), which implements a machine-learning algorithm that predicts immunogenic regions as candidate B cell epitopes from either the 3D structure or the sequence of a protein. High-scoring epitopes predicted by both servers were finally selected as discontinuous (conformational) epitopes.
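The final selection step above keeps only epitope residues supported by both DiscoTope and Epitopia. A minimal Python sketch of that consensus follows, assuming each server's output has already been parsed into a mapping from residue number to per-residue score (parsing is not shown). `consensus_discontinuous` is a hypothetical helper, the Epitopia cut-off is a hypothetical parameter because none is stated in the text, and treating scores equal to a threshold as hits is an assumption.

```python
DISCOTOPE_THRESHOLD = -7.7  # default DiscoTope cut-off quoted above (about 75% specificity)


def consensus_discontinuous(discotope_scores, epitopia_scores, epitopia_threshold):
    """Residue numbers predicted as epitope residues by BOTH servers.

    Both inputs map residue number -> per-residue score. Scores at or above a
    threshold count as hits (an assumption about boundary handling), and
    epitopia_threshold is hypothetical: the text does not state one.
    """
    disco_hits = {res for res, score in discotope_scores.items()
                  if score >= DISCOTOPE_THRESHOLD}
    epitopia_hits = {res for res, score in epitopia_scores.items()
                     if score >= epitopia_threshold}
    return sorted(disco_hits & epitopia_hits)  # residues supported by both predictors
```

With toy scores, `consensus_discontinuous({10: -5.0, 11: -8.0, 12: -7.7, 13: 1.0}, {10: 0.9, 11: 0.9, 12: 0.2, 13: 0.8}, 0.5)` returns `[10, 13]`: residue 11 fails the DiscoTope cut-off and residue 12 fails the Epitopia one.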

Continuous epitope prediction

Continuous B cell epitopes were predicted with the IEDB (http://www.immuneepitope.org/) and BcePred (http://www.imtech.res.in/raghava/bcepred/) tools using four algorithms: (a) Chou and Fasman beta-turn prediction, (b) Emini surface accessibility prediction, (c) Karplus and Schulz flexibility prediction and (d) Parker hydrophilicity prediction. BcePred predicts epitopes with 58.7% accuracy when flexibility, hydrophilicity, polarity and surface properties are combined at a threshold of 2.38; to improve prediction accuracy, we also predicted epitopes with IEDB on the four properties listed above. The predictions were then filtered, and only conserved epitopes with high prediction scores were mapped as continuous B cell epitopes.
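If "conserved" is read as predicted by both BcePred and IEDB (our assumption), the filtering step reduces to intersecting the residue ranges returned by the two tools. A minimal Python sketch follows; the function names are hypothetical, and the minimum length of 6 residues is a hypothetical filter not stated in the text.

```python
def to_residues(intervals):
    """Expand inclusive (start, end) residue intervals into a set of positions."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def to_intervals(positions, min_length=1):
    """Collapse positions into inclusive (start, end) runs of at least min_length."""
    runs = []
    for pos in sorted(positions):
        if runs and pos == runs[-1][1] + 1:
            runs[-1][1] = pos  # extend the current run
        else:
            runs.append([pos, pos])  # start a new run
    return [(start, end) for start, end in runs if end - start + 1 >= min_length]


def consensus_linear_epitopes(bcepred, iedb, min_length=6):
    """Stretches predicted by both tools; min_length=6 is a hypothetical filter."""
    return to_intervals(to_residues(bcepred) & to_residues(iedb), min_length)
```

For example, `consensus_linear_epitopes([(10, 25), (40, 45)], [(15, 30), (44, 50)])` returns `[(15, 25)]`; the two-residue overlap 44 to 45 is dropped by the length filter.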

In silico prediction of T cell epitopes

T lymphocytes play a central role in generating protective immune responses in many microbial infections (Esser et al. 2003). The binding strength of T cell epitopes to major histocompatibility complex (MHC or HLA) molecules is a key determinant of T cell epitope immunogenicity: epitopes with higher binding affinities are more likely to be displayed on the cell surface, where they are recognized by the corresponding T cell receptor (TCR) (De Groot and Martin 2009). To integrate MHC class I and class II T cell epitopes into our antigenic sequence, we used four immunoinformatic tools for T cell epitope prediction (IEDB, HLApred, ProPred and EpiToolKit). In IEDB we selected the ANN algorithm, all epitope lengths and a cut-off of ≤50 to discriminate high affinity binders (IC50 < 50 nM) from those with lower affinity. HLApred covers 87 alleles, of which 51 belong to class I and 36 to class II; we selected all 87 alleles to predict both class I and class II epitopes using the server's default parameters. ProPred, a graphical web tool for predicting class II binding regions in antigenic protein sequences, was also used with all 51 alleles available in the tool. For class II prediction, the sequence was supplied in single-letter amino acid code with the server's default parameters, and only epitopes with the highest prediction scores were selected for epitope mapping. T cell epitopes were also predicted with EpiToolKit, which offers a variety of prediction methods for class I and class II ligands as well as minor histocompatibility antigens; the SYFPEITHI method was selected. From all of the above servers, only epitopes that did not overlap with B cell epitopes were used.
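The two filtering rules above, keeping only high affinity binders (IC50 < 50 nM) and then discarding any peptide that overlaps a B cell epitope, can be sketched in Python as follows. The `Prediction` record, the residue numbering and the example peptides are hypothetical; real predictions would first have to be parsed from each server's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One predicted MHC binder (hypothetical record layout)."""
    peptide: str
    start: int       # 1-based position of the first residue in the antigen
    allele: str
    ic50_nm: float

    @property
    def end(self) -> int:
        return self.start + len(self.peptide) - 1


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive residue ranges share at least one position."""
    return a_start <= b_end and b_start <= a_end


def select_t_cell_epitopes(predictions, b_cell_intervals, ic50_cutoff=50.0):
    """Keep high affinity binders that do not touch any B cell epitope."""
    kept = []
    for p in predictions:
        if p.ic50_nm >= ic50_cutoff:
            continue  # weaker binder than the IC50 < 50 nM criterion
        if any(overlaps(p.start, p.end, s, e) for s, e in b_cell_intervals):
            continue  # would disturb an already mapped B cell epitope
        kept.append(p)
    return kept
```

With toy inputs, a peptide at residues 1 to 9 with IC50 12 nM is kept, one at 20 to 28 is dropped because it overlaps a B cell epitope at 25 to 35, and one with IC50 250 nM is dropped as a weak binder.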

T cell conformational epitopes

T cell epitopes may also act as conformational epitopes (Blythe and Flower 2005; Greenbaum et al. 2007) that bind a specific monoclonal antibody (mAb) and thereby generate memory B cell responses. Molecular docking of T cell epitopes with an antibody is an in silico method for identifying this conformational potential of T cell epitopes (Rajkannan and Malar 2007; Von Goethe 2008; Gupta et al. 2009). A mAb specific for a human B cell antigen (PDB id 2HFG) was retrieved from the PDB, and antibody modeling was then performed with the Model Antibody Loops protocol of Accelrys Discovery Studio 2.5. Loops carrying conserved signature residues (Chothia and Lesk 1987; Morea et al. 2000) were identified and then optimized with Superlooper (Hildebrand et al. 2009).

The three-dimensional structures of the identified T cell epitopes were predicted by homology modeling with MODELLER. The epitope models were energy minimized with the GROMOS96 force field to remove bad contacts. Epitopes in their most stable conformation were docked onto the optimized antigen binding domain of the mAb with the LigandFit protocol of the Cerius2 suite of programs (version 4.8, Accelrys Software Inc.).

Consensus epitope integration

The Rankpep server was used to predict 9-mer consensus peptides for both MHC-I and MHC-II molecules. To enhance the immunogenicity of the construct, all consensus epitopes were inserted into the sequence without altering the conformational B cell epitopes of HER-2: residues within 10 Å on either side of each discontinuous (conformational) B cell epitope in the query sequence were selected and left unchanged. These steps were performed with DeepView 3.7.
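Protecting the conformational epitopes amounts to a distance search: every residue within 10 Å of an epitope residue is frozen, and substitutions are allowed only in segments that contain no frozen residue. The Python sketch below assumes one representative coordinate per residue (for example its Cα atom); DeepView's neighbour selection may use all atoms, so its results can differ. The function names and coordinates are hypothetical.

```python
import math


def residues_within(coords, epitope, cutoff=10.0):
    """Non-epitope residues lying within `cutoff` Å of any epitope residue.

    coords: residue number -> (x, y, z), one point per residue (assumption).
    epitope: set of residue numbers forming the conformational epitope.
    """
    neighbours = set()
    for res, xyz in coords.items():
        if res in epitope:
            continue
        if any(math.dist(xyz, coords[e]) <= cutoff for e in epitope):
            neighbours.add(res)
    return neighbours


def substitutable(segment_start, segment_end, protected):
    """True if the inclusive segment contains no protected residue."""
    return not any(r in protected for r in range(segment_start, segment_end + 1))
```

For a toy chain with points spaced 3.8 Å apart along x and residue 1 as the epitope, residues 2 and 3 (3.8 and 7.6 Å away) are frozen, residue 4 (11.4 Å) is not, so the segment 4 to 5 may be replaced while 3 to 5 may not.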

Back translation and codon adaptation analysis

Since our objective was to design a DNA vaccine, the protein sequence was back-translated into a nucleotide sequence with the Reverse Translate tool of the Sequence Manipulation Suite (SMS) (http://www.bioinformatics.org/sms2/rev_trans.html). The tool accepts a protein sequence as input and uses a codon usage table to generate a DNA sequence representing the most likely non-degenerate coding sequence. The Homo sapiens codon usage table was selected for back-translation to obtain optimal expression of the construct. The GenScript Rare Codon Analysis tool (http://www.genscript.com/cgi-bin/tools/rare_codon_analysis) was used to check the expected expression quality of the constructed antigenic insert.
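The back-translation described above replaces each residue with its most frequently used codon. A minimal Python sketch follows; it is not the SMS implementation, the function names are hypothetical, and the codon table uses made-up frequencies for three amino acids only. A real run would load the full Homo sapiens codon usage table.

```python
# Toy codon usage with made-up frequencies; NOT the Homo sapiens table.
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"CTG": 0.4, "CTC": 0.2, "TTA": 0.1},
}


def most_frequent_codons(usage):
    """Map each amino acid to its highest-frequency codon."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage.items()}


def back_translate(protein, usage):
    """Most-likely non-degenerate DNA sequence for a protein sequence."""
    best = most_frequent_codons(usage)
    missing = sorted(set(protein.upper()) - set(best))
    if missing:
        raise ValueError(f"no codon usage entry for: {', '.join(missing)}")
    return "".join(best[aa] for aa in protein.upper())
```

With the toy table, `back_translate("MKL", TOY_USAGE)` returns `"ATGAAGCTG"`. Because every residue receives its top codon, a sequence built this way has a codon adaptation index of 1 relative to the same table.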

In silico cloning

The mammalian expression vector pSecTag2B was used as the backbone of the DNA vaccine. In this vector, the antigenic determinants are expressed under the control of the CMV promoter as an in-frame fusion with a vector-encoded Igκ leader sequence for secretion, followed by COOH-terminal Myc and His tags. Specific targeting of the expressed gene product to APCs can enhance the efficiency of ErbB2 DNA vaccines, so to facilitate targeted delivery of tumor antigens to APCs, we fused the designed antigenic insert (ErbB2_624) to the extracellular domain of human CTLA-4. To construct plasmid pSecTag2-CTLA-4-ErbB2_624, a cDNA fragment encoding a CTLA-4-ErbB2 fusion protein consisting of the extracellular domain of human CTLA-4 (CTLA-4_124) and an N-terminal 624-amino acid fragment of human ErbB2 (ErbB2_624) was generated with CLC Main Workbench. The vector was digested with the BamHI and XhoI restriction enzymes, and the antigenic construct (CTLA-4_124-ErbB2_624) was inserted into BamHI/XhoI-digested pSecTag2B.
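The BamHI/XhoI cloning step can be checked in silico before any wet-lab work. The Python sketch below uses the BamHI (G^GATCC) and XhoI (C^TCGAG) recognition sites and works on the top strand only. It checks that the insert length is divisible by 3, that the insert has no internal BamHI or XhoI site and no in-frame stop codon that would truncate the C-terminal Myc/His fusion, and then builds the recombinant sequence. The sequences are toys and the function names are hypothetical; the reading frame relative to the Igκ leader of pSecTag2B is not checked here, as that requires the vendor's vector map.

```python
BAMHI = "GGATCC"  # BamHI recognition site, cut G^GATCC
XHOI = "CTCGAG"   # XhoI recognition site, cut C^TCGAG
STOPS = {"TAA", "TAG", "TGA"}


def check_insert(orf):
    """List problems that would break directional BamHI/XhoI cloning of a tag fusion."""
    problems = []
    if len(orf) % 3:
        problems.append("length is not a multiple of 3")
    if BAMHI in orf or XHOI in orf:
        problems.append("internal BamHI or XhoI site")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    if any(codon in STOPS for codon in codons):
        problems.append("in-frame stop codon would truncate the C-terminal tag fusion")
    return problems


def clone_bamhi_xhoi(vector, orf):
    """Top strand after replacing the BamHI..XhoI stretch of `vector` with `orf`.

    Ligating compatible sticky ends re-forms both recognition sites.
    """
    bam = vector.find(BAMHI)
    xho = vector.find(XHOI, bam + len(BAMHI))
    if bam < 0 or xho < 0:
        raise ValueError("vector needs a BamHI site followed by an XhoI site")
    return vector[:bam] + BAMHI + orf + XHOI + vector[xho + len(XHOI):]
```

For a toy vector `"AAAAGGATCCTTTTCTCGAGGGGG"` and insert `"ATGGCTAAA"`, `check_insert` reports no problems and the cloned product is `"AAAAGGATCCATGGCTAAACTCGAGGGGG"`.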

Results and discussion

HER-2 is a well-defined cancer-related immunogenic protein (tumor antigen) that is expressed by multiple tumor types. The HER-2 sequence (gene id: 48425583) was retrieved from NCBI, and its antigenicity was checked with the Kolaskar and Tongaonkar method (Kolaskar and Tongaonkar 1990) at a threshold of 1.00; the results indicated that this protein can elicit a strong immune response (Fig. 1). The sequence was therefore subjected to B and T cell epitope mapping.

Fig. 1

Graph showing antigenicity scale of HER-2 protein sequence

Epitope-based vaccines appear to be capable of inducing more potent responses than whole protein vaccines (De Groot et al. 2005). Detecting highly immunogenic regions within a protein, specifically those that elicit a humoral immune response, is crucial to many immunodetection and immunotherapeutic applications (Irving et al. 2001; Westwood and Hay 2001). The DiscoTope server, which takes a PDB coordinate file as input, was used to predict discontinuous B cell epitopes. The PDB file of HER-2 (PDB id: 1S78|A) was downloaded from the Protein Data Bank. Unfortunately, manual inspection showed that the PDB file lacks some continuous stretches of amino acids that are present in the sequence. This compelled us to build a homology model of the HER-2 sequence. The models built by the two servers were manually checked for missing residues, and they were also found to lack amino acids present in the sequence. Hence, the widely used standalone homology modeling program MODELLER was used to build the HER-2 model from the seven best templates (Supplementary Fig. 1) with its multi-template modeling protocol. Ten models were generated with MODELLER, and the quality of the modeled structures was assessed with Ramachandran plots from the PROCHECK validation package. Model 1 was selected because it had the largest number of residues in the core region: 87.2% core, 10.8% allowed, 1.7% generously allowed and 0.4% disallowed. Because several residues fell in the disallowed regions of the Ramachandran plot, energy refinement using the steepest descent and conjugate gradient techniques was applied to all amino acids whose phi/psi angles lay outside the core region. After refinement, 87.9% of the residues were in the core region, 11.3% in the allowed region, 0.8% in the generously allowed region and none in the disallowed region (Supplementary Fig. 2A), indicating that the backbone dihedral angles phi and psi in the model were reasonably accurate. We also compared the Ramachandran plot of the crystal structure of chain A of 1S78 with that of our model: in 1S78|A, 81.6% of the residues were in the core region, 17.3% in the allowed region, 1.1% in the generously allowed region and none in the disallowed region (Supplementary Fig. 2B). On the basis of this distribution, our 3D model has a larger fraction of residues in the core region than 1S78|A. In addition, compatibility scores above zero in the VERIFY-3D graph confirmed acceptable side chain environments (Akhoon et al. 2010) in our model; the VERIFY-3D energy profile showed 84.00% of the residues with an average 3D-1D score > 0.2. The optimized HER-2 model built by multi-template modeling was deposited in the Protein Model DataBase (PMDB) of Università e Ricerca, Via dei Tizii, Rome, Italy under PMDB ID: PM0076332. Structural differences between the optimized model and the HER-2 structure available in the PDB (PDB id: 1S78|A) were visualized with UCSF Chimera v.1.4.1 (Pettersen et al. 2004) using its structural alignment (superposition) protocol. Superposition of the 1S78|A crystal structure and the modeled human HER-2 showed that the two structures are substantially similar (Supplementary Fig. 3A and 3B), with a Cα RMSD of 1.9 Å. An RMSD within Chimera's default threshold (2.0 Å) places the structures in the same cluster. This result is reassuring, since significant structural alterations could affect or modify the activity of the modeled protein (Srivastava et al. 2010).
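The Cα RMSD quoted above is the root-mean-square distance between matched Cα atoms after superposition, RMSD = sqrt((1/N) Σ |a_i − b_i|²). A minimal Python sketch of that formula follows, assuming the two coordinate lists are already superposed (for example by Chimera's alignment) and paired residue by residue; it performs no fitting itself, and `ca_rmsd` is a hypothetical name.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (Å) between two equal-length lists of already superposed Cα coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

For two toy two-atom structures offset by 1 Å along z, `ca_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])` gives 1.0. If the superposition step prunes poorly matching pairs, the RMSD covers only the retained pairs.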

The central part of the polypeptide chain is fundamental to forming a conformational epitope that interacts with a mAb, since the solubility of the epitope's core region appreciably enhances epitope binding to the mAb. The central amino acid fragment (LNNTTPVTGA), which was missing from the 1S78|A PDB entry, was therefore modeled successfully, and the terminal stretch was also modeled properly. The optimized HER-2 structure was then given as input to the DiscoTope and Epitopia servers to predict discontinuous B cell epitopes. Epitopia was benchmarked by Rubinstein and collaborators against DiscoTope (Haste Andersen et al. 2006), CEP (Kulkarni-Kale et al. 2005) and ElliPro (Ponomarenko et al. 2008), and succeeded in 59 of 66 predictions, a success rate of 89.4% (Rubinstein et al. 2009). Only epitopes predicted by both servers with high prediction scores were selected as conformational B cell epitopes (Fig. 2).

Fig. 2

The positional arrangement of discontinuous B cell epitopes in the 3D structure (epitopes are shown in green) (color figure online)

BcePred (accuracy 58.70% at threshold 2.38) was employed with default parameters for continuous B cell epitope prediction. Epitopes were predicted on the basis of four properties: hydrophilicity, flexibility/mobility, accessibility and turns. To improve prediction accuracy, IEDB with a window size of 7 was also used to predict high affinity epitopes. Epitopes were finally selected such that only high-scoring, conserved epitopes that did not overlap with conformational epitopes were mapped as vaccine candidates (Fig. 3).

Fig. 3

Continuous B cell epitopes predicted from various antigenic properties, with their prediction scores. Epitopes are distinguished by color: purple (accessibility), green (β-turn), red (hydrophilicity), yellow (flexibility) (color figure online)

T cell epitope content is the major contributor to antigenicity, so T cell epitopes of HER-2 were predicted with four popular immunoinformatic tools (HLApred, IEDB, ProPred and EpiToolKit). Both MHC-I and MHC-II epitopes were predicted with HLApred for 87 alleles, of which 51 belong to class I and 36 to class II. Two class I epitopes and a single class II epitope were finally selected, because most epitopes were rejected for overlapping with the already fitted B cell epitopes. IEDB with the ANN algorithm was used to predict class I epitopes of all lengths, and 199 high affinity epitopes were predicted at the cut-off threshold. Of these 199 epitopes, only 17, corresponding to 62 HLA alleles, were selected because they showed no overlap with the B cell epitopes. ProPred with default parameters was used to predict class II epitopes, and 51 peptides with the highest binding affinity for 51 alleles were predicted. We selected only the 4 best peptides, belonging to 12 alleles, on the basis of the epitope fitting criteria. EpiToolKit with the SYFPEITHI method was also used to predict class I binding peptides, and only six epitopes related to 11 alleles were selected. The remaining non-binding regions of HER-2 were substituted with high-binding consensus peptides in such a way that no substitution overlapped a conformational B cell epitope. This was achieved by identifying the amino acid residues within 10 Å of the conformational epitopes, since DiscoTope makes its predictions using residue contacts within a 10 Å area, and keeping these neighboring residues (Table 1) conserved. We substituted only non-binding segments lying outside the 10 Å area on either side of a conformational epitope, so that the substituted amino acids would not distort the conformational B cell epitopes through changes in neighboring residues. To further enhance immune responses, we predicted consensus epitopes with the RANKPEP server, and 5 consensus epitopes were finally substituted into non-immunogenic regions. To enhance the potency of the vaccine, we also added a 9-mer peptide (AVVGILLVV) that has been reported as the most powerful therapeutic peptide for eliciting robust antitumor responses (Gritzapis et al. 2010). The final selection of T cell epitopes is shown in Fig. 4, and all epitopes used for model fitting, other than the conformational B cell epitopes, are listed in Supplementary Table 1. Because MHC is highly polymorphic, MHC molecules in different patients typically bind different repertoires of peptides, so it is crucial to identify the optimal set of peptides for a vaccine under constraints such as MHC allele probabilities in the target population and the maximum number of selected peptides. The most common HLA allele in the general population has been reported to be HLA-A*0201, which accounts for 30–40% of the major ethnicities (Rosenberg 2001). Our construct incorporates 4 epitopes (QLFEDNYAL, KIFGSLAFL, ILHNGAYSL, TLQGLGISWL) restricted by HLA-A*0201, along with epitopes for other widely distributed HLAs such as A*0301 (ELHCPALVTY, RVLQGLPREY), B*0702 (GPEADQCVA) and DRB1*1101 (FQNLQVIRG), which should broaden coverage across the major ethnicities.
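The trade-off described above, covering as many patients as possible with a limited set of peptides, is often quantified as population coverage. The Python sketch below is a simplified calculation under stated assumptions: Hardy–Weinberg equilibrium within each HLA locus and independence between loci (linkage disequilibrium is ignored). For a locus where the epitopes cover alleles with combined gene frequency p, the fraction of individuals carrying at least one covered allele is 1 − (1 − p)². The function names are illustrative and the allele frequencies in the usage note are hypothetical, not population data.

```python
from math import prod


def locus_coverage(allele_freqs):
    """Fraction of individuals with at least one covered allele at one locus (Hardy-Weinberg)."""
    p = sum(allele_freqs)
    if not 0 <= p <= 1:
        raise ValueError("summed allele frequencies must lie in [0, 1]")
    return 1 - (1 - p) ** 2


def population_coverage(freqs_by_locus):
    """Combine loci assuming independence: 1 minus the probability of being uncovered everywhere."""
    return 1 - prod(1 - locus_coverage(freqs) for freqs in freqs_by_locus.values())
```

With hypothetical gene frequencies of 0.2 at one locus and 0.1 at another, `population_coverage({"HLA-A": [0.2], "HLA-B": [0.1]})` gives about 0.48, higher than either locus alone (0.36 and 0.19).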

Table 1 List of various neighboring amino acids of conformational B cell epitopes within 10 Å area
Fig. 4

T cell epitopes and their allele frequency. Epitopes from various immunoinformatic tools are discriminated by color codes. Blue (IEDB-MHC I), dark red (HLApred-MHC I), dark green (EpiToolKit-MHC I), red (HLAPred/PROPRED-MHC II), olive green (IEDB-MHC I/PROPRED-MHC II), orange (PROPRED-MHC II), grey (IEDB-MHC II), purple (epitope with proved potent antitumor properties), green (Rankpep) (color figure online)

In the multi-domain structure of a mAb, VL and VH are the variable domains of the light and heavy chains, respectively. In most antibody structures, VL and VH each contribute three hypervariable loops (L1, L2 and L3 in VL; H1, H2 and H3 in VH), which together form the antigen binding site, or complementarity determining regions (CDRs). Five of the six hypervariable loops (L1, L2, L3, H1 and H2) adopt a limited set of conformations, the canonical conformations, whereas the conformation of loop H3 is often highly variable. These canonical conformations are defined by the loop length and by conserved signature residue positions (Chothia and Lesk 1987; Morea et al. 2000). The key signature residue Pro95 (Table 2) was identified in the L3 loop (Fig. 5), so this loop was selected for docking. The energy of the L3 loop was optimized with Superlooper (Hildebrand et al. 2009), and the best loop conformation was selected on the basis of Superlooper's best-fit loop parameters (Fig. 6). The mAb docking energies are listed in Supplementary Table 2. Various workers have reported that not all conformational epitopes are discontinuous (Rajkannan and Malar 2007; Gupta et al. 2009). The epitope-mAb docking studies indicate the conformational potential of T cell epitopes to generate B cell immune responses. Of the 34 T cell epitopes, YCFGGGHKR was identified by mAb screening as the best conformational T cell epitope. Docking scores comprising all possible interactions between ligand and receptor were calculated: YCFGGGHKR had the highest score (119.56), followed by FLQDIQEVQ (113.44), and TQVCTGTDM and TYLPTNASL also scored well (112.34 and 111.49, respectively). These findings are noteworthy because such conformational epitopes may act as both T cell and B cell epitopes. Immunological studies have also shown evidence of humoral responses generated by T cell epitopes (del Guercio et al. 1997; Gupta et al. 2009).

Table 2 Loops identified by antibody modeling
Fig. 5

Three-dimensional structure of monoclonal antibody (2HFG) showing the binding region of epitopes in Kappa L3 loop

Fig. 6

Best conformation of optimized Kappa L3 loop, selected on the basis of best-fit loop parameters (RMSD stem: 0.053 Å, MinDist: 3.50 Å, MaxDist: 4.81 Å and Score: 0.157) of Superlooper

The epitope-fitted HER-2 protein sequence was reverse translated into a nucleotide sequence, using the Homo sapiens codon frequency table to replace each amino acid of the input sequence with its most frequently occurring codon. A Codon Adaptation Index (CAI) of 1.0 is considered ideal for expression of a construct, and the lower the value, the higher the chance that the gene will be poorly expressed. Our construct had a CAI of 1, so its expression is likely to be optimal (Fig. 7a). The ideal GC content range for Homo sapiens is 30–70%, and peaks outside this range adversely affect transcriptional and translational efficiency. The antigenic insert had an average GC content of 66.58%, within this range, which indicates that it can be transcribed and translated efficiently in the host organism (Fig. 7b). The percentage distribution of codons was also examined: a value of 100 is assigned to the codon with the highest usage frequency for a given amino acid in the desired expression organism, and codons with values below 30 are likely to hamper expression efficiency. In our insert, all codons had a value of 100 (Fig. 8), which also supports efficient expression; there is therefore little chance that the expression efficiency of the final antigenic insert will be hampered. The mammalian expression vector pSecTag2 was used as the vehicle for delivery and expression of the vaccine antigens. The DNA vector carries the genetic material encoding the antigen, allowing the antigen to be expressed in the host cell. Prolonged expression of the vector in the host cell has been attributed to cellular promoters, while the CMV promoter in the vector induces high-level constitutive expression in a variety of mammalian cell lines. DNA vaccines are commonly delivered by intramuscular injection, and most of the DNA is taken up by muscle cells. Cross-presentation of antigens from the transfected cells by APCs is believed to be the major mechanism that then allows activation of T cells (Rice et al. 2008). Hence, targeted transfer of antigen to APCs from other cell types expressing the vaccine construct could enhance cross-presentation and the efficiency of DNA vaccination.
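The CAI and codon-distribution figures above both rest on the relative adaptiveness of each codon, w = f(codon) / max f(synonymous codon), with the CAI taken as the geometric mean of w over the coding sequence. This also explains why a CAI of exactly 1 is expected here: when every residue is encoded by its most frequent codon, every w equals 1. The Python sketch below computes CAI, GC content and the codons scoring below 30 on the 0 to 100 scale. The codon table is a toy with made-up frequencies and the function names are hypothetical; real values should come from a Homo sapiens codon usage table, and tools such as GenScript's may use a different reference set.

```python
from math import exp, log

# Toy codon usage (made-up frequencies) for three amino acids; NOT human data.
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"CTG": 0.4, "CTC": 0.2, "TTA": 0.1},
}


def relative_adaptiveness(usage):
    """w(codon) = frequency / frequency of the most used synonymous codon."""
    w = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            w[codon] = freq / top
    return w


def codons_of(cds):
    """Split a coding sequence into complete codons (trailing bases ignored)."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def cai(cds, usage):
    """Geometric mean of w, skipping amino acids encoded by a single codon.

    Codons absent from `usage` (e.g. stop codons here) raise KeyError.
    """
    w = relative_adaptiveness(usage)
    single = {c for codons in usage.values() if len(codons) == 1 for c in codons}
    ws = [w[c] for c in codons_of(cds.upper()) if c not in single]
    if not ws:
        raise ValueError("no informative codons in sequence")
    return exp(sum(log(x) for x in ws) / len(ws))


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def low_usage_codons(cds, usage, cutoff=30):
    """Codons whose relative adaptiveness, on a 0 to 100 scale, falls below cutoff."""
    w = relative_adaptiveness(usage)
    return [c for c in codons_of(cds.upper()) if 100 * w[c] < cutoff]
```

With the toy table, `cai("ATGAAGCTG", TOY_USAGE)` is 1.0 because only top codons are used, whereas `low_usage_codons("ATGAAATTA", TOY_USAGE)` returns `["TTA"]`, whose relative adaptiveness is 25.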

Fig. 7

a Codon adaptation index (CAI) of antigenic insert in human and b average GC content adjustment of antigenic construct

Fig. 8

The percentage distribution of codons in antigenic construct. The value of 100 is set for the codon with the highest usage for a given amino acid in the desired expression organism

Protective and therapeutic effects of DNA vaccines that encode secreted proteins, which consist of the extracellular domain of CTLA-4 fused to fragments of the ErbB2/Neu and NY-ESO-1 tumor antigens have been investigated by Sloots et al. 2008. Besides, specific targeting of the expressed gene product to APC can enhance the efficiency of ErbB2 DNA vaccines has also been noticed. In a specific case study, although untargeted ErbB2222 protected 40% of BALB/c mice against challenge with transplanted tumor cells, vaccination with CTLA-4-ErbB2222 and CTLA-4-Neu224 elicited protective immunity in 80 and 100% of the treated animals against challenge with ErbB2- or Neu-expressing tumor cells, respectively (Sloots et al. 2008). Hence we have chosen human CTLA-4 domain for targeting of ErbB2 protein antigens to B7-expressing APC. We generated recombinant DNA vaccine that contains the extracellular domain of human CTLA-4 (Supplementary material) for specific targeting of an ErbB2 protein (HER-2_624) to B7-expressing APC (Supplementary Fig. 4). The respective gene products may carry the extracellular domain of CTLA-4 (CTLA-4_124) for binding to costimulatory B7 molecules on the surface of APC, fused to a complete epitope sequence of HER-2. In comparison with corresponding vaccines without a cell targeting domain, fusion to the CTLA-4 fragment has potential to augment vaccine activity and resulting in markedly enhanced protection against challenge with tumor cells expressing the respective antigens. 
Recombinant protein vaccines that contain the extracellular domain of CTLA-4 for specific targeting of an ErbB2 protein fragment to B7-expressing APC have been generated previously. Such CTLA-4-ErbB2 fusion proteins, purified from Escherichia coli lysates, bound to and entered B7-expressing cells in vitro and, in contrast to a corresponding untargeted ErbB2 protein fragment, induced potent protective immunity in mice against challenge with ErbB2-expressing renal carcinoma cells (Rohrbach et al. 2005). It is therefore likely that targeting the predicted ErbB2 protein antigens to B7-expressing APC will enhance the efficiency of the proposed HER-2 DNA vaccine in humans.

With the accelerating pace of immunoinformatics research on cancer vaccines, a new era of effective cancer immunotherapies is emerging (Sette and Fikes 2003). Searching for immunogenic peptides as vaccine candidates with traditional molecular immunology techniques is time-consuming and expensive (Gupta et al. 2010b). Bioinformatics now plays an important role in clinical informatics and drug discovery, providing computational methods and approaches that may improve vaccine efficacy in general. Many researchers (Chaitra et al. 2005; Parida et al. 2007; Wiwanitkit 2007; Doytchinova and Flower 2007) have successfully applied computational approaches to identify probable vaccine candidates, which supports the potential of in silico studies in vaccinology. Multi-antigen vaccines designed with in silico approaches may prevent the escape of genetically unstable tumor cells that have lost antigen expression.
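A common building block of the computational approaches mentioned above is scanning a protein with a sliding window of 9-mers (the typical length of HLA class I ligands) and scoring each window. The sketch below uses a simple additive position-specific scoring matrix; it is a hypothetical illustration, not one of the prediction servers cited in this work, and the matrix weights are invented. They only reward Leu/Met at position 2 and Val/Leu at position 9, loosely echoing the HLA-A*02:01 anchor motif.

```python
def windows(protein, k=9):
    """All overlapping k-mers with their 0-based start positions."""
    return [(i, protein[i:i + k]) for i in range(len(protein) - k + 1)]


def score(peptide, pssm):
    """Additive position-specific score: sum of per-position residue weights."""
    return sum(pssm.get(pos, {}).get(aa, 0.0) for pos, aa in enumerate(peptide))


def scan(protein, pssm, k=9, threshold=0.0):
    """Return (start, peptide, score) for windows scoring >= threshold, best first."""
    hits = [(i, pep, score(pep, pssm)) for i, pep in windows(protein, k)]
    return sorted((h for h in hits if h[2] >= threshold), key=lambda h: -h[2])


# Toy matrix with invented weights; keys are 0-based positions (1 = P2, 8 = P9).
TOY_PSSM = {1: {"L": 2.0, "M": 1.5}, 8: {"V": 2.0, "L": 1.5}}
```

Real predictors use trained matrices or machine-learning models per HLA allele, but the window-and-score structure is the same, which is why the cost of an in silico screen grows only linearly with protein length.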

Conclusion

It is now well established that breast cancer is an immunogenic tumor. Although passive immunotherapy with trastuzumab is approved for the treatment of breast cancer, a number of concerns exist with passive immunotherapy: treatment is expensive and has a limited duration of action, necessitating repeated administration of the monoclonal antibody. Active specific immunotherapy offers the possibility of generating sustained anti-HER-2 immune responses and is potentially more effective than passive approaches, particularly in primary or secondary cancer prevention. Most published clinical trials of HER-2 vaccines have focused on immunizing patients with fragments of the HER-2 protein or with peptides designed to stimulate a specific immune response. In this work, we sought to make the entire HER-2 sequence fully immunogenic by mapping nearly all types of high-affinity antigenic determinants relevant to breast cancer using various computational approaches. Non-immunogenic regions were excluded from the HER-2 sequence, and highly immunogenic consensus epitopes were integrated at selected locations without altering the conformational B cell epitopes or their neighboring residues. We believe that the proposed vaccine is a suitable means of inducing strong multiepitope B and T cell responses in breast cancer patients. The data reported in this study should therefore be valuable for further breast cancer vaccine development, since promiscuous peptide binders allow the total number of predicted epitopes to be minimized without compromising the population coverage required in vaccine design.
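Why promiscuous binders keep the epitope count low can be illustrated with a greedy selection sketch. Under a Hardy-Weinberg assumption, the fraction of individuals carrying at least one covered allele at a locus is 1 - (1 - p)^2, where p is the summed frequency of the covered alleles; epitopes are then picked one at a time, each time taking the one that adds the most coverage. The allele names, frequencies and epitope-to-allele sets are hypothetical, the coverage model is a simplified single-locus formulation, and the greedy strategy is a generic set-cover heuristic rather than the procedure used in this study.

```python
def population_coverage(alleles, freq):
    """Fraction of individuals carrying >= 1 allele in `alleles` at one locus.

    Assumes Hardy-Weinberg equilibrium: each individual carries two alleles,
    so coverage = 1 - (1 - summed frequency of covered alleles) ** 2.
    """
    p = sum(freq[a] for a in set(alleles))
    return 1.0 - (1.0 - p) ** 2


def greedy_epitope_selection(binders, freq, target=0.9):
    """Greedily pick epitopes until coverage reaches `target` or stops improving.

    `binders` maps an epitope name to the set of alleles it is predicted to bind.
    """
    chosen, covered = [], set()
    remaining = dict(binders)
    while remaining and population_coverage(covered, freq) < target:
        best = max(remaining,
                   key=lambda e: population_coverage(covered | remaining[e], freq))
        if not remaining[best] - covered:
            break  # no remaining epitope adds a new allele
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, population_coverage(covered, freq)


# Hypothetical single-locus allele frequencies and predicted binding sets.
TOY_FREQ = {"A1": 0.3, "A2": 0.3, "A3": 0.2, "A4": 0.2}
TOY_BINDERS = {
    "pepX": {"A1", "A2", "A3"},  # promiscuous binder
    "pepY": {"A1"},
    "pepZ": {"A4"},
    "pepW": {"A2"},
}
```

With these toy data and a 90% target, the single promiscuous epitope `pepX` already covers an estimated 96% of individuals, whereas the three single-allele epitopes together would be needed to approach the same coverage without it.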
Plasmid-based strategies may allow the simultaneous delivery of multiple antigens, but plasmid-based DNA vaccines have been only minimally effective when translated to human clinical use, largely because of inefficient local delivery of DNA into APC, as most vaccines are delivered into muscle. Therefore, to make our plasmid-based tumor antigen vaccine therapeutically effective, we added a domain that binds costimulatory B7 molecules (the extracellular domain of CTLA-4) to engage local APC in the skin and stimulate immunity. In silico analyses also indicated efficient transcription and translation, as well as high-quality expression, of the proposed construct in Homo sapiens. Although the DNA vaccine developed in this work is an acceptable candidate, clinical trials are required to validate its efficacy.