Introduction

Klebsiella pneumoniae, a member of the Enterobacteriaceae family, is an important cause of nosocomial infections and is responsible for significant morbidity and mortality in patients with deficiencies in the immune system (Tsai et al. 2010). Klebsiella is an extracellular parasite, and immunity against it is largely mediated by antibodies (Schulz 1996). In addition, several lines of evidence support an essential role of CD4+ T cells in protection against this pathogen (Hagen et al. 1998). Several studies have shown that major histocompatibility complex (MHC) class II is required for protection in mice (Hagen et al. 1998; Zisman et al. 1998). Additionally, CD4+ T cells have an important role in supporting both B cell and CD8+ T cell function (Li et al. 2010). These findings suggest that an effective immunotherapeutic strategy against K. pneumoniae should include B cell and CD4+ T helper (Th) cell epitopes. At present, most of the available vaccines against K. pneumoniae are based on native components such as capsular polysaccharides (CPSs) and lipopolysaccharides (LPSs) (Clements et al. 2008; Yadav et al. 2005). The toxic reactions that arise from LPS-based vaccines and the large number of antigens required in CPS-containing vaccines are critical disadvantages of such vaccines (Lundberg et al. 2013). An efficient strategy against K. pneumoniae therefore requires the exploration of new antigens and the construction of novel recombinants (Li et al. 2010).

Outer membrane proteins (Omps) are a series of channel proteins that span the outer membrane of gram-negative bacteria (Schulz 1996). They allow the permeation of a broad range of components that are necessary for the growth and function of the cell (Hong et al. 2006). In addition to their transport function, Omps act as virulence factors during bacterial infection (Galdiero et al. 2003). Moreover, sequence comparisons between members of each enterobacterial Omp family have indicated that their sequences are highly conserved (Braun and Cole 1984). Omps are therefore ideal candidates for vaccine development against Enterobacteriaceae, since they can induce a strong immune response and cover a wide spectrum of pathogens. Previous research has identified a number of Omps from K. pneumoniae, such as OmpA, FepA, OmpC, OmpX and OmpW, that are able to induce robust immune responses against this pathogen (Galdiero et al. 2003; Kurupati et al. 2006).

Active immunotherapy, such as an epitope-based vaccine, has recently drawn much attention in the treatment of infectious diseases (Yang and Yu 2009). Epitope-based vaccines are powerful stimulators of the cellular and/or humoral arms of the immune system (Bijker et al. 2007). These vaccines consist of highly immunogenic T and/or B cell epitopes, which elicit cytotoxic T lymphocyte (CTL), Th or B cell responses to specific epitopes (Baloria et al. 2012; Akhoon et al. 2011). B and Th cells play an important role in the induction of a protective immune response in many bacterial infections; thus, the identification of peptides that induce T and B cell responses is a crucial requirement for the design of effective epitope-based vaccines (Gupta et al. 2010, 2012). Epitope-based vaccines have several potential advantages, such as the ability to choose the type of immunity, cost-effective production and increased safety. However, they also have limitations, such as the low immunogenicity of a single epitope (Sbai et al. 2001). Different strategies, such as increasing the number of antigenic epitopes and inserting antigenic epitopes into immunogenic adjuvants or a carrier protein, are generally used to overcome these deficiencies (Yi et al. 2004; Coban et al. 2011).

Immunoinformatics and computational biology have made an indispensable contribution to the design of epitope-based vaccines. In this context, identifying potential epitopes of an antigenic protein by in silico methods shortens the lengthy process of discovering appropriate epitopes (Srivastava et al. 2011). Many online servers are now available for predicting B and T cell epitopes. The Immune Epitope Database (IEDB) website (Vita et al. 2010) provides tools to predict both B and T cell epitopes. Other online servers, such as MetaMHCII (Hu et al. 2010), ProPred (Singh and Raghava 2001), MHCpred (Guan et al. 2003) and SVMHC (Dönnes and Elofsson 2002), offer different tools for finding T cell epitopes. In addition, several web servers, such as DiscoTope (Kringelum et al. 2012) and CBTOPE (Ansari and Raghava 2010), predict conformational B cell epitopes of an antigen. However, determining the 3D structure of a discontinuous B cell polytopic construct is critical, since a construct representing discontinuous epitopes should mimic the structure of the epitopes in the antigenic protein (Ponomarenko and Regenmortel 2009). Owing to the technical difficulties and labor intensiveness of experimental techniques for the structural characterization of proteins, a number of designed epitope-based or chimeric protein vaccines have been modeled by computational approaches (Nezafat et al. 2014; Nazarian et al. 2012). Several methodologies exist for protein structure prediction, including comparative modeling, threading and ab initio modeling (Yang et al. 2015; Roy et al. 2010). Composite approaches, which combine various structure prediction techniques, have been shown to offer significant advantages (Srivastava et al. 2011). I-TASSER is one such server; it combines threading and ab initio structure prediction methods to obtain a full-length model (Yang et al. 2015; Roy et al. 2010).

In the present study, two multi-epitope vaccine constructs based on the Omps of K. pneumoniae were designed. In addition, to minimize the problems that can arise in the design of epitope-based vaccines and the resulting low efficiency, the design process was modified to create a broad-spectrum vaccine that covers virulent Enterobacteriaceae.

Materials and Methods

Retrieving Reference Sequences of OmpA, FepA, OmpW, OmpX and OmpC of K. pneumoniae

Complete putative OmpA (Accession number [AN]: NC_012731.1), FepA (AN: NZ_JQSE01000018.1), OmpW (AN: NZ_AJVY01000178.1), OmpX (AN: NZ_JQSE01000029.1) and OmpC (AN: NZ_JQSE01000032.1) sequences of K. pneumoniae, listed as reference sequences in the National Center for Biotechnology Information (NCBI) databases (http://ncbi.nlm.nih.gov/), were retrieved separately. The sequences were saved in FASTA format and used for subsequent analysis.

Entropy Plot and Alignment for Finding the Mutational/Conserved Regions

Thirteen sequences of OmpA of K. pneumoniae and other Enterobacteriaceae were retrieved from NCBI by direct searching. Eleven sequences of FepA, 14 sequences of OmpW, 22 sequences of OmpX and 12 sequences of OmpC were obtained by the same strategy. The selected sequences and their accession numbers are given in Online Resource 1. The retrieved sequences of each Omp were aligned, analyzed and trimmed separately using BioEdit software version 7.7.9. Partial sequences and areas with ambiguous alignment were omitted, and Shannon entropy values (Shannon 1948) were calculated separately for the retrieved sequences of each of the five proteins. Shannon entropy analysis measures the variable and conserved regions in a set of aligned sequences. The Shannon entropy score (Hx) ranges from 0 to 4.322 for every position in an alignment. Typically, positions with Hx ≤ 1.0 are considered highly conserved (Litwin and Jores 1992). Epitopes from highly conserved regions are likely to evoke stronger immune responses (Sánchez-Burgos et al. 2010; Gupta et al. 2011).
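The per-position entropy calculation can be illustrated with a minimal Python sketch. It is not the BioEdit implementation: the base-2 logarithm is an assumption chosen because it reproduces the stated maximum of 4.322 (log2 of 20 amino acids), gaps are simply ignored, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column; gap characters are ignored."""
    residues = [c for c in column if c not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(aligned_seqs):
    """Entropy of every column in a list of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([s[i] for s in aligned_seqs]) for i in range(length)]


# Hypothetical toy alignment, not taken from the study
toy_alignment = ["MKAT", "MKGT", "MRCT", "MKDT"]
profile = entropy_profile(toy_alignment)

# 1-based positions above the conservation threshold (Hx > 1.0)
variable_positions = [i + 1 for i, h in enumerate(profile) if h > 1.0]
```

In this toy case only the third column, which holds four different residues, exceeds the threshold of 1.0.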

Prediction of Topology of the Omps

When selecting B cell epitopes of transmembrane proteins, it is important to determine the positions of the amino acids with respect to the lipid bilayer. To determine the topology of the OmpA (AN: NC_012731.1), FepA (AN: NZ_JQSE01000018.1), OmpW (AN: NZ_AJVY01000178.1), OmpX (AN: NZ_JQSE01000029.1) and OmpC (AN: NZ_JQSE01000032.1) sequences of K. pneumoniae, the PRED-TMBB server (http://bioinformatics.biol.uoa.gr/PRED-TMBB) was employed. PRED-TMBB predicts the transmembrane strands and the topology of β-barrel Omps of gram-negative bacteria based on a Hidden Markov Model.

3D Structure Prediction and Validation of the Omps

Since the 3D structures of FepA, OmpX and OmpW of K. pneumoniae were not available in the RCSB Protein Data Bank, these proteins were modeled using the I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER). I-TASSER is an integrated platform based on multiple threading alignments for automated protein structure prediction (Yang et al. 2015; Roy et al. 2010). PyMOL was used to visualize the modeled 3D structures.

To detect errors in the generated FepA, OmpX and OmpW models, the 3D structures were uploaded separately in PDB format to ProSAweb. ProSAweb, which is frequently used in protein structure validation, analyzes the energy distribution in a protein structure to classify it as native-like or faulty (Wiederstein and Sippl 2007). The ProSAweb z-score indicates overall model quality, and its value is displayed in a plot that contains the z-scores of all experimentally determined protein chains in the current PDB.

The stereochemical quality of the modeled FepA, OmpX and OmpW structures was evaluated separately using the Ramachandran plot in PROCHECK (http://swissmodel.expasy.org/workspace) (Laskowski et al. 1993). The Ramachandran plot, which displays the backbone dihedral angle psi against phi for the amino acid residues of a protein structure, is a tool routinely used in protein structure determination (Lovell et al. 2003). All residues in an adequately packed protein have fractional volumes close to 1.0 ± 0.1.

The GROMOS96 force field (Gunsteren et al. 1996), implemented in Swiss-PdbViewer v.4.2 (Kaplan and Littlejohn 2001), was used for energy minimization of the modeled proteins. Energy minimization helps to correct the stereochemistry of the model by eliminating bad contacts between protein atoms and structural water molecules (Laskowski et al. 1993). Minimum-energy arrangements of atoms correspond to stable states of the system (Srivastava et al. 2011).

Representation of Discontinuous B Cell Epitopes of Omps

Discontinuous B cell epitopes in the 3D structures of OmpA (PDB: 2K0L), OmpC (PDB: 1OSM), and the modeled FepA, OmpW and OmpX were predicted using the DiscoTope server (http://www.cbs.dtu.dk/services/DiscoTope/) (Kringelum et al. 2012). DiscoTope predicts discontinuous epitopes from the 3D structure of a protein by applying a linear combination of the normalized values of hydrophilicity, amino acid statistics, number of contacts and relative solvent-accessible area for each residue. In the current study, the discontinuous epitopes of each Omp that were located on the extracellular surface and outside the highly variable regions of the original Omp were selected for insertion into the B cell construct.

Prediction of CD4+ T Cells Epitopes

The MetaMHCII online tool at http://www.biokdd.fudan.edu.cn/Service/MetaMHCII/server.html (Hu et al. 2010) and ProPred at http://www.imtech.res.in/raghava/propred/ (Singh and Raghava 2001) were used to predict 9-mer linear CD4+ T cell epitopes of OmpA of K. pneumoniae. The same strategy was used separately to identify 9-mer CD4+ T cell epitopes of FepA, OmpW, OmpC and OmpX of this pathogen. MetaMHCII implements consensus, probabilistic meta-predictor (PM), AvgTanh and MetaSVMp approaches to combine the results of different servers and therefore performs better than the individual predictors (Hu et al. 2010). ProPred implements a matrix-based prediction algorithm that employs amino acid/position coefficient tables deduced from the literature (Singh and Raghava 2001). The maximum accuracy of ProPred is 75 % at a 4 % threshold (the default threshold). HLA-DRB1*0101 is one of the most frequent alleles in Caucasians (Pedron et al. 2005); hence, epitope predictions were made for this allele.
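As a rough illustration of what scanning a protein for 9-mer candidates and combining two predictors involves, the sketch below enumerates all overlapping 9-mers and keeps the peptides reported by both of two hit lists. It does not call MetaMHCII or ProPred; the function names, the toy sequence and the two hit lists are hypothetical stand-ins for server output.

```python
def overlapping_peptides(sequence, length=9):
    """Return (start, peptide) pairs for every window; start is 1-based."""
    return [
        (i + 1, sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    ]


def consensus(hits_a, hits_b):
    """Peptides reported by both predictors, keeping the order of hits_a."""
    seen = set(hits_b)
    return [p for p in hits_a if p in seen]


# Hypothetical 14-residue toy sequence, not an Omp sequence from the study
toy_sequence = "MKKTAIAIAVALAG"
windows = overlapping_peptides(toy_sequence)

# Hypothetical hit lists standing in for the output of two servers
server_one_hits = ["MKKTAIAIA", "KTAIAIAVA", "AIAIAVALA"]
server_two_hits = ["AIAIAVALA", "MKKTAIAIA"]
shared = consensus(server_one_hits, server_two_hits)
```

A protein of length n yields n - 8 overlapping 9-mers, so the 14-residue toy sequence gives six windows.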

Construct Design, Fusion of Epitopes and Improving Immunogenicity

To maximize the yield of immunization, it is important to consider the arrangement of the epitopes and their positions relative to each other. Tandem fusion of epitopes to each other and/or to an adjuvant without proper linkers can generate a new protein with novel properties (Livingston et al. 2002; Yano et al. 2005). To overcome this problem, the linker sequence NH2-EAAAK-COOH was inserted within the B cell construct. For the CD4+ T cell epitopes, the linker sequences GPGPG, NH2-AAY-COOH or KK were incorporated between epitopes within the T cell construct to provide target-specific cleavage by the lysosomal degradation machinery. Epitopes were arranged randomly in each construct. In the B cell construct, each epitope was inserted twice. The frequency of each linker and the order of its connection with the epitopes in the T cell construct were optimized by predicting physicochemical characteristics.
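The assembly logic described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the epitope and adjuvant strings are hypothetical placeholders, the two copies of each B cell epitope are placed next to each other for simplicity, and the T cell linkers are cycled in a fixed order, whereas the study arranged epitopes randomly and optimized linker order from predicted physicochemical characteristics.

```python
from itertools import cycle


def assemble(segments, linkers):
    """Join segments, inserting the linkers (cycled in order) between neighbours."""
    if not segments:
        return ""
    if not linkers:
        raise ValueError("at least one linker is required")
    linker_iter = cycle(linkers)
    parts = [segments[0]]
    for segment in segments[1:]:
        parts.append(next(linker_iter))
        parts.append(segment)
    return "".join(parts)


# Hypothetical placeholder sequences, not the epitopes or adjuvants of the study
adjuvant = "MMMM"
b_epitopes = ["WWWW", "YYYY"]
t_epitopes = ["FFFF", "HHHH", "NNNN", "QQQQ"]

# B cell construct: N-terminal adjuvant, two copies of each epitope, EAAAK linkers
b_segments = [adjuvant] + [ep for ep in b_epitopes for _ in range(2)]
b_construct = assemble(b_segments, ["EAAAK"])

# T cell construct: one copy of each epitope, linkers cycled (assumed order)
t_construct = assemble(t_epitopes, ["GPGPG", "AAY", "KK"])
```

Keeping the linker list as a parameter makes it straightforward to generate alternative arrangements and compare their predicted properties.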

The addition of an immunogenic protein sequence can also enhance the potency and efficacy of an epitope-based vaccine (Capone et al. 2006). Hence, to improve the immunogenicity of the B cell construct, the complete sequence (548 AA) of the GroEL protein (HSP60) of Salmonella typhi (AN: NP_458769.1) was retrieved from NCBI in FASTA format and added to the N-terminus of the polytope via the helical linker (NH2-EAAAK-COOH). The role of GroEL in priming humoral and cellular immune responses is being exploited in vaccine development for infectious diseases (Panchanathan et al. 1998; Chitradevi et al. 2013). For the T cell construct, the complete sequence of the heparin-binding hemagglutinin (HBHA) of Mycobacterium tuberculosis (AN: ZP_07011362.1) was obtained from NCBI, and its functional amino acid residues were added to the N-terminus of the construct as an adjuvant. HBHA is an immune adjuvant that binds to toll-like receptor 4 (TLR4) (Jung et al. 2011; Adams 2009). TLR agonists, such as TLR4 agonists, have strong immunostimulatory effects and can be employed as adjuvants in immunotherapy (van der Burg et al. 2006). Since HBHA is a functional protein, the helical EAAAK linker was placed at both the -NH2 and -COOH termini of the selected partial sequence of this protein to reduce its interaction with other regions of the construct and to separate it more efficiently (Arai et al. 2001).

Evaluation of the Physicochemical Parameters

Protein sequence statistics for the B and T cell constructs, including amino acid composition, theoretical pI, instability index, in vitro and in vivo half-life, aliphatic index, grand average of hydropathicity (GRAVY) and molecular weight, were computed with the ProtParam tool (http://web.expasy.org/protparam/). ProtParam reports the physicochemical parameters of uncharacterized proteins. The SOLpro server at http://scratch.proteomics.ics.uci.edu/ was used to predict the propensity of the protein to be soluble upon overexpression in Escherichia coli. SOLpro uses a two-stage SVM architecture based on multiple representations of the primary sequence (Magnan et al. 2009). The overall accuracy of this server is estimated at over 74 % using multiple runs of tenfold cross-validation.
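Two of these parameters are simple enough to compute by hand, which can help in sanity-checking server output. The sketch below is an independent illustration, not ProtParam code: GRAVY is taken as the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index uses the standard formula X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)] with X in mole percent. The function names are illustrative, and the sequences in the usage lines are toy inputs.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KD[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    n = len(sequence)

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Toy inputs for illustration only
toy_gravy = gravy("AV")                 # (1.8 + 4.2) / 2
toy_aliphatic = aliphatic_index("AVIL")  # 25 + 2.9 * 25 + 3.9 * 50
```

A negative GRAVY indicates a predominantly hydrophilic sequence, and a higher aliphatic index reflects a larger relative volume of aliphatic side chains.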

Posttranslational Modification Analysis

For posttranslational modification analysis of the B and T cell constructs, the NetNGlyc 1.0 and NetOGlyc 4.0 online tools available at http://www.cbs.dtu.dk/services/ were applied. To predict N-glycosylation sites in human proteins, the NetNGlyc server uses artificial neural networks (ANNs) that examine the sequence context of asparagine-any amino acid-serine/threonine (Asn-Xaa-Ser/Thr) sequons (Cai et al. 2003). NetOGlyc predicts mucin-type GalNAc O-glycosylation sites in mammalian proteins based on neural networks (Steentoft et al. 2013).
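The Asn-Xaa-Ser/Thr motif itself can be located with a simple pattern scan, as in the sketch below. This only lists candidate sequons and is not a substitute for the neural network prediction of NetNGlyc, which scores the surrounding sequence context. The exclusion of proline at the Xaa position is the commonly used definition of the sequon and is an assumption added here; the test sequence is hypothetical.

```python
import re

# Lookahead so that overlapping sequons are all reported;
# Xaa is taken to be any residue except proline (assumption stated above)
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr motifs."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


# Hypothetical toy sequence, not one of the study constructs
toy_sites = find_sequons("MNGSANPTNNTS")
```

In the toy sequence the Asn at position 6 is skipped because it is followed by proline, while positions 2, 9 and 10 are reported as candidates.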

Calculation of Hydrophobic Regions

The hydrophobic behavior of the amino acid sequence of each construct was evaluated in BioEdit software. The hydrophobic and hydrophilic regions of the constructs were identified using the algorithm of Kyte and Doolittle (1982). Peaks and troughs in the profile diagram correspond to hydrophobic and hydrophilic regions, respectively.
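A Kyte-Doolittle profile is a sliding-window average of per-residue hydropathy values. The following minimal sketch shows the calculation; it is not the BioEdit implementation, and the default window of 21 residues is an assumption for this illustration (it matches the window reported for the profile figure in this paper). The toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle 1982)
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=21):
    """Mean hydropathy of every full window along the sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    scores = [KD_SCALE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical toy sequence: a hydrophobic stretch followed by a hydrophilic one
toy_profile = hydropathy_profile("A" * 21 + "K" * 21)

# Window start indices (0-based) whose mean lies above the threshold of 0
hydrophobic_windows = [i for i, value in enumerate(toy_profile) if value > 0]
```

Values above 0 correspond to peaks (hydrophobic windows) and values below 0 to troughs (hydrophilic windows).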

Reverse Translation and Codon Optimization

Reverse translation of the B and T cell constructs into DNA sequence and adaptation of the DNA sequences to E. coli codon usage (codon optimization) were performed by JCAT (http://www.jcat.de) and OPTIMIZER (Puigbo et al. 2007), separately.
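Two quantities commonly used to judge a codon-optimized sequence are its GC content and its codon adaptation index (CAI). The sketch below shows how they can be computed; it does not reproduce the JCat or OPTIMIZER algorithms. CAI is taken as the geometric mean of per-codon relative adaptiveness weights, and the weight table shown is a hypothetical two-codon toy example, not real E. coli codon usage.

```python
import math


def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def codon_adaptation_index(dna, weights):
    """Geometric mean of relative adaptiveness weights over the codons.

    Codons missing from the weight table are skipped.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        return 0.0
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights for two leucine codons (illustration only)
toy_weights = {"CTG": 1.0, "CTA": 0.25}

toy_gc = gc_content("ATGC")
toy_cai = codon_adaptation_index("CTGCTA", toy_weights)
```

With these toy weights, a sequence made of one optimal and one rare codon has a CAI of the square root of 0.25.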

Allergenicity Evaluation

To analyze the allergenicity of the B and T cell constructs, the AlgPred web server at http://www.imtech.res.in/raghava/algpred/ was employed. AlgPred predicts allergenicity based on the similarity of known epitopes to any region of the protein. The hybrid prediction approach (SVMc+IgEepitope+ARPsBLAST+MAST) available in AlgPred predicts protein allergenicity with high accuracy (85 % at a threshold of −0.4) (Saha and Raghava 2006).

B Cell Construct Modeling and Evaluation

The I-TASSER server was used for tertiary structure prediction of the B cell construct. PyMOL was used to visualize the modeled 3D structure. ProSAweb was used to detect errors in the generated models. The stereochemical quality of the best model was assessed using the Ramachandran plot in PROCHECK (Laskowski et al. 1993). Energy minimization of the structure was performed with GROMOS96 as implemented in Swiss-PdbViewer v.4.2 (Kaplan and Littlejohn 2001).

Prediction of Immunogenic Epitopes of the B Cell Construct

To be a good vaccine candidate, a construct that includes discontinuous B cell epitopes should elicit B cell-mediated immunity. To predict the discontinuous epitopes of the B cell construct, the modeled construct was submitted to the DiscoTope server.

Results

Entropy Plot for Finding the Conserved Sites

Based on the entropy plot, high conservation (Hx ≤ 1) was observed along the OmpA sequence (Fig. 1a). The entropy plots also showed very high conservation (Hx ≤ 1) along the FepA, OmpW, OmpX and OmpC sequences (Fig. 1b–e). More specifically, four highly variable regions (regions above the threshold of 1) were observed in OmpA, located at residues 47–49, 143, 182 and 303 (Fig. 1a). Three highly variable regions were detected in FepA, located at residues 146, 514–516 and 569–570 (Fig. 1b). No highly variable regions were observed along the amino acid sequence of OmpW (Fig. 1c). In OmpX, six highly variable regions were found, located at residues 41, 46, 76–77, 115, 120 and 116–117 (Fig. 1d). Finally, 11 highly variable regions were found in OmpC, located at residues 57, 59, 88–92, 102, 164, 178–186, 191–200, 232–244, 277–285, 320–328 and 363–364 (Fig. 1e).

Fig. 1

Variation plots of residues along the OmpA (a), FepA (b), OmpW (c), OmpX (d) and OmpC (e) sequences. Regions above the threshold of 1 are highly variable, which is an essential consideration in vaccine design

Prediction of the Topology of the Omps

The topology of each Omp of K. pneumoniae was predicted by using PRED-TMBB. Graphical representation of the position of the transmembrane β strands with respect to the lipid bilayer and the location of the loops (periplasmic/extracellular) for OmpA, OmpC, FepA, OmpX and OmpW are represented in Online Resource 2a–e, respectively.

3D Structure Prediction of the Omps

The FepA, OmpX and OmpW sequences were modeled separately with I-TASSER to generate 3D models. The quality of, and potential errors in, the 3D model of each protein were checked with ProSAweb. The z-scores of the starting structures of FepA, OmpW and OmpX were −3.21, −3.68 and −3.36, respectively, which fall within the range of scores determined for native proteins of similar size. Nevertheless, energy minimization was performed to obtain the best model of each protein with minimal bad contacts. After energy minimization, the z-scores of the modeled FepA, OmpW and OmpX improved to −3.49, −3.94 and −3.54, respectively. The predicted 3D structures of FepA, OmpW and OmpX after energy minimization, visualized with PyMOL, are shown in Online Resource 3a–c, respectively. According to the energy minimization, the energy-minimized FepA, OmpW and OmpX models have acceptable stability (−40141.125, −8960.531 and −7712.953 kcal/mol, respectively) compared with the initial models (−49541.617, −10828.097 and −9683.152 kcal/mol, respectively). These data indicate that, for each protein, the energy-minimized structure is more stable than the initial model.

Evaluation of Models Stability

Before and after energy minimization, the predicted structures of FepA, OmpW and OmpX were validated separately for reliability and structural quality by Ramachandran plot analysis. The results obtained after energy minimization for FepA, OmpW and OmpX are given in Online Resource 4a–c, respectively. The data show that, after energy minimization, more than 90 % of the residues of each of these Omps lie within allowed regions.

Prediction of B Cell Epitopes in the 3D Structures of the Omps

The discontinuous B cell epitopes predicted by the DiscoTope server in the 3D structures of OmpA, OmpC, FepA and OmpX are shown in Online Resource 5. No discontinuous epitope was identified in the 3D structure of OmpW. The discontinuous epitopes selected for use in the B cell construct are shown in Table 1. All of the selected epitopes are located on the extracellular surface (outside) of their original Omps (Online Resource 2a–e). In addition, the selected epitopes are not located in the highly variable regions of the parental Omps (Fig. 1).
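The two selection criteria (extracellular location and no overlap with highly variable positions) amount to simple set operations, as the sketch below illustrates. The variable positions are those reported for OmpA in the entropy analysis (residues 47–49, 143, 182 and 303); the extracellular ranges and the candidate epitopes are hypothetical values chosen only to exercise the filter.

```python
def expand(ranges):
    """Turn inclusive (start, end) residue ranges into a set of positions."""
    positions = set()
    for start, end in ranges:
        positions.update(range(start, end + 1))
    return positions


def passes_filters(epitope_positions, variable, extracellular):
    """True if all residues are extracellular and none is highly variable."""
    positions = set(epitope_positions)
    return positions.isdisjoint(variable) and positions <= extracellular


# Highly variable OmpA residues as reported in the entropy analysis
OMPA_VARIABLE = expand([(47, 49), (143, 143), (182, 182), (303, 303)])

# Hypothetical extracellular loop ranges (illustration only)
extracellular = expand([(60, 75), (140, 150)])

# Hypothetical candidate discontinuous epitopes (lists of residue numbers)
candidate_a = [62, 63, 66, 70]
candidate_b = [141, 143, 145]

keep_a = passes_filters(candidate_a, OMPA_VARIABLE, extracellular)
keep_b = passes_filters(candidate_b, OMPA_VARIABLE, extracellular)
```

Here the first candidate is retained, while the second is rejected because it includes the variable residue 143.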

Table 1 Predicted B cell epitopes of four Omps by DiscoTope server used in the B cell construct

Defining CD4+ T Cell Epitopes

9-mer CD4+ T cell epitopes of the OmpA sequence of K. pneumoniae were determined using the MetaMHCII and ProPred servers. To obtain the highest-ranked epitopes, peptides with low MetaSVMp values in MetaMHCII and high scores in ProPred were selected: the high-ranked epitopes identified by MetaMHCII were carried forward to the final selection with ProPred. The same strategy was applied separately to the FepA, OmpW, OmpC and OmpX sequences of K. pneumoniae. The final selected epitopes from the five proteins, together with their positions in the original protein sequences, are shown in Table 2. Moreover, all of the predicted epitopes are located in conserved regions (Fig. 1).

Table 2 Predicted CD4+ T cell epitopes of five Omps by two different servers used in the CD4+ T cell construct

Primary Analysis of HBHA Sequence

The important functional region of HBHA has been reported to lie between amino acid residues 1 and 158, and the low complexity region (LCR) of the protein between residues 159 and 199 (Nezafat et al. 2014). Accordingly, we omitted the LCR from HBHA. Moreover, to obtain a shorter sequence suitable for insertion into the construct, the signal peptide residues of HBHA were predicted using the SignalP 4.0 server (http://www.cbs.dtu.dk/services/SignalP/) and excluded from the HBHA sequence. The results showed that these residues are located between positions 1 and 27.

Construct Design

Schematic diagrams of the designed B and T cell constructs are shown in Fig. 2a, b, respectively. In the B cell construct, Ep1–Ep4 (two repeats of each epitope) are the epitopes fused to each other by a linker. The NH2 terminus of the B cell construct was fused to GroEL of S. typhi using the EAAAK linker (Fig. 2a). In the T cell construct, E1–E25 (one repeat of each epitope) are the epitopes linked to each other by linkers. HBHA (residues 28–158) serves as the adjuvant sequence to improve immunogenicity (Fig. 2b).

Fig. 2

Schematic diagram depicting the designed B cell construct (a) and T cell construct (b). The amino acid sequence of each construct is shown below each diagram; green letters show the amino acid sequences of the construct and black letters represent linkers (Color figure online)

Evaluation of the Physicochemical Parameters

According to the ProtParam server results, the molecular weights were calculated as 72.99 and 55.06 kDa for the B and T cell constructs, respectively. The theoretical isoelectric point (pI) is defined as the pH at which the surface of the protein carries charges but the net charge of the protein is zero; the pI is useful for assessing mobility in an electric field. The pIs of the B and T cell constructs were computed to be 5.59 and 6.03, respectively, indicating that both constructs are acidic in nature. The instability index (Ii) provides an estimate of the stability of a protein in vitro. On the basis of Ii, ExPASy’s ProtParam classified the B cell construct (Ii = 27.30) and the T cell construct (Ii = 38.42) as stable, since both values are below the ProtParam stability threshold of 40. The GRAVY values were −0.285 and −0.369 for the B and T cell constructs, respectively. A negative GRAVY value indicates that a construct is hydrophilic, which favors interaction with the surrounding water molecules. The aliphatic indexes were 90.09 and 76.18 for the B and T cell constructs, respectively. The high aliphatic indexes suggest that the constructs are stable over a wide range of temperatures. The probability of solubility upon overexpression in E. coli was computed by SOLpro to be 0.65 for the B cell construct and 0.97 for the T cell construct.

Hydrophobicity is a crucial challenge in cloning and expression of constructs. Besides, it helps to estimate the efficiency of vaccines (Kyte and Doolittle 1982). BioEdit software version 7.7.9 was employed to explore the hydrophobic behavior of the B and T cell constructs (Fig. 3a, b).

Fig. 3

The hydrophobicity profiles obtained using the algorithm of Kyte and Doolittle for the B cell construct (a) and T cell construct (b). The window size is 21. Troughs represent hydrophilic regions, which are antigenic. Regions above the threshold (0) are predicted to be hydrophobic

Posttranslational Modification Analysis

Posttranslational modification analysis was performed to check for probable significant modifications of both constructs after their administration in mammalian cells. For the B cell construct sequence, three N-glycosylation and four O-glycosylation sites were predicted. Two N-glycosylation and four O-glycosylation sites were predicted within the T cell construct sequence. These results indicate that both constructs carry few posttranslational modifications that could affect immunogenicity.

Codon Optimization

Among the different ways of enhancing the efficiency of gene expression, codon optimization is one of the most effective. It helps to achieve optimum expression of a cloned gene in recombinant host cells (Sandhu et al. 2008). Reverse translation and codon optimization of the nucleotide sequences of the B and T cell constructs were performed with JCat and OPTIMIZER. A codon adaptation index (CAI) > 0.8 is favorable for high-level expression in different expression hosts. For the B cell construct, the CAI of the optimized gene sequence is 0.83 (Fig. 4a). The ideal range of GC content, a measure of transcriptional and translational efficiency, is 30–70 %. The overall GC content of the B cell construct is 49.04 % (Fig. 4b). Moreover, 55 % of the codons in the gene construct have a frequency distribution of 91–100 (Fig. 4c); codons with values lower than 30 may decrease expression efficiency. For the T cell construct, the CAI of the optimized gene sequence is 0.82 (Fig. 4d), the overall GC content is 51.59 % (Fig. 4e) and 54 % of the codons have a frequency distribution of 91–100 (Fig. 4f). Two negative cis-acting elements found after optimization in the nucleotide sequence of each construct were removed.

Fig. 4

Results of codon optimization of the B and T cell constructs. For the B cell construct, the CAI of the gene sequence is 0.83 (a), the average GC content is 49.04 % (b) and codons with frequency distributions of 91–100, 81–90, 71–80, 61–70 and 51–60 account for 55, 9, 7, 6 and 21 % of the gene sequence, respectively (c). For the T cell construct, the CAI of the gene sequence is 0.82 (d), the average GC content is 51.59 % (e) and codons with frequency distributions of 91–100, 81–90, 71–80, 61–70 and 51–60 account for 54, 5, 6, 13 and 20 % of the gene sequence, respectively (f)

Six histidine (His) codons were also placed at the 3′ end of each construct for purification purposes. To clone the genes in prokaryotic expression vectors, NcoI and XhoI restriction sites were added to the 3′ and 5′ ends of each construct, respectively.

Allergenicity Evaluation

The allergenicity analysis was performed using the AlgPred server. Based on the hybrid approach in AlgPred, the B and T cell constructs were not detected as potential allergens.

B Cell Construct Modeling and Evaluation

The B cell construct sequence was modeled with I-TASSER to produce 3D models of the construct. The ProSAweb z-score of the starting structure of the best predicted model was −11.24, and after energy minimization the z-score improved to −11.45. The ProSAweb result indicated that the construct is within the range of scores determined for native proteins of similar size. The energy-minimized model has a lower energy (−31556.324 kcal/mol) than the initial model (−27960.523 kcal/mol), which indicates that the minimized structure is more stable than the initial model. The predicted model of the B cell construct after energy minimization was visualized with PyMOL (Fig. 5). Ramachandran plot analysis of the modeled B cell construct after energy minimization showed that more than 90 % of the residues of the model lie within allowed regions (Fig. 6).

Fig. 5

Predicted 3D model of the B cell construct visualized with PyMOL. The GroEL domain, epitopes and linkers are displayed in red, green and dark pink, respectively (Color figure online)

Fig. 6

Ramachandran plot of the modeled B cell construct predicted by PROCHECK. This plot indicated that 85.6 % of residues are located in most favored regions, 12.2 % in additional allowed regions, 1.0 % in generously allowed regions and 1.2 % in disallowed regions of the plot

Prediction of Antigenic B Cell Epitopes of the B Cell Construct

Full length B cell construct was subjected to conformational B cell epitope prediction using Discotope server. Out of 696 total residues, 72 conformational B cell epitopes were identified (Table 3).

Table 3 Conformational B-cell epitopes determined from 3D structure of B cell construct using DiscoTope server

Discussion

Immunotherapy is a prominent and effective strategy for reducing the morbidity and mortality caused by infectious diseases. In recent years, a rational step-by-step approach to multi-epitope vaccine design has attracted increasing global attention (Chiarella et al. 2009). With the evolution of bioinformatics and its associated branch, immunoinformatics, vaccinology has advanced rapidly, enabling the rational design of polytopic vaccines (Yang and Yu 2009; Tomar and De 2010). Several approaches have been used to develop a vaccine against K. pneumoniae (Yadav et al. 2005; Edelman et al. 1994; Ahmad et al. 2012). Candidate proteins for vaccines against this pathogen are immunogenic surface antigens (Kurupati et al. 2006; Ahmad et al. 2012; Florea et al. 2003). Members of the Omp family have been recognized to possess these properties and are consequently ideal candidates for vaccine preparation (Kurupati et al. 2006).

In K. pneumoniae infection, B cells can induce robust humoral immunity, and CD4+ T cells have a fundamental role in priming and maintaining pathogen-specific humoral and cellular immune responses (Hagen et al. 1998). Hence, identifying the discontinuous B cell epitopes and linear CD4+ T cell epitopes of antigenic bacterial Omps could contribute to a better understanding of protective immunity to K. pneumoniae and facilitate the preparation of effective anti-Klebsiella vaccines. A discontinuous epitope consists of atoms from residues that are distant in sequence but brought together on the protein surface in 3D space, and it determines antigenicity (Baloria et al. 2012). A discontinuous epitope may be bound by either a B cell receptor or an immunoglobulin and triggers a humoral immune response (Ponomarenko and Regenmortel 2009). In this research, an attempt was made to engineer an efficient complex polytopic vaccine based on discontinuous B cell epitopes and linear CD4+ T cell epitopes of five Omps from K. pneumoniae, using various bioinformatics approaches.

Prediction of discontinuous epitopes requires knowledge of protein structure. When the 3D structure of the protein or a homologue is known, a discontinuous epitope can be mapped onto it. In silico methods therefore make it possible to model the structure of pathogen antigens and map conformational epitopes onto the models (Ponomarenko and Regenmortel 2009). Accordingly, in this work, the 3D structures of OmpA and OmpC of K. pneumoniae were obtained from the RCSB Protein Data Bank. Because 3D structures of FepA, OmpX and OmpW of K. pneumoniae were not available in the Protein Data Bank, they were predicted and evaluated using an in silico approach. Discontinuous epitopes of the five aforementioned Omps were then identified using an online server. For each Omp, the epitopes that were not located in the highly variable regions of the original protein (Fig. 1) were selected for further analysis.

The B cell epitope regions of bacterial membrane proteins, including Omps, are surface-exposed and react with antibodies of the infected host (Stathopoulos 1996). For this reason, when predicting B cell epitopes from Omps, it is important to predict the location of the membrane-spanning segments along the sequence. In the current study, the topology of the five Omps was predicted to find their extracellular regions. It should be noted that, because the input sequences were trimmed using BioEdit to find the conserved regions of each Omp, this topology might not be a true representation of all the Omp homologs. In the next step, PRED-TMBB results were used to check whether the conserved B cell epitopes of each Omp were present in the outer, surface-exposed regions of the original protein. Epitopes exposed on the membrane surface of the Omps were incorporated into the B cell construct. In addition, highly immunogenic CD4+ T cell epitopes were selected based on physicochemical properties and different prediction algorithms. A combination of the MetaMHCII and ProPred servers, which use different algorithms, was used to obtain more accurate predictions of peptides that bind the DRB1*0101 allele.
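One plausible way to combine the output of several MHC class II predictors is to keep only peptides that every predictor reports as a binder. The sketch below illustrates this as a set intersection. It is not the servers' own procedure: the peptide strings are hypothetical placeholders, and `consensus_binders` is an assumed helper name, not part of MetaMHCII or ProPred.

```python
def consensus_binders(*predictions):
    """Return peptides reported as binders by every predictor, sorted."""
    if not predictions:
        return []
    common = set(predictions[0])
    for peptides in predictions[1:]:
        common &= set(peptides)
    return sorted(common)

# Hypothetical placeholder peptides, NOT epitopes from the study.
metamhcii_hits = ["AAAKLVNGT", "GGSYTVRLA", "LLTQWEFKA"]
propred_hits = ["GGSYTVRLA", "LLTQWEFKA", "MMNPQRSTV"]

shared = consensus_binders(metamhcii_hits, propred_hits)
print(shared)
```

A stricter or looser rule (for example, agreement of a majority of algorithms) would change only the body of the helper.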

A critical issue in vaccination is the improvement of immune responses by enhancing immunogenicity. There are several methods to increase the immunogenicity of polytopic vaccines (Ingolotti et al. 2010). In this research, some of these approaches were used at different stages of the design. First, the size of the constructs was increased by repeating the epitopes and incorporating immunogenic sequence tags (adjuvants), because small constructs may be rapidly cleared from the body (García-Briones et al. 2004; Gerner et al. 2006). In addition, appropriate linkers were incorporated into both constructs. Linkers are non-immunogenic motifs that have an essential role in the structural and functional features of a polytopic construct (Livingston et al. 2002). The EAAAK linker improves the structural flexibility of a protein by preventing non-native interactions between different domains that may interfere with correct folding (George and Heringa 2002). The KK spacer, the target sequence of lysosomal cathepsin B, is one of the linkers used in CD4+ multi-epitope vaccines (Livingston et al. 2002; Yano et al. 2005). GPGPG linkers induce Th responses and preserve the conformation-dependent immunogenicity of helper as well as antibody epitopes (Livingston et al. 2002). The AAY spacer eliminates junctional epitopes successfully and enhances epitope presentation. AAY extensions greatly decrease the binding affinity, and thus AAY-containing epitopes do not bind efficiently (Zhang et al. 2014).
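To make the role of the linkers concrete, the sketch below joins epitope sequences with a chosen spacer and attaches an adjuvant through an EAAAK linker. It is illustrative only: the epitope and adjuvant strings are hypothetical placeholders, `assemble_construct` is an assumed helper name, and the arrangement shown is not claimed to be the order used in the actual constructs.

```python
def assemble_construct(epitopes, linker, adjuvant="", adjuvant_linker="EAAAK"):
    """Join epitopes with a linker; optionally prepend an adjuvant sequence."""
    body = linker.join(epitopes)
    if adjuvant:
        return adjuvant + adjuvant_linker + body
    return body

# Hypothetical placeholder sequences, NOT the study's epitopes or adjuvant.
th_epitopes = ["AAAKLVNGT", "GGSYTVRLA", "LLTQWEFKA"]
toy_adjuvant = "MKTLLV"

construct = assemble_construct(th_epitopes, linker="GPGPG", adjuvant=toy_adjuvant)
print(construct)
print(len(construct))
```

Swapping the `linker` argument for `"KK"` or `"AAY"` gives the corresponding spacer arrangement without changing the rest of the code.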

To attain high-level expression of each recombinant construct in the E. coli host, codon optimization was performed to improve transcription efficiency and transcript stability. This was accomplished by improving the codon adaptation index (CAI), the overall GC content of the DNA sequence and the codon frequency distribution, and by removing negative elements that may form unfavorable secondary structures in the mRNA. Solubility of overexpressed recombinant proteins in the E. coli host is an important requirement for many functional and biochemical assessments. The predicted solubility of the B and T cell constructs (65 and 97 %, respectively) indicates that both have an acceptable chance of being soluble when overexpressed.
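Two of the quantities mentioned above, GC content and CAI, are straightforward to compute once a coding sequence and a codon weight table are available. The sketch below is a minimal illustration: the DNA string and the relative adaptiveness weights are toy values chosen for the example, not the study's sequences or a real E. coli codon usage table, and `gc_content` and `cai` are assumed helper names.

```python
import math

def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)

def cai(dna, weights):
    """Codon adaptation index: geometric mean of per-codon weights.

    Codons missing from `weights` (e.g. stop codons) are skipped.
    """
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Toy relative-adaptiveness weights (hypothetical, for illustration only).
toy_weights = {"CTG": 1.0, "CTA": 0.25, "AAA": 1.0, "AAG": 0.25}

toy_gene = "CTGAAACTAAAG"       # Leu-Lys-Leu-Lys with mixed codon choices
toy_optimized = "CTGAAACTGAAA"  # same peptide, highest-weight codons only

print(gc_content(toy_gene), cai(toy_gene, toy_weights))
print(gc_content(toy_optimized), cai(toy_optimized, toy_weights))
```

Replacing each codon with the highest-weight synonymous codon raises the CAI toward 1 while leaving the encoded peptide unchanged, which is the basic idea behind codon optimization.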

In peptide-based vaccine design, it is important that a polytopic construct representing discontinuous epitopes mimics the structure of the protein epitopes (Ponomarenko and Regenmortel 2009). For this reason, the 3D structure of the B cell construct was modeled, refined and validated, and discontinuous B cell epitopes were then mapped on the construct. The predicted discontinuous epitopes of the B cell construct (72 amino acid residues; Table 3) overlap acceptably with the discontinuous epitopes inserted into the B cell construct (108 amino acid residues in total; Fig. 2).
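The comparison between predicted and inserted epitope residues can be expressed as a set overlap. The following sketch uses small hypothetical residue-position sets purely to show the calculation; the real position lists are those of Table 3 and Fig. 2 and are not reproduced here. `overlap_summary` is an assumed helper name.

```python
def overlap_summary(predicted, inserted):
    """Return the shared-position count and the fraction of predicted
    positions that were also inserted."""
    predicted, inserted = set(predicted), set(inserted)
    shared = predicted & inserted
    fraction = len(shared) / len(predicted) if predicted else 0.0
    return len(shared), fraction

# Hypothetical residue positions, NOT the values from Table 3 or Fig. 2.
predicted_positions = {10, 11, 12, 40, 41, 90}
inserted_positions = {10, 11, 12, 13, 40, 41, 42, 60}

n_shared, frac = overlap_summary(predicted_positions, inserted_positions)
print(n_shared, round(frac, 2))
```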

The high conservation among members of each Omp family indicates that these proteins may be suitable candidates for vaccine development against a wide range of gram-negative bacteria (Kurupati et al. 2006; Koebnik et al. 2000). A consensus immunogen is one that overcomes the limitation posed by serotype or amino acid variation in a pathogen, which otherwise causes immune responses to be effective against only one or a few serotypes (Laddy et al. 2007). In this study, consensus, highly immunogenic epitopes of Omps from K. pneumoniae and other pathogenic Enterobacteriaceae were predicted and inserted into the polytopic constructs. Therefore, the complex vaccine based on Omps of K. pneumoniae constructed in this work could potentially cover the genera Escherichia, Citrobacter and Enterobacter as well as other pathogenic Enterobacteriaceae. These findings may be useful in evaluating the range of efficacy of the designed constructs against different bacteria.

Conclusion

The current study demonstrates an in silico approach to designing an efficient complex multi-epitope vaccine against K. pneumoniae. All features of the constructs are in line with this purpose. The epitopes of the designed constructs could potentially induce effective immune responses, and the adjuvant sequences could play a pivotal role in enhancing the immunogenicity of the constructs. This study could be useful in gaining insight into the potential of epitope-based constructs as an important protective and therapeutic approach for bacterial immunization. Ongoing studies will evaluate whether the polytopic vaccine constructs can induce immune responses and protection against K. pneumoniae as well as other Enterobacteriaceae.