Introduction

γ-Aminobutyric acid (GABA) is a bioactive compound present in the central nervous system (Cohen et al. 2002). It is involved in regulating the sleep–wake cycle, reducing blood pressure (Inoue et al. 2003), preventing diabetic conditions, and inducing insulin secretion from the pancreas (Adeghate and Ponery 2002; Hagiwara et al. 2004). Abnormalities in glutamate decarboxylase (GAD) function and reduced GABA levels have been reported in people with many neurological disorders (Möhler 2012). GAD is a pyridoxal 5′-phosphate-dependent enzyme that catalyzes the decarboxylation of l-glutamate to γ-aminobutyric acid (Komatsuzaki et al. 2005). Many bacterial GADs exhibit optimal activity at pH 4.0–5.0, and their activity decreases sharply at neutral pH. Among microbial GADs, however, the GAD from Enterococcus faecium DO remains active at neutral pH and performs well (Hagiwara et al. 2004). The optimum temperature and pH for GAD activity were 30 °C and 6–7.5, respectively (Lim et al. 2016). The Km and Vmax values of GAD from Enterococcus strains were 3.26–5.26 mM and 1.20–3.45 μM/min, respectively (Chang et al. 2017; Lee et al. 2017). GAD from E. faecium DO has 466 amino acids and a molecular mass of 53.7 kDa (NCBI_017960.1, UniProtKB Q3Y080).

Escherichia coli is the most commonly used expression system for recombinant protein production (Rosano and Ceccarelli 2014) because of (i) its fast growth (Sezonov et al. 2007); (ii) the ease of attaining high cell density (Shiloach and Fass 2005); (iii) its growth on inexpensive complex media (Sivashanmugam et al. 2009); (iv) its well-characterized genetics, physiology and metabolism (Andersen et al. 2013); and (v) simple fermentation and favorable economics (Daegelen et al. 2009). E. coli strain BL21 (DE3) can direct high-level expression of cloned genes under the control of the T7 promoter (Kim et al. 2017).

The recombinant GAD enzyme has already been produced in the cytoplasm of E. coli (Fan et al. 2012; Yu et al. 2012); our purpose, however, is to express this enzyme in the outer membrane of the cell. One of the important challenges that cells face is transferring proteins from their site of synthesis in the cytosol to their sites of function. Without a suitable signal peptide (SP), E. coli cannot be used to produce secretory proteins, so choosing a suitable signal peptide is a critical step in the secretory expression of different proteins (Choi and Lee 2004). Evaluating different SPs for the expression of recombinant glutamate decarboxylase in the outer membrane of E. coli is therefore crucial for increasing GABA production. Targeting recombinant GAD to the outer membrane of E. coli has several advantages over intracellular production: it minimizes protein degradation, simplifies downstream purification, reduces production costs, enhances biological activity, gives higher product stability and solubility, and preserves the N-terminal authenticity of the expressed peptide (Mergulhao et al. 2004). High-level expression of recombinant GAD in the cytoplasm, periplasm or outer membrane leads to aggregation of misfolded protein (Chang et al. 2017; Ueno 2000). Nevertheless, Santos et al. (2012) and Chang et al. (2017) reported that a simple refolding process converts the aggregated protein to a folded protein with acceptable efficiency.

In general, bacteria have three main pathways for translocating proteins across the cytoplasmic membrane into the periplasm, outer membrane or extracellular space: the general secretion pathway (Sec pathway), the twin-arginine translocation pathway (TAT pathway) and the signal recognition particle pathway (SRP pathway) (De Marco 2009). It seems that the Sec and SRP pathways are more important than the TAT pathway because folding and purification of secretory proteins in the outer membrane space are more natural than in the cytoplasm (Choi and Lee 2004). Since secretory proteins are degraded less there than in the cytoplasm, SPs that use these pathways can be considered more appropriate than SPs that use the TAT pathway (Natale et al. 2008).

The Sec machinery recognizes an N-terminal hydrophobic signal sequence. In lipoproteins, a cysteine residue follows immediately after the signal peptide cleavage site; the signal peptide is recognized and cleaved by lipoprotein signal peptidase (SPase II or Lsp) after this cysteine has been modified with a lipid moiety, which anchors the protein to the membrane. Finally, an additional fatty acid is attached to the new N-terminus (Juncker et al. 2003). These proteins are then either retained at the cytoplasmic membrane or translocated to the outer membrane by the Lol lipoprotein-sorting pathway (Lewenza et al. 2008). Signal peptides for the Sec pathway are typically 20 amino acids in length and generally consist of three domains: (i) a positively charged n-region that often contains Lys or Arg residues, (ii) a hydrophobic h-region and (iii) an uncharged but polar c-region (Papanikou et al. 2007). The cleavage site for the signal peptidase is located in the c-region (Green and Mecsas 2016).

Several articles have been published on in silico analysis of different signal peptides for the secretory production of recombinant proteins (Mohammadi et al. 2019; Vahedi et al. 2019; Zamani et al. 2015). However, those studies compared signal peptides targeting the inner membrane (IM), periplasm, outer membrane (OM) and extracellular space together, without distinguishing between these destinations. In the present study, in addition to seeking the best signal peptide, we therefore examine protein localization carefully and compare only the signal peptides predicted to direct the GAD enzyme to the outer membrane of E. coli; the aim is only to predict the best signal peptides for expressing recombinant glutamate decarboxylase in the outer membrane of E. coli. To our knowledge, no study has evaluated different signal peptides in connection with GAD or their probable effect on appropriate protein secretion. Furthermore, several bioinformatics tools for predicting the subcellular localization, solubility and secretion properties of proteins are compared in this research: the PSORTb, CELLO, Gneg-PLoc, ProtComp, SOLpro, PROSO II, CcSOL omics, Wilkinson and Harrison model, protein-sol, SODA, PRED-TAT and SignalP 5.0 webservers.

Materials and Methods

Signal Sequence Collection and Study Design

In this study, the amino acid sequence of glutamate decarboxylase of E. faecium DO was obtained from the UniProtKB server at http://www.uniprot.org/. GAD of E. faecium DO (UniProtKB Q3Y080) has 466 amino acids and a molecular mass of 53.7 kDa. The amino acid sequences of 127 signal peptides were retrieved from the Signal Peptide Database (http://www.signalpeptide.de/). The signal sequences are listed in supplementary Table 1.

The Amino Acid Sequence of the GAD of Enterococcus faecium DO

Translation = “MLYGKDNQEEKNYLEPIFGSASEDVDLPKYKLNKESIEPRIAYQLVQDEMLDEGNARLNLATFCQTYMEPEAVKLMTQTLEKNAIDKSEYPRTTEIENRCVNMIADLWHAPNNEKFMGTSTIGSSEACMLGGMAMKFAWRKRAEKLGLDIQAKKPNLVISSGYQVCWEKFCVYWDVELREVPMDEKHMSINLDTVMDYVDEYTIGIVGIMGITYTGRYDDIKGLNDLVEAHNKQTDYKVYIHVDAASGGFYAPFTEPDLVWDFQLKNVISINSSGHKYGLVYPGVGWVLWRDQQYLPEELVFKVSYLGGEMPTMAINFSHSAAQLIGQYYNFVRYGFDGYRDIHQRTHDVAVYLAKEIEKTGIFEIINDGSELPVVCYKLKEDPNREWTLYDLSDRLLMKGWQVPAYPLPKDLDQLIIQRLVVRADFGMNMAGDYVQDMNQAIEELNKAHIVYHKKQDVKTYGFTH”.

Computational Tools for Determining the Characteristics of Signal Peptides

Identification of Sub-Cellular Localization Site of Glutamate Decarboxylase

Gram‐negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The OM is the outermost structure in Gram-negative bacteria and hence is the interface between the cell and the environment (Mogensen and Otzen 2005). Since subcellular location plays a crucial role in protein function, the availability of systems that can predict location from the sequence will be essential to the full characterization of expressed proteins. Experimental determination of subcellular location is mainly accomplished by three approaches: electron microscopy, fluorescence microscopy, and cell fractionation. These methods are very variable and time-consuming (Paladin et al. 2017). To predict signal peptides by in silico methods, different bioinformatics tools have been developed that are based on neural networks, weight matrices, or sequence alignments (Gardy et al. 2004).

Computational prediction of the final location of proteins is a major tool for automated protein annotation and genome analysis, because a protein’s subcellular localization can provide clues regarding its function in an organism and is critical to a wide range of studies (Yu et al. 2014). Several algorithms have been developed to predict the subcellular localization of proteins, such as the PSORTb, CELLO, Gneg-PLoc and ProtComp servers. The prediction websites are listed in Table 1:

Table 1 The predictive website addresses and their features

The performance of the CELLO, PSORT-B, Gneg-mPLoc and ProtCompB servers is compared in Table 2. According to the results, ProtCompB achieved better prediction accuracy and sensitivity for all outer membrane signal peptides of E. coli than the other approaches. The overall prediction precision of ProtCompB reached 94.12%, which was 6.62 and 28.56 percentage points higher than that of CELLO (87.5%) and PSORT-B (65.56%), respectively. Notably, the MCC of ProtCompB for outer membrane location (p = 96%) is higher than that of the other predictors. In general, ProtCompB gave significantly better predictive performance for outer membrane signal peptides of E. coli. For this reason, we used the ProtCompB server to predict the final subcellular localization of the GAD enzyme connected to the different signal peptides. Precision measures the ability of the system to predict only the relevant data. Accuracy is the closeness of the predictions to the true values. The Matthews correlation coefficient (MCC) measures the correlation between the prediction and the observation (Gardy et al. 2004; Shen and Chou 2010; Yu et al. 2014; http://www.softberry.com 2016).
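The three measures used in this comparison can be illustrated with a minimal sketch based on a two-class confusion matrix (outer membrane versus any other location). The function name and the counts below are hypothetical and are not taken from Table 2; they only show how the quantities relate to one another.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return precision, accuracy and MCC for a 2x2 confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives for one location class.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Precision: fraction of predicted positives that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Accuracy: fraction of all predictions that are correct
    accuracy = (tp + tn) / total
    # MCC: correlation between predicted and observed classes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration only (not data from this study)
example = confusion_metrics(tp=8, fp=2, tn=85, fn=5)
print(example)
```

Because the MCC uses all four cells of the matrix, it is less sensitive than accuracy to the imbalance between outer membrane proteins and all other classes.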

Table 2 The comparison of performances in outer membrane signal peptides of Escherichia coli

The ProtCompB server was used for the in silico prediction of the final destination of glutamate decarboxylase linked to different signal peptides (http://www.softberry.com). ProtCompB Version 9 combines several methods of protein localization prediction: neural network-based prediction; direct comparison with databases of homologous proteins of known localization; comparison of pentamer distributions calculated for the query and database sequences; and prediction of specific functional peptide sequences, such as signal peptides and transmembrane segments. Consequently, the program correctly handles only complete sequences that contain their signal sequences, anchors and other functional peptides, if any. Importantly, when both the neural networks and the other predictions point to the same compartment, the prediction is very reliable. The aggregate prediction produced by ProtCompB has been reported as one of the most precise ensemble methods in subcellular localization prediction in general (http://www.softberry.com 2016).

Prediction of n, h, and c Regions, Cleavage Site and Signal Peptide Probability

The n, h and c regions were predicted by the SignalP 3.0 server at http://www.cbs.dtu.dk/services/SignalP3.0/ because the SignalP 4.1 and SignalP 5.0 servers do not report n, h and c regions. The output of SignalP 4.1 is reported as five scores. The C-score and S-score recognize cleavage sites and signal peptide positions, respectively. The Y-score is the geometric average of the C-score and the slope of the S-score, which results in a more precise prediction of the cleavage sites than the raw C-score. S-mean is the average of the S-score. The discrimination score (D-score) is the average of S-mean and Y-max and provides the primary distinction between secretory and non-secretory proteins (Nielsen 2017). The SignalP server, regarded as the most accurate and reliable tool for identifying cleavage sites, is based on a combination of several artificial neural networks (ANN) and hidden Markov models (HMM), and its average accuracy is 87% (Petersen et al. 2011). The presence of cleavage sites, their locations in the signal peptide and the signal peptide probability were assigned by the SignalP 4.1 and SignalP 5.0 servers.
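The relation between these scores can be written out as a short sketch that follows the description above: the Y-score as a geometric average and the D-score as a plain average. The function names are hypothetical, the slope of the S-score is taken as a ready-made input, and the actual servers may weight the terms differently, so this is an illustration of the definitions and not a re-implementation of SignalP.

```python
import math


def y_score(c_score: float, s_slope: float) -> float:
    """Geometric average of the C-score and the slope of the S-score.

    Negative inputs are clipped to zero so that the square root is defined.
    """
    return math.sqrt(max(c_score, 0.0) * max(s_slope, 0.0))


def d_score(s_mean: float, y_max: float) -> float:
    """Plain average of S-mean and Y-max, as described in the text."""
    return (s_mean + y_max) / 2.0


# Hypothetical per-position values for illustration only
y_max = y_score(c_score=0.81, s_slope=0.64)
print(round(y_max, 3), round(d_score(s_mean=0.90, y_max=y_max), 3))
```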

Investigation of Physicochemical Parameters of Signal Peptides

Physicochemical properties of the signal peptides, including the length of the SP sequence, molecular weight, theoretical pI, aliphatic index, instability index, grand average of hydropathicity (GRAVY), extinction coefficients, numbers of positively and negatively charged residues and estimated half-life, were determined by ProtParam using the ExPASy server at http://web.expasy.org/protparam/. ProtParam computes various physicochemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. ProtParam, as a part of ExPASy and maintained by SIB and the European Bioinformatics Institute (EBI), is considered very reliable for computing the physicochemical properties of proteins (Gasteiger et al. 2005).

Protein Solubility Prediction

Prediction of protein solubility upon expression in E. coli was made by SOLpro, PROSO II, CcSOL omics, Wilkinson and Harrison model, protein-sol and SODA webservers.

SOLpro predicts protein solubility in E. coli using a two-stage SVM architecture based on multiple representations of the primary sequence (Cheng et al. 2005). Each classifier of the first layer takes as input a distinct set of features describing the sequence. A final SVM classifier summarizes the resulting predictions and predicts if the protein is soluble or not as well as the corresponding probability (Magnan et al. 2009). This webserver can be accessed from URL: http://scratch.proteomics.ics.uci.edu/.

PROSO II (Protein Solubility evaluator II) classifies proteins into soluble and insoluble categories at http://mbiljj45.bio.med.uni-muenchen.de:8888/prosoII/prosoII.seam. It is built on a sequence composition and similarity-based model. This server can detect the subset of sequence features that have the strongest impact on protein solubility (Smialowski et al. 2012). PROSO II employs a model based on a logistic function and an adapted Parzen window algorithm trained on experimental data extracted from the pepcDB (Berman et al. 2008) and PDB (Berman et al. 2000) databases.

The CcSOL algorithm predicts protein solubility using physicochemical properties. The server also computes point mutations throughout the whole protein sequence to identify susceptible areas. CcSOL omics can be freely accessed on the web at http://service.tartaglialab.com/page/ccsol_group. In CcSOL, hydrophobicity, hydrophilicity, β-sheet and α-helical propensities are combined into a solubility propensity score that is useful for investigating protein expression (Agostini et al. 2014).

SODA uses the propensity of the protein sequence to aggregate as well as intrinsic disorder, plus hydrophobicity and secondary structure preferences to estimate changes in the solubility. Also, SODA can evaluate difficult types of variation including point mutations, deletions, and insertions (Paladin et al. 2017). The webserver can be accessed from URL: http://protein.bio.unipd.it/soda.

The Wilkinson-Harrison model is based on two parameters: the average charge, determined by the relative numbers of Asp, Glu, Lys and Arg residues, and the content of turn-forming residues (Asn, Gly, Pro and Ser). Protein solubility was calculated according to Wilkinson-Harrison using their webserver (http://www.biotech.ou.edu/) (Idicula-Thomas et al. 2005; Smialowski et al. 2006b).

Protein-Sol is a webserver that predicts protein solubility and presents the result in a graphical format. This webserver is available at http://protein-sol.manchester.ac.uk. The tool can highlight lysine and arginine content with regard to modifying protein solubility (Hebditch et al. 2017).
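The two sequence parameters on which this model rests can be computed directly from an amino acid sequence, as in the following sketch. The function name and the sign convention for the average charge (basic minus acidic residues, per residue) are assumptions made for illustration; the regression coefficients that the webserver applies to these parameters are not reproduced here.

```python
from typing import Tuple


def wilkinson_harrison_features(seq: str) -> Tuple[float, float]:
    """Return (average charge, turn-forming residue fraction) of a sequence.

    Average charge is taken as ((Arg + Lys) - (Asp + Glu)) / length;
    this sign convention is an assumption for illustration.
    The turn-forming fraction counts Asn, Gly, Pro and Ser.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    basic = seq.count("R") + seq.count("K")
    acidic = seq.count("D") + seq.count("E")
    turn_formers = sum(seq.count(aa) for aa in "NGPS")
    return (basic - acidic) / n, turn_formers / n


# Toy sequence for illustration only
print(wilkinson_harrison_features("KRDENGPSAA"))
```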

The performance of the different methods for predicting protein solubility is presented in Table 3. Protein-sol was the single best-performing method in this comparison, with accuracy, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUROC) equal to 82.8%, 0.382 and 0.922, respectively (Agostini et al. 2014; Magnan et al. 2009; Paladin et al. 2017; Smialowski et al. 2012). It was followed by the ccSOL omics method. The lowest performance was that of the Wilkinson and Harrison model. Protein-sol was proposed recently and was shown to outperform previous methods in a comparative study led by its authors (Hebditch et al. 2017).

Table 3 Evaluation of performances of SOLpro and PROSO II servers in prediction of protein solubility

The receiver operating characteristic (ROC) curve portrays the relationship between the true positive rate and the false positive rate of the classifier (Smialowski et al. 2006b). The AUROC measures the discriminating ability of the model and takes values between 0.5 for random guessing and 1.0 for a perfect classifier (Smialowski et al. 2012). It is often interpreted as the probability that, if one positive and one negative instance are drawn at random, the one scored higher by the model will be the actual positive (Frank et al. 2004).
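The probabilistic interpretation of the AUROC can be made concrete with a small sketch that compares every positive score with every negative score. The function name and the scores are hypothetical; ties are counted as one half, which is the usual convention for this pairwise estimate.

```python
from itertools import product
from typing import Sequence


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pairs = list(product(scores_pos, scores_neg))
    if not pairs:
        raise ValueError("need at least one positive and one negative score")
    # A correctly ordered pair counts 1, a tie 0.5, a wrongly ordered pair 0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Hypothetical solubility scores: soluble (positive) vs insoluble (negative)
print(auroc([0.9, 0.4], [0.6, 0.1]))  # 3 of 4 pairs correctly ordered -> 0.75
```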

Evaluation of the Secretion Properties of Signal Peptides

To sort SPs based on the secretion properties, PRED-TAT and SignalP 5.0 webservers were used. PRED-TAT operates based on Hidden Markov Models (HMMs) (Bagos et al. 2010). It can be accessed from http://www.compgen.org/tools/PRED-TAT/submit. PRED-TAT had MCC, CS recall and CS precision of 0.82–0.97, 0.72–0.78, 0.17–0.76 for predicting Sec pathway and Tat pathway SPs for Gram-negative bacteria, respectively (Bagos et al. 2010).

SignalP 5.0 is a deep neural network-based method combined with conditional random field classification and optimized transfer learning for improved SP prediction. SignalP 5.0 can differentiate between “standard” signal peptides translocated by the Sec translocon (Sec/SPI) and “Tat” (twin-arginine translocation) signal peptides translocated by the Tat translocon (Tat/SPI) in bacteria. In general, SignalP 5.0 distinguishes three types of signal peptides in prokaryotes: Sec substrates cleaved by SPase I (Sec/SPI), Sec substrates cleaved by SPase II (Sec/SPII), and Tat substrates cleaved by SPase I (Tat/SPI). SignalP 5.0 is available at http://www.cbs.dtu.dk/services/SignalP/index.php (Armenteros et al. 2019). To apply all webservers, each signal peptide was linked to the N-terminus of the GAD amino acid sequence, with methionine residues placed between the SP and the GAD amino acid sequence. SignalP 5.0 had MCCs of 0.907, 0.960 and 0.981 for predicting Sec/SPI, Sec/SPII and Tat/SPI SPs in Gram-negative bacteria, respectively. Regarding the cleavage site (CS), the CS precision of SignalP 5.0 varies between 0.630 and 0.970, whereas its CS recall varies between 0.579 and 0.970. SignalP 5.0 performs as well as PRED-TAT for predicting Tat/SPI SPs in Gram-negative bacteria. SignalP 5.0 displayed the highest CS precision and CS recall scores in Gram-negative bacteria. Finally, SignalP 5.0 has the best SP discrimination in the Sec and Tat pathways (Armenteros et al. 2019).
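The construction of the query sequences submitted to the webservers can be sketched as follows. The function names, the toy signal peptide and the truncated passenger sequence are hypothetical. The text does not state exactly how the methionine was placed, so the linker is exposed as a parameter; the default of a single Met is an assumption.

```python
def build_fusion(signal_peptide: str, passenger: str, linker: str = "M") -> str:
    """Join a signal peptide to the N-terminus of a passenger protein.

    The linker default (one methionine) is an assumption; pass linker=""
    if the passenger's own initial Met is meant to serve this role.
    """
    return signal_peptide.upper() + linker.upper() + passenger.upper()


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines)


# Toy signal peptide (hypothetical) and the first 20 residues of the GAD sequence
toy_sp = "MKKLLLAAAA"
gad_start = "MLYGKDNQEEKNYLEPIFGS"
print(to_fasta("toySP_GAD", build_fusion(toy_sp, gad_start)))
```

Writing one FASTA record per signal peptide in this way gives a single multi-sequence file that can be pasted into each prediction server.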

Results and Discussion

Predicting Subcellular Localization of GAD Connected to Different Signal Peptides

The ProtCompB webserver was used to predict the subcellular location of GAD connected to the different signal peptides. The predicted localization site of our protein with each signal peptide is shown in supplementary Table 3. According to the subcellular localization analysis, among the 127 SPs, the final localization site for 13 signal peptides (RZOR, FAED, Bla, ccmH, cexE, dsbG, pspE, torT, eglS, yehD, ASPG_ERWCH, yiiX and bcsB) was the outer membrane space (Table 4).

Table 4 Identifying the sub-cellular location of GAD connected to different signal peptides by ProtComp server

Prediction of n, h and c-Regions and Signal Peptide Probability

The results showed that the SPs’ D-scores ranged from 0.642 (RZOR) to 0.893 (pspE) (Table 5). The most important parameter for identifying an SP is the discrimination score (D-score), which is usually applied with a cut-off value of 0.5; an SP sequence is considered only when its D-score is above 0.50. The SignalP results also indicated that the highest D-scores belonged to pspE, ccmH, ASPG_ERWCH and yiiX, in that order (Table 5).

Table 5 Signal peptide probability and n, h and c regions of signal sequences

Sequences with a D-score higher than 0.57 were classified as putative signal peptides, and sequences with a D-score above 0.7 were considered signal peptides with high probability. The settings used were E. coli, the default D-cutoff value of 0.57 and standard graphics output. For evaluation of the whole secretory candidate protein, each SP sequence was connected to the N-terminus of the glutamate decarboxylase amino acid sequence, with methionine residues inserted between each SP and the GAD amino acid sequence.
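The two thresholds described here amount to a simple three-way classification, sketched below. The function name and the label strings are hypothetical; the cut-off values 0.57 and 0.7 and the two example D-scores are those quoted in the text.

```python
def classify_d_score(d: float, cutoff: float = 0.57, high: float = 0.70) -> str:
    """Classify a D-score using the default cut-off and a high-confidence level."""
    if d > high:
        return "high-probability signal peptide"
    if d > cutoff:
        return "putative signal peptide"
    return "not classified as signal peptide"


# Lowest and highest D-scores reported for the 13 selected SPs
for name, d in [("RZOR", 0.642), ("pspE", 0.893)]:
    print(name, d, classify_d_score(d))
```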

For in silico investigation of n, h and c regions, SignalP version 3.0 was applied. The results showed that the collected SPs’ n-region length was between 3 and 17, h-region length was between 7 and 12, and c-region length was between 2 and 10 amino acids. It seemed all SP sequences in our study not only had a D-score above 0.50, but also contained distinct n, h and c regions (Table 5).

The n- and h-regions play a critical role in transferring recombinant proteins into the outer membrane space, while the c-region serves as the cleavage site recognized by the signal peptidase enzyme. Therefore, a reliable SP sequence should have clear n, h and c regions (Owji et al. 2018). Hydrophobicity depends strongly on the length of the h-region: a longer h-region increases the level of hydrophobicity (Papanikou et al. 2007). Accordingly, the h-region lengths of the SPs varied considerably (7–12). Judging by the h-regions in Table 5, which indicate the hydrophobicity levels of the signal peptides, torT, RZOR, FAED, eglS, yehD and bcsB have the highest hydrophobicity levels among the 13 signal peptides.

Cleavage Site Prediction

According to the results (Table 5), cleavage sites were predicted for all 13 signal peptides, implying that the signal peptidase enzyme would correctly identify their cleavage sites. The c-region is the site of signal peptide cleavage by the signal peptidase. An “A-x-A” box sequence is believed to govern the cleavage motif in E. coli; it is characterized by the presence of alanine at positions −3 and −1 relative to the signal peptidase cleavage site (Von Heijne and Abrahmsén 1989). In the consensus motif A-x-A, the “x” at position −2 is a large bulky residue such as Phe, Tyr, Leu or His (Pratap and Dikshit 1998). Six of our SPs have the AxA motif at their cleavage sites: ccmH, cexE, dsbG, ASPG_ERWCH, eglS and yiiX (Table 5).
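Checking a predicted cleavage site for the A-x-A box reduces to inspecting positions −3 and −1 of the signal peptide, as in the sketch below. The function name and the toy sequences are hypothetical; the cleavage position is assumed to be supplied by a predictor such as SignalP as the length of the signal peptide.

```python
def has_axa_motif(precursor: str, sp_length: int) -> bool:
    """True if Ala occupies positions -3 and -1 relative to the cleavage site.

    sp_length is the number of residues before the cleavage site, so the
    residue at position -1 is precursor[sp_length - 1].
    """
    if sp_length < 3 or sp_length > len(precursor):
        raise ValueError("sp_length must be between 3 and the sequence length")
    sp = precursor[:sp_length].upper()
    return sp[-3] == "A" and sp[-1] == "A"


# Toy precursors (hypothetical): the signal peptide ends in "AHA" or "GHA"
print(has_axa_motif("MKKLLLAHAMLYG", 9), has_axa_motif("MKKLLLGHAMLYG", 9))
```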

Investigation of Physicochemical Parameters

The physico-chemical properties of the signal peptides, including the length of the SP sequence, molecular weight, theoretical pI, aliphatic index, instability index, GRAVY and numbers of positively and negatively charged residues, were evaluated by the ProtParam server, as shown in Table 6 and supplementary Table 4. The in silico results showed that the SP lengths of the 13 sequences ranged from 17 (dsbG) to 35 (FAED) amino acids, with an average of 22 amino acids (supplementary Table 4). The lowest and the highest Mw belonged to dsbG (Mw of SP = 1839.44, Mw of SP connected to GAD = 55667.79) and FAED (Mw of SP = 3698.48, Mw of SP connected to GAD = 57526.83), respectively (Table 6 and supplementary Table 4).

Table 6 Physico-chemical properties of the GAD connected to signal peptides determined by ProtParam

All the selected SPs had 1–4 positively charged residues (Arg + Lys) and 0–1 negatively charged residues (Asp + Glu) based on the ProtParam results, whereas the pI of the signal peptides alone ranged from 8.02 (Bla) to 11 (yehD, yiiX) and the pI of the signal peptides connected to GAD ranged from 5.05 (ccmH, torT) to 5.2 (FAED) (Table 5 and supplementary Table 4). A net charge of at least one is assumed to be essential for efficient export of the recombinant protein, and different signal peptides may require different magnitudes of positive charge for maximum efficiency (Low et al. 2013). A net positive charge in the n-region (arginines and/or lysines) enhances the processing of the protein and its rate of translocation to the outer membrane (Guo et al. 2018).

As observed, the lowest and the highest GRAVY values belonged to bcsB and eglS, respectively (Table 6). The grand average of hydropathy (GRAVY) score of a protein is calculated as the sum of the hydropathy values of all its amino acids divided by the number of residues in the sequence (Kyte and Doolittle 1982). A positive GRAVY indicates hydrophobicity and a negative GRAVY indicates hydrophilicity. Therefore, in addition to describing the hydrophobicity of the protein, GRAVY can show an association with its solubility. Greater hydrophilicity implies a higher ability of the protein to form hydrogen bonds with water molecules and hence higher solubility (Gasteiger et al. 2005; Low et al. 2013).
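The GRAVY definition given above can be reproduced in a few lines using the Kyte and Doolittle hydropathy scale. The function and variable names are hypothetical, and the sketch is meant only to illustrate the calculation that ProtParam performs, not to replace it.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Sum of hydropathy values divided by the number of residues.

    Raises KeyError for non-standard residue letters.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Toy peptides: a hydrophobic stretch scores positive, a charged one negative
print(gravy("LLLAAAVV"), gravy("KKDDEE"))
```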

The aliphatic index is another factor that indicates hydrophobicity. It is defined as the relative volume occupied by the aliphatic side chains (alanine, valine, isoleucine and leucine) in an amino acid sequence. The highest aliphatic index belonged to dsbG and the lowest to bcsB (Table 6). According to our results, all SPs appear to have an appropriate GRAVY and aliphatic index for use, and SPs with a high GRAVY and aliphatic index are preferable (Gasteiger et al. 2005).
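The aliphatic index can be sketched in the same way. The formula below, with weights 2.9 for valine and 3.9 for isoleucine and leucine applied to mole percentages, is the commonly used definition documented for ProtParam; the function name is hypothetical and the weights should be checked against the ProtParam documentation before reuse.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X values are mole percentages (100 * count / length).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptides for illustration only
print(aliphatic_index("AVIL"), aliphatic_index("GGGG"))
```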

Considered separately, five signal peptides (Bla, eglS, yehD, yiiX and bcsB) had an instability index above 40 and were therefore predicted to be unstable (supplementary Table 4). However, according to our results in Table 6, the instability index of the signal peptides connected to GAD ranged from 34.39 (ccmH) to 37.46 (eglS); all of these values were below 40, so the fusions were predicted to be stable. A protein whose instability index is smaller than 40 is predicted to be stable, whereas a value above 40 predicts that the protein may be unstable (Gamage et al. 2019).

Protein Solubility Prediction by Several Computational Methods

We evaluated our signal peptides by directly applying the SOLpro, PROSO II, ccSOL omics, Wilkinson and Harrison model, protein-sol and SODA webservers. The solubility analysis of glutamate decarboxylase connected to the 13 studied signal peptides showed that GAD was predicted to be insoluble, with an insolubility probability in E. coli between 0.566 (ccmH) and 0.593 (pspE) out of 1 (Table 7).

Table 7 Solubility of the signal peptides predicted by SOLpro, PROSO II, ccSOL omics, Wilkinson and Harrison model, protein-sol and SODA servers

High-level expression of recombinant GAD in the cytoplasm, periplasm or outer membrane leads to aggregation of misfolded protein (Chang et al. 2017; Ueno 2000). Likewise, in our experiments the GAD enzyme was expressed as inclusion bodies. As Santos et al. (2012) and Chang et al. (2017) mentioned, a simple refolding process with acceptable efficiency converts such protein to a folded protein.

As Chang et al. (2013) mentioned, the solubility of passenger proteins seems essential for efficient outer membrane expression, considering that the insoluble proteins may misfold or form inclusion bodies in this cellular compartment.

These insoluble proteins need to be solubilized and refolded to obtain functional proteins (Paladin et al. 2017). Smialowski et al. (2006a) observed that, compared with soluble proteins, insoluble proteins more frequently contained hydrophobic stretches of 20 or more residues, had a lower glutamine content (Gln composition < 4%), fewer negatively charged residues (Asp + Glu composition < 17%) and a higher percentage of aromatic amino acids (aromatic composition > 7.5%).
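The composition thresholds quoted above can be checked for any sequence with a short sketch. The function name and the choice of Phe, Trp and Tyr as the aromatic set are assumptions; the hydrophobic-stretch criterion is omitted because the text does not specify which residues count as hydrophobic.

```python
def insolubility_flags(seq: str) -> dict:
    """Flag the composition features associated with insoluble proteins.

    Thresholds follow the percentages quoted in the text; the aromatic
    set (F, W, Y) is an assumption for illustration.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    def percent(residues: str) -> float:
        return 100.0 * sum(seq.count(aa) for aa in residues) / n

    return {
        "low_gln": percent("Q") < 4.0,          # Gln composition < 4%
        "low_acidic": percent("DE") < 17.0,     # Asp + Glu composition < 17%
        "high_aromatic": percent("FWY") > 7.5,  # aromatic composition > 7.5%
    }


# Toy sequences for illustration only
print(insolubility_flags("FFWYAAAAAA"))
print(insolubility_flags("QDEDEAAAAA"))
```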

Changing the growth conditions, such as the growth temperature, pH of the culture medium, inducer concentration and induction time, can decrease the formation of inclusion bodies and improve the solubility of glutamate decarboxylase (Fan et al. 2012). At the isoelectric point (pI), proteins have zero net charge, attractive forces predominate and molecules tend to associate, resulting in insolubility (Gromiha 2010). Also, most proteins can be expressed in soluble form in the presence of sorbitol, arginine, trehalose or other chemical additives in the expression medium (Godbey 2014). These additives suppress the formation of inclusion bodies by decreasing the non-covalent interactions between protein molecules and thus increase the solubility of the target protein in E. coli overexpression systems (Gromiha 2010).

Secretion Sorting of Signal Peptides

The classification was confirmed by detection of signal peptides based on the secretion properties using the PRED-TAT and SignalP 5.0 servers. The results demonstrated that all 13 SPs belonged to the Sec pathway (Table 8).

Table 8 Secretion sorting of SPs by PRED-TAT and SignalP 5.0 servers

Overall Considerations and Selection of the Best Potential SPs

Based on the results, sub-cellular localization sites of 13 signal peptides were in the outer membrane of E. coli, where the signal peptidase enzyme properly identified their cleavage sites. Also, according to the computational analysis, the most suitable candidates seemed to be torT with a reasonably high D-score, aliphatic index and GRAVY, followed by ccmH and then pspE (Figs. 1, 2 and 3).

Fig. 1 Localization prediction for GAD connected to the torT signal peptide

Fig. 2 Prediction of the presence and location of signal peptide cleavage sites in the GAD amino acid sequence linked to the torT signal peptide

Fig. 3 In silico distribution of the solubility of GAD attached to the torT signal peptide

Increased protein solubility is needed to produce proteins on a large scale for industrial purposes. Over-expression of proteins in E. coli leads to the formation of insoluble protein or inclusion bodies because the bacteria lack the machinery needed to fold these proteins into their native form. Therefore, protein produced in this way needs to be refolded under in vitro conditions.

There are different techniques for refolding inclusion body proteins, including the addition of accelerants, chromatography, dialysis, dilution and ultrafiltration (Godbey 2014). Commonly used chemical additives for protein refolding are denaturants [urea, guanidinium chloride (GdnHCl)], detergents (Triton X-100, CHAPS, SDS, N-lauroylsarcosine, and CTAB detergents with cycloamylose or cyclodextrin) and inhibitors (arginine hydrochloride, arginine amide, glycine amide, proline) (Gromiha 2010).

Conclusion

γ-Aminobutyric acid has broad potential for application as a bioactive additive in the food and pharmaceutical industries. GABA is biosynthesized from l-glutamate in a reaction catalyzed by glutamate decarboxylase. The best approach for transferring GAD to the outer membrane space is to use a suitable signal peptide. Identifying suitable SPs is one of the most vital steps in producing secretory proteins as recombinant proteins in E. coli. Computational methods make it possible to rapidly predict candidate secretory SPs and other features relevant to efficient secretion. A list of secretory SPs provides an opportunity to select the best option for efficient secretion.

The secretory SPs’ D-scores ranged from 0.642 (RZOR) to 0.893 (pspE). Judging by the h-regions in Table 5, which indicate the hydrophobicity levels of the signal peptides, torT, RZOR, FAED, eglS, yehD and bcsB have the highest hydrophobicity levels among the 13 signal peptides. Cleavage sites were predicted for all 13 signal peptides, implying that the signal peptidase enzyme would correctly identify them. The secretory SPs with the highest GRAVY were eglS, torT, Bla, dsbG and FAED. The instability index of every signal peptide connected to GAD was below 40, so all fusions were predicted to be stable. Six of our SPs have the AxA motif at their cleavage sites: ccmH, cexE, dsbG, ASPG_ERWCH, eglS and yiiX. Finally, the most suitable candidate seemed to be torT, with a fairly high D-score, aliphatic index and GRAVY, followed by ccmH and then pspE, all of which are Sec-pathway SPs. torT may therefore facilitate scale-up of GAD production and might be useful in future experimental research.