Introduction

The expression of genes in heterologous expression systems has made available large amounts of recombinant proteins that are scarce in their natural sources (Ferrer-Miralles et al. 2009). In addition to being used in functional, structural, and biochemical studies, recombinant proteins are being formulated for administration to patients in an increasing number of medical conditions and are even produced at an industrial scale (Lingg et al. 2012; Reichert and Paquette 2003). Recombinant DNA technology can not only produce naturally occurring proteins but also provide an almost unlimited range of variants of such proteins, in which mutations refine features of interest (such as pharmacokinetics or targeting) while maintaining biological function (Huang et al. 2008; Kamionka 2011). One case that exemplifies the great potential of this technology is human insulin, the first recombinant product launched to the market, in 1982. Since then, several insulin variants with a modified time span of action have appeared, improving the quality of life of insulin-dependent patients (Ferrer-Miralles et al. 2009; Kamionka 2011; Toyoda et al. 2012). Moreover, recombinant DNA technology allows the production of de novo designed proteins, not present in nature but resulting from the fusion of different active peptides or domains (each one from a different origin) in single-chain polypeptides with a modular architecture. This offers intriguing possibilities for the development of multifunctional and smart drug vehicles at the nanoscale range (Vazquez et al. 2009).

One of the most critical decision points when facing gene expression in a heterologous system is the cost of the procedure. There is a huge gap between the costs of using prokaryotic or lower eukaryotic expression systems and those of using a complex eukaryotic expression system. The difference is related not only to the cost of the culture medium but also to the process itself, which in the case of complex eukaryotic expression systems might require sophisticated infrastructure to maintain cell growth. In the end, most production procedures are designed to be performed in a prokaryotic system and, among them, the most widely used are based on Escherichia coli strains. This is easily illustrated by the expression hosts reported for recombinant proteins in databases such as the Protein Data Bank (PDB): even though more than 50 % of the genes have a mammalian origin, only 3 % are expressed in mammalian expression systems, whereas E. coli is used in more than 85 % of the cases. These data indicate the great potential of E. coli as a robust and reliable expression system. However, E. coli cells are not always able to render functional proteins (especially complex eukaryotic proteins that carry post-translational modifications, or proteins with hydrophobic patches such as membrane proteins with transmembrane domains), and in some cases production is impeded altogether (Freigassner et al. 2009). In those cases, production of the recombinant protein can be addressed using other prokaryotic expression systems such as psychrophilic bacteria (Unzueta et al. 2015). It is quite difficult to give exact data about the percentage of proteins that have failed to be expressed in prokaryotic hosts, since those results are not usually published and probably only the tip of the iceberg is available to the scientific community.
This review is intended to give key clues to approach the expression of heterologous genes specifically for difficult-to-express (DTE) recombinant proteins in prokaryotic expression systems.

Where to start: analysis of properties of recombinant proteins with bioinformatic tools

In the design of the gene expression procedure, biological and physicochemical information can be obtained from online bioinformatics resources using only the primary sequence of the protein (UNIPROT, EXPASY, see Fig. 1). Apart from those general parameters, a second set of important features is obtained by running programs that predict protein secondary structure and the presence of protein domains at Expasy (PROSITE, JPRED, CFSSP). Then, the tertiary structure can be either predicted (SWISS MODEL) or retrieved if already deposited at the RCSB Protein Data Bank (PDB). At that point, specific post-translational modifications such as glycosylations or disulfide bridges are revealed and the protein engineering design can be approached. In addition, computational modeling can be applied to increase protein stability through specific point mutations, as has been demonstrated by modifying solvent-exposed domains of complex proteins with modeling programs (Kim et al. 2016).
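As an illustration of the kind of first-pass information that the primary sequence alone can provide, the following minimal Python sketch computes a grand average of hydropathy (GRAVY), flags strongly hydrophobic stretches, and counts cysteines. It is not the implementation used by any of the servers named above. The hydropathy values are those of the Kyte-Doolittle scale; the 19-residue window and the 1.6 threshold are a commonly used heuristic for candidate transmembrane segments and are assumptions here, as are all function names and the toy sequence.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy of a protein sequence.

    Non-standard residue letters raise KeyError on purpose, so that
    ambiguous input is noticed rather than silently skipped.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KD[aa] for aa in seq) / len(seq)


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy
    exceeds the threshold (heuristic values, see the text above)."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        if gravy(seq[start:start + window]) > threshold:
            hits.append(start)
    return hits


def cysteine_count(seq):
    """Number of cysteines, a rough hint that disulfide bridges may be needed."""
    return seq.upper().count("C")


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    toy = "MKT" + "L" * 19 + "DDKCEC"
    print("GRAVY:", round(gravy(toy), 2))
    print("hydrophobic window starts:", hydrophobic_windows(toy))
    print("cysteines:", cysteine_count(toy))
```

A positive GRAVY value or several hydrophobic windows would only suggest, not prove, that the protein may be aggregation-prone or membrane-associated; dedicated prediction tools remain necessary.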

Fig. 1 Alternative strategies to cope with difficult-to-express proteins. Recommended bioinformatics analyses, changes in the expression procedure and in the recombinant gene, and alternative prokaryotic hosts are shown

Revisiting the recombinant gene

When a DTE protein is suspected or found to be aggregation-prone, one of the most common strategies to overcome the problem is to modify the recombinant gene by adding a solubilization tag, with the aim of obtaining the protein in its soluble form (Sun et al. 2011) (Fig. 1). However, the transfer of the insoluble protein to the soluble cell fraction does not guarantee an improvement in its conformational quality (Unzueta et al. 2015). In fact, it has been described that solubility does not necessarily coincide with conformational quality (Martinez-Alonso et al. 2008). Therefore, this strategy might not render the desired product, at least in terms of quality. An alternative would be to refold the protein from inclusion bodies (IBs) or to resolubilize it under mild conditions (Singh and Panda 2005; Vallejo and Rinas 2004). Unfortunately, both strategies usually give a low efficiency of protein recovery.

One of the major problems observed in recombinant protein production is the difficulty that cell factories have in handling rare codons. As a consequence of the low availability of certain transfer RNA (tRNA) molecules, their associated codons are translated slowly. In recombinant protein production, the expression rate, yield, and final product quality can therefore be affected by the codon pattern of the gene of interest and by the codon bias of the expression system (Quax et al. 2015). If synthesis is dramatically slowed, ribosomes may prematurely release the incomplete polypeptide (Shah and Gilchrist 2010). In prokaryotes, newly synthesized proteins start to fold in the ribosomal tunnel, protected by the environment of the ribosomal structure (O’Brien et al. 2014). This co-translational folding is influenced by the speed of polypeptide chain elongation, and rare codons play a key role in determining that speed (Varenne et al. 1984). Codon usage is thus another important parameter that should be taken into account in recombinant protein production.

Frequent and rare triplets are mixed within the same coding sequence, and the analysis of elongation speed gives a codon landscape graph that is typical for each gene. Because of this, the analysis and optimization of the codon usage of the gene of interest and the adjustment of tRNA abundance in cell factories are key steps in overcoming problems during the production process. In some cases, this strategy gives a significant increase in the production yield (Gustafsson et al. 2012), but in other instances this effect is not achieved (Gustafsson et al. 2012; Maertens et al. 2010).
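A very crude proxy for such a codon landscape can be sketched by locating rare codons along a coding sequence and computing their density in a sliding window. The Python sketch below is only an illustration under stated assumptions: real elongation-speed analyses rely on host-specific codon usage or tRNA abundance tables, whereas here the rare-codon set is a hypothetical, hard-coded list of triplets often cited as poorly used in E. coli, and the function names and example sequence are invented for the example.

```python
# Assumption: an illustrative set of codons treated as rare in E. coli.
# A real analysis should take these from a codon usage table for the
# chosen host strain rather than from a fixed list.
RARE_CODONS = {"AGG", "AGA", "CGA", "CGG", "CTA", "ATA", "CCC", "GGA"}


def split_codons(cds):
    """Split an in-frame coding sequence into triplets (RNA input accepted)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def rare_codon_positions(cds, rare=RARE_CODONS):
    """0-based codon indices at which a rare codon occurs."""
    return [i for i, codon in enumerate(split_codons(cds)) if codon in rare]


def rare_codon_landscape(cds, window=5, rare=RARE_CODONS):
    """Fraction of rare codons in each sliding window of `window` codons.

    High values mark stretches where translation might be slowed;
    this is a proxy, not a measurement of elongation speed.
    """
    flags = [1 if codon in rare else 0 for codon in split_codons(cds)]
    return [
        sum(flags[i:i + window]) / window
        for i in range(len(flags) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical mini-gene: ATG AGG AGA AAA CTG TAA
    example = "ATGAGGAGAAAACTGTAA"
    print(rare_codon_positions(example))
    print(rare_codon_landscape(example, window=3))
```

Clusters of consecutive rare codons, rather than isolated ones, are the positions that would deserve attention when deciding between recoding the gene and using a host strain that supplies the corresponding tRNAs.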

As is widely accepted, optimizing the codon usage of the gene of interest for the chosen heterologous expression system is a way to optimize its translation rate by exploiting the natural ratios of tRNA content in that system (Fig. 1). Among the 64 possible triplets, 61 codons encode the 20 standard amino acids, while the remaining three are translation stop signals. This redundancy means that more than one triplet can be used as a “synonym” for the same amino acid. Depending on the organism, a different bias in the frequency of synonymous triplets is observed. Moreover, the tRNA species vary in their expression levels; consequently, synonymous codons are not equally represented in the tRNA population (Ikemura 1985).
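The redundancy of the code can be made concrete with a short Python sketch that builds the standard genetic code, counts how many synonymous codons each amino acid has, and computes the relative usage of synonymous codons within a given coding sequence. The table string is the standard genetic code written in the conventional TCAG order; the function names and the toy sequence are assumptions made for the example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid ("*" marks the three stop codons).
DEGENERACY = Counter(CODON_TABLE.values())


def synonymous_codons(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def relative_synonymous_usage(cds):
    """For each codon present in an in-frame DNA sequence, the fraction
    of its amino acid's occurrences that are encoded by that codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    codon_counts = Counter(codons)
    aa_counts = Counter(CODON_TABLE[c] for c in codons)
    return {c: n / aa_counts[CODON_TABLE[c]] for c, n in codon_counts.items()}


if __name__ == "__main__":
    print("sense codons:", len(CODON_TABLE) - DEGENERACY["*"])
    print("leucine codons:", synonymous_codons("L"))
    # Hypothetical toy sequence with three leucine codons.
    print(relative_synonymous_usage("CTGCTGCTA"))
```

Comparing such per-gene fractions with the usage table of the intended host is one simple way to see where the bias of the gene and that of the expression system diverge.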

Codon optimization therefore does not mean changing every synonymous triplet to the one that is most abundantly represented in the tRNA pool. If this were done, the codon landscape would be altered, compromising the rhythm of protein synthesis. This change in speed could lead to incorrect folding of the recombinant protein and, consequently, to degradation of the misfolded protein or to its accumulation in the insoluble fraction (Quax et al. 2015). In fact, it has been described that translation elongation along the messenger RNA (mRNA) is a non-uniform process, and its speed seems to be linked to the need to modulate the time required for the correct folding of domains in multidomain proteins. On the one hand, optimizing the codon usage would increase the speed of elongation along the whole sequence; on the other hand, it would also increase the chances of obtaining partially folded species that would be either degraded or deposited in inclusion bodies. Novel algorithms that harmonize the translation pattern of the gene with that of the expression host have been proposed (Bartholomaus et al. 2016; Hess et al. 2015).

Designing the recombinant gene expression procedure

In most cases, recombinant genes are expressed in E. coli under standard conditions, described elsewhere (Graslund et al. 2008; Rosano and Ceccarelli 2014). In general, once the primary sequence is decided, several parameters are set at the initial steps of the expression experiments, for example the host strain, the promoter, and the media, among others (Gopal and Kumar 2013). However, in the case of a DTE protein, those basic procedures are not enough to render either the required amount or the required quality of the recombinant protein, or the protein is not produced at all. It is then that a series of trial-and-error tests is performed. Many factors seem to be involved in this phenomenon. The saturation of the folding machinery (Martinez-Alonso et al. 2010), the RNA stability (Ahn et al. 2008), and the codon usage are among the most relevant (Rosano and Ceccarelli 2014). Therefore, when approaching the production of heterologous proteins in prokaryotic expression systems, a specific gene expression design has to be developed.

As many factors might underlie the low production or the aggregation of the recombinant protein, a multifactorial approach is required. The parameters to be taken into account include the expression vector, mainly to obtain a weaker promoter in the case of toxic proteins (Lebendiker and Danieli 2014); an expression strain able to carry out specific post-translational modifications or to supply tRNAs for rare codons (Tegel et al. 2010); the growth media and induction conditions (Marini et al. 2014); and the coexpression of folding modulators (Martinez-Alonso et al. 2010) (Fig. 1). In this sense, high-throughput expression screening and purification protocols can be established to test several variables at a time, increasing the chances of obtaining the recombinant protein in the soluble cell fraction (Nozach et al. 2013; Saez et al. 2014; Saez and Vincentelli 2014).

Emerging microbial systems as cell factories for “difficult-to-express” proteins

Taking into consideration the abovementioned obstacles in the production of DTE proteins, there is a clear consensus on the urgent need for new cell factories able to overcome these issues. Apart from keeping to current systems and developing improved strains by adding new properties to them (which is in any case an interesting and useful approach) (Makino et al. 2011), the wide diversity of microbial forms of life and metabolism offers an almost unlimited source of new microorganisms to explore. Their specific properties as unconventional cell factories offer intriguing opportunities to deal with the increasing demand for more complex DTE proteins. In this line, the prokaryotic universe is under continuous exploration in the search for new protein cell factories (Fig. 1). Some of the most promising systems currently under study are described in the following sections.

Bacillus

Bacillus species (non-pathogenic, toxin-free Gram-positive bacteria) have a recognized history of safe use in the food industry. The use of Bacillus subtilis has many advantages, such as its “generally recognized as safe” (GRAS) status and easy and inexpensive culture methods that can result in high cell densities. These characteristics, together with significant progress in genetic manipulation, make Bacillus a promising production platform (Schallmey et al. 2004; Wenzel et al. 2011), and a variety of expression systems have therefore been developed for the efficient expression of heterologous genes in B. subtilis (Schumann 2007).

Among recent advances in the development of B. subtilis tools, new expression systems have been established. For example, the T7 expression system, a highly efficient system widely used in E. coli (Chen et al. 2010), and the strong IPTG-inducible promoter Pgrac100 (Phan et al. 2012) have been introduced in B. subtilis. Since IPTG-inducible promoters are not suitable for industrial-scale production (IPTG is expensive and toxic), maltose-inducible (Ming et al. 2010) and mannitol-inducible (Heravi et al. 2011) expression systems, based on cheaper and safer inducers, have also been developed for Bacillus. B. subtilis is also known for its capacity to secrete large amounts of endogenous enzymes, such as proteases and lipases, many of which have industrially relevant applications. However, the secretion of heterologous proteins is usually substantially less efficient because of the lack of specific secretion pathways, insufficient protein-conducting channels, or the secretion stress response. To overcome these limitations, recent research has focused on engineering secretion pathways, such as the Sec-SRP pathway (Diao et al. 2012; Kakeshita et al. 2010) or the twin-arginine translocation (Tat) pathway (Zhu et al. 2008), in order to avoid rate-limiting factors and enhance recombinant protein secretion. Many other tools for the genetic modification of Bacillus species have been developed in recent years (Dong and Zhang 2014).

All these advances have helped to realize the potential of Bacillus species as production hosts, making these microorganisms competitive with traditional cell factories like E. coli.

Lactococcus

Lactococcus lactis, a Gram-positive bacterium from the lactic acid bacteria (LAB) group that also has GRAS status, has become over the last decades another attractive candidate as a microbial cell factory for heterologous protein production. The huge potential of these GRAS microorganisms and the continuous development of a growing genomic and proteomic toolbox are expected to turn Lactococcus into a promising cell factory for a wide range of applications (de Vos 2011; Teusink et al. 2011).

Within this genetic toolbox, the nisin-inducible (NICE) expression system (de Ruyter et al. 1996; Kuipers et al. 1995; Teusink et al. 2011), derived from the nis operon present in some L. lactis strains, has been extensively used to produce recombinant proteins in L. lactis (Kunji et al. 2003; Le et al. 2005; Mierau and Kleerebezem 2005; Teusink et al. 2011). The NICE system offers numerous advantages, such as ease of use, tightly controlled and efficiently induced expression leading to high protein yields (Mierau and Kleerebezem 2005; Teusink et al. 2011), and suitability for large-scale production processes (Mierau et al. 2005). Other gene expression systems, such as P170 (Jorgensen et al. 2014) or zinc-regulated systems (Llull and Poquet 2004), have also been successfully used in L. lactis. Interestingly, L. lactis is also currently used as a live vector for the delivery of drugs, DNA, and other molecules to mucosal surfaces (Braat et al. 2006; Pontes et al. 2011).

Studies on membrane proteins, a typical example of DTE proteins, have benefited from the introduction of L. lactis as a cell factory. Heterologous overproduction of membrane proteins, particularly eukaryotic ones, remains challenging, and even when high-yield expression is achieved, membrane proteins are usually found in the form of inclusion bodies (Monne et al. 2005). In lactococcal hosts, however, several membrane proteins, including transport proteins, receptors, and a quinone oxidoreductase, have been successfully overexpressed (Kunji et al. 2003; Marreddy et al. 2011; Wieczorek and Martin 2010). Thus, this expression system can be used for the functional characterization of eukaryotic membrane proteins.

Moreover, a recent study showed that recombinant protein quality in L. lactis is deeply influenced by growth conditions and temperature (Cano-Garrido et al. 2014). By controlling these parameters, conformational quality of both soluble and insoluble protein fractions can be modulated, offering a high versatility of this system for the production of endotoxin-free soluble proteins of biomedical interest. These findings not only confirm L. lactis as an excellent producer of recombinant proteins but also reveal that there is still plenty of room for significant improvement by the exploitation of external protein quality modulators.

Corynebacterium

Corynebacterium glutamicum is another Gram-positive, non-pathogenic bacterium, traditionally used for the industrial production of L-amino acids, nucleic acids, antibiotics, and other biochemicals (Kimura 2003). C. glutamicum is also a promising cell factory for the production of recombinant proteins due to several features: (1) it has a single cellular membrane that allows direct secretion of target proteins into the culture medium; (2) it secretes only small amounts of endogenous proteins into the culture medium, simplifying downstream processes; (3) extracellular hydrolytic enzyme activity is not detectable, which gives it strong potential for expressing protease-sensitive proteins; and (4) it is an endotoxin-free, GRAS microorganism. Moreover, fermentation conditions for this host are well known and established for the large-scale production of various biomolecules, including recombinant proteins.

However, compared with E. coli, C. glutamicum shows some disadvantages, such as lower transformation efficiency and only a few available expression vectors (Yim et al. 2014). To address these issues, considerable effort has been put into the genetic modification of C. glutamicum. As a result, several techniques and vector components for the genetic manipulation of C. glutamicum have been developed in recent years, including strong promoters for precise regulation of gene expression, various types of plasmid vectors, secretion systems, and methods for genetically modifying the host strain genome (Nesvera and Patek 2011).

Thanks to all these advances, a variety of recombinant proteins have been successfully expressed in C. glutamicum, such as ovine gamma interferon (Billman-Jacobe et al. 1994), the human epidermal growth factor (Date et al. 2006), green fluorescent protein (Meissner et al. 2007), α-amylase (Tateno et al. 2007), or antibody fragments (Yim et al. 2014). In addition, a C. glutamicum-based protein expression system, with intellectual property rights, named CORYNEX has been developed by Ajinomoto Co., Inc. This system can secrete recombinant proteins into the medium, simplifying purification processes (http://www.ajinomoto.co.jp/corynex/en/characteristic/).

Pseudoalteromonas haloplanktis

Protein aggregation, a process mainly driven by stereospecific interactions between solvent-exposed hydrophobic patches (Carrio et al. 2005; Speed et al. 1996), has traditionally been seen as one of the main bottlenecks in recombinant protein production. Since the interactions leading to aggregation are weakened when the temperature decreases, production of recombinant proteins in psychrophilic bacteria (microorganisms cultured at 4 °C or below) represents a promising model to improve protein quality and/or solubility. In this context, a few cold-adapted bacterial species are under intense exploration as cell factories, with Pseudoalteromonas haloplanktis TAC125 as a representative example. P. haloplanktis TAC125 is a Gram-negative bacterium isolated from an Antarctic coastal seawater sample (Birolo et al. 2000). It is able to grow in the range of 0–30 °C (Tutino et al. 2001) and even at lower temperatures, which makes it one of the fastest growing psychrophiles characterized so far and an attractive host as a cell factory.

The versatility and uses of P. haloplanktis TAC125 as a cell factory have been expanded by the development of genetically engineered strains with improved features (Cusano et al. 2006; Parrilli et al. 2010). P. haloplanktis TAC125 is also the first Antarctic bacterium in which an efficient gene-expression technology was set up, by the proper assembly of psychrophilic molecular signals (Duilio et al. 2004; Tutino et al. 2001) into a modified E. coli cloning vector (Tutino et al. 2002). Several generations of cold-adapted expression vectors allow protein production under either constitutive (Duilio et al. 2004) or inducible regimes (Papa et al. 2007), and allow the product to be targeted to any cell compartment or to the extracellular medium (Parrilli et al. 2008).

Interestingly, insoluble aggregates of recombinant proteins have never been observed in P. haloplanktis TAC125, even at high expression levels, suggesting that its cellular physicochemical conditions and/or folding processes are quite different from those observed in mesophilic bacteria (Piette et al. 2010). Beneficial effects of using this cold-adapted platform instead of conventional E. coli have been seen in the production of antibody fragments (Dragosits et al. 2011; Giuliani et al. 2014) and in the production of DTE proteins such as the human nerve growth factor, h-NGF (Vigentini et al. 2006), or the alpha-glucosidase from Saccharomyces cerevisiae. When produced in E. coli, these two proteins failed to fold correctly and accumulated as insoluble aggregates, whereas their production in P. haloplanktis TAC125 resulted in fully soluble proteins.

Recently, a synthetic, optimized medium for P. haloplanktis TAC125 chemostat cultivation was introduced (Giuliani et al. 2014), and a fed-batch fermentation strategy, feasible for industrial purposes, has been established (Wilmes et al. 2010). These achievements will surely launch the industrial application of P. haloplanktis TAC125 as a new, non-conventional system for DTE protein production.

The choice of the final expression system is not an easy or straightforward decision, and it will usually need customized answers. There are no “magical recipes”; consequently, one cannot expect to find a universal production protocol, but should rather be aware of the different options available to improve the final results. In this context, a “function-driven” or “function-optimized” strategy is usually taken. For example, therapeutic proteins (which require higher standards of quality, safety, etc.) are mainly produced in mammalian cells, but the appearance of GRAS prokaryotic systems could change this rule in the future. This has been nicely exemplified by the case of therapeutic antibodies, where production efforts are moving from generic “naked” antibodies to application-adapted immune-reagents that do not need further in vitro modification before their use (de Marco 2015).

Apart from the abovementioned emerging prokaryotic expression systems, eukaryotic cells are becoming the preferred cell factories for proteins needing specific modifications, such as glycosylations or disulfide bridges. These eukaryotic systems (beyond the scope of this review), together with the emerging prokaryotic systems described above, are fueling a progressive diversification of the available expression platforms.

Conclusions

E. coli is a robust microbial cell factory for the production of recombinant proteins. Even in the case of difficult-to-express proteins, this expression system provides valuable tools as well as novel strategies for exploring, in a trial-and-error approach, improvements in protein production and protein solubility. In addition, the development of other microbial cell factories is making progress towards their use as effective protein production platforms. Since the number of approved therapeutic proteins is steadily growing, prokaryotic expression systems will continue to play a significant role as a cost-effective, tunable, and easy-to-operate source of recombinant proteins.