Introduction

Plant molecular pharming refers to the use of plants and plant-derived materials to produce therapeutic proteins (Habibi et al. 2017). The emergence of molecular pharming as a reliable and novel production technology in recent decades is the result of years of work, and it has opened new paths to minimizing the technical problems associated with yeast, bacterial, and mammalian platforms. Moreover, the identification and characterization of a promising regulatory pathway for the large-scale production of biopharmaceuticals could strongly contribute to the benefits of this system (Fischer et al. 2012). Molecular pharming technology presents several advantages, including (a) the production of low-cost biomass; (b) end-products lacking human toxicity; (c) the accumulation of complex proteins with correct folding; and (d) straightforward methods for protein purification (Moustafa et al. 2015). Furthermore, molecular pharming offers a flexible, scalable and diverse alternative method for producing new, patent-protected biopharmaceuticals and biosimilars, expanding product opportunities based on rapidly growing ‘biobetter’ molecule markets (Zimran et al. 2011).

Although the plant-based pharmaceutical industry is still in an early stage of development, many biopharmaceuticals are already in the preclinical and clinical development pipeline. For instance, the commercial development of taliglucerase alfa (Protalix®, Israel; http://protalix.com/), used for the treatment of Gaucher’s disease, is a significant breakthrough in molecular pharming. Monoclonal antibodies, such as palivizumab and rituximab, moss-GBA (glucerase), moss-aGal (agalsidase) and other biosimilars, are next-generation plant-made recombinant proteins that illustrate the beneficial attributes of plant cell culture as a promising resource for complex protein production (Grabowski et al. 2014; Niederkrüger et al. 2014). Moreover, the insulin produced in safflower (SemBioSys Genetics, Canada; http://www.sembiosys.com) and the HIV-neutralizing monoclonal antibody produced in tobacco (Pharma-Planta, Germany; http://www.pharma-planta.net) were developed based on transgenic plant manufacturing.

Medicago Inc. (Québec, Canada; http://www.medicago.com) is now working on a phase II clinical trial with influenza VLP (H5) accumulated in Nicotiana benthamiana through agroinfiltration technology. Table 1 summarizes some of the plant-based vaccines for human and animal diseases.

Table 1 Summary of vaccines and recombinant proteins produced by plant systems

Although plant platforms have many advantages in the production of biopharmaceuticals, further investigations are required to improve the final product and to overcome the significant challenges and risks associated with large-scale production of biopharmaceuticals. For instance, regulatory uncertainty and global concerns regarding the intrinsic yield of recombinant proteins are among the barriers encountered by molecular pharming users. Hence, the identification and characterization of factors influencing protein accumulation levels are necessary. The factors that regulate protein accumulation in plants, and therefore yield, may be divided into endogenous and exogenous factors. Protein accumulation can be regulated at several levels, including (a) genetic elements (transcription, translation and post-translation) (Shinmyo and Kato 2010); (b) epigenetic factors inducing gene expression; and (c) environmental factors. Specific host platforms and the subcellular targeting of proteins to specific compartments are also considered critical parameters that contribute to increasing recombinant protein yield (Fig. 1). Therefore, identification and optimization of these features could be pivotal in mediating translational activity and boosting the amount of end-products within plant systems (Silverman et al. 2013). In this review, we first describe the upstream and downstream features affecting recombinant protein production in plants and then discuss new perspectives on the optimization and improvement of these parameters.

Fig. 1

Schematic overview of sequence features impacting protein regulation. Inside factors include transcription, translation, mRNA processing and stability, and protein folding, and outside factors include host platform, culture condition, and subcellular localization

Expression cassette collection

An expression cassette is an important construct for high-level production of recombinant proteins in a host plant. Expression cassettes can be classified according to the number of cistrons (recombinant genes) within the corresponding mRNA. Monocistronic cassettes are suitable for expressing single proteins, with synthesis driven by their own regulatory sequences (Al-Rubeai 2011). They can either be provided on separate plasmids or be cloned into a single vector, leading to a higher likelihood of simultaneous transfection of all genes of interest (Al-Rubeai 2011). In the case of single vectors, more than one protein can be expressed in the same cassette, although sequence length can be a restricting factor for complete integration into the plant genome, which means that an increasing number of linked transgenes leads to a lower probability of integration. In addition, the integration of multiple transgenes can randomly result in genetic rearrangements and fragmentation that eliminate one or more transgenes (Naqvi et al. 2010).

Bicistronic cassettes commonly use viral-derived internal ribosome entry sites (IRES) (Houdebine and Attal 1999; Lopez-Lastra et al. 2005) to bypass the 5′-cap-dependent translation process by allowing ribosomes to initiate translation internally on the mRNA. However, this approach is not widely used because the expression level of the ORF downstream of the IRES is usually decreased (Hennecke et al. 2001). Alternatively, polycistronic cassettes can be used in an operon-like structure (Osbourn and Field 2009). Polycistronic cassettes are efficient and very useful for metabolic engineering purposes in plants, such as for secondary metabolite production via the addition of whole heterologous metabolic pathways (Mozes-Koch et al. 2012). Cassettes carry the set of elements required to increase the transcription rate (Clark and Pazdernik 2013) (Fig. 2).

Fig. 2

Schematic of a gene expression cassette. An expression cassette is a DNA sequence that carries the set of elements required to increase the transcription rate. Each of these elements could be optimized to boost the expression rate

The selection of a suitable promoter is an important factor for boosting gene expression via binding and interaction with trans-acting factors (Moustafa et al. 2015). Expression vectors can harbour either endogenous/homologous promoters or exogenous/heterologous promoters in relation to the host species. The plant promoter database (http://ppdb.agr.gifu-u.ac.jp/ppdb/cgi-bin/index.cgi) provides a variety of core promoter structures and regulatory element groups for Physcomitrella patens, Oryza sativa, Arabidopsis thaliana, and Populus trichocarpa (poplar) (Hieno et al. 2014).

However, some heterologous promoters are commonly used in plants (Twyman et al. 2013). For example, the CaMV35S (Cauliflower Mosaic Virus 35S RNA subunit) promoter is highly compatible with the transcription machinery of dicots and is commonly used for compact expression cassettes; however, CaMV35S activity may be decreased in some tissues and cells, such as newly developed tissues (Biłas et al. 2016). In this case, the region between nucleotides −90 and 208 acts as an enhancer and plays a key role in increasing the promoter activity. Act-1 and Ubi-1 are other examples of promoters that show more compatibility with the transcription machinery of monocots (Kang et al. 2008). The polyubiquitin promoters PvUbi1 and PvUbi2 derived from switchgrass drove several-fold higher constitutive expression than the CaMV35S and OsAct1 promoters (Mann et al. 2011). Additionally, non-classical promoters have been used for both monocots and dicots, such as the CmYLCV (Cestrum Yellow Leaf Curling Virus) promoter (Stavolone et al. 2003) and the pPLEX series promoters (Schünmann et al. 2003), respectively.

The efficiency of several commonly used inducible promoters (such as alcA, PR-1 and ACE1) and tissue-specific promoters (such as the 2S albumin promoter, arc5-I promoter, CHS and RB7) has been reported in plants (Twyman et al. 2013). Recently, a β-oestradiol-inducible promoter has been used to create the TRANSPLANTA collection in A. thaliana (Coego et al. 2014) and to establish a stable gene expression system in P. patens (Kubo et al. 2013) for identifying the biological functions of transcription factors (TFs), which regulate gene expression in all organisms. Moreover, the efficacy of synthetic cis-acting motifs in transgenic tomato has been investigated and compared with that of the native CaMV35S and DECaMV35S promoters (Koul et al. 2012). Interestingly, the synthetic cis-acting modules led to increased transgene expression in tomato in comparison to the native promoters.

In contrast to the chemically inducible system developed to control transgene expression, a physical approach is considered safer and more applicable in terms of spatiotemporal resolution and the toxic effects of gene expression (Kang et al. 1999; Amirsadeghi et al. 2007; Muller et al. 2014). Thus, the use of chemically inducible transgene expression in plant cell cultures seems to be undesirable. In comparison to a chemical inducer, light was considered a compatible source for the bioproduction of recombinant proteins with high temporal resolution. In this context, a red light-switchable promoter has been developed to control transgene expression in the moss P. patens (Muller et al. 2014).

Additionally, multimeric protein subunits or different independent proteins can be expressed and assembled in planta using a single ORF (sORF) through linkage to the 2A peptide from foot-and-mouth disease virus (Luke et al. 2015) or inteins (interspacing polypeptide blocks of functional protein parts) (Evans et al. 2005). The 2A strategy is largely used for producing important proteins, such as a multi-epitope and low-cost candidate vaccine against cysticercosis using the so-called Helios2A polyprotein system (Monreal-Escalante et al. 2015). Although the 2A peptide is known to trigger the auto-cleavage and release of separate subunits, this mechanism occurs before the proteins enter the endoplasmic reticulum (ER). In contrast, inteins ensure that the entire polypeptide is targeted to the ER and that autocleavage occurs in the lumen to guarantee a balanced protein expression level in an equimolar ratio (Kunes et al. 2009). Once inside the ER, exteins (the protein subunit blocks flanking inteins) can be joined via protein splicing, a mechanism that is desirable for protein multimerization (Hauptmann et al. 2013), or they can be released separately (O’Brien et al. 2010). Interestingly, this last approach is very suitable for expressing recombinant antibodies, but has so far only been performed in mammalian cells (Gion et al. 2013).

Thus, considering all these possibilities, cassettes can also be shuffled according to their order in the vector backbone, or they can be rearranged in different orientations for expression (in tandem—the 5′ and 3′ ends are kept in the same orientation related to the other cassette—or sandwich—the 5′ and 3′ ends of each cassette are inverted to each other in a reversed orientation) (Al-Rubeai 2011). However, regardless of the vector setup, it is important to ensure that the coding DNA sequence (CDS) of interest has efficient control elements for translation initiation. In this way, it is advisable to provide an appropriate ribosome-binding site (RBS) for plant machinery via the addition of a eukaryote-derived RBS (Kozak sequence—consensus: GCCRCCATG) immediately upstream of the initiation codon (Kozak 1999; Mohammadzadeh et al. 2015) in the case of nuclear genes, or to adapt the CDS and its surrounding elements when performing chloroplast transformation. Most plastid mRNAs harbour Shine-Dalgarno (SD)-like sequences with a similar role, even though their distances from the initiation codon are not conserved. Furthermore, the initiation codons (e.g., AUG, GUG) can vary for plastid transgenes (Cardi et al. 2010).
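To make the translation initiation context concrete, the following minimal Python sketch scores how closely the six nucleotides immediately upstream of a start codon match the consensus given above (GCCRCC, where R is a purine). The function names and the simple 0 to 6 match count are illustrative assumptions, not a published scoring scheme, and the sketch applies to nuclear genes only.

```python
# Positions -6..-1 relative to the ATG, following the consensus GCCRCCATG.
# "AG" encodes the IUPAC symbol R (purine: A or G).
KOZAK_UPSTREAM = ["G", "C", "C", "AG", "C", "C"]


def kozak_score(sequence, atg_index):
    """Count how many of the six bases upstream of the ATG match GCCRCC (0-6)."""
    seq = sequence.upper().replace("U", "T")
    if atg_index < 6:
        raise ValueError("need at least six bases upstream of the start codon")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG")
    upstream = seq[atg_index - 6:atg_index]
    return sum(base in allowed for base, allowed in zip(upstream, KOZAK_UPSTREAM))


def find_full_kozak_sites(sequence):
    """Return 0-based ATG positions preceded by an exact GCCRCC match."""
    seq = sequence.upper().replace("U", "T")
    sites = []
    for i in range(6, len(seq) - 2):
        if seq[i:i + 3] == "ATG" and kozak_score(seq, i) == 6:
            sites.append(i)
    return sites
```

A construct whose start codon scores well below 6 could be a candidate for adding the consensus sequence upstream of the CDS, although the actual effect on translation would need to be verified experimentally.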

Codon usage

Codon bias is a crucial factor in regulating synthetic gene expression in a plant platform, as it can influence numerous processes associated with protein production (e.g., RNA processing, protein translation and protein folding). Through evolutionary mechanisms, plant hosts developed similar substitutions of rare codons with favourable ones to reach a desired nucleotide distribution, as rare codons can inhibit protein translation (Gould et al. 2014). Gene expression based on codon usage can be predicted using diverse metrics, which have been reviewed by Lindgreen (2012). Additionally, the codon usage database (http://www.kazusa.or.jp/codon/) reports the number of times each codon is used per 1000 codons and the total number of times each codon is known to be used.
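The two quantities described above (frequency per 1000 codons and total count) can be computed for any coding sequence with a few lines of Python. The sketch below is a minimal illustration; the function name and the returned dictionary layout are assumptions made for this example and do not reproduce the database's own file format.

```python
from collections import Counter


def codons_per_thousand(cds):
    """Map each codon in an in-frame CDS to (frequency per 1000 codons, raw count)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    total = len(codons)
    counts = Counter(codons)
    return {codon: (1000.0 * n / total, n) for codon, n in counts.items()}
```

For example, `codons_per_thousand("ATGGCTGCTTAA")` reports GCT at 500 per thousand (two of four codons). Comparing such a table for a transgene with the table of the intended host highlights codons that are rare in the host.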

Codon usage optimization in plants has been reviewed (Serres-Giardi et al. 2012), and it was demonstrated that rare codons and AU-rich destabilizing sequences may result in mRNA degradation, reducing recombinant protein expression in plants (Laguía-Becher et al. 2010). The correlation between GC content, codon usage, and gene expression has also been reported in plants (Palidwor et al. 2010). These findings show the counterintuitive effect of GC content on determining the codon usage. Similar to these findings, the dominant effect of GC-biased genes on nucleotide distribution was reported in many seed plants (Serres-Giardi et al. 2012). Moreover, the influence of GC bias in codon context has been recognized for a set of codons but not for individual codons.
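Because GC content is discussed here both for whole genes and for individual codon positions, a short Python sketch may help to show how these values are obtained from a coding sequence. The function names are hypothetical, and the sketch assumes an in-frame CDS without ambiguous bases.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): GC fraction at the first, second and third codon positions."""
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of three in length")
    # Slicing with a step of 3 collects every base at one codon position.
    return tuple(gc_fraction(seq[offset::3]) for offset in range(3))
```

The third-position value (GC3) is the one most directly shaped by synonymous codon choice, since most synonymous codons differ only at that position.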

Codon usage optimization boosts gene expression and can change the level of transgene expression by more than 1000-fold (Gustafsson et al. 2004). In this context, Franklin et al. (2002) and Gisby et al. (2011) demonstrated significant increases in expression (75- to 80-fold) after codon optimization. In a recent study, Kwon et al. (2016) showed the importance of codon usage optimization, with gene expression increasing 4.9- to 7.1-fold in lettuce chloroplasts and 22.5- to 28.1-fold in tobacco chloroplasts. Conversely, the expression of recombinant protein was very low when heterologous genes were transferred into Chlamydomonas reinhardtii chloroplasts without codon usage optimization (Ishikura et al. 1999). Accordingly, there are various methods to evaluate the effect of codon usage on gene expression (Box 1) and to provide the best tools for codon optimization. The codon adaptation index (CAI) has been used to estimate the expression level of heterologous genes. A genome-specific CAI must be used for optimal protein production, as the nuclear, mitochondrial and chloroplast genomes may show different codon biases. For instance, a comparative study on the codon bias patterns of chloroplasts and their host nuclear genes demonstrated that the GC content of entire genes and of the three codon positions was higher in nuclear genes than in chloroplast genes, demonstrating different genomic organization and mutation pressures in nuclear and chloroplast genes (Liu and Xue 2005). Codon biases can boost expression efficiency by influencing translation rates and decreasing susceptibility to gene silencing (Heitzer et al. 2007).
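As an illustration of how a CAI-style metric can be computed, the following Python sketch assumes the commonly used definition in which each codon receives a relative adaptiveness weight (its count in a reference gene set divided by the count of the most frequent synonymous codon), and the CAI of a gene is the geometric mean of these weights. The function names, the pseudocount of 0.5 for unobserved codons, and the use of the standard genetic code are assumptions of this sketch; a chloroplast- or nucleus-specific reference set would be supplied by the user.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split an in-frame CDS into a list of codons (RNA input is converted to DNA)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def relative_adaptiveness(reference_cds_list, pseudocount=0.5):
    """Compute w = count / max synonymous count from a reference gene set."""
    counts = Counter()
    for cds in reference_cds_list:
        counts.update(split_codons(cds))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights = {}
    for aa, codons in families.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no bias signal.
        if aa == "*" or len(codons) == 1:
            continue
        # Assumption: unobserved codons get a small pseudocount to avoid log(0).
        fam = {c: (counts[c] if counts[c] > 0 else pseudocount) for c in codons}
        top = max(fam.values())
        for codon, n in fam.items():
            weights[codon] = n / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of all scored codons in the CDS."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

In use, `weights` would be built once per target genome (for example, from highly expressed chloroplast genes when designing a plastid transgene) and then applied to each candidate sequence, which reflects the point made above that the index should be genome-specific.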

Box 1 Proposed statistical methods for the analysis of codon bias

Previous reports stated that the translational elongation step is affected by codon bias optimization (Irwin et al. 1995). Moreover, many works provided evidence of the effect of codon bias optimization on translation efficiency in prokaryotes and eukaryotes (Duret 2002; Mueller et al. 2010; Coleman et al. 2011). These findings indicate that codon context effects are significantly correlated with the abundance of tRNA isoacceptor molecules on the ribosome surface. Moreover, post-translational modifications might be affected by codon bias, as silent mutations within the transcript can result in unwanted protein instability and misfolding (Brest et al. 2011). Nevertheless, it is important to note that these complex correlations are still not fully understood and that our knowledge of the effects of codon sequence changes during translation and post-translational modification is somewhat limited. Moreover, we should keep in mind that the optimization of codon usage is not the only factor in obtaining high-level expression of recombinant proteins, as various factors affect recombinant protein levels.

Characterization of untranslated regions (UTRs)

A recent study reported that the transcript sequence or the region close to the start codon AUG can be crucial for translation efficiency (Kim et al. 2014). Hence, the sequence located 21 bp upstream of the start codon was identified as a significant feature for determining translation efficiency in A. thaliana. However, how this region can affect translation efficiency was not fully determined, although it was previously shown that mRNA folding of the sequence near the initiation codon might strongly influence translation efficiency (Plotkin and Kudla 2011). Moreover, this conserved region in plants is required for translational initiation factors to recruit ribosome subunits for start codon recognition (Simon and Miller 2013).

5′ UTR introns are other sequences that can influence gene expression by boosting the steady-state ratio of mRNA and by correlating with polyadenylation factors in plants (Rose 2008; Morello et al. 2011; Rose et al. 2011). Although introns are not part of translated regions and should be removed by splicing processes, in certain cases, 5′ UTR introns act as transcriptional enhancers to influence gene expression. In this context, the 5′ UTR intron of the rice rubi3 gene was shown to boost gene expression up to 29-fold in transgenic rice cells (Lu et al. 2008). Additionally, the effect of 5′ introns on stimulating gene expression in seeds has been confirmed, as the maize Adh1 intron increased the production of the reporter gene (Callis et al. 1987).

Genes harbouring a 5′ UTR intron of actin (act) genes from P. patens can also increase the production of human vascular endothelial growth factor by influencing the activity of upstream promoter regions (Weise et al. 2006). Similarly, the regulatory properties of riboswitches as non-coding and conserved elements located in the untranslated regions have been studied in plants (Cheah et al. 2007; Croft et al. 2007), and these studies indicate that riboswitches can regulate gene expression via splicing and alternative 3′ end processing of mRNAs (Wachter et al. 2007). How this conserved element affects splicing in plants was reviewed by Bocobza and Aharoni (2014).

Another example of a 5′ UTR intron used in Nicotiana tabacum was demonstrated by Herz et al. (2005). They developed novel types of plastid transformation vectors harbouring a 5′ UTR to increase expression levels. The expression levels were strongly increased when the 5′ UTR of phage T7 gene 10 was used. Therefore, the improvement of expression cassette design based on the 5′ UTR intron region may increase the likelihood of producing recombinant proteins at economically feasible levels for commercial applications. Understanding the expression-promoting mechanism of the 5′ UTR intron region of Act genes can open new possibilities for vector design based on a moss-derived expression system in the near future.

RNA secondary structure

Among the various features affecting gene translation, mRNA secondary structure plays a key role (Kim et al. 2010; Sun et al. 2012). Its negative effect on translation can reduce protein yields, making mRNA secondary structure a decisive factor in the regulation of gene expression (Gaspar et al. 2013). mRNAs are the most complex group of cellular RNAs, and they are shaped not only by transcription itself but also by numerous modification reactions, such as precursor mRNA (pre-mRNA) splicing, capping, polyadenylation, and 3′ end processing. Furthermore, the association of mRNAs with protein complexes and factors results in the regulation of mRNA translation and metabolism (Wachter 2014). Therefore, screening the functional capabilities of RNA folding might identify sequence features that contribute to gene regulation in plant molecular pharming. Based on high-throughput structure mapping analysis, coupled with transcriptome data from different RNAs, it is clear that the abundance of mRNA transcripts, the transcript half-life in the cytoplasm and the probability of secondary structure formation in the transcript can affect the regulation of gene expression, since the formation of a stable structure in an RNA strand can strongly affect the expression of a targeted protein (Farrell 2007).

A global view on protein expression could provide insights into translation regulation mechanisms occurring at the level of initiation and in early stages of elongation. Previous works have revealed the dramatic effect of variable mRNA structural stability on coding portions (Ullrich et al. 2015). The stable structure next to the start codon might inhibit initiation and elongation by providing a longer pause during ribosome movement, which consequently, can affect the regulation of co-translational protein folding (Xie 2015).

In this context, measurements of ribosome movement along mRNA set a correlation between ribosome translation, codon usage and mRNA secondary structure (Mao et al. 2014). Notably, when adjacent ribosomes are close together, the mRNA secondary structure between them will be weakened and will then disappear; as a result, different ribosomes might encounter the structure with different folding strengths at the same site.

It is noteworthy that the significant competition between transcripts to bind ribosomes is similar to the competition of mRNAs, which bind initiation factors with high affinity to synthesize as much protein as mRNAs with a lower binding affinity. In this context, the formation of secondary structure can be inhibitory. For instance, the deterrent effect of mRNA secondary structures, which suppress mRNA scanning by ribosomes when they form in 5′ leader sequences, has been demonstrated for plants (Farrell 2007). Some important characteristics of mRNA structure, such as the thermodynamic stability of the hairpin and its position, can affect the degree of translation inhibition. In this case, a highly stable hairpin in front of an AUG codon can efficiently repress translation, while a hairpin with low stability that is downstream of an initiation codon can increase translation. One of the systematic studies reported in plant systems unveiled the correlation between mRNA secondary structures and protein expression (Wang and Wessler 2001). The formation of an RNA hairpin in the 5′ leader sequence of the Zea mays Lc gene, which is involved in the anthocyanin biosynthetic pathway, has been reported to suppress translation, since introducing mutations and deletions within the RNA hairpin increased the amount of protein production by enhancing the ribosome’s ability to load onto the mRNA. There are convincing reports showing that mutations and deletions within hairpins decreased hairpin stability and increased protein expression (Wang and Wessler 2001).
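As a rough first screen for potential hairpins in a 5′ leader, one can search for short stretches that are followed, after a loop, by their own reverse complement. The Python sketch below does only that: it considers perfect Watson-Crick stems of a fixed length and ignores G-U wobble pairs, mismatches and thermodynamic stability, so it is a heuristic assumption of this example and not a substitute for a dedicated RNA folding program. The function names and default parameters are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string containing only A, C, G and U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_stems(leader, stem_len=6, min_loop=3):
    """Return (left_start, right_start, left_arm) for each perfect inverted repeat.

    left_start and right_start are 0-based positions of the two stem arms;
    the arms are separated by at least min_loop unpaired bases.
    """
    rna = leader.upper().replace("T", "U")
    if any(base not in COMPLEMENT for base in rna):
        raise ValueError("sequence may contain only A, C, G, U (or T)")
    hits = []
    for i in range(len(rna) - 2 * stem_len - min_loop + 1):
        left = rna[i:i + stem_len]
        target = reverse_complement(left)
        # The right arm must start after the left arm plus the minimum loop.
        j = rna.find(target, i + stem_len + min_loop)
        while j != -1:
            hits.append((i, j, left))
            j = rna.find(target, j + 1)
    return hits
```

Leaders that return hits close to the start codon could then be examined with a thermodynamic folding tool, and, in line with the Lc gene example above, mutated to weaken the predicted stem.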

Moreover, the combined effect of mRNA secondary structure and codon usage in highly translated mRNAs causes a short ribosomal distance in structural regions, eliminating the structures during translation, which leads to a high elongation rate. RNA structural motifs can alter the stability of RNA backbones by exhibiting more regions for ribosome–RNA interactions. For example, the identification of the glmS (glutamine-fructose-6-phosphate amidotransferase) ribozyme (Winkler et al. 2004) and an allosteric self-splicing intron (Lee et al. 2010) as domains within mRNAs showed that ribozyme-containing mRNA domains can affect gene regulation. Hence, riboswitches have regulatory properties. Interestingly, these domains enable mRNAs to adjust gene expression without associating with regulatory factors (Penchovsky and Stoilova 2013).

Previously, a plant riboswitch that binds thiamin pyrophosphate (TPP) as its ligand was characterized and was found to regulate thiamin biosynthesis in plants and algae (Cheah et al. 2007; Croft et al. 2007). Riboswitches control gene expression in plants by the splicing and alternative 3′ end processing of mRNAs (Wachter et al. 2007). How the structural rearrangement of the TPP aptamer affects splicing in plants was previously reviewed by Bocobza and Aharoni (2014). Their findings indicate that ribosome stepping and mRNA unwinding are force-dependent because the mechanistic nature of this force relies on ribosomal distance. Counterintuitively, a correlation between strong mRNA folding and translating ribosomes has been proposed, in which the stronger folding of more abundant mRNAs results in the slower evolution of more highly expressed genes and proteins. This reveals the impact of natural selection at the mRNA level in constraining protein evolution (Park et al. 2013). However, no systematic study has been reported in plant systems that unveils the correlation between mRNA secondary structures and the higher expression levels of recombinant proteins.

Subcellular targeting

The overexpression of a targeted gene via cellular compartments has gained more attention in recent years. However, in addition to the optimization of the exogenous mechanism involved in recombinant protein production, endogenous factors related to housekeeping genes and essential metabolism are features limiting the increased yield of recombinant proteins. This limitation could influence the accumulation of recombinant proteins that require post-translational modification in the ER through the activation of the unfolded protein response (UPR) (Thomas and Walmsley 2015).

Furthermore, directing recombinant proteins to subcellular organelles creates an environment low in proteases and helps to increase protein production and recovery (Schillberg et al. 1999; Fischer and Emans 2000). In addition to the ER and oil bodies, proteins can be targeted to the vacuoles, apoplast, and plastids, and they can even be directed to a hydroponic medium in plant roots (Horvath et al. 2000; Doran 2006).

Cytoplasm accumulation

Usually, cytoplasm targeting is an unfavourable strategy for the accumulation of proteins for several reasons, including the presence of chemical reduction–oxidation reactions that result in the production of unfolded proteins; the secretion of proteases; the effectiveness of the Ubiquitin Proteasome Pathway (UPP), which is responsible for the identification and degradation of unfolded proteins; and the lack of post-translational modifications for correct folding, assembly and/or stability of recombinant proteins (Benchabane et al. 2008). Thus, different strategies can be applied to circumvent these challenges. One alternative is the co-expression of protease inhibitors to minimize protein degradation during storage in the cytoplasm (Egelkrout et al. 2012). For example, a tomato cathepsin D inhibitor (CDI) expressed in potato leaves showed an average increase of 35–40% in leaf protein content for intrinsic and transgenic proteins in the cytosol (Goulet et al. 2010). The use of fused proteins or other “tags” can also assist in protein recovery and stability. In some cases, they can increase protein accumulation in the cytosol and resistance to proteolysis (Amin et al. 2004). Moreover, recombinant protein accumulation in subcellular organelles, such as the ER, chloroplast and oil bodies, is an alternative strategy to circumvent the lability of cytoplasm targeting (Alvarez et al. 2010; Giddings et al. 2000; Torrent et al. 2009).

Organelle accumulation

The ER is an ideal region to increase protein accumulation and improve protein assembly, as well as to control glycosylation (Aebi 2013). The ER is a region with a low quantity of proteases, and the presence of a high concentration of chaperones can assist recombinant proteins in post-translational folding and stability (Nuttall et al. 2002). Previous studies demonstrated that the apoplast targeting of recombinant antibody fragments, when retained in the ER, yielded 3.8 µg/g of protein in O. sativa cells. The same antibody fragment, when produced in tobacco cells, corresponded to 0.064% of the total soluble protein (Table 1) (Fischer et al. 1999; Torres et al. 1999). However, ER addressing is not recommended for proteins that require downstream modifications in the Golgi, vacuoles and chloroplast (Doran 2006).

Furthermore, the accumulation of antibodies directed to the ER using (SE)/(H)/(K)DEL signal peptides increased 2- to 10-fold compared to proteins lacking retention signals (Conrad and Fiedler 1998). In addition, the use of N-terminal γ-zein proline-rich sequences to target proteins to the ER and protein bodies (PBs) increased the stable accumulation of proteins in seeds (Torrent et al. 2009).

The use of apoplasts to store different human recombinant proteins expressed in tobacco plants—such as human serum albumin and human granulocyte–macrophage colony-stimulating factor (hGM-CSF)—significantly increased protein yield (Table 1) (Sijmons et al. 1990; James et al. 2000; Ramirez et al. 2000). In some cases, however, targeting an organelle for protein accumulation does not imply that different tissues or organs will store the same concentration of recombinant protein. For example, when the silk-like protein DP1B was directed to vacuoles, transgenic Arabidopsis seed cells presented higher concentration levels of the recombinant protein than leaf cells (Yang et al. 2005).

The chloroplast is another compartment used for the production of recombinant proteins, as it provides several advantages during protein accumulation, including a large copy number of inserted genes and the absence of gene silencing (Michelet et al. 2011). The mechanism of gene expression in the chloroplast is regulated at the transcriptional and post-transcriptional levels (Stern et al. 2010). It was previously reported that this mechanism could be controlled by chloroplast-produced signals (Michelet et al. 2011). The chloroplast genome is organized as operons that are transcribed as polycistronic transcriptional units, and it shows prokaryotic and eukaryotic properties, as the control of chloroplast gene expression, including transcription, post-transcriptional processing, translation, and post-translational modifications, is similar to that of prokaryotic and/or eukaryotic systems (del Campo 2009). Moreover, chloroplasts offer a location with a low content of proteases, and they perform transcription, translation, post-transcriptional processing and post-translational modifications. These organelles also offer high transgene stability over continuous generations, making them a secure bio-containment system, as the genes are not transmitted via pollen (Dufourmantel et al. 2006; Gray et al. 2009).

Many recombinant proteins have already been targeted to chloroplasts in order to increase protein accumulation and stability, such as human growth hormone (Staub et al. 2000), human serum albumin (Fernandez-San Millan et al. 2003), cholera and tetanus toxin fragments (Daniell et al. 2001; Tregoning et al. 2003), and a thermostable xylanase (Leelavathi et al. 2003). In addition, increased levels of a recombinant cellulase from Thermobifida fusca expressed in tobacco corresponded to 10.7% of the total soluble protein due to accumulation in the cell chloroplasts (Gray et al. 2009). In a recent work, the expression and development of several antigens and vaccines were reported for algae chloroplasts (Specht and Mayfield 2014). The green microalgae chloroplast provides a unique space for the assembly, folding and post-translational modifications of transgenic proteins (Fletcher et al. 2007; Chebolu and Daniell 2009; Specht et al. 2010) through cis-acting units, including 5′ and 3′ UTRs and promoters that affect transcription, mRNA stability and translation (Michelet et al. 2011).

Post-transcriptional gene silencing (PTGS) is considered a key cause of low-level protein accumulation. Therefore, PTGS suppression might boost protein production. In addition, targeting proteins to various cellular compartments, coupled with an agroinfection approach, is a new strategy to increase protein accumulation. A novel technology for accumulating recombinant protein has been established in Nicotiana benthamiana using a suppressor of PTGS and specific sub-cellular localization (Azhakanandam et al. 2007). This Amplicon-plus Targeting Technology (APTT) demonstrated remarkable efficacy for increasing accumulation levels in chloroplasts by creating fusions between the recombinant protein and different targeting peptides. APTT can overcome existing limitations of agroinfection systems, making them economically feasible and promising for the large-scale production of recombinant protein in a short period of time.

Host system

The selection of a suitable host plant and the consideration of its effect on the efficiency of recombinant protein accumulation are key strategies for boosting intrinsic yield. Economic issues affect the selection of plant hosts as suitable benchmarks for the production of recombinant proteins. These economic factors include storage properties, scalability and transportation, the cost of downstream processing, the potential for a short time scale and edibility (Obembe et al. 2011). Additionally, efficient transformation and regeneration may contribute to the selection of a host plant as an amenable resource for recombinant protein production. However, no single system offers every economic feature, as each system has its own advantages and disadvantages that need to be weighed when selecting the best system for protein production. As recently reviewed (Thomas and Walmsley 2015), there is a significant correlation between host benchmarks and endogenous molecular mechanisms, such as protein folding, glycosylation profiles, and chaperone pools. Moreover, the endogenous proteolytic activity of the host system might severely affect protein stability during expression, extraction, and harvesting (Pillay et al. 2014). Hence, modification of the host system can significantly increase the accumulation of recombinant proteins in plant systems (Egelkrout et al. 2012). In this context, cellular and molecular factors can contribute to the accumulation of recombinant proteins, which depends on the rates of synthesis and degradation of the desired protein. Understanding the association between flux (explained as the rate-limiting step) and the metabolic pathway, as well as identifying the limiting reaction during protein accumulation in the host system, can improve protein production (Morandini 2013).
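The dependence of accumulation on synthesis and degradation rates can be illustrated with a minimal sketch. It assumes, purely for illustration, constant synthesis and first-order degradation (dP/dt = k_syn - k_deg * P); this simple model, the function names and the rate constants are hypothetical and are not taken from the cited studies.

```python
import math


def accumulation(k_syn, k_deg, t):
    """Protein level at time t for dP/dt = k_syn - k_deg * P, starting from P(0) = 0.

    Assumed model: constant synthesis rate k_syn and first-order degradation k_deg.
    """
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    return (k_syn / k_deg) * (1.0 - math.exp(-k_deg * t))


def steady_state(k_syn, k_deg):
    """Long-time limit of the model above: synthesis balances degradation."""
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    return k_syn / k_deg


# Hypothetical rate constants in arbitrary units, chosen only for illustration.
baseline = steady_state(k_syn=2.0, k_deg=0.5)
low_protease_host = steady_state(k_syn=2.0, k_deg=0.25)
print(baseline, low_protease_host)
```

Under these assumptions, halving the degradation rate doubles the steady-state level just as doubling the synthesis rate would, which is one way to read the emphasis on host proteolytic activity above.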

Furthermore, the replacement of a host plant pathway with a non-host pathway through metabolic engineering might facilitate increased recombinant protein production. For example, the accumulation of bacterial polyesters (polyhydroxyalkanoates) was achieved by introducing the microbial pathway into plants (Poirier 2001). In similar work, bacterial genes related to the generation of butanetriol, a useful precursor for the synthesis of several drugs, were introduced into A. thaliana (Abdel-Ghany et al. 2013). However, toxic side effects on plant growth produced by metabolic engineering are considered limiting factors for the accumulation of these metabolites in plant systems (Keasling 2012; Zingaro and Papoutsakis 2012). These toxic side effects could be decreased by identifying the intermediate metabolites involved in toxicity and directing them to specific organelles (Bornke and Broer 2010).

Leafy crop-based expression

Leafy crops, including tobacco, lettuce, and alfalfa, are well established in the commercialization of biopharmaceuticals owing to their biomass efficiency and their capacity for production at a massive scale. For example, the ability of transgenic tobacco (N. tabacum) to produce 1-100 tons of biomass per hectare per year makes this plant a reliable platform for the production of a high concentration of biopharmaceuticals. Moreover, easy genetic transformation and regeneration favour the use of tobacco as a laboratory model for molecular pharming. Therefore, both scalability and a history of successful gene expression make tobacco a pioneer for the production of various biopharmaceuticals (Twyman et al. 2003). In this context, Kentucky BioProcessing (USA) uses its proprietary stable nuclear platform in tobacco to produce an oral subunit vaccine for Norwalk virus (NoroVAXX) (http://www.kentuckybioprocessing.com). Moreover, a study in a mouse model suggested that the oral delivery of a bioencapsulated transmucosal carrier, cholera toxin B subunit (CTB), fused with green fluorescent protein (GFP) and expressed in the tobacco chloroplast genome, could be used to deliver proteins across the blood–brain barrier (BBB) and blood–retinal barrier (BRB) (Kohli et al. 2014). The results of this investigation showed that the oral administration of CTB-GFP resulted in binding to intestinal GM1 receptors and the release of the fused protein (CTB) into the circulatory system. In this context, the long-term stability of the plant-based protein reduced the high cost of oral delivery of neurotherapeutic proteins across the BBB and BRB. A drawback of tobacco is its inherent production of a large number of toxic compounds, which interfere with downstream processing, as well as protein instability due to secreted proteolytic enzymes (Rosales-Mendoza et al. 2010; Obembe et al. 2011; Stoger et al. 2014). These drawbacks decrease the value of leafy crops for the production of recombinant proteins, and many researchers have therefore focused on ways to minimize these adverse effects and boost the capacity of leafy crops for recombinant protein production.

Seed-based expression system

Seed-based expression systems provide a good platform for the production of recombinant proteins at both massive and minute levels (Stoger et al. 2014). In both cases, the high cost of process development, such as bioreactor-based production, would be reduced. Moreover, this system could resolve problems related to leafy crops, including proteolytic degradation, protein instability and decreased activity of recombinant proteins due to long-term storage (Nochi et al. 2007; Faye and Gomord 2010). Most importantly, seeds can be used to create master and working cell culture platforms (Paul et al. 2013). As an example, Ventria Bioscience (Fort Collins, USA; http://www.ventria.com/) developed the large-scale production of human serum albumin based on rice seeds. The worldwide demand for this recombinant protein is approximately 500 tons per year (Chen et al. 2013b).

A seed-based expression system is also an excellent platform for the production of orally administered vaccines and antigens. Orally delivered vaccines contribute to the reduction of immunogenicity via systemic humoral and cellular immune responses in the gut and at mucosal surfaces (Woodrow et al. 2012). Seed-based oral immunotherapy presents a cost-effective system for the production of T-cell epitope peptides or recombinant hypoallergens. Compared to a purified vaccine, an orally administered antigen produced in transgenic seeds shows more resistance to gastrointestinal enzymes, owing to the bioencapsulation of the vaccine by plant cell barriers, such as protein bodies (Takaiwa 2009). Several studies have demonstrated the benefits of encapsulation, including increased resistance to enzymatic digestion and a stronger immune response (Chikwamba et al. 2003; Takagi et al. 2010; Takaiwa 2011; Suzuki et al. 2012). Additionally, the delivery of orally administered vaccines (attenuated vaccines) may reduce the possibility of reversion to virulence, guaranteeing a high immune response to the vaccine (Stoger et al. 2014). The ability of transgenic rice seed to produce T-cell epitope peptides of Cry j 1 and Cry j 2 for the induction of a high mucosal immune response to pollen allergens in mice is a good example of a drug orally administered via plant seeds (Takagi et al. 2005).

Moreover, the expression and accumulation of an orally administered vaccine in plant seeds facilitate the delivery of autoantigens to mucosal surfaces while maintaining a high immune response (Lakshmi et al. 2013; Kohli et al. 2014). Thus, rice, barley and maize seeds have been explored as commercial benchmarks for recombinant protein production. Given the numerous advantages of seed-based expression, the Pharma-Planta Consortium (http://www.pharma-planta.net/) and ProdiGene Inc. (USA; http://www.prodigene.com/) adopted the maize seed platform and developed the commercial production of an HIV microbicide and bovine trypsin, respectively. Ventria Bioscience established its ExpressTec system in rice seeds for the production of various biopharmaceuticals, including lactoferrin, lysozyme, transferrin, and human albumin (Tang et al. 2010; Zhang et al. 2010). ORF Genetics, an Icelandic biotechnology company (http://www.orfgenetics.com/), uses its proprietary ORFeus platform in the endosperm of barley seed to accumulate a range of cytokines and human growth hormones.

Fruit and vegetable expression systems

Fruits and vegetables are the final category of transgenic host plants that have contributed to the production of vaccines. The capability of oral vaccine delivery is recognized as a benefit of this system over conventional production. This ability not only removes the purification step otherwise required for a recombinant protein but also enables the distribution of the product without a cold chain (Paul et al. 2013; Stoger et al. 2014).

Protalix Biotherapeutics has advanced several oral therapeutic enzymes produced in lyophilized carrot cells into clinical trials, including a modified version of the recombinant alpha-Galactosidase-A protein (PRX-102), in a phase I/II clinical trial; PRX-106 (Oral antiTNF), which has completed a phase I clinical trial; PRX-105, a biodefence drug, in phase I; and PRX-112 (Oral glucocerebrosidase for Gaucher’s Disease), which has completed phase I (Wolfson 2013).

Owing to the benefits of molecular pharming for the production of recombinant proteins in terms of stability and scalability, the EU Pharma-Planta consortium launched a project to establish the commercial production of two HIV antibodies, 2G12 and 2F5, using maize and tobacco platforms (Rademacher et al. 2008; Paul et al. 2011). This project resulted in the GMP-compliant development of the recombinant protein 2G12 in transgenic tobacco and its advance to a phase I clinical trial, indicating the success of molecular pharming in microbicide protein production. Nevertheless, making molecular pharming products accessible to underdeveloped countries and connecting them to the global health network requires a cost-efficient production and development infrastructure, which can be offered by transgenic plants.

Moss-based expression system

Recently, a moss-based expression system has been exploited for the accumulation of high-value biopharmaceuticals. The moss P. patens is a reliable system for the production of recombinant proteins, as stable genetic transformation and endogenous gene disruption are easily achieved. The latter is beneficial for knocking out genes involved in the addition of non-human glycan structures onto the protein, enabling moss to express humanized glycoproteins (Reski et al. 2015). Moreover, protein secretion and product stability in the medium add versatility to this system, as proteolytic degradation has not been reported so far. Thus, high protein homogeneity and low purification costs are advantages of this system. Recently, several human proteins were expressed using a moss system, including asialo-erythropoietin (AEPO) (Parsons et al. 2012), placental secreted alkaline phosphatase (SEAP) (Gitzinger et al. 2009), complement factor H (FH) (Buttner-Mainik et al. 2011), epidermal growth factor (EGF), hepatocyte growth factor (HGF) (Niederkrüger et al. 2014), and a multi-epitope fusion protein from human immunodeficiency virus (Poly HIV) (Orellana-Escobedo et al. 2015). The German company Greenovation Biotech GmbH (http://www.greenovation.com/) developed a highly stable and high-yielding strain of P. patens as a benchmark for the production of several recombinant proteins for orphan diseases, such as Fabry’s Disease, Gaucher’s Disease, atypical haemolytic uremic syndrome (aHUS), and Pompe Disease.

Transient expression-based systems: agroinfiltration and agroinfection

Transient gene expression is an efficient, cost-effective and time-saving strategy for yielding high amounts of recombinant proteins, whereas stable genetic transformation is a slow process that requires months or years to generate transgenic plants because of regeneration protocols (Hefferon 2012). In addition, transient expression is independent of genome integration and is therefore not affected by the position effects seen in stable transformation, because the expression vector remains an episomal DNA molecule. Moreover, transient gene expression can be detected within 3 h after plasmid delivery, reaching an expression threshold after 18–48 h and remaining transcriptionally active for approximately 10 days. Besides these features, transient expression can help to address biosafety concerns (Komarova et al. 2010).

In plants, transient gene expression is usually accomplished using Agrobacterium tumefaciens in agroinfiltration experiments by infiltrating bacterial cell suspensions into leaf cells with the consequent delivery of T-DNA to the host cells (Circelli et al. 2010). Furthermore, an A. tumefaciens approach can be coupled with replicating plant virus genomes in a virus-based replicon system, whose advantages rely on viral properties: viruses are small, can be easily manipulated, have a simple infection process and are able to replicate at high levels. These properties make them ideal vectors for heterologous expression and a suitable alternative to stable transgenic systems (Lico et al. 2008).

In some reports using viral-based systems, high-level expression was achieved for different biopharmaceuticals, such as full-size mAbs and VLP (virus-like particle) vaccines, with yields of 0.5 to 0.8 mg per gram of fresh-leaf biomass (Giritch et al. 2006; Huang et al. 2010; Pogue et al. 2010). To achieve this, viral genomes were cloned as full-length complementary DNA (cDNA) into the desired expression vector through gene replacement, gene insertion or gene fusion (Ronald 2007). Replicon systems belong to two possible categories: independent-virus systems (inoculated as virions or isolated viral genomes, with infection spreading through cell-to-cell and systemic movement) or minimal-virus systems (viral vectors lacking cell-to-cell movement capacity, which allows the efficient expression of larger recombinant proteins). In general, replicon systems are derived mainly from RNA virus genomes (e.g., Potexvirus, Tobamovirus, Comovirus, Potyvirus, Tobravirus, Closterovirus, and Sobemovirus) (Pogue et al. 2010).
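To put the reported yields in practical terms, the following minimal sketch estimates how much fresh leaf biomass would be needed for a target amount of protein at 0.5 and 0.8 mg per gram. The function name, the 1 g target and the optional `recovery` parameter (fraction of protein surviving purification) are hypothetical and introduced only for illustration; they do not come from the cited reports.

```python
def fresh_biomass_needed_g(target_protein_mg, yield_mg_per_g, recovery=1.0):
    """Grams of fresh leaf biomass needed to obtain target_protein_mg of protein.

    yield_mg_per_g: expression level in mg protein per g fresh leaf biomass.
    recovery: assumed fraction of expressed protein recovered after purification
              (hypothetical parameter; 1.0 means no losses).
    """
    if yield_mg_per_g <= 0:
        raise ValueError("yield_mg_per_g must be positive")
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be in (0, 1]")
    return target_protein_mg / (yield_mg_per_g * recovery)


# Illustrative target: 1 g (1000 mg) of protein at both ends of the reported range.
at_low_yield = fresh_biomass_needed_g(1000.0, 0.5)   # roughly 2 kg of leaves
at_high_yield = fresh_biomass_needed_g(1000.0, 0.8)  # roughly 1.25 kg of leaves
print(round(at_low_yield), round(at_high_yield))
```

Setting `recovery` below 1.0 scales the required biomass up proportionally, which is why downstream losses matter as much as the expression level itself in such an estimate.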

Minimal-virus infection is usually coupled with agroinfiltration in an Agrobacterium-mediated delivery system defined as agroinfection, which is a practical technique because it avoids the delay associated with systemic spreading. Moreover, it is able to trigger high-level replication and, consequently, to produce very large amounts of protein in a shorter period than independent-virus infection (Pogue et al. 2010). Furthermore, with this method, most of the infiltrated area provides synchronous transgene expression because at least 96% of the cells become infiltrated (Komarova et al. 2010).

Some plant species (e.g., A. thaliana, Nicotiana spp.) are also able to trigger PTGS against replicon transgenes, resulting in an accumulation failure of transcripts as a consequence of their sequence-specific targeting and destruction (Mallory et al. 2002). Accordingly, to maintain a very high level of replication, PTGS can be overcome through the co-expression of viral or non-viral silencing suppressors (e.g., Tombusvirus P19 protein; Potyvirus P1/HC-Pro proteins; pectin methylesterase inhibitors; and Pol II-directed short noncoding RNAs), which enhance mRNA stability and improve heterologous expression (Komarova et al. 2010).

In recent years, multi-component viral vectors have commonly been used for these purposes, and DNA systems were developed and shown to be feasible and efficient for driving protein expression (Huang et al. 2009). These Geminivirus-based systems have already been used for VLP-based vaccine and monoclonal antibody production (Huang et al. 2010). Once the recombinant protein is produced, the leaves must be harvested for protein extraction and complete purification, as required for biopharmaceutical production.

A recent significant development in the area of agroinfiltration is the use of “deconstructed” viral vectors. In this new type of vector, viral genome components that are unnecessary for expression from the plasmid are removed, which allows larger transgenes to be accommodated while retaining viral replication and transcription (Chen et al. 2013a). The MagnICON system represents an efficient and robust gene-transfer technology for the transient expression of biopharmaceuticals in plant platforms. This system eliminates the need for plasmid delivery based on complicated methods of generating RNA. The MagnICON system does not produce functional infectious particles because the CP gene is deleted, whereas the yield and speed of the viral system are maintained. Moreover, this technology is able to integrate the post-translational processing of plant systems for the production of complex proteins (Chen et al. 2013a). Numerous proteins have been expressed with this system, and the production of desired proteins has reached up to 1 mg per gram of fresh weight in a short period of time (7–10 days after agroinfiltration) (Giritch et al. 2006; Lai et al. 2010; Phoolcharoen et al. 2011; He et al. 2012; Lai and Chen 2012). Currently, with the help of this system, the well-documented biopharmaceutical ZMapp™, which is composed of three humanized monoclonal antibodies, is manufactured in Nicotiana for the treatment of Ebola infections (http://mappbio.com/). The further development of this system will help to establish plant transient expression systems as a robust and prominent platform for the commercial production of pharmaceuticals.

Conclusion

Plant platforms have been engineered as expression vehicles for the production of different recombinant proteins. The secretion of produced proteins and their simple purification process provide an economical alternative to other systems. Moreover, plant-produced proteins are safer than those derived from other platforms. The development of plant systems for high protein accumulation is a priority for large-scale production. Several features, from transcription to translation, regulate protein production, and they need to be considered before protein production begins. To date, successful optimization has resulted in protein secretion at high levels (>30% TSP), and there are valuable databases and online software that can screen and evaluate many characteristics influencing protein accumulation. Different approaches can be selected, combined, and used as strategies to improve protein expression levels in plants. Among them, numerous parameters could be altered that relate to the genetic elements constituting the expression vector used to drive heterologous protein expression and to their orientation within the backbone. There are also several molecular strategies for controlling glycosylation patterns to obtain the desired glycoforms, and these strategies vary from molecular tools for gene silencing to subcellular targeting. The latter can also be used for storage purposes to optimize and enhance protein accumulation and, when coupled to viral-based replicon systems, might allow proteins to be overexpressed at high levels.