Introduction

Over the past four decades, the methylotrophic yeast Pichia pastoris (reclassified as Komagataella phaffii) has been widely used for both basic research and industrial production of recombinant proteins (Ahmad et al. 2014; Cereghino and Cregg 2000; Cregg et al. 2000; Kim et al. 2015; Macauley-Patrick et al. 2005). Compared with other commonly used expression systems, P. pastoris offers many advantages, such as standardized protocols for molecular genetic manipulation, the ability to grow to high cell density on minimal media, a powerful secretory capacity with a low background of endogenous proteins, the availability of alternative constitutive and inducible promoters, and the capacity for post-translational modifications (Cregg et al. 2009; Felber et al. 2014; Spohner et al. 2015). In particular, since the publication of detailed genome sequences (De Schutter et al. 2009), P. pastoris has received much more attention for producing pharmaceuticals (Spadiut et al. 2014; Vogl et al. 2013) and commodity chemicals, for instance xanthophylls, lycopene, β-carotene, nootkatone, and glucaric acid (Araya-Garay et al. 2012; Liu et al. 2016; Wriessnegger et al. 2014). Moreover, as a major breakthrough, P. pastoris has been recognized as a GRAS (generally recognized as safe) organism for use in the food industry by the Food and Drug Administration (FDA) (Ciofalo et al. 2006; Thompson 2010). The achievements and challenges of the P. pastoris expression system for producing heterologous enzymes and biopharmaceuticals have recently been well reviewed (Ahmad et al. 2014; Puxbaum et al. 2015; Vogl et al. 2013). This review describes and summarizes recent advances in the development of the molecular toolbox for P. pastoris, including promoters, terminators, signal peptides, secretory machinery, and genome engineering tools. Directions for future research and applications of P. pastoris are also discussed.

Promoter and transcriptional terminator toolbox

Transcription initiation is generally a critical step in protein expression. Therefore, the identification, characterization, and construction of inducible and constitutive promoters of different strengths are essential for engineering P. pastoris as a synthetic microbial cell factory for enzymes (Cos et al. 2006; Jin et al. 2014; Vogl and Glieder 2013) or metabolites (Vogl et al. 2013). To this end, many natural strong, tightly regulated inducible promoters (for instance PAOX1, which is repressed by glucose and glycerol and activated by methanol) and constitutive promoters have been characterized and widely used for the production of heterologous proteins (Ahmad et al. 2014; Cregg et al. 2009; Felber et al. 2014; Karaoglan et al. 2016; Liu et al. 2013). By elucidating a rhamnose utilization pathway in P. pastoris, two rhamnose-inducible promoters were identified and characterized as excellent candidates for driving the production of food-grade and therapeutic proteins (Liu et al. 2016). Similarly, comprehensive analysis of the P. pastoris transcriptome identified many new carbon source-dependent promoters (Love et al. 2016). More recently, Vogl et al. studied the regulation of the methanol utilization pathway in P. pastoris in depth and identified a powerful set of strong and weak methanol-induced promoters. These promoters not only enabled tightly regulated, high-level coexpression of pathway genes of interest for balanced metabolism but also increased genetic stability because their DNA sequences differ (Vogl et al. 2016).

Although a large set of wild-type inducible and constitutive promoters is now available, novel short artificial promoters with different properties are still required for producing industrial enzymes and for fine-tuning gene expression in metabolic engineering and synthetic biology. In this regard, building on the identification of the cis-acting elements that regulate the AOX2 gene (Ohi et al. 1994) and of a positively acting transcription factor (MXR1) in P. pastoris (Lin-Cereghino et al. 2006), Hartner et al. developed a library of short synthetic PAOX1 promoter variants by combining cis-acting elements with a basal promoter (Hartner et al. 2008). Subsequently, synthetic inducible promoters were constructed by fusing cis-acting elements to core promoter fragments and were successfully applied to improve the production of porcine trypsinogen (Ruth et al. 2010). Drawing on promoter engineering strategies that focus on the upstream regulatory sequences (URS), the 5′ untranslated region (UTR), and the core promoter sequence (Blazeck et al. 2012; Blount et al. 2012; Redden and Alper 2015; Xuan et al. 2009), Vogl et al. designed a group of synthetic core promoters for P. pastoris (Vogl et al. 2014), which will facilitate the construction and application of novel orthogonal promoters for engineering dynamic synthetic circuits and pathways.

Additionally, because methanol is toxic to both P. pastoris and humans, it is preferable to build P. pastoris cell factories around other inducible or constitutive promoters, especially for the production of food products (Spohner et al. 2015). Compared with methanol-induced promoters, constitutive promoters generally simplify the cultivation process (Zhang et al. 2009), and the constitutive promoter PGAP has been applied for large-scale production of enzymes (Mao et al. 2015; Varnai et al. 2014; Zhang et al. 2009). Moreover, it has been demonstrated that in many cases constitutive promoters can generate higher expression levels of enzymes of interest than PAOX1 (Cos et al. 2006; Spohner et al. 2015; Zhang et al. 2009). Thus, construction of alternative synthetic constitutive or inducible promoter libraries is also desirable (Fig. 1). To facilitate fine-tuning and precise control of gene expression, Qin et al. created a functional promoter library through mutagenesis of the constitutive promoter PGAP, with activities ranging from about 0.6% to nearly 19.6-fold of the wild-type promoter activity (Qin et al. 2011). Additionally, Curran et al. achieved de novo design of synthetic promoters in S. cerevisiae using a computationally guided approach informed by nucleosome architecture (Curran et al. 2014). From the perspective of metabolic engineering, dynamic control of pathway enzymes by stress-response promoters is desirable (Dahl et al. 2013). By modifying transcription factor binding sites in the upstream activation sequence of the YGP1 promoter, its low-pH performance was significantly increased. On this basis, a novel low-pH (pH ≤ 3) dependent promoter was engineered from the unrelated CCW14 promoter, which gave a tenfold increase in the production of lactic acid compared with the commonly used TEF1 promoter (Rajkumar et al. 2016). Based on these studies, it can be expected that synthetic stress-specific or phase-dependent promoters of different strengths will be designed and constructed for P. pastoris in the near future.
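
As a rough illustration of how wide the reported PGAP library is, the two endpoints can be put on a common scale. The sketch below assumes that both values (0.6% and 19.6-fold) are expressed relative to wild-type PGAP activity, which is an interpretation of the reported range rather than something verified here; the function name is hypothetical.

```python
def dynamic_range(weakest_fold: float, strongest_fold: float) -> float:
    """Return the ratio between the strongest and weakest promoter activity."""
    if weakest_fold <= 0:
        raise ValueError("weakest_fold must be positive")
    return strongest_fold / weakest_fold


# Endpoints taken from the text; ASSUMED to be relative to wild-type PGAP (= 1.0).
weakest = 0.6 / 100   # 0.6% of wild-type activity, expressed as a fold value
strongest = 19.6      # 19.6-fold of wild-type activity

span = dynamic_range(weakest, strongest)
print(f"Library spans roughly {span:.0f}-fold")
```

Under that assumption the library would cover a range of more than three orders of magnitude, which is the property that makes such a library useful for fine-tuning expression.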

Fig. 1 Strategies for constructing minimal synthetic promoters and promoter mutant libraries

In addition to promoters, transcriptional terminators, which determine the position of transcription termination and poly(A) addition, also play critical roles in regulating mRNA stability and gene expression levels in yeast (Curran et al. 2013, 2015). In particular, combinatorial optimization of promoters and terminators has been shown to be an effective strategy for balancing metabolic pathways (Curran et al. 2013; Vogl et al. 2016). Consequently, screening native terminators or constructing short synthetic terminators to build a yeast terminator toolbox for synthetic biology is imperative (Curran et al. 2015). In this regard, the terminator regions of S. cerevisiae have been comprehensively evaluated at genome-wide scale (Yamanishi et al. 2013), which not only resulted in the creation of a “terminatome” toolbox but also provided valuable information on the modulatory roles of terminators. In addition, Curran et al. designed and characterized a panel of short (35–70 bp) synthetic terminators for modulating gene expression in S. cerevisiae (Curran et al. 2015). These synthetic terminators are also highly functional in an alternative yeast, Yarrowia lipolytica, suggesting that they are transferable between diverse yeast species. More recently, MacPherson and Saka devised a strategy for building orthogonal synthetic terminators that supports regulation of gene expression, efficient assembly of transcription units, and stable chromosomal integration (MacPherson and Saka 2016). These standardized short terminators of identical length share few homologous sequences, which not only facilitates molecular manipulation but also mitigates the risk of undesired recombination events. Although fundamental knowledge and molecular tools for P. pastoris are relatively limited compared with those for S. cerevisiae, most regulatory elements, including promoters and terminators, are common to S. cerevisiae and P. pastoris. Thus, it is feasible and worthwhile to design and develop a set of short artificial terminators for P. pastoris.

Signal peptide toolbox

Secretion signal peptides (SPs), which are generally located at the N-terminus and comprise three parts, direct the translocation of the nascent polypeptide into the endoplasmic reticulum (ER) and its secretion into the extracellular medium. Although P. pastoris has long been considered an ideal expression host, especially for glycosylated proteins (Puxbaum et al. 2015), very few SP sequences have been characterized and applied for secretory expression of heterologous proteins (Table 1). In addition to the commonly used S. cerevisiae alpha-factor prepro-peptide (Cereghino and Cregg 2000; Cregg et al. 2000) and signal sequence, only the native acid phosphatase (PHO1) signal sequence (Heimo et al. 1997; Romero et al. 1997), the S. cerevisiae SUC2 gene signal sequence (Paifer et al. 1994) and the bovine β-casein signal peptide (He et al. 2012) have occasionally been used. Additionally, the secretory efficiency of a signal peptide often differs widely depending on the recombinant protein it is fused to (Ghosalkar et al. 2008; Zhu et al. 2011). Thus, it is of great importance to identify and characterize new candidates and to construct a signal peptide library that can be tested individually for different proteins. With this aim, Kottmeier et al. characterized three novel secretion signals originating from hydrophobins of Trichoderma reesei and showed that the secretion sequences derived from HFBI and HFBII have the potential to achieve efficient secretion of heterologous proteins in P. pastoris (Kottmeier et al. 2011). For high-level expression of Candida antarctica lipase B (CALB), the 25-amino-acid native lipase B signal peptide (nsB) was recently investigated and evaluated. About a threefold increase in CALB production was achieved compared with the alpha-factor prepro-peptide, suggesting that this short nsB signal peptide can be a good alternative for heterologous protein expression in P. pastoris (Vadhana et al. 2013). More recently, Kang et al. also improved the production of a leech hyaluronidase (LHAase) in P. pastoris by replacing the alpha-factor prepro-peptide with the nsB sequence (Kang et al. 2016), further demonstrating the great potential of this short peptide for enzyme expression. Additionally, to avoid fragmentation of proteins containing Kex2p cleavage sites such as KR and RR, Govindappa et al. identified a new 18-amino-acid signal sequence from a P. pastoris protein that gave efficient secretion. Expression of porcine carboxypeptidase B and the plant-derived Erythrina trypsin inhibitor demonstrated the capacity and robustness of this short signal peptide (Govindappa et al. 2014).

Table 1 Signal peptides used for expression of enzymes in P. pastoris

In addition to the SP sequences above, which were isolated case by case, many new SPs have been identified through in silico prediction followed by experimental analysis. To date, several prediction programs relevant to SP analysis have been developed, including SignalP 4.1 (Petersen et al. 2011), Phobius (Kall et al. 2004), WoLF PSORT 0.2 (Horton et al. 2007), ProP 1.0 (Duckert et al. 2004) and NetNGlyc 1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/). Using the SignalP program (Bendtsen et al. 2004), potential signal peptides from three P. pastoris proteins, PpScw11p, PpDse4p and PpExg1p, were predicted, and their secretion capacities were investigated with green fluorescent protein (GFP) and CALB as reporters (Liang et al. 2013). The results demonstrated that these SPs had equal or slightly higher secretion efficiency compared with the alpha-factor prepro-peptide. More recently, Massahi and Çalık systematically screened and identified novel SPs in P. pastoris (Massahi and Calik 2015). Eight SPs had higher D-score values than that of the S. cerevisiae α-mating factor, and the three highest-scoring SPs were MKILSALLLLFTLAFA (D = 0.932), MRPVLSLLLLLASSVLA (D = 0.932) and MFKSLCMLIGSCLLSSVLA (D = 0.918). On this basis, the authors selected five SPs (D-score > 0.8) for production of recombinant human growth hormone (rhGH) (Massahi and Calik 2016). Of these, SP23 gave the highest production. These results suggest that an SP library is very useful for testing SPs individually against specific enzymes of interest, especially because the correlation between secretion efficiency and SP physicochemical properties remains unclear. Additionally, recent studies on Streptomyces griseus trypsin in P. pastoris found that the N-terminal sequence affected both secretory expression and enzymatic properties (Ling et al. 2012, 2013, 2014; Zhang et al. 2016). Therefore, rapid evolution and development of novel short synthetic SPs with synthetic biology methods (Jin et al. 2016a, b) should also be a promising direction.
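
The D-score based shortlisting described above can be sketched in a few lines. The sequences, D-scores and the 0.8 cut-off are those quoted in the text; the class and function names are hypothetical, and the snippet only ranks precomputed scores, it does not reimplement SignalP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPeptide:
    sequence: str
    d_score: float  # SignalP D-score as quoted in the text


# The three highest-scoring candidates quoted in the text.
CANDIDATES = [
    SignalPeptide("MKILSALLLLFTLAFA", 0.932),
    SignalPeptide("MRPVLSLLLLLASSVLA", 0.932),
    SignalPeptide("MFKSLCMLIGSCLLSSVLA", 0.918),
]


def shortlist(candidates, min_d_score=0.8):
    """Keep peptides whose D-score exceeds the cut-off, best score first."""
    kept = [sp for sp in candidates if sp.d_score > min_d_score]
    return sorted(kept, key=lambda sp: sp.d_score, reverse=True)


for sp in shortlist(CANDIDATES):
    print(f"{sp.sequence}\tlength={len(sp.sequence)}\tD={sp.d_score:.3f}")
```

Because secretion efficiency does not correlate clearly with SP properties, a shortlist of this kind is only a starting point; each candidate still has to be tested with the protein of interest.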

Genome engineering toolbox

Because stable plasmid expression systems are not available for P. pastoris, nearly all expression cassettes are integrated into the genome, by homologous recombination (HR) or non-homologous end joining (NHEJ), for efficient expression. Although NHEJ is more dominant than HR in filamentous fungi and higher eukaryotic organisms, it often results in uncertain integration sites and unpredictable deletions of nucleotides (Naatsaari et al. 2012). Therefore, the development and application of HR-dependent integration systems have attracted more attention. Currently, several auxotrophic (ADE1, MET2, URA3, URA5, ARG1, ARG2, ARG3, ARG4, HIS1, HIS2, HIS4, HIS5, HIS6 and FLD1) (Cereghino and Cregg 2000; Nett and Gerngross 2003; Nett et al. 2005; Sunga and Cregg 2004; Thor et al. 2005) and antibiotic-dependent (Zeocin, blasticidin, kanamycin/G418 resistance) (Lin-Cereghino et al. 2008; Scorer et al. 1994) selectable marker genes have been used for selection and screening of positive integrants.

To enable efficient, repeated genome manipulation and the construction of marker-free strains, the Flp recombinase-dependent modification system from the yeast 2 μ plasmid (Broach et al. 1982) was applied to P. pastoris genome engineering. After expression of the Flp recombinase, the DNA fragment located between two FRT recognition sites was precisely removed, leaving a single 34 bp FRT site in the locus (Cregg 1989). Similarly, the Cre-loxP system, which was well developed for S. cerevisiae (Gueldener et al. 2002; Guldener et al. 1996), was successfully introduced into P. pastoris (Marx et al. 2008) (Fig. 2a). However, a loxP site is permanently left as a scar at the target site, which might result in unpredictable recombination. For scarless genome engineering, the T-urf13 gene from the mitochondrial genome of male-sterile maize, whose expression confers sensitivity to methomyl, was used as a counterselectable marker in P. pastoris (Soderholm et al. 2001). To develop a universal scarless genome engineering tool, Yang et al. recruited the gene encoding the Escherichia coli toxin MazF and constructed an efficient tool for repeated knock-in, knock-out and site-directed mutagenesis in P. pastoris (Yang et al. 2009) (Fig. 2b).
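
The scar left by recombinase-mediated marker removal can be illustrated with a simple string model. The sketch below uses the canonical 34 bp loxP sequence; the flanking and marker sequences are made-up placeholders, the function name is hypothetical, and the model ignores site orientation and all recombination biology beyond "two sites in, one site out".

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34 bp loxP site


def excise_between_sites(locus: str, site: str = LOXP) -> str:
    """Remove the segment flanked by two copies of `site`, leaving one copy."""
    first = locus.find(site)
    if first == -1:
        raise ValueError("no recombination site found")
    second = locus.find(site, first + len(site))
    if second == -1:
        raise ValueError("two recombination sites are required")
    # Keep everything up to and including the first site, then skip to the
    # sequence that follows the second site.
    return locus[: first + len(site)] + locus[second + len(site):]


# Hypothetical toy locus: left flank, loxP, marker cassette, loxP, right flank.
left, marker, right = "GGGCCC", "ATGAAATTTTAA", "TTTGGG"
before = left + LOXP + marker + LOXP + right
after = excise_between_sites(before)

print(len(before) - len(after), "bp removed;", after.count(LOXP), "loxP scar left")
```

The single remaining site is the scar referred to in the text: it stays in the genome after every round of marker recycling, which is why repeated use can provide substrates for unintended recombination.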

Fig. 2 Genome engineering strategies with homologous recombination. a Cre-lox dependent system; b mazF-dependent system; c CRISPR/Cas9 dependent system

Compared with S. cerevisiae, P. pastoris has a less efficient homologous recombination system. In S. cerevisiae, targeting efficiencies can be close to 100% with homologous overhangs of approximately 50 bp. In P. pastoris, however, even homologous overhangs of 1000 bp result in a targeting frequency of only 10–30% (Li et al. 2007; Naatsaari et al. 2012). To improve gene targeting efficiency in P. pastoris, Näätsaari et al. identified and deleted the P. pastoris KU70 homologue, which encodes a key player in the NHEJ repair system, and increased the homologous recombination frequency to over 90% with only 250 bp of flanking homologous DNA (Naatsaari et al. 2012). During multiple rounds of cultivation, no severe growth retardation or loss of gene copy number was observed. Therefore, the ku70 deletion strain could be used as a platform for protein production and synthetic biology studies. To introduce programmable breaks at positions of interest in the genome, Weninger et al. systematically investigated and optimized combinations of Cas9 nuclease and guide RNA (gRNA) co-expression driven by RNA polymerase III and II promoters (Weninger et al. 2016). Specifically, a nuclear localisation sequence (NLS) (Weninger et al. 2015) was fused to Cas9 to ensure its activity in the nucleus. In this way, a CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 system (Jinek et al. 2012) was established in P. pastoris (Fig. 2c), which allowed rapid, marker-less introduction of multiplexed gene deletions and integration of homologous DNA cassettes. This system has been widely adopted in metabolic engineering and synthetic biology applications in P. pastoris. Because Cas9 can be toxic and off-target cleavage is possible, fine-tuning of Cas9 expression and optimization of the gRNA should be considered to further improve this CRISPR/Cas9 system in P. pastoris.
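
As a minimal illustration of what choosing a programmable break site involves, the sketch below lists candidate gRNA target sites on one strand of a sequence. It assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3′ of a 20-nt target; the text does not specify these details, and the function name and toy sequence are hypothetical. Reverse-strand sites and off-target scoring are deliberately left out.

```python
import re


def find_protospacers(seq: str, length: int = 20):
    """Return (start, protospacer, pam) for each NGG PAM on the given strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical toy locus: one 20-nt target followed by a TGG PAM.
toy_locus = "ATGC" * 5 + "TGG" + "ACGT"

for start, protospacer, pam in find_protospacers(toy_locus):
    print(f"position {start}: {protospacer} | PAM {pam}")
```

In practice each candidate would also be checked against the whole genome for near-identical sequences, since off-target cleavage is one of the concerns raised above.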

Outlook

Among yeast species, P. pastoris has been the most commonly used eukaryotic expression system for the production of heterologous proteins. In recent years, many genome-scale metabolic models and analyses for glycosylation, synthetic gene design, and pathway engineering have been reported (Ang et al. 2016; Chung et al. 2010; Irani et al. 2016). With the rapid development of more synthetic biology toolboxes and a deeper understanding of physiological processes and genetic information, many bottlenecks in gene expression regulation, protein folding and secretion, and glycoengineering can be expected to be addressed soon, which will further boost the use of this eukaryotic cell factory in food and pharmaceutical applications.