Introduction

Introduced as a methylotrophic yeast in the 1960s, Pichia pastoris, now reassigned as Komagataella phaffii (K. phaffii), has proven to be a valuable expression system for a growing list of therapeutic proteins and industrial enzymes. This species was initially used for the production of single-cell protein as an animal feed additive, leveraging methanol as a carbon and energy source, but the rising cost of oil made the process economically unviable. Approximately ten years later, Phillips Petroleum developed P. pastoris as a heterologous gene expression system together with the Salk Institute Biotechnology/Industrial Associates, Inc. (SIBIA) [1, 2]. Later, this system was licensed to Invitrogen to make it available to researchers worldwide, and the Food and Drug Administration (FDA) considered P. pastoris as Generally Recognized as Safe (GRAS) [3].

The publication of a high-quality genome sequence of the parental strain CBS7435 in 2011 served as a breakthrough, and the first strain developed for protein expression was GS115 [4]. High cell culture density, ease of genetic manipulation, the ability to secrete recombinant proteins, protein processing, eukaryotic post-translational modifications, and stable genetic constructs [5,6,7] make K. phaffii (often referred to as P. pastoris) a versatile and favored eukaryotic expression system. It also lacks the well-known drawbacks of bacterial expression systems, such as the formation of inclusion bodies and the presence of endotoxins, and of mammalian systems, such as high cultivation and handling costs and reduced levels of protein expression [8]. The basic characteristics of different host systems are listed in Table 1. The most studied yeast expression platform for recombinant protein production is the baker's yeast Saccharomyces cerevisiae, owing to advantages such as secretory expression, glycosylation, and eukaryotic post-translational modifications, but the expressed proteins are often hyperglycosylated, which in turn affects protein immunogenicity [9]. The major advantages that have given P. pastoris momentum over S. cerevisiae as an expression platform are high cell density culture due to its aerobic mode of respiration, protein expression under the tightly regulated, methanol-inducible AOX1 promoter, both intracellular and extracellular expression of recombinant proteins, the latter through the secretory pathway (alpha mating factor, MATα), and metabolic engineering of the yeast glycosylation pathway (glycoengineered strains) to mimic the mammalian system [3, 10, 11]. Recombinant proteins can be produced in either a constitutive or an induced manner depending on the type of promoter used, and they are secreted into the culture supernatant with less than 10% endogenous protein content, which simplifies purification in this host system [12].

Table 1 Different types of expression systems

Strong methanol-inducible promoters from the alcohol oxidase genes (alcohol oxidase 1, AOX1, and to a lesser extent alcohol oxidase 2, AOX2), as well as the promoter of the glyceraldehyde-3-phosphate dehydrogenase gene (GAP), have helped P. pastoris succeed in producing recombinant proteins. Despite the effectiveness, strict control, and recombinant protein productivity attained when PAOX1 is used to drive transgene expression, this manufacturing system has some issues related to the use of methanol. Methanol is harmful to cells and causes oxidative stress, and its catabolism results in a high oxygen demand [13]. Apart from this, recombinant protein production is hindered at two different levels: protein synthesis, and protein folding and secretion. In recent years, considerable attempts have been made to better understand the physiology and cell biology of P. pastoris [2, 14, 15], protein folding and intracellular trafficking [5, 16], metabolic engineering concerning glycosylation [17], and process monitoring [14] in order to increase recombinant protein production. However, it is important to emphasize and summarize the use of methanol-independent promoters as a substitute (a full list of commercially available vectors with various promoters is given under the “Promoters” section), developments in strain engineering, and methods to overcome the challenges in the production of recombinant proteins with respect to copy number, codon optimization, culturing conditions, the 2A peptide system, and CRISPR techniques, along with their contribution to the effectiveness of Pichia in producing recombinant proteins for commercial use.

Methodology

For this review, we conducted an extensive literature search in reliable databases such as PubMed, Google Scholar, Wiley Online Library, Elsevier, and Scopus. Full-length articles on the subject of interest, biotherapeutics in the P. pastoris expression system, published between 2010 and 2023 and written in English were considered for this systematic review. The keywords used in various combinations to search the above-mentioned databases were K. phaffii, P. pastoris, recombinant proteins, CRISPR/Cas, 2A peptide system, promoters, and chaperones. Since a large amount of data is available online, inclusion and exclusion criteria were used to refine the screening process, as described in Fig. 1. The inclusion criteria were proteins expressed for therapeutic applications using P. pastoris, investigations of various promoters, and strategies to overcome challenges in recombinant protein production. Conference abstracts, duplicated studies, and non-peer-reviewed articles were excluded. Of 200 articles, 95 were included in this study.

Fig. 1

Methodology for the search of literature

Pichia pastoris as a Lower Eukaryotic System

Expression in P. pastoris follows a workflow distinct from those of other expression systems such as bacteria and higher eukaryotic systems (Fig. 2). The target gene is cloned into a P. pastoris vector such as pPICZαA or pPIC9K; tandem inserts of the gene of interest are also possible. Linearization of the construct generates 5′ and 3′ regions of homology, and upon transformation the construct integrates into the genome of the chosen strain through homologous recombination [18]. Reports suggest that the integration of multiple copies enhances the expression level of the recombinant protein [19]. Transformants are then screened based on the integration event, antibiotic selection, and Mut phenotype (MutS/Mut+). This yields a few top expressers for scale-up in a shake flask or fermenter, depending on the scale of the study.

Fig. 2

Schematic illustration of the distinct workflow for the expression of recombinant proteins in the lower eukaryotic organism P. pastoris. Linearized DNA (pPIC9K vector with the gene of interest, GOI) (a) after confirmation through sequencing is transformed into electrocompetent GS115 cells and integrates via homologous recombination, (b) the transformed colonies are selected using cell auxotrophy, (c) and then grown in a 96-well plate supplemented with yeast-peptone-dextrose broth, (d) further selected on increasing concentrations of antibiotic (Geneticin), (e) antibiotic selection is followed by determining the Mut phenotype (on a scoring template) and integration into the host genome (gene-specific and AOX-specific PCR), (f) small-scale expression studies are carried out in 96-well plates, (g) and in 10–30 ml culture volumes in shake flasks, (h) upon analysis of the protein of interest by SDS-PAGE, (i) and functional assays, (j) the best expression clone is identified and taken forward to bioreactors

MUT Pathway

Understanding the methanol utilization (MUT) pathway is essential for the expression of recombinant proteins in P. pastoris, as the yeast metabolizes methanol as its carbon and energy source. Methanol metabolism takes place in both the peroxisome and the cytosol. Briefly, the enzyme alcohol oxidase catalyzes the oxidation of methanol into formaldehyde and H2O2. In the assimilatory pathway, the resulting formaldehyde is converted into dihydroxyacetone (DHA) and glyceraldehyde 3-phosphate (GAP) by dihydroxyacetone synthase (DAS) in the peroxisome, and the pathway ends in the cytosol with the generation of fructose 1,6-bisphosphate, whereas H2O2 is broken down into water and oxygen by catalase. In the dissimilatory pathway, formaldehyde is oxidized to carbon dioxide by an enzymatic route involving formaldehyde dehydrogenase (FLD), formyl glutathione hydrolase (FGH), and formate dehydrogenase (FDH), with the release of NADH (Fig. 3) [20, 21]. The most widely used strain for heterologous protein production is GS115, which carries two alcohol oxidase genes, AOX1 and AOX2. The majority of alcohol oxidase is produced from the AOX1 gene and comprises about 30% of the total protein produced in the yeast. Knocking out the AOX1 gene results in slow growth on methanol, a phenotype referred to as MutS (methanol utilization slow), whereas intact AOX1 and AOX2 result in fast growth on methanol, a phenotype referred to as Mut+ (methanol utilization plus). When both genes are knocked out, the strain cannot grow on methanol and is referred to as Mut− (methanol utilization minus). These phenotypes give rise to the strain diversity in P. pastoris.
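The genotype-to-phenotype rule described above can be written as a small lookup. The following Python sketch only illustrates that rule; the function name `mut_phenotype` and its boolean arguments are assumptions made for this example. The combination of an intact AOX1 with a disrupted AOX2 is not described in the text, so the sketch declines to classify it.

```python
def mut_phenotype(aox1_intact: bool, aox2_intact: bool) -> str:
    """Return the Mut phenotype label for a given AOX1/AOX2 gene status.

    Hypothetical helper: it encodes only the three cases stated in the text.
    """
    if aox1_intact and aox2_intact:
        return "Mut+"   # both genes intact: fast growth on methanol
    if not aox1_intact and aox2_intact:
        return "MutS"   # AOX1 knocked out: slow growth on methanol
    if not aox1_intact and not aox2_intact:
        return "Mut-"   # both genes knocked out: no growth on methanol
    # AOX1 intact with AOX2 disrupted is not covered by the text.
    raise ValueError("phenotype for intact AOX1 with disrupted AOX2 is not described")
```

For example, `mut_phenotype(False, True)` returns `"MutS"`, the phenotype the text assigns to an AOX1 knockout.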

Fig. 3

Methanol utilization pathway in methylotrophic yeasts

Promoters

For the expression of recombinant proteins, much importance is given to the design of the expression system, which comprises promoters, selection markers, signal sequences, and host strains. Promoters, generally located upstream of a gene at the 5′ end of the transcription initiation site, are the most important part of recombinant protein expression and are classified into two groups: tightly regulated inducible promoters, and unregulated constitutive promoters that are not affected by environmental or other chemical factors. The different inducible and constitutive promoters are listed in Fig. 4.

Fig. 4

Constitutive and inducible promoter systems available in P. pastoris for the expression of recombinant proteins (accession codes—7MB0, 3VM5 shown as an example)

Inducible Promoters

An inducible promoter is a regulated promoter that expresses its associated genes only in response to specific chemical or physical factors such as alcohols, steroids, antibiotics, oxygen level, or temperature. The alcohol oxidase 1 (AOX1) promoter, a ~960 bp fragment from the alcohol oxidase 1 gene, which is involved in methanol metabolism, is widely used for the expression of recombinant proteins in P. pastoris. Its gene product is one of the major components of the MUT pathway, and the promoter uses methanol, supplied as the sole carbon source, to induce recombinant protein expression. The promoter is strongly repressed during growth of P. pastoris on glucose or glycerol; upon depletion of these carbon sources it is derepressed, but it is fully induced only upon the addition of methanol. One of the many proteins produced using this promoter is the glucose isomerase (xylA) of Thermus thermophilus, with a yield of 137 U/g DCW [22]. The dihydroxyacetone synthase (DAS) promoter is similar to the AOX1 promoter and relies on methanol as the carbon and energy source. Its gene is also part of the MUT pathway [23, 24].

The glutathione-dependent formaldehyde dehydrogenase 1 (FLD1) promoter, a ~1200 bp fragment, was found to be an alternative to AOX1, and FLD1 can also serve as a marker for the selection of multi-copy expression strains [3]. Protein expression is induced either by methanol as the carbon source (with ammonium sulfate as the nitrogen source) or by methylamine, an inexpensive nontoxic nitrogen source (with glucose as the carbon source). Isocitrate lyase (ICL1), a ~1563 bp fragment, was found to share 64% identity between P. pastoris Icl and Saccharomyces cerevisiae Icl. The promoter is repressed by glucose and induced in the absence of glucose or by the addition of ethanol [25]. Based on expression of the dextranase gene (dexA) from Penicillium minioluteum under its control in P. pastoris, the ICL1 promoter is considered a good alternative for the expression of heterologous proteins [26].

The putative sodium-coupled phosphate symporter (PHO89) promoter, a ~1044 bp fragment, exhibited stronger transcriptional activity with higher specific productivity. Expression of a bacterial lipase gene upon phosphate starvation in different modes of fermentation was shown to be regulated by the PHO89 promoter [25]. The thiamine biosynthesis gene (THI11), with a ~1000 bp promoter fragment, encodes a protein involved in the synthesis of a thiamine precursor, and its promoter in P. pastoris is repressed by thiamine. Fed-batch production of human serum albumin under this promoter showed high transcript levels at a low specific growth rate, as well as interesting regulatory properties depending on the availability of thiamine in the growth medium [27].

The alcohol dehydrogenase (ADH1) promoter is a ~1000 bp fragment from the alcohol dehydrogenase gene, whose product is required for the reduction of acetaldehyde to ethanol. This promoter is repressed on glucose and methanol and induced on glycerol and ethanol for the production of recombinant proteins in P. pastoris. Five different synthetic promoters derived from ADH1 by the addition and deletion of regulatory sites within the promoter were found to increase the production of extracellular xylanase, in the range of 165 to 200% compared with the native promoter [28]. The glycerol kinase (GUT1) promoter is similar to ADH1: it is repressed on methanol and induced on glucose, glycerol, and ethanol. The enolase (ENO1) promoter is repressed on glucose, methanol, and ethanol and induced on glycerol. These three promoters cannot be compared with the widely used AOX1 promoter because absolute expression values are lacking. The peroxisomal matrix protein (PEX8) promoter, a moderately expressing promoter, directs the formation of a peroxisomal matrix protein that is essential for peroxisome biogenesis.
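The carbon-source responses stated above for ADH1, GUT1, and ENO1 can be collected into a lookup table, which makes it easier to ask which promoter is active on a given feed. The Python sketch below is illustrative only: the names `REGULATION`, `response`, and `promoters_induced_on` are assumptions for this example, and the table contains nothing beyond what the paragraph states.

```python
# Responses of three promoters to carbon sources, as stated in the text.
REGULATION = {
    "ADH1": {"repressed": {"glucose", "methanol"},
             "induced": {"glycerol", "ethanol"}},
    "GUT1": {"repressed": {"methanol"},
             "induced": {"glucose", "glycerol", "ethanol"}},
    "ENO1": {"repressed": {"glucose", "methanol", "ethanol"},
             "induced": {"glycerol"}},
}


def response(promoter: str, carbon_source: str) -> str:
    """Return 'induced', 'repressed', or 'not stated' for a promoter/source pair."""
    for state, sources in REGULATION[promoter].items():
        if carbon_source in sources:
            return state
    return "not stated"  # the text gives no information for this pair


def promoters_induced_on(carbon_source: str) -> list[str]:
    """List the promoters (alphabetically) that the text reports as induced."""
    return sorted(p for p in REGULATION if response(p, carbon_source) == "induced")
```

Calling `promoters_induced_on("ethanol")` returns `["ADH1", "GUT1"]`, while a source that the paragraph does not mention, such as sorbitol, yields `"not stated"` rather than a guess.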

Constitutive Promoters

A constitutive promoter is an unregulated promoter that allows continuous transcription of its associated genes and is not affected by environmental or developmental factors. The glyceraldehyde-3-phosphate dehydrogenase (GAP) promoter is a 477 bp fragment from the gene for a key enzyme of the glycolysis pathway. Like AOX1 among the inducible promoters, it is widely used for constitutive expression of recombinant proteins on glucose and, to a lesser extent, on glycerol [29]. Since methanol is not used for induction, this promoter makes the process straightforward, with no change of carbon source between growth and production of various heterologous proteins [30]. The Rab family GTPase (YPT1) promoter is a 618 bp fragment; its gene encodes a GTPase essential for secretion, and the promoter gives moderate expression levels of recombinant proteins. It provides low but constitutive expression with glucose, methanol, or mannitol as the carbon source.

Phosphoglycerate kinase (PGK1), a 1251 bp gene, shows high identity to homologous genes from other yeasts and codes for the protein 3-phosphoglycerate kinase. Its promoter was shown to be a potential alternative for constitutive expression, with higher activity on glucose and lower activity on glycerol and methanol [31]. The sorbitol dehydrogenase (SDH) promoter is a 211 bp fragment from the gene for an important enzyme of carbohydrate metabolism. This promoter was evaluated for the expression of two heterologous proteins, human serum albumin and erythrina trypsin inhibitor, under repressive as well as non-repressive carbon sources [32].

The promoter of the translation elongation factor gene (TEF1), a gene with an open reading frame of 1380 bp and the potential to encode 459 amino acids, was tested in comparison to the well-studied GAP promoter. It was found to have strong promoter activity under high-glucose and carbon-limited conditions and to produce recombinant heterologous proteins at levels similar to or greater than the GAP promoter [33]. For the glycosylphosphatidylinositol (GPI)-anchored protein gene (GCW14), an 822 bp promoter region was identified and its regulatory sequences were characterized; this promoter enabled stronger constitutive expression than GAP and TEF1 when enhanced green fluorescent protein was used as a reporter [34].

G1 and G6, the promoters of genes encoding a high-affinity glucose transporter and a putative aldehyde dehydrogenase, respectively, were identified as novel regulated promoters that do not use methanol as an inducer. These promoters are repressed on glycerol and induced upon glucose limitation. G1 was better suited to protein production processes than G6 and the GAP promoter [25].

To address the limited number of promoters available, two distinct methodologies were used to identify novel promoters that aid recombinant protein expression. In the first strategy, heterologous microarray hybridization, cDNA isolated from Pichia grown on separate carbon sources at two distinct pH values was hybridized to S. cerevisiae cDNA microarrays. Mining of the transcriptome data revealed 15 genes with high expression levels, indicating the presence of strong promoters. The second technique, the rational selection of promoters from different yeasts based on literature mining, resulted in the identification of nine probable strong promoter sequences. In total, 24 potential promoter sequences were examined for promoter activity in both intracellular and extracellular protein expression. Of the promoter sequences discovered from the transcriptome data, 80% demonstrated promoter activity on all carbon sources typically used for P. pastoris growth, whereas the rationally chosen promoter sequences had a very low success rate. Many of the discovered promoters, including two chaperone promoters (HSP82 and KAR2) and three ribosomal promoters (RPL1, RPS2, and RPS31), exhibited growth-dependent behavior [3]. The Lonza XS Pichia 2.0 glucose-regulated promoter system has also been engineered to circumvent the restrictions associated with the toxic effects of methanol, which can affect purity and productivity at high growth rates [35]. Based on these findings, a broad toolbox for promoter engineering in P. pastoris may be set up through rational design and a better understanding of the control mechanisms of the target promoters.

Vectors and Selection Markers

Vectors, otherwise referred to as plasmids, used in recombinant DNA technology usually possess an origin of replication, a multiple cloning site, and a selectable marker. P. pastoris expression vectors are bi-functional (shuttle) systems that replicate in E. coli and express the gene of interest in P. pastoris. They are characterized by the presence of selection markers, either auxotrophic markers or genes conferring antibiotic resistance. The auxotrophic markers include HIS4, ARG4, ADE1, URA3, URA5, GUT1, and ADE2, to name a few, while the antibiotic markers confer resistance to Zeocin, Geneticin, kanamycin, blasticidin S, hygromycin, and others. The different vectors used for intracellular and extracellular production of recombinant proteins in P. pastoris are listed in Table 2 with their characteristics and features. Apart from the vectors presented in the table, academic researchers have successfully modified commercially available vectors (with prior permission from the respective supplier) with the different promoter sequences mentioned in the “Promoters” section of this review.

Table 2 Commercial vector systems and their characteristics

Expression Strain Development

The methylotrophic yeast strains extensively used for the production of recombinant proteins were all derived from the NRRL-Y 11430 strain. They are broadly classified into auxotrophic strains, protease-deficient strains, and glycoengineered strains based on mutations in aox1, his4, arg4, pep4, prb1, and other genes. The most frequently used strains for heterologous protein production are the auxotrophic strain GS115 (his4), the prototrophic strain X33 (WT), and the aox knockout strains KM71 (his4 arg4 aox1::ARG4) and KM71H (arg4 aox1::ARG4). The presence of the corresponding marker genes in the vector also facilitates screening. The strain diversity also includes protease-deficient strains such as SMD1163 (his4 pep4 prb1), SMD1168 (his4 pep4::URA3 ura3), and SMD1168H (pep4), and the ade2 auxotrophic PichiaPink™ strain. Strains derived from P. pastoris CBS7435 are not covered by any patent protection and are widely used for commercial purposes, and a large number of different strains have been developed according to the needs of researchers. One such example is the CBS7435 MutS strain provided by the Graz Pichia pool, which is marker-free because it was developed using Flp/FRT recombinase technology. Apart from these strains, glycoengineered strains based on GlycoSwitch® technology were made available to produce homogeneous, humanized glycosylation patterns in recombinant proteins. A detailed review of the host strains is available in the article on protein expression in P. pastoris by Ahmed et al. [26]. The CBS7435 strain has been engineered to produce cholesterol instead of ergosterol in order to functionally express human membrane proteins [36]. A recent study of 45 different strains of P. pastoris suggests that cumulative oxygen transfer might be used as a screening criterion to facilitate the pre-selection of high-producing strains [37].

The genetic toolsets required to generate P. pastoris strains secreting recombinant proteins have advanced dramatically in recent decades. As a result, the bottleneck in bioprocess development is no longer the creation of strains but the identification of high-performing clones. Recently, a microscale cultivation strategy was shown to increase the efficiency of high-throughput screening of recombinant clones by optimizing the architecture of 96 deep-well plates, shaking throw diameter, shaking frequency, culture volume per well, and media composition [38]. Through fluorescence-activated cell sorting, a switchable secrete-and-capture technology has demonstrated the ability to efficiently isolate high-producer clones of Fab fragments from millions of cells [39]. In a more recent study, the Mattanovich team created an autotrophic P. pastoris strain that can grow on CO2. The peroxisomal methanol-assimilation route of Pichia was engineered into a CO2-fixation pathway reminiscent of the Calvin-Benson-Bassham cycle by the addition of eight heterologous genes and the deletion of three native genes; the resulting strain was able to grow continuously with CO2 as the sole carbon source [40]. Ito et al. showed that combining genome-wide screening for gene disruptions that improve secretion, accumulation of these disruptions in one strain, and adaptive laboratory evolution (ALE) to recover the reduced growth of the gene-knockout strains is an effective strategy for enhancing the secretion of heterologous proteins by the unconventional yeast P. pastoris [41]. These new techniques, in conjunction with the well-studied host strains, may offer a time-effective strategy for primary screening, allowing accelerated selection of high-producing P. pastoris strains.

Strategies of Recombinant Protein Production and Its Challenges

The well-studied and easy-to-use P. pastoris expression system has its challenges, and some degree of optimization is essential for better production of recombinant proteins. It is evident from the vast literature that protein expression in Pichia is highly target specific, and various strategies are applied to achieve significant improvements.

Gene Codon Optimization and Copy Number

Codon optimization has become a go-to tool for improving the expression of a gene in P. pastoris, and the usual strategy is to replace rare codons to match the host's codon usage bias [42,43,44]. Numerous studies suggest a substantial shift in the understanding of the function of codon bias and how it affects genes in native and expression hosts [45]. A codon-optimized version of an α-amylase from Bacillus showed significantly higher expression levels, with an activity of 8100 U/ml (2.31-fold higher) compared with the wild-type version of the same gene [43]. A tool called CPO (Codon Pair Optimisation) was developed to provide simple and efficient codon pair optimization for synthetic gene design in P. pastoris. Its authors showed that gene design based on codon pair bias significantly improved protein expression levels and might be an alternative strategy to codon usage bias [46]. Karaoglan and Erden-Karaoglan proposed a model for the expression of the A. niger protein endo-polygalacturonase (Pgl), whose sequence was subjected to codon optimization, and evaluated its performance under the control of two promoters (PAOX and PADH2). The best productivity was obtained in shake flasks using the codon-optimized Pgl under PADH2, at 42.33 U/mL (a fourfold increase) [47].
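The rare-codon replacement strategy mentioned above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the method of any cited study or of the CPO tool: the swap table `HYPOTHETICAL_SWAPS` holds placeholder values chosen for the example rather than measured K. phaffii codon usage, and the function names are likewise hypothetical. The sketch replaces each listed codon with a synonymous one and checks that the encoded protein is unchanged.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Placeholder swap table (rare codon -> preferred synonymous codon).
# These entries are illustrative assumptions, not real usage statistics.
HYPOTHETICAL_SWAPS = {"CGA": "AGA", "CGG": "AGA", "CTC": "TTG"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def replace_rare_codons(dna: str, swaps: dict[str, str]) -> str:
    """Swap listed codons for synonymous ones, keeping the protein identical."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    optimized = "".join(swaps.get(codon, codon) for codon in codons)
    if translate(optimized) != translate(dna):
        raise ValueError("swap table contains a non-synonymous replacement")
    return optimized
```

With the placeholder table, `replace_rare_codons("ATGCGACTCTAA", HYPOTHETICAL_SWAPS)` gives `"ATGAGATTGTAA"`, and both sequences translate to the same peptide. A real design would draw the swap table from a host codon usage table and would usually weigh further factors such as codon pairs, as the CPO work suggests.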

Gene copy number has been analyzed for the past 20 years, and a direct correlation between copy number and expression level has been shown for intracellular expression [19, 48]. However, this direct correlation is not necessarily valid for secretion of the same protein. Multiple-copy integration through a single crossover event accounts for only 1% of all transformants from a single transformation; screening for these events is labor intensive and is possible through antibiotic selection but not through auxotrophic markers. A recent study reported that human DNA topoisomerase I was successfully expressed from single and multicopy inserts (via an in vivo strategy using pPIC3.5K), and the multicopy transformant showed the highest level of total protein and the highest enzyme activity [48]. This, however, was achieved with an intracellular expression vector system, and the same is not apparent for secretory proteins. Reports suggest that multicopy integration also leads to genetic instability through excision of the integrated gene by loop-out recombination, owing to the highly recombinogenic nature of P. pastoris. Additionally, a sharp increase in the copy number of foreign genes may alter the normal metabolism of Pichia, with a detrimental effect on the cell physiology of the multicopy recombinant yeast. This calls for testing transformants with increasing gene copy numbers to determine the ideal copy number for maximal protein synthesis. Extensive research is needed before the multicopy approach can be used reliably for the production of secretory proteins, which is an essential trait for ease of downstream processing.

Culturing Conditions

Various optimization strategies, including media composition and culture conditions, are in place to produce recombinant proteins in P. pastoris. Essentially all P. pastoris strains grow on a defined medium with supplements matching the phenotype of the strain. The wild-type strain X33 grows on minimal media, whereas the widely used GS115 grows only on minimal media supplemented with histidine. However, yeast-peptone-dextrose (YPD), buffered glycerol complex medium (BMGY), buffered methanol complex medium (BMMY), and basal salts media are extensively used for screening, expression, and fermentation studies. To attain high cell density and better expression of heterologous proteins, the medium consists of basal salt medium (BSM) with biotin, ammonium hydroxide as the nitrogen source, glycerol and methanol as the carbon and energy sources, and trace elements such as zinc chloride and ferrous sulfate heptahydrate [38]. Methanol, the inducer of recombinant protein expression, is the principal carbon source, and monitoring the methanol percentage during biomass production and induction is crucial for optimizing productivity and managing toxicity. Methanol utilization depends strongly on the Mut phenotype of the strain used (Mut+, MutS, Mut−). The literature indicates that the minimum methanol concentration for the induction of heterologous proteins is 0.5% and the maximum is 2–2.5% for the production of fully expressed proteins. Concentrations of about 5% are considered toxic and result in the accumulation of formaldehyde and hydrogen peroxide and, ultimately, cell death [10]. A widely used alternative to glycerol as a co-substrate with methanol is sorbitol; unlike glycerol, it neither induces nor represses AOX promoters, so using sorbitol could reduce the cell growth rate and increase recombinant protein production. It can also reduce intermediate metabolites, decrease toxicity, and positively affect cell growth and energy supply for recombinant protein production [49,50,51,52]. In a different investigation, the production of three heterologous cellulases (an exoglucanase from Trichoderma reesei, and a β-glucosidase and an endoglucanase from Aspergillus niger) enabled the resulting strain to grow on cellobiose and carboxymethyl cellulose [53].
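The methanol concentration window described above (a minimum of 0.5%, a maximum of 2–2.5%, and toxicity at about 5%) can be expressed as a simple check for use when planning or logging an induction. The Python sketch below is an illustration under assumptions: the function name and category labels are invented for this example, the "about 5%" toxicity level is treated as a hard 5.0% threshold, and the range between 2.5% and 5% is only flagged as exceeding the reported maximum because the text does not characterize it further.

```python
def classify_methanol(percent: float) -> str:
    """Classify a methanol concentration (in %) against the reported window."""
    if percent < 0:
        raise ValueError("concentration cannot be negative")
    if percent < 0.5:
        return "below induction minimum"      # under the reported 0.5% minimum
    if percent <= 2.5:
        return "within induction range"       # 0.5% up to the 2-2.5% maximum
    if percent < 5.0:
        return "above reported maximum"       # not characterized in the text
    return "toxic"                            # about 5% or more (assumed cutoff)
```

For instance, `classify_methanol(1.0)` falls within the induction range, whereas `classify_methanol(5.0)` is reported as toxic. The appropriate set point will still depend on the Mut phenotype of the strain, which this sketch does not model.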

P. pastoris is usually grown at an optimum temperature of 28–30 °C, and growth above 32 °C is considered detrimental to protein production. Reducing the temperature from 30 to 20 °C during induction has been shown to improve protein production markedly, owing to decreased folding stress and lower proteolytic activity. Two more critical factors in the production of heterologous proteins in P. pastoris are incubation time and agitation speed. Production times in Pichia are long, close to 100 h, and the preferred time point is 72–96 h of incubation [54, 55]. In shake flask culture, the agitation speed is usually maintained between 250 and 300 rpm, which gives better aeration and higher product titers.

Because the generation of high-titer recombinant proteins is heavily dependent on the target protein, incorporating cell engineering methodologies, as well as co-substrate feeding and auxiliary carbon sources, may provide a new approach for P. pastoris culture strategies. Some of the recently produced recombinant proteins expressed in P. pastoris are listed in Table 3 with their different culturing conditions.

Table 3 Recombinant biological molecules expressed in P. pastoris with varying culture conditions

Protein Synthesis and Its Secretion

With the help of codon optimization and various culturing conditions, a large number of proteins can be produced at high levels with this expression system, whereas complex, multimeric human proteins pose a great challenge. The expression of complex proteins often leads to the overproduction of misfolded proteins, thereby triggering cellular stress in the host cells. Recombinant proteins intended for secretion are translocated into the endoplasmic reticulum (ER) lumen as nascent peptides. The nascent peptide undergoes folding, disulfide bond formation, and other post-translational modifications that are specific to the expressed recombinant protein. Only properly folded proteins are allowed to exit the ER to the Golgi apparatus, where post-translational modifications such as glycosylation occur.

Some recombinant proteins fail to undergo proper post-translational modification for unknown reasons, and this is when the unfolded protein response (UPR) and ER-associated degradation (ERAD) pathways come into play [65]. Overexpression of chaperones or helper proteins such as BiP/Kar2p [64, 66, 67], Pdi1 [44], and Ero1p, and of the transcriptional regulator Hac1, was found to be an alternative strategy to overcome secretory bottlenecks. Overexpression of the transcription factor Hac1 is a crucial technique for raising the yield of many recombinant proteins. Huang et al. investigated the effect of Hac1p on a raw-starch-hydrolyzing amylase (Gs4j-amyA) to improve heterologous synthesis of the enzyme in P. pastoris. They also examined the promoter and copy number variations used for HAC1 overexpression. In this case, a strain carrying 12 copies of the Gs4j-amyA gene driven by PAOX1, with a basal expression of 305 U/mL, was used. Amylase activity rose to 2200 U/mL upon the inclusion of six copies of PAOX1-driven HAC1 [68, 69]. Yu et al. reported the distinct impacts on PGAP-driven κ-carrageenase production of co-overexpressing nine proteins under PAOX1 regulation, including seven chaperones (Pdi: protein disulfide isomerase; Ire1: endoplasmic reticulum stress transducer; Ero1: endoplasmic reticulum oxidoreductase; Kar2: immunoglobulin-binding protein; Aha1: activator of Hsp90 ATPase; Ypt6: GTPase; Prx1: thioredoxin-linked peroxidase) and two transcription factors, Yap1 and Rpn4 (proteasome subunit transcription factor) [70, 71].

Overexpression of transcription factors such as Yap1 and Hac1 is another method that enhances protein folding and secretion. Although their expression in combination with other folding facilitator proteins such as Pdi1 and Kar2 was expected to increase the production of the target recombinant protein, research by Sun et al. and Duan et al. revealed that the productivity of the recombinant protein was either maintained or decreased [72, 73]. As a result, many researchers have implemented multiple modifications at the same time, combining the co-expression of chaperones and/or foldases with other genetic manipulation tools such as optimization of codon usage, gene copy number, co-expression/modulation of transcription factors, and variation in culture conditions to improve heterologous protein productivity [47, 66, 67, 74, 75]. These results show that no single combinatorial approach offers equivalent benefits for the secretion of all recombinant proteins, and that all the above-mentioned parameters need to be tested individually for the specific protein of interest.

2A Peptide System

Co-expression of several genes in eukaryotic cells can be accomplished by introducing separate plasmids into the host cell, by using a plasmid with multiple promoters, or by placing proteolytic cleavage sites, internal ribosome entry sites (IRES), or self-processing 2A sequences between genes. 2A sequences are short peptides of 18–20 amino acids, derived primarily from viral polyproteins, that cause a ribosome-skipping event allowing several distinct proteins to be synthesized from a single open reading frame. When used in metabolic engineering and synthetic gene circuits, 2A peptides enable co-regulated and consistent expression of several genes in eukaryotic cells. The core sequence motif of a 2A peptide is DXEXNPG↓P (where X refers to any amino acid and ↓ marks the site of cleavage between the C-terminal glycine of 2A and the N-terminal proline of the downstream protein) [76]. The 2A peptides most extensively used for the co-expression of multiple proteins are P2A (porcine teschovirus-1), F2A (foot and mouth disease virus 18), E2A (equine rhinitis A virus), and T2A (Thosea asigna virus). Of these four 2A peptides, P2A is the most widely used [77]. Recently, the F2A system was applied to construct a polycistronic system for producing fusaruside by co-expressing biosynthetic genes in engineered P. pastoris, owing to its capacity for simultaneous and efficient expression of multiple genes. The yield obtained using this system after 120 h of induction was 5× higher than the yield obtained after 10 days of induction [78].
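The core motif described above lends itself to a simple sequence check. The following is a minimal sketch, assuming protein sequences in one-letter code, that locates the DXEXNPGP core and splits a polyprotein at the glycine/proline junction. The function name and the flanking toy sequences are hypothetical, and the P2A linker shown is the commonly cited sequence, which should be verified against the actual construct before use.

```python
import re
from typing import List

# Core 2A motif: D-X-E-X-N-P-G, with the downstream P checked by a lookahead
# so that the proline stays at the N terminus of the next product.
CORE_2A = re.compile(r"D.E.NPG(?=P)")


def split_at_2a(polyprotein: str) -> List[str]:
    """Split a one-letter amino acid sequence after the G of each DXEXNPG|P site."""
    products = []
    start = 0
    for match in CORE_2A.finditer(polyprotein):
        products.append(polyprotein[start:match.end()])  # upstream product keeps the 2A tail
        start = match.end()                              # downstream product starts with P
    products.append(polyprotein[start:])
    return products


# Commonly cited P2A linker (assumption: verify against your own construct).
P2A = "ATNFSLLKQAGDVEENPGP"

# Hypothetical toy proteins, for illustration only.
polyprotein = "MKTAY" + P2A + "MSRLK"
for product in split_at_2a(polyprotein):
    print(product)
```

In this sketch the upstream product retains most of the 2A peptide as a C-terminal extension and the downstream product begins with proline, which mirrors the cleavage site given in the text. The pattern matches the core motif only and says nothing about how efficiently a given 2A variant separates its flanking proteins in P. pastoris.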

Sherry et al. investigated the potential for protease-independent production of enterovirus virus-like particles (VLPs) in P. pastoris by using a P2A peptide sequence, instead of 3CD (the viral protease precursor protein), to separate the individual capsid proteins, and compared this with protease-dependent VLP production. They demonstrated that one of the P2A-tagged VLPs maintained its native antigenicity, was thermostable, and was consistent with VLPs produced in other expression systems [79]. Another study, on producing thermophilic cellulases from a polycistronic construct, established successful transcription of the genes, and the recombinant proteins were detected by enzymatic assay and fluorescence microscopy [80]. To the best of our knowledge, the construct with the most genes expressed in a coordinated manner is that of Geier et al., in which nine genes were successfully expressed from a single polycistronic transcript via T2A peptides in P. pastoris [81]. Self-cleaving 2A peptides enable the formation of polycistronic sequences for gene co-expression in eukaryotes, making them a valuable genetic tool for reducing the number of transcription units in pathway design. With a large decrease in the number of promoters and terminators participating in the synthetic network, it may be possible to incorporate an entire pathway or a full synthetic gene circuit into a single transcription unit [77]. Although there have not been many reports of 2A-based multicistronic vectors so far, owing to the incomplete understanding and characterization of how 2A peptides function in various expression systems and especially in P. pastoris, interest in these sequences has resurged in recent years. This strategy might be employed to engineer P. pastoris for the synchronous production of recombinant proteins.

CRISPR/Cas Techniques

The effectiveness and precision of genome editing have improved recently with the development of CRISPR/Cas technology (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9). Many advances have been made by exploiting CRISPR/Cas in the bioengineering of non-conventional yeasts such as P. pastoris. To achieve effective and accurate genome editing in P. pastoris, Weninger et al. systematically optimized the CRISPR/Cas9 expression system. The variables tested included, but were not limited to, Cas9 coding sequences, gRNA sequences, gRNA structures (such as those with ribozyme sequences), and promoters for the expression of Cas9 and gRNAs. Only 6 of the 95 combinations were found to be functional for genome editing, demonstrating the need for additional optimization [82]. In addition to gene disruption, multiplex integration of heterologous genes is another crucial synthetic biology approach for developing P. pastoris as a cell factory for heterologous proteins. Later research improved the targeted integration of donor DNA by introducing two simultaneous double-strand breaks, followed by replacement of the intervening region with the donor DNA. This is especially effective in the P. pastoris ku70 knockout strain, in which lowering non-homologous end joining (NHEJ) enhances repair by homologous recombination, whereas repair of broken DNA strands by NHEJ predominates in wild-type strains [83, 84]. This work has recently been aided by the improved P. pastoris genome sequence, which includes over 500 corrected locations, corrected annotations, and a comparative analysis of the genomes and transcriptomes of the most commonly used P. pastoris strains [85,86,87]. For double- and triple-locus co-integration, three high-efficiency gRNA targets (PAOX1UP, PTEF1UP, and PFLD1UP) were chosen. These were integrated simultaneously in a KU70-deficient strain of P. pastoris, with integration efficiencies of 57.7 to 70% for double-locus and 12.5 to 32.1% for triple-locus integration [88].

To make precise genomic changes in P. pastoris easier to perform, Gassler et al. have created a CRISPR/Cas9-based kit for gene insertions, deletions, and substitutions. The CRISPi kit from Addgene is a ready-to-use plasmid kit based on Golden PiCS modular cloning for CRISPR/Cas9-mediated genome editing in P. pastoris [89]. Another recent study reported multiplexed genome integration of three glycosylation-related genes (gnt1, mns1, and mnn2) using orthogonal tRNA-sgRNA cassettes expressed via the tRNA promoter. With this method, rapid, multiplexed engineering of a complicated phenotype was feasible, resulting in humanized product glycosylation in two successive engineering phases [90]. These methods can now be applied quickly to a wide range of tasks, such as the introduction of mammalian chaperones to improve the folding of complex molecules, the creation of new protease-deficient strains to increase the yields of full-length products, the reduction and redirection of vacuolar and endoplasmic reticulum-associated protein degradation pathways, and the enhancement of lipid synthesis and vesicular machinery [91,92,93,94]. This approach enables quick, marker-free genome engineering in P. pastoris, allowing for novel strain and metabolic engineering applications.

Conclusion

Pichia pastoris has established itself as a successful platform for producing biotechnologically and commercially appealing products, owing to its ability to produce a wide range of functionally active heterologous proteins, from microbial proteins to complex eukaryotic proteins with numerous applications. Because each protein requires the optimization of different parameters, and the process often relies on trial-and-error experiments, we have elaborated on the different strategies employed in the production of recombinant proteins. Although AOX1 remains the most widely used promoter, ongoing studies on methanol-independent promoters might pave the way to circumventing the disadvantages of using methanol. We have also discussed at length the role of molecular and chemical chaperones, to give a better understanding of the production and secretion of proteins and of how these chaperones help reduce protein degradation. Since most reviews deal with metabolic engineering, we focused on genetic engineering approaches such as CRISPR/Cas technology and the 2A peptide system, which are recent trends on which little has been reported. We have extensively reviewed research articles on these techniques from the last 3–4 years and carefully analyzed how they can be used as strategies to improve protein production in the versatile cell factory P. pastoris. Given the growing demand for cost-effective production of recombinant therapeutic proteins, we believe that the timing and the technology are now more favorable than ever for a significant impact on biomanufacturing in the coming decade.