1 Introduction

Aspartyl protease (PEP4) is classified as a pepsin-like aspartic proteinase with proteolytic activity at acidic pH [1, 2]. It is produced by various organisms, including plants and animals, and is used mainly as a coagulating agent [3, 4]. It is also involved in the loss and gain of protein function through both limited and digestive proteolysis [5]. In yeast cells, PEP4 resides in the vacuole [6]. Aspartic proteinases are identified by the presence of two adjacent, coplanar aspartic acid side chains at the active site. They are generally divided into two sub-families: the retropepsins (retroviral proteinases), which are dimers of two identical subunits, and the pepsin-like proteinases, which have two non-identical but similar lobes [7].

Interest in the structure and function of this group of proteolytic enzymes stems partly from efforts to elucidate their mechanism of degradation and partly from their importance in several pathological processes [8,9,10,11]. One example is the membrane aspartyl protease that cleaves the brain β-amyloid precursor protein (APP) to produce β-amyloid. Because excess β-amyloid is the major leading factor in Alzheimer’s disease, this protease is a therapeutic target for inhibitor drugs [9, 12]. In the vacuolar proteolytic system of yeast cells, PEP4 is regarded as an essential enzyme because it is involved in the activities of other hydrolases, including proteinase B (PrB), carboxypeptidase (CPY) and aminopeptidase [13, 14].

M. guilliermondii strain SO is a novel yeast expression host isolated from spoilt orange and used for the production of heterologous proteins under the regulation of the PAOX1 and PFLD1 promoter systems [15]. Its aspartyl protease (MgPEP4) has been identified as a pepsin-like proteinase in its proteome, which was deposited in GenBank (BioProject: PRJNA547962) [16].

Protein folding is the process by which a protein molecule folds into its unique and functional three-dimensional structure. In modern biotechnology, computational modeling techniques have been developed to bridge the gap between sequence and structure databases [17, 18]. One development in protein structure prediction is the threading approach, which identifies the overall fold of a protein from its amino acid sequence by aligning the sequence with 3D structures in the PDB library [19,20,21,22]. It is an approach for assessing how well a protein sequence fits a given structural fold [23]. Phyre2 is one of several web-based protein threading tools used to predict and analyze 3D protein structure [24].

In addition, conserved motifs in a protein structure can provide functional clues or confirm tentative functional assignments inferred from the sequence. Disulfide bridges formed between cysteine residues within the molecule usually serve as additional covalent linkages in protein sequences and contribute significantly to protein stability [25]. A single disulfide bridge can stabilize a protein by 2–5 kcal/mol [25,26,27]. In 2006, Nakka et al. [28] reported the crucial role of a single disulfide bridge in strengthening the dimer interface, thereby contributing to the thermal stability of a thermostable glucose-6-phosphate dehydrogenase (tG6PDH) from the hyperthermophilic bacterium Aquifex aeolicus.

The structure of native MgPEP4 from M. guilliermondii strain SO has not been studied or reported in any detail, which limits understanding of the role of this enzyme in strain SO and of the potential industrial application of the strain as an expression host for heterologous protein production. In this study, we therefore aimed to describe the protein modeling of the aspartyl protease of M. guilliermondii strain SO (MgPEP4) using Phyre2, together with the single intramolecular disulfide bridge observed in the predicted structure and other conserved essential features.

2 Materials and Methods

2.1 Sequential Analysis of Aspartyl Protease (MgPEP4)

The nucleotide sequence of M. guilliermondii strain SO aspartyl protease (MgPEP4) (accession number: VMS101000005.1) was obtained from GenBank (BioProject: PRJNA547962) and used to deduce the primary amino acid sequence with the ExPASy Translate server (https://web.expasy.org/translate/). Subsequently, multiple sequence alignment (MSA) of the aspartyl protease amino acid sequence with PEP4 from other yeasts was carried out using Clustal Omega (www.ebi.ac.uk/Tools/msa/clustalo/). The Phyre2 server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id; Protein Homology/analogY Recognition Engine V 2.0) was used to predict the secondary structure of MgPEP4 [24]. The molecular weight and physicochemical properties were computed using ProtScale (http://www.expasy.org/tools/protscale.html) and the ExPASy ProtParam tool (http://web.expasy.org/protparam/), respectively. The N-terminal signal peptide was predicted using SignalP v4.1 [29].
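The translation step can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and reading frame 1; the function name `translate_orf` and the toy sequence are hypothetical and are not the MgPEP4 data. The study itself used the ExPASy Translate server.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate frame 1 of a DNA sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy open reading frame (hypothetical, not the MgPEP4 gene)
print(translate_orf("ATGGATACTGGTTAA"))  # MDTG
```

With this convention, an open reading frame of 3 × (n + 1) bases, including the stop codon, encodes n residues.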

2.2 Structural Prediction of MgPEP4 via Protein Threading

The mature amino acid sequence of MgPEP4 was uploaded to the Phyre2 server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id; Protein Homology/analogY Recognition Engine V 2.0) to predict the three-dimensional (3D) protein structure. The prediction was based on evolutionary variation patterns [24], and templates were searched against PDB entries. The secondary structure was then predicted and mapped onto the alignment with the template model. All structures were visualized using the PyMOL Molecular Graphics System, version 1.7.4.5, Schrodinger, LLC [30] and UCSF Chimera [31]. Further template analysis was conducted by submitting the mature amino acid sequence of MgPEP4 to protein BLAST with the position-specific iterative (PSI-BLAST) algorithm (https://blast.ncbi.nlm.nih.gov/Blast.cgi), which searched the Protein Data Bank for templates with evolutionarily related structures and from which functional annotations were also inferred [32].

2.3 Validation of the Predicted MgPEP4 Structure

The structure with the best confidence value and percentage identity was subjected to validation. It was assessed with ERRAT (servicesn.mbi.ucla.edu/ERRAT/) [33], Verify3D (servicesn.mbi.ucla.edu/Verify3D/) [34, 35] and PROCHECK (servicesn.mbi.ucla.edu/PROCHECK/) [36, 37], and plotted in a Ramachandran diagram [38]. The results were assessed at several levels, ranging from the overall fold of the protein to highly specific clusters of functional residues.

2.4 Superimposition of Predicted MgPEP4 Structure with Template 1DPJ

The predicted aspartyl protease structure was superimposed on the template (PDB ID: 1DPJ) using the PyMOL Molecular Graphics System, version 1.7.4.5, Schrodinger, LLC [30], to compare the active sites and substrate-binding sites of the two structures. The root-mean-square deviation (RMSD) was then recorded.

2.5 Molecular Docking

The predicted MgPEP4 structure was docked with the ligand that was co-crystallized with a protease molecule (PDB ID: 2RMP) [39]. The template (PDB ID: 1DPJ) was used as a positive control for the analysis. The grid box was positioned over the substrate-binding cleft. Pre-processing steps, which included preparation of the protein and ligand, addition of hydrogen atoms and energy minimization, were performed before the docking experiments [40]. The torsional bonds of the ligand were free to rotate, while the protein (macromolecule) was kept rigid. The protein–ligand interaction was viewed using PyMOL [30] and LIGPLOT v 2.2 [41].

3 Results and Discussion

3.1 Sequence and Structural Analysis of Aspartyl Protease of strain SO

The PEP4 gene encoding aspartyl protease is 1,227 bp long and encodes a protein of 408 amino acid residues with a predicted size of 44.2 kDa. The aspartyl protease of M. guilliermondii strain SO (MgPEP4) showed a high amino acid sequence identity (more than 55%) with all the sequences analyzed (Fig. 1). The multiple sequence alignment (MSA) suggested that MgPEP4 shares primary features with other reported yeast PEP4 enzymes, and the functional residues and elements of the enzyme family were highly conserved. The active site of a protein contains the catalytic residues responsible for its biocatalytic activity, and these residues are usually conserved [42]. The two aspartic acid residues that serve as essential catalytic residues in the active site of the proteinase_A_fungi superfamily [43] were well conserved in MgPEP4, one in a DTG motif in each lobe (Asp112 and Asp297) (Fig. 1).
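Locating the catalytic motif in a sequence is a simple pattern search. The sketch below reports the 1-based position of the aspartate in every DTG occurrence; the helper `find_motif` and the toy sequence are hypothetical and do not reproduce the MgPEP4 sequence or its numbering.

```python
import re


def find_motif(sequence: str, motif: str = "DTG") -> list[int]:
    """Return the 1-based position of the first residue of each motif occurrence."""
    # A lookahead is used so that overlapping occurrences would also be reported.
    pattern = f"(?={re.escape(motif)})"
    return [match.start() + 1 for match in re.finditer(pattern, sequence.upper())]


# Toy sequence (hypothetical) with one DTG motif in each half
toy_sequence = "MKLADTGSSNLWVPSAADTGTSL"
print(find_motif(toy_sequence))  # [5, 18]
```

In a bilobed pepsin-like proteinase, two hits are expected, one per lobe, as reported above for Asp112 and Asp297.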

Fig. 1

Multiple sequence alignment of the deduced amino acid sequence of M. guilliermondii strain SO aspartyl protease (MgPEP4) with those of Komagataella phaffii (XP_002493333.1), Saccharomyces cerevisiae (NP_015171.1), Candida albicans (XP_713148.1), Aspergillus niger (XP_001399855.1), Candida maltosa (EMG47219.1), Ogataea parapolymorpha (XP_013935327.1) and Verruconis gallopava (XP_016212833.1). The active flap sequences are indicated by a red dotted line box. The conserved superfamily domain is indicated by the black dotted lines, and the red box indicates the catalytic aspartic acid residues

The secondary structure of MgPEP4 was predicted with the Phyre2 server, utilizing the SOPMA (Self-Optimized Prediction Method with Alignment) technique, which improves the accuracy of secondary structure prediction from the primary sequence [24]. The predicted structure showed that MgPEP4 was composed of 17% α-helix and 42% β-sheet (Fig. 2), which was similar to the secondary structures of PEP4 from other yeasts (Table S1). A signal peptide was predicted at the N-terminus of the MgPEP4 sequence using SignalP v4.1, with a cleavage site between positions 21 and 22 (ADA-AV; probability 0.7896), which indicated that the protein can be cleaved and secreted extracellularly (Fig. 2) [44].
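Percentages such as those above can be derived from a per-residue secondary structure assignment. The following sketch assumes a three-state string in which H denotes α-helix, E denotes β-strand and C denotes coil; this encoding, the function name and the toy string are illustrative assumptions and are not the Phyre2 output format.

```python
from collections import Counter


def ss_composition(ss_string: str) -> dict[str, float]:
    """Return the percentage of helix (H), strand (E) and coil (C) states."""
    if not ss_string:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss_string.upper())
    total = len(ss_string)
    return {state: 100.0 * counts.get(state, 0) / total for state in "HEC"}


# Toy 20-residue assignment (hypothetical)
print(ss_composition("CCHHHHEEEECCEEEECCCC"))
# {'H': 20.0, 'E': 40.0, 'C': 40.0}
```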

Fig. 2

Secondary structure prediction of MgPEP4 using Phyre2, showing β-sheet and α-helix. The signal peptide predicted using SignalP v4.1 is indicated by the yellow arrow between position 21 and 22: ADA-AV

3.2 Structure Prediction and Validation of MgPEP4

Protein threading has been confirmed to be efficient in analyzing the relative importance of amino acid residues to protein function and structure [45]. The 3D structure of MgPEP4 was predicted using the intensive mode of Phyre2 [24] and was modeled with a confidence score of > 90%. Three templates (PDB ID: 3PSG, 1QDM and 1DPJ) were selected based on heuristics that maximize confidence score, alignment coverage and percentage identity. Templates 3PSG and 1QDM shared lower identity with the predicted model of MgPEP4 (37% and 46%, respectively), whereas 1DPJ, the Saccharomyces cerevisiae PEP4 [46], had a higher identity of 75% and was found to be sufficient for further analysis, as validated by ERRAT, Verify3D and PROCHECK (Table S2).
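Percentage identity between a query and a template can be computed from a pairwise alignment. The sketch below uses one common definition, identical residues divided by aligned (gap-free) columns; other definitions divide by the alignment length or the shorter sequence, and the one used by Phyre2 is not stated here. The function name and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical): 6 identities over 8 gap-free columns
print(percent_identity("DTGSS-LWV", "DTGTSNLWI"))  # 75.0
```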

Furthermore, the Ramachandran plot showed that 91.1% of the residues of the predicted structure had main-chain angles in the most favored region (Figure S1). These results indicated that the predicted model was suitable for further analysis, because the Ramachandran plot can detect gross structural errors and the backbone dihedral angles (φ and ψ torsion angles) are the most crucial indicator of protein structure quality [47, 48]. In the disallowed region, residue occupancy was 0.0% (Figure S1). The validation results therefore indicate a high-quality predicted model, supported by the percentage of amino acid residues in the core region, which is a good guide to stereochemical quality [49].
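Each point in a Ramachandran plot is a pair of backbone dihedral angles. For residue i, φ is defined by the atoms C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral angle from four Cartesian coordinates; the function name and test points are illustrative, and PROCHECK's own implementation is not reproduced.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Planar cis arrangement of four points gives a dihedral of 0 degrees
print(dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
```

Applying the function to consecutive backbone atoms of a model yields the (φ, ψ) pairs that validation tools classify into favored, allowed and disallowed regions.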

Moreover, MgPEP4 (Fig. 3) appeared as a kidney-shaped, bilobed globular protein consisting mainly of β-strands and divided into N-terminal and C-terminal domains, as previously reported in other studies [50, 51]. A large substrate-binding pocket lies between the two lobes, and each lobe carries a catalytic aspartic acid residue in a conserved DTG motif (Asp112 and Asp297) (Fig. 3). Enzyme catalysis depends largely on the substrate-binding pocket of the enzyme and on a specific substrate, which together form an intermediate complex; decomposition of this activated complex then yields the product.

Fig. 3

The predicted 3D structure of MgPEP4. The unlinked cysteine pair Cys331 and Cys364 (red) in the C-terminal domain and the single S–S bridge between Cys125 and Cys130 (cyan) in the N-terminal domain are illustrated. The two catalytic aspartic residues (Asp112 and Asp297; yellow sticks) are shown in the active site. Residue Tyr155 (dark blue) can be seen near the active site. The active flap is shown in green and the polyproline region in purple, with residues Gly156 and Glu375 indicated in orange and light blue, respectively. The structural figure was generated using the PyMOL Molecular Graphics System, version 1.7.4.5, Schrodinger, LLC

In most aspartyl proteases, two structural elements have been implicated in catalysis: a unique motif termed the active flap, which consists of a β-hairpin loop [52, 53], and the polyproline loop, which extends over the active site [54]. Both facilitate substrate binding in the binding pocket, and both are conserved in the predicted MgPEP4 structure (Fig. 3). The active flap is known to be the most flexible element of the structure in aspartyl proteinases, and its position has been reported to differ by up to 8.7 Å among different crystal structures of an enzyme [55]. The active flap region (green) and the polyproline loop (purple) appeared to cover the pocket of the substrate-binding site. Residue Gly156 was located at the tip of the active flap, and Glu376 was located as the hinge residue on the polyproline loop (Fig. 3).

Free cysteine residues do occur in proteins, but most are covalently bonded to other cysteine residues to form disulfide bonds, which play a crucial role in the folding and stability of some proteins, especially extracellularly secreted proteins [56]. Four cysteine residues (Cys125, Cys130, Cys331 and Cys364) were conserved in MgPEP4, but only two were involved in the formation of an intramolecular disulfide bridge, which built up the N-terminal loop (Fig. 3). This contrasts with the two or three disulfide bonds present in PEP4 from other yeasts [57,58,59] and in template 1DPJ [55]. The absence of a linkage between Cys331 and Cys364, which are 5.7 Å apart, may contribute to lower stability of the C-terminal region of the predicted structure and was investigated further through superimposition.
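Whether two cysteines are bridged can be screened from the distance between their sulfur (SG) atoms, since an S–S bond is typically about 2.05 Å long. The sketch below applies a cutoff of 2.5 Å, which is an assumed tolerance rather than a value from this study, to hypothetical coordinates.

```python
import math

S_S_CUTOFF = 2.5  # Å; assumed tolerance above the typical S-S bond length of about 2.05 Å


def sg_distance(sg_a, sg_b) -> float:
    """Euclidean distance in Å between two cysteine SG atoms."""
    return math.dist(sg_a, sg_b)


def is_disulfide(sg_a, sg_b, cutoff: float = S_S_CUTOFF) -> bool:
    """Flag a cysteine pair as bridged when the SG-SG distance is within the cutoff."""
    return sg_distance(sg_a, sg_b) <= cutoff


# Hypothetical SG coordinates (Å), not taken from the MgPEP4 model
bonded_pair = ((0.0, 0.0, 0.0), (2.04, 0.0, 0.0))
unlinked_pair = ((0.0, 0.0, 0.0), (5.7, 0.0, 0.0))
print(is_disulfide(*bonded_pair), is_disulfide(*unlinked_pair))  # True False
```

Under this criterion, a pair separated by 5.7 Å, as reported for Cys331 and Cys364, would be classified as unlinked.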

Additionally, a fundamental element underlying the activity of aspartyl proteinases is the active flap residue Tyr75, which is reported to be conserved in the pepsin family [60]. The equivalent residue in the MgPEP4 sequence, Tyr155, was also conserved (Figs. 1 and 3), as were the key residues associated with formation of the binding pocket in the active site.

Some studies have postulated that the orientation of the conserved tyrosine residue of the active flap in the pepsin family might have a negative effect on the catalytic capability of aspartyl proteases (i.e., self-inhibition) [60]. The “self-inhibition” described for proteinase A may represent only a transitional structural feature rather than true inhibition. It could differ from the mechanism that aspartic proteinases have evolved for modulating or stabilizing their own intrinsic activity in yeast cells. Interestingly, studies have reported that the interaction between conserved water molecules in the crystal structures of aspartyl proteases and the tyrosine residue of the active flap has functional significance. The authors suggested that a stronger hydrogen bond network forms upon ligand binding, which implicates this interaction in the catalytic mechanism of aspartic proteases [61, 62]. Hence, the position of the Tyr155 side chain in MgPEP4 could have operational significance as a mechanism for stabilizing the intrinsic activity, potentially connected to substrate capture and cleavage [54, 63]. However, this in silico prediction and hypothesis can only be verified through in vitro protein characterization.

3.3 Superimposition of MgPEP4 and Template 1DPJ

The superimposition of MgPEP4 on the template (PDB ID: 1DPJ) gave an RMSD value of 2.499 Å (Fig. 4). The structures were superimposed using the backbone carbon atoms. The folds of the protease backbones were essentially identical, with the exception of the N- and C-termini and some loops, which were regarded as surface-exposed regions. Relatively large conformational differences in these regions are presumably the result of mutations (deletions and/or insertions of residues) and usually present a major issue for direct RMSD calculation, which regularly gives an unrealistic measure of similarity [64, 65]. To further support the computed RMSD value, the secondary structure folds of MgPEP4 and template 1DPJ were predicted and mapped using Phyre2, which helped to establish positional equivalence and deviations; the mapped elements comprised β-strands (twenty-four), α-helices (four) and coils (Figure S2).

Fig. 4

Superimposition of protein structures. A Structural alignment of MgPEP4 and template 1DPJ in gray and red, respectively. B Close-up comparison of the conserved catalytic aspartic acid residues, where the strain SO catalytic residues are shown in yellow and those of template 1DPJ in magenta. Conserved cysteine residues are also aligned for both structures: strain SO residues are shown in blue, with only one disulfide bond formed, while template cysteine residues are shown in green, with two disulfide bonds formed. The structural figure was generated using the PyMOL Molecular Graphics System, version 1.7.4.5, Schrodinger, LLC

The spatial proximity of the two catalytic Asp residues was equivalent to that reported for nonliganded mature aspartic proteinases [8, 46]. The region around the active site pocket was highly conserved, but variation was observed in the polyproline and terminal regions. Furthermore, differences in some amino acid residues within the superfamily domain could lead to the differences observed in the secondary structure prediction (Figure S2).
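For reference, the RMSD reported for a superimposition is the square root of the mean squared distance between equivalent atoms. The sketch below assumes that the two coordinate lists are already superimposed and matched one-to-one; it does not perform the fitting or atom selection that PyMOL carries out, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation in Å between two paired, pre-superimposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical backbone coordinates (Å) with pairwise displacements of 1, 2, 2 and 0 Å
model = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
template = [(1, 0, 0), (1, 2, 0), (0, 1, 2), (0, 0, 1)]
print(rmsd(model, template))  # 1.5
```

Because every atom pair contributes its squared distance, a few strongly displaced loop or terminal atoms can dominate the value, which is consistent with the caution raised above about direct RMSD calculation.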

3.4 Molecular Docking

AutoDock Vina, which is compatible with MGLTools, was used to compute the binding energy and interactions between the predicted protein structure and the ligand by molecular docking [40].

To examine the binding affinity of the ligand toward the predicted protein receptor, molecular docking [66] was performed using the classical aspartic proteinase inhibitor pepstatin A (PepA), a microbial hexapeptide produced by Streptomyces sp. [67], taken from its complex with Rhizomucor miehei aspartic proteinase (PDB ID: 2RMP). PepA is known to inhibit aspartic proteases such as chymosin, renin, pepsin, HIV protease and cathepsins D and E [68] through a competitive inhibition mechanism [69, 70]. Its chemical structure contains two residues of an unusual amino acid, 4-amino-3-hydroxy-6-methylheptanoic acid (statine), and has the sequence isovaleryl-L-valyl-L-valyl-statyl-L-alanyl-statine (Iva-Val-Val-Sta-Ala-Sta) [69]. The protein–ligand interaction was viewed using the PyMOL Molecular Graphics System, version 1.7.4.5, Schrodinger, LLC, which displayed the intermolecular interactions and their strengths, hydrogen bonds, hydrophobic interactions and atom accessibilities [30].

In the docked complex (Fig. 5), the ligand (PepA) interacted with residues in the substrate-binding pocket through both hydrogen bonds and hydrophobic interactions, with a binding affinity of −8.0 kcal/mol. Furthermore, the docking analysis showed that PepA fitted into the large substrate-binding cleft between the two domains of MgPEP4.
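A docking score in kcal/mol can be put on a more familiar scale with the standard thermodynamic relation ΔG = RT ln Kd. The sketch below applies it at an assumed temperature of 298.15 K; docking scores are approximate, so the result is only an order-of-magnitude estimate and not a measured constant. The function name is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol·K)


def kd_from_binding_energy(delta_g: float, temperature: float = 298.15) -> float:
    """Estimate a dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    # Rearranged from delta_g = R * T * ln(Kd)
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


kd = kd_from_binding_energy(-8.0)
print(f"Estimated Kd: {kd:.2e} M")
```

For a score of −8.0 kcal/mol, this relation gives an estimated Kd on the order of 1 µM.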

Fig. 5

Molecular docking analysis of the predicted structure of MgPEP4 with pepstatin A. The ligand (pink) was observed within the substrate-binding site of MgPEP4. The catalytic aspartic acid residues (Asp112 and Asp297) were involved in hydrophilic interactions with PepA via hydrogen bonds (yellow dotted lines), each at a distance of 3.6 Å, alongside other hydrophilic residues in the binding cleft (Asn91 and Ser57, at distances of 3.5 Å and 2.5 Å, respectively). The ligand was also surrounded by hydrophobic residues (cyan), mainly phenylalanine and glycine. All structural figures were generated using the PyMOL Molecular Graphics System, version 1.7.4.5, Schrodinger, LLC

The hydroxyl group (OH) of the statine (STA) residue of PepA formed hydrogen bonds with the catalytic aspartic residues (Asp112 and Asp297) of the N- and C-terminal domains, each at a distance of 3.6 Å. This observation is consistent with the study reported by Yang and Quail [41], in which the hydroxyl group of the statine residue formed hydrogen bonds with the two catalytic residues Asp38 and Asp237 of Rhizomucor miehei aspartic proteinase (RMP) in complex with PepA. Moreover, this conformation mimicked the anticipated transition state of the enzyme–substrate interaction, which suggests that MgPEP4 could act in the degradation of heterologous proteins when M. guilliermondii strain SO is used as an expression host. In addition, binding of the ligand to the enzyme caused no large distortion of the active site.

Besides the hydrophilic interactions, the hydrophobic contacting residues of the substrate-binding cleft were Gly189, Phe194, Met160, Lys196, Ala97, Leu380, Asn91, Gly259, Leu111, Thr113, Val190, Phe192, Phe373, Ala295, Gly114, Gly156, Gln154, Tyr155, Ile208, Ile382 and Thr300. These residues were similar to those of 1DPJ except for differences in position. Furthermore, in silico site-directed mutagenesis (SDM) of the catalytic residue in the N-terminal domain of MgPEP4 to leucine produced a conformational change in the substrate-binding pocket (Figure S3). This suggests that mutation of the strictly conserved, catalytically essential residues strongly affects the catalytic function of the protein in the degradation of recombinant proteins when M. guilliermondii strain SO is used as an expression host.

4 Conclusion

In conclusion, the aspartyl protease (MgPEP4) from M. guilliermondii strain SO is a typical proteinase A belonging to the pepsin-like/proteinase A superfamily, which is involved in the hydrolysis of peptide bonds. The MgPEP4 structure predicted via protein threading was highly similar to the template 1DPJ and possessed a large substrate-binding cleft between the two lobes. A single disulfide bridge was formed by two of the four cysteine residues in the predicted structure; such bridges confer stability on protein structures. Nevertheless, further analysis at the molecular and biochemical levels is needed to establish the stability of MgPEP4. The protein–ligand interaction supported the significance of the conserved catalytic residues, whose activity could be considered a bottleneck in the use of M. guilliermondii strain SO as an expression host. These findings also give insight into the role of MgPEP4 as a potential degradative agent in heterologous protein production, as well as its potential applications in cheese-making, baking, leather, food and beverage production, and as a therapeutic target for inhibitor drugs in the biopharmaceutical industry. It is also recommended that an aspartyl protease mutant of M. guilliermondii strain SO be developed to reduce this degradative function in future expression of recombinant proteins.