Introduction

In the cellulose hydrolysis industry, pretreatment of lignocellulose materials and enzymatic catalysis represent two costly stages. Because enzymatic catalysis is usually performed at high temperatures, commercialized cellulase enzymes are mostly derived from thermophilic or thermostable sources or are naturally thermotolerant. To reduce costs and increase efficiency, development of new enzymes with higher levels of thermal stability and enhanced catalytic activity is needed. The extracellular thermophilic cellulase from Trichoderma reesei is used in the pulp and paper industry (Bergquist et al. 2002), whereas thermophilic glycoside hydrolases from Aspergillus spp. are widely used for commercial purposes (Mehboob et al. 2014; Moretti et al. 2012). Additionally, thermostable Dictyoglomus cellulases are also commercialized (Peacock et al. 2013).

Currently, most commercial cellulose-degrading enzymes are engineered proteins exhibiting enhanced thermal stability and activity. In a study by Escovar-Kousen et al. (2004), Cel9A from Thermobifida fusca showed a 40% increase in activity following rational design of the F476Y mutation. Additionally, Vu and Kim (2012) described a 7.93-fold enhancement of Bacillus amyloliquefaciens endoglucanase activity after a single amino acid substitution (E289V). Moreover, endoglucanase H, a thermostable glycoside hydrolase (GH)5 family member encoded by celH from Clostridium thermocellum, is a component of the cellulosome complex and is used for biomass conversion and biofuel applications (Hirano et al. 2016). This enzyme was first sequenced by Yague et al. (1990), who showed that it hydrolyzes carboxymethylcellulose, p-nitrophenyl-β-D-cellobioside, methylumbelliferyl-β-D-cellobioside, barley β-glucan, and larchwood xylan. The enzyme contains 900 residues, including a 44-residue N-terminal signal-peptide sequence, cohesin-binding site, dockerin-binding site, carbohydrate-binding domain, ion-binding site, GH-conserved domains, and Pro/Thr-rich linker regions (Yague et al. 1990). The GH domains of endoglucanase H are categorized as belonging to family 5 (Cel5E) in the N-terminal region and family 26 (Lic26a) in the C-terminal region of the protein. Moreover, its carbohydrate-binding domain shows high levels of similarity to carbohydrate-binding module family 11, with residues in the N- and C-termini unnecessary for Cel5E catalytic activity (Yague et al. 1990). Therefore, for pretreated substrates, Cel5E can catalyze endohydrolysis of (1→4)-β-D-glucosidic linkages in cellulose and also shows xylanase activity, with the carbohydrate-binding module influencing its substrate specificity (Ichikawa et al. 2015). The crystal structure of the Cel5E active site, which contains 414 amino acids (Yuan et al. 2015), is highly similar to that of other GH5 family members, such as Clostridium cellulolyticum endoglucanases (PDB: 1EDG and 3AMC) (Wu et al. 2011). Although the Cel5E active site is responsible for endoglucanase activity, this activity was not observed in recombinant variants expressed in Escherichia coli (Yuan et al. 2015). The structure of Cel5E contains a TIM barrel (α/β)8 architecture common in other GH5 family members, with two Glu residues predicted as active-site residues (UniProt: P16218). Within the TIM barrel structure, there is a semi-hydrophobic pocket comprising the β2, β3, and β4 strands located near the active site and catalytic residues. Cel5E as a single catalytic subunit exhibits specific activity of ~300 U/mg on carboxymethyl cellulose (CMC) and 1200 U/mg on barley β-glucan (Yuan et al. 2015); however, when the enzyme is linked to a carbohydrate-binding domain, the activity of the complex can increase by up to tenfold (Ichikawa et al. 2015). Therefore, the catalytic subunit and the carbohydrate-binding domain represent two targets for activity improvement.

In this study, we targeted the Cel5E active site and performed rational site-directed mutagenesis within the semi-hydrophobic pocket in order to increase the hydrophobicity level of the pocket to (i) increase protein internal forces to enhance enzyme stability against thermal denaturation and (ii) influence active-site conformation to enhance specific activity.

Materials and methods

Software and servers

Prediction of ligand-binding sites was performed on the DEPTH server (http://mspc.bii.a-star.edu.sg/tankp/run_depth.html) (Tan et al. 2013). Prediction of protein stability following point mutation was performed using the PoPMuSiC (http://dezyme.com/) (Dehouck et al. 2011), I-Mutant 2.0 (http://folding.biofold.org/i-mutant/i-mutant2.0.html) (Capriotti et al. 2005), iStable (http://predictor.nchu.edu.tw/iStable/indexSeq.php) (Chen et al. 2013), and MUpro (http://mupro.proteomics.ics.uci.edu) (Cheng et al. 2006) web servers. Molecular-docking studies were performed using AutoDock and Molegro Virtual Docker (MVD) (Thomsen and Christensen 2006). Energy minimization and molecular dynamics simulations were performed using GROMACS 4.6.5 software and the GROMOS96 53a6 force field (Pronk et al. 2013). Protein root mean square deviation (RMSD) was calculated using UCSF Chimera software (Pettersen et al. 2004). Mutations were simulated on the protein structure using Modeller version 9.12 software (Fiser and Sali 2003). Enzyme kinetic parameters were calculated using SigmaPlot version 12.5 (Systat Software, Inc., San Jose, CA, USA; www.systatsoftware.com).

Mutation selection

The GH from C. thermocellum is a thermostable enzyme that harbors several stabilizing regions. To predict areas responsible for thermostability, we used the SCide algorithm (Dosztanyi et al. 2003). To enhance thermal stability, we performed systematic in silico mutagenesis of the protein structure, in which every amino acid position was mutated to each of the other amino acids, followed by assessment of thermal stability for each mutation. This process was performed on the PoPMuSiC web server, which is a trained neural network (Dehouck et al. 2011). The top-ranked mutations predicted to enhance thermal stability by > 0.9-fold were selected for further complementary in silico validation using neural-network-based approaches. These mutations were evaluated by I-Mutant 2.0, MUpro, and iStable, and the mutations shared among the predictions (overlapping data) were considered theoretically thermal-stabilizing mutations. To assess the effect of these mutations on protein structure, we generated three-dimensional (3D) models of the mutants using Modeller 9.12 software and a template structure (PDB: 4U3A). Mutated structures were optimized by the variable target-function method and the conjugate-gradient technique. Optimized mutant structures were evaluated for predictive accuracy using the QMEAN (Benkert et al. 2008) and ProSA (Wiederstein and Sippl 2007) algorithms. Verified structures were used for protein-ligand-interaction studies.
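The consensus step described above (keeping only mutations that every predictor calls stabilizing) can be sketched as a set intersection. This is a minimal illustration rather than the procedure actually scripted for this study: the function name and the example predictor outputs are hypothetical, and K10A and D50L are invented placeholders, not predictions reported here.

```python
from typing import Dict, Set


def consensus_mutations(predictions: Dict[str, Set[str]]) -> Set[str]:
    """Return the mutations that every predictor lists as stabilizing."""
    sets = list(predictions.values())
    if not sets:
        return set()
    # A mutation survives only if it appears in the output of all tools.
    return set.intersection(*sets)


# Hypothetical predictor outputs (placeholders, not data from this study).
predictions = {
    "PoPMuSiC": {"N94F", "N94W", "E133F", "K10A"},
    "I-Mutant 2.0": {"N94F", "N94W", "E133F", "D50L"},
    "MUpro": {"N94F", "N94W", "E133F", "K10A"},
    "iStable": {"N94F", "N94W", "E133F"},
}

print(sorted(consensus_mutations(predictions)))
```

A strict intersection is the most conservative choice; a majority vote across tools would retain more candidates at the cost of weaker agreement.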

Molecular simulation

The cellulose crystal structure was obtained from a complex structure (PDB: 3AOF). Cellulose was docked to the active site of the wild-type and mutant Cel5E structures under the same conditions and at the same coordinates, which were obtained by superimposing our modeled structures onto the crystallographic structures of Thermotoga maritima Cel5A in complex with a cellobiose substrate (PDB: 3AMG) and T. maritima Cel5A in complex with mannotriose (PDB: 3AOF). Additionally, a computational method using the DEPTH server was used to determine the exact coordinates of the binding site (Tan et al. 2013). The predicted binding site contained the catalytic residues E137 and E242. Molecular docking simulations were performed using MVD and PyRx software (Dallakyan and Olson 2015). MVD includes the MolDock (Thomsen and Christensen 2006) and PLANTS (Korb et al. 2009) scoring functions, and PyRx is a graphical interface for AutoDock and AutoDock Vina; AutoDock uses a Lamarckian genetic algorithm as its search method (Morris et al. 2009). Charges were calculated and added to the protein and ligand by MVD, and the best-designed structures underwent molecular dynamics simulations using GROMACS version 4.6.5 and the GROMOS96 53a6 force field (Hess et al. 2008). The ligand structures were parameterized by the PRODRG web server (Schuttelkopf and van Aalten 2004), and a dodecahedral water box with a 1.5-nm distance was used for the solvation step. The system was neutralized using Na+ or Cl− ions, and after a 50-ns simulation, binding affinities were calculated using the g_mmpbsa package (Kumari et al. 2014).

Bacterial strains, plasmids, and culture

E. coli DH5α cells were used for cloning, and E. coli BL21 (DE3) sigma CMC0014 was used for protein expression (Sigma-Aldrich, St. Louis, MO, USA). Cells were cultured in Luria-Bertani (LB) broth supplemented with 100 μg/mL kanamycin at 37 °C until the optical density at 600 nm (OD600) reached 0.6. The culture was centrifuged at 2000 × g for 5 min, and the supernatant was replaced with an equivalent volume of fresh LB medium. To induce recombinant protein expression, 0.5 mM isopropyl-β-thiogalactopyranoside (IPTG) was added, followed by incubation at 30 °C overnight. The pGH plasmid (GV0108-A; Generay Biotech, Shanghai, China) was used as the cloning vector, and the pET-26b(+) vector (Novagen; Merck, Darmstadt, Germany) was used as the expression vector.

Gene cloning and mutagenesis

The amino acid sequence of the GH enzyme from C. thermocellum, obtained from the NCBI database (accession No. 500163419), included the catalytic domain, with both N- and C-terminal linkers added to the catalytic core of the protein. Residues 325 through 630 were selected for gene cloning and expression studies. E. coli-optimized codons were generated using the OPTIMIZER web server (Puigbo et al. 2007). The optimized gene was inserted into the pGH vector using the NcoI-XhoI cloning sites. We performed site-directed mutagenesis by overlap extension using the polymerase chain reaction (PCR), as described previously (Ho et al. 1989). Mutant primers were designed using SnapGene software (GSL Biotech, Chicago, IL, USA). The mutant segments were confirmed by Sanger sequencing and cloned into the pET-26b(+) vector using the same NcoI-XhoI sites, followed by transformation into E. coli BL21 (DE3) competent cells.

Expression and purification of wild-type and mutant proteins

Wild-type and mutant genes were cloned at a point immediately following the PelB leader sequence. Transformed E. coli BL21 (DE3) cells harboring the pKHT-26-Native (GenBank: MG717496), pKHT-26-N94A (GenBank: MH404258), pKHT-26-N94W (GenBank: MG717499), pKHT-26-E133A (GenBank: MH404257), pKHT-26-N94F (GenBank: MG717498), and pKHT-26-E133F (GenBank: MG717497) recombinant plasmids were cultured in 10 mL LB medium containing kanamycin (100 mM). The starter cultures were incubated at 37 °C overnight and used to inoculate 100-mL primary cultures, which were then incubated at 37 °C to an OD600 of 0.6, followed by addition of 0.4 mM IPTG to induce protein expression at 30 °C overnight. Cells were separated by centrifugation at 4 °C, and the supernatant was collected. For purification of wild-type and mutant enzymes, nickel-affinity chromatography was performed by adding binding buffer (10 mM imidazole, 300 mM NaCl, and 50 mM NaH2PO4) to the collected supernatant and loading the supernatant onto a 3-mL nickel-nitrilotriacetic acid column equilibrated with the same binding buffer. The column was washed with wash buffer (20 mM imidazole, 300 mM NaCl, and 50 mM NaH2PO4), and products were eluted with elution buffer (250 mM imidazole, 300 mM NaCl, and 50 mM NaH2PO4). Eluted products were evaluated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE).

Enzyme activity assay

Enzyme activity was assessed using the 3,5-dinitrosalicylic acid (DNS) method using CMC-Na (Sigma Aldrich) and barley β-glucan (G6513; Sigma-Aldrich) as substrates. Purified enzyme (100 μL) was added to 900 μL of substrate solution containing 1.0% (w/v) CMC-Na diluted in 50 mM phosphate buffer (Gusakov et al. 2011). The reaction mixture was incubated at 60 °C for 10 min, and the enzymatic reaction was terminated by adding 1 mL DNS, followed by boiling for 5 min. Released reducing sugars were determined by measuring the absorption of the solution at 540 nm. One unit was defined as the amount of enzyme required to release 1 μmol of glucose-reducing-sugar equivalents per min. The activities of the wild-type and mutant enzymes were measured in a temperature range of 40 to 80 °C and a pH range of 3 to 11.
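The unit definition above translates into a short calculation. The sketch below assumes a linear glucose standard curve relating blank-corrected absorbance at 540 nm to μmol of reducing sugar; the function name, the slope, and the example readings are hypothetical and are not values from this study.

```python
def specific_activity(a540: float, blank_a540: float, slope_per_umol: float,
                      minutes: float, enzyme_mg: float) -> float:
    """Specific activity in U/mg from a DNS assay reading.

    One unit (U) releases 1 umol of reducing sugar (glucose equivalents)
    per minute. slope_per_umol is the slope of an assumed linear glucose
    standard curve (absorbance units per umol in the assay tube).
    """
    if slope_per_umol <= 0 or minutes <= 0 or enzyme_mg <= 0:
        raise ValueError("slope, time, and enzyme mass must be positive")
    # Convert blank-corrected absorbance to umol of reducing sugar released.
    umol_released = (a540 - blank_a540) / slope_per_umol
    # Normalize by reaction time (min) and enzyme mass in the assay (mg).
    return umol_released / (minutes * enzyme_mg)


# Hypothetical reading: 10-min reaction with 0.4 ug of enzyme in the tube.
activity = specific_activity(a540=0.85, blank_a540=0.05, slope_per_umol=0.4,
                             minutes=10.0, enzyme_mg=0.0004)
print(f"{activity:.0f} U/mg")
```

With these placeholder numbers the result is approximately 500 U/mg; in practice the slope comes from glucose standards boiled with DNS under the same conditions as the samples.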

Results

Sequence and structure analyses of the GH5 endoglucanase from C. thermocellum

Because the catalytic core of the GH5 endoglucanase from C. thermocellum (residues 344–604) is not expressed in soluble and active forms of the enzyme (Yuan et al. 2015), we added N- and C-terminal linker regions to enhance enzyme solubility and activity. Therefore, a sequence containing 306 amino acids (325–630) was used for this study. Although the N- and C-terminal linkers were added to the catalytic core, the selected sequence contained neither the N-terminal glycosyl hydrolase family 26 conserved domain (conserved domain database: 304502) nor the C-terminal carbohydrate-binding domain family 11 region (conserved domain database: 281426). After re-numbering the residues, the catalytic residues in the active site of our truncated protein included residues Glu137 and Glu242. Superimposition and other analytical methods indicated that the binding site coordinates were as follows: X, − 0.92; Y, 35.29, and Z, − 5.66. Additionally, residues present in and around the substrate-binding site were predicted as Asn277, Trp275, Glu137, Tyr173, His196, Trp174, Asn175, Tyr197, Asp199, Phe195, Tyr201, Met245, Tyr198, His205, Met287, Glu242, Asn285, His97, His96, and Asn136. The structure of the truncated catalytic core comprised six α helices and two partial α-helix-like structures in the N- and C-terminal regions along with eight β sheets (Fig. 1a). Assessment of predicted structure quality returned a Z-score of − 8.77, indicating that our predicted 3D model had X-ray diffraction quality and was similar to that found in crystallographic structures (Fig. 1c, d).

Fig. 1

The predicted structure of the truncated form of glycoside hydrolase GH5 from Clostridium thermocellum obtained using a meta-prediction approach. a The structure of the truncated enzyme, consisting of six α helices and two partial α-helix-like structures in the N- and C-terminal regions along with eight β sheets. b The energy profile per residue of the predicted model calculated by QMEAN. c Assessment of predicted structure quality by ProSA; the Z-score of − 8.77 indicated that our predicted 3D model has X-ray diffraction quality. d QMEAN, DisCo, and QMEANDisCo scores indicated that the predicted model has X-ray diffraction quality

Mutagenesis

For rational design, we theoretically predicted candidate substitutions using pre-trained artificial neural networks, which were chosen because the data used for their training were experimentally verified. Among the predicted mutations, we selected substitutions exhibiting reduced fluctuations within the Cel5E hydrophobic pocket, as previously described (Badieyan et al. 2012; Pikkemaat et al. 2002). The resulting predicted thermal-stabilizing mutations are described in Table 1. The 3D conformers of the top 50 mutants were predicted and optimized, followed by quality verification by Z-score analysis (data not shown). Mutant structures were then used for molecular-docking analyses performed in a water box equilibrated with neutralizing ions. Among the top mutations predicted to increase both thermal stability and ligand-binding affinity (Table 1), we selected residues at positions 94 and 133, which were unanimously predicted as hot spots for thermal-stability modification. Mutations resulting in considerable enhancement in both ligand-binding affinity and thermal stability included E133F, E133A, N94A, N94W, and N94F. These variants underwent a 50-ns molecular dynamics simulation, with Fig. 2 showing the backbone RMSD plot of the wild-type and each of the mutant variants. Results showed that the simulated constructs reached an equilibrated state within 50 ns, and that the ligands were stable within the mutant proteins. We then calculated ligand-binding affinity, including in vacuo potential energy, van der Waals forces, electrostatic interactions, and net non-bonded potential energy between protein and ligand, as well as the polar solvation energy of the protein-ligand complex (Fig. 3 and Table 2). The binding affinity was − 64.964 kJ/mol for the wild-type variant as compared with − 176.148, − 200.921, − 120.038, − 101.26, and − 123.76 kJ/mol for N94F, N94W, E133F, N94A, and E133A, respectively, indicating enhanced ligand-binding affinities in all of the mutants. Similar results were obtained using an alternative method (MVD; Table 2).
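For orientation, the energy terms listed above correspond to the usual MM-PBSA decomposition. The relations below are the generic textbook form, written here as an assumption about how the reported terms combine; they are not equations quoted from this study, and the entropic term is commonly omitted in MM-PBSA calculations of this kind.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - \left( G_{\mathrm{protein}} + G_{\mathrm{ligand}} \right) \\
G_{x} &= \langle E_{\mathrm{MM}} \rangle + \langle G_{\mathrm{solv}} \rangle - TS \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}} \\
G_{\mathrm{solv}} &= G_{\mathrm{polar}} + G_{\mathrm{apolar}}
\end{align}
```

Here x denotes the complex, the protein, or the ligand, and angle brackets denote averages over trajectory frames. A more negative binding energy indicates more favorable binding, which is the sense in which the mutant values above are described as enhanced.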

Table 1 Predicted protein stability changes upon point mutation
Fig. 2

The root mean square deviation of the native, E133F, E133A, N94A, N94W, and N94F models during a 50-ns molecular dynamics simulation performed with the GROMACS package and the GROMOS96 53a6 force field. a The backbone RMSD of the native model reached a plateau within the 50-ns MD simulation. b The backbone RMSD of the E133F mutant reached a plateau during the 50-ns MD simulation and showed backbone fluctuations similar to those of the native model. c The backbone RMSD of the N94F mutant was similar to that of the native model and reached a plateau during the 50-ns MD simulation. d The N94W model reached a plateau after 35 ns of MD simulation. e The N94A model reached a plateau after 20 ns. f The E133A model reached a plateau after 35 ns

Fig. 3

Energy analysis of ligand binding in the active site of truncated structure of glycoside hydrolase GH5 from Clostridium thermocellum. a Apolar solvation energy of native protein, black: protein energy, red: ligand energy, green: protein-ligand energy. b Lennard-Jones and coulomb binding affinity of ligand-native model. black: coulomb energy, red: Lennard-Jones energy. c Apolar solvation energy of N94F mutant. d Lennard-Jones and coulomb binding affinity of ligand-N94F model. e Apolar solvation energy of N94W mutant. f Lennard-Jones and coulomb binding affinity of ligand-N94W model. g Apolar solvation energy of E133F mutant. h Lennard-Jones and coulomb binding affinity of ligand-E133F model. i Apolar solvation energy of N94A mutant. j Lennard-Jones and coulomb binding affinity of ligand-N94A model. k Apolar solvation energy of E133A mutant l Lennard-Jones and coulomb binding affinity of ligand-E133A model

Table 2 Kinetic parameters and theoretical binding energy of native and mutant models calculated by MolDock and g-mmpbsa

Expression, purification, and enzyme assays

In vitro experiments were performed in order to verify the predicted increases in specific activity of the mutant enzymes. Mutant primers (Table 3) were used for overlapping PCR, followed by transformation and expression of the wild-type and mutant variants in E. coli BL21 (DE3) cells. The recombinant proteins were isolated and purified by nickel-affinity chromatography and assessed by SDS-PAGE (Supplemental Fig. S1). Purified proteins were used for specific activity assays. To determine the optimal assay conditions, wild-type and mutant enzymes were subjected to pH ranges of 3 to 11 (Fig. 4), and temperature ranges from 50 to 70 °C, as the enzyme is naturally thermostable, with optimal activity observed at 60 °C (Yuan et al. 2015). The pH assay indicated maximal activity at 537 U/mg for the wild-type enzyme at pH 7 in the presence of CMC and 1164 U/mg in the presence of barley β-glucan. Notably, this activity was obtained in a single subunit comprising the catalytic core of Cel5E and excluding the carbohydrate-binding domain. Although the activity of our truncated variant was 1.7-fold higher than previously reported truncated forms (Yuan et al. 2015), the carbohydrate-binding domain can increase the overall activity of the enzyme by up to tenfold (Ichikawa et al. 2015). Moreover, at acidic pH levels of 3 and 4 and alkaline pH levels of 10 and 11, enzyme activity was undetectable in wild-type Cel5E. Interestingly, low levels of CMCase activity (39 U/mg) were detected in the E133F mutant at pH 10. It is possible that substituting Glu133 with a phenylalanine resulted in an overall positive charge of the enzyme. A previous study reported altered activity across different pH ranges (Mahadevan et al. 2008), where five single amino acid substitutions resulted in a shift in the optimal pH from 5 to 5.4 in Cel5A from T. maritima. 
Although considerable variation in the overall charge of a protein occurs at pH 10, we speculated that modification of the hydrophobic pocket in the core of the protein might increase the hydrophobicity of the enzyme core, thereby leading to increased enzyme stability at an alkaline pH. In all of the mutant enzymes, we observed 4% activity at pH 10 and pH 4 relative to no activity observed in the wild-type enzyme. This result was likely due to the structural stability promoted by alteration of the hydrophobic pocket located at the center of the enzyme.

Table 3 Descriptions of the mutant primers used for mutagenesis. Two PCR steps were applied for mutagenesis
Fig. 4

Activity vs. pH diagram of the native enzyme in comparison with the mutant types tested with CMC (a) and barley β-glucan (b). The optimum pH was 7 for the native enzyme as well as the mutant types. The pI of the native enzyme was estimated as 5.14, whereas that of the E133F mutant was 5.21. In addition, the net charge of the native, N94W, and N94F enzymes at pH 7 was estimated at − 10.8, whereas that of the E133F mutant was − 9.8. In all of the mutant enzymes, a non-negligible activity of around 4% was observed under both strongly alkaline (pH = 10) and strongly acidic (pH = 4) conditions

Thermal stability and enzyme kinetics

To measure residual activities, the purified enzymes were incubated in 0.1 M phosphate buffer for 4 h at the optimal pH of 7 and at two temperatures (60 and 62 °C). The wild-type enzyme retained ~88% of its CMCase activity at 60 °C and 86% at 62 °C, whereas residual activity in the presence of barley β-glucan was 87 and 82% at these respective temperatures. For the N94W variant, residual CMCase activity was 95 and 92% at 60 and 62 °C, respectively, and 94 and 91% in the presence of barley β-glucan at these respective temperatures. This result indicated that the N94W mutant showed higher thermostability at elevated temperatures relative to most GH enzymes. Additionally, N94F and N94A retained residual CMCase activities of 94 and 91% and 91 and 90% at 60 and 62 °C, respectively, and 91 and 89% and 90 and 88%, respectively, in the presence of barley β-glucan at the respective temperatures. For E133F and E133A, the residual CMCase activities at 60 and 62 °C were 94 and 93% and 87 and 84%, respectively, and 92 and 91% and 86 and 83%, respectively, in the presence of barley β-glucan at the respective temperatures. Results for the kinetic parameters Km and Vm are shown in Table 2.

To calculate the temperature allowing maximal enzymatic activity (Tmax) and the temperature at which 50% of the maximal activity of an enzyme is retained (T50), activity at a temperature range of 40 to 80 °C was assayed (Fig. 5). The N94F mutant showed activity similar to that of the wild-type enzyme, with Tmax values in the presence of CMC (541 U/mg) and barley β-glucan (1168 U/mg) of 60.4 and 60.7 °C, respectively, and T50 values for CMC (270 U/mg) and barley β-glucan (584 U/mg) of 51.6 and 50.4 °C, respectively. The N94W variant showed Tmax values in the presence of CMC (693 U/mg) and barley β-glucan (1517 U/mg) of 60.5 and 60.6 °C, respectively, and T50 values for CMC (346 U/mg) and barley β-glucan (758 U/mg) of 51 and 50.9 °C, respectively. Both enzymes lost between 2 and 6% activity following increases in reaction temperature of 2 °C, and at 66 °C, 8–43% loss of activity was observed for all enzymes.
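One simple way to estimate Tmax and T50 from an activity-temperature series is sketched below. The fitting procedure is not stated in the text, so this is an assumption: Tmax is taken as the assayed temperature with the highest activity, and T50 is found by linear interpolation on the rising limb of the curve, because the reported T50 values lie below Tmax. The function name and the example data are hypothetical.

```python
from typing import List, Tuple


def tmax_and_t50(temps: List[float], acts: List[float]) -> Tuple[float, float]:
    """Estimate Tmax and T50 from activities measured at ascending temperatures.

    Tmax: temperature of the highest measured activity.
    T50: temperature below Tmax at which activity equals half the maximum,
         found by linear interpolation between neighboring measurements.
    """
    if len(temps) != len(acts) or len(temps) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    i_max = max(range(len(acts)), key=acts.__getitem__)
    half = acts[i_max] / 2.0
    # Walk from the peak toward lower temperatures until half-max is bracketed.
    for i in range(i_max, 0, -1):
        lo_a, hi_a = acts[i - 1], acts[i]
        if lo_a <= half <= hi_a:
            if hi_a == lo_a:
                return temps[i_max], temps[i - 1]
            frac = (half - lo_a) / (hi_a - lo_a)
            return temps[i_max], temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("half-maximal activity is not bracketed below Tmax")


# Hypothetical activity profile in U/mg (not data from this study).
temps = [40.0, 50.0, 60.0, 70.0, 80.0]
acts = [100.0, 250.0, 540.0, 300.0, 50.0]
t_max, t_50 = tmax_and_t50(temps, acts)
print(f"Tmax = {t_max:.1f} C, T50 = {t_50:.1f} C")
```

Because this sketch can only return an assayed temperature as Tmax, values such as 60.4 °C would require fitting a smooth curve to the data; a finer temperature grid or a fitted model would refine both estimates.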

Fig. 5

Activity vs. temperature diagram of the native enzyme in comparison with the mutant types, indicating that all of the mutant enzymes could reach higher activity levels at the optimum temperature of 60 °C with both CMC (a) and barley β-glucan (b). N94W showed a unique thermal stability pattern and reached its optimum CMCase activity at 62 °C. Although E133F did not reach higher activity levels than N94F at the optimum temperature, its stability pattern indicated higher levels of activity at 64–70 °C

E133F exhibited 1.16-fold higher activity relative to the wild-type enzyme, with Tmax values of 60.7 °C for both CMC (634 U/mg) and barley β-glucan (1306 U/mg) and T50 values of 50.1 and 50.2 °C for CMC and barley β-glucan, respectively. Additionally, the activity of the E133F variant suggested better thermal stability at 66, 68, and 70 °C relative to both the wild-type and N94F variants. The E133A variant showed Tmax values in the presence of CMC (331 U/mg) and barley β-glucan (641 U/mg) of 60.4 and 60.1 °C, respectively, and T50 values for CMC and barley β-glucan of 49.7 and 50.3 °C, respectively. The N94A variant showed Tmax values in the presence of CMC (623 U/mg) and barley β-glucan (1314 U/mg) of 60.6 and 60.8 °C, respectively, and T50 values for CMC and barley β-glucan of 50.4 and 50.7 °C, respectively. N94W showed a unique pattern associated with both thermal stability and catalytic activity, with CMCase activity reaching 1014 U/mg at a Tmax value of 62.3 °C, which was 2.3 °C higher than that of the wild-type enzyme. Interestingly, activity in the presence of barley β-glucan reached 1714 U/mg at a Tmax value of 60.9 °C, with T50 values for CMC and barley β-glucan of 52.1 and 52.4 °C, respectively.

Discussion

The results of the point mutations were comparable with those previously reported for protein engineering of Cel12B from T. maritima, in which the E225H and K207G variants reached 1.44- and 1.43-fold higher activity levels, respectively. Moreover, the E225H/K207G double mutant and the E225H/K207G/D37V triple mutant exhibited 1.77- and 1.87-fold higher catalytic activities, respectively (Zhang et al. 2015). Interestingly, Vu and Kim (2012) engineered the endoglucanase from B. amyloliquefaciens by random and site-directed mutagenesis, finding that an E289V variant enhanced enzyme activity by 7.93-fold, which was comparable with our findings in Cel5E, where the hydrophobic core of the protein was enhanced by mutating N94 and E133 to hydrophobic residues. A previous study by Miyazaki et al. (2006) on Bacillus subtilis family-11 xylanase reported that an N8F mutation increased hydrophobicity, thereby enhancing thermostability. However, potential steric hindrance created by the substituted residue is an important factor. Because our selected pocket for site-directed mutagenesis was located near the active site, such alterations would likely influence the active site, causing variation in specific activity. Although substitution with a hydrophobic residue in this region enhanced enzyme thermostability, side-chain size was also a factor. Kellis et al. (1988) reported that in the ribonuclease from B. amyloliquefaciens, reducing the size of hydrophobic side chains significantly decreased enzyme thermostability. Therefore, we used phenylalanine and tryptophan for our substitutions in order to maximize potential hydrophobic interactions. Although interactions within the protein core in an aqueous environment are complex, it is an accepted principle that thermal stability increases with increasing hydrophobic interactions within the protein core (Arakawa and Tokunaga 2004; Kellis et al. 1988; Privalov and Gill 1988; Radhakrishna et al. 2013).

Structural analysis of mutant enzymes

In this study, we substituted polar residues with hydrophobic residues within a hydrophobic pocket. Additionally, alanine was used as a small side chain in order to expand the volume of the active site. Replacements with hydrophobic residues increased the volume of this pocket, which might have increased internal interactions at the substitution sites. At position 94, located in the middle of the β3 strand, surrounding residues within a 4-Å radius included Asn136, Met135, Ser95, Glu133, Pro54, Val55, Ile53, Ile93, and Ile92, which are mostly hydrophobic. At position 133, the Glu residue is located within a semi-hydrophobic pocket surrounded by Asn94, Thr194, Arg52, Ser95, Ile134, Ile168, Ile92, Ile93, Phe132, Leu131, Met135, and Asn136. Interestingly, Glu137, which is a catalytic residue, is not located within 4 Å of Glu133. As shown in Fig. 6, E133 and N94 are located in close proximity. Theoretically, substitution of a polar residue with nonpolar residues would enhance internal stability of this area. As shown in Supplemental Fig. S2, the root mean square fluctuation (RMSF) of the wild-type protein was 0.1778 nm around N94 and 0.1683 nm around E133, values that differ from those of nearby residues and suggest that these positions are sites of instability in the protein. The N94F and N94W mutations resulted in decreased RMSF values of 0.0619 and 0.0615 nm, respectively, revealing decreased fluctuation and increased local stability. This finding was comparable with the results of a study by Xie et al. (2014) showing that the half-life of lipase B from Candida antarctica increased 13-fold following D223G/L278M substitutions, which enhanced overall enzyme rigidity by decreasing the fluctuation of active-site residues.

Fig. 6

Position of N94 and E133 in a hydrophobic pocket in the middle of glycoside hydrolase GH5 from Clostridium thermocellum. Substituting polar amino acids with nonpolar amino acids expanded the hydrophobic pocket and led to higher levels of thermal stability

Additionally, the RMSF associated with E133 was 0.1683 nm and changed to 0.058 nm in the E133F mutant. This supported our hypothesis that local stability can be enhanced by expanding the hydrophobic pocket. Moreover, the RMSF pattern observed between wild-type and mutant enzymes was similar; however, the mutant enzymes showed unique levels of variation, suggesting that the mutations affected both the local pocket and the surrounding area, resulting in a global effect on protein structure through altered rigidity and flexibility. As shown in Supplemental Fig. S2, decreased variations in flexibility and rigidity were observed in the mutant enzymes at positions 144, 214, and 282 and were accompanied by increases in RMSF values associated with several positions. Although there are correlations between thermostability and protein rigidity, thermostability is a multifactorial phenomenon, with compactness, overall hydrophobicity, helical content, total charge, and exposed surface area being key factors (Nick Pace et al. 2014).

Structure comparison of wild-type Cel5E with Cel5E mutants

Differences in the stability of enzyme homologs can occur as a result of relatively few sequence variations (Eijsink et al. 2004). Because we targeted both enzyme thermostability and activity, we incorporated point mutations in order to influence enzyme activity. Superimposition of wild-type and mutant structures was performed using the average structure extracted from the final 10 ns of the equilibrated state of the molecular dynamics simulation as the reference structure for the final functional form of the enzyme. As shown in Supplemental Fig. S2, the overall structure did not undergo alterations; however, local changes in the hydrophobic pocket represented the primary source of altered thermostability. Moreover, improved catalytic activity would also be expected because of increased ligand-binding affinity. Interestingly, the distance between the catalytic residues E137 and E242 was 5.19 Å in the wild-type enzyme, whereas this distance in the N94F, N94W, and E133F variants was 5.17, 6.28, and 5.26 Å, respectively, which is comparable with the average distances observed in GH5 enzymes. This suggested that the mutations changed the active-site cavity but maintained the distance between key residues within a functional range. Moreover, based on our enzyme activity assays, these structural alterations to the hydrophobic pocket influenced the distance between E137 and E242 and catalytic activity. The size of the active-site cavity in the wild-type enzyme was 65.7 Å, whereas that of the N94F, N94W, and E133F variants was 69.4, 72.3, and 64.1 Å, respectively. In the E133A mutant, the distance between these catalytic residues decreased to 4.75 Å relative to the wild-type enzyme, and the local RMSD value (residues 125–140) differed from the global value (0.3994 vs. 0.16238 Å). Therefore, it is probable that structural alteration of the active site explained the decreased catalytic efficiency of the E133A variant.

In summary, we generated three mutant and truncated forms of Cel5E that showed enhanced CMCase levels in acidic and alkaline pH conditions. Molecular dynamics simulations revealed that the mutant enzymes harbored an expanded hydrophobic pocket, which enhanced overall enzyme thermostability. Furthermore, this structural alteration improved the catalytic activity of most of the mutants. These results described three novel engineered enzymes available for commercial applications.