Introduction

Enzymes are molecular machines produced by living cells to catalyze chemical reactions. They are nature's tools, acting with high specificity and efficiency under mild conditions [1]. Moreover, enzymes can carry out reactions that are challenging for chemical synthesis. Because of these characteristics, functional enzymes have been applied commercially in many fields, including the textile, food, detergent, paper, and medical industries [2,3,4,5,6]. Important industrial enzymes and their application conditions are shown in Table 1.

Table 1 Industrial applications of enzymes

Microorganisms are the primary source of industrial enzymes, although enzymes can also be obtained from animals and plants. In industrial production, the majority of enzymes are produced by a very limited number of microbial species, dominated by fungi and bacteria, especially Aspergillus, Trichoderma, Bacillus and yeast. For example, amylases, which catalyze starch hydrolysis, can be produced by various microorganisms, but commercial applications commonly use Bacillus spp. such as B. stearothermophilus, B. licheniformis, and B. amyloliquefaciens [17]. Fungi, bacteria, yeast, and Streptomyces contribute 60%, 24%, 4%, and 2% of industrial enzymes, respectively [18].

The industrial application prospects of enzymes are very attractive, not only because of their high efficiency and specificity, but also because of their environmentally friendly features, which can minimize waste generation. However, the microorganisms used to produce industrial enzymes are mainly obtained by screening natural environments, and the performance of enzymes evolved in nature usually cannot meet the needs of industry, so their properties have to be optimized. Problems such as insufficient thermal stability, pH stability and organic solvent tolerance still limit the application of industrial enzymes. Classical strategies such as immobilization, additives or process engineering have been used to overcome these problems (Fig. 1), but these techniques provide only limited improvement. Recent advances in technologies and methods have provided substantial guidance for speeding up enzyme optimization. Among them, recombinant DNA technology and protein engineering play vital roles in efficient enzyme production [19]. Common strategies used in the molecular engineering of industrial enzymes include directed evolution, site-directed mutagenesis, terminal fusion and truncation [20,21,22,23]. However, each of these has its own disadvantages. For example, the mutation library generated by directed evolution is so large that screening efficiency is low, whereas site-directed mutagenesis requires knowledge of the structure–function relationships of the target enzyme [20].

Fig. 1 Schematic diagram of the biocatalytic process with multiple approaches. This figure shows the enzyme modification approaches including process engineering, enzyme immobilization and protein engineering

Increasing computational power has made biomolecular simulation a useful tool for understanding enzyme mechanisms and creating improved or novel enzymes [24, 25]. Computational engineering has advanced our knowledge of protein function and afforded new perspectives on functional enzyme design. This additional knowledge can help us design or evolve enzymes with high activity, environmental tolerance, specific catalytic abilities and stereo/enantioselectivity (Fig. 2). Commonly used enzyme modification strategies include rational design and semi-rational design, while de novo design is a newly developed strategy for creating enzymes that catalyze novel reactions and is likely to become a future trend (Table 2). This review aims to summarize current advances in protein design methods for the molecular engineering of industrial enzymes and to discuss their possible future applications.

Fig. 2 The process of designing or evolving enzymes with high activity, environmental tolerance, specific catalytic abilities and stereo/enantioselectivity. This figure shows a flow diagram of enzyme design, from structure analysis to hot-spot screening and then to obtaining novel enzymes with enzyme design strategies

Table 2 The design strategies, methods and optimization target of enzymes

Rational enzyme design

Enzymes selected and evolved in natural environments often cannot meet the requirements of artificial production environments, such as high catalytic activity, substrate specificity and environmental adaptability. Most enzymes used as catalysts therefore need to be modified and optimized before they are applied in industrial production. Wei et al. used an immobilization method with excellent biocompatibility to enhance the activity and stability of multiple enzymes. They introduced a facile co-immobilization of glucose oxidase (GOx) and horseradish peroxidase (HRP) in a matrix of copper (Cu) and guanosine 5′-monophosphate (GMP), which increased stability by 40% at pH 3 and by more than 70% at 90 °C [41]. Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a class of natural products with diverse structural features and biological activities. Mitchell et al. demonstrated that the ranthipeptide-forming rSAM enzyme PapB is intolerant to expansion and contraction of the thioether motifs, and defined its intolerant acceptor position through a mass spectrometry-based activity assay. They then increased the activity of PapB by in vitro recombination, which provided insights into ranthipeptide biosynthesis [42]. Through such in-depth studies, enzymes with high stability and wide substrate specificity have been found, which has led to a growing understanding of the mechanisms by which enzymes maintain their stability and specificity. Rational design is based on an understanding of the protein structures and functions of enzymes. Amino acid residues are rationally selected as targets according to the structural information of the protein and sequence comparison with homologous proteins, and key amino acids are then mutated by site-specific mutagenesis to obtain and screen the target mutants. Currently, the widely used rational design methods include design strategies based on protein conformation and design strategies based on computer simulation.

Analysis of the different binding domains of enzymes, based on their structure, function and catalytic mechanisms, can provide effective guidance for modifying catalytic activity, substrate specificity and stability [43]. In the early 1980s, Winter et al. used recombinant tools, such as oligonucleotide synthesis, DNA amplification and site-specific cutting and pasting, to modify native enzymes successfully for the first time [44]. These recombinant tools enabled researchers to deliberately and precisely substitute specific amino acid residues [45]. In 1993, Arnold introduced the idea of directed evolution of enzymes and proposed remodeling natural enzymes by replacing amino acid residues [46]. Recently, many enzymes have been modified by rational design to enhance their activity and stability. For example, Jakobrinnert et al. [47] analyzed the protein structure (PDB: 1r37) of the carbonyl reductase CPCR2 from Candida parapsilosis. They found that A275 is located at the dimer interface, which is close to the active center and contains an inter-subunit hydrogen bond. A mutant obtained by site-directed mutagenesis at A275 then showed activity and stability in a two-phase medium improved by 1.5-fold. In addition to site-directed mutagenesis, rational truncation or replacement of a flexible loop has also been reported to improve the stability of enzymes. Hauer et al. [48] analyzed the structure of NCR reductase and identified a loop with high B-factor values. B-factors (atomic displacement parameters) from crystal structures were analyzed by the B-FIT method; they are considered an important index of the structural flexibility and dynamics of proteins and were therefore applied to predict flexibility and thermal stability [48]. After four amino acids were removed from this loop, the activity of the mutant enzyme at 45 °C was 84.6% higher than that of the wild type, and its stability in the organic solvent 2-methyl-2-pentenal was increased by 54.1%. By analyzing the structure of phospholipase D, Iwasaki et al. [49] found a highly flexible loop on the surface of the enzyme. Removing the whole loop increased the thermostability of the mutant enzyme at 70 °C by 11.7-fold, and simulation analysis showed that removal of the loop improved the conformational stability of the enzyme. Loop replacement has also been reported to improve enzyme stability. Bornscheuer et al. [50] replaced the “cap region” of the thermostable esterase BsteE with the corresponding region of its homologous thermostable esterase BsubE, and increased the melting temperature of BsteE by 4 °C. Thus, the stability and catalytic properties of enzymes can be effectively improved by rational design based on protein conformation.
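The B-factor analysis described above can be illustrated with a small Python sketch. It averages the B-factor over the atoms of each residue in PDB-format ATOM records and flags residues whose average lies well above the structure-wide mean. This is only a hedged illustration of the general idea, not the B-FIT procedure itself: the function names and the z-score cutoff of 1.0 are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def residue_b_factors(pdb_lines):
    """Average the B-factor over the atoms of each residue (ATOM records only).

    Uses the fixed-column PDB layout: chain ID in column 22,
    residue number in columns 23-26, B-factor in columns 61-66.
    """
    per_residue = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]
        res_seq = int(line[22:26])
        b_factor = float(line[60:66])
        per_residue[(chain, res_seq)].append(b_factor)
    return {key: mean(values) for key, values in per_residue.items()}


def flexible_residues(avg_b, z_cutoff=1.0):
    """Return residues whose average B-factor z-score exceeds z_cutoff.

    The cutoff is an illustrative assumption, not a published threshold.
    """
    values = list(avg_b.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(key for key, b in avg_b.items() if (b - mu) / sigma > z_cutoff)
```

Residues returned by flexible_residues would be candidates for inspection, for example a surface loop to truncate or replace; whether a change is beneficial still has to be tested experimentally.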

The rational design approaches described above make it relatively straightforward to remodel the active site. However, because enzyme structures are complex, a deeper understanding of enzyme conformation is needed to help researchers predict the effects of mutations. With the development of X-ray crystallography, more and more protein structures have been resolved, but the total number of resolved enzyme structures is still small, which greatly limits the further development of rational design. In recent years, the development of computer simulation technology has made up for this deficiency. When no three-dimensional structure is available for a target enzyme, a homologous structure can be used as a template. Molecular simulation technology for predicting enzyme structures has gradually been applied to the rational design of enzymes [51]. Gao et al. [52] simulated the three-dimensional structures of the thermophilic xylanase AoXyn11A and its homologous thermophilic enzyme EvXyn11TS by molecular dynamics and compared 31 amino acids at the N-terminus, which showed large differences in B-factor values. After the N-terminus of AoXyn11A from Ser1 to Gln37 was replaced by the corresponding region of EvXyn11TS from Asn1 to Asn42, the total energy of the modified AoXyn11A in molecular dynamics simulation decreased from −611.2 to −663.2 kJ mmol−1. This result indicated that the conformation of the modified AoXyn11A was more stable. Moreover, the activity of the modified AoXyn11A at 70 °C was increased by 197 times. Similarly, using homology modeling and molecular docking, Chao et al. [53] found that amino acids 192, 155 and 208 of cinnamoyl-CoA reductase (CCR) from Populus tomentosa form the substrate binding region. A mutant, F155Y, obtained by site-directed mutagenesis, showed 4.7 times higher catalytic efficiency than the wild type. Thus, enzymes can also be modified through computer simulation-based rational design when the three-dimensional structure of the target enzyme is not sufficiently clear. However, because of limited delivery efficiency, it may be difficult to evaluate the phenotype of enzyme variants obtained from traditional protein engineering methods, which makes the construction of screening methods particularly important for the rational design of enzymes [54]. Recently, CRISPR/Cas9 technology has made great progress and has been applied to the engineering of microbial strains [55, 56]. Owing to its simplicity and efficiency, CRISPR/Cas9 is an ideal tool for integrating gene variants of the target enzyme into the chromosome of the production strain quickly and effectively [57]. Zeng et al. [58] proposed a protein engineering method that combines CRISPR/Cas9-facilitated engineering with growth-coupled and sensor-guided in vivo screening (CGSS). They used lambda-Red recombination coupled with the CRISPR/Cas9 system to integrate the mutation library of the target gene into the chromosome with high efficiency. These growth-coupled and sensor-guided library screening methods could be combined with in vivo targeted mutagenesis [59, 60], which may trigger the development of more advanced methods.

Semi-rational enzyme design

Because libraries constructed by directed evolution contain a very large number of mutants, the screening workload is enormous. Researchers have therefore begun to pay more attention to semi-rational design methods, which combine information on the crystal structures and catalytic properties of enzymes [61]. Semi-rational design randomly mutates “hot spots” or specific regions of the enzyme molecule that play an important role in its properties, in order to quickly obtain mutants with altered properties. Without compromising the effect of mutation, this method reduces the size of the mutation library and significantly improves the efficiency of molecular engineering.

The combinatorial active-site saturation test (CAST) is particularly suitable for optimizing the stereoselectivity of enzyme catalysts or expanding their substrate spectrum. The basic idea of CAST is to identify the catalytic active center by analyzing the three-dimensional structure, or a homology model, of the target enzyme. Selected amino acid residues within the identified region are then used to construct mutation libraries. This method was first applied to the enantioselective modification of the epoxide hydrolase from Aspergillus niger [62], in which 15 amino acid residues around the active pocket were selected and 6 mutation libraries were constructed by CAST to obtain a mutant with a high E-value. CAST requires a certain understanding and analysis of the three-dimensional structure and function of the enzyme in order to identify the amino acids that make up the substrate binding pocket. A more accurate high-throughput screening method is also needed to screen the mutation library. Compared with random mutation, CAST greatly reduces the size of the mutation library [33]. This reflects one of the trends in protein evolution: building small and precise mutation libraries.
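The residue-selection step can be sketched in a few lines of Python. The example below picks residues that have at least one atom within a distance cutoff of a bound ligand, which is one simple way to approximate the residues lining a binding pocket. The function name pocket_residues, the input layout and the 5 Å default cutoff are assumptions made for illustration and are not taken from the CAST literature.

```python
import math


def pocket_residues(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted IDs of residues with any atom within `cutoff` of the ligand.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples, one per atom.
    ligand_atoms:  iterable of (x, y, z) tuples.
    Distances are in the same unit as the coordinates (angstroms for PDB data).
    """
    ligand = list(ligand_atoms)
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # residue already qualified through another atom
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand):
            selected.add(residue_id)
    return sorted(selected)
```

In practice, the candidate list produced this way would be grouped into small sets of neighboring positions, each of which defines one focused library.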

In recent years, great progress has been made in fast and intelligent tools for exploring the detailed structural and functional roles of proteins, such as InterProScan5 [63] and PFAM search [64], which greatly improve our ability to define “hot spots” or specific regions of enzymes. To further reduce the screening workload and improve modification efficiency, researchers have developed a series of new methods based on amino acid sequences or three-dimensional structures, such as single codon saturation mutagenesis (SCSM) [34] and triple codon saturation mutagenesis (TCSM) [35]. The mutation libraries constructed with these methods are smaller, which improves screening efficiency. TCSM selects three amino acids for saturation mutagenesis by analyzing the sequence and structure information of the protein. Recently, TCSM has been successfully applied to modify the enantioselectivity, catalytic activity and stability of cytochrome P450 monooxygenase, limonene epoxide hydrolase and alcohol dehydrogenase. In addition, the co-evolution of enantioselectivity, activity and stability of limonene epoxide hydrolase was achieved with the TCSM strategy, which increased the ee value to 94% (S,S) and improved thermal stability by 5–10 °C [35].
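The effect of a reduced codon set on the screening effort can be made concrete with a short calculation. The Python sketch below computes the number of codon combinations in a library and the number of clones that must be screened to reach a given library coverage. It assumes that every variant is equally likely and that clones are sampled independently, in which case the required number of clones T for fractional coverage F of V variants is T = −V ln(1 − F). The function names, and the choice of 4 codons per site for a reduced code, are illustrative assumptions.

```python
import math


def library_size(codons_per_site, n_sites):
    """Number of distinct codon combinations when n_sites are randomized together."""
    return codons_per_site ** n_sites


def clones_to_screen(n_variants, coverage=0.95):
    """Clones needed to reach the given fractional coverage of the library.

    Assumes equally likely variants and independent sampling:
    T = -V * ln(1 - F).
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be between 0 and 1 (exclusive)")
    return math.ceil(-n_variants * math.log(1.0 - coverage))


# Three sites randomized with NNK codons (32 codons encoding all 20 amino acids)
nnk_library = library_size(32, 3)
# Three sites randomized with a hypothetical reduced code of 4 codons per site
reduced_library = library_size(4, 3)
```

Under these assumptions, 95% coverage requires screening roughly three times as many clones as there are variants, so shrinking the codon set at each site reduces the workload exponentially in the number of randomized sites.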

Protein dynamics play an important role in enzyme catalysis, affecting key steps such as substrate recognition and binding, allosteric regulation, enzyme–substrate complex formation and product release [65, 66]. Molecular dynamics (MD) simulation uses Newtonian mechanics to simulate the motion of atoms and molecules in space and obtains the trajectories of the atoms through calculation and analysis. The root mean square fluctuation (RMSF) is usually used to express the mobility of each atom in the molecule and thus to represent the conformational change of a certain part of the protein. NAMD [37], AMBER [67] and GROMACS [68] are commonly used MD simulation packages, which can simulate the dynamics of biological macromolecules such as proteins, sugars and nucleic acids. With the development of computer technology, MD simulations have become an effective tool for improving the enantioselectivity and activity of enzymes [69].
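As a minimal illustration of the RMSF, the following Python sketch computes, for each atom, the square root of its mean squared deviation from its time-averaged position over a trajectory. It assumes that the frames have already been superimposed on a common reference, so that overall translation and rotation do not contribute; the function name and the nested-list trajectory layout are assumptions made for this example, and the MD packages named above provide their own analysis tools for this purpose.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    with the same atom order in every frame. Frames are assumed to be
    pre-aligned to a common reference structure.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean_pos = [
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        ]
        # Mean squared deviation of atom i from its average position
        msd = sum(
            sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

Regions with consistently high RMSF values, such as surface loops, are the parts of the protein whose conformation changes most during the simulation.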

De novo enzyme design

Recent advances in protein structure prediction, evolutionary information, protein folding [70] and computational modeling [71, 72] have strongly promoted the development of de novo protein design [73]. De novo design is a newly developed methodology for creating enzymes that catalyze novel reactions. Unlike the design strategies described above, de novo design aims to explore directions that have not been discovered in the evolutionary process. The core problem of de novo design can be divided into two subclasses: the de novo design of new functions, and the redesign of existing enzymes [74].

The first attempt to apply computational protein design (CPD) techniques to enzyme design used the program Dezymer to design a His3-Fe–O2 metal active site, yielding a superoxide dismutase-like enzyme [75, 76]. Because molecular structures are complex, protein structure design and sequence prediction rely on computer simulation, which mainly uses an initial backbone structure and heuristic algorithms to produce a large number of candidate results. Rosetta is a software suite for protein structure calculation, modeling and analysis [72], which has been widely used in the de novo design of proteins and plays an important role in the design and synthesis of enzymes [77]. In addition, Ludwiczak et al. [78] combined Rosetta with molecular dynamics to enhance protein structure design and the corresponding sequence selection. The combined method generated 20–30% more sequences than the existing method, which improved the diversity of sequence prediction and the similarity to natural sequences. In the last 5 years, over 80 methods have been developed to solve protein design problems with Rosetta, including predicting protein structures; modeling protein–protein complexes; docking small-molecule ligands into proteins; modeling and designing antibodies and immune system proteins; designing new proteins and functions; designing interfaces between proteins and their interaction partners; modeling symmetric protein assemblies using parametric design; modeling peptides and peptidomimetics; modeling membrane proteins; and adding carbohydrates to the modeling process [79]. These predictive capabilities can be used not only to analyze existing data but also to inform experiments, and hence to stimulate the engineering of industrial enzymes and enable the synthesis of new biomaterials [79].

Naturally occurring enzymes often have special structural characteristics that play a decisive role in specific functions. How to design and synthesize enzymes with these structural features from scratch therefore has a great influence on the application of de novo design. Huang et al. [39] synthesized a TIM barrel protein with high thermal stability and reversible folding by a de novo design strategy. The model structure and the natural structure showed high similarity, and the successful synthesis of this structure provided a new pathway for enzyme customization. Thomas et al. [80] designed an α-helical barrel protein (α-HB) in which the size of the hydrophobic channel can be adjusted to bind specific target molecules. Because of its thermal stability and the low-complexity repeat sequence in its structure, the protein can tolerate more mutations in its hydrophobic channel, which suggests that it could serve as a scaffold for small-molecule recognition. Recently, Jiang et al. designed and constructed a synthetic acetyl-CoA (SACA) pathway using glycolaldehyde synthase and acetyl-phosphate synthase. They designed and engineered a glycolaldehyde synthase by de novo design to condense two molecules of formaldehyde into one glycolaldehyde with more than 70-fold higher activity [81]. Boyken et al. created a de novo six-helix bundle, a protein complex consisting of identical alpha helices [40]. To create a protein that could be energetically stable in more than one conformation, Backer et al. transformed an IgG-binding protein with a 4β + α fold into a 3-α fold protein with albumin-binding ability [82]. This work pushed the boundaries of de novo design by building on top of a synthetic protein [83].

Conclusion and perspectives

Herein, we have reviewed current advances in protein design strategies for the molecular engineering of industrial enzymes. From rational design to de novo design, a clear line of logic runs through the development process. Initially, enzymes were isolated from nature and used for their natural activities. It soon became apparent that enzymes produced in natural environments usually could not keep up with the needs of industrial production and that their properties had to be optimized. Protein engineering techniques, such as site-directed mutagenesis, directed evolution, saturation mutagenesis, truncation, and terminal fusion, therefore became a research hotspot. At that time, because information on the catalytic mechanisms and active sites of enzymes was lacking, random mutation was the commonly used technique for modifying enzymes, and screening the target enzymes from mutation libraries required a large workload. With the development of DNA sequencing and protein structure analysis techniques, a deeper understanding of the relationship between enzyme function and structure made it possible to apply rational design strategies to enzyme modification. The emergence of rational design greatly alleviated many problems, for example by reducing the number of mutants and by providing a new method for modifying enzyme–substrate specificity. In recent years, as enzyme gene sequences have become available, the spatial structures and catalytic modes of enzymes have gradually been understood. To further reduce the workload and the dependence on high-throughput screening methods, semi-rational design strategies were developed to guide the design of smaller and more focused libraries of protein mutants. These strategies combine information from protein structure databases with advanced computational algorithms so that researchers can focus on amino acid residues in or near the active site and guide protein engineering effectively. The newest goal for enzyme design is “you get what you want”. De novo design allows researchers to create enzymes that catalyze novel reactions. Moreover, enzymes produced by combining computational design and protein engineering techniques can now be used in new metabolic pathways. However, one of the basic problems encountered in redesigning natural proteins to provide new functions (such as catalytic sites) is that changing a large number of amino acid residues to introduce these functions will inevitably change aspects of the structure, which may affect the stability of the enzyme. In addition, a protein designed by this method may be predicted to be stable in calculation, but its expression in the host cannot be guaranteed. Thus, this method requires further improvement of the energy calculations and the identification of more suitable criteria for protein design.

With the rapid increase in sequencing throughput and structural data, a key problem that needs to be solved is how to mine and analyze these biological data and thereby provide new ideas and platform technologies for the rational design of enzymes. Recently, it has become possible to train machine learning models to predict key factors for enzyme evolution by collecting larger amounts of higher-quality data with more advanced experimental techniques, such as next-generation sequencing, high-throughput screening, and microfluidics. In the future, artificial intelligence and other advanced computing tools will play an increasingly important role in predicting enzyme structures and their corresponding characteristics and functions, which will make the design of artificial enzymes more rational and accurate.