Introduction

Enzymes are protein catalysts involved in virtually all biochemical processes, ranging from the generation of energy and the activation of other proteins to the inactivation of harmful molecules. Enzymes are highly selective, since they typically accept a single substrate among a great variety of small biomolecules and large biopolymers to execute their reactions [1, 2].

Naturally, errors may arise in the coding of the amino acid sequence of an enzyme, so mutations tend to accumulate over long periods of time. As a consequence, new catalytic activities arise at random, allowing for protein evolution. The mutability of these molecules is a feature that can be exploited to positively transform our daily lives. Since the mid-twentieth century, thanks to the availability of novel experimental techniques, scientists have used this evolutionary approach to explore and engineer enzymes with novel functions at laboratory scale, artificially accelerating the evolutionary process of a single enzyme from millions of years to a few weeks [3]. This involves selecting the enzyme of interest (EOI) with a specific natural activity and optimizing its amino acid sequence through different methods to create a library of enzyme variants for function screening and selection [4, 5].

Important developments have been made in instrumentation, induced-mutation and genome-editing techniques, algorithms and automated processes for catalysis screening. The remaining limitations in this area centre on understanding the structure–function relationship of enzymes and on efficiently predicting how these two components change after modification of the protein structure [6, 7]. Nonetheless, progress has been substantial, making it possible to tune a given enzyme for a specific application. This lowers the barriers for biotechnology to address major challenges such as bioremediation; the production of new drugs, biocatalysts and biofuels; and the transformation of industrial processes into highly effective, low-cost and environmentally friendly ones, among others [8,9,10]. Some ambitious trends involve evolving entire biochemical pathways or genomes through genetic circuit design to create whole-cell biocatalysts for advanced applications [11, 12].

The purpose of this review is to present the relationship between enzymatic structure and function, and how the methods of mutagenesis, protein design and selection are used to manipulate protein structure and improve the function, stability or specificity of an enzyme.

The Structural Balance Between Selectivity and Promiscuity in the Activity of an Enzyme

In biocatalysis, the induced fit model is the most widely accepted model of substrate binding to the active site of an enzyme to form the enzyme-substrate complex. It proposes that when the enzyme and substrate first bind, their interactions are weak, but a conformational change in their structures soon occurs and their interactions grow stronger to allow the catalytic reaction. The product is then released, and the enzyme returns to its native structure [13].

In this way, the three-dimensional structures of the enzyme binding pocket and of the substrate complement each other so that specific interactions occur and the transition state is reached faster. These features determine the rate of reaction and the specificity of the enzyme. However, enzymes are flexible enough to alter their structural folding and, under physiological conditions, have more than one conformational state. This affects the position of amino acids near the active site, altering the ability of the enzyme to selectively bind its substrate [14].

The function, specificity and stability of an enzyme rely on both intra-molecular interactions (pi stacking, salt bridges, hydrogen bonds and hydrophobic interactions occurring within the molecule) and inter-molecular interactions (occurring with solvents and their solutes, such as ions and cofactors), which are fundamental for its structure [15]. Therefore, if an enzyme is in an environment that affects any essential interaction, its properties will change.

Protein engineering of an enzyme is intended to modify its structure to improve function, stability or selectivity. The designed mutations should introduce molecular and structural changes that form novel molecular interactions or provide the enzyme with new biochemical features to promote the desired catalysis. This is possible because, even though current enzymes have evolved to specialize in a given biochemical reaction, their selectivity is often not absolute and they show minor activity towards other reactions. This alternative activity can be adjusted by mutation and selection, so enzymes can be evolved artificially to acquire useful properties via the optimisation of their promiscuous activities [16] (Table 1). Experimentally, some mutations that partially reduce or increase activity affect kcat (the catalytic rate constant) while the KM (Michaelis constant) value remains unchanged, meaning that the enzyme-substrate complex has the same affinity but the catalytic efficiency is altered [17,18,19].
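The effect of a mutation that changes kcat but leaves KM untouched can be illustrated with the Michaelis-Menten rate law. The following is a minimal sketch; the function names and the kinetic values are hypothetical and chosen only for illustration, not taken from the cited studies.

```python
def mm_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial rate v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Specificity constant kcat / KM."""
    return kcat / km


# Hypothetical values chosen only for illustration (not from the review).
wild_type = {"kcat": 100.0, "km": 50e-6}   # s^-1, M
variant = {"kcat": 10.0, "km": 50e-6}      # same KM, tenfold lower kcat

for name, p in (("wild type", wild_type), ("variant", variant)):
    eff = catalytic_efficiency(p["kcat"], p["km"])
    v = mm_rate(p["kcat"], p["km"], enzyme_conc=1e-9, substrate_conc=50e-6)
    print(f"{name}: kcat/KM = {eff:.2e} M^-1 s^-1, v at [S]=KM = {v:.2e} M/s")
```

Because KM is identical in both cases, the two hypothetical enzymes reach half of their maximal rate at the same substrate concentration, while kcat/KM differs by the same factor as kcat.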

Table 1 Some examples of mutational modifications on proteins to change their activity, stability or selectivity for novel applications or understanding of its catalysis

In addition, protein engineering improves our understanding of proteins and their function, as enzymatic screening with in vitro selection of mutants gives further insight into the structure, catalytic mechanism and activity of the new enzyme variants [34,35,36,37]. This insight supports strategies for predicting which amino acids are important for catalysis, for adjusting protein structure and for reducing constraining effects.

As should be expected, obtaining a novel, specific and efficient engineered enzyme depends on the appropriate interaction of various biological, chemical and physical conditions. These are closely interconnected, and any modification may disrupt the balance between them, seriously altering the fundamental properties of the enzyme [38]. For example, in humans many natural genetic defects caused by single nucleotide polymorphisms are characterized by problems in enzyme production or in stability due to aggregation, rather than in the actual function of the protein. Thus, changes in enzymatic function depend to a large extent on stability, which in turn depends on both expression and structural conformation [39].

Through protein engineering it is possible to modify proteins to broaden their applications. One example is the green fluorescent protein, widely used as a reporter protein, in which mutagenesis of specific residues in the chromophore changed the emission wavelength, thus shifting the observed fluorescent colour from green to variants of blue, cyan or yellow and allowing multi-fluorochrome labelling methods [40, 41]. Another example is an engineered transaminase for the production of sitagliptin (an anti-hyperglycaemic drug), designed to replace a rhodium-catalysed asymmetric enamine hydrogenation; the enzyme was modified for efficient transamination of prositagliptin ketone, a compound towards which it initially had no activity [42]. Computational tools and experimental protein-evolution techniques applied to the design of protein assemblies for molecule encapsulation have allowed the production of libraries of synthetic capsids for RNA packaging, which protect the RNA from degradation by nucleases and extend its circulation in vivo [43]. Regarding potential vaccines, neutralizing antibodies against HIV-1 are produced by nearly 10% of patients with the disease. The identification of antibodies that interact with specific envelope (Env) epitopes of HIV has been studied to explore novel antiviral antibodies through protein engineering, with the aim of neutralizing a wider range of HIV-1 strains for potential use in passive immunization or treatment [44]. These examples show a few useful applications, and similar approaches may be applied to other proteins or enzymes, thereby potentially expanding their applicability and versatility.

Approaches for Redesigning a Protein

Changing the stability, selectivity or activity of an enzyme is no easy task. Even predicting the functional effect of a single amino acid mutation is complicated, as often there is no structure–function information [45] and any mutation, especially of conserved residues, may lead to folding instability [46]. Nonetheless, there are methods to approach protein modification.

Rational protein design is a method that requires a thorough understanding of the protein structure and its catalytic mechanism in order to systematically use site-directed mutagenesis to alter the active site and change the EOI in the desired way [19, 45]. Therefore, the structure, amino acids, charges and distances of the enzyme should be analysed in depth, and even so it is often difficult to predict the effect of a mutation [4, 8, 19, 47].

Directed evolution is a second method, which does not require understanding of the structure of the enzyme and instead relies on the selection of proteins, analogous to natural selection, after several rounds of random mutagenesis [48]. Thus, the EOI is isolated on the basis of unpredicted changes in the genetic material and an adequate selection method; however, the main disadvantage is the need for a high-throughput assay to screen for activity [49,50,51].

The iterative saturation mutagenesis (ISM) scheme is a systematic probing of only a segment of the EOI that minimizes the screening effort by reducing the number of residues at a given randomization site. It has proved effective in improving the catalytic features and binding affinity of proteins [52,53,54].

Data-driven design, or semi-rational design, uses the available information to select mutations while allowing unpredictable substitutions that produce major changes. This method relies heavily on computational modelling and bioinformatics [55].

In research, these methods complement each other and can be used in parallel towards the evolution of an EOI.

Computational Analysis for Protein Structure Prediction

Experimental strategies may lead to large libraries whose screening is time-consuming and costly and, unfortunately, most mutants will have no better features than the wild-type (WT) enzyme. To overcome these challenges, computational analysis emerged as an option for designing smaller, targeted libraries based on the available data.

There are numerous computational tools available and, in a simplified description, their algorithms are based on different principles. A general classification may consider three groups. The “ab initio methods” are founded on chemical and physical principles and seek the minimal energetic cost in protein folding and in structure-affinity analysis with several types of substrates. The “comparative methods”, which may include threading recognition, fragment-based modelling [56] and homology modelling, use similarities and repeated features among families of proteins to model the structure of ancient, current and novel proteins. The “empirical methods” use heuristic and profiling strategies to solve a structure when the search space is so immense that an exhaustive exploration is impossible. The evaluation of the model is the most important aspect, because the model must adhere to the scientific principles of chemical structures and must present the relevant biological features as close to reality as possible. Most of the time, this last aspect is the hardest to assess and produces the greatest uncertainty [9, 57,58,59,60,61,62]. Applications of these methods along with other experimental techniques have aided “enzyme reconstruction and resurrection” [63], which has yielded powerful catalysts that are active and stable under physicochemical conditions different from those of currently existing enzymes [64].

Some examples of these computational methods are Phyre, SWISS-MODEL, FoldX, Modeller, ROBETTA, PoPMuSiC [65,66,67], MOSST (Mutagenesis Objective Search and Selection Tool) [45, 68], SCHEMA [69,70,71], PINGU (PredIcting eNzyme catalytic residues usinG seqUence information) [72] and THEMATICS (Theoretical microscopic titration curves) [73, 74].

One limitation of computer algorithms is that data processing takes a long time; however, there are currently large-scale projects such as Folding@home, through which people around the world can connect to a server and contribute to the processing of structures [75].

Experimental methods for enzyme design and selection

Protein structure has a high level of complexity within its inter-molecular and dynamic forces; therefore, in the analysis of its structure it is important to consider its folding stages, structural rearrangements, substrate binding and catalytic activity, among other specific features [76]. To reach its final functional conformation, any enzyme must be folded into its definitive 3D structure, a process that starts as early as necessary and in which several chaperone proteins, cofactors and metal ions assist [77,78,79]. Conformational transitions are also crucial during catalysis, and many of these transitions involve disordered regions in the enzyme, which can exhibit various rearrangements that are essential for activity. It is important to note that disordered regions differ from flexible regions, which adopt various conformations but do not intervene in the catalytic activity of the protein, and the distinction between them cannot be made on the sole basis of a three-state secondary structure [76]. Therefore, modification of an EOI requires observing its functional cycle and structure, as it is a constantly dynamic macromolecule that interacts with its environment.

The combination of various computational methods can enhance the accuracy of predictions; however, experimental assessment of the modified EOI is still required, and it takes several methods and approaches to determine whether the desired results were achieved. For this purpose, protein engineering uses molecular biology techniques and qualitative and quantitative analytical techniques.

DNA editing methods

DNA cloning in molecular biology is mainly used to isolate a gene from the rest of the cell genome in order to modify it and propagate it in the same or a different species; the cloned gene can later be introduced into an expression vector for the production of mutated proteins. Currently, these techniques are routine practice and there are two main approaches. In the first, a gene is isolated from genomic DNA by selectively cutting it with restriction enzymes, and the DNA fragment containing the gene (or genes) is directionally inserted and ligated by DNA ligase into a vector, which can be a viral vector, a plasmid or a prokaryotic/eukaryotic artificial chromosome. The small recombinant DNA is then introduced into competent cells to replicate and produce the intended protein, or to isolate the plasmid and generate a library of mutants. The second approach is the generation of cDNA from mRNA transcripts using reverse transcriptase, which is a considerable advantage for eukaryotic genes, whose pre-mRNA is processed into a mature mRNA. After this step, the cDNA is directionally cloned into the vector for the subsequent transformation of competent cells or, as in the previous approach, to generate a library of mutants [80].

There is a variety of cloning methods, ranging from restriction enzyme- and ligase-dependent, to ligation-independent, to PCR-based, to recombination-based cloning, all intended to increase the ease, efficiency and yield of the procedure. These strategies can be selected according to the length, repetitions and complexity of the sequence, the number of genes, the expression system and the tags to be used. Selecting the vector is equally important, as the production of a functional enzyme, especially of some enzymatic complexes, poses a further challenge owing to the folding, solubility, assembly and co-expression of specific subunits. One approach is to use multiple vectors, each carrying a single gene and selection marker, to introduce the genes into host cells. Another is to use polycistronic constructs for the expression of multiple genes from a single vector under the control of a single promoter, with each gene having its own ribosome binding site. A third is to use a single vector in which each gene is regulated individually by a separate promoter, which may be the same or different. The cloning strategy should be flexible enough to allow the combinatorial approach that is often required for multi-gene constructs and PCR-based methods.

However, obtaining a sufficient amount of soluble protein relies not only on the insertion of the gene into a particular vector, but also on the selected expression system, the conditions used and the vector topology [81].

Complications may arise with genes that contain internal restriction sites that are also present in the cloning site; with expression constructs that have poor compatibility between the cloning sites and vectors; when prokaryotic transcription and translation systems produce a poorly performing enzyme; when the unintentional introduction of additional nucleotides into the coding sequence adds non-active extra amino acids to the EOI that may hinder activity; or with repetitive protein-coding genes. Fortunately, solutions to these problems have been developed, such as multiple-host vectors that permit expression of the same construct in bacterial, insect and mammalian cells, cloning protocols that recombine multiple fragments with no sequence homology, and powerful recombinant vector systems for multiprotein assemblies [82], among others. Therefore, with several cloning strategies available, more than one option is likely to be useful (Table 2).
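The first complication, internal restriction sites within the gene, can be checked before choosing the cloning enzymes. The sketch below scans a coding sequence for the recognition sequences of three common enzymes; the function name and the example sequence are hypothetical, and a real workflow would normally use a curated enzyme database rather than this short dictionary.

```python
# Recognition sequences of three common restriction enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_internal_sites(gene, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for sites found in gene."""
    gene = gene.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = gene.find(motif)
        while start != -1:
            positions.append(start)
            start = gene.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical coding sequence with one internal EcoRI site.
example_gene = "ATGGCTGAATTCAAAGGTTAA"
print(find_internal_sites(example_gene))
```

Only one strand is scanned because the three recognition sequences listed are palindromic, so a site on the complementary strand appears at the same position.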

Table 2 Typical cloning strategies commercially available

Generation of mutations and recombination in the gene (or genes) of interest is of utmost importance for protein engineering. Parent genetic template sequences, either created in the laboratory or of natural origin, are modified into novel combinations of sequence information to generate mutant enzyme libraries, which are later screened for the desired function.

Dozens of methods have been developed for DNA manipulation to create specific constructs in a broad range of organisms, with the intention of introducing insertions or deletions, or of mutating a single site or various sites at random. Generally, two different approaches for the generation of mutant libraries exist: asexual (non-recombinant) and sexual (recombinant).

For example, site-directed mutagenesis is a very important asexual method in the study of structure–function relationships of genes and proteins. It is a technique that uses two complementary synthetic oligonucleotides, which contain the mutation at the corresponding codon, to introduce a single point mutation in the gene (Fig. 1). Site-directed random mutagenesis (site-saturation mutagenesis) is derived from this technique and uses oligonucleotides containing a degenerate codon that encodes all amino acids in order to mutate a selected position in the gene [77]. Because it targets a specific residue, mutants in the library tend to maintain the WT structure of the enzyme with a high probability of functionality, and there are several cases where it has been successful for altering substrate specificity or enantioselectivity, creating novel catalytic activity, enhancing stability and reducing immunogenicity, among many other applications. It is a PCR-based method with full amplification of the gene and vector; to eliminate the original WT DNA, DpnI is used, as it is an enzyme that degrades methylated DNA. However, the use of complementary oligonucleotides imposes limitations: several mutations cannot be introduced at once, and primer dimers may be produced [87, 88].

Fig. 1

Site-directed mutagenesis creates targeted changes in a dsDNA plasmid by using designed oligonucleotide primers (~ 25 bp) in a regular PCR that amplifies the full vector and introduces the desired mutation for a selected amino acid. It can be used to change a single codon to another, or to add or remove a small sequence of codons (~ 20 nucleotides). In site-saturation mutagenesis, the primers are degenerate at a designated codon (shown in the diagram as striped nucleotides or an X), either NNN (where N is any nucleotide) or NNK (where K is either a T or a G). The original methylated template must be digested with DpnI before transformation. The primers yield a doubly nicked circular plasmid that can be directly transformed into competent bacteria, where it is ligated in vivo into circular DNA by the bacteria, or it can be ligated with an added kinase and ligase
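The difference between the NNN and NNK degenerate codons can be made concrete by enumerating the codons each one encodes with the standard genetic code. This is a minimal sketch; the helper names `expand` and `summarize` are introduced here only for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons ordered T, C, A, G at each position.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate symbols used in this example.
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(degenerate_codon):
    """List all concrete codons encoded by a degenerate codon such as NNK."""
    choices = [DEGENERATE.get(b, b) for b in degenerate_codon]
    return ["".join(c) for c in product(*choices)]


def summarize(degenerate_codon):
    """Count codons, distinct amino acids and stop codons for a scheme."""
    codons = expand(degenerate_codon)
    encoded = [CODON_TABLE[c] for c in codons]
    return {
        "codons": len(codons),
        "amino_acids": len(set(encoded) - {"*"}),
        "stop_codons": encoded.count("*"),
    }


for scheme in ("NNN", "NNK"):
    print(scheme, summarize(scheme))
```

NNK halves the number of codons (32 instead of 64) and keeps only one stop codon (TAG) while still encoding all 20 amino acids, which reduces the number of clones that must be screened per randomized position.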

Combinatorial cassette mutagenesis is another asexual technique, used for simultaneous saturation mutagenesis at multiple sites, as it is capable of introducing segments of randomly arranged DNA into a target sequence without adding other non-native sequence. It requires a DNA cassette with restriction sites available for inserting the gene(s) of interest, or the mutated sequences flanked by two restriction sites with the same cleavable motif as the cassette, for their ligation. If the restriction sites are not present in the cassette, in the sequence of interest, or in both, they can be introduced by site-directed mutagenesis with primers encoding the specific restriction site for the enzyme to be used. This method can easily generate combinatorial libraries whose members, after ligation and protein expression, can be selected according to their functionality (Fig. 2) [4, 78].

Fig. 2

Combinatorial cassette mutagenesis allows the insertion of mutagenic oligodeoxynucleotide cassettes. It is performed by inserting the DNA sequence (dark grey) into the cassette (light grey) using the corresponding restriction sites (in the diagram: EcoRI, black for cassette restriction sites and light grey for sequence restriction sites). In both cases, the gene fragment to be inserted and the DNA into which it will be ligated must have sticky ends so that they align and can later be covalently joined by a DNA ligase. After ligation takes place, this construct can be used for cell transformation

Error-prone PCR (ep-PCR) is considered an asexual technique based on inaccurate copying of a sequence by DNA polymerase, which, depending on the experimental conditions or the type of enzyme used, may add 23 incorrect bases to each replicated DNA strand of the gene. Unlike site-directed random mutagenesis or cassette mutagenesis, it allows the introduction of random mutations across a wide range of the target gene, simulating the natural process of random mutagenesis. It is therefore a widely used technique, but the biased occurrence of amino acids is an intrinsic drawback (Fig. 3).

Fig. 3

Error-prone PCR (ep-PCR) is a PCR that takes advantage of the low fidelity of Taq polymerase and other engineered polymerases to randomly insert mispaired bases during DNA polymerization. Mutations can also be promoted by creating non-ideal reaction conditions, such as using unbalanced dNTP concentrations (dCTP and dTTP versus dGTP and dATP) or increasing the concentration of MgCl2 or MnCl2. The sporadic mutated bases along the DNA segment (shown in the diagram as striped nucleotides or an X) produce random mutations that change the encoded codon. After generation of the DNA, the sequence can be subjected to site-directed mutagenesis to introduce restriction sites for subsequent ligation into a vector
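The random distribution of substitutions produced by ep-PCR can be sketched with a simple simulation. The code below is a toy model under stated assumptions: every base is substituted independently with the same probability, and the replacement base is chosen uniformly, whereas real polymerases show the mutational bias noted in the text. The template sequence, the error rate and all function names are hypothetical.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, rng):
    """Copy a DNA sequence, substituting each base with probability error_rate."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Substitute with one of the three other bases, chosen uniformly.
            # Real polymerases are biased; uniform choice is a simplification.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def make_library(template, error_rate, size, seed=0):
    """Generate `size` independently mutated copies of the template."""
    rng = random.Random(seed)
    return [error_prone_copy(template, error_rate, rng) for _ in range(size)]


def count_mutations(template, variant):
    """Number of positions at which the variant differs from the template."""
    return sum(a != b for a, b in zip(template, variant))


template = "ATGGCTAAAGGTGAACTGTTCACCGGTTAA" * 10   # hypothetical 300-nt sequence
library = make_library(template, error_rate=0.005, size=1000)
mean = sum(count_mutations(template, v) for v in library) / len(library)
print(f"mean substitutions per variant: {mean:.2f}")
```

Changing `error_rate` in this sketch plays the role of adjusting the reaction conditions: a higher value yields more substitutions per variant and, in practice, a larger share of non-functional mutants.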

DNA shuffling is the random recombination of DNA fragments from homologous genes into full-length chimeric genes. It simulates the process of natural recombination by digesting DNA sequences of homologous genes and combining the fragments through denaturation, annealing and elongation with a DNA polymerase (Fig. 4). In comparison with ep-PCR, DNA shuffling can be applied to sequences > 1 kb, while ep-PCR does not allow fragments > 0.5–1.0 kb. However, both techniques can be used on a pool of unknown sequences and share a similar mutagenesis rate. DNA shuffling with WT DNA removes neutral mutations produced by repeated cycles of any mutagenesis strategy [89].

Fig. 4

DNA shuffling is a technique for the recombination of homologous gene sequences (gene 1 black and gene 2 grey). The genes are digested with DNase to obtain random small fragments, which are subjected to melting, annealing and extension in a PCR-like process with no added primers [90]. Finally, the sequences obtained can be cloned into a vector once the restriction sites are inserted in the sequence by site-directed mutagenesis
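The outcome of shuffling, a chimera assembled from alternating parental segments, can be sketched as follows. This is a deliberately simplified model: it assumes pre-aligned parents of equal length and switches template at random points, whereas real reassembly depends on the annealing of homologous fragments. The function name, the fragment length and the parent sequences are hypothetical.

```python
import random


def shuffle_parents(parents, mean_fragment_length, rng):
    """Build one chimera by switching parent template at random crossover points.

    Assumes all parents are pre-aligned and of equal length, so a fragment
    from one parent can be followed by the next fragment from another.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    chimera = []
    position = 0
    while position < length:
        # Fragment length drawn around the requested mean (at least 1 nt).
        frag_len = max(1, int(rng.expovariate(1.0 / mean_fragment_length)))
        source = rng.choice(parents)
        chimera.append(source[position:position + frag_len])
        position += frag_len
    return "".join(chimera)


rng = random.Random(1)
gene_1 = "A" * 60   # hypothetical parents, written as homopolymers so that
gene_2 = "C" * 60   # the parental origin of each segment is visible
print(shuffle_parents([gene_1, gene_2], mean_fragment_length=15, rng=rng))
```

Writing the parents as homopolymers is only a display trick: each run of A or C in the output marks a segment inherited from gene 1 or gene 2.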

DNA shuffling was demonstrated by Stemmer in 1994 and since then, numerous in vitro recombination methods have been developed. Some of these methods for recombination and also methods for asexual mutations are described in Table 3.

Table 3 Typical methodologies to obtain mutations or recombination

Other methods involve chemical or physical random mutagenesis, based on the variety of chemical reagents and physical factors reported to mutagenize DNA by inducing alkylation, deamination, pyrimidine dimers and base oxidation, among others.

There are various perspectives on the advantages and disadvantages of each method. From the point of view of natural evolution, which is a function of variation and selection, computational simulation studies of the evolution of protein sequences and their structural classes have demonstrated the role of homologous recombination in the evolution of biological systems. Recombination appears remarkably advantageous because it combines valuable mutations that have arisen independently and may be synergistic, while simultaneously removing mutations that decrease the fitness of an organism. Unfortunately, all these methods generate large numbers of mutants, most of them non-functional, so the screening is extensive and requires high-throughput methods that are costly and time-consuming.

Using strategies to achieve enzyme modification for stability, activity and selectivity

Protein modification may involve producing multiple variants, as the number of possible amino acid mutations is vast. However, random mutant libraries of a few hundred or a few thousand mutants are often enough to obtain a stable and active enzyme, as it is not necessary to exhaustively examine every single position of an EOI, but rather to create a random mutagenesis library that yields a statistically significant number of mutants with the desired features when compared with the WT protein. After screening, a few variants may be chosen and, if necessary, deeper work on the selected mutations can be performed [100].
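To see why exhaustive examination quickly becomes impractical, it helps to estimate how many clones must be screened to cover a library. The sketch below uses the common Poisson approximation, in which the expected fraction of variants sampled at least once is 1 - exp(-T/V) for T clones drawn from V equally likely variants. Equal representation of all variants is an assumption that real libraries rarely meet, and the function names are introduced only for illustration.

```python
import math


def expected_coverage(library_size, clones_screened):
    """Expected fraction of distinct variants sampled at least once."""
    return 1.0 - math.exp(-clones_screened / library_size)


def clones_for_coverage(library_size, coverage):
    """Clones to screen so that the expected coverage reaches `coverage`."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be between 0 and 1 (exclusive)")
    return math.ceil(-library_size * math.log(1.0 - coverage))


for sites in (1, 2, 3):
    variants = 32 ** sites          # NNK codons at each randomized position
    print(sites, variants, clones_for_coverage(variants, 0.95))
```

Under these assumptions, roughly three clones per variant are needed for 95% expected coverage, so the screening effort grows exponentially with the number of positions randomized simultaneously.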

Site-directed mutagenesis is commonly used in rational protein design to manipulate the fine balance between structure, ligands and protein recognition in order to obtain new desirable features and, in some cases, even to transform the activity of the EOI. This process occurs naturally in divergent evolution across species and is followed by functional evolution or neo-functionalization [101]. For instance, the membrane-associated guanylate kinase (MAGUK) contains a GUK domain (GUKdom) that is used to interact with proteins of the cytoskeleton for adhesion and signal transduction. However, other enzymes have the GUK catalytic domain (GUKenz), which is used to bind ATP [101, 102]. Comparison of these two GUK domain sequences in Drosophila revealed that, aside from a P-loop, the sole residue that differs is Ser-68 in GUKenz, which is a Pro in GUKdom; it is the key residue marking the functional difference between a nucleotide kinase and a recognition domain [101].

Enzymes must be functional and stable in the cell, so their folding, aggregation and degradation are primary structural aspects. In directed evolution, the more stable the EOI is, the greater its tolerance for mutational changes and its evolvability. In this respect, mutations that increase stability can help to balance the disruptive effect of other mutations; unfortunately, these modifications are permanent throughout the functional cycle of a protein and do not provide any further advantage. An alternative is to use molecular chaperones, which provide temporary buffering that does not affect the production and concentration of the enzyme. Moreover, chaperones not only buffer the destabilizing effect of some mutations but also help with the folding of the protein [19, 103].

During the interaction of enzyme and substrate, the availability and location of the binding site are central for activity. Therefore, orienting the EOI in a specific alignment through immobilization is one way to promote higher activity and to carry out sequential synthesis in a single step by co-immobilizing different related enzymes [104]. Site-directed mutagenesis has facilitated enzyme immobilization and, in some cases, has even improved thermostability and stability in organic solvents [105]. Additionally, cell surface display, a technique that permits the expression of proteins fused to membrane proteins of bacteria, phage or yeast, works similarly to immobilization and has allowed the development of vaccines, drug-delivery systems, new catalysts and bioremediation alternatives [8].

Screening for the enzyme of interest

Mutagenesis is the starting point in protein engineering, and suitable screening is a key step in this process for obtaining the variants of interest in a short time and with less labour. The screening strategy should consider correct production of the protein by the host, the efficiency of transformation, assay development and reproducibility, and the equipment needed, among other factors. All techniques have advantages and disadvantages; thus, the idea is to follow a complementary approach when designing high-throughput strategies for protein purification and activity measurement (which should be straightforward, inexpensive and fast), either with assays for broad substrate specificity assessment or with substrate profiling, to screen the protein against the desired substrate.

It is best to have the protein as pure as possible, as impurities might interfere with the results. The screening of enzymatic activity involves measuring any of the components of a reaction: the free enzyme, the substrate or the product. Each of these may be detected directly by specific labelling or, if the molecule is not spectroscopically active, indirectly by coupling the reaction to a reporter molecule. Thus, isotope labelling, fluorescent products, detection antibodies and tags are highly desirable in these experiments. If these are not available, other instrumental techniques, such as mass spectrometry or nuclear magnetic resonance, can be used to detect particular products and to study their structure and molecular changes; a brief description of these is presented in Table 4.

Table 4 Instrumental techniques to study proteins

The choice between a direct and an indirect detection method depends on the sensitivity of the detection method and on the stability of the measured component. Detection techniques for collecting reaction data may be limited by automation, real-time monitoring or sample processing requirements. In indirect detection schemes, similar rates of transformation and saturation are also important, so that the original reaction is not limited by the kinetics of the reporter reaction.

Determination of KM and kcat is of prime importance, as these parameters serve to standardize the assay and are critical for assessing the activity and stability of the expressed constructs, identifying inhibitors, and evaluating substrate binding, compound screening and other factors that have significant consequences in a biochemical reaction [116, 117].
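As an illustration of how KM and kcat can be extracted from initial-rate data, the sketch below applies a double-reciprocal (Lineweaver-Burk) linear fit, 1/v = (KM/Vmax)(1/[S]) + 1/Vmax, and then computes kcat = Vmax/[E]. The data are noise-free and synthetic, generated from hypothetical parameters, and the function name is an assumption. The double-reciprocal transform is used here only because it needs no external libraries; it amplifies experimental error at low substrate concentrations, so nonlinear regression is generally preferred for real data.

```python
def fit_lineweaver_burk(substrate_concs, rates):
    """Estimate (Vmax, KM) by least squares on 1/v = (KM/Vmax)(1/[S]) + 1/Vmax."""
    x = [1.0 / s for s in substrate_concs]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Noise-free synthetic data from hypothetical parameters (Vmax = 2.0, KM = 5.0).
substrate = [1.0, 2.0, 5.0, 10.0, 20.0]
rates = [2.0 * s / (5.0 + s) for s in substrate]
vmax, km = fit_lineweaver_burk(substrate, rates)
enzyme_conc = 0.01   # assumed total enzyme concentration, same units as Vmax per second
print(f"Vmax = {vmax:.3f}, KM = {km:.3f}, kcat = {vmax / enzyme_conc:.1f}")
```

With noise-free input the fit recovers the parameters used to generate the data, which makes the sketch a convenient check before applying the same steps to measured rates.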

Experimental techniques for assay screening of the EOI are varied and use different conditions, but some general rules apply: (1) the concentration of the EOI depends on the method of choice, but it is advisable to prepare dilutions in which the protein is completely dissolved and to select the appropriate dilution according to the signal obtained from the instrument (thus minimizing sample consumption and effort); (2) the tertiary structure of proteins is observed in the near-UV region, where aromatic amino acids absorb [118]; and (3) osmolytes modulate macromolecular properties according to their protonation states, which can lead to undesired results that do not normally occur under native conditions [119].

For protein stability, the free energy difference (ΔG°) between the folded and unfolded states is key to thermodynamic stability. However, only a very small fraction of the protein may be unfolded under native conditions, which makes the folded-unfolded equilibrium difficult to quantify, and it is therefore necessary to change the conditions to shift the equilibrium [82, 119, 120]. Thermostability can be tested by measuring the residual activity after heat exposure, since most proteins lose activity as they denature and precipitate at high temperatures [119]. However, thermostability and even proteolytic resistance do not always correlate with solubility and functionality, as there are cases where the protein variants are expressed as insoluble inclusion bodies [19].
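One common way to shift the folding equilibrium is chemical denaturation followed by linear extrapolation back to native conditions. The Python sketch below illustrates that idea under explicit assumptions that go beyond the text above: a two-state (folded/unfolded) model, a linear dependence of ΔG on denaturant concentration (ΔG = ΔG_water − m[D]), and synthetic noise-free data. All function names and numerical values are hypothetical.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol K)


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def unfolding_free_energy(fraction_unfolded, temperature_k):
    """Two-state model: K = fU / (1 - fU) and dG = -RT ln K (kJ/mol)."""
    k_unfold = fraction_unfolded / (1.0 - fraction_unfolded)
    return -R_KJ * temperature_k * math.log(k_unfold)


def extrapolate_to_water(denaturant_molar, fractions_unfolded, temperature_k):
    """Fit dG against [denaturant]; return (dG at 0 M, m-value)."""
    dg = [unfolding_free_energy(f, temperature_k) for f in fractions_unfolded]
    slope, intercept = linear_fit(denaturant_molar, dg)
    return intercept, -slope


# Hypothetical transition-region data generated from
# dG_water = 20 kJ/mol and m = 5 kJ/(mol M) at 298.15 K.
TRUE_DG_WATER, TRUE_M, TEMP_K = 20.0, 5.0, 298.15
denaturant = [3.0, 3.5, 4.0, 4.5, 5.0]  # mol/L
fractions = [
    1.0 / (1.0 + math.exp((TRUE_DG_WATER - TRUE_M * d) / (R_KJ * TEMP_K)))
    for d in denaturant
]

dg_water, m_value = extrapolate_to_water(denaturant, fractions, TEMP_K)
print(f"dG(water) = {dg_water:.2f} kJ/mol, m = {m_value:.2f} kJ/(mol M)")
```

Only points within the unfolding transition are used, because fractions very close to 0 or 1 make the logarithm of the equilibrium constant highly sensitive to measurement error.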

Solubility is an essential feature, but it is often difficult to assess because of changes in viscosity, pH shifts, binding to surfaces or aggregation, among other factors [121]. Normally, the activity of the cell lysate is measured to identify the concentration at which the protein is soluble and active [119, 122]. Another option is an indirect assay with a reporter protein whose activity is related to the solubility of the EOI. Typical examples of reporter proteins are the green fluorescent protein, β-galactosidase, dihydrofolate reductase or an antibiotic resistance protein [19].

Predictive methods for solubility have also been developed; unfortunately, they require specific conditions, as they rely on a linear relationship between protein solubility and the addition of a co-solute of low viscosity and insignificant denaturing effect, such that the amount of precipitated protein increases in proportion to the amount of polymer [121, 123].

Modification of specificity vs selectivity of the enzyme

Enzymes can be selective catalysts that bind preferentially to their substrate while still being able to bind other compounds with lower affinity. The specificity of an enzyme, on the other hand, arises from its exclusive reaction with a definite substrate. It is proposed that a good substrate can cause a conformational change in the active site of the enzyme and, similarly, that the enzyme can induce a transition state in the substrate so that both can react; consequently, specificity reflects the rate at which a particular substrate reacts with the enzyme rather than the tendency of the enzyme and an analyte to bind each other [124].

To measure the rate of reaction with a specific substrate, and under the assumption that enzymes operate under steady-state conditions, kcat and KM are determined experimentally, and the ratio kcat/KM is often used to assess enzymatic specificity and to compare the relative rates of reaction of the given substrates [125,126,127]. A comprehensive treatise on kcat and KM is beyond the scope of this review; for a fuller explanation of these concepts, we suggest the reviews by Cornish-Bowden [127] and Schnell [125].
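For orientation, the standard steady-state relations behind this use of kcat/KM can be written as follows. The notation is introduced here for illustration: [E]_0 denotes the total enzyme concentration, and A and B are two substrates competing for the same enzyme in one mixture.

```latex
\[
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_M + [S]},
\qquad
v \approx \frac{k_{\mathrm{cat}}}{K_M}\,[E]_0\,[S]
\quad \text{when } [S] \ll K_M
\]

\[
\frac{v_A}{v_B}
  = \frac{\left(k_{\mathrm{cat}}/K_M\right)_A\,[A]}
         {\left(k_{\mathrm{cat}}/K_M\right)_B\,[B]}
\]
```

The second expression shows why the ratio, rather than kcat or KM alone, is used to compare substrates: at equal concentrations of A and B, the relative rates of conversion are given directly by the ratio of their kcat/KM values.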

A large kcat/KM (arising from a large kcat or a small KM) indicates optimal kinetics of the system and high enzyme specificity, which includes control over the isomer specificity of substrate and product. One example is pyridoxal phosphate (PLP), a general cofactor that can catalyse multiple reactions such as transamination, racemization and decarboxylation; in contrast, PLP-dependent enzymes present characteristic and unique structural features that restrict them to a specific reaction, providing the specific stereoelectronic effects and chemical conditions that are exploited by the catalytic mechanism of the enzyme [128,129,130].

Another case that illustrates enzyme specificity is histone modification, where the reactions for post-translational modification are not only specific, but also reversible, dynamic and highly controlled, as they intervene in the interconversion of euchromatin to heterochromatin and modify gene expression (in eukaryotes) [131, 132]. Methyltransferases (MTases) are part of these genome-editing tools and use the cofactor S-adenosyl-L-methionine (SAM) to mono-, di- or tri-methylate Arg and Lys residues of histones [132,133,134]. A conserved SET domain is shared by most types of Lys-MTases; this domain has a hydrophobic structure known as the Lys-channel, where the Lys reacts with the cofactor and the nucleophilic attack on SAM for methylation occurs. This structure also confers the specificity of the MTase to mono-, di- or tri-methylate the terminal amino group of Lys. Structural analysis of the individual residues that define the internal size of the Lys-channel shows that specific amino acids at a conserved structural position determine the specificity for the methylation state of the products of Lys-MTases. This conserved position is known as the Tyr/Phe switch, as either amino acid can be found there and determines whether the enzyme is able to transfer up to three methyl groups to its substrate (Phe) or is limited to mono-methylation (Tyr). Site-directed mutagenesis in the Lys-channel to exchange Phe for Tyr, or the opposite, allows various degrees of Lys methylation to be either included or excluded without altering the general catalysis of the MTase [135,136,137,138,139,140,141,142,143,144,145,146,147].

Furthermore, although the substrate-binding sequence of some MTases is present in several hundred non-histone cellular proteins, in vivo studies indicate that the substrate amino acid sequence alone is not enough for activity and that the adoption of a precise conformation of the protein structure is required; consequently, only a few non-histone proteins are substrates of MTases [148,149,150].

It is possible to modify and re-design the active site of an enzyme to improve or even change its substrate specificity and enantiomeric selectivity. Since enantiomers are mirror images, mutations must be designed as if the enzyme had to recognize and bind a whole new molecule [151, 152]. Physicochemical conditions, such as high hydrostatic pressure, have been used to change the stereoselectivity of an enzymatic reaction without the need for mutagenesis [153, 154]. Rational design in protein engineering has been used to change specificity and additionally provides an understanding of the molecular basis of substrate specificity and chiral selectivity of enzymes [155, 156]. Mutagenesis through directed evolution is a viable option to change specificity as long as the experimental design and selection method are adequate, even if the structural features of the enzyme are not well understood [91, 157].

Protein engineering has been applied to obtain orthogonality of enzymatic systems. Bioorthogonality is an important feature of artificial systems that must avoid any interaction with the inherent biochemical pathways of the cell. In orthogonal enzymatic labelling, the aim is to selectively attach tags to proteins using an enzyme with higher affinity for a synthetic analogue cofactor than for the WT cofactor, which might be highly abundant in the native cellular environment. This involves adjusting the structures in the enzyme-cofactor-substrate complex that give rise to the natural specificity, in order to re-design crucial interactions and thus enable the desired orthogonality [158,159,160]. The engineered enzyme in these bioorthogonal systems therefore shows exclusive activity with an artificial substrate and cofactor, while it neither binds to nor reacts with the native molecules.

Conclusions and future perspectives

In this review, we have shown that protein engineering is a wide and developing area with an open range of methods and techniques that are constantly improving to yield the desired features in biocatalysts. This area combines both rational and empirical knowledge, along with various fields and analytical tools, and therefore lies at the interface between chemistry, biology and biotechnology. Moreover, advances in protein engineering and chemical proteomics not only suggest applications in various areas, such as environmental bioremediation or the production of new drugs, but also have the potential to improve our understanding of enzymes in aspects such as structure and activity, by providing both qualitative and quantitative information.

Improved knowledge of protein modification and biocatalysis enables research that combines computational and experimental methods, making protein engineering faster and cheaper, and on many occasions yielding remarkable results, such as improved activity, stability, selectivity or even orthogonality beyond a traditional design. As mentioned above, the trend is for bioinformatics and cheminformatics to continue to cross over and assist in the development of enzyme engineering.

With the novel goals in synthetic biology and biocatalysis, such as the optimization of biosynthetic pathways, genetic circuits and genome engineering, novel tools and new knowledge will come forward to satisfy the needs of research and our understanding of life from a multilevel biological perspective, that is, from the genome to the transcriptome, the proteome and epigenetics.