Introduction

The major goal of protein engineering is to design novel proteins and to develop the protein-based bioprocesses required for producing high-value biomolecules in the pharmaceutical, agricultural, industrial, and environmental fields [1]. For industrial application, an engineered protein or enzyme should retain stable activity even under harsh conditions such as high temperature, salinity, and acidity [2]. To develop novel proteins for industrial application via a rational design strategy, the molecular-level mechanisms of protein thermostability must be revealed and understood [3, 4]. One research approach is to analyze the thermostabilizing factors found in proteins from thermophilic organisms, since thermophilic proteins show substantially higher intrinsic thermostability than their counterparts from mesophilic organisms while retaining the basic fold characteristic of the particular protein family [5, 6].

Although several comparative studies between thermophilic proteins and their mesophilic counterparts have suggested thermostabilizing factors [7–11], their results cannot be accepted as general principles governing protein thermostability. The reason is that the results vary considerably among protein families; that is, each comparative study yields only a case-by-case result for its own protein family. To find more general patterns of protein thermostability, systematic analyses have been performed to identify thermostabilizing factors that are common in several families of thermophilic proteins but not in mesophilic ones [12–17]. However, these systematic analyses investigated only simple features, for example, the total number of residual interactions or the total size or volume of the hydrophobic area or electronegative surface, so the findings did not help in understanding the detailed thermostabilizing mechanisms that could serve as practical guidelines for engineering protein thermostability [1]. In other words, because the conventional systematic analyses focused on overly general features, it was difficult to identify the residue-level aspects or factors of protein thermostability.

In previous reports, we analyzed the residue-level features of thermostabilizing factors in a systematic fashion by employing the concept of residual structure states estimated from the packing value [18–20]. Taking the residual structure states into account, we compared the distributions of amino acid types, residual properties, and structural features between thermophilic and mesophilic protein families. We found distinctive residual patterns related to protein thermostability [18–20]; that is, we showed how differently the amino acids, residual properties, and structural features are distributed in thermophilic protein families compared with mesophilic ones [18–20].

Herein, to identify the residual thermostabilizing factors, we took a different approach and investigated the relative occurrences of amino acid types, residual properties, and secondary structure types in each residual structure state. We carried out a comparative analysis of 20 pairs of thermophilic and mesophilic proteins to obtain more explicit thermostabilizing factors. These analyses provided clues as to which residual factors in each residual structure state could contribute more to protein thermostability.

Materials and Methods

Protein Model System

All the protein structural data used in this study were obtained from the Protein Data Bank (PDB) atomic coordinate database at the Research Collaboratory for Structural Bioinformatics (RCSB) [21]. The criteria for selecting the thermophilic and mesophilic protein structure pairs were based on our previous reports [18–20], and 20 thermophilic/mesophilic protein pairs were used. The pairs used in this study are as follows (PDB code of thermophilic/mesophilic): 1zin/1aky, 1tmy/3chy, 1aj8/1csh, 1tfe/1efu_b, 1yna/1xnb, 1gtm/1hrd, 1hdg/1gad, 2prd/1ino, 1ldn/1ldg, 1bdm/4mdh, 3mds/1qmn, 1xgs/1mat, 3pfk/2pfk, 1php/1qpg, 1ebd/1lpf, 1ril/2rn2, 1caa/8rxn, 1thm/1st3, 1lnf/1npc, and 1btm/1ypi.

Residual Structure States

The residual structure states were determined from the residual packing value of each residue, which was calculated by an extension of the occluded surface algorithm [22, 23]. The residual packing value equals 0.0 if there is no occluding van der Waals surface within 2.8 Å of the molecular surface, and it equals 1.0 if 100 % of the molecular surface is in contact with the van der Waals surface of other atoms. On the basis of the calculated residual packing values, five residual structure states were defined as follows: in the fully exposed state (FE state), the residues have a residual packing value of 0 to 0.15; in the exposed state (E state), 0.15 to 0.30; in the partially exposed (or partially buried) state (P state), 0.30 to 0.45; in the buried state (B state), 0.45 to 0.60; and in the well-buried state (WB state), 0.60 to 0.75.
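The state assignment above can be sketched as a simple threshold lookup. This is a minimal illustration, not the authors' implementation: the function name `structure_state` is hypothetical, and the handling of boundary values (half-open intervals, with 0.75 included in the WB state) is an assumption, since the text does not say to which state a value such as 0.15 belongs.

```python
from typing import Optional

# Upper bound of each residual structure state, as listed in the text.
# Assumption: intervals are half-open, [lower, upper), except the last one.
STATE_BOUNDS = [
    (0.15, "FE"),  # fully exposed
    (0.30, "E"),   # exposed
    (0.45, "P"),   # partially exposed (or partially buried)
    (0.60, "B"),   # buried
    (0.75, "WB"),  # well-buried
]


def structure_state(packing_value: float) -> Optional[str]:
    """Map a residual packing value to a structure state label.

    Returns None for values above 0.75, which the text does not assign
    to any state.
    """
    if packing_value < 0.0:
        raise ValueError("packing value must be non-negative")
    for upper, label in STATE_BOUNDS:
        if packing_value < upper:
            return label
    # Assumption: the upper limit 0.75 itself still counts as well-buried.
    if packing_value == 0.75:
        return "WB"
    return None
```

For example, `structure_state(0.05)` gives `"FE"` and `structure_state(0.50)` gives `"B"`.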

Residual Properties Investigation

Flexible residues (Flex) and rigid residues (Rigid) were selected by comparing the α-carbon flexibility of the residues within a protein. The residual α-carbon flexibility was taken from the temperature factor (B value) of the α-carbon atoms in the PDB data [24]. In this study, flexible residues (Flex) were defined as the 10 % of residues showing the highest α-carbon flexibility among all the residues in a protein. Rigid residues (Rigid) were defined as the 10 % of residues showing the lowest α-carbon flexibility.
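The 10 % selection can be sketched as a rank-based cut on the per-residue B values. The helper name `top_fraction_indices` is hypothetical, and rounding the group size to the nearest integer (with a minimum of one residue) is an assumption, because the text does not state how fractional counts were handled.

```python
def top_fraction_indices(values, fraction=0.10, highest=True):
    """Return sorted indices of the residues in the top `fraction` of `values`.

    values   : per-residue numbers, e.g. alpha-carbon B values
    highest  : True selects the largest values (Flex),
               False selects the smallest values (Rigid)
    """
    # Assumption: group size is rounded to the nearest integer, at least 1.
    n_select = max(1, round(len(values) * fraction))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest)
    return sorted(order[:n_select])


# Illustrative B values for a 20-residue protein (not data from the study).
b_values = [10.0 + i for i in range(20)]
flex = top_fraction_indices(b_values, highest=True)    # most flexible residues
rigid = top_fraction_indices(b_values, highest=False)  # most rigid residues
```

The same helper would apply to any per-residue quantity ranked in this way.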

To describe the relationship with water solvation, residues with high solvation energy (HSE) and low solvation energy (LSE) in the native state of the protein were identified by comparing the residual solvation energies within a protein. The residual solvation free energies were calculated from the atomic solvation parameters of Eisenberg and McLachlan [25]. For a given residue, the residual solvation free energy is the sum of the contributions from the individual atoms in the residue (neglecting hydrogen atoms), taking into account their solvent-exposed surface area in the native state of the protein. Residues with HSE were defined as the 10 % of residues showing the highest solvation energy among all the residues in a protein structure. Residues with LSE were defined as the 10 % of residues showing the lowest solvation energy.
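The per-residue sum described above can be written as a short sketch. The function name, the atom-type labels, and the numerical parameters in the usage lines are placeholders chosen for illustration only; the actual atomic solvation parameters are those of reference [25] and are not reproduced here.

```python
def residue_solvation_energy(atoms, solvation_params):
    """Sum atomic contributions: parameter(atom type) x exposed surface area.

    atoms            : iterable of (atom_type, exposed_area) pairs for the
                       non-hydrogen atoms of one residue
    solvation_params : dict mapping atom_type to its solvation parameter
    """
    return sum(solvation_params[atom_type] * area for atom_type, area in atoms)


# Placeholder parameters and areas, NOT the published values.
example_params = {"C": 1.0, "N/O": -0.5}
example_atoms = [("C", 10.0), ("N/O", 4.0)]
energy = residue_solvation_energy(example_atoms, example_params)
```

Residues could then be ranked by this value to pick the HSE and LSE groups.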

The number of hydrogen bonds (Hbond) was calculated by counting the non-hydrogen atoms of each residue that are involved in donor and acceptor hydrogen bonds, which were identified when the donor and acceptor were within 4.0 Å [26]. The number of hydrogen bonds was calculated with the portable module of Biopolymer on SYBYL. The numbers of salt bridges (SB), cation–pi interactions (Cat–pi), and disulfide bonds (SSbond) were each estimated with the Protein Explorer package provided by the Department of Microbiology, University of Massachusetts Amherst. A salt bridge is assigned to two atoms of opposite charge when the atoms are within 4.0 Å. Positively charged atoms include the side chain N atoms in ARG, LYS, and HIS, while negatively charged atoms include the side chain O atoms in ASP and GLU [27]. A cation–pi interaction is assigned to an aromatic residue when a cationic side chain of ARG or LYS is near an aromatic side chain of PHE, TRP, or TYR. Ninety-nine percent of significant cation–pi interactions occur within a distance of 6.0 Å [28].
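The salt bridge criterion is a distance cutoff between oppositely charged side chain atoms, which can be sketched as follows. This is not the Protein Explorer implementation: the function name, the input layout, and treating the 4.0 Å limit as inclusive are assumptions made for illustration.

```python
import math
from itertools import product

SALT_BRIDGE_CUTOFF = 4.0  # angstroms, from the text


def salt_bridge_pairs(positive_atoms, negative_atoms, cutoff=SALT_BRIDGE_CUTOFF):
    """Return residue pairs with oppositely charged atoms within `cutoff`.

    positive_atoms : list of (residue_id, (x, y, z)) for side chain N atoms
                     of ARG, LYS, and HIS
    negative_atoms : list of (residue_id, (x, y, z)) for side chain O atoms
                     of ASP and GLU
    """
    pairs = set()
    for (res_p, xyz_p), (res_n, xyz_n) in product(positive_atoms, negative_atoms):
        # Assumption: "within 4.0 A" includes a distance of exactly 4.0 A.
        if math.dist(xyz_p, xyz_n) <= cutoff:
            pairs.add((res_p, res_n))
    return sorted(pairs)


# Illustrative coordinates (not taken from any PDB entry).
positives = [("ARG10", (0.0, 0.0, 0.0)), ("LYS20", (10.0, 0.0, 0.0))]
negatives = [("GLU15", (3.0, 0.0, 0.0)), ("ASP30", (20.0, 0.0, 0.0))]
bridges = salt_bridge_pairs(positives, negatives)
```

Using a set means a residue pair is counted once even if several of its atoms fall within the cutoff, which is one possible counting convention.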

The secondary structure was determined by the Kabsch–Sander procedure [29]. The one-letter code of the secondary structure state was assigned to the residue as follows: E is the residue in the extended strand that participates in the beta ladder, B is in an isolated beta bridge, H is in the alpha helix, G is in the 3/10 helix, and T is in the hydrogen bonded turn. The residue irrelevant to any secondary structure (e.g., residues in loop) was not assigned with the one-letter code.
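Counting residues per secondary structure type from a string of one-letter codes can be sketched as below. The function name is hypothetical, and treating every character other than E, B, H, G, and T as unassigned is an assumption that follows the five codes listed above.

```python
from collections import Counter

# One-letter codes used in the text and their meaning.
SS_NAMES = {
    "E": "extended strand in beta ladder",
    "B": "isolated beta bridge",
    "H": "alpha helix",
    "G": "3/10 helix",
    "T": "hydrogen bonded turn",
}


def count_secondary_structure(codes):
    """Count residues per secondary structure code, ignoring unassigned ones."""
    return Counter(code for code in codes if code in SS_NAMES)


# Illustrative assignment string (not from a real protein).
counts = count_secondary_structure("HHHH EE-GGGT")
```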

Statistical Analysis

The relative occurrence of a residual factor in each residual structure state i was defined as (occurrence of the residual factor)/(total number of residues in residual structure state i). To select the residual factors with a higher relative occurrence in each residual structure state of thermophilic proteins than of mesophilic proteins, the average relative occurrences (R_i) of the two groups were compared by calculating a t value. The t_i value is calculated as follows:

$$ t_i = \frac{R_{i\text{-}\mathrm{Th}} - R_{i\text{-}\mathrm{Me}}}{\sqrt{\dfrac{S^2_{i\text{-}\mathrm{Th}}}{N_{\mathrm{Th}}} + \dfrac{S^2_{i\text{-}\mathrm{Me}}}{N_{\mathrm{Me}}}}}, $$
(1)

where R_{i-Th} and R_{i-Me} are the average relative occurrences of the residual factor in residual structure state i for the thermophilic and mesophilic protein groups, respectively; S^2_{i-Th} and S^2_{i-Me} are the variances of the relative occurrences in the thermophilic and mesophilic protein groups, respectively; and N_Th and N_Me are the total numbers of proteins in the thermophilic and mesophilic groups, respectively. The degrees of freedom, df (= N_Th + N_Me − 2), is 38, which is large enough for the sample to be treated as infinite. Critical levels of t_0.1 (= 1.282) and t_0.01 (= 2.326) in a one-tailed t test (with df = infinite) were used as criteria for selecting the distinctive residual factors of thermophilic proteins [30].
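As a minimal sketch of Eq. (1), the relative occurrence and the t value can be computed as follows. The function names are hypothetical, the use of the sample variance (n − 1 denominator) for S² is an assumption, and the numbers in the usage lines are illustrative rather than data from this study.

```python
import math
from statistics import mean, variance

T_CRIT_0_1 = 1.282   # one-tailed critical value, df treated as infinite
T_CRIT_0_01 = 2.326  # one-tailed critical value, df treated as infinite


def relative_occurrence(factor_count, state_residue_count):
    """Occurrence of a residual factor divided by residues in the state."""
    return factor_count / state_residue_count


def t_value(r_thermo, r_meso):
    """t_i of Eq. (1) from per-protein relative occurrences of each group."""
    numerator = mean(r_thermo) - mean(r_meso)
    # Assumption: S^2 is the sample variance (n - 1 in the denominator).
    denominator = math.sqrt(
        variance(r_thermo) / len(r_thermo) + variance(r_meso) / len(r_meso)
    )
    return numerator / denominator


def significance(t):
    """Label a t value against the two critical levels, in either direction."""
    if abs(t) >= T_CRIT_0_01:
        return "0.01"
    if abs(t) >= T_CRIT_0_1:
        return "0.1"
    return "not distinctive"


# Illustrative relative occurrences for three proteins per group.
t_example = t_value([0.2, 0.4, 0.6], [0.1, 0.2, 0.3])
level = significance(t_example)
```

A positive t value indicates a higher relative occurrence in thermophilic proteins and a negative value a lower one, so the sketch compares the absolute value with the critical levels.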

Results and Discussion

Thermostabilizing Factors in Fully Exposed Residual State (FE State)

We investigated the relative occurrences of amino acids, residual properties, and secondary structure types in the FE state and compared each statistic between the thermophilic and mesophilic protein groups, as arranged in Table 1. The t test results indicated which residual factors in the FE state are more strongly related to protein thermostability. The relative occurrences of GLN, ILE, and PHE were distinctively higher in the FE state of thermophilic proteins than in that of mesophilic ones, with t values of 1.4208, 1.6278, and 1.2833, respectively. The other categories, residual properties and secondary structure types, showed no characteristic difference in the FE state.

Table 1 Different relative occurrences of amino acids, residual property, and secondary structure in fully exposed structure state (FE state) between thermophilic and mesophilic proteins

Compared with ASN and VAL, GLN, with its longer alkyl group, and ILE, with its longer chain, showed a larger difference in relative occurrence in the FE state of thermophilic proteins. In addition, PHE, an aromatic residue without a polar group, showed a high relative occurrence in the FE state compared with TRP and TYR, aromatic residues with a polar group. Interestingly, GLN and ILE can be regarded as having higher flexibility and higher solvation energy than ASN and VAL, and PHE as having higher solvation energy than TRP or TYR. Nevertheless, the residues with the top 10 % highest flexibility (Flex) and those with the top 10 % highest solvation energy (HSE) showed no distinctive differences. They were even found less often in the FE state of thermophilic proteins than in that of mesophilic ones (t values of −1.1643 and −0.8969, respectively). Two explanations are possible. One is that, in the fully exposed state, long-chained or more hydrophobic amino acids play important roles in protein thermostability among amino acids with similar properties, although they might not be the residues responsible for increasing structural flexibility and solvation energy. The other is that thermophilic proteins may be more tolerant than mesophilic proteins of such long-chained or more hydrophobic amino acids being placed in the FE state.

Thermostabilizing Factors in Exposed Residual State (E State)

The relative occurrences of amino acids, residual properties, and secondary structure types were investigated in the E state and statistically compared between thermophilic and mesophilic proteins, as arranged in Table 2. Among the amino acids, ALA, ARG, GLU, SER, and VAL showed distinctive relative occurrences in the E state of thermophilic proteins compared with mesophilic ones, with t values of −2.3812, 1.8547, 1.8874, −1.3134, and −1.3300, respectively. Among the residual properties, the relative occurrences of salt bridges (SB) and residues with LSE showed characteristic differences related to protein thermostability (t values of 2.4225 and 1.5801, respectively). Among the secondary structures, residues in the 3/10 helix (G) were found more often in thermophilic proteins (t value of 1.4897). In particular, the relative occurrences of salt bridges and ALA in the E state differed strongly between thermophilic and mesophilic proteins, since the absolute values of their t values exceed the critical value of t_0.01 (= 2.326): salt bridges were more frequent and ALA less frequent in thermophilic proteins.

Table 2 Different relative occurrences of amino acids, residual property, and secondary structure in exposed structure state (E state) between thermophilic and mesophilic proteins

A significant increase in the number of salt bridges has been reported for most structures of thermostable proteins [9, 31, 32]. Our previous results also showed that salt bridges have a higher preference for the exposed state of thermophilic proteins [18]. Salt bridges could be expected to stabilize the exposed part of a protein structure, which might be more flexible and less stable than the buried part. Among the residues that participate in salt bridges, only ARG and GLU showed a distinctive difference in relative occurrence. Compared with the other salt bridge participants (LYS or HIS and ASP), ARG and GLU could be expected to play important roles in the exposed state of thermophilic proteins by forming salt bridges, which could contribute to increasing the thermostability of the exposed structure. In addition, the charged residues are expected to contribute to thermostability, since residues with LSE had a higher relative occurrence in the exposed state of thermophilic proteins than in that of mesophilic ones. These results suggest that, in the E state, the charged residues (in particular, ARG and GLU) could contribute to electrostatic interactions and make the exposed surface favorable for water molecules (that is, LSE), which can be energetically effective for stabilizing the local conformation of the protein structure.

On the other hand, it is interesting that ALA, SER, and VAL had lower relative occurrences in the E state of thermophilic proteins. The finding that small aliphatic residues such as ALA and VAL were found less often in the E state of thermophilic proteins should be considered together with the higher occurrence of long-chained ILE and GLN in the FE state of thermophilic proteins, as mentioned in the above section; that is, in the fully exposed and exposed states of thermophilic proteins, residues with shorter alkyl groups have a low occurrence, whereas longer residues show the reverse. In the case of SER, its interaction with surrounding water molecules should be considered. Since the water that interacts with SER (usually through hydrogen bonding) would be released at higher temperature, the local protein structure around a water-binding site such as SER could change and become unstable enough to cause protein instability [33, 34].

In addition, residues in G (3/10 helix) structures showed higher relative occurrences in the E state of thermophilic proteins. Since the 3/10 helix is a variant of the alpha helix and is expected to play an important role in stabilizing the whole alpha-helix structure in the protein [35], the higher occurrence of the 3/10 helix in the E state of thermophilic proteins might be related to protein thermostability.

Thermostabilizing Factors in Partially Exposed (or Partially Buried) Residual State (P State)

We investigated the relative occurrences of amino acids, residual properties, and secondary structure types in the P state and compared each statistic between the thermophilic and mesophilic protein groups, as arranged in Table 3. A t test was again performed to analyze which factors in the P state are more strongly related to protein thermostability. Among the amino acids, the relative occurrence of SER in the P state was distinctively lower in thermophilic proteins (t value of −2.6958). Among the residual properties, the relative occurrence of flexible residues (Flex) in the P state was a characteristic feature of thermophilic proteins compared with mesophilic ones (t value of 2.7045). These two results can be considered outstanding patterns related to protein thermostability, since their t values are below the critical value of −t_0.01 (= −2.326) and above the critical value of t_0.01 (= 2.326), respectively.

Table 3 Different relative occurrences of amino acids, residual property, and secondary structure in partially exposed (or partially buried) structure state (P state) between thermophilic and mesophilic proteins

In the case of SER, as mentioned in the above section, its binding interaction with surrounding water molecules may be involved. The water that interacts with SER (usually through hydrogen bonding) would promote protein instability when it is released from the water-binding site at higher temperature [33, 34]. Although SER was less frequent in thermophilic proteins than in mesophilic ones in general, its relative occurrence was lowest in the P state. Among the residues in the water-interacting parts of the protein (the outer or boundary part), those in the P state have a higher chance of interacting with neighboring residues and forming the local protein structure, so they also have a higher chance of causing protein instability when water is released at higher temperature.

Several molecular interactions, such as hydrophobic interactions, hydrogen bonds, salt bridges, and disulfide bonds, could be involved in the local flexibility and rigidity of protein structure [36]. In the P state, flexible residues have a higher chance of interacting with neighboring residues and of being stabilized by molecular interactions and forces. This may explain why flexible residues were so much more prevalent in the P state of thermophilic proteins than in that of mesophilic ones.

Thermostabilizing Factors in Buried Residual State (B State)

The relative occurrences of amino acids, residual properties, and secondary structure types in the B state were statistically compared between thermophilic and mesophilic proteins, as arranged in Table 4. The t test results indicated which factors in the B state are more strongly related to protein thermostability. Among the amino acids, ARG, GLU, and MET showed distinctive relative occurrences in thermophilic proteins compared with mesophilic ones, with t values of 1.3206, 1.9562, and −1.6770, respectively. The residual properties and secondary structure types showed no characteristic differences in the B state.

Table 4 Different relative occurrences of amino acids, residual property, and secondary structure in buried structure state (B state) between thermophilic and mesophilic proteins

The results for ARG and GLU are likely related to their contribution to electrostatic interactions in the inner part of the protein structure. Although the relative occurrences of hydrogen bonds and salt bridges in the B state showed no difference between thermophilic and mesophilic proteins, ARG and GLU might play prevalent roles in forming electrostatic interactions in thermophilic proteins, since salt bridges mediated by ARG–GLU pairing have been found to be effective for stabilizing protein structure. As for MET, although it is hydrophobic, it can be expected to be found less often in the B state of thermophilic proteins than in that of mesophilic proteins, since its sulfur group could influence the internal network of hydrophobic interactions.

Thermostabilizing Factors in Well-Buried Residual State (WB State)

We investigated the relative occurrences of amino acids, residual properties, and secondary structure types in the WB state and compared each statistic between the thermophilic and mesophilic protein groups, as arranged in Table 5. Among the amino acids, ALA, ASP, and GLY showed distinctive relative occurrences in thermophilic proteins compared with mesophilic proteins, with t values of 1.3967, −1.3908, and −1.7916, respectively. Among the residual properties, the relative occurrence of cation–pi interactions (Cat–pi) was characteristic of the WB state of thermophilic proteins (t value of 1.3708). Among the secondary structures, the relative occurrences of the extended beta strand (E) and the 3/10 helix (G) showed distinctive structural differences between thermophilic and mesophilic proteins, with t values of −1.3511 and 1.4529, respectively.

Table 5 Different relative occurrences of amino acids, residual property, and secondary structure in well-buried structure state (WB state) between thermophilic and mesophilic proteins

GLY is known as a residue that creates void volume or cavities in the inner part of the protein structure [34]. However, most other systematic analyses have reported no typical pattern of GLY occurrence in thermophilic proteins. In our other study, GLY showed no structural difference in distribution between thermophilic and mesophilic proteins [19]. In contrast, this study showed that, in the WB state, GLY was found less often in thermophilic proteins than in mesophilic ones. Since GLY has a higher chance of creating void volume or cavities in the inner part of the protein structure, it is expected to be less common there in thermophilic proteins. In contrast to GLY, ALA showed a higher occurrence in the WB state of thermophilic proteins than in that of mesophilic ones. ALA can be expected to contribute more to increased hydrophobic packing than GLY, which has no methyl group. As for ASP, this small charged amino acid was found less often in the WB state of thermophilic proteins than in that of mesophilic ones, in contrast to a larger charged amino acid such as GLU.

A cation–pi interaction is assigned when an aromatic side chain of PHE, TRP, or TYR is near a cationic side chain of ARG or LYS [28]. Like salt bridges, cation–pi interactions depend on their location and geometry in the protein structure. Among the aromatic amino acids, TRP is the most likely to be involved in a cation–pi interaction. A higher relative occurrence of TRP in the WB state of thermophilic proteins might be correlated with the higher relative occurrence of cation–pi interactions.

Among the secondary structure types, the extended beta strand (E) was found less often in the WB state of thermophilic proteins. This result might be related to the difficulty of maintaining an extended beta strand in the densely packed inner part of the protein [35]. On the other hand, residues in the 3/10 helix (G) showed higher relative occurrences in the WB state of thermophilic proteins. As mentioned in the previous section, the 3/10 helix, a variant of the alpha helix, can be expected to contribute to stabilizing the whole alpha-helix structure [35]; that is, the high occurrence of residues in the 3/10 helix in the WB state, together with that in the E state, would be related to protein thermostability.

Conclusion

This study analyzed the differences in the relative occurrences of amino acids, residual properties, and secondary structure types between thermophilic and mesophilic proteins according to the residual structure states, namely the fully exposed, exposed, partially exposed (or partially buried), buried, and well-buried states. Through statistical analysis, several important factors related to protein thermostability were suggested in each residual structure state as follows (also as arranged in Fig. 1): (1) in the fully exposed state, higher relative occurrences of GLN, ILE, and PHE; (2) in the exposed state, higher relative occurrences of ARG, GLU, salt bridges, residues with LSE, and residues in the 3/10 helix, and lower relative occurrences of ALA, SER, and VAL; (3) in the partially exposed (or partially buried) state, a higher relative occurrence of flexible residues and a lower relative occurrence of SER; (4) in the buried state, higher relative occurrences of ARG and GLU, and a lower relative occurrence of MET; and (5) in the well-buried state, higher relative occurrences of ALA, cation–pi interactions, and residues in the 3/10 helix, and lower relative occurrences of GLY, ASP, and residues in the extended beta strand. Compared with other systematic analyses, this study identified several factors distinctively found in each residual structure state, which could be considered significant patterns related to protein thermostability. These results could be used as guidelines for understanding the structural basis of thermophilic protein structure and for developing rational design strategies for protein thermostabilization.

Fig. 1 Summary of t values for the thermostabilizing factors distinctively found in thermophilic proteins