Introduction

More than a century after the first diffraction experiment and more than half a century after the determination of the first protein crystal structure, a large amount of structural data has accumulated in databases. The Protein Data Bank, the repository of all the structures of biological macromolecules, now contains more than 130,000 entries, most of them determined with single-crystal X-ray crystallography. Each entry contains a rich assortment of annotations, ranging from experimental details to biological features, and the description of the three-dimensional structure of the macromolecule or of the supramolecular assembly made by two or more macromolecules. The essence of this description is the list of the atoms with their names, their positions in space (three coordinates x, y, and z), and their occupancies. The occupancy is equal to one if the atom has a unique stable position and is less than one if the atom is conformationally disordered and has two or more stable positions (in which case the occupancies should sum to one). If the entry is a crystal structure, the list also includes the atomic displacement parameter (ADP) of each atom, which is as important as the positional parameters.

Chemists, physicists, and molecular biologists have been using this enormous amount of structural data with increasing interest, and structural bioinformatics tools are increasingly used and increasingly useful in macromolecular science. ADPs have been studied too, and it was found that they provide valuable information and, in certain cases, allow biologically interesting features to be predicted. This review summarizes numerous structural bioinformatics analyses and applications in which ADPs play starring roles.

First, the physico-chemical significance of the ADP is summarized, especially for readers not familiar with crystallographic computing. Then two ADP features that can be inferred from database information are described: the dependence of the ADP on crystallographic resolution and its relationship with temperature. These features point to the ADP standardization problem, which is described in detail, and several standardization techniques that have been used in structural bioinformatics studies are presented. Later, ADP distributions and techniques for ADP prediction are summarized. The use of ADPs for protein thermostabilization, conformational disorder prediction, protein folding kinetics prediction, and protein binding site prediction is then summarized. Conservation of ADPs during evolution is also mentioned and the use of ADPs to estimate the atomic positional accuracy in protein crystal structures is described.

Atomic displacement parameters

This chapter is addressed to non-crystallographers, who might not be familiar with the physico-chemical significance of ADPs, with their determination and refinement, and with potential pitfalls in their use.

In the crystalline state, protein atoms can move in several ways. For example, they can simply oscillate around their equilibrium positions or they can move from one equilibrium position to another, showing what is known as dynamic conformational disorder. This disorder becomes static if the temperature is too low for the activation energy associated with the passage from one position to the other to be overcome (Giacovazzo et al. 2002; Schmidt and Lamzin 2010).

In X-ray crystal structures, displacements are monitored by atomic displacement parameters (ADPs) (Dunitz et al. 1988a, b; Trueblood et al. 1996), which are frequently named B-factors or thermal factors and are related to the mean-square amplitude of displacements of the atoms around their equilibrium positions (\( \langle u^{2} \rangle \)) according to

$$ B = 8\pi^{2} \langle u^{2} \rangle . $$
(1)

ADPs are estimated from refining parameters of an atomic model against diffraction intensities, since the decrease of the atomic form factors (f) associated with the diffraction angle (θ) is enhanced by an ADP increase according to

$$ f = f_{0} \times \exp \left( { - \frac{{B \times \sin^{2} \theta }}{{\lambda^{2} }}} \right), $$
(2)

where \( f_{0} \) is the atomic form factor at B = 0 Å² and λ is the X-ray wavelength. This implies that atoms with larger ADPs contribute less to the diffraction intensities than atoms with smaller ADPs and that it is possible, as a consequence, to determine not only the positions of the atoms but also their displacements.
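As a numerical illustration of Eqs. 1 and 2, the following minimal sketch converts an ADP into a root-mean-square displacement and evaluates the attenuation of the form factor. It relies on Bragg's law (λ = 2d sin θ) to rewrite sin²θ/λ² as 1/(4d²), where d is the resolution of the reflection; the function names are illustrative and do not come from any crystallographic package.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement (in Å) from B (in Å^2), inverting Eq. 1."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def attenuation(b_factor, d_spacing):
    """Ratio f/f0 of Eq. 2 for a reflection at resolution d (in Å).

    Bragg's law gives sin(theta)/lambda = 1/(2d), hence
    B * sin^2(theta) / lambda^2 = B / (4 * d^2).
    """
    return math.exp(-b_factor / (4.0 * d_spacing ** 2))


if __name__ == "__main__":
    for b in (10.0, 30.0, 100.0):
        print(f"B = {b:5.1f} A^2  rms u = {rms_displacement(b):.2f} A  "
              f"f/f0 at d = 2 A: {attenuation(b, 2.0):.3f}")
```

Under these definitions an ADP of 100 Å² corresponds to a root-mean-square displacement slightly above 1 Å, which gives a sense of how diffuse such an atom is.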

In macromolecular crystallography, ADPs are usually refined isotropically, by assuming that oscillation amplitudes are equal in all directions around the equilibrium position of the atom (Zanotti 2002). Although this is a rather severe approximation, it is adopted because of the scarcity of diffraction data, which does not allow one to refine more than one variable per atom, the atomic displacement, in addition to the three coordinates x, y, and z. However, when more diffraction data are available, ADPs are refined anisotropically (Dunitz et al. 1988a, b; Dauter et al. 1997), by assuming that atomic displacements can be different in the three dimensions and in this case six additional parameters (the six unique elements of a symmetric 3 × 3 tensor) are refined in addition to the three coordinates x, y, and z. Obviously, this is also an approximation, though less severe than the isotropic model.

It is well known and accepted that other structural features influence the ADP values. One of them, mentioned above, is conformational disorder. At high resolution, it is often possible to identify alternative atomic positions and refine them together with their respective occupancies, which should sum to one unless the atom has been lost because of radiation damage or other degradation reactions (Garman 2003; Carugo and Djinovic-Carugo 2005; Garman and Owen 2006; Holton 2009; Bury et al. 2017). In this case, under the anisotropic approximation, at least 19 variables must be refined per atom when there are only two alternative positions: two sets of x, y, and z coordinates, two sets of anisotropic ADPs (six variables each), and one occupancy value occ (the other being 1 − occ).
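The parameter counts quoted above (four per atom for isotropic refinement, nine for anisotropic refinement, 19 for two anisotropic alternative positions) can be reproduced with a short sketch. The generalization to more than two conformers, with n − 1 independent occupancies, is an assumption made here for illustration.

```python
def refined_parameters(anisotropic, n_conformers=1):
    """Number of refined variables for one atom.

    Each conformer carries 3 coordinates plus 1 (isotropic) or
    6 (anisotropic) displacement parameters. Because the occupancies
    are assumed to sum to one, n conformers need only n - 1 occupancies.
    """
    if n_conformers < 1:
        raise ValueError("an atom needs at least one conformer")
    adp_terms = 6 if anisotropic else 1
    return n_conformers * (3 + adp_terms) + (n_conformers - 1)


if __name__ == "__main__":
    print(refined_parameters(anisotropic=False))                 # isotropic
    print(refined_parameters(anisotropic=True))                  # anisotropic
    print(refined_parameters(anisotropic=True, n_conformers=2))  # two positions
```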

However, it is often impossible, if the resolution is insufficient, to identify alternative positions and this is compensated by an increase of the ADP. It may happen that the atom is positioned in between the two (or more) positions really occupied and refined with an ADP large enough to encompass the entire region occupied by its electron density. It is difficult to say if this happens seldom or frequently, though protein flexibility suggests that conformational disorder is rather common, at least at the protein surface (Hartmann et al. 1982; Läuger 1985; Stein 1985; Smith et al. 1986; Declercq et al. 1999; Woldeyes et al. 2014).

Several other factors may affect ADP values. Among them, it is necessary to remember that most macromolecular crystallography refinements are restrained (Zanotti 2002): for example, deviations of bond distances from their ideal values are penalized, so that the ideal bond distances are treated as further experimental data that add to the diffraction data. Analogously, ADPs of atoms connected by a covalent bond are restrained to have similar components along the bond, since a covalent bond is rather rigid and cannot stretch vigorously. As a consequence, the variability of the ADP values is somewhat reduced.

It is also necessary to be aware that occupancy and ADP are correlated: a decrease in occupancy is accompanied by a decrease in ADP, since the reduction of the number of electrons that occupy a certain part of the crystal implies a parallel reduction of the apparent oscillation amplitude. Erroneous ADP values may also arise from mistakes in the interpretation of the electron density map. For example, an isolated peak may be interpreted as a calcium(II) cation, with a larger ADP, or as a water molecule, with a smaller ADP, since the electrons of calcium(II) are more numerous than those of water and must therefore be spread over a larger volume to fit the same electron density peak.

ADP and crystallographic resolution

One of the reasons why ADPs may be different in different crystal structures of the same protein is that the average ADP tends to increase if resolution decreases.

Based on analyses of a limited number of protein crystal structures, the dependence of ADPs on resolution was observed nearly 20 years ago (Carugo and Argos 1999).

Figure 1 depicts the dependence of the average ADP on resolution in the entire Protein Data Bank and in a non-redundant subset of the Protein Data Bank obtained by imposing a maximal pairwise percentage of sequence identity of 30% (both data sets were generated in July 2017). Only protein crystal structures were considered, while structures of nucleic acids and of protein–nucleic acid complexes were discarded.

Fig. 1 Relationship between the average ADP (B-factors, Å²) of protein crystal structures and resolution (Å)

Clearly, a strict relationship between average ADP and resolution is apparent and it can be fitted by:

$$ B = 8.11\, \cdot \,{\text{resolution}}^{2} , $$
(3)

and by

$$ B = 9.09\, \cdot \,{\text{resolution}}^{2} , $$
(4)

for all protein crystal structures (Pearson correlation coefficient = 0.982) and for the non-redundant subset of protein crystal structures (Pearson correlation coefficient = 0.984), respectively.

It might therefore be unnecessary to standardize ADPs when comparing protein structures of similar resolution, and it might be simple to rescale the ADPs of a protein structure to make them comparable to those of another protein structure.
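A minimal sketch of how Eqs. 3 and 4 could be used in practice is given below. The first function simply evaluates the fitted relationship; the second rescales a set of ADPs by the ratio of the squared resolutions, which is one possible reading of the rescaling suggested above and is an assumption, not a published protocol.

```python
def expected_average_b(resolution, coefficient=8.11):
    """Average ADP (Å^2) expected at a given resolution (Å) from Eq. 3.

    Pass coefficient=9.09 to use the fit to the non-redundant subset (Eq. 4).
    """
    return coefficient * resolution ** 2


def rescale_b(b_values, resolution_from, resolution_to):
    """Rescale ADPs from one resolution to another (hypothetical scheme).

    Since Eqs. 3 and 4 are proportional to resolution^2, the fitted
    coefficient cancels and only the ratio of squared resolutions remains.
    """
    factor = (resolution_to / resolution_from) ** 2
    return [b * factor for b in b_values]


if __name__ == "__main__":
    print(expected_average_b(2.0))
    print(rescale_b([20.0, 40.0], resolution_from=2.0, resolution_to=1.0))
```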

Large ADPs

In PDB files, there are four types of lines that can be used to indicate the atoms/residues that were invisible in the electron density maps computed in crystallographic studies. Lines beginning with “REMARK 465” and “REMARK 470” enumerate residues and atoms of the protein that were invisible and were not included in the “ATOM” lines; lines beginning with “REMARK 475” and “REMARK 480” enumerate residues and atoms that were invisible and were included in the “ATOM” lines with zero occupancy. Whether the electron density is interpretable or not is obviously a rather arbitrary decision and, perhaps, this is the reason why during the last decade many crystallographers have preferred to include in the refinement also the atoms that are (nearly) invisible, allowing their ADPs to inflate enormously.

Figure 2 shows that up to 2007–2008 only 15–20% of the protein X-ray crystal structures deposited in the Protein Data Bank had at least one atom with an ADP larger than 100 Å² and that in the same period the percentage of complete structures, containing coordinates of all protein atoms and without missing atoms, was in the range 80–90%. After 2008, the percentage of structures with large ADPs began to increase and now more than 50% of the structures contain atoms with large ADPs. Analogously, the percentage of structures without missing residues began to decrease and now less than 50% of the structures have coordinates for all the atoms.
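The two quantities discussed here, the presence of missing residues or atoms and the presence of large ADPs, can be extracted from a PDB-format file with a few lines of code. The sketch below is an illustration, not the procedure used to generate Fig. 2: it assumes the legacy fixed-column PDB format, in which the B-factor occupies columns 61–66 of each “ATOM” line, and the function name and the returned dictionary keys are hypothetical.

```python
def summarize_pdb(lines, threshold=100.0):
    """Flag missing residues/atoms and count large ADPs in PDB-format lines."""
    summary = {
        "has_missing": False,         # REMARK 465 / 470 present
        "has_zero_occupancy": False,  # REMARK 475 / 480 present
        "n_atoms": 0,
        "n_large_b": 0,
        "max_b": None,
    }
    for line in lines:
        if line.startswith(("REMARK 465", "REMARK 470")):
            summary["has_missing"] = True
        elif line.startswith(("REMARK 475", "REMARK 480")):
            summary["has_zero_occupancy"] = True
        elif line.startswith("ATOM"):
            field = line[60:66].strip()  # columns 61-66: B-factor
            if not field:
                continue  # truncated or non-standard line
            b_factor = float(field)
            summary["n_atoms"] += 1
            if b_factor > threshold:
                summary["n_large_b"] += 1
            if summary["max_b"] is None or b_factor > summary["max_b"]:
                summary["max_b"] = b_factor
    return summary
```

A typical call would be `summarize_pdb(open(path))` for a locally stored file, where `path` is supplied by the user; only “ATOM” records are examined, so heteroatoms and water molecules are ignored.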

Fig. 2 Fraction of PDB files (X-ray crystal structures only) containing the coordinates of all residues and without missing residues and fraction of PDB files containing large ADPs (B-factors; larger than 100 Å²)

The practice of allowing ADP values to inflate in an uncontrolled way is scientifically questionable. Although the agreement between the model and the experimental observations (the R-factors) may marginally improve when atoms that are not visible in the electron density map are included, there is no physical understanding behind the improved fit. For example, one may decide to place an arbitrary number of uranyl cations (\( {\text{UO}}_{2}^{2 + } \)) in the asymmetric unit and allow their ADPs to increase enormously, without significant consequences either on the rest of the model or on the R-factors.

It must also be remembered that the inclusion in the model of atoms with extremely large ADPs, which reflect their immense positional spread, may result in over-interpretations of the structural data delivered to the scientific community. For example, the electrostatic potential at the protein surface might be absolutely inaccurate if atoms/residues, the position of which is uncertain, are included in the calculation.

For this reason, ADP thresholds must be used to filter out structure moieties that cannot be considered to have been experimentally determined. For example, in an analysis of statistical potential in globular proteins, Benkert and co-workers discarded structures in which more than 20% of the residues have an ADP above two standard deviations (Benkert et al. 2008). However, it is necessary to design less arbitrary criteria to handle atoms, residues and structures associated with enormous and unreasonable ADPs.

ADPs and temperature

Protein X-ray crystal structures, once routinely determined at room temperature, are nowadays generally determined at low temperature (100 K), to reduce the radiation damage induced by bright synchrotron X-ray beamlines and to allow the analysis of small and tiny crystal specimens (Carugo and Djinovic-Carugo 2005). Presently (September 25, 2017) 87,044 protein crystal structures, deposited in the Protein Data Bank together with the experimental data, have been determined at 90–110 K and only 4941 have been determined at 280–320 K (ratio 18 to one); moreover, while 66% of the 280–320 K crystal structures were deposited prior to 2008 (10 years ago), only 26% of the 90–110 K crystal structures were deposited prior to 2008.

After the first attempts to determine protein crystal structures at temperature below 273 K (Alber et al. 1976), several studies have been dedicated to the analysis and comparison of room-temperature and low-temperature protein crystal structures.

In general, only modest modifications of the protein structure are associated with the temperature decrease. A small reduction in the protein volume and subtle changes of contacts between α-helices have been observed in myoglobin (Frauenfelder et al. 1987). Protein shrinkage was observed also in ribonuclease A (Tilton et al. 1992). Juers and Matthews reported that cryo-cooling generally increases lattice contacts and reduces protein volumes, but causes only small changes in crystallographic models (Juers and Matthews 2001). However, it has also been suggested that cryo-cooling modifies the repertoire of accessible conformations and, consequently, it has been proposed that room-temperature data provide a fuller description (Fraser et al. 2011a, b). This hypothesis is supported by the observation that the crystal cryo-cooling process is too slow (several seconds) to trap the room-temperature equilibrium distribution of protein and solvent configurations (Halle 2004). Much earlier, Frauenfelder and co-workers hypothesized that minor conformational substates are influenced by cryo-cooling (Frauenfelder et al. 1979).

It is expected that ADPs depend on the temperature at which crystal structures are determined. The cooling-induced reductions in ADPs suggest that cryogenic structures adopt less variable conformations (Fraser et al. 2011a, b). Huber and co-workers observed that the average ADP decreases from 13.3 to 6.1 Å² in the crystal structures of trypsinogen when the temperature decreases from 293 to 213 K and that the decrease is not linear but sigmoidal, with a sharp decrease in a small temperature range that depends on the solvent composition (Singh et al. 1980). Similarly, the average protein ADP is 14 Å² at 300 K and 5 Å² at 80 K in the structures of met-myoglobin and the ADP decrease is not linear, but shows a discontinuity of slope (Hartmann et al. 1982).

The slope discontinuity is believed to depend on the “glass transition”. In crystalline RNaseA, a “glass transition” in the protein between 212 and 228 K reduces ADPs and the cooling-induced reductions in ADPs suggest that cryogenic structures adopt less changeable conformations (Rasmussen et al. 1992; Tilton et al. 1992). Similarly, the B-factors in thaumatin decrease on cooling, indicating a reduction in thermal motions, but there is a sudden change in the slope dB/dT at T ≈ 210 K, due to the protein dynamical transition (glass transition) (Warkentin and Thorne 2009, 2010).

However, large ADPs at low temperature have been observed recently for thaumatin by Russi and co-workers (26 Å² at 100 K and only 19 Å² at 278 K), who suggested that the ADPs reflect prevalently the radiation damage at low temperature, while other features play a relevant role at room temperature (Russi et al. 2017).

Interestingly, there is no trace of an ADP decrease at low temperature in the Protein Data Bank. A simple statistical survey is summarized in Table 1. At high resolution, the average ADPs, computed only on protein atoms, are nearly identical in the data sets of structures determined at low temperature and in the data sets of structures determined at room temperature. At intermediate and low resolution, on the contrary, ADPs are larger at low temperature.

Table 1 Average ADPs (Å²) of protein X-ray crystal structures determined at low (90–110 K) and at room temperature (280–320 K) and at various resolution ranges (standard errors on the last digit in parentheses)

This analysis is certainly extremely simple, since it compares proteins that have completely different dimensions, folds, and secondary structure compositions. A better methodology would require the comparison of pairs of identical proteins, one determined at room temperature and the other at low temperature. However, the data sets of Table 1 are rather large and consequently it seems reasonable to suppose that they contain similar levels of structural heterogeneity both at room and low temperature. Therefore, it seems also reasonable to suppose that the average ADP values based on these large data sets are close to the real and genuine average values. It must, however, be observed that further and more accurate analyses are necessary to fully characterize the relationship between temperature and ADPs based on PDB data.

ADP standardization

It has been observed that average ADP values may change drastically among different crystal structures of the same protein. For example, Fig. 3 shows the average ADPs, plotted against the resolution, of 109 sperm whale myoglobin crystal structures. In a few cases, the average ADPs are lower than 10 Å² or higher than 30 Å². Three extreme cases can be examined: 1mbn (Watson 1969), 1ebc (Bolognesi et al. 1999), and 4of9 (Wang et al. 2014) (Table 2). In model 1mbn, which is one of the oldest protein crystal structures, deposited in the Protein Data Bank in 1973, the ADPs were not refined, as was common practice in the early days of macromolecular crystallography. In 1ebc, the average ADP is large, more than 45 Å², and in 4of9 it is more than four times smaller (9 Å²). On the one hand, lower ADPs might be expected in 4of9, since its diffraction data were collected at low temperature (100 K at a synchrotron beamline), while the data collection for 1ebc was performed at room temperature (300 K with a rotating anode X-ray generator), as was routine until the end of the last century. On the other hand, larger ADPs might be expected in 4of9, since the fraction of the crystal volume occupied by liquid solvent is considerably larger in 4of9 (60%) than in 1ebc (38%), and this should increase the average mobility of the atoms in 4of9, with a consequent increase of the ADPs. However, it is not surprising that 4of9 and 1ebc have different average ADPs, since other features discriminate the two crystal structures. The space groups are different: hexagonal in 4of9 and monoclinic in 1ebc. Different refinement programs were used: TNT, which was widely used at the time of 1ebc, and REFMAC, which was commonly used at the time of 4of9. The resolution is also different: it is better in 4of9, and better resolution is associated, on average, with smaller ADPs.

Fig. 3 Average ADPs (B-factors) and resolutions of 109 sperm whale myoglobin X-ray crystal structures (vertical bars represent estimated standard errors)

Table 2 Features that discriminate the three protein crystal structures of sperm whale myoglobin

It is clear that often ADPs in a structure cannot be directly compared with ADPs in another structure. In these cases, it is necessary to standardize them and the most common procedure is to transform them into z-scores (Carugo and Argos 1999; Smith et al. 2003; Yang et al. 2016), often named normalized ADPs (BN), according to

$$ BN = \frac{{B - B_{\text{ave}} }}{{B_{\text{std}} }}, $$
(5)

where Bave and Bstd are the average ADP and its standard deviation, respectively, defined as

$$ B_{\text{ave}} = \frac{\mathop \sum \nolimits B}{n} $$
(6)

and

$$ B_{\text{std}} = \sqrt {\frac{{\mathop \sum \nolimits \left( {B - B_{ave} } \right)^{2} }}{n - 1}} , $$
(7)

where n is the number of protein atoms. In this way, all crystal structures have an average BN equal to zero and a standard deviation equal to one, though BNs are dimensionless and thus part of the information provided by the B values is lost.
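Equations 5–7 translate directly into code. The following minimal sketch computes the normalized ADPs of a list of B values with the n − 1 denominator of Eq. 7; the function name is illustrative.

```python
import math


def normalize_b(b_values):
    """Return the z-scores BN of Eq. 5 for a list of ADPs (Å^2)."""
    n = len(b_values)
    if n < 2:
        raise ValueError("at least two ADPs are needed")
    b_ave = sum(b_values) / n                                   # Eq. 6
    b_std = math.sqrt(
        sum((b - b_ave) ** 2 for b in b_values) / (n - 1)       # Eq. 7
    )
    if b_std == 0.0:
        raise ValueError("all ADPs are identical; BN is undefined")
    return [(b - b_ave) / b_std for b in b_values]


if __name__ == "__main__":
    print(normalize_b([10.0, 20.0, 30.0]))
```

Structures in which the ADPs were not refined, and all atoms therefore share the same value, cannot be normalized in this way; the sketch raises an error in that case.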

A slightly different approach was followed by Gourinath et al. (2003) in comparing ADPs of a single helix in several states of myosin, where the normalized ADPs (BN’) were defined as

$$ BN^{\prime } = \frac{{B - B_{\text{ave}} }}{{\frac{{B_{std} }}{n}\sqrt {\frac{N - n}{N - 1}} }}, $$
(8)

where Bave and Bstd were computed only on the N residues of helices and strands, thus ignoring loops, and n is the number of residues in the examined helix. This standardization should be preferred when the examined sample is small.

Another standardization that has been used is

$$ BN^{\prime \prime } = \frac{B + D}{{B_{\text{ave}} + D}}, $$
(9)

where the value of D is empirically selected to yield normalized B values (BN″) with mean 1.0 and root-mean-square deviation 0.3 (Vihinen et al. 1994).
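In Eq. 9 the mean of BN″ is equal to one for any value of D, so only the spread constrains D. Under the assumption that the root-mean-square deviation is taken around the mean with a denominator of n, D can be obtained in closed form as D = rmsd(B)/0.3 − Bave. This closed form is a derivation made here for illustration; the cited study describes D as empirically selected.

```python
import math


def normalize_b_offset(b_values, target_rmsd=0.3):
    """Normalize ADPs with Eq. 9, choosing D so that the rmsd is target_rmsd.

    Returns the normalized values and the offset D. The closed form for D
    is an assumption of this sketch, not taken from the cited study.
    """
    n = len(b_values)
    b_ave = sum(b_values) / n
    rmsd = math.sqrt(sum((b - b_ave) ** 2 for b in b_values) / n)
    if rmsd == 0.0:
        raise ValueError("all ADPs are identical; D is undefined")
    d = rmsd / target_rmsd - b_ave
    return [(b + d) / (b_ave + d) for b in b_values], d


if __name__ == "__main__":
    values, offset = normalize_b_offset([10.0, 20.0, 30.0])
    print(values, offset)
```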

A further standardization has been used, defined as

$$ BN^{\prime \prime \prime } = \frac{{B - B_{\text{ave}} }}{{B_{\text{std}} }} \times \frac{1}{1.645}, $$
(10)

where the number 1.645 is a typical threshold in standard normal distributions, indicating a probability of 0.05 of a value outside the interval − 1.645 to 1.645 for each of the two tails, and where the values − 1 or + 1 were imposed on BN″′ values lower than − 1 or larger than + 1 (Liu et al. 2013, 2014).
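The scaling and truncation of Eq. 10 can be sketched as follows. The z-scores are computed as in Eqs. 5–7, divided by 1.645, and then clipped to the interval from − 1 to + 1; the function name is illustrative.

```python
import math


def normalize_b_clipped(b_values, z_threshold=1.645):
    """Scaled and clipped z-scores of Eq. 10 for a list of ADPs (Å^2)."""
    n = len(b_values)
    if n < 2:
        raise ValueError("at least two ADPs are needed")
    b_ave = sum(b_values) / n
    b_std = math.sqrt(sum((b - b_ave) ** 2 for b in b_values) / (n - 1))
    if b_std == 0.0:
        raise ValueError("all ADPs are identical; the z-score is undefined")
    scaled = [(b - b_ave) / b_std / z_threshold for b in b_values]
    # Values beyond the threshold are set to -1 or +1
    return [max(-1.0, min(1.0, value)) for value in scaled]


if __name__ == "__main__":
    print(normalize_b_clipped([0.0] * 9 + [100.0]))
```

With this definition, any atom whose z-score exceeds 1.645 in absolute value is mapped to exactly − 1 or + 1, so extreme outliers no longer dominate comparisons.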

Other standardization procedures can be conceived. For example, since independent sources of disorder add in determining the resulting ADP, it can be envisaged that it is sufficient to subtract a constant, equal to Bave, from individual ADPs to standardize their values among different crystal structures (Elgavish and Shaanan 1998). Similarly, the minimum-function method equalizes the minimum ADP values found in two protein structures (Frauenfelder and Petsko 1980; Ringe and Petsko 1986). Alternatively, one might refine each crystal structure with exactly the same computational protocol, for example by using the PDB_REDO server (Joosten et al. 2014), although two data sets at different resolutions might require different ADP handlings (for example, isotropic refinement in a structure with very high-resolution data, which allow anisotropic refinement, could lead to erroneous ADP that cannot be compared to ADPs of a medium-resolution data structure, which cannot be refined anisotropically).

ADP distributions

Parthasarathy and Murthy analyzed the ADPs of the Cα atoms of more than 35,000 residues found in a non-redundant ensemble of 110 high-resolution (better than 2.0 Å) protein crystal structures and found that the distribution of the normalized BN values is bimodal, according to

$$ p\left( {BN} \right) = k_{1} {\text{e}}^{{ - k_{2} \left( {BN - B_{1} } \right)^{2} }} + k_{3} {\text{e}}^{{ - k_{4} \left( {BN - B_{2} } \right)^{2} }} , $$
(11)

where k1, k2, k3, k4, B1, and B2 are parameters that were optimized with least-squares procedures (Parthasarathy and Murthy 1997). The same authors also investigated the correlation between main- and side-chain atom ADPs and found that it is quite variable (Parthasarathy and Murthy 1999).

Different results were published more recently by Erman, based on the analysis of Cα atom ADPs of more than 400,000 residues found in 2000 non-redundant protein crystal structures (Erman 2016). The distribution of the ADPs is unimodal and can be fitted by a gamma function

$$ p\left( B \right) = \frac{130}{{B_{\text{av}}^{5} }} B^{4} {\text{e}}^{{ - 5B/B_{\text{av}} }} , $$
(12)

where Bav, the average ADP, is equal to 12.9 Å². A similar expression can be employed to fit the ADP distribution in a single protein crystal structure:

$$ p\left( B \right) = \frac{1}{a}\, \cdot \,\frac{130}{{B_{\text{av}}^{5} }}\left( {\frac{B}{a}} \right)^{4} {\text{e}}^{{ - 5B/\left( {aB_{\text{av}} } \right)}} , $$
(13)

where the scaling factor a is

$$ a = \frac{{\left( {B_{ \hbox{max} } - B_{ \hbox{min} } } \right)}}{{2B_{ \hbox{max} } }} $$
(14)

and

$$ \left( { - \; aB + B_{\text{av}} } \right) \le B \le \left( {B_{\hbox{max} } - aB + B_{\text{av}} } \right), $$
(15)

where Bav, Bmin, and Bmax are the average value, the minimal value, and the maximal value of the ADPs of the protein crystal structure. Clearly, this indicates that large ADPs are extremely unlikely, since these distributions are positively skewed.
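The unimodal distribution can be checked numerically. The sketch below assumes the standard gamma density with shape parameter 5 and mean Bav, for which the prefactor 5⁵/Γ(5) ≈ 130 matches the constant quoted above; it verifies by trapezoidal integration that the density is normalized and has mean Bav. The integration helper and the upper limit of 200 Å² are choices made for this illustration.

```python
import math


def gamma_b_density(b, b_av=12.9, shape=5):
    """Gamma density in B with the given shape and mean b_av (assumed form)."""
    if b < 0.0:
        return 0.0
    coeff = shape ** shape / math.gamma(shape)  # 3125 / 24, about 130
    return coeff * b ** (shape - 1) / b_av ** shape * math.exp(-shape * b / b_av)


def trapezoid(func, lower, upper, steps=20000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


if __name__ == "__main__":
    area = trapezoid(gamma_b_density, 0.0, 200.0)
    mean = trapezoid(lambda b: b * gamma_b_density(b), 0.0, 200.0)
    tail = trapezoid(gamma_b_density, 3 * 12.9, 200.0)
    print(f"area = {area:.6f}, mean = {mean:.3f}, P(B > 3 Bav) = {tail:.2e}")
```

The last quantity printed, the probability of an ADP more than three times larger than the average, illustrates how quickly the right tail of this distribution decays.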

The reason for the discrepancy between the results of Parthasarathy and Murthy, on the one hand, and of Erman, on the other hand, is unclear. It is possible that the much larger data set analyzed by Erman makes his results more reliable. It must also be remembered, however, that the structures analyzed more than 20 years ago were likely determined at room temperature, while those examined more recently were mostly determined at 100 K, so that a temperature-dependent effect cannot be disregarded. Moreover, while Parthasarathy and Murthy analyzed normalized BN-factors, Erman analyzed ADPs.

ADP prediction

ADP prediction has attracted considerable attention. ADP profiles, where a single ADP value is associated with each amino acid (Cα’s ADP), have been predicted from the protein sequence to estimate the flexibility of each residue. Individual, atomic ADPs have been predicted from the protein tertiary structures to estimate the flexibility of each atom in computationally built structures. Unfortunately, several methods have not been benchmarked and we lack a systematic comparison of these computational tools.

ADP profiles have been predicted with a variety of methods. Yuan et al. used a support vector regression (SVR) approach to predict ADP profiles from protein sequence, with a Pearson correlation coefficient of 0.53 between experimental and predicted ADPs (Yuan et al. 2005). A more complex technique, where the most important global and local features of the protein sequence, identified with random forests, are input into a two-stage support vector regression tool, has been developed and a web server (www.csbio.sjtu.edu.cn/bioinf/PredBF) is available for academic use (Pan and Shen 2009).
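The Pearson correlation coefficient between experimental and predicted ADP profiles, which is the figure of merit quoted above, can be computed with a few lines of code. This is a generic sketch of the statistic, not the evaluation pipeline of any of the cited methods.

```python
import math


def pearson(experimental, predicted):
    """Pearson correlation coefficient between two equally long profiles."""
    if len(experimental) != len(predicted) or len(experimental) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(experimental)
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(experimental, predicted))
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Because the coefficient is invariant to linear rescaling of either profile, it gives the same value whether raw or z-score normalized ADPs are compared.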

Support vector regression was used also to predict individual atomic ADPs (Yang et al. 2016) and a server is presently available at https://zhanglab.ccmb.med.umich.edu/ResQ/ to allow users to predict ADPs based on modeled three-dimensional structures. Graph theory-based methods, which consider both covalent and non-covalent interactions observed in protein structures, were used to predict isotropic ADPs of all protein atoms (Jacobs et al. 2001; Gohlke et al. 2004). Atomic ADPs were predicted also from atomic fluctuations in molecular dynamics simulations (Higo and Umeyama 1997; MacKerell et al. 1998; Hinsen and Kneller 1999; Pang 2016), from normal mode analyses of protein structures (Levitt et al. 1985; Tirion 1996; ben-Avraham and Tirion 1998) and from Gaussian network models (Bahar et al. 1997, 1998; Haliloglu and Bahar 1999; Halle 2002; Kundu et al. 2002). Recently, Nguyen et al. (2016) proposed a new predictive method, named flexibility–rigidity index (FRI), to predict ADPs. Generalized Gaussian network models, coupled with anisotropic network model, were used to foresee ADPs, with performance close to FRI (Xia et al. 2015). In a study, Weiss described the relationship between ADPs and the number of atomic contacts for each atom and used this simple relationship to predict ADPs (Weiss 2007).

Finally, it is interesting to mention that ADPs might be drastically underestimated by crystallographic refinements. Based on classical molecular dynamics simulations of villin headpiece domain crystals, Kuzmanić and co-workers observed that refined isotropic and anisotropic ADPs underestimate the values computed in silico by as much as sixfold, probably because of inadequate conformational averaging and treatment of correlated motions (Kuzmanic et al. 2014).

Extremophilic proteins and thermostabilization

While most living organisms presently known grow best at moderate temperatures, around 20–45 °C, several organisms prefer either lower temperatures, and are named psychrophiles, or higher temperatures, and are named thermophiles (psychrophiles and thermophiles together are named extremophiles).

Proteins of extremophiles have been studied intensively, because of their potential biotechnological applications, and their ADPs have been analyzed (Parthasarathy and Murthy 2000; Gianese et al. 2002).

By comparing the structures of 93 mesophilic and 21 thermophilic proteins, Parthasarathy and Murthy observed that serines and threonines have lower ADPs in thermophilic proteins and that lysines and glutamates are more frequent in high ADP protein moieties in thermophilic proteins (Parthasarathy and Murthy 2000). By contrast, the overall dispersion of B values is similar in mesophilic and thermophilic proteins (Parthasarathy and Murthy 2000).

Based on the hypothesis that thermostable proteins tend to be more rigid than mesophilic proteins, thermostabilization of the mesophilic lipase A from Bacillus subtilis was achieved by mutations of amino acids that display the highest B-factors, corresponding to the most pronounced degrees of thermal motion and thus flexibility (Reetz et al. 2006). Similarly, the “rigidity theory” has been applied to the thermostabilization of lipase A from Bacillus subtilis (Rathi et al. 2016). Recently, ADPs were examined to identify residues for site-saturation mutagenesis to stabilize Candida rugosa lipase 1 (Zhang et al. 2016). Similarly, Huang and co-workers selected mutation sites based on ADPs to thermostabilize Aspergillus terreus amine transaminase (Huang et al. 2017).
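The selection of candidate mutation sites by ADP can be sketched as a simple ranking of residues. The example below averages the ADPs over all atoms of each residue and returns the residues with the highest averages; the averaging scheme, the input format, and the function name are assumptions made for illustration, and the cited studies may have used different criteria.

```python
from collections import defaultdict


def rank_residues_by_b(atoms, top_n=5):
    """Rank residues by their average ADP, highest first.

    atoms is an iterable of (chain, residue_number, residue_name, b_factor)
    tuples, one per atom.
    """
    totals = defaultdict(lambda: [0.0, 0])  # residue -> [sum of B, atom count]
    for chain, res_number, res_name, b_factor in atoms:
        entry = totals[(chain, res_number, res_name)]
        entry[0] += b_factor
        entry[1] += 1
    averages = [(key, b_sum / count) for key, (b_sum, count) in totals.items()]
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:top_n]


if __name__ == "__main__":
    example = [
        ("A", 1, "MET", 10.0), ("A", 1, "MET", 20.0),
        ("A", 2, "LYS", 50.0), ("A", 3, "GLY", 5.0),
    ]
    for residue, average in rank_residues_by_b(example, top_n=2):
        print(residue, average)
```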

Based on a careful intra-family comparison of psychrophilic, mesophilic, and thermophilic protein structures, Siglioccolo and co-workers observed that flexibility is more heterogeneous in psychrophilic enzymes, which show an irregular alternation of rigid and flexible small regions (Siglioccolo et al. 2010).

Conformational disorder and flexibility prediction

Given that they reflect positional spread, ADPs have been analyzed with the aim of predicting protein flexibility and conformational disorder. This is justified by many observations. For example, it has been shown that the ADPs of the atoms flanking polypeptide segments that are “invisible” in the electron density maps become increasingly large as these segments are approached (Djinovic-Carugo and Carugo 2015). Given that the conformational disorder of the invisible segments is likely to be too extreme to leave a trace in the electron density maps, it follows that the last residues still visible and close to the missing segment are considerably disordered.

Prediction of flexibility from amino acid sequence is somewhat similar to prediction of ADPs, though flexibility may be defined in different ways, all of them related to ADPs. Early flexibility predictions, based on few protein crystal structures, provided quite contradictory results (Karplus and Schulz 1985; Bhaskaran and Ponnuswamy 1988; Ragone et al. 1989; Vihinen et al. 1994). This research field converged with the more specific problem of ADP prediction, which is described in another section of the review.

Conformational disorder can be predicted with several programs and meta-servers (Lieutaud et al. 2016). One of them, DisEMBL, is based on ADP analyses (Linding et al. 2003). It consists of three different predictors: one aimed at the prediction of loops, one at the prediction of “hot loops”, which are characterized by large ADPs, and a third one aimed at the prediction of strings of residues that were not detected in the electron density maps (Linding et al. 2003). Although it is not recent, DisEMBL is used in several meta-servers, like DisMeta (Huang et al. 2014), GeneSilico MetaDisorder MD2 (Kozlowski and Bujnicki 2012), MetaPrDOS (Ishida and Kinoshita 2008), MobiDB-lite (Necci et al. 2017), and MeDor (Lieutaud et al. 2008), and its results are included in databases (Potenza et al. 2015).

ADPs and sequence evolution

Given that protein flexibility is closely related to protein function and stability, it is expected to be conserved during evolution and sequence divergence. Since ADPs reflect protein flexibility, studies have been devoted to ADP conservation.

Maguid and co-workers analyzed the evolutionary divergence of Cα atom ADPs in homologous proteins classified into families and superfamilies and observed that Cα atom flexibility diverges slowly and that it is sometimes conserved even for protein pairs with insignificant sequence similarity (Maguid et al. 2006). It became possible to predict ADP profiles based on evolutionary information and statistical methods (Yuan et al. 2005).

Protein folding

In vitro protein folding rates are extremely variable and depend on several factors, including the occurrence of post-translational modifications, the fold topology, the amino acid sequence composition, the size of the protein, etc. They also depend on the local flexibility, which may hinder or favor certain backbone movements. Based on this consideration, Gao and co-workers designed three predictors, for two-state, multistate, and unknown folding kinetics, which require, among other parameters, predicted ADPs (Gao et al. 2010).

Protein binding sites

Prediction of binding sites at the protein surface has attracted considerable attention in recent structural bioinformatics, and ADPs have been repeatedly used for this purpose.

A first problem, when dealing with protein crystal structures, is the distinction between protein crystal contacts and protein–protein physiological contacts (Janin and Rodier 1995; Carugo and Argos 1997; Krissinel and Henrick 2007; Duarte et al. 2012). Liu and co-workers defined four variables to describe the ADP of protein–protein interfaces (Liu et al. 2014):

$$ \varSigma B = \sum\limits_{j = 1}^{n} BN_{j}^{\prime \prime \prime } , $$
(16)

where n is the number of interfacial atoms and $BN'''_{j}$ is the standardized ADP of the j-th interfacial atom;

$$ {\text{avg}}\,\varSigma B = \frac{\varSigma B}{{\log \left[ {\min_{r} + 1} \right]}}, $$
(17)

where $\min_{r}$ is the smaller of the average numbers of residues per chain for the two biological units in a complex; and

$$ {\text{avg}}\,{\text{NoB}} = \frac{\text{NoB}}{{\log \left[ {\min_{r} + 1} \right]}}, $$
(18)

where NoB is the number of interface atoms with a negative standardized ADP; and, finally, a combination of the last two,

$$ {\text{avg}}\,{\text{NoB}} \times {\text{avg}}\,\varSigma B. $$
(19)

Empirical threshold values applied to these variables yield encouraging prediction accuracies on various data sets (Liu et al. 2014).
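As an illustration, the four interface descriptors of Eqs. 16 to 19 can be computed with the following minimal sketch. The function name and the input values are hypothetical, and the base of the logarithm is not specified here, so the natural logarithm is assumed; the empirical thresholds of Liu et al. are not reproduced.

```python
import math


def interface_adp_descriptors(std_adps, min_r):
    """Compute the descriptors of Eqs. 16-19 for one interface.

    std_adps: standardized ADPs (BN''') of the interfacial atoms.
    min_r:    smaller of the average numbers of residues per chain
              for the two biological units.
    The natural logarithm is an assumption; the base is not stated in the text.
    """
    norm = math.log(min_r + 1)
    sum_b = sum(std_adps)                       # Eq. 16
    nob = sum(1 for b in std_adps if b < 0)     # atoms with negative standardized ADP
    avg_sum_b = sum_b / norm                    # Eq. 17
    avg_nob = nob / norm                        # Eq. 18
    return {
        "sum_B": sum_b,
        "avg_sum_B": avg_sum_b,
        "avg_NoB": avg_nob,
        "product": avg_nob * avg_sum_b,         # Eq. 19
    }


# Hypothetical interface with four atoms and 150 residues per chain
descriptors = interface_adp_descriptors([-1.0, -0.5, 0.5, -1.0], min_r=150)
```

A rigid interface, with many atoms having negative standardized ADPs, gives a large avg NoB and a strongly negative avg ΣB, so their product becomes large and negative.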

A machine learning technique, random forest, has been used by Jiao and Ranganathan to predict interface residues in a set of heterodimers, where each surface residue is described by several variables, among which ADP plays a prominent role (Jiao and Ranganathan 2017). Another machine learning technique, support vector machine, was used to predict interface residues in non-obligate dimers by using standardized ADPs as input, besides sequence profiles and solvent-accessible surface areas (Liu et al. 2010).

A further question is the computation of binding affinity. With machine learning methods, ADPs (the standardized $BN'''$ values) have been shown to significantly improve previous prediction methods for protein–small molecule complexes (Liu et al. 2013).

Order parameter and positional accuracy

The positional accuracy of an atom is obviously related to its thermal motion, and atoms with extremely large ADPs are hardly detectable in the electron density maps. Cruickshank observed that the positional standard error (psu) increases with B with a quadratic trend:

$$ {\text{psu}} = a + b\, \cdot \,B + c\, \cdot \,B^{2} , $$
(20)

where the parameters a, b, and c depend on the crystal structure that is examined (Cruickshank 1999). It has been proposed to estimate the average coordinate standard error $\sigma(x_{i})$ of the atoms of type i (for example, nitrogens, oxygens or carbons) with the following expression:

$$ \sigma (x_{i} ) = \frac{1}{2}\sqrt {\frac{{N_{i} }}{{n_{\text{obs}} - n_{\text{par}} }}} \, \cdot \,R \cdot {\text{res}}, $$
(21)

where $n_{\text{obs}}$ is the number of experimental observations, $n_{\text{par}}$ is the number of refined parameters, R is the R-factor, res is the crystallographic resolution, and $N_{i}$ is the number of atoms of type i needed to give scattering power equal to that of the asymmetric unit of the structure:

$$ N_{i} = \frac{{\sum\nolimits_{j = 1}^{n_{\text{atoms}}} f_{j}^{2} }}{{f_{i}^{2} }} , $$
(22)

where $f_{i}$ is the atomic form factor of the atom i and the sum in the numerator runs over all the atoms in the asymmetric unit. The average coordinate standard error can be used to estimate the standard error of each individual atom $\sigma(x_{i}, B)$ with the following expression:

$$ \sigma \left( {x_{i} ,B} \right) = \sigma \left( {x_{i} } \right)\frac{{a + b\, \cdot \,B + c\, \cdot \,B^{2} }}{{a + b\, \cdot \,B_{\text{ave}} + c\, \cdot \,B_{\text{ave}}^{2} }}, $$
(23)

where $B_{\text{ave}}$ is the average ADP and the parameters a, b and c depend on the crystal structure. Unfortunately, the rather empirical nature of this expression has limited its use by the scientific community.
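The chain of Eqs. 21 to 23 can be followed numerically with the short sketch below, which implements the expressions exactly as they are written above. All function names and numerical values are hypothetical; in particular, electron counts are used as crude stand-ins for atomic form factors, and the coefficients a, b, and c are arbitrary, since they must be fitted for each crystal structure.

```python
import math


def equivalent_atom_count(form_factors, f_i):
    """Eq. 22: N_i = sum_j f_j^2 / f_i^2 over all atoms of the asymmetric unit."""
    return sum(f * f for f in form_factors) / (f_i * f_i)


def average_coordinate_error(n_i, n_obs, n_par, r_factor, resolution):
    """Eq. 21 as written in the text: average error for atoms of type i."""
    if n_obs <= n_par:
        raise ValueError("n_obs must exceed n_par")
    return 0.5 * math.sqrt(n_i / (n_obs - n_par)) * r_factor * resolution


def atomic_coordinate_error(sigma_avg, b, b_ave, a, b_coef, c):
    """Eq. 23: rescale the average error by the quadratic trend of Eq. 20."""
    def quadratic(x):
        return a + b_coef * x + c * x * x
    return sigma_avg * quadratic(b) / quadratic(b_ave)


# Hypothetical asymmetric unit: 100 C, 30 N, 30 O atoms.
# Electron counts (6, 7, 8) stand in for form factors; this is an assumption.
form_factors = [6.0] * 100 + [7.0] * 30 + [8.0] * 30
n_carbon = equivalent_atom_count(form_factors, f_i=6.0)
sigma_c = average_coordinate_error(
    n_carbon, n_obs=5000, n_par=640, r_factor=0.20, resolution=2.0
)
# Carbon atom with B = 40 Å^2 in a structure with average B = 20 Å^2
sigma_hot = atomic_coordinate_error(
    sigma_c, b=40.0, b_ave=20.0, a=0.01, b_coef=0.001, c=0.0001
)
```

By construction, an atom whose B equals the average ADP receives exactly the average coordinate error, while atoms with larger B receive proportionally larger errors.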

More recently, Fenwick and co-workers proposed an ADP-based order parameter (OP) for pairs of bonded atoms defined as:

$$ {\text{OP}} = 1 - \mathop \sum \limits_{i = 1}^{n} o_{i} \frac{{B_{u,i} + B_{v,i} }}{{8\pi^{2} }}, $$
(24)

where the sum runs over all the n conformational states of the atoms u and v, $o_{i}$ is the occupancy of the i-th conformational state, and $B_{u,i}$ and $B_{v,i}$ are the ADPs of the atoms u and v in the i-th conformational state (Fenwick et al. 2014). Interestingly, if the numerator ($B_{u,i} + B_{v,i}$) is equal to 8π² (≈ 79 Å²), then OP = 0: this indicates a completely disordered pair of atoms. On the contrary, OP approaches 1 if the ADPs are extremely small, that is, when the pair of atoms is particularly ordered. It must be observed that OP is only applicable to high-resolution structures (Fenwick et al. 2014).
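A minimal sketch of Eq. 24 is given below. The function name and the input values are hypothetical; each conformational state is represented as a tuple of its occupancy and the ADPs (in Å²) of the two bonded atoms.

```python
import math


def order_parameter(states):
    """Eq. 24: OP = 1 - sum_i o_i * (B_u,i + B_v,i) / (8 * pi^2).

    states: iterable of (occupancy, B_u, B_v) tuples, one per conformational
    state of the bonded atom pair (u, v); ADPs in Å^2.
    """
    weighted = sum(o * (b_u + b_v) for o, b_u, b_v in states)
    return 1.0 - weighted / (8.0 * math.pi ** 2)


# Single fully occupied state whose ADPs sum to 8*pi^2: completely disordered pair
op_disordered = order_parameter([(1.0, 4.0 * math.pi ** 2, 4.0 * math.pi ** 2)])

# Hypothetical pair with two alternate conformations (occupancies 0.6 and 0.4)
op_two_states = order_parameter([(0.6, 10.0, 12.0), (0.4, 20.0, 22.0)])
```

The first call reproduces the limiting case OP = 0 discussed above, while the second gives an intermediate value between 0 and 1.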

Conclusions

ADPs, which have been refined in crystal structures for decades and which depend on structural heterogeneity, provide a wide spectrum of information that can be used in numerous fields of structural biology and bioinformatics. Here, several applications of ADPs are reviewed, ranging from conformational disorder prediction in proteins to protein thermostabilization. A crucial aspect is the standardization of the ADPs when two or more protein crystal structures are compared, since ADPs are affected to different extents by several factors, from crystallographic resolution to refinement protocols; several standardization procedures are briefly summarized. A potential limitation to ADP analysis is the modern tendency to let ADPs inflate to extremely large values that have little physico-chemical meaning, and the definition of upper limits, probably resolution dependent, is necessary.