Introduction

Proteins are the functional forms of polypeptides, built up from 20 different amino acids (AAs) linked by covalent bonds. The biological activity of a protein depends on its three-dimensional (3D) structure, which in turn is determined by its AA sequence, and manifold interactions with other molecules including solvent components (Creighton 1993; Van Holde et al. 1998).

Among the naturally occurring proteins, the water-soluble globular proteins are important representatives; albumins and enzymes belong to this group. Proteins in solution are neither rigid nor motionless, since the bonds in the backbone and the AA side chains allow considerable flexibility; these motions are of fundamental importance for their functional role, e.g. for enzymatic activity. Hydrophilic proteins are surrounded by some kind of hydration layer. Since all living systems exist in largely aqueous media, there is utmost interest in the structural organization of water molecules within and around proteins. Water may be regarded as the "lubricant of life", easing necessary hydrogen bonding changes: it stabilizes biological structures and allows motions of its constituents.

The solubility of a protein is determined by the composition and distribution of polar and nonpolar AAs on its surface and the surrounding milieu (pH, presence of additives, ionic strength, temperature, pressure, etc.). Polar protein side chains interact more with water molecules than nonpolar chains (Thanki et al. 1988). The extent of water binding is highly influenced by the protein surface topography; ordered waters are three times more likely to be found in solvent-accessible surface grooves than elsewhere on the protein surface; even deep surface grooves typically have the width of a water molecule (Kuhn et al. 1992). Obviously, such crevices represent local energy minima. In crystals, however, the ordered waters, except those in crevices, seem essentially caused by the geometry of the crystal lattice (Levitt and Park 1993). The internal water sites are well conserved in different crystal modifications (Blake et al. 1983), but only part of the surface waters are suggested to be conserved in different crystal forms (Zhang and Matthews 1994). Though, at present, most biophysicists accept the existence of well-ordered water molecules, the hydration problem is still a matter of controversy. With all techniques the identification of bound waters is somewhat subjective, in particular when they are located at the protein surface (e.g., correct choice of residence times and electron density thresholds). In this context it has also to be mentioned that a few authors even deny the requirement of a layer of solvating water to explain the hydrodynamic properties of proteins; for example, the dynamics simulations by Smith and van Gunsteren (1994) invoke the absence of any water of hydration associated with the proteins.

Several types of locally organized water molecules have been described (Kuntz and Kauzmann 1974; Hopfinger 1977; Rashin et al. 1986; Rupley and Careri 1991; Teeter 1991; Kakalis and Kumosinski 1992; Westhof 1993; Gregory 1995; Schoenborn et al. 1995; Finney 1996; Harding 2001a, 2001b; Rowe 2001), in addition to the water molecules of bulk water. Apart from a few "tightly bound" water molecules (e.g., located within the cavity of an active site of an enzyme or in the interior of a protein), a multitude of partially localized ("bound") water molecules has been identified by several physicochemical techniques, including spectroscopic, thermodynamic, hydrodynamic, X-ray and neutron scattering and diffraction methods. These preferentially bound water molecules are positioned in the first layer of water on the protein surface. Their properties differ from those of the bulk water: they are more ordered, less mobile and have, on average, a higher density than bulk water (Perkins 1986, 2001; Creighton 1993; Gerstein et al. 1995; Gerstein and Chothia 1996; Murphy et al. 1998; Svergun et al. 1998; Ebel et al. 2000; Durchschlag and Zipper 2001; Chaplin 2002; Henchman and McCammon 2002; Merzel and Smith 2002; Smith et al. 2002). In solution, all bound water molecules exchange rapidly with bulk water, with residence times differing over a wide range. As follows from high-resolution NMR studies (Otting et al. 1991; Wüthrich et al. 1992, 1996; Brunne et al. 1993; Otting and Liepinsh 1995; Denisov and Halle 1996), the residence times of waters in the protein interior are in the range of about 10⁻² to 10⁻⁸ s, while the hydration of the protein surface is characterized by waters with residence times on the subnanosecond time scale. By contrast, the motions of the water molecules of the bulk phase (approx. 1 ps) are about one or two orders of magnitude faster (Rupley and Careri 1991; Pal et al. 2002). The transition state between bound and bulk water ("transition water") is identical to a secondary layer of water, which is, however, difficult to identify since the properties of the transition water are only marginally different from those of bulk water (perturbations decay exponentially with the distance from the surface).

A comparison of the results of various physicochemical and modeling techniques has shown that a hydration of 0.35 g water per gram of protein may be considered to be a reasonable default (average) value for the hydration of an average globular protein, corresponding to approximately two water molecules per AA residue (Durchschlag and Zipper 2001). Though it is not a priori clear if different techniques measure the same water entity, a critical comparison of the results obtained from different solution techniques rather points to differences in the assumptions underlying the interpretation of hydration than to real differences. In contrast to earlier interpretations of hydrodynamic data, which indicated rather high levels of hydration (0.53-0.54 g/g: Kuntz and Kauzmann 1974; Squire and Himmel 1979), present studies (including electrostatic and X-ray solution scattering and diffraction data, in addition to hydrodynamic approaches) rather seem to point to a unified picture of protein hydration (Zhou 1995, 2001; Durchschlag and Zipper 2001). However, only a part of the hydration waters seems to be observed by X-ray crystallography: between one half and two ordered water molecules per AA residue are found in protein crystal structures, the observed number depending strongly on the resolution and state of refinement of the particular crystal structure (Blake et al. 1983; Levitt and Park 1993; Schoenborn et al. 1995; Carugo and Bordo 1999). In the context of the X-ray detectable water molecules, several important questions should be addressed. (1) Are the positions of individual hydration water molecules the same in protein crystals and aqueous solution, i.e. are they assigned to the same AA residues? (2) Is the entire surface of proteins covered with preferentially bound waters? (3) Why is there a discrepancy in the amount of hydration observed by X-ray crystallography and the solution techniques? (4) Is there a distinction in the properties of hydration waters seen by crystallography and solution techniques, respectively?
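The correspondence between a hydration of 0.35 g/g and roughly two water molecules per AA residue can be checked with a few lines of arithmetic. The following is a minimal sketch; the mean residue mass of 110 g/mol is an assumed typical value, not a figure taken from this paper, and the function name is purely illustrative.

```python
WATER_MOLAR_MASS = 18.015  # g/mol

def waters_per_residue(delta1, mean_residue_mass=110.0):
    """Convert a hydration delta1 (g water per g protein) into the
    number of water molecules per amino acid residue.

    mean_residue_mass (g/mol) is an assumed average for globular proteins.
    """
    return delta1 * mean_residue_mass / WATER_MOLAR_MASS

# Default hydration of an average globular protein (0.35 g/g)
print(round(waters_per_residue(0.35), 2))
```

With these assumptions the default hydration comes out slightly above two waters per residue, while the earlier hydrodynamic estimates of 0.53-0.54 g/g would correspond to more than three.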

Modeling of protein structures in aqueous solution for scattering and hydrodynamic purposes involves modeling the contributions of both AA residues and water molecules. In the context of hydration contributions, we have to focus our attention mainly on the preferentially bound water molecules, since (because of their extraordinary features) only these contribute to the observable scattering and hydrodynamic behavior. Such modeling may be accomplished by quite different approaches:

  1. Whole-body (WB) approaches (e.g., Kumosinski and Pessen 1982; Harding 1989, 1995; Durchschlag and Zipper 1997a, 1997b, 1999, 2002b; Winzor et al. 2001).

  2. Multibody (MB) approaches: "bead modeling" in the classical sense (e.g., Durchschlag 1975; García de la Torre and Bloomfield 1981; García de la Torre 1989; García de la Torre et al. 1997; Carrasco and García de la Torre 1999; Byron 2000; Behlke 2001; Durchschlag and Zipper 2002b).

  3. Use of atomic coordinates obtained from X-ray or neutron crystallography and high-resolution NMR spectroscopy (e.g., Müller 1983, 1991; Durchschlag et al. 1991, 1996; Müller and Schrauber 1992; Beavil et al. 1995; Byron 1997, 2000; Spotorno et al. 1997; Zipper and Durchschlag 1997, 1998, 2000, 2002a, 2002b; Perkins et al. 1998; García de la Torre et al. 2000; García de la Torre 2001; Perkins 2001).

  4. Use of density distributions obtained from 3D reconstructions from cryo-electron microscopy (e.g., Zipper and Durchschlag 2000, 2003; García de la Torre et al. 2001; Zipper et al. 2002).

  5. Ab initio modeling approaches for the prediction of low-resolution 3D structures from scattering profiles (e.g., Chacón et al. 1998, 2000; Svergun 1999, 2000; Byron and Gilbert 2000; Walther et al. 2000; Zipper and Durchschlag 2003).

The present paper is concerned with different aspects of how hydration contributions are considered in the different approaches applied, with special reference to solution scattering and hydrodynamics. For our calculations we chose proteins of different size [monomeric lysozyme, dimeric citrate synthase (CS), trimeric malate synthase (MS)], all of them well characterized from the physicochemical point of view. In the case of lysozyme and CS, the crystallographic structures are known to high resolution.

For modeling individual, bound water molecules, the atomic coordinates in connection with the surface calculation algorithm SIMS (Vorobjev and Hermans 1997) and our hydration algorithm HYDCRYST (Durchschlag and Zipper 2002a) have been applied. Since the structural organization of water bound to lysozyme is described in some detail by high-resolution X-ray diffraction studies (Blake et al. 1983; Kodandapani et al. 1990; Niimura et al. 1997; Bon et al. 1999), this enzyme was chosen for a comparison with our approach of modeling individual water molecules. Moreover, this enzyme has already been the subject of several molecular dynamics simulations (e.g., Brooks and Karplus 1989; Smith and van Gunsteren 1994). CS was used to compare the results of our hydration strategy with the water-shell approach CRYSOL (Svergun et al. 1995), frequently used for the prediction of the SAXS behavior of proteins from atomic coordinates. Further, we used the ab initio modeling approach GASBOR (Svergun et al. 2001) to predict the solution structure and hydrodynamic properties for an enzyme (MS) whose precise crystal or EM structure is unknown to date. For the prediction of hydrodynamic parameters, in all cases modifications of the program HYDRO (García de la Torre et al. 1994) have been used.

Materials and methods

Sources of data

Atomic coordinates of proteins were obtained from the Protein Data Bank (PDB) (Berman et al. 2000) and molar masses from the AA sequence data as stored in the SWISS-PROT data bank (Bairoch and Apweiler 2000). Scattering and hydrodynamic parameters were extracted from the relevant literature. In particular, the following properties were used: atomic coordinates, scattering intensity I(h), molar mass M, partial specific volume \( \bar v \), radius of gyration R G, hydrated volume V, amount of hydration δ1, sedimentation coefficient s, and translational diffusion coefficient D.

Lysozyme from hen egg-white: atomic coordinates (Diamond 1974): PDB ID: 2LYZ; SWISS-PROT entry: P00698, monomer, M=14.313 kg/mol; \( \bar v \)=0.702 cm³/g (Lee and Timasheff 1974); R G=1.43 nm and V=24.2 nm³ (Pessen et al. 1971); δ1=0.317 g/g (Pessen and Kumosinski 1985); R G=1.38 nm (Stuhrmann and Fuess 1976); R G=1.43 nm (Krigbaum and Kügler 1970); R G=1.52±0.02 nm and D=11.8×10⁻⁷ cm²/s (Luzzati et al. 1961); R G=1.54±0.02 nm (Svergun et al. 1998); R G=1.55±0.05 nm (Ducruix et al. 1996); s=(1.87±0.02)×10⁻¹³ s and D=(10.4±0.1)×10⁻⁷ cm²/s (Colvin 1952); s=1.91×10⁻¹³ s and D=11.2×10⁻⁷ cm²/s (Sophianopoulos et al. 1962); D≈11.5×10⁻⁷ cm²/s (Dubin et al. 1967; Nicoli and Benedek 1976); D=(10.6±0.2)×10⁻⁷ cm²/s (Foord et al. 1970); D=(10.6±0.1)×10⁻⁷ cm²/s (Dubin et al. 1971, 1973); D=10.7×10⁻⁷ cm²/s (Barel et al. 1972); D=(10.3-11.5)×10⁻⁷ cm²/s (Mikol et al. 1990); D=(10.2±0.4)×10⁻⁷ cm²/s (Eberstein et al. 1994); D=(10.5-11.6)×10⁻⁷ cm²/s (Grigsby et al. 2000).

CS from pig heart: atomic coordinates (Remington et al. 1982): PDB ID: 1CTS; SWISS-PROT entry: P00889, homodimer, M=97.838 kg/mol; I(h), \( \bar v \)=0.740 cm³/g, R G=2.91±0.02 nm, V=174.4±2.0 nm³ and δ1=0.339±0.011 g/g (Durchschlag et al. 1991, 1996); s=6.0×10⁻¹³ s and D=5.8×10⁻⁷ cm²/s (Wu and Yang 1970); s=6.2×10⁻¹³ s (Singh et al. 1970).

MS from baker's yeast: SWISS-PROT entry: P21826 and P30952, homotrimer, M=188.391 or 188.373 kg/mol; I(h), M=187±3 kg/mol, R G=3.96±0.02 nm, V=338±5 nm³ and δ1=0.35±0.02 g/g (Zipper and Durchschlag 1978); \( \bar v \)=0.745 cm³/g, s=(8.6±0.1)×10⁻¹³ s and D=(4.4±0.1)×10⁻⁷ cm²/s (Durchschlag et al. 1996); threefold symmetry (Durchschlag et al. 1978).

Concerning the accuracy of scattering and hydrodynamic data, it has to be mentioned that there is a considerable scatter of experimental results, as follows, for example, from a critical inspection of the data for lysozyme, one of the most frequently investigated proteins. For modeling purposes, therefore, often consensus values are used to fit the parameters (e.g., Allison and Tran 1995).

Apart from experimental deficiencies, the diversity of values is influenced by experimental conditions and evaluation procedures (purity of protein samples, protein concentration, protein aggregation, salt type and concentration, temperature, pH, standardization procedures regarding solvent viscosity and temperature, extrapolation to infinite dilution, etc.). Of course, the absence of unique data sets (in particular in hydrodynamics) impedes both the ultimate assessment of protein hydration and an unequivocal distinction between different assumptions and specific input parameters (particularly concerning fine tuning parameters of specific hydration models) required for the model calculations. These constraints hold in a similar manner for the prediction of scattering and hydrodynamic parameters, since the changes in protein size caused by hydration are similar in scattering and hydrodynamics.
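One way to see how the scatter in s and D propagates is to combine pairs of literature values through the classical Svedberg equation, M = sRT/[D(1 - v̄ρ)], and compare the result with the sequence-derived molar mass. This is a textbook consistency check and not part of the modeling procedure of this study; the sketch below assumes that the quoted s and D values refer to water at 20 °C (ρ ≈ 0.9982 g/cm³), which is an assumption on our part.

```python
R_GAS = 8.314462e7  # gas constant in erg/(mol K), CGS units

def svedberg_molar_mass(s, D, vbar, rho=0.9982, T=293.15):
    """Molar mass (g/mol) from the Svedberg equation.

    s    sedimentation coefficient in seconds
    D    translational diffusion coefficient in cm^2/s
    vbar partial specific volume in cm^3/g
    rho  solvent density in g/cm^3 (assumed: water at 20 degrees C)
    T    absolute temperature in K (assumed: 293.15 K)
    """
    return s * R_GAS * T / (D * (1.0 - vbar * rho))

# Lysozyme, vbar = 0.702 cm^3/g, two literature pairs of s and D
m_sophianopoulos = svedberg_molar_mass(1.91e-13, 11.2e-7, 0.702)
m_colvin = svedberg_molar_mass(1.87e-13, 10.4e-7, 0.702)
print(round(m_sophianopoulos), round(m_colvin))
```

Under these assumptions the two data pairs give roughly 13.9 and 14.6 kg/mol, bracketing the sequence value of 14.313 kg/mol; a spread of a few percent in D alone is enough to shift the derived mass by a similar fraction.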

Modeling approaches for proteins and consideration of hydration contributions

WB approaches

Proteins of simple shape can be modeled as spheres or prolate/oblate ellipsoids of revolution (PE/OE) or other triaxial bodies with unequal axes (cf. Durchschlag and Zipper 2002b). For approaches dealing with scattering and hydrodynamic data, the hydration is taken into account either by using quantities that include the contribution of bound water (e.g. hydrated volume, V, or surface-to-volume ratio, S/V, obtained from SAXS experiments), by applying qualified assumptions regarding the amount of bound water, δ1, or by some kind of rescaling procedure. In the context of the hydration problem, the introduction of hydration-independent shape functions also has to be addressed (Harding 1995).
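As an illustration of the simplest whole-body treatment, the sketch below estimates a hydrated volume from M, \( \bar v \) and δ1 using the textbook relation V = (M/N_A)(\( \bar v \) + δ1·v_w), and then the diffusion coefficient of the equivalent sphere from the Stokes-Einstein equation. The specific volume of the hydration water (1.0 cm³/g), the solvent viscosity (water at 20 °C) and all function names are assumptions made for this example; the relation is a generic one and is not claimed to be the exact procedure used in the cited WB studies.

```python
import math

N_A = 6.02214076e23   # Avogadro constant, 1/mol
K_B = 1.380649e-23    # Boltzmann constant, J/K

def hydrated_volume_nm3(M, vbar, delta1, v_water=1.0):
    """Hydrated particle volume in nm^3.

    M in g/mol, vbar in cm^3/g, delta1 in g/g;
    v_water (cm^3/g) is an assumed specific volume of the bound water.
    """
    v_cm3 = (M / N_A) * (vbar + delta1 * v_water)
    return v_cm3 * 1e21  # 1 cm^3 = 1e21 nm^3

def sphere_radius_nm(volume_nm3):
    """Radius of the sphere having the given volume."""
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)

def stokes_einstein_D(radius_nm, T=293.15, eta=1.002e-3):
    """Diffusion coefficient (cm^2/s) of a sphere; eta in Pa s (assumed)."""
    d_m2_per_s = K_B * T / (6.0 * math.pi * eta * radius_nm * 1e-9)
    return d_m2_per_s * 1e4  # m^2/s -> cm^2/s

# Lysozyme: M = 14313 g/mol, vbar = 0.702 cm^3/g, delta1 = 0.317 g/g
v = hydrated_volume_nm3(14313.0, 0.702, 0.317)
r = sphere_radius_nm(v)
print(round(v, 1), round(r, 2), stokes_einstein_D(r))
```

For the lysozyme input data listed under "Sources of data", this gives a volume of about 24.2 nm³ and a sphere diffusion coefficient close to 12×10⁻⁷ cm²/s. Since a sphere has the lowest friction for a given volume, the latter is an upper bound for a particle of that hydrated volume.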

MB approaches

In the case of MB approaches, the structure of a protein is approximated by many spheres ("multisphere approaches", "bead modeling"), ranging from a few spherical entities (e.g., corresponding to the number of subunits) to several thousands to model the exact protein shape (cf. Byron 2000; García de la Torre et al. 2000). Thereby even very complex and branched structures may be modeled. Hydration contributions may be considered by rescaling procedures (thereby enlarging the whole molecule or only surface beads) or modeling a water shell.

Modeling starting from the atomic coordinates

Since, nowadays, many 3D structures are available (currently about 20,000 PDB entries), modeling approaches based on the atomic coordinates of proteins seem to be the most realistic way to proceed. This procedure corresponds to an MB approach using a large number of coordinates (cf. Byron 2000; García de la Torre et al. 2000). Problems may arise from the exact values to be used for molecular volume and hydration of the constituents (atoms or AA residues). The molecular volume of a protein may be calculated by summing up the volumes of AA residues (according to Traube or Cohn and Edsall or using similar approaches; see Durchschlag 1986, 2003; Perkins 1986; Durchschlag and Zipper 1997c). Again, the most intriguing question is connected with the consideration of hydration contributions; it may be solved by scaling factors or by constructing a water shell that surrounds the dry (anhydrous) protein. Moreover, special hydration algorithms for considering the individual water molecules on the protein surface have been suggested recently (Durchschlag and Zipper 2001, 2002a, 2002c; Zipper and Durchschlag 2002a, 2002b). Since these approaches will be used in the following, they will be discussed below in more detail.

Modeling starting from density distributions

Density maps derived from 3D reconstructions from cryo-EM, in combination with bead-modeling strategies, may also be exploited for deriving the protein solution structures (Zipper and Durchschlag 2000, 2003; Zipper et al. 2002). A sophisticated analysis of the voxel density distribution allows us to discriminate between anhydrous and hydrated protein models. By using definite density thresholds, an optimal agreement with experimental solution parameters (V, R G) can be achieved.

Ab initio modeling based on SAXS profiles

Starting from experimental SAXS curves, without (exact) knowledge of 3D information, modeling may be achieved by classical (trial-and-error) or more advanced (ab initio) shape determinations. Among the advanced methods, in particular the programs DALAI_GA, DAMMIN, SAXS3D and GASBOR have to be mentioned.

Low-resolution models can be obtained (1) from an iterative fitting of scattering curves by the genetic algorithm DALAI_GA (Chacón et al. 1998, 2000), (2) by the program DAMMIN (Svergun 1999), which uses a multiphase model from densely packed dummy atoms (beads) and employs simulated annealing to find a configuration that fits the data, or (3) by SAXS3D (Walther et al. 2000), a Monte Carlo type reconstruction algorithm, using a "Give'n'Take" algorithm to add and remove beads until an optimum fit to the experimental SAXS profile is reached. With all three programs mentioned (DALAI_GA, DAMMIN, SAXS3D), hydration is considered by fitting the experimental SAXS profile, which inherently contains hydration contributions. A comparison of the validity of these programs has been presented elsewhere (Zipper and Durchschlag 2003).

The program GASBOR (Svergun et al. 2001) is similar to the DAMMIN procedure mentioned above (use of simulated annealing), but rather uses a chain-like ensemble of dummy residues to simulate the backbone of the protein structure. Contrary to the above programs, this approach uses dummy water molecules to calculate models of hydrated proteins.

Modeling proteins by considering the contributions of individual water molecules bound to the protein surface

Modern surface calculation programs allow the exact surface topography of proteins to be calculated analytically, e.g. by applying the programs MSRoll (Connolly 1993) or SIMS (Vorobjev and Hermans 1997). Both programs produce a smooth molecular "dot surface" by rolling one or two probe spheres, respectively. In previous studies (Durchschlag and Zipper 2001, 2002a, 2002b, 2002c; Zipper and Durchschlag 2002a, 2002b), we have used both programs for advantageously creating a plethora of dot surface points (N dot) and corresponding normal vectors from the original PDB file. For the calculations in this study, we used the program SIMS, applying different values for the dot density (d dot), a probe radius r probe of 0.145-0.1525 nm (depending on the water volume V w) and a smoothing probe sphere of r sm=0.04 nm.

The surface points and normal vectors created by an appropriate surface calculation procedure can be used as a reservoir of starting points for hypothetical positions of water molecules on the protein surface. Special hydration algorithms (program HYDCRYST for atomic coordinates, program HYDMODEL for AA coordinates) have now to be applied to select appropriate positions out of this pool of possible points. The number of water molecules assigned to each accessible AA residue was based on hydration numbers derived from NMR experiments on polypeptides (Kuntz 1971) and a minimum distance between adjacent waters (d w=0.29-0.305 nm). Each selected water molecule was then approximated as a sphere of volume V w. The molecular volume of bound water was assumed to be smaller than that of the bulk water: 0.0245-0.0284 nm3, in comparison to 0.0299 nm3; this corresponds to an increase in the average density of bound waters by ca. 5-20%. No clear-cut decision in favor of a definite water density can be given to date.
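The link between the assumed volume of a bound water molecule and the resulting density increase is plain arithmetic: at fixed molecular mass, density scales inversely with volume. A minimal sketch, using only the volumes quoted above (the function name is illustrative):

```python
V_BULK = 0.0299  # volume of a bulk water molecule in nm^3

def density_increase_percent(v_bound, v_bulk=V_BULK):
    """Relative density increase (%) of bound over bulk water,
    assuming the same molecular mass in a smaller volume."""
    return (v_bulk / v_bound - 1.0) * 100.0

for v_w in (0.0284, 0.0257, 0.0245):
    print(v_w, round(density_increase_percent(v_w), 1))
```

The two limiting volumes give increases of roughly 5% and 22%, in line with the quoted range of ca. 5-20%; the intermediate value of 0.0257 nm³ used in several models corresponds to about 16%.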

In the past we have tested various tuning and fine-tuning parameters, in order to test the surface calculation and hydration approaches, on the one hand, and to modulate water binding in a wide range, on the other. Above all, scaling procedures had to be applied to achieve different degrees of hydration [e.g., minimum, intermediate (realistic), maximum hydration]. This was achieved, in particular, by introducing a factor f K, acting directly on the original hydration numbers given by Kuntz (1971), which, of course, do not discern between surface and internal AA residues.

Visualization of models

The resultant anhydrous and hydrated models can be visualized by any 3D molecular graphics program. Our graphics were made with the program RASMOL (Sayle and Milner-White 1995).

Prediction of structural and hydrodynamic properties of hydrated proteins

Prediction of hydration and scattering data

The original hydration numbers of Kuntz (1971) may be used to estimate, to a good approximation, the overall hydration of a given protein, in agreement with the results of various physicochemical investigations (Durchschlag and Zipper 2001). These hydration numbers may also be used to predict a theoretical hydration of accessible AA residues, N w,acc, and a maximum hydration, N w,max, that would result if all AA residues were hydrated.

The coordinates of atoms (or coordinates of AA residues derived therefrom) and selected water molecules can be used directly to derive structural parameters such as the radius of gyration, R G, and hydrated volume, V.
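For a model built from spheres, the radius of gyration follows from the weighted mean-square distance of the sphere centers from the center of gravity, plus the internal contribution (3/5)·r² of each homogeneous sphere. The sketch below is a generic implementation of this relation; which weights to use (e.g., excess electrons for scattering, volumes otherwise) is left to the caller, and the function signature is our own choice rather than that of any program mentioned here.

```python
import math

def radius_of_gyration(coords, weights=None, radii=None):
    """Radius of gyration of a multisphere model.

    coords  list of (x, y, z) sphere centers
    weights per-sphere weights (default: all equal)
    radii   per-sphere radii; if given, each homogeneous sphere adds
            (3/5) * r**2 to the mean-square distance
    """
    n = len(coords)
    if weights is None:
        weights = [1.0] * n
    if radii is None:
        radii = [0.0] * n
    total = sum(weights)
    # weighted center of gravity
    center = [sum(w * c[k] for w, c in zip(weights, coords)) / total
              for k in range(3)]
    msd = 0.0
    for w, c, r in zip(weights, coords, radii):
        d2 = sum((c[k] - center[k]) ** 2 for k in range(3))
        msd += w * (d2 + 0.6 * r * r)
    return math.sqrt(msd / total)

# Two equal point scatterers 2 units apart
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

Because the weighting enters both the center of gravity and the mean-square sum, the same set of beads can yield different radii of gyration for different weighting schemes.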

Scattering curves I(h) can also be calculated by MB modeling, using Debye's formula (Glatter and Kratky 1982). For the prediction of SAXS patterns, however, the spheres have to be weighted according to the calculated number of excess electrons (difference between the number of electrons of atoms or AA residues or bound water molecules and the number of electrons in the same volume of bulk water). Of course, structural parameters such as R G and V can also be derived from the SAXS profiles. If required, I(h) can be converted to pair-distance distribution functions p(r) by Fourier transformation. In principle, p(r) functions can also be calculated directly from atomic coordinates.
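Debye's formula sums the interference of all pairs of scatterers, I(h) = Σ_i Σ_j w_i w_j sin(h r_ij)/(h r_ij). The following minimal sketch treats each bead as a point scatterer with weight w_i (for SAXS, its number of excess electrons); the form factors of the finite spheres are deliberately omitted, which is a simplifying assumption of this example and not the procedure used for the calculations reported here.

```python
import math

def debye_intensity(h, coords, weights):
    """Scattering intensity at momentum transfer h for point scatterers.

    coords  list of (x, y, z) positions
    weights per-scatterer weights (e.g., excess electrons)
    """
    n = len(coords)
    total = 0.0
    for i in range(n):
        total += weights[i] ** 2          # self term, sin(x)/x -> 1
        for j in range(i + 1, n):
            x = h * math.dist(coords[i], coords[j])
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            total += 2.0 * weights[i] * weights[j] * sinc
    return total

beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(debye_intensity(0.0, beads, [1.0, 1.0]))
```

At h = 0 the intensity reduces to the square of the summed weights, which provides a quick sanity check. The double loop scales with the square of the number of beads, one practical reason for reducing large models before evaluating many h values.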

For the calculation of the scattering behavior of hydrated protein models we used the following approximations. The protein atoms were assumed as hard spheres; the form factors of these hard spheres were corrected according to Fraser et al. (1978) with respect to the displaced solvent. However, both displaced bulk water molecules (V w=0.0299 nm3; electron density: 334 e/nm3) and bound waters [electron density: 352-408 e/nm3, depending on the assumed value for V w (0.0284-0.0245 nm3)] were treated as Gaussian spheres. Modeling bound water molecules as Gaussian spheres is suggested by their high fluctuation rate.
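The electron densities quoted above follow directly from the ten electrons of a water molecule and the assumed molecular volume, and the same numbers give the excess electrons with which a bound water sphere would be weighted. A small sketch of this arithmetic (function names are illustrative):

```python
ELECTRONS_PER_WATER = 10
V_BULK = 0.0299  # nm^3 per bulk water molecule

def electron_density(v_w):
    """Electron density (e/nm^3) of water with molecular volume v_w (nm^3)."""
    return ELECTRONS_PER_WATER / v_w

def excess_electrons(v_w, v_bulk=V_BULK):
    """Excess electrons of one bound water over the bulk water
    occupying the same volume v_w."""
    return ELECTRONS_PER_WATER - electron_density(v_bulk) * v_w

for v_w in (0.0299, 0.0284, 0.0257, 0.0245):
    print(v_w, round(electron_density(v_w)), round(excess_electrons(v_w), 2))
```

The densities reproduce the values of 334 e/nm³ for bulk water and 352-408 e/nm³ for bound water. Each bound water carries only between about 0.5 and 1.8 excess electrons, so its scattering contribution is small individually and becomes significant through the number of waters.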

The program CRYSOL (Svergun et al. 1995) allows us to evaluate solution scattering from atomic coordinates. The program uses multipole expansion of the scattering amplitudes to calculate the spherically averaged scattering profile and takes the hydration into account by assuming a hydration shell (0.3 nm thickness) of constant electron density; crystallographically identified waters are ignored. It can either predict the SAXS curve or fit the experimental curve. The form factors of the dummy solvent atoms used are expressed as described by Fraser et al. (1978). For the envelope function, a quasiuniform grid of angular directions using Fibonacci numbers (Svergun 1994) is evaluated. The SAXS intensity depends on the average displaced volume per atomic group (expressed by the effective atomic radius, r a) and the contrast of the border layer (Δρ). In general, the bound solvent is assumed to be 8% denser than the bulk solvent (Δρ=30 e/nm3); the maximum contrast of Δρ of about 75 e/nm3 corresponds to a more pronounced density difference (20%) between bound and bulk water. CRYSOL is able to take information from a GNOM output file (Svergun 1992).

Prediction of hydrodynamic data

Sedimentation coefficients s and translational diffusion coefficients D were calculated by means of the well-established program HYDRO (García de la Torre et al. 1994, 2000), modified for our special purposes. We used a "filling model" strategy to derive both scattering and hydrodynamic quantities. Hydrodynamic models were built from spheres of equal density and more or less unequal size. Though the usage of overlapping spheres is usually discouraged (García de la Torre et al. 2000), it cannot be avoided when creating space-filling models of multiple spheres. Already the representation of the atoms or atomic groups of a crystal structure by spheres corresponding to the volumes of the atoms or groups causes a considerable amount of overlapping (Zipper and Durchschlag 1999). The problem of overlapping non-equal spheres was handled by use of an ad hoc modification in the interaction tensor, as described elsewhere (Zipper and Durchschlag 1997; Carrasco et al. 1999), in order to avoid erroneous results. Usage of overlapping beads does not result in other hydrodynamic computation problems or in the restriction of the number or nature of the parameters to be calculated.

If required for hydrodynamic modeling, the initial models (IM) of the hydrated proteins discussed above had to be reduced to models (RdM) with a manageable number of beads (N beads,IM → N beads,RdM), preferably by a cubic grid approach (Zipper and Durchschlag 1997). Thereby the given IM (representing the atomic coordinates) was mapped into a cubic grid of cells of given edge length, and finally spheres of appropriate radii were placed at the centers of gravity of all occupied cells.
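The cubic grid reduction can be sketched as follows: beads are binned into cubic cells of a given edge length, and each occupied cell is replaced by one sphere at the center of gravity of its members. In this sketch the center of gravity is volume-weighted and the radius of the new sphere conserves the summed volume of the members; both choices are assumptions made for illustration, since "appropriate radii" is not specified further here.

```python
import math
from collections import defaultdict

def sphere_volume(r):
    return 4.0 / 3.0 * math.pi * r ** 3

def reduce_to_grid(beads, edge):
    """Reduce a bead model on a cubic grid.

    beads list of (x, y, z, radius)
    edge  edge length of the cubic cells (same length unit)
    Returns one (x, y, z, radius) bead per occupied cell.
    """
    cells = defaultdict(list)
    for x, y, z, r in beads:
        key = (math.floor(x / edge), math.floor(y / edge), math.floor(z / edge))
        cells[key].append((x, y, z, r))

    reduced = []
    for members in cells.values():
        volumes = [sphere_volume(m[3]) for m in members]
        v_tot = sum(volumes)
        # volume-weighted center of gravity of the cell contents (assumption)
        cx, cy, cz = (sum(v * m[k] for v, m in zip(volumes, members)) / v_tot
                      for k in range(3))
        # radius chosen so that the total volume is conserved (assumption)
        radius = (3.0 * v_tot / (4.0 * math.pi)) ** (1.0 / 3.0)
        reduced.append((cx, cy, cz, radius))
    return reduced

model = [(0.1, 0.1, 0.1, 0.2), (0.3, 0.1, 0.1, 0.2), (1.5, 0.1, 0.1, 0.2)]
print(len(reduce_to_grid(model, edge=1.0)))
```

A larger edge length yields fewer, larger beads; the edge length therefore controls the trade-off between computational effort and fidelity of the reduced model.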

Our hydrodynamic computations concern translational friction only. Rotational friction and intrinsic viscosity parameters were not considered in our present calculations, in order to avoid the problem of special volume corrections (García de la Torre and Carrasco 1998; Carrasco and García de la Torre 1999; Gmachowski 2001). Moreover, in the case of simple globular particles the values for these quantities are not very sensitive to particle size and shape and therefore of similar magnitude, as may be inferred from a comparison of various values for intrinsic viscosities, essentially ranging between 3 and 4 cm3/g (Harding 1997; Durchschlag and Zipper 1999).

If only hydrodynamic parameters are sought, the internal beads can be removed: hydrodynamic friction occurs at the molecular surface, and the interior residues do not experience any contact with the solvent. This can be done effectively by replacing the primary hydrodynamic model by a shell model (García de la Torre et al. 1997, 2000; Carrasco and García de la Torre 1999; García de la Torre 2001); in this approach the hydration problem is solved by assuming for the primary hydrodynamic model an effective hydrodynamic radius of the protein atoms of ca. 0.3 nm instead of the conventional van der Waals value of ca. 0.2 nm.

Since our primary goal was the computation of hydrodynamic properties from both crystal and solution X-ray data, in connection with special site-specific hydration algorithms, we preferred filling models. These models allowed the calculation of both scattering and hydrodynamic properties, without changing the strategy by use of shell models, and allowed application of our site-specific hydration algorithms for the water molecules within and around the proteins.

Results and discussion

Lysozyme

For lysozyme, a great number of PDB structures are available, based on crystallographic data retrieved from different enzyme sources and crystal modifications. We chose a PDB file of hen egg-white lysozyme that contained a large number of water coordinates, in addition to the coordinates of the protein atoms, to enable a comparison of crystallographically observable water sites with the positions of water molecules created by our hydration modeling algorithm HYDCRYST. The resultant space-filling models are shown in Fig. 1 and details of the underlying models are given in Table 1.

Fig. 1.

Different views (a-d: successive 90° rotations around the y-axis) and the corresponding central slabs (e-h) of space-filling models for hydrated lysozyme, showing protein atoms and individual water molecules. The coordinates of the basic atoms of the protein moiety (C, O, N, S) and of O atoms belonging to water molecules were taken from the original PDB file (2LYZ). Alternatively, coordinates of bound water molecules were derived by application of the hydration modeling program HYDCRYST (based on surface calculations by the SIMS program, d dot=500 per nm2, and applying V w=0.0257 nm3 and f K=1.25). The basic protein atoms are shown in gray. Bound waters derived from crystallographic work are visualized in green and waters obtained by application of HYDCRYST are displayed in orange; waters common to both approaches (overlapping >20% of their volume) are colored blue (derived from crystal data) or red (derived from HYDCRYST). Graphics were made with the program RASMOL

Table 1. Comparison of calculated parameters for hydrated lysozyme models as obtained by HYDCRYST with different input parameters, together with a survey of the experimental values

Concerning the results stemming from application of HYDCRYST, Fig. 1 suggests that only part of the surface is covered by water molecules. This is in accord with findings by Lee and Richards (1971), who postulated for lysozyme a considerable percentage (41%) of hydrophobic surface. Obviously, the bound water molecules do not form a uniform hydration shell; instead, they are placed at preferred positions on the protein surface. This is also in accord with the crystallographically detectable waters. HYDCRYST points to the existence of a few water molecules in the protein interior, in full accord with the finding of four internal waters by crystallography (Blake et al. 1983). A close inspection of the positions covered by waters produced by HYDCRYST and observed by crystallography, however, reveals only a moderate coincidence of water sites: only about 29 out of 101 waters listed in the crystallographic data sufficiently overlap (distance <0.22 nm; cf. Rashin et al. 1986) with waters produced by HYDCRYST.
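The coincidence count described above amounts to asking, for each crystallographic water, whether any modeled water lies within the cutoff distance. The sketch below implements this distance criterion in its simplest form; it is an illustration of the comparison, with hypothetical function and variable names, and not the code actually used to obtain the quoted numbers.

```python
import math

def count_matched_waters(crystal_waters, model_waters, cutoff=0.22):
    """Number of crystallographic water sites that have at least one
    modeled water closer than `cutoff` (nm).

    Both arguments are lists of (x, y, z) coordinates in nm.
    """
    return sum(
        1
        for c in crystal_waters
        if any(math.dist(c, m) < cutoff for m in model_waters)
    )

# Toy data: the first crystal site has a model water 0.1 nm away,
# the second has none within the cutoff.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.1, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(count_matched_waters(crystal, model))
```

Note that the count is not symmetric: counting modeled waters that have a crystallographic partner may give a different number, since several modeled waters can lie near the same crystallographic site.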

Figure 2 illustrates a quantitative comparison of crystal data and HYDCRYST for one selected example (V w=0.0257 nm3, f K=1.25) out of 35 analyzed models, with respect to the number of hydrated AA residues and total amount of bound water molecules. Certain AAs, such as R (Arg), N (Asn), D (Asp), E (Glu), K (Lys) and S (Ser), are preferably hydrated, as observed with both approaches. It is not surprising that most representatives of this group are charged AAs which are known to bind a lot of water. Since X-ray crystallography only observes a minor number of bound waters, the total amount of bound waters suggested by HYDCRYST must exceed that found in the crystals.

Fig. 2.

Histogram showing the total number of AAs of lysozyme, the number of hydrated AAs, together with the total number of bound water molecules. The hydration numbers were retrieved by X-ray crystallography (2LYZ) or HYDCRYST (V w=0.0257 nm3, f K=1.25). AAs are given in alphabetic order of their three-letter code; for short, however, they are symbolized by the one-letter code

The example shown in the figure represents one of the best HYDCRYST fits concerning the hydrodynamic data presented in Table 1, if compared to the experimental results for s and D. Unfortunately, the comparison of predicted and experimental values is impaired by the considerable uncertainty in the experimental D values; at present, use of a consensus value of ca. 11×10⁻⁷ cm²/s for this quantity is the best choice. Similarly, the comparison of the predicted values for V, δ1 and R G,SAXS with the experimental ones is satisfactory, particularly when considering the most reliable value for R G of ca. 1.5 nm. In this context, one has to mention some discrepancy between the values for R G,SAXS and R G,Hydro, in particular for high values of V w. This is obviously caused by different calculation procedures for these two quantities (calculation from excess electrons, radii and coordinates of the beads for R G,SAXS, and only bead radii and coordinates in the case of R G,Hydro). The model also yields a reasonable value for the hydration: δ1 = 0.345 g of water per gram of protein. Nearly equally good results are found for f K values near 1, irrespective of the value chosen for V w. Clearly, the results for the unhydrated model disagree with the experimental findings. On the other hand, variation of f K in the HYDCRYST procedure allows us to cover a wide range of possible extents of hydration.
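
The difference between the two radii of gyration can be made concrete with a small sketch for a bead model. It assumes one common convention: each spherical bead contributes its squared distance from the weighted center plus (3/5) of its squared radius, with excess electrons as weights for the scattering quantity and bead volumes (proportional to the cubed radius) as weights for the hydrodynamic one. The function, the weighting by volume and all bead values are illustrative assumptions, not the procedure implemented by the authors.

```python
import math

def radius_of_gyration(coords, radii, weights):
    """Weighted radius of gyration of a set of spherical beads.

    Each bead adds |r_i - r_c|^2 + (3/5) a_i^2, where r_c is the
    weighted center and a_i the bead radius.
    """
    total = sum(weights)
    center = [sum(w * c[k] for w, c in zip(weights, coords)) / total
              for k in range(3)]
    acc = 0.0
    for c, a, w in zip(coords, radii, weights):
        d2 = sum((c[k] - center[k]) ** 2 for k in range(3))
        acc += w * (d2 + 0.6 * a * a)
    return math.sqrt(acc / total)

# Toy model (hypothetical values): two protein beads and one water bead
coords = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # nm
radii = [0.3, 0.3, 0.18]                                       # nm
excess_electrons = [10.0, 10.0, 0.5]                           # assumed

rg_saxs = radius_of_gyration(coords, radii, excess_electrons)
rg_hydro = radius_of_gyration(coords, radii, [a ** 3 for a in radii])
```

In this toy example the peripheral water bead carries few excess electrons but a non-negligible volume, so the volume-weighted value comes out larger than the electron-weighted one.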

Citrate synthase

Table 2 summarizes a variety of hydrated CS models, created by HYDCRYST, and input parameters (V w, f K) differing over a wide range. We also calculated several further parameters such as the total volume of bound waters ΣV w, the number of excess electrons in the bound water N exc.el,w, the radius of gyration of the initial model R G,IM, which is nearly identical to the radius of gyration derived from the SAXS curve (data not shown), and the fitting parameter χ (a measure of the goodness of the fit of the experimental or calculated curve, allowing a qualified selection of models under consideration). As in the case of lysozyme, the values found for R G,Hydro generally exceed those of R G,SAXS. Some representative CS models are pictured in Fig. 3, which have been obtained by the use of different V w and f K values. The figure impressively shows how the extent of hydration can be changed by varying f K, and to a minor extent also by changing V w. This, of course, also follows from the values presented in Table 2.

Table 2. Comparison of calculated parameters for hydrated CS models as obtained by HYDCRYST and different input parameters,a together with a survey of experimental valuesb
Fig. 3.

Selected space-filling models for hydrated CS (a-d, f) together with an illustrative slab (e). Hydration was modeled by means of HYDCRYST, applying different input parameters with respect to water volume (V w) and scaling factor for the extent of hydration (f K). The protein atoms are given in gray and individual water molecules are highlighted in red: (a) V w=0.0245 nm3, f K=0.8; (b) V w=0.0245 nm3, f K=1.0; (c) V w=0.0245 nm3, f K=2.0; (d) V w=0.0269 nm3, f K=2.0; (e) V w=0.0269 nm3, f K=2.0, central slab; (f) V w=0.0284 nm3, f K=2.0

The gallery of scattering curves presented in Fig. 4 (use of different V w and f K values) reveals that, in the case of CS, f K values near 1 obviously fit the data best. Of the water volumes chosen, V w values of 0.0284 nm3 (corresponding to a 5% increase in density as compared to bulk water) turn out to be most appropriate. Accordingly, the corresponding parameter predictions given in Table 2 are in fair accord with the experimental data. This may also be taken in a more quantitative way from a comparison of the fitting parameter χ: the best fit of the experimental scattering curve (lowest χ value) is found for V w=0.0284 nm3 and f K=1.5. Though the agreement of V, δ1 and R G with experimental values is by no means perfect, the coincidence of the ΣV w value of 41.8 nm3 (Table 2) with the volume of 41.49 nm3 for the water shell derived by CRYSOL (Table 3, footnote f) is surprising. Also the agreement between observed and predicted hydrodynamic parameters (s, D) is satisfactory, irrespective of the value chosen for V w. Therefore, this model was chosen for illustrating the histogram presented in Fig. 5. Also in the case of CS, the charged amino acids R (Arg), D (Asp), E (Glu) and K (Lys) are the preferred targets for water binding.
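
The relation between a chosen water volume V w and the implied density increase relative to bulk water can be checked with a few lines of arithmetic. The sketch below assumes a bulk density of 0.998 g/cm3 (about 20 °C) and a molar mass of 18.015 g/mol for water; these reference values and the function names are assumptions made here for illustration and may differ slightly from those used in the original work.

```python
AVOGADRO = 6.02214076e23   # 1/mol
WATER_MOLAR_MASS = 18.015  # g/mol
BULK_DENSITY = 0.998       # g/cm3, assumed bulk value (about 20 degrees C)

def bulk_water_volume_nm3(density_g_cm3=BULK_DENSITY):
    """Volume occupied by one water molecule in the bulk liquid, in nm3."""
    volume_cm3 = WATER_MOLAR_MASS / (density_g_cm3 * AVOGADRO)
    return volume_cm3 * 1e21  # 1 cm3 = 1e21 nm3

def density_increase_percent(v_w_nm3, density_g_cm3=BULK_DENSITY):
    """Density increase of bound water with molecular volume v_w_nm3,
    relative to bulk water, in percent."""
    return (bulk_water_volume_nm3(density_g_cm3) / v_w_nm3 - 1.0) * 100.0

# Water volumes used for the models in this section
for v_w in (0.0245, 0.0257, 0.0269, 0.0284):
    print(f"V_w = {v_w:.4f} nm3 -> density increase of about "
          f"{density_increase_percent(v_w):.1f} %")
```

Under these assumptions, V w=0.0284 nm3 gives an increase of roughly 5 to 6%, and V w=0.0269 nm3 gives roughly 11%, in line with the rounded figures quoted in the text.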

Fig. 4.

Normalized scattering curves I(h), of models for unhydrated and hydrated CS, where \( h = 4\pi \sin\theta / \lambda \) (2θ=scattering angle, λ=wavelength). The selected models comprise a few representative examples, covering a broad range of water volumes (V w=0.0245-0.0284 nm3) and scaling factors (f K=0.8-2.0) when using the hydration modeling program HYDCRYST. For comparison, the experimental SAXS function (circles) is also shown

Fig. 5.

Histogram showing the total number of AAs of CS, the number of hydrated AAs, together with the total number of bound water molecules. The hydration numbers were derived from application of HYDCRYST (V w=0.0284 nm3, f K=1.5). AAs are given in alphabetical order of their three-letter code; for short, however, they are symbolized by the one-letter code

Use of the CRYSOL program also leads to a perfect approximation of the scattering behavior, when compared with the experimental scattering profile (Fig. 6). This approach obviously presents an alternative method to simulate hydration in the context of SAXS in a very efficacious way. Application of CRYSOL to the scattering curves of the hydrated models created on the basis of HYDCRYST and listed in Table 2 renders the results summarized in Table 3. For the unhydrated model, CRYSOL yields the contrast Δρ=0, but an appreciable amount of hydration (δ1=0.255 g/g). This discrepancy can be explained by the procedure to calculate the hydration from the ratio of the shell volume to the volume of a single water molecule. Though CRYSOL produces slightly enhanced hydration values (up to 0.29 g/g) with increasing f K values, the changes are much smaller than implied by our hydrated models. This also follows from a comparison of the numbers of excess electrons of bound water, N exc.el,w, with the corresponding numbers in Table 2: the values calculated from Δρ are too small. Use of alternative values for V and r a did not significantly improve the predictions under consideration. Nevertheless, the scattering curves of the models created by HYDCRYST are fitted perfectly by CRYSOL (not shown).
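
Why a shell-based estimate can report hydration even at zero contrast becomes clearer from a short sketch of the volume-ratio calculation: the number of waters is taken as the shell volume divided by the volume of one water molecule, independent of Δρ. The shell volume of 41.49 nm3 is the value quoted in this section; the bulk water volume of 0.0299 nm3 per molecule and the molar mass of 98,000 g/mol for the CS dimer are assumptions introduced here for illustration, as is the function name.

```python
WATER_MOLAR_MASS = 18.015  # g/mol

def hydration_from_shell(shell_volume_nm3, water_volume_nm3, protein_molar_mass):
    """Hydration in g of water per g of protein, estimated from the ratio
    of the shell volume to the volume of a single water molecule."""
    n_waters = shell_volume_nm3 / water_volume_nm3
    return n_waters * WATER_MOLAR_MASS / protein_molar_mass

# Shell volume from the text; water volume and molar mass are assumed values
delta1 = hydration_from_shell(
    shell_volume_nm3=41.49,
    water_volume_nm3=0.0299,
    protein_molar_mass=98_000.0,
)
```

With these assumed inputs the estimate comes out near 0.25 g/g. Because only the geometric shell volume enters, the result does not vanish when the contrast of the shell is zero.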

Fig. 6.

Normalized scattering curve I(h) of hydrated CS as created by CRYSOL, together with the experimental profile (circles)

Table 3. Comparison of parameters as obtained by CRYSOL from the experimental scattering curve of CS and of calculated curves of hydrated modelsa

Malate synthase

GASBOR was chosen as an ab initio modeling program, since this program explicitly models a hydration shell. By contrast, the other ab initio programs (DALAI_GA, DAMMIN, SAXS3D) take hydration into account by merging the hydration contribution with the contributions of the protein building blocks. This may be achieved by some kind of scaling (by modeling more masses at the protein surface compared to the true mass distribution). As input parameters the number of AAs (554 per subunit) and threefold symmetry were used; the trimeric structure of the enzyme was established previously (Durchschlag et al. 1978, 1981; Zipper and Durchschlag 1978). Figure 7 clearly reveals the occurrence of an oblate shape, in agreement with our previous results (Zipper and Durchschlag 1978). The visualization of water molecules, however, discloses that these are only hypothetical water sites needed for the performance of calculations; they represent no physical reality. Nevertheless, as may be taken from Table 4, the agreement of calculated R G, s and D values with experimental values is excellent. The predicted values were obtained by equating the volume of a dummy residue of the protein with the mean dry volume of an AA residue. Other ab initio modeling programs also gave a nearly perfect agreement with the observed data (Zipper and Durchschlag 2003), though these programs handle the hydration problem in a way different from GASBOR.

Fig. 7.

Top, side and bottom views (a-c) of space-filling models for hydrated MS, created by the ab initio modeling program GASBOR after 50 consecutive cycles. The basic protein units (an assembly of 1662 spherical dummy residues) are displayed in gray and a variety (987) of hypothetical water molecules are shown in red. Thereby the protein is surrounded by some kind of hydration layer of 0.3 nm thickness

Table 4. Comparison of experimental and calculated structural and hydrodynamic parameters, as obtained by various approaches from experimental scattering curves of MS utilizing both slit and cone collimation dataa

Conclusions

Water bound to the protein surface or in the protein interior is fundamental to protein folding, stability, function (activity and recognition) and for an understanding of internal water dynamics. A variety of studies of water in and around proteins have made us acutely aware of the problems of this pivotal topic. Recent modeling approaches (Durchschlag and Zipper 2002a) have shown that modeling of individual, bound water molecules is possible, allowing both the scattering and hydrodynamic properties of the hydrated proteins to be calculated. This approach complements the information revealed from solution NMR, crystal diffraction and molecular dynamics simulation methods. Though water molecules possess a considerable mobility and have no particular binding site (they may alternate between several equally energetically stable sites), they may be visualized at preferentially occupied sites (Schoenborn et al. 1995). This should be taken into account when discussing conflicting results from different techniques, e.g. when comparing the results detected in crystals, on the one hand, and observed in solution, on the other.

As outlined previously (cf. García de la Torre et al. 2000; Durchschlag and Zipper 2002a, 2002b, 2002c), atomic or other types of coordinates together with bead modeling may be used efficaciously to predict the scattering and hydrodynamic behavior of proteins of simple and complex structure. "Filling-model" strategies have to be preferred, if both scattering and hydrodynamic quantities are to be predicted without changing the computation strategy. Since the behavior of proteins in aqueous solution is influenced both by the molecular characteristics of the protein under investigation and manifold interactions with the solvent water, the hydration problem is still the most crucial problem for examining the solution behavior in detail. This especially holds for the precise interpretation of results stemming from scattering and hydrodynamic studies. As discussed above in detail, hydration contributions can be taken into account by quite different approaches, including rescaling procedures, construction of shell models and sophisticated algorithms for determining the protein surface and hydration contributions. In all cases, hydrated protein models yield much more accurate predictions of scattering and hydrodynamic data than the anhydrous model.

Our hydration modeling approach HYDCRYST yields the number of bound water molecules and suggests possible positions for them. As shown in previous studies (Durchschlag and Zipper 2002a, 2002c), the choice of most input parameters is of subordinate importance, except those for scaling the amount of AA hydration (f K) and molecular water volume (V w). A comparative analysis of our results, however, indicates that values for f K between 1 and 2 and V w of ca. 0.0269±0.0015 nm3 (corresponding to an increase of water density by 10±5% compared to bulk water) would represent realistic default values for modeling bound water. This can be proven by the finding of reasonable values for protein hydration (δ1), by best fits to experimental scattering curves (e.g., expressed by low values for the fitting parameter χ) and by correctly predicting both scattering and hydrodynamic parameters (V, R G, s, D). However, if required, a higher coverage of the protein envelope by waters can be achieved by usage of enhanced f K values.

If only the prediction of scattering and/or hydrodynamic parameters is desired, the other approaches also tested, CRYSOL (which also uses atomic coordinates but calculates a water shell), GASBOR (an ab initio modeling approach which uses dummy waters) and, presumably, some others, which consider hydration contributions somehow, are successful. It should be kept in mind, however, that these methods use hydration contributions only as an auxiliary means for performing calculations, and, in general, no statements concerning exact amount of hydration or water placement should be inferred. By contrast, our HYDCRYST approach aims at evolving a biophysically more realistic picture by placing water molecules at sites where, in principle, they could be for a moment, both from geometric and energetic considerations. In this context it should be stressed that the initial values used by us for hydration, the hydration numbers by Kuntz (1971), are very close to the numbers by Hopfinger (1977) which have been derived from thermodynamic considerations (cf. Durchschlag and Zipper 2002a). Consequently, these numbers already include energetic contributions.

The amount of hydration achieved by HYDCRYST for lysozyme and CS (about 0.3 g of water per gram of protein for the above-mentioned default values for f K and V w) is within the range of δ1 values commonly obtained (Durchschlag and Zipper 2001). On the other hand, only part of bound water molecules is usually detected by crystallographic methods. This may also be taken from the histogram in Fig. 2, comparing HYDCRYST data with crystallographically identified waters. Both approaches show a preferred water binding of charged AA residues, though differing in the assessment of hydrated AAs and total amount of water binding. A critical comparison of the water sites monitored by HYDCRYST or crystallography indicates only moderate coincidence. This scenario, however, is by no means surprising, since crystallography only depicts a static picture, and the results of HYDCRYST, admittedly, are influenced by the sequence of steps to be performed when applying the hydration algorithm. This, however, is no matter of concern, since water molecules permanently fluctuate and precise positions on the protein surface make no sense. Neutron diffraction studies (Bon et al. 1999) also show that only part of crystallographically localized water molecules sit in definite orientations; they rather form a constantly fluctuating water network covering the protein surface. Recent water simulations also indicate that there is little preference for a water to occupy a particular site; nevertheless, the concept of hydration sites is useful for describing the water structure around proteins (Henchman and McCammon 2002).

At present, only two proteins (lysozyme, CS) have been used for a detailed comparison of crystallographic, SAXS and hydrodynamic data with respect to hydration contributions. Based on these data, it appears that the method applied underestimates slightly both the sedimentation and diffusion coefficients (Tables 1 and 2). Further computations including additional proteins, improved modeling procedures and a fine tuning of input parameters (e.g., probe radius) will presumably show if this is indeed the case. These calculations could also disclose if the original hydration values of Kuntz (1971) have to be modified to avoid an overestimation of hydration; the present results indicate that even values of f K lower than unity would be required to properly match some data.

Summarizing our present knowledge, the hydration of globular proteins has a significant influence on the interpretation and prediction of scattering and hydrodynamic data in aqueous solution. The correct prediction of the molecular properties of proteins is possible if the hydration is taken into account by rescaling procedures (e.g., by a uniform expansion of whole-body or multibody models), by the assumption of a uniform water shell (e.g. used in the public domain program HYDROPRO; García de la Torre et al. 2000), or by a specific hydration model (exploiting specific, preferential hydration sites) as outlined in this study. If only scattering or hydrodynamic data are to be predicted, all mentioned approaches can be used. All of them allow the prediction of solution properties which deviate only a few percent from the experimental data, provided hydration contributions are taken into account somehow. The accuracy of all experimental data in the literature is obviously not sufficient to make a clear choice in favor of a definite approach to be used. From the biophysical point of view, however, our specific hydration model is much more realistic. It allows many details of hydration to be discussed in terms of various important aspects, for example aiming at the localization of preferentially bound and internal water molecules, at the elucidation of molecular mechanisms of enzyme activity and stability, at the exact volume occupancy in the context of ligand binding, transport phenomena, and drug design projects, etc. Possible implications in connection with radiation-induced events in proteins have been mentioned recently (Durchschlag et al. 2003). Because of the fluctuation of the water molecules, the assumption of time-averaged positions of bound waters with average properties (e.g. regarding their density) is sufficient for realistic computer simulations of hydrated protein structures.