Introduction

Not all proteins fold completely after exiting the ribosome. The eukaryotic nucleosome histones H3, H4, H2A, and H2B, for example, adopt the generic histone fold during dimerization in the cytosol but are transported into the nucleus with long tail extensions that are involved in binding DNA and in dynamic functions associated with DNA replication. Since the histone dimers are incompletely folded, one might ask whether the distribution of residue hydrophobicity, measured outward from the interior of the central globular regions of the dimers, approximates that found for single-chain, single-domain globular soluble proteins. Furthermore, to what extent is this organization the same for the individual histone folds that make up the dimers?

The nucleosome core particle consists of a central tetramer of two bound H3–H4 dimers flanked by two H2A–H2B dimers. X-ray structures of the core particle at atomic resolution (Luger et al. 1997; Davey et al. 2002; White et al. 2001; Harp et al. 2000) enable investigation of how specific amino acid attributes correlate with the three-dimensional structures of the nucleosomal proteins and with the DNA that wraps around them. While the sequence similarity among the H3, H4, H2A, and H2B histones is low, the H3–H4 and H2A–H2B dimers adopt a comparable structural “handshake” motif (Arents et al. 1991; Arents and Moudrianakis 1995). Consequently, interest has focused on comparing the relative stability of these two different but structurally similar heterodimers (Karantza et al. 1995, 1996; Gloss and Placek 2002; Placek and Gloss 2002; Banks and Gloss 2003, 2004). One study (Karantza et al. 1996) describes how the relatively large change in heat capacity upon unfolding of the H3–H4 dimer might be accounted for by an unusually large apolar accessible surface area in its native state. Another study (Banks and Gloss 2003) describes how the burial of charged residues, together with differences in the average solvent accessible surface areas of charged atoms, might account for the lower denaturant concentration required to unfold the H3–H4 dimer relative to the H2A–H2B dimer and for differences in the measured free energies. The spatial distribution of residue hydrophobicity in the dimers is, therefore, of interest.

The spatial distribution of residue hydrophobicity outward from the interior of a protein can be calculated with an appropriate distance metric and hydrophobicity scale (Silverman 2001). This approach has provided a global characterization of the hydrophobic core of soluble globular proteins (Zhou et al. 2003). Similar calculations performed on the histone dimers yield a characterization comparable with those obtained previously. This is not surprising, since dimer formation is expected to occur in the cytosol, prior to transport through the nuclear membrane and subsequent assembly into the nucleosome. While the global characteristics of the hydrophobicity distribution of the dimers appear similar to those of globular soluble proteins, a more detailed examination of the correlation between residue hydrophobicity and distance from the dimer interior reveals a well-defined difference between the H3–H4 and H2A–H2B dimers: the H3–H4 dimer does not bury hydrophobic residues to the same extent as the H2A–H2B dimer. Subsequent calculations show the H3 histone fold to be responsible for this difference. The origin of the difference is narrowed down further by examining the correlation between residue hydrophobicity and distance over different local regions of the fold sequence. Windowing over local stretches of the sequence also makes the correlation between the longer-wavelength variation of hydrophobicity and residue distance from the dimer interior visually overt. The region where this correlation is diminished is found to be the carboxyl end of the H3 histone fold, the helical region involved in the H3–H3′ binding of the (H3–H4)2 tetramer of the nucleosome core particle. Hydrophobic interactions apparently contribute to the binding of this four-helix bundle, and this evolutionary requirement may trade off against the requirement for H3–H4 dimer stability.

While the archaeal and eukaryotic histones have evolved from a common ancestor (Pereira and Reeve 1998; Soares et al. 2003; Reeve et al. 2004), they have diverged considerably. The two archaeal folds HMFA and HMFB from the hyperthermophilic archaeon Methanothermus fervidus exhibit 84% sequence identity, whereas the eukaryotic H3 and H4 folds of, for example, the nucleosome protein 1KX5 exhibit only 24% sequence identity over their regions structurally aligned by combinatorial extension (CE) (Shindyalov and Bourne 1998), and comparable identities over their regions structurally aligned with the archaeal folds. Evolutionary divergence from homo- to heterodimer appears to have contributed to the stability of the four-helix bundle of the (H3–H4)2 tetramer. This is consistent with the observation of eukaryotic (H3–H4)2 tetramers in solution (Banks and Gloss 2004), whereas HMFA and HMFB tetramers are apparently found only in the presence of DNA (Grayling et al. 1995, 1996; Pereira and Reeve 1998).
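
Percent identity figures such as the 84% and 24% quoted above depend on which aligned positions are counted. As a minimal sketch, assuming identity is taken over aligned columns in which neither sequence has a gap (one common convention among several; the convention actually used here is not stated), it can be computed from two gapped alignment rows as follows. The function name and the toy rows are hypothetical and are not histone sequences.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both rows carry a residue.

    Assumption: gapped columns are excluded from the denominator; other
    conventions (e.g., dividing by the shorter sequence length) give other values.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    # Keep only columns in which both sequences are aligned (no gap).
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no gap-free aligned columns")
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# Toy rows: 3 of the 4 gap-free columns are identical.
print(percent_identity("AC-DE", "ACGDF"))  # 75.0
```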

The availability of X-ray structures of the HMFA and HMFB folds (Decanniere et al. 1996, 2000) enables structure as well as sequence alignments involving the archaea. Interestingly, the CE structure alignments between the α-carbon positions of the archaeal and eukaryotic folds, which use no information about residue type, are nearly identical to the amino acid sequence alignments. The burial, or lack of burial, of aligned archaeal and eukaryotic residues of differing hydrophobicity can therefore be investigated with respect to their corresponding structures. Special attention is paid to amino acid replacements from archaea to eukarya that involve extreme changes in hydrophobic character; most of these are shown to have occurred nearer the carboxyl ends of the chains. Finally, spatial and hydrophobic profiling over the lengths of the archaeal HMFA and HMFB folds does not show the lack of burial of hydrophobic residues near the carboxyl end that is observed for the eukaryotic H3 histones. This further supports the contention that this lack of burial postdates the divergence from a common ancestor.
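
The statement that a structure alignment is nearly identical to a sequence alignment can be made quantitative by comparing the residue pairs that each alignment places in register. The sketch below, with hypothetical function names and toy rows, converts each gapped alignment into a set of (i, j) residue-index pairs and reports the fraction of structurally aligned pairs that the sequence alignment reproduces. It assumes both alignments involve the same two sequences in the same order, and it is not the CE or ClustalW scoring procedure.

```python
def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> set[tuple[int, int]]:
    """Return 0-based (i, j) residue indices placed in the same alignment column."""
    pairs: set[tuple[int, int]] = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        # Advance each sequence's residue counter only past real residues.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def agreement(seq_aln: tuple[str, str], struct_aln: tuple[str, str]) -> float:
    """Fraction of structure-alignment pairs also present in the sequence alignment."""
    s = aligned_pairs(*seq_aln)
    t = aligned_pairs(*struct_aln)
    return len(s & t) / len(t)


# Toy example: identical alignments agree fully; a one-residue shift agrees nowhere.
print(agreement(("ABCD", "ABCD"), ("ABCD", "ABCD")))    # 1.0
print(agreement(("ABCD", "ABCD"), ("ABCD-", "-ABCD")))  # 0.0
```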

Methods

Calculations have been performed on histone dimer structures extracted from the nucleosome structures of three species, Xenopus laevis, Saccharomyces cerevisiae, and Gallus gallus, with PDB IDs 1KX5 (Davey et al. 2002), 1ID3 (White et al. 2001), and 1EQZ (Harp et al. 2000), respectively. The structural calculations are based on the centroids of the residue side chains. The centroid of all residue side-chain centroids provides a reference center for the dimer, from which all residue side-chain distances are calculated. An ellipsoidal correction to this distance yields residue side-chain distances that correlate more closely with residue solvent accessibility than do the uncorrected radial distances from the center of the dimer (Silverman 2003). The ellipsoidal side-chain distance from the dimer centroid is the value of the principal major axis of the nested ellipsoid on which the residue side-chain centroid lies. This ellipsoid is nested within a more global ellipsoid that characterizes overall protein shape. The residue hydrophobicity scale chosen (Table 1) is that of Neumaier et al. (http://solon.cma.univie.ac.at/~neum/software/protein/aminoacids.html). This scale, obtained by a principal-component analysis of 47 different scales, yielded optimal correlation between protein ellipsoidal distance and hydrophobicity. Solvent accessible surface areas of the residues were obtained from the Web site of the Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, http://www.scsb.utmb.edu/getarea/area_man.html (Fraczkiewicz and Braun 1998).
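
The distance calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the procedure of Silverman (2003): the global ellipsoid is assumed to have semi-axes proportional to the square roots of the eigenvalues λ1 ≥ λ2 ≥ λ3 of the covariance matrix C of the side-chain centroid coordinates. A point with principal-frame coordinates (x, y, z) then lies on the similar, nested ellipsoid whose major semi-axis is d = sqrt(x² + (λ1/λ2)y² + (λ1/λ3)z²) = sqrt(λ1 · vᵀC⁻¹v), where v is the point's offset from the centroid. All function names are hypothetical, and the input is simply a list of side-chain centroid coordinates (reading them from a PDB file is not shown).

```python
import math

Vec = tuple[float, float, float]


def centroid(points: list[Vec]) -> list[float]:
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


def covariance(points: list[Vec], c: list[float]) -> list[list[float]]:
    # Population covariance of the coordinates about the centroid.
    n = len(points)
    return [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in points) / n
             for j in range(3)] for i in range(3)]


def inverse3(m: list[list[float]]) -> list[list[float]]:
    # Adjugate (cofactor) inverse of a 3x3 matrix.
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def largest_eigenvalue(m: list[list[float]], iters: int = 500) -> float:
    # Power iteration; adequate for a symmetric positive-definite covariance
    # matrix unless the start vector is orthogonal to the leading eigenvector.
    v = [1.0, 0.5, 0.25]
    for _ in range(iters):
        w = [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]
    return sum(v[r] * mv[r] for r in range(3))  # Rayleigh quotient, |v| = 1


def ellipsoidal_distances(points: list[Vec]) -> list[float]:
    c = centroid(points)  # reference center: centroid of the side-chain centroids
    cov = covariance(points, c)
    inv = inverse3(cov)
    lam1 = largest_eigenvalue(cov)
    out = []
    for p in points:
        v = [p[k] - c[k] for k in range(3)]
        q = sum(v[r] * inv[r][k] * v[k] for r in range(3) for k in range(3))
        out.append(math.sqrt(lam1 * q))  # major semi-axis of the nested ellipsoid
    return out


# Six points on the axes of an ellipsoid with semi-axes 4, 2, 1 lie on the same
# nested ellipsoid, so each receives the same ellipsoidal distance (about 4.0).
pts = [(4, 0, 0), (-4, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 1), (0, 0, -1)]
print([round(d, 6) for d in ellipsoidal_distances(pts)])
```

A plain radial distance would assign these six points the values 4, 2, and 1; the ellipsoidal distance treats them as equally buried relative to the overall shape.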

Table 1 Neumaier hydrophobicity scale

The histone tails are truncated by performing calculations only on the portions of the structures that include the three helices of the histone fold listed in the PDB file. The results are qualitatively similar to those obtained by an alternate truncation procedure that retained only the central globular structures of the heterodimer folds aligned by CE (Shindyalov and Bourne 1998). Guanidine hydrochloride–induced denaturation has shown the relative instability of the H3–H4 dimer to be similar with or without the amino-terminal tails (Banks and Gloss 2003), and urea-induced denaturation has shown the H2A–H2B dimer without tails to have a stability similar to or slightly greater than that of the native H2A–H2B dimer (Placek and Gloss 2002). Table 2 shows that truncation of the tails yields mean histone hydrophobicities, on the Neumaier scale, comparable with the mean for a set of 50 globular soluble proteins.

Table 2 Mean hydrophobicities of the 1KX5 histones

The calculations are performed in several stages, with attention increasingly focused on local structural detail. First, calculations are performed on the complete dimer: zero-order and second-order hydrophobic moment profiles that characterize the shape of the hydrophobic core are calculated (Silverman 2001), together with correlation coefficients between residue distance and hydrophobicity. Next, calculations are performed on the individual histone folds in the dimer conformation, using residue distances identical to those obtained for the dimer. This identifies the contribution of each individual histone fold to the correlation between residue hydrophobicity and distance in the dimer. Correlations are then examined over local regions of the fold sequence. Finally, windowing, namely, averaging amino acid attributes over local stretches of the sequence, enables a visual comparison between the local hydrophobic averages and the tertiary structural features of the fold.

Calculations have also been performed using the X-ray structures of the HMFA (1B67) and HMFB (1A7W) dimers of the hyperthermophilic archaeon Methanothermus fervidus (Decanniere et al. 1996, 2000). These enable CE alignment of the archaeal and eukaryotic histone folds as well as calculation of the correlation between residue distance from the dimer interior and residue hydrophobicity.

Results

The calculated hydrophobic profiles of the H3–H4 and H2A–H2B dimers are characteristic of those obtained previously for a set of nonredundant protein domains from the Protein Data Bank (PDB) (Zhou et al. 2003). Figures 1 and 2 show the zero-order, H0, and second-order, H2, hydrophobic moment profiles of the 1KX5 H3–H4 and H2A–H2B dimers, respectively. The zero-order profile is obtained by summing the hydrophobicities of all residues within an ellipsoid whose principal major axis has the extent given by the abscissa; residues are collected in increasing steps of 0.5 Å. The second-order moment profile is a similar sum, with each residue hydrophobicity multiplied by the square of the distance, that is, of the principal major axis of the nested ellipsoid on which the residue side chain resides. The relatively smooth profiles, together with hydrophobic ratios of 0.70 and 0.78 (the ratio of the distance at the second-order moment crossover, d2, to the distance at full residue accumulation, d0), are characteristic of globular soluble proteins (Silverman 2001; Zhou et al. 2003). Consequently, the dimers qualitatively exhibit a hydrophobic core characteristic of single-chain, single-domain globular soluble proteins.
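
The accumulation behind the H0 and H2 profiles, and the hydrophobic ratio d2/d0, can be sketched as below. The assumptions, each also noted in the code, are these: hydrophobicity values are signed so that hydrophobic residues are positive (the sign convention of the Neumaier scale in Table 1 is not restated here); d0 is taken as the first 0.5 Å grid distance at which all residues have been collected; and the crossover d2 is taken as the first grid distance beyond the maximum of H2 at which H2 becomes non-positive. Function names and the toy data are hypothetical.

```python
import math
from typing import Optional


def moment_profiles(distances: list[float], hydro: list[float], step: float = 0.5):
    """Return (grid, H0, H2), accumulating residues with distance <= r at each grid r."""
    n = int(math.ceil(max(distances) / step))
    grid = [step * (k + 1) for k in range(n)]  # last grid point = d0 (full accumulation)
    h0, h2 = [], []
    for r in grid:
        inside = [(d, h) for d, h in zip(distances, hydro) if d <= r]
        h0.append(sum(h for _, h in inside))          # zero-order moment
        h2.append(sum(h * d * d for d, h in inside))  # second-order moment
    return grid, h0, h2


def hydrophobic_ratio(grid: list[float], h2: list[float]) -> Optional[float]:
    """d2/d0, with d2 assumed to be the first non-positive H2 value past its peak."""
    peak = max(range(len(h2)), key=lambda k: h2[k])
    for k in range(peak, len(h2)):
        if h2[k] <= 0.0:
            return grid[k] / grid[-1]
    return None  # no crossover within the accumulated range


# Toy dimer: hydrophobic (positive) residues inside, hydrophilic (negative) outside.
toy_d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
toy_h = [1.0, 1.0, 1.0, 0.5, -0.5, -1.0, -1.0, -1.0]
grid, h0, h2 = moment_profiles(toy_d, toy_h)
print(hydrophobic_ratio(grid, h2))  # 0.75
```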

Figure 1

The zero-order, H0, and the second-order, H2, hydrophobic moment profiles of the 1KX5 H3–H4 dimer.

Figure 2

The zero-order, H0, and the second-order, H2, hydrophobic moment profiles of the 1KX5 H2A–H2B dimer.

While the profiles, which involve residue accumulation, are global averages and do not reveal detailed differences between distributions, the correlation coefficient between individual residue hydrophobicities and distances from the interior can reveal such differences. Table 3 lists the correlation coefficients between residue hydrophobicity and distance from the dimer interior, and between hydrophobicity and percentage of residue solvent accessibility, for the dimers of the nucleosome structures 1KX5, 1ID3, and 1EQZ. The moderate magnitude of the correlation coefficients of the dimers is comparable to that found for the 29 native structures of the globin decoy set from Decoys ‘R’ Us (Park and Levitt 1996). These structures belong, like the histones, to the all-α class and have comparable numbers of residues; the mean value of their correlation coefficient between distance from the protein interior and residue hydrophobicity is 0.524. A moderate magnitude is expected, since hydrophobic and hydrophilic residues are distributed relatively uniformly from protein interior to exterior, albeit with a bias of hydrophobic residues toward the interior. Figure 3 shows residue hydrophobicity as a function of distance from the dimer interior for the H3–H4 and H2A–H2B dimers of 1KX5. The values are widely dispersed, although one may perceive a slightly greater bias of hydrophobic residues toward the interior of the H2A–H2B dimer. The difference between the two distributions translates into H2A–H2B correlation coefficients between residue hydrophobicity and distance that are 20% to 30% greater than those of the H3–H4 dimer. Correlation coefficients between hydrophobicity and the percentage of amino acid exposure, or solvent accessibility (Fraczkiewicz and Braun 1998), are given in the fifth column of Table 3 and show similar differences between the H3–H4 and H2A–H2B dimers.
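
For reference, the correlation coefficient used throughout is presumably the ordinary Pearson product-moment coefficient; the sketch below assumes so. Because the sign of r depends on whether the scale assigns hydrophobic residues positive or negative values, relative comparisons such as "20% to 30% greater" are computed here on magnitudes. The function names and numbers are toy illustrations, not entries of Table 3.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_greater(r_b: float, r_a: float) -> float:
    """How much larger |r_b| is than |r_a|, as a percentage of |r_a|."""
    return 100.0 * (abs(r_b) - abs(r_a)) / abs(r_a)


# Toy data, assuming positive = hydrophobic: values tend to fall with distance.
distance = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
hydro = [1.2, 0.4, 0.9, -0.3, 0.1, -1.0]
r = pearson(distance, hydro)
print(round(r, 3))                              # negative under this sign convention
print(round(percent_greater(-0.60, -0.50), 1))  # 20.0
```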

Table 3 Correlation coefficients of the histone dimers and monomers
Figure 3

Amino acid hydrophobicity (Neumaier scale) as a function of distance from the interior of the H3–H4 and H2A–H2B dimers.

The differences in the dimer correlation coefficients can be traced to the individual histone folds by calculating correlation coefficients between the residue distances of each fold from the dimer interior and the corresponding residue hydrophobicities of that histone. These values are also given in Table 3. One notes the marked reduction of this correlation for the H3 fold, which is mainly responsible for the smaller magnitude of the correlation coefficient of the H3–H4 dimer compared with the H2A–H2B dimer.

The origin of this difference can be narrowed down further by examining different ranges of the H3 and H4 fold sequences. Table 4 shows the correlation coefficients calculated over the halves of the histone chains of 1KX5 containing the amino and carboxyl ends, as well as over the individual outer helices of the fold. The greatest reduction in correlation between residue distance and hydrophobicity is found for the region encompassing the carboxyl end of the H3 fold, the region that forms part of the four-helix bundle that binds the dimers and thereby contributes to the stability of the (H3–H4)2 tetramer.

Table 4 Truncated correlation coefficients

Figures 4 and 5 show residue distances and smoothed values of residue hydrophobicity along the sequence for the H3 and H4 folds, respectively. A sliding window of eight residues is sufficient to suppress the high-frequency hydrophobicity oscillations, including those associated with secondary structure. The high-frequency oscillations in distance are, however, retained; they mirror the rapid inside-to-outside excursions of the side-chain centroids as the α-helices are traversed. The distances exhibit three valleys, each several residues wide, that mirror the close approach to the dimer center of residues from the three helices of the histone fold; residues of the long central helix approach the center most closely. Peaks in smoothed hydrophobicity are expected to register with valleys in residue distance: the more hydrophobic the local sequence, on average, the nearer it lies to the protein interior (Rose and Roy 1980). Figure 5 shows a hydrophobicity peak that registers with the distance valley at the carboxyl end of the H4 fold, whereas Figure 4 shows no such registration for the H3 fold.
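
The eight-residue windowing can be sketched as a centered moving average. Two details are assumptions not stated in the text: for an even window of w = 8, the window for residue i is taken as residues i − 4 through i + 3, and windows are truncated at the chain ends. The helper that checks whether a hydrophobicity peak registers with a distance valley within a segment, and its tolerance of four residues, are hypothetical; it locates the valley on the smoothed distance profile so that the helical inside-outside oscillations do not dominate.

```python
def smooth(values: list[float], w: int = 8) -> list[float]:
    """Centered moving average over residues i - w//2 .. i - w//2 + w - 1, truncated at ends."""
    n, half = len(values), w // 2
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + w)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def registers(distance: list[float], hydro: list[float],
              start: int, stop: int, w: int = 8, tol: int = 4) -> bool:
    """True if, within residues start..stop-1, the smoothed hydrophobicity peak lies
    within `tol` residues of the smoothed distance valley (hypothetical criterion)."""
    d_s, h_s = smooth(distance, w), smooth(hydro, w)
    seg = range(start, stop)
    valley = min(seg, key=lambda i: d_s[i])
    peak = max(seg, key=lambda i: h_s[i])
    return abs(valley - peak) <= tol


# Toy profile: a single hydrophobic residue is spread over the surrounding window.
print(smooth([0.0] * 4 + [8.0] + [0.0] * 4))
```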

Figure 4

Residue side-chain distance from the center of the protein and smoothed value of residue hydrophobicity at each residue location of the H3 fold of the protein 1KX5.

Figure 5

Residue side-chain distance from the center of the protein and smoothed value of residue hydrophobicity at each residue location of the H4 fold of the protein 1KX5.

The carboxyl end of the H3 fold has a double duty to perform. It must contribute to dimer stability by folding so as to bury hydrophobic residues within the dimer. It must also, however, contribute to tetramer stability by burying hydrophobic residues within the four-helix bundle, two assignments that are mutually exclusive. Performing this dual role leads to some interesting side-chain orientations. Figure 6 shows a wire diagram of the backbone of the H3–H3′ four-helix bundle, together with stick diagrams of three amino acids. ILE130 points into the interior of the bundle, contributing to its hydrophobic core, whereas the ARG128 and ARG131 side chains point away from the bundle and toward the interior of the dimer. If comparable orientations are maintained in the isolated dimer, these side chains are not expected to contribute to the stability of its H3 histone fold.

Figure 6

A wire diagram of the backbone of the H3–H3′, four-helix bundle of the (H3–H4)2 tetramer of the nucleosomal protein, 1KX5, and stick diagrams of the three amino acids, Arg128, Ile130, and Arg131.

If the dimers of the nucleosome octamer are viewed as connected by three four-helix bundles in the order (H2A–H2B)–(H4–H3)–(H3′–H4′)–(H2B′–H2A′) (see, e.g., Pereira and Reeve 1998; Chantalat et al. 2003), one might ask why the carboxyl ends of the H2B and H4 histones do not exhibit a reduced correlation similar to that observed for the H3 histone fold. No such reduction is observed in the present calculations. One hypothesizes that this is due to the weaker interaction between the H2B and H4 helices: while the stability of the (H3–H4)2 tetramer and its fundamental role in nucleosome assembly have been extensively documented (see, e.g., Wolffe 1998), the H2B–H4 interaction is stable only in the presence of DNA. Similarly, (H3–H4)2 tetramers have been observed in solution (Banks and Gloss 2004), whereas tetramers of the archaeal HMFA and HMFB histones have apparently been observed in vitro only in the presence of DNA (Grayling et al. 1995, 1996; Pereira and Reeve 1998).

Figure 7 shows a multiple sequence alignment (ClustalW, European Bioinformatics Institute) of the three eukaryotic H3 folds of 1KX5, 1EQZ, and 1ID3 and the two archaeal HMFA and HMFB histone folds of the hyperthermophilic archaeon Methanothermus fervidus. The rectangular box at the bottom shows the CE structural alignment of the H3 A-chain sequence of 1ID3 with the HMFB sequence of 1A7W. Apart from some minor differences at the amino end, the sequence and structure alignments are identical from the second residue after the major archaeal gap onward. Comparable results are obtained for all pairwise CE alignments between the eukaryotic and archaeal folds; this particular pair is shown because it aligns most closely at the extremity of the carboxyl end of the fold. Arrows designate extreme differences between archaeal and eukaryotic amino acid hydrophobicities at correspondingly aligned locations. Globally, these differences are more frequent toward the carboxyl end of the sequences; four of them lie near the carboxyl terminus of the chain. Several are quite striking: PHE68 and PHE67 of HMFA and HMFB, respectively, appear as arginines at the corresponding locations in the eukarya; LYS54 and LYS53 of HMFA and HMFB, respectively, are replaced by valines in the eukarya; and LYS46 and LYS45 of HMFA and HMFB, respectively, are replaced by leucines in the eukarya. The eukaryotic H3 residues at the locations marked by arrows are primarily responsible for the reduced burial of hydrophobic residues within the dimer.
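
Flagging aligned positions with extreme hydrophobicity differences, as the arrows in Fig. 7 do, can be sketched as follows. Because the Neumaier values of Table 1 are not reproduced in this text, the sketch substitutes the Kyte–Doolittle scale; the threshold of 6.0 Kyte–Doolittle units is likewise an arbitrary, hypothetical choice, so the flagged set need not match the arrows in the figure. The function name is hypothetical, and residue numbering starts from user-supplied offsets.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in for Table 1.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def extreme_changes(row_a: str, row_b: str, start_a: int = 1, start_b: int = 1,
                    threshold: float = 6.0, gap: str = "-"):
    """List (res_a, num_a, res_b, num_b) for aligned columns whose hydrophobicity
    difference is at least `threshold` (hypothetical cutoff)."""
    hits = []
    i, j = start_a, start_b
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap and abs(KD[a] - KD[b]) >= threshold:
            hits.append((a, i, b, j))
        # Residue numbers advance only past real residues, not gaps.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return hits


# Toy rows (not real histone segments): Phe->Arg and Lys->Val are flagged.
print(extreme_changes("AFK", "ARV", start_a=10, start_b=20))
# [('F', 11, 'R', 21), ('K', 12, 'V', 22)]
```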

Figure 7

Multiple sequence alignment of the amino acid residues of the three eukaryotic H3 sequences of 1KX5, 1EQZ, and 1ID3 and the monomeric archaeal sequences of 1B67 and 1A7W. The CE alignment of the H3 1ID3 histone fold with the 1A7W histone fold is enclosed in the rectangular box.

Figure 8 shows residue distances from the interiors of the archaeal dimers and the corresponding windowed (smoothed) values of amino acid hydrophobicity along the folds. Both the HMFA and HMFB folds show an increase in windowed hydrophobicity where the residue distance at the carboxyl end of the chain approaches the dimer interior. No such behavior is observed for the eukaryotic H3 chain, as shown in Fig. 4 for the 1KX5 H3 histone fold.

Figure 8

Residue side-chain distances from the center of the HMFA and HMFB dimers and windowed/smoothed values of residue hydrophobicity at each residue location of the HMFA and HMFB folds of the proteins, 1B67 and 1A7W.

One can calculate the change in the correlation coefficient between hydrophobicity and residue distance from the dimer interior caused by any single amino acid replacement and thereby infer its contribution to dimer stability or instability. Switching PHE67 to ARG67 in the 1A7W histone fold reduces the correlation between distance and hydrophobicity by 3% when calculated over the carboxyl half of the fold sequence and by 10% when calculated over one quarter of the sequence, a change expected to destabilize the dimer. Figure 9 shows that the CE structural alignment of the α-carbon atoms of the 1A7W histone fold and the H3 fold of 1ID3 brings the side chains of the archaeal PHE67 and the eukaryotic ARG131 into exceedingly close alignment: the terminal amines of the ARG131 side chain penetrate the PHE67 ring. The distance between the α-carbons is 3.2 Å, whereas the distance between the side-chain centroids, the locations on which the calculations are based, is 1.2 Å. Figure 6 shows ARG131 pointing away from the four-helix bundle and toward the interior of the dimer, thereby destabilizing the eukaryotic dimer. PHE67, at the corresponding location in the archaeal dimer, plays the opposite role, stabilizing the dimer by virtue of its opposite hydrophobic character. A similar spatial correspondence is found between archaeal and eukaryotic side chains at other aligned locations, so the two lysine replacements cited above play the reverse role: since the lysines point away from the archaeal dimer interior and thereby stabilize the conformation, their replacement by strongly hydrophobic residues would be destabilizing in the eukaryotic dimer.

Figure 9

Superposition of the ARG131 side chain of the 1ID3 histone with the PHE67 side chain of the 1A7W histone, from CE alignment of the α-carbon atoms of the histones.

The present paper has focused on relationships between histones of two different biological domains for which three-dimensional structures are available, and a multiple sequence alignment has been presented for these five sequences. Alignments can also be performed that include the archaeal sequences together with multispecies H3 histone sequences for which no structural information is available. This has been done using the sequences listed in the H3 histone sequence database: http://research.nhgri.nim.gov/histones/complete.shtml.

Incorporating the highly conserved sequences in this database that have no truncation of residues at the carboxyl-terminal end yields alignments with the archaeal sequences comparable with those shown in Fig. 7. Multispecies information enables one to investigate changes in the hydrophobic character of amino acids at corresponding locations along the evolutionary path to the mammals. One might presume, for example, that in the evolution of the residues Arg66 and MET67 of 1A7W and 1B67, respectively, which were eventually replaced by isoleucine in the mammalian sequences, the transition may well have occurred through replacement by a leucine in the ancestral sequences of the fungi, as observed in Fig. 7 for the sequence of 1ID3.

Summary

If archaeal histones are the ancestral homologues of their eukaryotic counterparts (Pereira and Reeve 1998; Soares et al. 2003; Reeve et al. 2004, and references therein), then the transition from homo- to heterodimer is a consequence of evolution. The present calculations suggest that this transition may well have involved amino acid selection that optimized the trade-off between two apparently mutually exclusive requirements: the burial of hydrophobic residues within the interior of the H3–H4 dimer, contributing to dimer stability, and the burial of hydrophobic residues within the interior of the H3–H3′ four-helix bundle, contributing to tetramer stability. Amino acid modification in this carboxyl region of the H3 fold can have a number of other consequences as well. Differences in histone amino acid sequence that affect the dimer–dimer interfaces are well known to have the potential to modulate the unraveling of chromatin during DNA replication (Akey and Luger 2003). An H3 mutant at the tetramer interface of the H3–H4 dimers has been shown to destabilize the H3–H3′ four-helix bundle responsible for tetramer stability (Banks and Gloss 2004). The difference in DNA-binding affinity between two closely related archaeal histones is also apparently determined by differences in the C-terminal region of their α-helices, the region of dimer–dimer binding in the tetramer (Bailey et al. 2002). The present paper suggests that a trade-off in the sequestration of hydrophobic residues within the dimer interior and within the interior of the H3–H3′ four-helix bundle may well have been one of a number of selective advantages fixed in this region of the fold during the evolution from homo- to heterodimer.