Introduction

Chemical shifts are often highly amino acid specific and can provide a strong indication of polypeptide secondary structure (Wishart et al. 1992; Wishart and Sykes 1994; Thanabal et al. 1994). Evaluation of chemical shifts typically relies on subtraction of representative random coil chemical shifts (RCCS) from an observed chemical shift in order to provide a secondary chemical shift (Δδ) independent of the identity of the amino acid. The Δδ values of Hα, Hβ, Cα, Cβ and C′ nuclei are particularly sensitive to secondary structure (Berjanskii and Wishart 2007) and are frequently used either directly as restraints in structure calculations (Kuszewski et al. 1996) or in calculation of chemical shift index (CSI) parameters used for identifying regions of secondary structure (Wishart et al. 1995a). In structural biology, chemical shifts are most often used to highlight regions of protein secondary structure and are implemented during structure calculation protocols in parallel with other distance, orientation or dihedral angle based restraints (Evans 1995).

Since the first report that chemical shifts are systematically affected by polypeptide secondary structure by Dalgarno et al. (1983), the relationship between chemical shifts and secondary structure has been extensively studied through empirical correlations (Szilagyi and Jardetzky 1989; Zuiderweg et al. 1989; Williamson 1990; Williamson et al. 1992; Iwadate et al. 1999; Osapay and Case 1991; Spera and Bax 1991; Beger and Bolton 1997; Le and Oldfield 1994), determined through probabilistic methods (Lukin et al. 1997; Wang and Jardetzky 2002), quantum mechanically (Xu and Case 2001; Jiao et al. 1993; Case 1998, 2000), and derived experimentally (Wüthrich 1986; Thanabal et al. 1994; Merutka et al. 1995; Wishart et al. 1995a; Schwarzinger et al. 2001, 2000). In cases with complete chemical shift assignment, including 13C and 15N nuclei, bioinformatics approaches such as TALOS (Cornilescu et al. 1999), TALOS+ (Shen et al. 2009a) and PREDITOR (Berjanskii et al. 2006) use databases to produce comparable protein fragments, in some cases using homology considerations, to define ϕ and ψ dihedral angles. DANGLE extends these approaches through inclusion of Bayesian inference to improve sampling of less populated regions of Ramachandran space (Cheung et al. 2010). Algorithms which build upon these concepts for complete protein structural restraint generation by chemical shifts alone include CHESHIRE (Cavalli et al. 2007), CS-ROSETTA (Shen et al. 2008) and CS23D (Wishart et al. 2008), with CS-ROSETTA recently extended to support cases of incomplete chemical shift assignment (Shen et al. 2009b). Although these methods do not explicitly account for variation in dielectric constant (ε) or environment between protein interior and exterior, the chemical shift comparison databases inherently consist of structural fragments in a wide variety of environments.

With incomplete chemical shift assignments, i.e., if isotope labeling is infeasible, or if the direct comparability of the protein database employed for dihedral angle prediction to the protein studied is questionable, Δδ and CSI type approaches based on comparison to RCCS datasets are employed. Our ability to accurately assess secondary structure from RCCS is dependent on solvent conditions (Avbelj et al. 2004; Plaxco et al. 1997; Thanabal et al. 1994), effects of nearest neighbours (Schwarzinger et al. 2001; Wishart et al. 1995a), temperature (Merutka et al. 1995; Tonan and Ikawa 2003), and spectral referencing (Wishart et al. 1995b). In studies examining empirical relationships between chemical shifts and secondary structure, typically derived from datasets of globular proteins studied in aqueous conditions, it is difficult to untangle the effects of environmental differences between nuclei exposed to aqueous solution or buried in the globular core of a protein versus the effects of polypeptide backbone structural differences. Furthermore, all currently available experimentally measured RCCS datasets have been determined in aqueous solution.

The solvent environment can perturb chemical shifts of solutes by a number of mechanisms, including effects of ε. Dielectric constant affects chemical shifts by altering the transmittance of the electrostatic field (Sondergaard et al. 2008). Other mechanisms by which solvent environment can affect chemical shifts include modification of preferred bond torsion angles, differences in hydrogen bonding with the solvent, van der Waals interactions, ring currents, and electric charge (Hass et al. 2008). Solvent-induced modification of protonation state at ionizable sites would also be expected to perturb chemical shifts of nearby nuclei (Kim et al. 2009). These other properties are solvent-specific and are not always correlated to the ε of the solvent. Given that chemical shifts are influenced by solvent (Shenderovich et al. 2001; Tonan and Ikawa 2003), there is a strong incentive to ensure that the environment used to determine RCCS matches that in which the polypeptide or protein is being studied. In particular, membrane proteins or fibrillar self-assembled proteins are found in liquid crystalline environments with substantially lower ε than water. Furthermore, the hydrophobic core of a globular protein is better represented as a region of low ε in comparison to water (GarciaMoreno et al. 1997).

In this paper, we present RCCS values determined in non-aqueous environments and examine the effects of perturbation of solvent environment on our ability to directly correlate Δδ to secondary structure. A set of 21 random coil peptides of sequence GGXAGG, where X is any of the 20 naturally occurring amino acids or the modified amino acid 4-hydroxyproline, was studied in two different media with ε lower than water. Dimethyl sulfoxide (DMSO; ε = 47.5) was used as a mimic of the bilayer/water interface (ε ~ 40) (Brockman 1994; Koehorst et al. 2008). A ternary solvent mixture (hereafter referred to as the trisolvent system; theoretical ε ~ 37.8) composed of chloroform, methanol, and water in a 4:4:1 ratio (by volume) was also investigated due to its membrane mimetic properties (Slepkov et al. 2005). Notably, it is not clear which solvent perturbation effect(s) of those mentioned above would be most significant in the DMSO or the trisolvent system. However, since previous structural studies have successfully demonstrated both of these solvents to be reasonable membrane mimetics (reviewed in Rainey et al. (2006)), providing a good approximation of both the lower ε and decreased availability of H-bonding donors and acceptors in a membrane environment, RCCS values determined in these environments should also be applicable in a membrane environment. The Python program CS-CHEMeleon, web-mounted at http://structbio.biochem.dal.ca/jrainey/CSChem, was implemented to allow rapid analysis of sizeable sets of globular and membrane protein structures solved by NMR methods with published chemical shifts. This approach allowed comprehensive statistical comparison of the relative ability to predict protein secondary structure in membrane and non-membrane proteins using Δδ values derived from our sets versus three other sets of RCCS values, providing the first differential analysis (to our knowledge) between these classes of proteins.

Materials and methods

Materials

9-Fluorenylmethoxycarbonyl (Fmoc) protected amino acids, Rink Amide AM Sure Resin (0.65 mmol/g loading) and coupling reagents were obtained from AAPPTec (Louisville, KY), except for Fmoc protected 15N-labelled Gly (Cambridge Isotope Laboratories, Andover, MA) and Ala (C/D/N Isotopes, Pointe-Claire, QC). N,N-dimethylformamide (sequencing grade) and acetonitrile (high performance liquid chromatography (HPLC) grade) were obtained from Fisher Scientific (Ottawa, ON). The deuterated solvents DMSO-d6 and methanol-d3 (CD3OH) were acquired from C/D/N Isotopes while deuterated chloroform (CDCl3) was obtained from Sigma–Aldrich (Oakville, ON). The chemical shift standard 2,2-dimethyl-2-silapentane-5-sulfonic acid (DSS) was obtained from Wilmad (Buena, NJ). All other chemicals were purchased as biotechnology, HPLC or reagent grade, as appropriate, from Sigma–Aldrich. All reagents and chemicals were used without further purification, unless otherwise specified. NMR samples were prepared in either 5 mm O.D. Wilmad 535-PP-7 or -8 tubes (DMSO and aqueous samples) or 535-TR-8 screw cap tubes (trisolvent mixture).

Peptide synthesis and purification

Peptides with sequence Ac-Gly-Gly-X-Ala-Gly-Gly-NH2 (X being any of the 20 L-amino acids or the modified amino acid 4-hydroxyproline) were synthesized at ~0.2 mmol scale on a semi-automatic solid-phase peptide synthesizer (Endeavor 90, AAPPTec) on Rink AM resin. Protocols were as outlined in Langelaan et al. (2009) with the exception that peptides were N-terminally acetylated with anhydrous acetic anhydride (5 eq. to resin) and the cleavage cocktails used were appropriate for the side-chain protecting groups in a given peptide (Guy and Fields 1997). Reverse phase HPLC (Beckman System Gold, Fullerton, CA) purification was performed using a C18 column (5 μm particle, 120 Å pore size, 10 × 250 mm AAPPTec Spirit) at a flow rate of 3.0 mL/min. A linear water/acetonitrile (A/B) gradient was used from 98%A/2%B to 60%A/30%B over 25 min. Peptide identities and purities were confirmed by NMR spectroscopy.

Circular dichroism (CD) spectropolarimetry

Far-ultraviolet (far-UV) CD spectropolarimetry (J-810, Jasco, Easton, MD) was performed in 2,2,2-trifluoroethanol (Sigma, 99% non-deuterated NMR grade) for 5 peptides, with X = G, I, M, P, and V, at 25°C (controlled with a NESLAB RTE-111 bath, Thermo Scientific, Newington, NH). For ellipticity normalization, peptide concentration was determined by UV absorbance at 210 nm using a 1.0 cm path length quartz cuvette (Hellma, Müllheim, Germany) on a diode array spectrophotometer (Hewlett Packard, 8452A). Three repetitions (190–260 nm, 1 nm steps, 20 nm/min) were performed and averaged for all trials of each peptide in a 0.1 mm path length quartz cuvette (Hellma). All spectra were blank subtracted, treated with a weighted sliding-window average and converted to mean residue ellipticity [θ] as described previously (Langelaan et al. 2009).

NMR spectroscopy

Experiments were performed at 298 K either on the Nuclear Magnetic Resonance Research Resource (NMR-3, Dalhousie University) 11.7 T Avance II spectrometer (Bruker Canada, Milton, ON) equipped with a 5 mm broadband observe (BBO) probe or the National Research Council Institute for Marine Biosciences (NRC-IMB, Halifax, NS) 16.4 T Avance III (Bruker Canada) spectrometer equipped with a 5 mm indirect detection TCI cryoprobe. Samples (10 mM peptide, 5 mM DSS, 600 μL) were prepared in DMSO-d6 or CDCl3:CD3OH:H2O (4:4:1 by volume; mixture pH 5.5 ± 1). An aqueous sample of the Hyp peptide was prepared using the conditions of Wishart et al. (1995a). Spectra collected in DMSO are reported indirectly referenced to a value of 0 ppm for aqueous DSS using intermediate tetramethylsilane (TMS) shifts (1H (Hoffman 2006), 13C (Hoffman 2003; Wishart et al. 1995b)). Shifts in the trisolvent system (and of the Hyp in aqueous solution) are reported relative to internal DSS at 0 ppm. The dielectric constant of the trisolvent system ($\varepsilon_{s}$) was estimated using a combination of equations published by Abraham et al. (1966) and Amirjahed and Blake (1975):

$$ \varepsilon_{s} = \sum_{n} \frac{\varepsilon_{n} - 1}{\varepsilon_{n} + 2} M_{n} \tag{1} $$

where $\varepsilon_{n}$ is the dielectric constant of solvent n with mole fraction $M_{n}$ and the sum is carried out over all components of the solvent mixture.
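Equation 1 takes mole fractions as input, whereas the trisolvent system is specified by volume ratio. A minimal Python sketch of that conversion follows. The densities and molar masses are approximate handbook values for the non-deuterated solvents, and ideal mixing is assumed; both are assumptions of this illustration rather than values reported here, and the function name is hypothetical.

```python
# Convert a solvent volume ratio into mole fractions (the M_n terms of Eq. 1).
# Densities (g/mL) and molar masses (g/mol) are approximate room-temperature
# values for the NON-deuterated solvents: an assumption of this sketch.
SOLVENTS = {
    "chloroform": {"volume": 4.0, "density": 1.49, "molar_mass": 119.38},
    "methanol": {"volume": 4.0, "density": 0.792, "molar_mass": 32.04},
    "water": {"volume": 1.0, "density": 0.997, "molar_mass": 18.02},
}


def mole_fractions(solvents):
    """Return {solvent: mole fraction} from volumes, densities and molar masses."""
    moles = {
        name: s["volume"] * s["density"] / s["molar_mass"]
        for name, s in solvents.items()
    }
    total = sum(moles.values())
    return {name: n / total for name, n in moles.items()}


fractions = mole_fractions(SOLVENTS)
for name, x in fractions.items():
    print(f"{name}: {x:.3f}")
```

Although chloroform and methanol contribute equal volumes, methanol dominates on a molar basis because of its much lower molar mass.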

Unless otherwise specified, experiments were performed at 11.7 T (details in Table 1). In cases where 1D 1H NMR experiments were ambiguous, 1D nuclear Overhauser effect spectroscopy experiments (0.5 or 1 s mixing time) with 64 scans were performed by irradiation of the X amino acid Hα proton for identification of amide protons. A combination of distortionless enhancement by polarization transfer with modification for the detection of quaternary nuclei (DEPTQ-135; (Burger and Bigler 1998)) and 2D 13C-1H sensitivity-enhanced heteronuclear single quantum coherence (HSQC; (Farrow et al. 1994)) allowed accurate 13C shift assignment for all 21 peptides in all conditions. The 15N-labelled Gly and Ala peptides were analyzed at 16.4 T. Sensitivity-enhanced 15N-1H HSQCs (Farrow et al. 1994) were used to assign the labeled X-position amino acid 15N and 1H chemical shifts. 2D total correlation spectroscopy (TOCSY; 60 ms DIPSI-2 mixing time) was used to assign Hα and Hβ for Ala and Hα for Gly. All 1D experiments were processed and analyzed using TopSpin 1.6 (Bruker) and 2D experiments were processed using NMRpipe (Delaglio et al. 1995) and analyzed using Sparky 3.115 (Goddard and Kneller 2008).

Table 1 Details of NMR experiments performed

Comparative evaluation of random coil chemical shift tables

The DMSO and trisolvent system chemical shifts presented in this paper were compared to three sets of published aqueous chemical shifts, two derived experimentally and one by probability-based methods, for their ability to assess secondary structure in proteins. All 33 transmembrane (TM) protein NMR-STAR files currently in the BioMagResBank (BMRB: http://www.bmrb.wisc.edu/ (Ulrich et al. 2008)) along with their matching structural data from the protein data bank of transmembrane proteins (PDB_TM: http://pdbtm.enzim.hu/ (Tusnady et al. 2005)) were acquired (listed in the Supplementary Material). Chemical shifts and structures for a randomly selected set of 107 non-membrane proteins whose structures were determined in aqueous conditions (AQ) were acquired from the BMRB and the Protein Data Bank (www.pdb.org (Berman et al. 2000)), respectively. All PDB files with fewer than 10 models were eliminated; 2 cases (i.e., 4 structures out of 107) of identical proteins having differing structures and chemical shifts were retained (PDB entries 2KAX and 2KAY; 1YZA and 1YZC), and an instance of protein structures for a pair of isoforms having different lengths and C-terminal extensions was retained (PDB entries 2YSE and 2ZAJ). Beyond these 6 cases, the remaining 101 proteins were distinct based on pairwise comparison of sequence identity, with <75% identity over 20 amino acid stretches (excepting a pair of matrix metalloproteinases, PDB entries 1YCM and 2JNP, which have <64% pairwise identity over their entire lengths). Statistical examination was performed on concatenated datasets containing all proteins and structural models of each class (TM or AQ) rather than on individual NMR-STAR or PDB files.
NMR-STAR files with an unspecified reference were assumed to be referenced to DSS. Files referenced to another standard were indirectly referenced to DSS when that standard was trimethylsilyl-2,2,3,3-tetradeuteropropionic acid (TSP), TMS, or H2O for 1H and TSP, TMS, or dioxane for 13C (Hoffman 2006, 2003; Wishart et al. 1995b).

For the purpose of this study, the structure at a given amino acid residue “i” was determined from the average ϕ and ψ dihedral angles of the NMR-derived structural ensemble. For cases of localized averaging, a 5-residue sliding window average (residues i − 2 to i + 2) averaged over all ensemble members was employed. The consensus was deemed the “true” structure of the peptide or protein. Backbone dihedral angles were calculated and the regions defining various secondary structures were geometrically derived from those of Lovell et al. (2003) (exact boundaries detailed in the Supplementary Material). The reliability of local structural averaging was verified against 10 randomly selected proteins from the AQ dataset by qualitative comparison of our secondary structure output (helix, sheet, or coil) to that of PROMOTIF v1.0 (Hutchinson and Thornton 1996).
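The per-residue structure assignment described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the CS-CHEMeleon code: the use of a circular mean (so that angles near ±180° average sensibly) is our choice for the sketch, and the rectangular ϕ/ψ regions are hypothetical placeholders, since the boundaries actually used are given only in the Supplementary Material.

```python
import math


def circular_mean_deg(angles):
    """Mean of angles in degrees that respects the -180/+180 wrap-around."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def classify(phi, psi):
    """Assign helix (H), sheet (S) or coil (C) from phi/psi in degrees.

    The boxes below are HYPOTHETICAL placeholder regions, not the
    boundaries used in the paper.
    """
    if -160.0 <= phi <= -20.0 and -120.0 <= psi <= 50.0:
        return "H"
    if -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0):
        return "S"
    return "C"


def residue_structure(phis, psis):
    """Classify one residue from its phi and psi values across an ensemble."""
    return classify(circular_mean_deg(phis), circular_mean_deg(psis))


# One residue observed in three ensemble members (made-up angles).
print(residue_structure([-60.0, -65.0, -58.0], [-45.0, -40.0, -50.0]))
```

A plain arithmetic mean would place the average of 179° and −179° at 0°, which is why the sketch averages on the unit circle instead.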

The chemical shift based secondary structure prediction (i.e., helix, coil or sheet) for each residue was determined by comparison of the magnitude and sign of the Δδ relative to a specified threshold for a given nucleus, with Δδ values calculated including the correction factors of Schwarzinger et al. for nearest neighbour effects (Schwarzinger et al. 2001). Optimized thresholds for nuclei were determined iteratively for each RCCS table, with calculation of agreement using thresholds of −1 ppm to +1 ppm (0.01 ppm interval) for Hα and of −2 to +2 (0.1 ppm interval) for Cα and Cβ to obtain the best agreement between PDB file based dihedral angles and Δδ for the full AQ dataset (the TM dataset contained too few amino acids). Consensus Δδ-based predictions were defined as follows, based on the nuclei out of Hα, Cα, and Cβ that are reported in an NMR-STAR file. If at least two of these nuclei have chemical shifts provided, the consensus prediction required agreement of at least 2 Δδ-based threshold tests. If there was no consensus, then the amino acid was designated as part of a coiled structure. Local averaging, when employed, was carried out identically to the local structural averaging using a 5-residue sliding window. The numbers of correct and incorrect predictions were compared for each threshold during iteration and the ‘correct-incorrect’ predictions were normalized to a percentage. The optimal threshold for that atom’s structure type was at the highest normalization value. The standard thresholds of 0.1 ppm for 1H and 0.7 ppm for 13C introduced by Wishart et al. (1992) were used alongside the optimized thresholds determined for each RCCS dataset during analysis. The accuracy of RCCS assessment of secondary structure was expressed as a total percentage of correctly assessed structure for both the AQ and TM datasets, subdivided by RCCS set, atom type and secondary structure threshold. 
Glycine and proline were not used for comparison since they have different Ramachandran angle preferences from the other amino acids in the same secondary structures (Lovell et al. 2003) and also have a greatly elevated Δδ threshold (Wishart et al. 1992).
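The threshold test and the 2-of-3 consensus rule can be illustrated with a short Python sketch. It uses the standard thresholds of 0.1 ppm (1H) and 0.7 ppm (13C) symmetrically about zero, and the usual sign convention that Hα and Cβ are shielded in helices while Cα is deshielded. Treating the threshold as inclusive, and returning no prediction when fewer than two nuclei are reported, are assumptions of this sketch; the function and key names are hypothetical.

```python
from collections import Counter

# Sign of the secondary shift that indicates helix for each nucleus:
# negative for H-alpha and C-beta, positive for C-alpha.
HELIX_SIGN = {"HA": -1, "CA": +1, "CB": -1}
# Standard thresholds in ppm (0.1 for 1H, 0.7 for 13C).
THRESHOLD = {"HA": 0.1, "CA": 0.7, "CB": 0.7}


def predict_nucleus(nucleus, delta, thresholds=THRESHOLD):
    """Return 'H', 'S' or 'C' from one secondary shift (ppm)."""
    if abs(delta) < thresholds[nucleus]:
        return "C"
    return "H" if delta * HELIX_SIGN[nucleus] > 0 else "S"


def predict_consensus(deltas):
    """Consensus prediction from a {nucleus: secondary shift} dict.

    Needs at least two reported nuclei (returns None otherwise, an
    assumption of this sketch); without 2 agreeing tests the residue
    is designated coil.
    """
    if len(deltas) < 2:
        return None
    votes = Counter(predict_nucleus(n, d) for n, d in deltas.items())
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else "C"


# Made-up secondary shifts for a single residue.
print(predict_consensus({"HA": -0.3, "CA": 2.1, "CB": -0.2}))
```

With three nuclei that each give a different call, or two nuclei that disagree, no label reaches two votes and the residue falls back to coil, mirroring the rule stated above.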

CS-CHEMeleon implementation

For data analysis, a Python 2.5 program named CS-CHEMeleon was written. This has been configured to run from a web-based graphical user interface (freely available at http://structbio.biochem.dal.ca/jrainey/CSChem). Using a chemical shift file uploaded in NMR-STAR format (Ulrich et al. 2008), Δδ values are calculated directly using the RCCS dataset(s) specified by the user. By default, two experimentally determined aqueous chemical shift datasets (Schwarzinger et al. 2000; Wishart et al. 1995a), one aqueous probability-based RCCS table (Wang and Jardetzky 2002), and the DMSO and trisolvent tables presented herein can be used. Any desired alternative RCCS table may also be uploaded and used. Δδ values may be determined with or without accounting for nearest neighbours, using the correction factors published by Schwarzinger et al. (2001). Evaluation and comparison of Δδ values may be performed graphically within the web browser and/or Δδ values may be downloaded in ASCII format for offline analysis. When investigating TM proteins, the user also has the option to assess secondary structure only for residues inside or outside the membrane, as defined in the associated extensible markup language (XML) PDB_TM file.
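As an illustration of the Δδ calculation, the following Python sketch subtracts a random coil value, and optionally a nearest neighbour correction, from each observed shift. It is not the CS-CHEMeleon source: the function name, the table layout and the numerical values in `RCCS` are hypothetical placeholders and are not taken from any of the published tables.

```python
# ILLUSTRATIVE placeholder random coil values (ppm); not from any published table.
RCCS = {
    "ALA": {"HA": 4.30, "CA": 52.5, "CB": 19.0},
    "LEU": {"HA": 4.35, "CA": 55.0, "CB": 42.0},
}


def secondary_shifts(residue, observed, rccs=RCCS, correction=None):
    """Return {nucleus: observed - random coil - correction} in ppm.

    `correction` is an optional {nucleus: ppm} dict standing in for a
    summed nearest neighbour correction; nuclei absent from the random
    coil table are skipped.
    """
    correction = correction or {}
    table = rccs[residue]
    return {
        nucleus: shift - table[nucleus] - correction.get(nucleus, 0.0)
        for nucleus, shift in observed.items()
        if nucleus in table
    }


# Made-up observed shifts for an alanine residue.
result = secondary_shifts("ALA", {"HA": 4.00, "CA": 54.5})
for nucleus, delta in result.items():
    print(f"{nucleus}: {delta:+.2f} ppm")
```

Passing a different dictionary as `rccs` corresponds to selecting or uploading an alternative RCCS table.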

CS-CHEMeleon also calculates ϕ and ψ dihedral angles upon demand for an uploaded NMR structural ensemble (or single structure) in PDB file format, allowing comparison between the secondary structure given by the region of the Ramachandran plot (Lovell et al. 2003) and the predicted secondary structure from a given RCCS table using the structural thresholds of Wishart et al. (1995a) or the optimized iterated thresholds presented here. The user can also make use of local sliding window structural averaging (for the NMR-STAR file, the PDB file, or both) if the assessment yields poor correlation to the PDB structure. The number of residues averaged (recommended 5) and the minimum proportion of agreement for a consensus definition over the window (recommended 0.51) are both defined by the user.
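One way to picture the sliding window consensus is the Python sketch below, which uses the recommended window of 5 residues and minimum agreement of 0.51. Truncating the window at the chain termini and falling back to coil when no label reaches the minimum proportion are assumptions of this sketch, and the function name is hypothetical.

```python
from collections import Counter


def window_consensus(labels, window=5, min_agreement=0.51, default="C"):
    """Smooth per-residue labels ('H', 'S', 'C') with a centred window.

    Each residue receives the most common label in its window if that
    label reaches `min_agreement` of the window, otherwise `default`.
    Windows are truncated at the ends of the chain.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        segment = labels[max(0, i - half): i + half + 1]
        label, count = Counter(segment).most_common(1)[0]
        if count / len(segment) >= min_agreement:
            smoothed.append(label)
        else:
            smoothed.append(default)
    return smoothed


# Made-up per-residue calls with one isolated coil call inside a helix.
raw = list("HHCHHSSSSS")
print("".join(window_consensus(raw)))
```

In this made-up sequence the isolated coil call within the helical stretch is absorbed into the helix, while the residue at the helix/sheet boundary, where no label reaches 51% of the window, is assigned to coil.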

Results and discussion

Verification of peptide random coil character

The established random coil peptide series of Wishart et al. (1995a), with sequence Ac-GGXAGG-NH2, was used herein. In order to ensure that random coil character was maintained in organic solvent conditions, far-UV CD spectropolarimetry was performed on a sample set of 5 peptides containing a variety of X amino acids (Fig. 1). Because DMSO is not a suitable solvent for CD due to absorbance in the far-UV region and because of incomplete solubility of all 21 peptides in the trisolvent system, the α-helix inducing solvent trifluoroethanol (TFE) (Merutka et al. 1995) was used. All five peptides, including some with X residues having high helical propensity in aqueous (Blaber et al. 1993) or membrane-mimetic environments (Li and Deber 1994), show a strong negative band at ~198 nm characteristic of a random coil (Greenfield 2004) versus the positive band expected of a helix or sheet structure. Weak negative ellipticity in the ~210–230 nm region is also observed in most of these peptides, with varying strengths. The CD banding in this region is not indicative of sheet or helix (Greenfield 2004), particularly in the absence of a positive band at ~200 nm. Also informatively, the GGPAGG spectrum (most likely of all of the peptides to have polyproline-II character) contains no band structure indicative of polyproline-II character (Chellgren and Creamer 2004; Rath et al. 2005). We cannot explicitly rule out the weak negative ellipticity at 210–230 nm being caused by increased favourability of intramolecular or intermolecular interactions within or between peptides in TFE, relative to aqueous conditions. However, since all peptides examined had predominantly random coil CD characteristics even in an α-helical structure-inducing medium, we feel that the Ac-GGXAGG-NH2 sequence, which was random coil in aqueous medium (Wishart et al. 1995a), should serve as a valid random coil model system in the non-structure inducing DMSO and trisolvent system environments.

Fig. 1

Far-ultraviolet circular dichroism spectra of five random coil peptides (sequence Acetyl-GGXAGG-NH2, with X given in legend) in 2,2,2-trifluoroethanol. The spectra shown are averages of three blank-subtracted trials with ellipticity normalized to relative concentrations using UV–Vis absorbance at 210 nm, with a weighted 3 nm sliding-window average applied (detailed in “Methods”)

Random coil chemical shifts

The complete set of RCCS measured in DMSO is reported in Tables 2 and 3 and in the trisolvent system in Tables 4 and 5. Note that only 10 of the 21 random coil peptides dissolved in the trisolvent system, with no discernible pattern in solubility. For comparison, the chemical shifts for Hyp measured using identical conditions to Wishart et al. (1995a) are: Hα 4.53; Hβ 2.35 and 2.06; Hγ 4.62; Hδ 3.79 and 3.57; Cα 61.56; Cβ 39.76; Cγ 72.5; and, Cδ 57.3. Although DSS was incorporated as an internal standard in the DMSO samples, DSS and DMSO interact. Chemical shifts in these samples were therefore indirectly referenced to DSS in water using published reference values (Hoffman 2003, 2006; Wishart et al. 1995b) to provide direct comparability for experimental data acquired in aqueous solution using the accepted biomolecular chemical shift standard of DSS (Wishart et al. 1995a). In contrast, the shifts reported in the trisolvent system were internally referenced to DSS since there is no straightforward way to indirectly reference in a ternary solvent system. Phase separation is noticeable in this solvent mixture (Slepkov et al. 2005), so it is likely that DSS referencing is very similar to that of DSS in water since DSS would be most soluble in water-rich components of the mixture. However, this uncertainty in referencing should be taken into account when employing the trisolvent-derived shifts. Ideally, RCCS would also be determined in a solvent of much lower ε for direct comparability to phospholipid tail-group regions or in the core of a globular protein. However, examination of a variety of solvents with lower ε (~4–20) demonstrated uniformly extremely poor solubilization of the random coil peptides. Designing a more hydrophobic, but still random coil, peptide would likely be required in order to obtain RCCS in a low ε medium.

Table 2 1H random coil chemical shifts for peptides of sequence GGXAGG measured in dimethyl sulfoxide
Table 3 13C random coil chemical shifts for peptides of sequence GGXAGG measured in dimethyl sulfoxide
Table 4 1H random coil chemical shifts for peptides of sequence GGXAGG measured in methanol:chloroform:water (4:4:1 by volume)
Table 5 13C random coil chemical shifts for peptides of sequence GGXAGG measured in methanol:chloroform:water (4:4:1 by volume)

The Ac-GGXAGG-NH2 peptide series allows determination of the effect of solvent environment in terms of decreased ε and the other perturbation factors discussed in the introduction upon RCCS. These RCCS are most directly comparable to the experimental aqueous RCCS dataset of Wishart et al. (1995a), since the same peptide series was employed. Comparison was also performed to the experimental aqueous RCCS dataset of Schwarzinger et al. (2000) and to the statistically derived amino acid chemical shift dataset of Wang and Jardetzky (2002). RCCS perturbations are evident to different degrees for different amino acids and, in some instances, for different RCCS comparison sets (Fig. 2). Perturbations relative to aqueous or statistically derived RCCS may be presumed to be arising from decreased ε, from changes to favoured dihedral angle ranges within a given random coil peptide from other non-ε derived properties of the solvent and, in the case of ionizable residues or H-bonding donor/acceptor side-chains, from protonation or H-bonding state. The aliphatic residues, for example, tend to be strongly perturbed in DMSO. Ionizable or H-bonding donor/acceptor side-chains, such as Asn and Gln, are also strongly perturbed in DMSO. As would be expected (Quirt et al. 1974), Cα is also strongly affected in the acidic residues, which are clearly protonated on the side-chain carboxylic acid in DMSO (Table 2) but typically deprotonated in aqueous conditions. Hα and Cα in DMSO generally (but not uniformly) experience opposite trends relative to aqueous RCCS, with Hα being deshielded while Cα is shielded. For both nuclei, more than half of the amino acids’ random coil values change by more than their respective structural thresholds defined by Wishart et al. (1995a), which should therefore be defined as a significant change.

Fig. 2

Chemical shift differences between random coil Hα and Cα nuclei determined in high (aqueous) and intermediate (DMSO or trisolvent) dielectric environments. Aqueous random coil chemical shifts are from Wishart et al. (1995a) (blue), Schwarzinger et al. (2000) (red), and Wang and Jardetzky (2002) (green) except for the modified amino acid 4-hydroxyproline (Hyp), which is tabulated herein in aqueous conditions duplicating those of Wishart et al. (1995a)

In comparison, the chemical shifts determined in the trisolvent system show different trends from those observed with DMSO (Fig. 2). Although only 10 amino acids can be compared, a change in solvent from aqueous conditions to the trisolvent system does cause perturbations in both Hα and Cα chemical shifts. In comparison to the DMSO environment, a lower proportion of chemical shifts are perturbed by more than the standard thresholds in the trisolvent system and, as a set, they do not show any trend in shielding and deshielding although the theoretical ε is lower than that of DMSO. This implies that effects of ε alone are not entirely responsible for the generally larger chemical shift perturbations observed in DMSO. Since some degree of phase separation is obvious in the trisolvent system (Slepkov et al. 2005), it is possible that preferential peptide solvation in aqueous-rich phases is giving rise to a decreased difference from RCCS derived in aqueous conditions. This hypothesis, however, is at odds with the insolubility of 11/21 of the random coil peptides in the trisolvent system.

Statistical analysis of NMR structure and chemical shift datasets

The accuracy of Δδ-based secondary structure prediction, derived using RCCS in aqueous vs. DMSO and trisolvent solutions, in both the AQ and TM protein datasets was assessed by the Python program CS-CHEMeleon. A Δδ-based prediction was assigned either using a single nucleus type or as the consensus (i.e., agreement by at least 2) of the Hα, Cα, and Cβ Δδ values considered relative to the threshold in question. Since RCCS differ with solvent (Fig. 2), and since both DMSO and the trisolvent system are established membrane-mimetics, it was expected that the secondary structure of proteins in the TM dataset would have a higher agreement to the PDB structure when Δδ was calculated with RCCS in DMSO or the trisolvent system versus in aqueous conditions (and vice versa for the AQ dataset). Using the original thresholds for defining helix, coil or sheet defined by Wishart et al. (1995a), the most obvious observation is the markedly higher accuracy of Δδ in detecting helices compared to sheets and coils. This was true for both the AQ and TM protein datasets and with all five random coil tables (Table 6; Fig. 3).

Table 6 Agreement between secondary chemical shift (requiring consensus of 2 of 3 of Hα, Cα, and Cβ) based prediction of helix, sheet and coiled secondary structure in comparison to the consensus structure determined by average ϕ and ψ dihedral angles at that position in the ensemble of NMR structures
Fig. 3

Relative predictive accuracy of secondary structure by secondary chemical shift for the indicated nucleus type for the non-membrane (AQ) and membrane (TM) protein datasets (dataset details in Table 8). The percentages of correctly predicted structure per amino acid using iteratively optimized secondary chemical shift thresholds (Table 7) were calculated as a function of nucleus (Hα, Cα, Cβ, and consensus ≥2 of the three atoms), random coil chemical shift, and secondary structure type (helix (H), sheet (S), and coil (C)). Localized averaging over 5 residues was applied for both the secondary structure derived by ϕ and ψ dihedrals and for secondary chemical shift predictions

In an attempt to increase predictive accuracy of Δδ for non-helical structure, we performed iterative optimization of the thresholds for each RCCS table and each nucleus (Table 7; detailed results in the Supplementary Material). Threshold optimization was performed on the AQ dataset by comparing the number of correctly to incorrectly assessed amino acids over a range of thresholds for each nucleus. The total number of “correct–incorrect” was normalized over the total number of predicted amino acids and the highest value was deemed the optimized value. These optimized thresholds are very different, in some cases, from the values proposed by Wishart et al. (1995a) and they vary by nucleus and RCCS dataset. This suggests that the previous thresholds may be too general. As expected, the trend of chemical shift shielding and deshielding relative to secondary structure is the same regardless of the actual threshold value for a given nucleus type: Hα and Cβ experience shielding in helices and deshielding in sheets, while the opposite trend is true for Cα. Cursory examination of Table 6 implies overall higher accuracy of Δδ for predicting secondary structure in TM proteins (~87–88% accuracy) vs. AQ proteins (~62–64% accuracy). However, the reason for this is actually the predominance of helical structure in the Hα containing portion of the TM dataset (Table 8), rather than an inherently better ability to predict TM protein structure from chemical shifts.
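The threshold scan can be sketched in Python for a single nucleus and a single structure type. The scoring below, in which a residue counts as correct when the binary call (target structure or not) matches the structure derived from dihedral angles, is one reading of the “correct–incorrect” criterion and is an assumption of this sketch, as are the function name and the made-up data.

```python
def scan_threshold(deltas, truth, target="H", lo=-1.0, hi=1.0, step=0.01, below=True):
    """Scan thresholds and return (best_threshold, best_score_percent).

    A residue is called `target` when its secondary shift is <= threshold
    (`below=True`, e.g. H-alpha in helices) or >= threshold (`below=False`).
    Score = 100 * (correct - incorrect) / total; the first threshold that
    reaches the maximum score is kept.
    """
    n_steps = int(round((hi - lo) / step))
    best_t, best_score = None, float("-inf")
    for k in range(n_steps + 1):
        t = round(lo + k * step, 10)
        correct = 0
        for d, label in zip(deltas, truth):
            called = (d <= t) if below else (d >= t)
            if called == (label == target):
                correct += 1
        incorrect = len(deltas) - correct
        score = 100.0 * (correct - incorrect) / len(deltas)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Made-up H-alpha secondary shifts (ppm) and dihedral-derived structures.
deltas = [-0.45, -0.30, -0.225, -0.05, 0.02, 0.10, 0.35, 0.40]
truth = ["H", "H", "H", "C", "C", "C", "S", "S"]
best_t, best_score = scan_threshold(deltas, truth)
print(best_t, best_score)
```

The default range and interval match the Hα scan described in the methods (−1 to +1 ppm at 0.01 ppm); a coarser grid such as −2 to +2 ppm at 0.1 ppm would be passed for Cα and Cβ.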

Table 7 Iteratively optimized thresholds for secondary chemical shift based prediction of helix, sheet, and coiled structures for the RCCS presented herein (Tables 2, 3, 4, 5) and for three published random coil sets: Schwarzinger et al. (2000) (Schwar.), Wishart et al. (1995a) (Wishart), Wang and Jardetzky (2002) (Wang)
Table 8 Number of residues in each dataset categorized by chemical shift availability per residue and classified by secondary structure (helix (H), sheet (S), coil (C)), with secondary structure determined using the consensus structure according to ϕ and ψ dihedral angles over a 5-residue local average around the residue in question

Addition of localized averaging for both the PDB file based dihedral angle test and Δδ analysis, with both the optimized and original thresholds, increased the overall accuracy of secondary structure prediction. For helix and sheet regions, accuracy generally increased modestly (~2–10%) for both the AQ and TM datasets, whereas for coils it increased by 20–40%. This implies that structural averaging is important for use of Δδ values, particularly in identification of regions lacking defined secondary structure, where individual residues may have characteristics of a secondary structure but the segment as a whole is a coil. Only subtle differences in accuracy were observed between the original thresholds and our iteratively determined thresholds. This could be attributed to the fact that Δδ values are rather large in comparison to the thresholds for most instances of secondary structure. The thresholds would be significantly more important for small values of Δδ, since these lie closest to the boundary between prediction of coil and prediction of structure; in these cases, the choice of threshold makes a drastic difference. The prediction of sheets in both datasets was relatively poor even when using structural averaging. In comparison to helical structure, sheets are able to assume a much greater variety of backbone dihedral angles (Lovell et al. 2003), which may be a major factor in the relative inaccuracy in prediction of sheets, since this should lead to increased variability in chemical shift perturbation from residue to residue. Furthermore, consideration and preferential weighting of different nuclei, such as those identified by Wang and Jardetzky (2002), may improve differentiation between sheet and coil. The capability to use the various chemical shift types differentially to distinguish helix, coil and sheet is likely to be added to CS-CHEMeleon in future iterations, but was not the focus of the present work.
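The effect of localized averaging can be illustrated with a short Python sketch. This is an assumption about one plausible form of the averaging, namely a majority vote over a 5-residue window centered on each residue and truncated at the termini; the function name `smooth_calls` and the tie-breaking rule are hypothetical and do not necessarily match the procedure used here or in CS-CHEMeleon.

```python
from collections import Counter


def smooth_calls(calls: str, window: int = 5) -> str:
    """Replace each per-residue call by the majority call in a local window.

    calls is a string of H (helix), S (sheet) and C (coil), one per residue.
    Assumption: on a tie, the residue keeps its own call if that call is
    among the tied states; otherwise it is set to coil.
    """
    half = window // 2
    smoothed = []
    for i, own_call in enumerate(calls):
        # Window is truncated at the N- and C-termini.
        segment = calls[max(0, i - half): i + half + 1]
        counts = Counter(segment)
        top = max(counts.values())
        winners = [state for state, n in counts.items() if n == top]
        if len(winners) == 1:
            smoothed.append(winners[0])
        elif own_call in winners:
            smoothed.append(own_call)
        else:
            smoothed.append("C")
    return "".join(smoothed)


# A single helix-like residue inside a coil segment is reassigned to coil.
isolated = smooth_calls("CCHCC")
# A single coil-like residue inside a helix is reassigned to helix.
interrupted = smooth_calls("HHHCHHH")
```

Under these assumptions, an isolated residue whose Δδ resembles a structured state no longer breaks up an otherwise uniform segment, which is the behavior that most benefits coil regions.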

Interestingly, using RCCS acquired under different conditions gave no major improvement in the accuracy of secondary structure prediction for the TM dataset versus the AQ dataset. In other words, there is no obvious correlation between the solvent environment used for RCCS determination and the accuracy of protein secondary structure prediction. The RCCS providing the best overall predictive accuracy were those derived in DMSO and those determined in 8 M aqueous urea by Schwarzinger et al. (2000) (Table 6), with the others following closely. A study by Mielke and Krishnan (2004), evaluating only non-membrane proteins, came to the same conclusion about the applicability of the chemical shifts of Schwarzinger et al. versus other aqueous RCCS tables.

Although it would be most logical to correlate RCCS environment to the environment of the studied protein (i.e., DMSO or trisolvent with TM proteins and the table of Schwarzinger et al. for AQ proteins), our statistical analysis suggests that this is not actually the case. It is possible that RCCS derived in solvents with much lower ε or with complete lack of H-bonding capability, given an appropriately engineered peptide series, would provide significant improvement in secondary structure prediction for membrane proteins or the hydrophobic core region of globular proteins. However, the apparent insensitivity of Δδ-based prediction to fairly dramatic changes in RCCS does not provide a strong incentive to perform these studies. Furthermore, helical regions are already very well predicted, and there may be no major improvement in ability to predict sheet or coil regions, even with a set of chemical shifts derived in such an environment. Based upon our findings, the optimal method for any type of protein being studied in an aqueous or non-aqueous environment is to perform a comparison of the Δδ values derived using both the DMSO-based table and the table of Schwarzinger et al. (2000) in order to determine a consensus for secondary structure prediction. The comparison should also be made with more than one nucleus if possible. This type of comparative analysis is readily possible using CS-CHEMeleon.
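The recommended comparison can be sketched in Python as two consensus steps: first across nuclei (a state called by at least two of Hα, Cα and Cβ), then across the two RCCS tables. This is an illustrative sketch only, not the CS-CHEMeleon implementation; the function names and the use of `None` for residues without agreement are assumptions.

```python
from collections import Counter
from typing import List, Optional, Sequence


def consensus(calls: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """Return the state supported by at least min_votes calls, else None.

    With three nuclei and min_votes=2, at most one state can qualify.
    For longer inputs, ties go to the state that appears first.
    """
    if not calls:
        return None
    state, votes = Counter(calls).most_common(1)[0]
    return state if votes >= min_votes else None


def compare_tables(calls_a: Sequence[Optional[str]],
                   calls_b: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Keep a per-residue call only where both RCCS tables agree."""
    return [a if a == b else None for a, b in zip(calls_a, calls_b)]


# Per-residue calls (H-alpha, C-alpha, C-beta) from two hypothetical tables.
table_dmso = [("H", "H", "C"), ("C", "S", "C"), ("S", "S", "S")]
table_urea = [("H", "H", "H"), ("S", "S", "C"), ("S", "C", "S")]

per_residue_dmso = [consensus(c) for c in table_dmso]
per_residue_urea = [consensus(c) for c in table_urea]
agreed = compare_tables(per_residue_dmso, per_residue_urea)
```

Residues for which the two tables disagree are left unassigned in this sketch, which flags them for closer inspection instead of forcing a call.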

Conclusions

We have determined the 1H and 13C chemical shifts of 21 hexapeptides with sequence Ac-GGXAGG-NH2, where X is either one of the 20 naturally occurring amino acids or the modified amino acid 4-hydroxyproline, demonstrating significant differences in RCCS between aqueous environments and intermediate ε environments. Structural studies of TM proteins that use Δδ values for fast assessment of secondary structure have used RCCS measured in aqueous environments, but the predictive accuracy of Δδ has never been evaluated over a range of solvent environments. In this paper, we provide evidence that, although Δδ values themselves are affected, the solvent in which the RCCS were measured does not significantly affect the prediction of secondary structure. Rather, the type of secondary structure is a major factor in the agreement between Δδ-based prediction and experimental secondary structures. For a thorough assessment, all three major nuclei (Hα, Cα, Cβ) should be considered, as well as the type of secondary structure being evaluated. Although Δδ use is well suited to an overall estimate of protein structure, the evidence presented herein of a bias towards helices with all RCCS datasets provides a strong incentive to employ alternative restraints during structure calculation and to ensure that Δδ-based restraints are not overrepresented in the energy expression being used for structure calculation. Choice of the Δδ threshold for helix versus coil or sheet may also slightly improve overall secondary structure prediction accuracy. Furthermore, optimal accuracy in secondary structure prediction is probably obtainable by comparison of Δδ obtained with the DMSO table presented herein and the table determined in aqueous conditions by Schwarzinger et al. (2000). The web-based software CS-CHEMeleon, introduced herein, provides a rapid and versatile method to allow such a comparative analysis.