Introduction

Biomolecular structures at atomic resolution are crucial for interpreting cellular processes in a molecular context. In addition, they serve important roles in drug discovery and functional industrial design, such as the modification of enzyme properties. For all these applications, it is imperative that the biomolecular atomic structures are accurate, precise and truthfully reflect the experimental data on which they were based.

The Protein Data Bank (PDB) (Berman 2008; Bernstein et al. 1977) is the primary repository of the atomic coordinates of three-dimensional (3D) biomolecular structures. It currently contains more than 91,000 entries, which cover proteins, oligonucleotides and their complexes, including small-molecule ligands. The entries solved by Nuclear Magnetic Resonance (NMR) represent approximately 11 % of the total (>10,000 entries). The PDB archive is jointly managed by four partner organisations (RCSB PDB (Berman et al. 2000), PDBe (Velankar et al. 2012), PDBj (Kinjo et al. 2012) and BMRB (Ulrich et al. 2008)) under the aegis of the wwPDB (Berman et al. 2007) consortium.

A series of erroneously modelled NMR structures (Clore et al. 1995; Lambert et al. 2004; Nabuurs et al. 2006; Spadaccini et al. 2006) and cases of outright scientific fraud (Borrell 2009) with X-ray derived structures underscore the need for dedicated tools to assess the structural quality of biomolecular structures as well as the agreement with the experimental data. Moreover, the 3D structure and dynamic properties of the biomolecules can change in response to interactions with other molecules and hence it is also imperative to carefully assess the accuracy of the structures.

Structure validation typically encompasses two broad aspects: the agreement of the experimental data with the resulting structure and a geometric validation. In order to calculate the agreement with the experimental data, a theoretical description relating the data to the atomic coordinates is required. These relations are typically also used during the structure calculation procedure to drive the convergence and hence the assessment only conveys the degree to which the structure was calculated properly. If, however, the data are internally inconsistent, this will typically result in statistically poor or unusual distributions of related structural parameters (vide infra). More independent measures are based upon cross-validation methods (Brunger et al. 1993; Clore and Schwieters 2006; Nabuurs et al. 2005; Tjandra et al. 2007) that exclude a fraction of the data in the structure calculation procedures.

Geometric structure validations aim to assess the quality in relation to the chemical and structural knowledge derived from relevant reference structures. Local structural parameters such as bond lengths, bond angles and torsion angles are obtained from X-ray crystallography data of small molecules and ultra-high resolution biomolecular structures (Engh and Huber 1991, 2001), whereas dihedral angle distributions are based upon a set of high-resolution X-ray structures. Clearly, there is an inherent danger that structures are evaluated with respect to an incomplete or biased reference, but it is nowadays generally appreciated that uncommon features flagged by a geometric assessment should be supported by solid experimental data (Vriend 1990; Chen et al. 2010; Bhattacharya et al. 2007; Doreleijers et al. 2012a; Nabuurs et al. 2006; Hooft et al. 1996).

Traditional and still-popular biomolecular NMR structure validation routines have relied on a limited set of tools and metrics. It has been customary to summarise restraint content using a simple count of the number of restraints, whereas it has long been known that these numbers are flawed for multiple reasons (Nabuurs et al. 2003). A recent large-scale analysis also showed great redundancy in the reported number of distance restraints (Doreleijers et al. 2009). It even proved possible to refine NMR-derived structures using random 15N-RDC values to acceptable Q-factors (Bax and Grishaev 2005). PROCHECK-NMR (Laskowski et al. 1996) has been the accepted choice for the assessment of the geometrical quality of NMR ensembles, in spite of it long being out-dated. High percentages of residues in the most favoured Ramachandran plot regions reported by PROCHECK-NMR were commonly regarded as an assurance of a good quality structure, but recent assessments have shown this to be invalid (Doreleijers et al. 2012a, b). Tools designed for X-ray crystallography, such as WHAT IF (Vriend 1990; Hooft et al. 1996) or Molprobity (Davis et al. 2007; Chen et al. 2010), can also be used for NMR-derived structures. However, NMR-specific properties, such as the presence of multiple models in one structural ensemble and the potential dynamical aspects represented in this ensemble, often present problems that are not accommodated by these programs. In particular, most of the routines also fail to adequately address the validation of ‘ensembles of ensembles’, where the computational protocols simultaneously aim to treat both the structural model and the available dynamical data (Montalvao et al. 2012; Lindorff-Larsen et al. 2005). Structure validation software dedicated to NMR-derived structures, such as the PSVS suite (Bhattacharya et al. 2007) or CING (Doreleijers et al. 2012a), typically provides solutions for the issues inherently associated with X-ray oriented tools.

Compared to X-ray crystallography, validation of NMR-derived structures is in general more complicated. Not only do the tools need to take into account the aforementioned dynamical effects and the multiple conformers in the structural ensemble, but also the nature of the NMR data, which differs vastly from the X-ray situation. Whereas the latter only concerns the reflections, which are uniform in data content, NMR-derived structures can be based on a large variety of experimental data (Vuister et al. 2011). These can be local in nature, such as distance and dihedral restraints, global in nature, such as residual dipolar couplings (RDCs) and pseudo-contact shifts (PCS), or descriptive of the overall shape, such as small angle scattering (SAS) data.

A typical NMR structure calculation protocol involves a customised simulated annealing procedure, typically in torsion-angle space, where restraints are included as pseudo-harmonic potentials (Stein et al. 1997; Güntert 1998). Many additions to the basic protocol have been proposed, such as the use of database potentials (Kuszewski and Clore 2000), radius of gyration (Schwieters and Clore 2008), 15N-T1/T2-relaxation parameters (Tjandra et al. 1997), SAXS-derived potentials (Gabel et al. 2008) or ensembles consistent with S2 order parameters (Best and Vendruscolo 2004). Refinement in explicit water using a more extended force field was shown to significantly improve the structural quality (Linge et al. 2003; Spronk et al. 2002). Inferential structure determination (Rieping et al. 2005), while computationally expensive, was shown to significantly improve the treatment of dynamical effects and provide for a more unbiased parametrisation of the underlying theoretical models (Bernard et al. 2011).

Assigned chemical shifts are arguably the most important parameters obtained from NMR experiments. They are affected by the immediate chemical environments of the nuclei and can therefore reveal important structural information by themselves, a feature exploited by the recent methods of structure determination from chemical shifts, such as CS-ROSETTA and its derivatives (Shen et al. 2008, 2009b), CHESHIRE (Cavalli et al. 2007) and CS23D (Wishart et al. 2008). However, despite the success of these methods in some cases, they have proven to be not yet fully reliable. The results of the 2010 CASD-NMR competition (Rosato et al. 2009, 2012) showed that occasionally the chemical shift derived structures were up to 12 Å RMSD away from the manually determined reference structures, in spite of their excellent geometric validation scores.

More traditionally, chemical shifts have been used to automatically derive distance restraints primarily from NOE data by programs such as CYANA (Güntert et al. 1997; Herrmann et al. 2002; Lopez-Mendez and Güntert 2006) and Aria (Rieping et al. 2007). The completeness and correctness of chemical shift assignments thus influence the correctness of the distance restraints, the convergence of structure calculation protocols and the accuracy of the final structure ensembles.

Most existing methods for validation of chemical shifts rely on statistical analysis and comparison with databases. Of the methods surveyed for this paper, PSVS uses the Assignment Validation Suite (AVS) (Moseley et al. 2004) to identify outliers, while CING uses VASCO (Rieping and Vranken 2010) for referencing correction and SHIFTX (Neal et al. 2003) for back-calculation. If the difference between the observed and predicted shift values is greater than three standard deviations, CING flags the nucleus as a chemical shift outlier. Vivaldi uses VASCO to correct referencing and identify statistical outliers based on amino acid type, secondary structure and accessible surface area. For referencing correction, other existing software includes CheckShift (13C and 15N) (Ginzinger et al. 2007, 2009), LACS (13C and 1H) (Wang et al. 2005; Wang and Markley 2009) and PANAV (Wang et al. 2010a), none of which require structural data, as well as SHIFTCOR (Zhang et al. 2003), which predicts only backbone chemical shifts and requires a structure.
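
The three-standard-deviation rule described above can be illustrated with a minimal Python sketch. It is not CING's actual implementation: the `ShiftComparison` record, the atom label format and the per-nucleus `sigma` values are assumptions made for illustration only, and a real tool would obtain predictions and expected errors from a back-calculation program such as SHIFTX.

```python
from dataclasses import dataclass


@dataclass
class ShiftComparison:
    atom: str         # hypothetical label, e.g. "A.11.CA"
    observed: float   # measured chemical shift (ppm)
    predicted: float  # back-calculated chemical shift (ppm)
    sigma: float      # assumed standard deviation of the prediction (ppm)


def flag_outliers(comparisons, n_sigma=3.0):
    """Return the labels of nuclei with |observed - predicted| > n_sigma * sigma."""
    return [
        c.atom
        for c in comparisons
        if abs(c.observed - c.predicted) > n_sigma * c.sigma
    ]


# Illustrative values only, not taken from any real entry.
shifts = [
    ShiftComparison("A.10.CA", 58.2, 57.9, 1.0),
    ShiftComparison("A.11.CA", 62.5, 56.0, 1.0),
    ShiftComparison("A.12.HA", 4.30, 4.35, 0.25),
]
print(flag_outliers(shifts))
```

As noted for back-calculation in general, a flagged nucleus in such a scheme is a candidate for inspection rather than a confirmed error.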

Back-calculation of chemical shifts from structure is a rapidly developing field. Table S1 surveys the different programs used to predict protein chemical shifts. Some of these, i.e. SPARTA (Shen and Bax 2007), SPARTA+ (Shen and Bax 2010b), CamShift (Kohlhoff et al. 2009) and CheShift (Vila et al. 2009), provide chemical shift predictions only for backbone nuclei; others, i.e. CH3Shift (Sahakyan et al. 2011a) and ArShift (Sahakyan et al. 2011b), calculate side-chain values; and still others, i.e. SHIFTS (Xu and Case 2001; Moon and Case 2007), SHIFTX2 (Han et al. 2011), PROSHIFT (Meiler 2003), 4DSpot (Lehtivarjo et al. 2009), COSMOS (Möllhoff and Sternberg 2001; Jakovkin et al. 2012) and PPM (Li and Brüschweiler 2012), manage both. While a detailed analysis of these prediction programs is out of scope for this paper, they can be used to identify statistically unusual chemical shifts by inspecting the differences between the predicted and measured values. However, caution needs to be taken with such an approach for the following reasons: (a) the prediction algorithms strongly depend on the accuracy of the underlying structure, and are therefore only as good as the structures are; (b) an anomalous chemical shift value is not necessarily an error, although it may require some supporting data, such as close vicinity of an aromatic group, unusual local conformation, etc.

Recognising the importance of structure validation, the wwPDB consortium has appointed special validation task forces (VTF) for X-ray crystallography (Read et al. 2011; Gore et al. 2012), electron microscopy (EM) (Henderson et al. 2012), NMR (Montelione et al. submitted) and SAS (Trewhella et al. 2013) methods. These VTFs will define a set of criteria and tools, which will be used at the time of deposition to assess the quality of the structural model, the intrinsic quality of the experimental data, and the fit between both. This paper reviews the tools currently available to NMR spectroscopists for evaluating the quality of their structures. We give an overview of the different checks performed by each package or program and discuss its relative merits using two examples: a small, globular monomeric protein (Staphylococcal nuclease from S. aureus, PDB entry 2kq3 (Wang et al. 2010b), S.Nase) and a small, symmetric homodimer protein [a region of human myosin-X, PDB entry 2lw9 (Lu et al. 2012)]. These two proteins present typical examples in terms of size and experimental data of the systems nowadays studied by NMR spectroscopy, and were solved by conventional triple resonance heteronuclear NMR technology. This review necessarily limits itself to the validation of protein structures and their complexes, as the tools for oligonucleotides and polysaccharides are much less developed. We also limit ourselves to testing ‘NMR-aware’ software, or at least software that does not presume the X-ray crystallographic origin of the structure.

Methods

We will first describe the different programs and tools available for validation of biomolecular NMR structures. An overview of their features is given in Table 1.

Table 1 Reviewed validation programs and supported features

PROCHECK-NMR

Historically the most popular NMR-specific validation tool (Laskowski et al. 1996), PROCHECK-NMR is not available via a web-server interface. Instead, a standalone local installation of the program is required. The program is no longer maintained and its underlying scoring database is generally considered out-dated. The program accepts PDB formatted input files and experimental restraints in the Aqua format, also no longer maintained and incapable of handling ambiguous restraints. The output of the program is presented as a collection of postscript-formatted files.

CING

The Common Interface for NMR structure Generation (CING) software package version 1.0 (https://nmr.le.ac.uk) (Doreleijers et al. 2012a) constitutes an integrated framework for the validation of NMR structures. CING assembles a set of experimental and structural data and generates an analysis based on the results of ~ 25 different programs and routines, both internal and external, and dependent upon the supplied input data. CING accommodates a diversity of different experimental data types, as well as handling multi-model ensembles properly in its analysis routines.

The experimental data are tested for internal consistency and agreement with the ensemble. Distance restraints are analysed for duplication, redundancy, completeness (Doreleijers et al. 2005) and information content (Nabuurs et al. 2003). RMSDs and violation analyses are reported. Dihedral restraints are analysed for violations and RMSD. RDC restraints are processed, but currently not validated.
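
To make the notion of a restraint violation analysis concrete, the following Python sketch computes the RMS violation of upper-bound distance restraints over an ensemble and counts large violations. It is a simplified illustration rather than CING's routine: the data layout, the function name `violation_stats` and the 0.5 Å reporting threshold are assumptions, and ambiguous restraints and r^-6 averaging are deliberately ignored.

```python
import math


def violation_stats(restraints, models, threshold=0.5):
    """Summarise upper-bound distance restraint violations over an ensemble.

    restraints: list of (atom_a, atom_b, upper_bound) with bounds in Angstrom
    models:     list of dicts mapping an atom label to (x, y, z) coordinates
    Returns (rms_violation, number_of_violations_above_threshold).
    """
    squared_sum, n_large, n_total = 0.0, 0, 0
    for coords in models:
        for atom_a, atom_b, upper in restraints:
            distance = math.dist(coords[atom_a], coords[atom_b])
            excess = max(0.0, distance - upper)  # zero when the restraint is satisfied
            squared_sum += excess ** 2
            n_total += 1
            if excess > threshold:
                n_large += 1
    return math.sqrt(squared_sum / n_total), n_large


# Two-model toy ensemble with one restraint (illustrative numbers only).
restraints = [("HA", "HB", 5.0)]
models = [
    {"HA": (0.0, 0.0, 0.0), "HB": (4.0, 0.0, 0.0)},  # satisfied
    {"HA": (0.0, 0.0, 0.0), "HB": (6.0, 0.0, 0.0)},  # violated by 1.0 Angstrom
]
rms, n_large = violation_stats(restraints, models)
print(round(rms, 3), n_large)
```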

Validation of chemical shift values is based on structural and sequence information, re-referenced using the VASCO routine (Rieping and Vranken 2010) and analysed relative to the BMRB database (Ulrich et al. 2008) and SHIFTX (Neal et al. 2003) back-calculated values. Chemical shifts are also used to assess potential cis/trans-proline errors (Schubert et al. 2002; Shen and Bax 2010a; Siemion et al. 1975), leucine side-chain conformation (Mulder 2009) and to predict ϕ, ψ dihedral angles using the program TALOS+ (Shen et al. 2009a).

The geometric quality of the 3D structure ensembles is assessed in relation to a database of reference structures using WHAT IF (Vriend 1990; Hooft et al. 1996), PROCHECK-NMR (Laskowski et al. 1996) and internal routines. Checks include those for the residue-specific Ramachandran and side-chain rotamer distributions, all dihedral angles, including the ω dihedral, packing, backbone conformations, bumps, bond lengths, bond angles and torsions. The ensemble is also analysed for secondary structure using DSSP (Joosten et al. 2011; Kabsch and Sander 1983) and for solvent accessibility, potential disulphide bridges and salt bridges.

CING uses a circular-variance based algorithm to select ordered regions. Alternatively, chemical-shift derived S2 order parameters (Berjanskii and Wishart 2005) or user-defined regions can be used for the analysis.
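
A circular-variance based selection can be sketched as follows. The circular variance of a set of angles is one minus the length of their mean unit vector, so it is 0 for identical angles and approaches 1 for angles spread evenly around the circle. The cutoff of 0.2 and the requirement that both ϕ and ψ fall below it are assumptions chosen for illustration; they are not the criteria used by CING or PSVS.

```python
import math


def circular_variance(angles_deg):
    """Circular variance (0 = identical angles, 1 = maximally spread)."""
    n = len(angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    return 1.0 - math.hypot(cos_sum, sin_sum) / n


def ordered_residues(phi_by_residue, psi_by_residue, cutoff=0.2):
    """Residues whose phi and psi circular variances over the ensemble are both <= cutoff."""
    return [
        residue
        for residue in sorted(phi_by_residue)
        if residue in psi_by_residue
        and circular_variance(phi_by_residue[residue]) <= cutoff
        and circular_variance(psi_by_residue[residue]) <= cutoff
    ]


# Per-residue dihedral angles (degrees) across four models; illustrative values only.
phi = {5: [-60, -62, -58, -65], 6: [-60, 60, 180, -120]}
psi = {5: [-45, -40, -50, -42], 6: [140, -30, 60, -150]}
print(ordered_residues(phi, psi))
```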

CING generates a hierarchical, comprehensive, interactive HTML/Javascript-based validation report that should be thought of more as a program than as a collection of static HTML pages. The user can interact with the report in several ways using Web 2.0 Javascript functionality. The different pages of the report reflect the natural ordering of either structure or experimental data and are extensively hyperlinked. CING uses a simple Red-Orange-Green (ROG) score that directs the NMR spectroscopist to troublesome areas. The ROG scoring is dependent upon the combined analysis of all results and allows CING to summarise the important issues. A red colouring indicates some potentially serious issues, green denotes the absence of any detected issues and orange (amber) is intermediate between these two situations. In particular, CING’s so-called residue pages display the validation results in direct relation to the relevant experimental data.

The multilingual web server and a web service together are called iCing and allow for anonymous execution of CING validation runs. The iCing server (http://nmr.cmbi.ru.nl/icing/) natively accepts PDB, CYANA (Güntert et al. 1997) and CCPN (Vranken et al. 2005) formatted files for coordinate, restraint, peak list and chemical shift data.

Molprobity

Molprobity (Chen et al. 2010; Davis et al. 2007) is a validation tool evaluating and scoring several structural features. This program is available both as a downloadable stand-alone server and as a web service (http://molprobity.biochem.duke.edu), operating on standard PDB files (or released PDB entries). The latest version was released in February 2013 and is still in alpha testing. For the purpose of this review we used the latest stable version 3. Molprobity is also used by other software reviewed in this paper, all of which used version 3 at the time of writing. Although Molprobity was originally designed to tackle the structural validation of experimental X-ray protein and nucleic acid structures, NMR ensembles require no additional effort from the user. Ensembles are automatically split into single-model PDB files and each model is processed individually. However, at the moment, Molprobity has limited functionality to present combined results from these calculations.

Molprobity starts by analysing any uploaded PDB file and checks for the presence of hydrogen atoms. If needed, Molprobity uses the REDUCE module (Word et al. 1999) to create updated PDB files by introducing and/or removing hydrogen atoms as necessary, and proposes flips for Asn/Gln/His residues to optimise hydrogen-bonding networks. This feature, although mostly helpful for X-ray structures, can be used to confirm the protonation state of modelled histidine side-chains, a common source of errors in structures submitted to the PDB archive.

Molprobity further uses several internal programs to analyse the geometrical quality of the models. Covalent geometry validation of backbone bond lengths and angles is performed by DANGLE and is based on parameters derived by Engh and Huber (1991, 2001) for proteins and Parkinson et al. (1996) for nucleic acids. Protein backbone and side-chain torsion angles are validated using internal routines based on a large set of carefully selected reference data. Backbone angles for the Ramachandran statistics are categorised in four groups, i.e. proline, pre-proline, glycine and a general group covering all other common L-amino acids. Similarly, nucleic acids are evaluated by an internal program SUITENAME to identify improbable ring puckers and unfavourable RNA backbone conformations according to Richardson et al. (2008). One additional score describing the Cβ geometry is calculated based on Lovell et al. (2003).

All-atom contact analysis is a major defining feature of Molprobity. It is performed by the program PROBE (Word et al. 1999), which generates a list of close contacts for non-covalently bonded atom pairs that are too close in 3D space, i.e. more than 0.4 Å closer than the sum of their van der Waals radii. An overall close contact score, the ‘clash-score’, is calculated as the number of close contacts per 1,000 atoms. Molprobity finally combines the close contact score, percentage of Ramachandran outliers and percentage of bad side-chain rotamers into a highly popular single score per model in the NMR ensemble. The weights of the three scores are chosen such that the single score resembles the crystallographic resolution (in Å) at which such scores are most likely to be observed in X-ray structures.
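
The arithmetic behind a clash-score can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the PROBE algorithm, which uses a dot-surface analysis with explicit hydrogens and hydrogen-bond allowances. The radii below are generic Bondi-like values rather than PROBE's parameters, and the `bonded` set (which in practice should also exclude atoms separated by only a few covalent bonds) is assumed to be supplied by the caller.

```python
import math
from itertools import combinations

# Generic van der Waals radii in Angstrom (assumed values, not PROBE's own).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def clash_score(atoms, bonded=frozenset(), min_overlap=0.4):
    """Number of close contacts per 1,000 atoms.

    atoms:  list of (element, (x, y, z)) tuples
    bonded: set of frozenset({i, j}) index pairs to be skipped
    A pair counts as a clash when the van der Waals spheres overlap
    by more than min_overlap Angstrom.
    """
    clashes = 0
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded:
            continue
        overlap = VDW_RADII[el_i] + VDW_RADII[el_j] - math.dist(xyz_i, xyz_j)
        if overlap > min_overlap:
            clashes += 1
    return 1000.0 * clashes / len(atoms)


# Four-atom toy system: one bonded pair and one non-bonded clash (C...O at 2.5 Angstrom).
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("O", (0.0, 2.5, 0.0)),
    ("N", (10.0, 0.0, 0.0)),
]
bonded = {frozenset((0, 1))}
print(clash_score(atoms, bonded))
```

The all-pairs loop is quadratic in the number of atoms; for real structures a spatial grid or neighbour list would be the usual choice.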

Results are displayed as interactive web pages consisting of tables, text files containing outliers, Ramachandran plots, etc. Interactive 3D visualisation of validation scores is provided by KiNG (Chen et al. 2009). Scenes can be viewed online using the Java applet version of KiNG. Alternatively, larger scenes can be downloaded and viewed offline.

ResProx

ResProx, Resolution-by-proxy (http://www.resprox.ca), aims at providing a single model-based score that was proposed to function as an accuracy measure similar to the resolution reported for X-ray derived structures in the PDB (Berjanskii et al. 2012). In addition to this generalised resolution parameter, all individual Z-scores used for the calculation are presented in tabular form. For each unsatisfactory score, suggestions are provided on how to increase the overall structure quality and remedy the poor score in particular.

ResProx processes up to 25 measurable protein features, extracted from a multitude of auxiliary programs, in two parallel schemes to calculate two resolution estimates. A ‘decision maker’ will select the most appropriate score to present in the validation report using empirical rules. The first validation scheme uses a machine learning predictor for the resolution based on 25 protein features. This predictor was trained on a set of 2,427 X-ray derived protein structures covering a wide span of reported resolutions, and cross-referenced against a second set of 500 structures. In the second scheme, the ‘Z-mean’ metric is calculated using a linear dependence on a subset of 15 out of 25 criteria using a simple regression scheme.

Five Molprobity scores are used to assess the structure quality: Ramachandran outliers, side-chain rotamer outliers, bond lengths, bond angles and atom clashes. The program VADAR (Willard et al. 2003) contributes 11 scores, covering the validation of hydrogen bond energy through DSSP, χ1 and ω dihedral angles, and general protein packing. VADAR (http://vadar.wishartlab.com/) can also be run separately to get a more detailed view of the derived scores. It generates comprehensive text files containing tables of validation scores for every model as well as summaries for the whole NMR ensemble. Additionally, some validation scores are presented graphically in static images. GeNMR (Berjanskii et al. 2009) provides Ramachandran scores, an atom clash-score and assesses the observed radius of gyration. RosettaHoles2 (Sheffler and Baker 2010) is used to quantify the packing of the protein core. Finally, PROSESS (Berjanskii et al. 2010) is further used to evaluate hydrogen bonding and χ1 dihedral angles.

The stand-alone PROSESS server is also available at http://prosess.ca. Its output consists of multiple HTML pages presenting detailed structural validation scores in tables, graphs and static images of the protein. In addition to structural validation, PROSESS analyses chemical shifts in NMR-STAR (v2.1) format and distance restraints in Xplor format. We tested the geometry and experimental data validation with PROSESS for the two entries (2lw9 and 2kq3), after the supplied restraints files were manually reformatted to comply with the (rather) strict format requirements for PROSESS.

The ResProx web-server requires a PDB-formatted file as input. Alternatively a PDB entry code can be provided to run the ResProx validation on an entry in the PDB archive. Results are presented as simple HTML web pages.

PSVS

Protein Structure Validation Suite (PSVS (Bhattacharya et al. 2007); http://psvs-1_4-dev.nesg.org) is a versatile validation server developed by one of the groups in the Northeast Structural Genomics Consortium (NESG), the only Protein Structure Initiative (PSI) consortium with a substantial NMR component. PSVS is applicable to both X-ray and NMR structures in an effort to be able to compare structural scores directly. It combines the output from a number of programs developed by several groups, i.e. Molprobity (Davis et al. 2007; Chen et al. 2010), Verify3D (Eisenberg et al. 1997), ProsaII (Wiederstein and Sippl 2007), PROCHECK (Laskowski et al. 1993), PDB validation software (http://deposit.rcsb.org/validate), and by the Montelione group itself, i.e. PDBStat (Bhattacharya et al. 2007), FindCore (Snyder and Montelione 2005), AVS (Moseley et al. 2004) and RPF (Huang et al. 2005). PSVS checks both the geometric knowledge-based validation and the fit between the structure and the experimental data, if the latter is available. Many NMR structures feature long disordered termini or loops, which often lack long-range constraints and are not always modelled properly by the structure calculation software. PSVS accounts for this by allowing the users to specify which residues should be subject to the analysis: all, ordered as defined by circular variance (default), core as defined by the FindCore algorithm (Snyder and Montelione 2005), residues forming secondary structure elements or a custom selection. For the purpose of this review we have chosen the default option.

For geometric validation, PSVS is trained on a set of 252 X-ray structures of globular proteins of maximum 500 residues and with resolution of 1.8 Å or better, sharing at most 50 % sequence identity with each other. Each reported raw score is converted to a Z-score using the mean and standard deviation pre-calculated on the training set. In this implementation, a positive Z-score would indicate that the analysed structure is better than the typical high-resolution X-ray structure. Any negative value for a Z-score would indicate a poorer than average quality parameter. As a rule of thumb, Z-scores below −3.5 point to serious problems with modelling and would require careful analysis of the model and/or the underlying experimental data. For NMR structures, five geometric validation scores are reported as ensemble averages. These five scores are: Molprobity clash-score (Davis et al. 2007; Chen et al. 2010), which gives the number of steric clashes per 1,000 atoms, PROCHECK backbone and all dihedral angle G-factors (Laskowski et al. 1993), Verify3D score (Eisenberg et al. 1997), which gives the likelihood of the observed packing, and ProsaII score (Wiederstein and Sippl 2007), which reports on the likelihood of the observed fold. This allows for a simple and unbiased comparison between NMR and X-ray structures irrespective of the size of the protein. These overall scores are reported in the PSVS summary report, which also identifies secondary structure elements calculated by DSSP (Kabsch and Sander 1983; Joosten et al. 2011), and lists the mean RMSDs of model superposition, a number of per-residue scores, the Ramachandran statistics from both PROCHECK-NMR (Laskowski et al. 1996) and Molprobity (Chen et al. 2010; Davis et al. 2007) and a figure visualising consistent Ramachandran outlier residues on the structure.

Optionally and depending on the types of submitted data, the summary page may also contain statistics on the distance and dihedral angle restraints and their violations, completeness of the chemical shift assignments, a list of atoms with unusual chemical shifts, the RPF scores (Huang et al. 2005) describing the goodness of fit between the NOESY peak lists and the ensemble of structures, and a generalised RMSD score (GLM-RMSD, vide infra).
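
The conversion of a raw score to a Z-score relative to a reference set, as described for the geometric validation above, amounts to the following sketch. The reference means and standard deviations used in the example are placeholders and not the values pre-calculated by PSVS. The `higher_is_better` switch is likewise an assumption, introduced so that a positive Z-score always means "better than the reference set" even for metrics, such as a clash-score, where lower raw values are preferable.

```python
def z_score(raw, ref_mean, ref_std, higher_is_better=True):
    """Z-score relative to a reference set; positive means better than the reference."""
    z = (raw - ref_mean) / ref_std
    return z if higher_is_better else -z


def needs_attention(z, cutoff=-3.5):
    """Apply the rule of thumb that Z-scores below the cutoff indicate serious problems."""
    return z < cutoff


# Placeholder reference statistics (not PSVS values).
z_packing = z_score(2.0, ref_mean=5.0, ref_std=0.5)                           # higher raw is better
z_clashes = z_score(30.0, ref_mean=20.0, ref_std=5.0, higher_is_better=False)  # lower raw is better
print(z_packing, needs_attention(z_packing))
print(z_clashes, needs_attention(z_clashes))
```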

The full PSVS report assembles the output from the constituent software packages and allows a keen user to review all of them from one URL or PDF file. The overall grouping is by metric, and the user can drill down within the given page to individual models in the NMR ensemble and individual residues. Viewing all available information about a model or residue is, however, not straightforward and requires manual collation.

PSVS accepts PDB, CYANA (Güntert et al. 1997) and CNS/Xplor (Brunger 2007; Brunger et al. 1998) formatted files as input for the coordinate data. CYANA and CNS/Xplor formatted files are accepted for supplying experimental restraint data. The chemical shift data can be uploaded as NMR-STAR files (either version 2.1 or 3.1) (Ulrich et al. 2008), whereas the format for peak files is flexible, e.g. tab-delimited, with the possibility for the user to describe the meaning of each column.

GLM-RMSD

GLM-RMSD (Bagaria et al. 2012) is a method to produce an aggregate validation score for a complete structure ensemble from the results of a number of existing programs, which was recently incorporated as part of the PSVS server. The method aims to yield an easily interpretable quality metric representing an estimate of the RMSD from the correct structure. The metric was derived using a generalised linear model based upon a number of well-established parameters: the RPF Discriminatory Power (DP) (Huang et al. 2005), Verify3D (Eisenberg et al. 1997), ProsaII (Wiederstein and Sippl 2007), PROCHECK-ϕ/ψ and all dihedral angle G-factors (Laskowski et al. 1993), Molprobity (Chen et al. 2010; Davis et al. 2007), the Gaussian Network Model (GNM) (Haliloglu et al. 1997), and the molecular size. The initial coefficients and weights for the various inputs were obtained using training data from CASD-NMR (65 structure ensembles for 16 proteins) (Rosato et al. 2009, 2012) and CASP (Moult et al. 1995, 2011). A jack-knifing procedure was used to guard against over-fitting. By successively removing input scores that were redundant or contributed little, a metric was derived that consisted of a linear combination of only four inputs: the RPF DP score, the PROCHECK-ϕ/ψ score, the Molprobity clash-score, and the molecular size, yielding a correlation coefficient between predicted and actual RMSD values of 0.70 for all test data combined. Interestingly, this suggests that only the PROCHECK-ϕ/ψ and Molprobity scores are sufficient to evaluate the geometric quality of a structure. As 86 % of the structures with a GLM-RMSD < 2 Å were correct and 74 % of the structures with GLM-RMSD > 2 Å were erroneous, a GLM-RMSD of 2 Å was proposed as a quality cut-off. Since the RPF DP score is an important input to the algorithm, it requires peak lists to obtain a result, which in turn excluded it from our practical tests for this review (vide infra).

QMEAN

The QMEAN (Benkert et al. 2009) structural quality score comprises six individual measures that probe local structure conformation, solvent accessibility and secondary structure. The latter is derived from both the PSIPRED score (McGuffin et al. 2000) and an analysis by DSSP (Kabsch and Sander 1983; Joosten et al. 2011).

The original QMEAN score was protein size dependent as larger proteins received higher absolute scores, which rendered its use somewhat problematic. This measure has now been superseded by a newer, normalised value QMEANnorm, which removes the dependence of the quality score on the size of the model. The QMEANnorm is now routinely reported and all QMEAN scores reported in this manuscript refer to the normalised values.

The QMEAN server (http://swissmodel.expasy.org/qmean) takes PDB-formatted files as input. These have to be supplied as individual files for each model of the ensemble collected into one .zip or .tgz archive. A FASTA sequence describing the protein can also be supplied, but did not change the outcome for the two examples discussed in this paper. QMEAN does not assess the experimental data, nor does it have provisions to determine and accommodate the unstructured regions of the molecule.
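
Preparing such an archive from a multi-model NMR ensemble can be scripted. The sketch below splits a PDB-formatted ensemble on its MODEL/ENDMDL records and writes one file per model into a .zip archive using only the Python standard library. The function names and the `model_NN.pdb` naming scheme are illustrative assumptions; the server's actual naming requirements, if any, should be checked in its documentation.

```python
import io
import zipfile


def split_models(pdb_text):
    """Return one PDB-formatted string per MODEL/ENDMDL block."""
    models, current = [], None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []                      # start collecting a new model
        elif record == "ENDMDL":
            if current is not None:
                models.append("\n".join(current + ["END"]) + "\n")
            current = None
        elif current is not None and record in ("ATOM", "HETATM", "TER"):
            current.append(line)
    return models


def write_model_archive(pdb_text, target, stem="model"):
    """Write each model as <stem>_NN.pdb into a zip archive (path or file-like object)."""
    models = split_models(pdb_text)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(models, start=1):
            archive.writestr(f"{stem}_{number:02d}.pdb", text)
    return len(models)


# Minimal two-model example with a single atom per model (illustrative only).
EXAMPLE = "\n".join([
    "MODEL        1",
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00           C",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  CA  ALA A   1       0.500   0.100   0.000  1.00  0.00           C",
    "ENDMDL",
    "END",
])

buffer = io.BytesIO()
print(write_model_archive(EXAMPLE, buffer))
```

Passing a file name instead of the in-memory buffer writes the archive to disk.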

QMEAN reports an overview of its results by email and allows the full set of results to be downloaded as a .tgz formatted archive. The archive provides both an overall QMEAN value and residue-specific values for each model, as well as files that contain all the underlying data. The residue-specific QMEAN values are also reported in the Bfac column of a PDB-formatted structure file and graphically displayed as a colour-coded ribbon representation of the protein backbone. No aggregation over the different models of the ensemble is provided; hence no assessment regarding the disparity in the ensemble is available without further user analysis.

Vivaldi

The Protein Data Bank in Europe (PDBe; Velankar et al. 2012) developed the Vivaldi service [VIsualisation and VALidation DIsplay; http://pdbe.org/vivaldi; (Hendrickx et al. 2013)] to validate NMR structures deposited in the public PDB archive. It combines a variety of validation scores from the external validation package CING (Doreleijers et al. 2012a) with internal routines to validate chemical shifts [VASCO, (Rieping and Vranken 2010)], distance restraints, dihedral restraints and residual dipolar couplings. Furthermore, it uses the OLDERADO (Kelley et al. 1996, 1997; Kelley and Sutcliffe 1997) program to cluster models of the NMR ensemble and to define the ordered core region of the protein. Chemical shifts are obtained from NMR-STAR files processed and archived at BMRB (Ulrich et al. 2008), and experimental restraints are obtained as CCPN projects available at the NMR Restraint Grid (NRG) (Doreleijers et al. 2009) database maintained by BMRB. Thus, Vivaldi does not (yet) provide for uploading and assessment of structural and experimental data by an external user.

Vivaldi utilises an interactive Java applet (OpenAstexViewer) to visualise the validation scores in 3D. In addition, per-residue graphs and textual output aid the user in assessing the structural quality of an NMR ensemble.

NMR constraints analyser

The NMR Constraints Analyser (Heller and Giorgetti 2010) was explicitly designed for constraint analysis only. It is available as a web server at http://molsim.sci.univr.it/bioinfo/web/, complete with detailed documentation. The contents of the NRG FRED database maintained by BMRB (Doreleijers et al. 2009) are available to the program and can be selected easily by entering the appropriate PDB entry code. External user data can be uploaded as PDB formatted files (.pdb and .mr) with restraints in either CNS (Brunger 2007; Brunger et al. 1998) or CYANA/DYANA (Güntert et al. 1997) format. The accepted formats are well documented, but the server gives little detailed feedback on incorrect input. In addition to the constraint analysis, the program calculates distance restraint completeness, according to the procedure described by Doreleijers et al. (1999).

The output of the program is reported as an interactive webpage that consists of three parts. The first is a graph along the sequence showing the number of constraints, the number of violations, an indicator for the presence of torsion angle restraints and the calculated completeness. In addition to these sequence-dependent results, a set of tables shows the restraints for one or more selected atoms, and a Jmol viewer shows a ribbon diagram of the backbone, colour-coded according to the number of restraints.

Other publicly available servers

While we aimed to cover as exhaustive a list of validation software as possible, the scope of this paper necessarily limits us to testing and describing those servers that are ‘NMR-aware’ and/or aggregate scores from multiple sources. The software listed below can be adapted for NMR structures, but that often requires specifying a model and chain identifier when running the validation task and sometimes even splitting the ensemble into separate files with one model in each file. For these reasons, we did not attempt to test these servers extensively and limit ourselves to brief descriptions.

PDB Validation software (http://deposit.rcsb.org/validate/) performs basic geometry and nomenclature checks for NMR entries. Currently, it is applied to all depositions of NMR structures in the PDB archive. It is also included in PSVS and Quality Control Check.

PROSA (https://prosa.services.came.sbg.ac.at) (Wiederstein and Sippl 2007) is included in the PSVS server, which prepares the input files and averages the output from PROSA over the ensemble. As a standalone server, PROSA accepts ensembles of structures but analyses only one model at a time.

SAVES (Structure Analysis and Verification Server; http://services.mbi.ucla.edu/SAVES) combines 6 structural validation programs, one of which is X-ray specific. Only one model from the NMR ensemble is allowed at a time. Results are presented on simple web pages using colour-coding to indicate possible issues (yellow) and errors (red) with links to graphs and images.

Quality Control Check (http://smb.slac.stanford.edu/jcsg/QC), a validation server developed by the Joint Center for Structural Genomics (JCSG), is also X-ray centric. It requires an upload of a separate file for each model of the NMR ensemble. It includes 9 validation programs, but only a subset of these (e.g. Molprobity and the PDB Validation software) are relevant to NMR structures.

The WHAT IF server [http://swift.cmbi.ru.nl/servers/html/index.html; (Vriend 1990)] is not very NMR aware. Although it can handle an NMR ensemble, it does not produce any aggregated scores. Moreover, the extensive textual output produced for each member of the ensemble is difficult to analyse manually. For this reason, WHAT IF is used by both CING and Vivaldi to derive structural parameters, which are subsequently processed and analysed for the full ensemble and properly presented.

The Harmony server [http://caps.ncbs.res.in/harmony; (Pugalenthi et al. 2006)] uses multiple sequence alignment to assess the local structural environment. The information from amino acid substitutions among homologous sequences (in the form of environment-dependent amino acid substitution tables) is then used as a tool for identifying errors that may be present in the protein structure. The server is directed toward X-ray structures, but accepts a PDB file containing an NMR ensemble. The results, however, do not indicate how the individual conformers are scored. Separate outputs are returned for each chain.

Results

We tested the performance of the different packages using two recently solved protein structures as examples. PDB entry 2kq3 (Wang et al. 2010b) was also used in the recent description of the CING package (Doreleijers et al. 2012a) and was now also subjected to the other analyses. PDB entry 2lw9 (Lu et al. 2012) represents the structural ensemble of a relatively small dimeric protein. It was solved using conventional protocols with distance restraints and backbone dihedral angle restraints only, as is still the practice for the majority of entries. Of particular interest is the assessment of a symmetric dimer, as this class of molecules poses specific issues with respect to the experimental procedures by which the intermolecular restraints were derived. For each entry, the data used in the analyses below were obtained as CCPN projects from the NMR Restraints Grid (Doreleijers et al. 2009) database maintained by BMRB (Ulrich et al. 2008), and if necessary exported into other formats with the help of FormatConverter. The chemical shift files were taken directly from BMRB.

CING

The analyses of the entries 2kq3 and 2lw9 proceeded with all checks applied. The full reports can be examined via the NRG-CING website at http://nmr.cmbi.ru.nl/NRG-CING/data/kq/2kq3/2kq3.cing/2kq3/HTML/index.html, and http://nmr.cmbi.ru.nl/NRG-CING/data/lw/2lw9/2lw9.cing/2lw9/HTML/index.html, respectively. Automated analysis of the ordered regions using the circular variance criteria shows that the ordered sections of PDB entry 2kq3 include 122 out of 140 residues whereas for entry 2lw9 this amounts to 86 out of the 102 total residues in chains A and B (84 %) (Tables 2 and S2).
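The idea behind a circular variance criterion for ordered regions can be illustrated as follows. This is a minimal sketch and not the CING implementation: it uses the common definition of circular variance (one minus the length of the mean unit vector) for the ϕ and ψ angles of each residue over the ensemble, and the cut-off of 0.2 is a hypothetical value chosen for illustration only.

```python
import math


def circular_variance(angles_deg):
    """Circular variance of angles in degrees: 0 = identical, 1 = fully spread."""
    n = len(angles_deg)
    sum_cos = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_sin = sum(math.sin(math.radians(a)) for a in angles_deg)
    return 1.0 - math.hypot(sum_cos, sum_sin) / n


def ordered_residues(angles_by_residue, cutoff=0.2):
    """Return residues whose phi and psi circular variances are both below the cut-off.

    angles_by_residue maps a residue number to (phi values, psi values),
    with one value per model. The cut-off is an illustrative assumption.
    """
    ordered = []
    for residue, (phi, psi) in sorted(angles_by_residue.items()):
        if circular_variance(phi) < cutoff and circular_variance(psi) < cutoff:
            ordered.append(residue)
    return ordered
```

Because the measure works on unit vectors, angles such as 179° and −179° are correctly treated as nearly identical, which a plain standard deviation of the angle values would not do.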

Table 2 Well-defined/ordered regions in entry 2lw9 as determined by different packages

The overall ROG scores, i.e. 0.17/0.65/0.17, for the ordered residues of entry 2lw9 are indicative of problems. Figure 1a shows the residue-specific ROG scores mapped upon the ribbon diagram of the 2lw9 protein. The orange- or red-labelled residues nearly encompass the complete protein, suggesting a general problem. The overall WHAT IF χ1χ2 rotamer normality score of −8.2 ± 0.3, as reported in the CING summary pages, suggests a problem with the side-chain conformations. Indeed, examination of the residue-specific pages of the CING report clearly indicates that the side-chain conformation of many residues is problematic. An example is shown in Fig. 1d for residue Leu9 of chain A of 2lw9, which displays the χ1χ2 plot (the so-called Janin plot). All 20 conformers in the ensemble cluster in a relatively narrow range and exhibit a consistently staggered χ1-rotamer. The problematic side-chain conformations are also flagged by the residue-specific Janin Z-scores (cf. Fig. 1b, bottom panel). The consistent low values of this parameter are also one of the main causes of the orange or red ROG scores of the corresponding residues.

Fig. 1

CING results for PDB entry 2lw9. The full CING report can be accessed from http://nmr.cmbi.ru.nl/NRG-CING/data/lw/2lw9/2lw9.cing/2lw9/HTML/index.html. a Ribbon diagram of the first model of 2lw9 colour-coded according to the CING ROG scores. b Residue specific scores for chain A. Top to bottom: secondary structure as derived by DSSP, residue-specific WHAT IF accessibility, Ramachandran, backbone normality, rotamer and Janin scores as function of residue number of chain A. c Ramachandran plot for Leu9. Individual conformers are displayed as green plus signs. Experimental ϕ, ψ dihedral restraints are shown as a transparent light-orange box. Leucine-specific distribution of the accessible conformational ϕ, ψ space, as derived from the WHAT IF reference database of high-resolution crystal structures, is shown in green/yellow/grey. d χ1χ2 plot (also referred to as Janin plot) for Leu9. Labelling as in c). e Overview of the distance restraints page, selected for critiqued restraints only

The 2lw9 protein folds into a simple structure comprised of only two helices per monomer. Indeed, most of the backbone adopts this helical arrangement and the CING DSSP-based analysis (Fig. 1b) confirms their presence. For most of the backbone conformation CING does not signal problems (cf. Fig. 1b, c). One notable residue at the C-terminal end of helix 1 (Thr30), however, displays poor packing, Ramachandran and backbone normality scores (Fig. 1b), resulting in a red residue ROG score. The 2lw9 protein is a symmetric dimer and the analysis results for the corresponding residues in the two different chains are generally similar.

Crucial to a proper validation assessment is the analysis of the experimental data. CING assembles report pages for all experimental data made available to the program. The pages are interactive, as they allow for sorting and selection. Figure 1e shows the report page for the distance restraints of the 2lw9 entry, displaying only the critiqued restraints, i.e. those for which CING detected problems. The results display a series of disturbing lower-bound violations. In particular, they highlight surprising distance restraints with lower bounds of 4.8–5.0 Å and upper bounds of 7.5–10.4 Å. CING also performs an analysis of the chemical shift assignments if such data are supplied. For 2lw9, the program flags six illogical missing stereo-specific assignments.
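The distinction between lower-bound and upper-bound violations can be made explicit with a short sketch. It assumes that the restrained distance has already been measured in each model; the function names are hypothetical, and the 0.5 Å reporting threshold is the one used elsewhere in this review, not a CING default.

```python
def restraint_violation(distance, lower, upper):
    """Signed violation in Å: negative below the lower bound, positive above the upper bound."""
    if distance < lower:
        return distance - lower
    if distance > upper:
        return distance - upper
    return 0.0


def count_violations(distances_per_model, lower, upper, threshold=0.5):
    """Count models violating the lower and the upper bound by more than threshold (Å)."""
    n_lower = n_upper = 0
    for distance in distances_per_model:
        violation = restraint_violation(distance, lower, upper)
        if violation < -threshold:
            n_lower += 1
        elif violation > threshold:
            n_upper += 1
    return n_lower, n_upper
```

For a restraint with a lower bound of 4.8 Å, for example, a model in which the two atoms are 4.0 Å apart is counted as a lower-bound violation of 0.8 Å.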

Supplementary Fig. S1 displays similar panels to Fig. 1 with the CING analysis results for 2kq3. Figure S1a displays backbone traces of the first member of the 2kq3 NMR ensemble, superposed with the trace of the S.Nase X-ray structure [PDB entry 1ey0 (Chen et al. 2000)]. Residue-specific backbone RMSD values for the ordered regions typically are in the 0.4–1.1 Å range. Regions significantly surpassing these values, e.g. Ile18-Gly20, are often flagged for suspect conformations. Inspection of the side-chain conformations often also yielded unusual results. For example, the Janin-plot of Lys9 reveals a bifurcated distribution of staggered conformers (Fig. S1d). Comparison with the crystal structure clearly reveals the differences in conformation (Fig. S1a). A detailed analysis on the basis of the full CING report was also presented before (Doreleijers et al. 2012a).

Molprobity

Analysis of PDB entries 2lw9 and 2kq3 was initiated from the main Molprobity website (http://molprobity.biochem.duke.edu/) (Davis et al. 2007; Chen et al. 2010) using the built-in feature to retrieve coordinate files from the public PDB archive. All validation scores relevant to NMR protein structures were calculated and analysed.

At the time of writing, Molprobity is undergoing a major version upgrade (V3.19 to V4.00a), which is mainly focussed on improving the calculation of clash-scores. Both versions are available from the website. In this paper, Molprobity V3.19 was used for analysing PDB entries 2lw9 and 2kq3 in order to maintain consistency with other validation packages, which at the time of writing were not yet updated to use the newest version of Molprobity.

The summary statistics table (Fig. 2a) for entry 2lw9 shows perfect quality scores for bond-angles and bond-lengths (0 % outliers) and good geometry for the Cβ atom (no deviations above 0.25 Å). Molprobity’s assessment of both backbone and side-chain torsion angles is rather poor, displaying 2.8 ± 1.4 % Ramachandran outliers and 35.8 ± 3.9 % unfavoured side-chain rotamers, a result in line with the analysis by CING. Furthermore, clash-scores of 17.3 ± 2.9 are observed, indicating an overall problem in protein packing. This results in an overall Molprobity score of 3.4 ± 0.1 Å. Analysis of per-residue tabular output (Fig. 2b) and KiNG images (Chen et al. 2009) (Fig. 2c) for the first model of the NMR ensemble shows that atom clashes are spread throughout the whole interface between the main α-helices of chains A and B, whereas side-chain rotamer outliers are found over the full length of the protein.

Fig. 2

Molprobity output for PDB entry 2lw9. a Overview HTML page of the Molprobity analysis containing colour-coded overview scores. b Overview table showing outliers per residue for model 1. c KiNG (Chen et al. 2009) image for model 1 showing small and large overlaps as yellow and red dots, respectively

Geometric validation of PDB entry 2kq3 (cf. Supplementary Fig. S2) shows similar scores as obtained for entry 2lw9. No outliers in bond lengths and angles were observed in any of the 20 models of the NMR ensemble, yet many atom clashes and unfavourable dihedral angles are observed throughout the structures, with 4.8 ± 1.5 % Ramachandran outliers and 31.0 ± 3.0 % bad side-chain rotamers. Furthermore, Molprobity reports very high clash-scores (i.e. 35.7 ± 2.5 serious clashes per 1,000 atoms) spread over the entire protein core.
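Since the clash-score is expressed as the number of serious clashes per 1,000 atoms, ensemble values such as those quoted above follow from per-model clash counts. The sketch below takes those counts as given (detecting the clashes themselves is outside its scope); the function names are hypothetical, and the use of the sample standard deviation for the ± value is an assumption.

```python
from statistics import mean, stdev


def clash_score(n_serious_clashes, n_atoms):
    """Number of serious clashes per 1,000 atoms for a single model."""
    return 1000.0 * n_serious_clashes / n_atoms


def ensemble_summary(values):
    """Mean and sample standard deviation of a per-model score (needs >= 2 models)."""
    return mean(values), stdev(values)
```

Normalising by the number of atoms is what makes clash-scores comparable between proteins of different size.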

ResProx

Analysis of PDB entries 2lw9 and 2kq3 was initiated from the main ResProx website (http://www.resprox.ca/) (Berjanskii et al. 2012) using the built-in feature to retrieve coordinate files from the public PDB archive.

The average ‘resolution-by-proxy’ score over the ensemble for PDB entry 2lw9 is 2.9 ± 0.1 Å and is classified as ‘bad’ (>2.5 Å). A breakdown of this score is provided in the Z-score report, showing all 15 measured scores that contribute to the overall resolution. Eight scores are annotated ‘good’ and two ‘bad’ across the whole NMR ensemble, while the remaining five scores have both good and bad models. Interestingly, the Molprobity Ramachandran score is considered good in 13 out of 20 models (Z = 1.4 ± 0.8), whereas Molprobity itself reported this score as worrisome (vide supra). Using different cut-offs and/or reference structures could be the underlying cause of this. Furthermore, the Ramachandran score calculated by GeNMR (Berjanskii et al. 2009) (Z = 2.1 ± 0.5) is considered borderline ‘bad’. Another discrepancy exists between the χ1 angle scores obtained from VADAR (Willard et al. 2003) and PROSESS (Berjanskii et al. 2010), where the former score is considered ‘bad’ (Z = 3.2 ± 0.2), whereas the latter is considered ‘good’ (Z = 1.6 ± 0.4), and between the Molprobity clash-score (bad; Z = 2.4 ± 0.2) and the GeNMR bump score (good; Z = 0.2 ± 0.2). Other ‘bad’ scores include RosettaHoles2 and the GeNMR radius of gyration score.

PROSESS summarises a great number of scores for each category by a set of “overall” scores and a global quality score on a scale from 0 (worst) to 10 (best). Only these are reported in Table 3, although individual global and per-residue scores are also available for inspection on the detailed results pages. They convey the same information as discussed above for ResProx, but also include scores for the quality of backbone chemical shifts (poor for entry 2lw9) and distance restraints (good for chain A, only 1 restraint violation > 0.5 Å, but poor for chain B, 5 restraint violations). It is unclear how the summary PROSESS table reports the number of restraint violations, as for individual models this number varies from 1 to 5. All PROSESS scores are reported separately for each chain.

Table 3 Summary of key structural and data assessment metrics reported by the different programs for PDB entry 2lw9

The reported resolution for PDB entry 2kq3 is 3.2 ± 0.1 Å and thus classified as ‘bad’. The Ramachandran scores (Molprobity 2.5 ± 0.8 and GeNMR 3.8 ± 0.2), the χ1 angle scores (VADAR 3.9 ± 0.2 and PROSESS 2.2 ± 0.3), the clash-scores (Molprobity 3.5 ± 0.1 and GeNMR 2.3 ± 0.3) and the Θ-hydrogen bond angle score (PROSESS 2.4 ± 0.4) all deviate by more than two standard deviations from the expected values and are thus considered bad. Experimental data validation from PROSESS indicates that chemical shifts are within expected ranges, while the fit to distance restraints is bad with 16 violations (Supplementary Table S3).
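The way a single score can be ‘good’ in some models and ‘bad’ in others follows directly from applying a Z-score cut-off per model. The following sketch assumes the two-standard-deviation cut-off mentioned above and Z-scores for which larger values are worse; it is an illustration of the tallying, not the ResProx code, and the label ‘mixed’ is our own.

```python
def classify(z_score, cutoff=2.0):
    """Label one model's Z-score; values above the cut-off are treated as 'bad'."""
    return "bad" if z_score > cutoff else "good"


def tally(z_scores, cutoff=2.0):
    """Return (number good, number bad, overall label) for one score over the ensemble."""
    labels = [classify(z, cutoff) for z in z_scores]
    n_good = labels.count("good")
    n_bad = len(labels) - n_good
    if n_bad == 0:
        overall = "good"
    elif n_good == 0:
        overall = "bad"
    else:
        overall = "mixed"  # some models pass the cut-off, others do not
    return n_good, n_bad, overall
```

A score whose ensemble average lies close to the cut-off, with a standard deviation of comparable size to the distance from it, will typically end up in the ‘mixed’ class.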

PSVS

Figure 3 and Supplementary Figure S3 show the results of the PSVS (Bhattacharya et al. 2007) analysis for entries 2lw9 and 2kq3, respectively. These results are mostly consistent with the assessments from other validation servers described above. The smaller differences in global scores (e.g. Molprobity Ramachandran statistics) arise from the selection of residues submitted for analysis: e.g. ordered residues (cf. Tables 2 and S2) when running PSVS and all residues when running Molprobity itself.

Fig. 3

Selection from the PSVS report on PDB entry 2lw9. Sequence numbering in chain B is automatically adjusted to 62–112 for clarity of presentation. a Residues with consistent Ramachandran outliers. b PROCHECK G-factors for backbone dihedral angles. c PROCHECK G-factors for all dihedral angles. d Verify3D scores. e ProsaII energy. f Number of steric clashes. Values in panels (b–f) are plotted as a function of protein sequence. Scores in panels (d–e) for each position i are averaged over a window of [i − 3, i + 3], and are plotted for each model in the ensemble

The global PSVS scores for entry 2lw9 indicate that the packing of the structure is unlikely (Z-score for Verify3D of −6.6), and that there are more than the usual number of clashes (Molprobity Z-score of −4.9 for steric clashes). The Ramachandran statistics also indicate that 2 % of the residues over all models are in disallowed regions, with two consistent outliers, Thr30 and Asn2, on both chains (Fig. 3a) reported by both Molprobity and PROCHECK-NMR. The other global parameters indicate that the backbone is modelled mostly correctly (Z-score for ProsaII is 1.41 and for PROCHECK-ϕ/ψ angles 2.48). While the side-chain dihedral angles are poor, they are generally within the range commonly observed in NMR structures (PROCHECK all-dihedral-angle Z-score of −2.4) (Lemak et al. 2011). However, all of these global scores may mask individual outliers, and thus inspection on the residue level is necessary (Fig. 3b–f). This analysis confirms that there is a problematic spot around residue Thr30 involving both backbone and side-chain dihedral angles (Fig. 3b, c), while steric clashes are quite numerous, but spread throughout the protein (Fig. 3f). The highest numbers of van der Waals violations (up to 15) are observed for residues Ile15 and Gln35. The AVS analysis of 2lw9 reported an assignment completeness of 36 %; however, this low number is due to the fact that the entry is a dimer and the real assignment completeness is therefore closer to 73 %. Only one outlier, Cδ of Arg41, is identified. The analysis of distance and dihedral angle restraints indicates that there were very few restricting long-range restraints (0.2 per residue), and that there were 2 violations larger than 0.5 Å per model.
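The window averaging applied to the per-residue Verify3D and ProsaII profiles (a window of [i − 3, i + 3] around each position i, see Fig. 3) can be sketched as below. How PSVS treats positions near the chain termini is not described here, so truncating the window at the ends is an assumption of this sketch.

```python
def window_average(scores, half_width=3):
    """Average each position i over [i - half_width, i + half_width].

    Near the termini the window is truncated to the residues that exist
    (an assumption; other end treatments are possible).
    """
    averaged = []
    for i in range(len(scores)):
        window = scores[max(0, i - half_width): i + half_width + 1]
        averaged.append(sum(window) / len(window))
    return averaged
```

Smoothing of this kind suppresses single-residue noise, which is why isolated outliers are better sought in the unsmoothed per-residue panels.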

The results for PDB entry 2kq3 from PSVS indicate that while the protein fold is likely overall correct (Verify3D, ProsaII and PROCHECK-ϕ/ψ Z-scores only moderately negative), the Ramachandran analysis by both Molprobity and PROCHECK-NMR flags some local problems with respect to the backbone. However, the side-chains are most likely modelled incorrectly, resulting in the PROCHECK all-dihedral-angle Z-score of −5.6 and the Molprobity clash Z-score of −8.5. Such values are typically observed in structures that were not refined in explicit water, a procedure known to significantly improve the side-chain packing and side-chain conformations (Nabuurs et al. 2004; Linge et al. 2003; Spronk et al. 2002), a conclusion also supported by CING analysis (Doreleijers et al. 2012b). PSVS also identified more than 40 distance restraint violations per model, with 36 of them greater than 0.5 Å, which may indicate that the data from which the restraints were derived were contradictory or that the calibration procedure during the conversion of NOE peaks to restraints was inappropriate. Seventeen chemical shift outliers are reported for this entry by the AVS module. The completeness of side-chain resonance assignments is 82 %, although for the aromatic rings it drops to only 45 %.

QMEAN

The QMEAN analysis was run using its server (http://swissmodel.expasy.org/qmean) (Benkert et al. 2009). The use of an additional FASTA-formatted file with a description of the protein sequence did not alter the results. Figure 4 displays the results obtained for the first model of the 2lw9 ensemble. Manual averaging of the QMEAN scores for all 20 models yielded 0.64 ± 0.03 (Z-score −1.1 ± 0.3). As indicated by the red cross in Fig. 4a, the 2lw9 ensemble scores below average for proteins of comparable size. The QMEAN score is composed of six underlying metrics and their scores are displayed in Fig. 4c. In particular, the Cβ interaction parameter, which is a secondary structure-specific measure, and the torsion parameter, which encodes for a three residue extended torsion, display significant negative values indicative of problems with this structure. QMEAN neither discriminates in its scoring for the unstructured regions of the protein nor examines the underlying experimental NMR data.

Fig. 4

Output generated by the QMEAN server for the first model of the ensemble of PDB entry 2lw9. a QMEAN scores as function of the protein sequence length. The score for the first model of the 2lw9 ensemble is indicated by the red cross. b Ribbon diagram colour-coded blue to red (good to bad) according to the residue-specific QMEAN predicted local error. c QMEAN and underlying metrics Z-scores (red indicates below average, blue above average scores)

Figure 4b shows the ribbon diagram of 2lw9 with each residue colour-coded according to the predicted local error. Notably and as was also found by CING and PSVS (vide supra), the C-terminal ends of the two helices are clearly flagged, as are the unstructured C-terminal ends of the protein.

Supplementary Fig. S4 shows the results of the QMEAN analysis for the first model of the 2kq3 ensemble. As was the case for the 2lw9 ensemble, the 2kq3 ensemble scores below average with a QMEAN score for the 20 models of 0.71 ± 0.03 (Z-score −0.4 ± 0.4). Interestingly and in line with the analyses of the other program suites (cf. Supplementary Figs. S1-3), the first β-strand is flagged as a region of predicted local error.

Vivaldi

The validation report of PDB entry 2lw9 is available through the Vivaldi web service at http://www.pdbe.org/vivaldi/2lw9 (Hendrickx et al. 2013). Representative output for PDBe entries 2lw9 and 2kq3 is shown in Fig. 5 and Supplementary Figure S5, respectively. Figure 5a shows a very tight bundle of 20 structures for entry 2lw9, mostly in a helical conformation. Multiple stable domains are obtained from the analysis by the program OLDERADO (Kelley et al. 1996, 1997; Kelley and Sutcliffe 1997) (Tables 2 and S2) and comprise the whole protein except for the N-terminal residues A:1 (and symmetry related B:53) and the C-terminal residues A:45–51 (B:96–103). Since Vivaldi obtains the ROG and geometric validation scores from the NRG-CING web service, the information it presents is already described in the section on CING results, although there are differences in terms of what cut-offs are used to draw the users’ attention to problematic spots (e.g. Ramachandran outliers).

Fig. 5

Representative graphical output of the Vivaldi web service at PDBe for PDB entry 2lw9. a Superimposed NMR ensemble coloured by rigid-body domains (OLDERADO). b Per-residue CING scores plotted on the most representative model (model 14) of the NMR ensemble as calculated by OLDERADO. c Distance restraint violations plotted on the most representative model of the ensemble. Yellow lines represent violations below 0.3 Å, amber lines violations between 0.3 and 0.5 Å, and red lines violations above 0.5 Å. d Chemical shift outliers identified using VASCO presented as Van der Waals spheres and coloured by a continuous gradient based on the absolute Z-score (3.5 yellow, 5.0 and above red). e Per-residue graph showing the absolute sum of violations in Å. Coloured bars indicate the averaged value over the NMR ensemble. Black dots show violations for the most representative model. Continuous gradient colouring (0.0 Å green, 0.1 Å yellow, >0.2 Å red). f Chemical shift Z-scores on a per-atom basis. Continuous gradient colouring of the absolute Z-score (<2.0 green, 3.5 yellow, >5.0 red)

The Vivaldi analysis of the deposited restraint data shows thirty-three residues with a high number (>50) of distance restraints with relatively few restraint violations (Fig. 5c, e). Taken together with numerous atom clashes and poor packing, this suggests an over-fitting to experimental restraints during structure determination and refinement calculations.

Chemical shift analysis using VASCO (Rieping and Vranken 2010) (Fig. 5d, f) shows a good agreement between the experimental data and the structure. Seven carbon atoms were flagged as chemical shift outliers (Z-score > 3). These outliers have no direct electrostatic interactions with other residues or aromatic side-chains in close proximity to explain their unexpected chemical shifts. As the chemical shift validation routines used by CING and by Vivaldi use different underlying statistics, the referencing corrections are different, which may explain the differences in chemical shift analysis.

Analysis of PDB entry 2kq3 can be obtained from Vivaldi at http://www.pdbe.org/vivaldi/2kq3. Supplementary Figure S5a shows a tightly bundled core region (amber), a flexible N-terminal tail (Thr2-His8) and a flexible loop (Glu43-Ala58).

CING ROG scores (Figure S5b) are predominantly red indicating general problems with the modelled structure. WHAT IF scores indicate moderate Ramachandran, bond length or χ1 angle problems, and atom clashes are reported throughout the protein. Bond angle outliers are reported for His8 and His121 due to the non-planarity of Nε2. This is a commonly observed problem in NMR structures throughout the PDB archive. Atom clashes are reported throughout the core domain and amount to over 0.2 Å for 21 residues and over 0.4 Å for 4 additional residues.

Unusual chemical shift values are identified for 72 atoms from 40 different residues (Supplementary Fig. S5d, f) and are mostly concentrated on lysine residues. As VASCO does not take aromatic interactions into account, manual inspection of these outliers is advised. Six chemical shift outliers are identified for Lys9, all with negative Z-scores (i.e. the experimental chemical shift is smaller than the expected shift), thus suggesting a substantial ring-current effect induced by an aromatic side-chain. Inspection of the structure, however, does not yield a likely candidate. The other 39 residues with chemical shift outliers are scattered throughout the molecule.
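The sign convention of these Z-scores can be made concrete with a short sketch. It assumes that for each atom an expected shift and a standard deviation are available from some reference statistics; the VASCO statistics themselves are not reproduced here, the function names are hypothetical, and the cut-off of 3 is taken from the outlier criterion quoted above for entry 2lw9.

```python
def shift_z_score(observed, expected, sigma):
    """Z-score of an observed chemical shift; negative if observed < expected."""
    return (observed - expected) / sigma


def flag_outliers(shifts, cutoff=3.0):
    """Return {atom label: Z-score} for atoms with |Z| above the cut-off.

    shifts maps an atom label to (observed, expected, sigma), all in ppm.
    """
    outliers = {}
    for label, (observed, expected, sigma) in shifts.items():
        z = shift_z_score(observed, expected, sigma)
        if abs(z) > cutoff:
            outliers[label] = z
    return outliers
```

A run of negative Z-scores within one residue, as observed for Lys9, is the pattern that prompts a search for a nearby aromatic ring.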

Vivaldi analysed 2,091 distance restraints, which mainly cover residue ranges 7–41 and 61–140. The molecule has approximately 20 distance restraints per residue on average. Restraint violations are shown in Supplementary Fig. S5c, e.

NMR constraints analyser

Figure 6 and Supplementary Figure S6 show representative output of the NMR Constraints Analyser (Heller and Giorgetti 2010) web server for PDB entries 2lw9 and 2kq3, respectively. The program was tested in December 2012. Tabulation of the restraint content and completeness as function of residue is displayed in Fig. 6b for 2lw9. The program displays restraints for a single chain at a time, also in the case of dimers. Clicking the bar graph selects the corresponding residue for display in the viewer (not shown), and restraints selected in the table (cf. Supplementary Fig. 6c) can be displayed in the viewer as well. Regions of the molecule that are well or badly defined by restraints appear clearly, but there are no reference values to indicate local or global structure quality as such. Compared to dedicated analysis programs, such as CcpNmr Analysis (Vranken et al. 2005), the NMR Constraints Analyser is neat and easy to use, but clearly lacking in detail. The restraint tables give the upper distance limit and the number of violated models, but lack information about lower limits, actual distances or violation values. Also, restraints involving pseudo-atoms, such as methyl groups, cannot be visualised in the Jmol viewer.

Fig. 6

Representative output of the NMR Constraints Analyser web server, showing the results of the analysis of PDB entry 2lw9. a Structure of 2lw9, coloured by constraint density (red: many; blue: few), with Lys22 shown. b Constraint completeness (red line), number of constraints (dark bars) and number of violations (red triangles) along chain B of 2lw9. Although chains A and B are identical, the program only gives results for one chain of one structure model. c Constraints for Lys22 Hδ1 and HN

Discussion

Over the past decades, NMR has proved itself as a very versatile technique for structure determination of biomacromolecules and as a credible complement to X-ray crystallography. However, it is prone to serious errors, particularly when misinterpreted, conflicting or over-interpreted data are used (Bhattacharya et al. 2007; Doreleijers et al. 2012a; Lemak et al. 2011; Mao et al. 2011). Hence, the validation of input data, the resulting structures and the fit between the structural models and the experimental data is an absolute necessity for assessing and using NMR-derived structures in other biological applications. This need was also recognised by the wwPDB consortium (Berman et al. 2007), which appointed an NMR validation taskforce (NMR-VTF). The primary task for the NMR-VTF was to define commonly accepted procedures and guidelines for validation of NMR structures. The NMR-VTF has now put forward its recommendations (Montelione et al. submitted), which will ultimately result in a set of tools that will be applied to all NMR entries deposited in the PDB archive. The authors of this review paper are directly involved in the implementation of these tools, most of which will be based upon the programs discussed in this review. At present, a regularly updated archive of CING validation reports of nearly all NMR entries of the PDB archive, called NRG-CING (Doreleijers et al. 2012b), is available for inspection at http://nmr.cmbi.ru.nl/NRG-CING.

NMR-derived structures typically encompass both structured and less-structured regions. The latter typically score worse on parameters used to characterise ordered structure. As a near complete set of NMR data, especially the chemical shifts, is required for the proper analysis of the structure, it is desirable to still include the full molecule in the validation analysis. Dedicated NMR validation programs, such as PSVS (Bhattacharya et al. 2007) or CING (Doreleijers et al. 2012a), routinely report on both the structured regions and the full-length molecule. Table 2 lists the structured regions defined for entry 2lw9 as obtained by the different programs (Table S2 contains the corresponding information for entry 2kq3). The FindCore algorithm (Snyder and Montelione 2005) is clearly more restrictive when compared to the methods based on dihedral order parameters. For the latter, all algorithms yield almost the same results, differing only slightly for residues 94–96 at the C-terminal end of chain B.

Table 3 lists a summary of the structural and data assessment by the different programs for PDB entry 2lw9 (Table S3 reports on entry 2kq3). Overall, all programs indicate substantial problems with entry 2lw9. Whereas the overall fold is likely correct, conformational parameters related to the backbone and side-chain conformation and packing indicate significant problems. In particular, the C- and N-termini of the two helices are specifically flagged by multiple programs. At the level of restraints, the different programs all signal problems, i.e. violations, with respect to the agreement between the structural and experimental data. Disturbing lower-bound restraint violations and odd distance restraints are flagged by CING. Together, the structural and data analyses suggest errors in the modelling protocol used to derive the structural ensemble.

PDB entry 2kq3 (Wang et al. 2010b) has been used before as an example for the description of the CING program. Like CING, all other programs identify similar problems related to conformation and packing for this entry (cf. Supplementary Figures S2-6). We previously indicated a number of specific problematic areas, such as the first β-strand, which are also flagged by the other programs. Refinement in explicit solvent remedies these issues to some extent, and for this entry we previously showed that we could improve upon both the backbone and side-chain conformations (Doreleijers et al. 2012a).

The tested methods for the validation of chemical shift assignments do not produce a consistent picture for the two entries (cf. Tables 3 and S3). Currently VASCO produces the longest list of unusual chemical shifts, but it can only examine entries already present in the PDB and BMRB databases, making it less useful during the structure determination process. PSVS includes the AVS method, and can identify at least some outliers, but does not correct referencing. CING does correct the referencing but is more lenient towards declaring a shift value an outlier. In our opinion, it appears sensible to perform at least these two analyses and to confirm that any outliers are genuine, rather than due to clerical errors or wrong assignments.

This review was not aimed at discussing the two entries per se; rather they served as examples for the procedures implemented in the different programs. Many of these, i.e. CING, ResProx (Berjanskii et al. 2012), PSVS (Bhattacharya et al. 2007), QMEAN (Benkert et al. 2009) and Vivaldi (Hendrickx et al. 2013), are in effect based (in part) upon the results of a number of other underlying programs, which sometimes partially overlap. For example, PSVS and ResProx both use scores from the Molprobity program (Davis et al. 2007; Chen et al. 2010) and Vivaldi is heavily based upon the CING/WHAT IF (Vriend 1990; Hooft et al. 1996) assessments. Careful comparison of the different results yields some notable features. Whereas both PSVS and ResProx use the Molprobity Ramachandran score, the results for 2lw9 are qualified as ‘bad’ and ‘good’ by the two programs, respectively (cf. Table 3). The ResProx scores for side-chain assessment [VADAR χ1 (Willard et al. 2003) and PROSESS χ1 (Berjanskii et al. 2010)] also receive conflicting labels, as do its scores for packing [Molprobity clash-score and GeNMR bumps (Berjanskii et al. 2009)], suggesting that the rescaling of the original scores to generalised Z-scores potentially requires revisiting. Alternatively, these differences may be genuine and reflect the different sensitivities of the parameters to the problems present in entry 2lw9. Overall, our analysis of entries 2lw9 and 2kq3 by the different validation program suites suggests that, in addition to aggregated or transformed scores, it is beneficial for the user to also have access to the original values of the parameters as obtained from the underlying program. This allows for a more straightforward comparison of the results obtained by the different validation suites. For the PSVS and CING programs, the original results are already directly accessible.

The comparison also raises a more fundamental question related to the significance of the different parameters; i.e. what validates the validators? Here, we would suggest the notion of ‘usual suspects’: patterns of poor indicators typically signal problems and only in exceptional cases are there genuine reasons to discard the overall conclusions. As many of the tools are based upon prior knowledge derived from the PDB database, features not yet present or under-represented may potentially be flagged unnecessarily. However, given the now extensive coverage of structural motifs in the PDB archive, such occurrences are very rare and should be treated with extreme caution. Examples of these are the chemical modification of residues, or the inclusion of D-enantiomers or other unusual amino acids.

The assessment of the structural quality on the basis of a combined set of different parameters has also proven to be a viable approach for the identification of serious cases of outright incorrect structures (Bhattacharya et al. 2007; Doreleijers et al. 2012a). In fact, the developments of both the PSVS and CING suites were prompted by the erroneous structure 1tgq (discussed by Nabuurs et al. 2006), now replaced by PDB entry 2b95. A subsequent PSVS analysis of entry 1tgq clearly marked it as highly suspect (Bhattacharya et al. 2007), while its CING ROG scores, i.e. 0.54/0.30/0.16, also flagged it as highly problematic. In contrast, the revised entry 2b95 yields the much more acceptable ROG scores 0.37/0.35/0.28 and the highly homologous entry 1y4o yields 0.16/0.15/0.69 (Doreleijers et al. 2012b), characteristic of a well-modelled structure.
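To make the meaning of such a score triplet concrete, the minimal Python sketch below computes the fractions of residues labelled red, orange and green from a list of per-residue labels. It assumes that the three numbers are reported in the order red/orange/green and that each residue carries exactly one label; the function name and the toy input are hypothetical, and the sketch does not reproduce how CING itself assigns the labels.

```python
from collections import Counter

# Assumed reporting order of the three fractions.
ROG_ORDER = ("red", "orange", "green")


def rog_fractions(labels):
    """Return the (red, orange, green) fractions for per-residue labels."""
    if not labels:
        raise ValueError("at least one residue label is required")
    counts = Counter(labels)
    unknown = set(counts) - set(ROG_ORDER)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(labels)
    return tuple(round(counts[colour] / total, 2) for colour in ROG_ORDER)


if __name__ == "__main__":
    # Toy example with four residues; not data from any PDB entry.
    print(rog_fractions(["red", "red", "orange", "green"]))
```

A high first value together with a low last value would then correspond to the pattern described above for a problematic entry.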

With the exception of Vivaldi, which operates directly upon the data deposited in the PDB and BMRB archives, the other programs that are currently maintained all feature an on-line server (cf. Table 1) for user submission of data. The file formats for the input to these different validation programs vary considerably: all programs accept PDB version 3 formatted files for the structural data, although some can handle only one model at a time (e.g. QMEAN). At present, none of the programs appear capable of using the much more modern mmCIF or PDBML/XML formats for the structural data. The CING program also accepts the CCPN format for structural data. Only a subset of the validation programs, i.e. CING, PSVS, PROSESS and Vivaldi, also validate the restraint and other experimental data. Formats for these are much more diverse, with either CYANA (PSVS, CING), Xplor/CNS (PSVS, PROSESS) or CCPN (CING) formatted data being accepted. Vivaldi and NRG-CING use the experimental restraints data remediated by BMRB and available from the NRG database.

Not only the input, but also the output generated by the different programs varies greatly. In certain cases (e.g. QMEAN, WHAT IF, Molprobity) the results for the different models in the ensemble are presented as separate entities and hence require manual averaging, a generally cumbersome procedure. PSVS features a summary page with key metrics summarised at a glance and collates detailed validation information in drill-down pages (HTML or PDF). CING features hyperlinked and interactive webpages, which facilitate directed examination of the results. Particular emphasis is placed on the relation between experimental data and structural results. Vivaldi is a visualisation tool featuring an interactive 3D viewer and graphs, exposing the validation information to non-expert users of the PDB archive. A notable feature of ResProx is the extensive list of suggestions that potentially could improve the different validation scores.
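Where a program reports its results per model, the averaging over the ensemble can be scripted rather than done by hand. The following Python sketch assumes that the per-model scores have already been extracted into one dictionary per model; the score names and values are invented for illustration, and parsing the actual output of QMEAN, WHAT IF or Molprobity is not shown.

```python
from statistics import mean, stdev


def summarise_ensemble(per_model_scores):
    """Summarise per-model scores over an ensemble.

    per_model_scores is a list with one dict per model, each mapping a
    score name to a numeric value. All models must report the same
    scores. Returns a dict of score name -> mean, sd, min and max.
    """
    if not per_model_scores:
        raise ValueError("at least one model is required")
    names = set(per_model_scores[0])
    for scores in per_model_scores:
        if set(scores) != names:
            raise ValueError("all models must report the same scores")
    summary = {}
    for name in sorted(names):
        values = [scores[name] for scores in per_model_scores]
        summary[name] = {
            "mean": mean(values),
            # The sample standard deviation is undefined for one model.
            "sd": stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical scores for a three-model ensemble.
    models = [
        {"clashscore": 10.0, "rama_favoured": 90.0},
        {"clashscore": 20.0, "rama_favoured": 92.0},
        {"clashscore": 30.0, "rama_favoured": 94.0},
    ]
    for name, stats in summarise_ensemble(models).items():
        print(name, stats)
```

Reporting the spread alongside the mean keeps outlying models visible, which a single averaged value would hide.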

The development of validation software is a continuous process that has to keep pace with the development of NMR methodology. In our opinion, the community of NMR software developers could consider the following points to make the structure and experimental data validation more widespread and results more easily compared: (1) agree on a test set of macromolecular structures with known ‘good’ and ‘bad’ features to benchmark and compare their tools; (2) agree on standardised input formats for experimental NMR data, as conversion between the numerous existing formats is not trivial and most servers accept only a small subset of formats; moreover, there is some variation even within a given format, making the experience of a non-expert user quite frustrating; (3) if constituent validation scores are converted into Z-scores, the raw scores should still be made available; (4) the validation servers should state versions of constituent software used to obtain the scores; (5) they should provide APIs or machine-readable output. We have also found a number of features in the surveyed servers very useful that perhaps can be emulated by other developers: (1) suggestions on how to address a problematic structural feature (as done by ResProx); (2) ability to directly compare X-ray and NMR structures (e.g. PSVS, ResProx, Molprobity); (3) easy navigation in the results and functionality to present all relevant scores for individual residues or even atoms (CING) rather than grouping by scores only; (4) detailed analysis of peak lists (PSVS, CING).
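Points (3) to (5) above can be combined in a single machine-readable report. The Python sketch below is one possible layout, not an existing or proposed standard: it stores each raw score next to its Z-score and the reference mean and standard deviation used for the conversion, and records the versions of the constituent software. All field names, score names, reference values and version strings are hypothetical.

```python
import json


def z_score(raw, reference_mean, reference_sd):
    """Convert a raw score to a Z-score against a reference distribution."""
    if reference_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (raw - reference_mean) / reference_sd


def build_report(entry_id, raw_scores, reference, versions):
    """Return a JSON string that keeps raw scores alongside Z-scores.

    raw_scores: score name -> raw value
    reference:  score name -> (mean, sd) of the reference distribution
    versions:   program name -> version string
    """
    scores = {}
    for name, raw in raw_scores.items():
        ref_mean, ref_sd = reference[name]
        scores[name] = {
            "raw": raw,
            "z": round(z_score(raw, ref_mean, ref_sd), 3),
            "reference_mean": ref_mean,
            "reference_sd": ref_sd,
        }
    report = {
        "entry": entry_id,
        "software_versions": versions,
        "scores": scores,
    }
    return json.dumps(report, indent=2, sort_keys=True)


if __name__ == "__main__":
    # All values below are invented for illustration.
    print(build_report(
        entry_id="example",
        raw_scores={"clashscore": 30.0},
        reference={"clashscore": (20.0, 5.0)},
        versions={"hypothetical_program": "1.0"},
    ))
```

The sketch does not handle sign conventions: for scores where a higher raw value is worse, a server may choose to invert the Z-score, and that choice would need to be stated in the report as well.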

Conclusions

Structural quality and the agreement between experimental data and structural results can be greatly improved by the application of validation routines. Nowadays, several packages taken together already supply ample tools to avoid trivial and hence unnecessary errors. By consistently using these tools as an integrated part of the structure determination process, the resulting outcome will not only be of better quality, but will also address the biological problem with more confidence. Fortunately, the (anonymous) CING user statistics suggest an increasing number of regularly returning users who submit jobs likely to represent different stages of their structure determination process. It is our recommendation that the assessment of the structural quality should be done in relation to the experimental data. We often find that regions of poor structural quality also display poorer agreement with the experimental data, such as NOE restraints (Nabuurs et al. 2005), or peaks (RPF scores; Huang et al. 2005; Bhattacharya et al. 2007) and back-calculated chemical shifts. Programs like PSVS and CING perform an integrated analysis that provides this information relatively easily, thus allowing for improvement of the structure calculations. Ultimately, this could result in optimal structures being deposited in the PDB archive.

The next stage for NMR structure validation should also include cross-validation with independent data (Brunger et al. 1993; Clore and Schwieters 2006; Nabuurs et al. 2005; Tjandra et al. 2007) as a standard procedure. Although cross-validation is significantly more complicated for NMR-derived biomolecular structures than for X-ray structures, because of the diversity in the NMR data, the much increased data content of the average experimental NMR data set relative to the situation 10–15 years ago renders cross-validation procedures quite feasible.
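The bookkeeping behind such a procedure is simple, as the following Python sketch of a k-fold partition of a restraint list shows. Each fold is left out once as a test set while the remaining restraints form the working set for a structure calculation; the calculation itself and the scoring of the left-out restraints are not shown. The function name, the default of ten folds and the random partitioning are assumptions for this example and do not correspond to the specific protocols of the cited studies.

```python
import random


def cross_validation_splits(restraints, n_folds=10, seed=0):
    """Yield (working_set, test_set) pairs for k-fold cross-validation.

    The restraints are shuffled reproducibly and divided into n_folds
    groups of near-equal size; each group is left out exactly once.
    """
    if not 2 <= n_folds <= len(restraints):
        raise ValueError("n_folds must be between 2 and the number of restraints")
    shuffled = list(restraints)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i, test_set in enumerate(folds):
        working_set = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield working_set, test_set


if __name__ == "__main__":
    # Integers stand in for restraint records in this toy example.
    toy_restraints = list(range(20))
    for working, test in cross_validation_splits(toy_restraints, n_folds=5):
        print(len(working), "working /", len(test), "left out")
```

In practice, the partitioning may need to respect the structure of the data, for example by keeping symmetry-related or ambiguous restraints in the same fold; this sketch treats all restraints as independent.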

The proper analysis of NMR-derived structures containing oligonucleotides or small-molecule ligands is currently still incomplete. Although most programs, e.g. CING and PSVS, will readily accept these and perform basic assessments, few dedicated tools are currently available for these non-protein macromolecules. Proper validation also requires provisions for NMR-specific phenomena, most prominently dynamics. All current analysis routines fail when confronted with ensembles of ensembles, generated to model the different dynamical states in concert with the structure. All these much-required developments are topics of on-going research and implementation.