Introduction

Multi-domain proteins, comprising more than one structurally well-folded domain connected by peptide linkers, widely exist in both prokaryotes and eukaryotes (Koonin et al. 2000; Chothia et al. 2003; Wriggers et al. 2005; Ekman et al. 2005). Characterizing the conformational states of these multi-domain proteins is critical to understanding their function in relevant biological events. Such structural characterization is nevertheless enormously challenging due to the flexibility of linkers as well as the complexity of inter-domain interactions (Reddy Chichili et al. 2013), even if the atomic resolution structures of individual domains are already available in the Protein Data Bank (PDB).

The orientation-sensitive NMR measurements, such as spin relaxation and residual dipolar couplings (RDCs) in alignment media, opened the way for the determination of domain arrangement within a molecule. Works by Fushman et al. (2004), Ryabov and Fushman (2007), Walsh et al. (2010), Göbl et al. (2014), and Castañeda et al. (2016) have described these approaches in remarkable detail. Thanks to the significant breakthroughs made recently in the site-specific incorporation of paramagnetic ions into proteins, e.g., substitution of metals in metalloproteins or attachment of a small tag coordinating a metal (Ravera et al. 2017; Nitsche and Otting 2017; Pell et al. 2019; Su and Chen 2019; Joss and Häussinger 2019; Softley et al. 2020), paramagnetic NMR has become one of the most attractive branches of biomolecular NMR, especially in the structural determination of multi-domain proteins, due to the availability of long-range (~ 40 Å) distance and/or orientation constraints.

Pseudocontact shifts (PCSs) and RDCs caused by paramagnetic self-alignment are two of the most valuable effects induced through anisotropic metals, e.g., cobalt, nickel, and most lanthanides. They affect chemical shifts and coupling constants in standard NMR spectra, respectively (Bertini et al. 2005, 2008). The values of PCSs and RDCs depend on the structural features and dynamics of proteins as well as the position of paramagnetic metals: PCSs are sensitive to metal-nucleus distance and orientation, and RDCs provide information on the orientation of internuclear vectors in the molecular frame. In addition to PCSs and RDCs, anisotropic metals also cause paramagnetic relaxation enhancements (PREs), which report on the distance between the metals and observed nuclei. At high magnetic fields and for molecules with a paramagnetic center, transverse relaxation is dominated by Curie relaxation, which can be described by the isotropic magnetic susceptibility (Gueron 1975; Vega and Fiat 2006). Thus, PREs are preferably measured through isotropic metals, e.g., manganese and gadolinium. A number of applications of PREs in structural analysis are discussed thoroughly in Tang et al. (2007), Anthis et al. (2011), Liu et al. (2015, 2019), Chen et al. (2016), Wakamoto et al. (2019), and Lee et al. (2020). The use of PREs has also been extensively reviewed (Marius Clore and Iwahara 2009; Clore 2014). Here, we limit our discussion to two main anisotropic paramagnetic NMR restraints: PCSs and RDCs.

PCSs and RDCs are averaged differently depending on the interconversion rates among the conformational states (Fragai et al. 2013). The structural characterization then recovers a reasonable ensemble of such conformations with associated statistical weights, which should be consistent with the averaged observables. Many approaches based on different algorithms for ensemble reconstruction have been proposed (Nodet et al. 2009; Bertini et al. 2010, 2012; Berlin et al. 2013; Ihms and Foster 2015; Bonomi et al. 2017; Köfinger et al. 2019; Bottaro et al. 2020). However, reconstruction is inherently an underdetermined problem, as there are infinitely many combinations of structures that can potentially recapitulate the experimental data within certain uncertainties (Ravera et al. 2016; Medeiros Selegato et al. 2021). In other words, dramatically distinct ensembles that reproduce the experimental observables might be obtained by different approaches, or even by independent runs of the same approach. Therefore, discerning the similarities and differences in the interpretations provided by these approaches is essential to selecting an appropriate approach to ensemble reconstruction.

In this review, we outline the basics, general procedures, and some approaches for ensemble reconstruction in multi-domain proteins by using anisotropic paramagnetic restraints. Here, we focus on the paramagnetic tags that can be attached to proteins with substantial rigidity (Yang et al. 2015, 2016; Müntener et al. 2018; Lee et al. 2017; Pavlov et al. 2018; Joss and Häussinger 2019; Su and Chen 2019; Denis et al. 2020; Chen et al. 2020), for which the observables are mainly averaged by the internal dynamics of proteins rather than the flexibility of paramagnetic tags.

The anisotropic paramagnetic restraints

PCSs and RDCs are two main anisotropic paramagnetic restraints, which can be observed through the same anisotropic metals. Their values depend on the positions of the observed nuclei in the frame of the magnetic susceptibility tensor (\(\chi\) tensor) of the paramagnetic center, which are related to the conformational features of the systems investigated. The details of the \(\chi\) tensor and its anisotropic component (\(\Delta \chi\) tensor) can be found in several excellent reviews (Bertini et al. 2002; Otting 2010; Fragai et al. 2013; Nitsche and Otting 2017). It should be noted that, if the conformation of the tag carrying the anisotropic metal is substantially mobile, PCSs and RDCs are severely affected. Hence, numerous studies have been dedicated to the tagging strategy. Some of the excellent works that offer insightful details about the effect of the tag mobility on the paramagnetic effects are those of Shishmarev and Otting (2013), Hass et al. (2015), and Suturina and Kuprov (2016).

PCSs arise from the spin-dipole interactions through space and depend on the polar coordinates of nuclear spin (\(r,\theta ,\varphi\)) with respect to the \(\Delta \chi\) tensor of metal and the axial (\(\Delta {\chi }_{ax}\)) and rhombic components (\(\Delta {\chi }_{rh}\)) of \(\Delta \chi\) tensor (Eq. (1) and Fig. 1a) (Bertini et al. 2002):

Fig. 1

Schematic representation of structural information in a two-domain protein derived from PCSs and RDCs. a PCSs are described by Eq. (1); the observed nucleus (1H, orange circle) of the metal-free domain in three different arrangements (surface, gray) has different PCSs in the frame of the \(\Delta \chi\) tensor of the metal (Ln3+, circle, black); PCSs from the metal-bearing domain (surface, lemon) can be used to determine the \(\Delta \chi\) tensor; b RDCs are described by Eq. (2); the averaged effective tensor (red frame in the center) from the metal-free domain depends on the exchange rate of all conformations (here, three are depicted); c, d illustration of paramagnetic effects PCSs (c) and RDCs (d) in 2D 1H-15N correlation NMR spectra; for large biomolecules, there might be some overlaid peaks in diamagnetic NMR spectra (concentric circles, blue), while in paramagnetic states, they can be separated due to different metal-nucleus distances (solid circles, green and gray)

$$\Delta {\delta }^{pcs}=\frac{1}{12\pi {r}^{3}}\left[\Delta {\chi }_{ax}\left(3{\mathrm{cos}}^{2}\theta -1\right)+\frac{3}{2}\Delta {\chi }_{rh}{\mathrm{sin}}^{2}\theta \mathrm{cos}2\varphi \right]$$
(1)
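As a minimal sketch of how Eq. (1) can be evaluated numerically, the Python function below computes a PCS for one nucleus. The function name `pcs_ppm`, the choice of units (coordinates in Å, tensor components in 10^-32 m^3), and the assumption that the coordinates are already expressed in the principal axis frame of the \(\Delta \chi\) tensor are ours and serve only for illustration.

```python
import math

def pcs_ppm(nucleus, metal, dchi_ax, dchi_rh):
    """Pseudocontact shift in ppm according to Eq. (1).

    nucleus, metal : (x, y, z) in angstroms, assumed to be given in the
                     principal axis frame of the delta-chi tensor.
    dchi_ax, dchi_rh : axial and rhombic components in units of 1e-32 m^3.
    """
    x, y, z = (n - m for n, m in zip(nucleus, metal))
    r = math.sqrt(x * x + y * y + z * z)
    cos_theta = z / r
    sin2_theta = 1.0 - cos_theta ** 2
    phi = math.atan2(y, x)
    angular = (dchi_ax * (3.0 * cos_theta ** 2 - 1.0)
               + 1.5 * dchi_rh * sin2_theta * math.cos(2.0 * phi))
    # Unit conversion: 1e-32 m^3 / (1e-30 m^3 per A^3) * 1e6 (ppm) = 1e4
    return 1e4 * angular / (12.0 * math.pi * r ** 3)

# Hypothetical example: a nucleus 10 A from the metal along the tensor z axis
on_axis = pcs_ppm((0.0, 0.0, 10.0), (0.0, 0.0, 0.0), 40.0, 10.0)
```

Doubling the metal-nucleus distance reduces the computed shift eightfold, which reflects the \({r}^{-3}\) dependence that makes PCSs useful as long-range restraints.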

PCSs can be measured by comparing differences in chemical shifts (in ppm) of nuclei in biomolecules between paramagnetic and diamagnetic states (Fig. 1c), in which a paramagnetic ion and a diamagnetic ion are rigidly bound to the molecule, respectively (John and Otting 2007). In the case that the perturbations in the structures and dynamics of molecules are insignificant with the incorporation of metal-chelating tags, i.e., the tag is peripherally bound to the surface of the molecules, the tag-free form of proteins may be regarded as a diamagnetic reference (Jensen and Led 2006; Otting 2010; Yang et al. 2016; Müntener et al. 2020).

In multi-domain systems, PCSs collected from the domain bearing the paramagnetic ion (the metal-bearing domain) can be used for the determination of the \(\Delta \chi\) tensor and metal positions by fitting data against the available structure with well-established programs, e.g., NUMBAT (John et al. 2005; Schmitz et al. 2008), FANTEN (Rinaldelli et al. 2015), and PARAMAGPY (Orton et al. 2020). There can be significant discrepancies between experimental and predicted PCSs if the structural model is not accurate. In this case, PCSs may be incorporated into structure refinement tools, e.g., Xplor-NIH (Schwieters et al. 2003, 2006; Banci et al. 2004; Bermejo and Schwieters 2018) and CYANA (Banci et al. 1996; Bertini et al. 2001; Güntert 2004), as distance and angular constraints to obtain a refined model. If the structure of the metal-bearing domain has not been characterized in advance, the \(\Delta \chi\) tensor and metal positions can be determined concurrently during the structure calculation with Rosetta (Bowers et al. 2000; Schmitz et al. 2012; Yagi et al. 2013; Kuenze et al. 2019), where additional NMR restraints, such as the backbone dihedral-angle restraints, are ideally included. In addition, REFMAC5, part of the CCP4 suite, is a powerful tool for joint structural refinement combining X-ray data and paramagnetic NMR restraints (Murshudov et al. 2011; Rinaldelli et al. 2014; Kovalevskiy et al. 2018; Carlon et al. 2019b).

In many cases, PCSs observed from the metal-free domain (i.e., the domain bearing no paramagnetic ion) in multi-domain proteins are smaller than those from the metal-bearing domain due to the larger metal-nucleus distance. However, if the metal-free domain is completely rigid relative to the other, the determined \(\Delta \chi\) tensor and metal positions should be similar to those calculated by using PCSs collected from the metal-bearing domain. This similarity, however, decreases with the extent of domain rearrangement motions due to motional averaging (Chen et al. 2016). In systems with higher mobility, e.g., calmodulin (Bertini et al. 2004), PCSs from the domain without a paramagnetic metal fail to fit any single model, as nuclei in the domain have considerably fluctuating PCS values in the reference frame. Therefore, PCSs from the metal-free domain are, in general, only used for selecting the optimal ensemble by minimizing the discrepancy between experimental and back-calculated PCS data.

RDCs result from partial alignment of observed molecules caused by the anisotropic magnetic susceptibility of the metals, and they provide information regarding the orientation of the internuclear vector in the reference frame (Eq. (2) and Fig. 1b) (Banci et al. 1998).

$$\Delta {\nu }_{ij}^{rdc}=-\frac{1}{4\pi }\frac{{B}_{0}^{2}}{15kT}\frac{{\gamma }_{i}{\gamma }_{j}\hbar }{2\pi {r}_{ij}^{3}}\left[\Delta {\chi }_{ax}\left(3{\mathrm{cos}}^{2}{\theta }^{\prime}-1\right)+\frac{3}{2}\Delta {\chi }_{rh}{\mathrm{sin}}^{2}{\theta }^{\prime}\mathrm{cos}2{\varphi }^{\prime}\right]$$
(2)

where \({\gamma }_{i}\) and \({\gamma }_{j}\) denote the magnetogyric ratios of nuclear spins i and j, respectively, \({r}_{ij}\) is the distance between the coupled nuclei i and j, \({B}_{0}\) is the strength of the external magnetic field, \(k\) is the Boltzmann constant, \(T\) is the temperature, \(\hbar\) is the reduced Planck constant, and \({\theta }^{\prime}\) and \({\varphi }^{\prime}\) define the orientation of the internuclear vector connecting i and j in the frame of the \(\Delta \chi\) tensor.

Paramagnetic-induced RDCs are typically small; thus, observables are often acquired for covalently bonded nuclei, e.g., backbone 1H-15N, by comparing the coupling constants of partially aligned (paramagnetic states) and unaligned molecules (diamagnetic states) in the same solution through IPAP-HSQC experiments (Fig. 1d) (Yao et al. 2009; Ottiger et al. 1998).
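To give a feeling for the size of these couplings, the following Python sketch evaluates the RDC expression for a backbone 1H-15N pair in SI units. It assumes that the dipolar prefactor carries the reduced Planck constant, and the function name `rdc_hz`, the approximate 15N magnetogyric ratio, the N-H distance of 1.02 Å, the field of 18.8 T, and the temperature of 298 K are illustrative assumptions rather than values taken from a specific study.

```python
import math

HBAR = 1.054571817e-34    # reduced Planck constant, J s
K_B = 1.380649e-23        # Boltzmann constant, J/K
GAMMA_H = 2.6752218744e8  # 1H magnetogyric ratio, rad s^-1 T^-1
GAMMA_N = -2.7116e7       # 15N magnetogyric ratio (approximate), rad s^-1 T^-1

def rdc_hz(theta, phi, dchi_ax, dchi_rh, b0=18.8, temp=298.0,
           gamma_i=GAMMA_H, gamma_j=GAMMA_N, r_ij=1.02e-10):
    """Paramagnetic self-alignment RDC in Hz.

    theta, phi : polar angles (radians) of the internuclear vector in the
                 frame of the delta-chi tensor.
    dchi_ax, dchi_rh : tensor components in m^3.
    b0 in tesla, temp in kelvin, r_ij in metres (assumed N-H distance).
    """
    prefactor = (-1.0 / (4.0 * math.pi)
                 * b0 ** 2 / (15.0 * K_B * temp)
                 * gamma_i * gamma_j * HBAR / (2.0 * math.pi * r_ij ** 3))
    angular = (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    return prefactor * angular

# Hypothetical tensor of 40e-32 m^3 with the N-H vector along the z axis
parallel = rdc_hz(0.0, 0.0, 40e-32, 0.0)
```

Under these assumptions the coupling is a few tens of hertz at most and scales with the square of the field, which is why such RDCs are usually measured at high field and for directly bonded pairs.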

In contrast to PCSs, RDCs collected from each domain can be represented by an effective tensor, as they are independent of metal-nucleus distance (Bertini et al. 2004). RDCs from the metal-bearing domain can be analyzed together with PCSs through fitting against the available structure model with the programs FANTEN (Rinaldelli et al. 2015) and PARAMAGPY (Orton et al. 2020). However, the PCS-derived and RDC-derived \(\Delta \chi\) tensors are somewhat different for two reasons. First, they are averaged differently (Shishmarev and Otting 2013). That is, RDCs can always be described by a single averaged effective \(\Delta \chi\) tensor, independent of the metal position (Fig. 1b). On the other hand, PCSs depend not only on the \(\Delta \chi\) tensor but also on the metal-nucleus distances (Fig. 1a). This difference can cause discrepancies in the experimentally derived tensors when the paramagnetic center is conformationally mobile. Second, the structures deposited in the PDB may not be sufficiently accurate in the orientation of some internuclear bond vectors, even if they are determined in similar solution conditions using NMR spectroscopy. Since RDCs are more sensitive than PCSs to such bond vector orientations, the tensors obtained from both can also differ. This structural accuracy can be improved by the joint PCS/RDC refinement procedure included in some of the programs mentioned above. RDC and PCS data from mobile residues should not be incorporated into the structural analysis; such residues can be identified through NMR relaxation data, including longitudinal relaxation rate (R1), transverse relaxation rate (R2), and heteronuclear NOE (hnNOE) (Bertini et al. 2009). The criteria for selecting the residues to be excluded may differ between RDCs and PCSs because of the different effects of local motions.

Due to the domain mobility, RDCs from the metal-free domain might be smaller than those from the metal-bearing domain, since the aligning force produced by the metal and transmitted to the domain would be weaker in such a case. The extent of motions can be readily estimated from the ratio of the magnitudes of the effective tensors calculated from the two domains (Carlon et al. 2016). The ratio is close to 1 in the complete absence of domain motion.

Obtaining conformational states from paramagnetic data

Once the domain mobility is assessed in the system, PCSs and RDCs can be simultaneously used for characterizing the conformational space sampled by domain rearrangements. This is tackled according to a procedure applied through the following steps (Fig. 2):

  • (a) Refine the structures of the individual domains and assemble the entire molecule.

  • (b) Generate a conformational pool with a sufficient number of structures.

  • (c) Calculate the magnetic susceptibility parameter of the paramagnetic metal for all the structures considered in (b) by using PCSs and RDCs from the metal-bearing domains.

  • (d) Back-calculate the PCSs and RDCs for the metal-free domains of all the structures considered in (b) by using the corresponding magnetic susceptibility parameters obtained in (c).

  • (e) Reconstruct the conformational ensembles satisfying the experimental PCSs and RDCs.

Fig. 2

Overview of ensemble reconstruction in a two-domain protein, linear diubiquitin (Ub2), by using PCSs and RDCs: to enable unambiguous chemical shift assignments, the N- or C-terminal Ub was selectively enriched with 15N nuclei; the structure of each individual Ub was refined by using paramagnetic data with Xplor-NIH, and the structure model of linear Ub2 was obtained by assembling two Ubs with AIDA; the conformational pool was generated by MESMER; and two approaches were employed for the conformational ensemble reconstruction: MESMER selected a minimal ensemble comprising seven conformations with associated weights that are proportional to their putative population, while MaxOcc depicted the conformational distributions. Almost all conformations selected by MESMER had higher MaxOcc values. Adapted from (Hou et al. 2021)

In step (a), an accurate structure of the metal-bearing domain, especially with refined orientations of the internuclear vectors related to RDCs, is obtained by using PCSs and RDCs from this domain. In addition, orientations of internuclear vectors in the metal-free domain are adjusted by using RDCs from this domain. Several programs support structural refinement by using PCSs and/or RDCs, starting with or without available structures from X-ray crystallography or NMR (for the use of Xplor-NIH, see Ref. (Banci et al. 2004; Bertini et al. 2009); for the use of CYANA, see Ref. (Banci et al. 1996; Bertini et al. 2001); for the use of Rosetta, see Ref. (Schmitz et al. 2012; Kuenze et al. 2019), and for the use of REFMAC5, see Ref. (Rinaldelli et al. 2014; Carlon et al. 2019b)). Once the structure of each domain is obtained, the AIDA program provides fast docking for domain assembly (Xu et al. 2014, 2015). Some programs, e.g., pyDockTET (Cheng et al. 2008) and Rosetta (Wollacott et al. 2007), are also able to assemble structures of isolated domains into a multi-domain molecule. If the structure of the entire protein is available, replacing each domain with a refined structure is an alternative.

Generation of the conformational pool described in step (b) is typically achieved by treating all domains as rigid bodies and randomizing the backbone torsion angles of several residues in the linkers. Unbiased and sufficient sampling of the entire conformational space is essential to reliable ensemble reconstruction. All generated conformations should be located in the topologically allowed space and maintain chain connectivity. The programs RanCh (Bernadó et al. 2007; Tria et al. 2015) and the PDB Generator module of MESMER (Ihms and Foster 2015) are the two simplest tools for pool generation in the case that residues comprising the linkers are placed neither in alpha helices nor in beta sheets (Bernadó et al. 2005). Otherwise, the quasi-Ramachandran space of residues in those secondary structures should be carefully considered, such as in the native-like model of the program RanCh (Bernadó and Svergun 2012). Furthermore, the utilization of molecular dynamics (MD) is helpful in sampling a more physically realistic set of conformations. If available, the incorporation of complementary high-resolution constraints, e.g., data from NMR, small-angle X-ray scattering (SAXS), or double electron–electron resonance (DEER), is preferable (Ihms and Foster 2015). In order to simplify the further calculation and visualization of the conformational ensemble, the metal-bearing domain is considered fixed in the reference frame in this step.

Next, in step (c), PCSs and RDCs observed from the metal-bearing domain are fitted with a single set of effective parameters: the \(\Delta \chi\) tensor and, for PCSs only, the metal position. Several programs have been developed for such calculations, e.g., PALES (Zweckstetter 2008), PATI (Berlin et al. 2009), REDCAT (Valafar and Prestegard 2004), and MODULE (Dosset et al. 2001) for RDC analysis, and NUMBAT (John et al. 2005; Schmitz et al. 2008) for PCS analysis. The programs FANTEN (Rinaldelli et al. 2015) and PARAMAGPY (Orton et al. 2020) can fit \(\Delta \chi\) tensors and metal coordinates to the atomic coordinates of biomolecules by using both PCSs and RDCs. These parameters can be subsequently used in step (d) to predict PCSs and RDCs for the metal-free domains of all conformations in the generated pool. For this purpose, the PyParaTools Python library (http://comp-bio.anu.edu.au/mscook/PPT/) is a powerful tool that has been incorporated into some reweighting programs (see next section for details), such as MESMER (Ihms and Foster 2015). The above outlines the basic processes of data preparation for ensemble reconstruction with any reweighting approach.
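Steps (c) and (d) can be illustrated with a short sketch. When the metal position is known, the PCS is linear in the five independent elements of the traceless \(\Delta \chi\) tensor, so the tensor can be obtained by linear least squares and then reused to back-calculate PCSs for other nuclei. The Python code below is a simplified illustration under that assumption and is not the algorithm of any of the programs cited above, which also optimize the metal position. The names `design_row`, `fit_tensor`, and `predict_pcs` and the units (Å for coordinates, 10^-32 m^3 for tensor elements) are hypothetical.

```python
import math

def design_row(nucleus, metal):
    """Coefficients linking (chi_xx, chi_yy, chi_xy, chi_xz, chi_yz) to a PCS in ppm.

    Coordinates in angstroms; tensor elements in units of 1e-32 m^3.
    """
    x, y, z = (n - m for n, m in zip(nucleus, metal))
    r2 = x * x + y * y + z * z
    pref = 1e4 / (4.0 * math.pi * r2 ** 2.5)
    return [pref * (x * x - z * z), pref * (y * y - z * z),
            pref * 2.0 * x * y, pref * 2.0 * x * z, pref * 2.0 * y * z]

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_tensor(nuclei, pcs_values, metal):
    """Step (c): least-squares tensor elements from PCSs of the metal-bearing domain."""
    rows = [design_row(n, metal) for n in nuclei]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] * p for r, p in zip(rows, pcs_values)) for i in range(5)]
    return solve(ata, atb)

def predict_pcs(nuclei, tensor, metal):
    """Step (d): back-calculate PCSs for any set of nuclei, e.g. in a pool conformer."""
    return [sum(c * t for c, t in zip(design_row(n, metal), tensor))
            for n in nuclei]
```

In a pool-based workflow, `predict_pcs` would be called once per conformer with the coordinates of the metal-free domain, producing one column of the prediction matrix used by the reweighting programs.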

Selected programs available for ensemble reconstruction from PCSs and RDCs are listed in Table 1.

Table 1 Selected programs available for capturing structural information from PCSs and RDCs induced through anisotropic metals

Reweighting approaches for ensemble reconstruction

In this section, we review some approaches for conformational ensemble reconstruction based on reweighting (step (e) in the previous section). Reweighting means that the experimental data are used a posteriori to optimize the weights of conformations in a pre-calculated unbiased ensemble with the aim of minimizing the discrepancy between experimental and back-calculated data. It is distinct from “restraining” approaches, in which additional energy terms, as functions of experimental data, are directly incorporated into classical MD force fields during the simulations to generate and analyze possible conformational states (Roux and Weare 2013). Restraining approaches are preferable for structural characterization of molecules with extensive conformational heterogeneity, such as intrinsically disordered proteins (Bonomi et al. 2017). A broader treatment of this subject may be found in other existing works (Boomsma et al. 2014; Ravera et al. 2016; Bonomi et al. 2017; Rangan et al. 2018; Cárdenas et al. 2020). Here, we focus on reweighting approaches in which PCS and RDC restraints can be incorporated.

Reweighting approaches have the goal of either finding an optimal ensemble with the minimal subset of conformations or calculating the maximum allowed probability (MAP) or the maximum occurrence (MaxOcc) of all considered structures. The former can be achieved with several software packages. We first describe the minimal ensemble solutions to multiple experimental restraints (MESMER) approach because it is user-friendly and available for nearly any type of observable (Ihms and Foster 2015).

MESMER has been developed for identifying and selecting ensembles that can simultaneously fulfil multiple experimental data, e.g., SAXS, paramagnetic NMR, and DEER. In simultaneous fitting, the relative scale for each experimental dataset is typically pre-set to the inverse of the average fitness obtained from individual fits. The optimal ensemble is iteratively selected via a genetic algorithm with the following steps:

  1. (1) K types of predicted data (PCSs and RDCs, etc.) for each structure in the conformational pool (Z structures) are calculated as described in the previous section, and they are compiled into a series of Z components as input.

  2. (2) A “parent” ensemble pool, comprised of N ensembles with M components, is generated randomly from the Z components. This pool is then duplicated to form a “child” ensemble pool with some diversifications through different replacement mechanisms.

  3. (3) The 2N ensembles generated in (2) are ranked according to their consistency with experimental data.

  4. (4) The N best-fitting ensembles are selected and used as the “parent” ensemble pool for the next generation.

Steps (2), (3), and (4) are iterated until even the poorest-fitting ensemble in step (4) has a reasonable agreement with all K types of experimental data, or the residual standard deviation (RSD) is less than the pre-set value. Solutions with the total sum of weight in one ensemble slightly less or more than 1 are acceptable.
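The iteration of steps (2) to (4) can be sketched with a toy genetic algorithm in Python. This is a deliberately simplified illustration and not the MESMER implementation: all members of an ensemble carry equal weight (a conformer may appear more than once, which acts as a coarse weight), the only diversification is a single random replacement, and the run stops after a fixed number of generations. The names `misfit` and `evolve` and all parameter values are hypothetical.

```python
import random

def misfit(ensemble, pool, experimental):
    """Mean squared deviation between the experimental data and the
    equal-weight average of the predicted data of the ensemble members."""
    m = len(ensemble)
    total = 0.0
    for k, observed in enumerate(experimental):
        average = sum(pool[i][k] for i in ensemble) / m
        total += (average - observed) ** 2
    return total / len(experimental)

def evolve(pool, experimental, n_ensembles=20, m_components=3,
           generations=50, seed=0):
    """Return the best ensemble (indices into pool) and the best misfit
    recorded after each generation."""
    rng = random.Random(seed)
    z = len(pool)
    # Step (2): random "parent" pool of N ensembles with M components each
    parents = [[rng.randrange(z) for _ in range(m_components)]
               for _ in range(n_ensembles)]
    history = []
    for _ in range(generations):
        # Step (2): "child" pool, here diversified by one random replacement
        children = []
        for ensemble in parents:
            child = ensemble[:]
            child[rng.randrange(m_components)] = rng.randrange(z)
            children.append(child)
        # Step (3): rank all 2N ensembles by agreement with experiment
        ranked = sorted(parents + children,
                        key=lambda e: misfit(e, pool, experimental))
        # Step (4): the N best-fitting ensembles become the next parents
        parents = ranked[:n_ensembles]
        history.append(misfit(parents[0], pool, experimental))
    return parents[0], history
```

Because the parents compete with their children in every generation, the best misfit can never increase from one generation to the next, although different random seeds may end in different ensembles.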

MESMER has been applied to characterize the conformational states of some biomolecules, e.g., calmodulin (Ihms and Foster 2015), PDZ domains (Delhommel et al. 2017), and linear Ub2 (Hou et al. 2021), by using paramagnetic data. It provides a graphical user interface to streamline all processes from the generation of the conformational pool to the visualization of selected best ensembles as well as representation of the correlation between experimental data and predicted data. However, users should be cautious when simultaneously fitting multiple datasets obtained from different experiments, as they might contain significantly distinct information. Inappropriately pre-set scales for those input datasets would lead to severe overestimation or underestimation of conformational diversity, as the weight for components involved in ensembles that fit one dataset well would easily increase during the iteration. Thus, the relative scales for each dataset need to be carefully tuned. This problem is apparent even when using multiple sets of PCSs collected from proteins with metals incorporated at different positions (Hou et al. 2021). In addition, the solution is not deterministic: repeated executions of MESMER rarely return one single specific solution (ensemble). Even so, the major components (conformers) in the N non-unique ensembles should be similar in any meaningful executions.

The Sparse Ensemble Selection (SES) method is an alternative global-fit approach, which can recover a representative conformational ensemble from multiple experimental datasets. Distinct from MESMER, SES requires no problem-specific tuning parameters and provides a deterministic solution, as only the structures comprising the ensemble that best fits the experimental observables are cloned and subsequently complemented during each generation (Berlin et al. 2013; Ravera et al. 2016). In addition, it is important to stress that the ensembles selected by these approaches only provide possible solutions that recapitulate the experimental data, and they are sensitive to certain aspects of conformational sampling. In practice, no ensemble can be considered unique or a full and complete representation of the conformational states (Ozenne et al. 2012).

Rather than finding a solution (the best ensemble) to fit the datasets, some approaches try to calculate reasonable existence probabilities of any considered structures in the ensemble. MAP (Longinetti et al. 2006; Bertini et al. 2007), MaxOcc (Bertini et al. 2010, 2012), and the maximum and minimum occurrence of defined regions, MaxOR and MinOR (Andrałojć et al. 2014, 2016), have been sequentially developed with the aim of estimating the maximum probabilities of a single conformation or of regions comprised of multiple conformations. The MAP approach utilizes free domain movements (Ravera et al. 2016), while the others calculate corresponding values of conformations in a pre-defined pool. Here, we focus on MaxOcc, as MaxOR and MinOR are natural extensions of the MaxOcc approach for the estimation of specific combinations of conformations.

MaxOcc analysis can be performed with a MATLAB script (Andrałojć et al. 2016; Gigli et al. 2018). In this case, some further data preparation should be performed in advance. K types of experimental data should be concatenated into a length-J column vector representing all observables. Then it is essential to normalize this experimental vector, e.g., by dividing the data by the square of their sum (Medeiros Selegato et al. 2021). The K types of predicted data from Z structures should also be concatenated into a J \(\times\) Z prediction matrix and normalized using the same procedure.

Then, the fitting calculation is repeated by increasing the weight of a single conformation until there is no acceptable ensemble comprising this conformation with one certain weight to explain the experimental data reasonably. This certain weight is defined as the MaxOcc of this conformation. Namely, MaxOcc is the maximum allowed weight for a conformation in any possible ensembles, which does not violate the experimental data. If a conformation has a higher MaxOcc value, this implies that it is more likely to be visited by molecules due to their intrinsic dynamics in solution. However, it should be noted that in principle, even if a conformation has higher-MaxOcc, it can be absent in reality, as MaxOcc does not guarantee minimum occurrence. In this sense, one may rather safely exclude the existence of lower-MaxOcc conformations in the ensemble.
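The definition of MaxOcc can be made concrete with a brute-force toy calculation in Python. The sketch below is not the published MATLAB implementation: it scans the weight of the target conformation on a grid, enumerates all grid-based weight combinations of the remaining conformations, and reports the largest target weight for which at least one combination keeps the root-mean-square misfit below a user-chosen threshold. It is only practical for very small pools, and the function names, the grid resolution, and the threshold are hypothetical.

```python
import math

def rms_misfit(weights, predictions, experimental):
    """Root-mean-square deviation between experimental data and the
    weighted average of the predicted data."""
    total = 0.0
    for j, observed in enumerate(experimental):
        average = sum(w * pred[j] for w, pred in zip(weights, predictions))
        total += (average - observed) ** 2
    return math.sqrt(total / len(experimental))

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def max_occ(target, predictions, experimental, threshold, steps=100):
    """Largest grid weight of conformation `target` that is compatible with
    the data; requires a pool of at least two conformations."""
    z = len(predictions)
    others = [i for i in range(z) if i != target]
    best = 0.0
    for k in range(steps + 1):  # candidate weight of the target: k / steps
        for split in compositions(steps - k, len(others)):
            weights = [0.0] * z
            weights[target] = k / steps
            for i, s in zip(others, split):
                weights[i] = s / steps
            if rms_misfit(weights, predictions, experimental) <= threshold:
                best = k / steps  # an acceptable ensemble exists at this weight
                break
    return best

# Hypothetical three-conformer pool with two observables each
predictions = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
experimental = [0.5, 0.5]
occ_a = max_occ(0, predictions, experimental, threshold=0.05)
occ_c = max_occ(2, predictions, experimental, threshold=0.05)
```

In this toy pool the third conformer predicts data far from the experimental values, so it can only be tolerated at a very small weight, whereas the first conformer can account for roughly half of the population.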

MaxOcc has been used to evaluate the conformational states on the basis of paramagnetic data, SAXS data, or DEER data in many systems, e.g., calmodulin (Bertini et al. 2010), MMP1 (Cerofolini et al. 2013), transactivation response element (TAR) RNA from the HIV-1 virus (Andrałojć et al. 2016), the capsid of human immunodeficiency virus type 1 (Carlon et al. 2019a), and linear Ub2 (Hou et al. 2021). One excellent work reported a direct relation between the results from SES and MaxOcc: SES is more likely to select conformations with the highest MaxOcc values (Medeiros Selegato et al. 2021).

As stated above, MESMER and SES provide an optimal ensemble of several discrete conformations with associated statistical weights that are proportional to their population for recapitulating the experimental data. It is unrealistic to assume that those conformations truly exist with the calculated weights in solution. Rather, they can be viewed as representative snapshots taken from a large continuum of states; each of them represents a group of similar conformations, namely a certain region of the conformational space. The degree of conformational variability manifested in the obtained ensemble gives us functional insights into the molecule. As an example, we have captured a very compact conformer of free linear Ub2 by performing a MESMER analysis, which suggested that the dynamics of linear Ub2 is even more complicated than previously considered (Hou et al. 2021). In this study, MaxOcc further allowed us to discern which conformations or conformational regions (MaxOR) are more likely to be visited by the proteins, expanding our insight into their intrinsic dynamics. In addition, the existence of any structure considered can be evaluated by MaxOcc, which is helpful for probing particular aspects of the conformational fluctuations. For example, the introduction of two distinct mutations, E16Rp or E18Rp (where “p” represents the mutation introduced at C-terminal (proximal) Ub), in linear Ub2 drastically changed the PCSs, strongly implying the perturbation of conformational space. Notably, the rank of MaxOcc of a target-bound conformation changed significantly among the whole structural pool. Such perturbed conformational sampling of free linear Ub2 was shown to correlate with binding affinities for a Ub-binding protein, HOIL-1L-NZF (Fig. 3), thus providing a profound insight into the binding mode of linear Ub2 (Hou et al. 2021).

Fig. 3

a An example of dramatically changed PCSs by the introduction of mutations in free linear Ub2: PCSs collected from the metal-free Ub of wild-type (cyan bar), E16Rp (orange triangle), and E18Rp (red square) of linear Ub2, with paramagnetic tag (PSPy-6M-DO3MA-Tm3+) (Yang et al. 2016) at D39C of N-terminal (distal) Ub. b, c MaxOcc values of the top 1,000 conformers calculated using PCSs from wild-type (purple or green), E16Rp (orange), and E18Rp (red) of linear Ub2 are sorted and plotted in descending order. The rank of MaxOcc of the bound conformer of linear Ub2 in complex with HOIL-1L-NZF (PDB code: 3b0a) (Sato et al. 2011), shown as triangles by corresponding color, dropped (elevated) in E16Rp (E18Rp), implying that the E16Rp (E18Rp) decreased (increased) the probability for free linear Ub2 to adopt the bound state. Correspondingly, dissociation constant (Kd) increased by over 14-fold for E16Rp and decreased by fivefold for E18Rp, implying contribution from the conformational selection mechanism. Adapted from (Hou et al. 2021)

In the linear Ub2 example described above, both the high flexibility of the linker and the weak interaction between domains result in a certain degree of domain movement continuity among several conformations with similar energies. Such a situation probably presents one of the most difficult cases for reconstructing conformational ensembles based on paramagnetic restraints. If a protein takes only a few stable conformations, then domain motion can be regarded as an exchange between these states. In such a case, the methods reviewed in this paper are likely to work more robustly. This is because the complexity of PCS and RDC data is greatly reduced when the number of stable conformational states is small.

Conclusions

To conclude, paramagnetic effects can be a rich source of structural restraints for characterizing the conformational states of multi-domain proteins. The anisotropic paramagnetic restraints are very sensitive to metal-nucleus distances (PCSs) and relative orientations of metal-nucleus (PCSs) or internuclear vectors (RDCs). Therefore, they can identify the conformations that proteins are more likely to visit due to their intrinsic dynamics, thus providing structural insight into their physiological behaviors. A number of programs have been developed for the determination of magnetic susceptibility anisotropy tensors, refinement of solution structure, assembly of multiple individual domains, generation of conformational pools, prediction of paramagnetic data for other nuclei in biomolecules, and reconstruction of conformational ensembles. In combination with these programs, visualization of populated conformational space is possible from averaged paramagnetic experimental data. In the near future, the use of paramagnetic NMR restraints induced through anisotropic paramagnetic ions is expected to become a powerful approach for visualizing the dynamic behavior of proteins under physiological conditions, or even in living cells.