Introduction

G protein-coupled receptors (GPCR) comprise one of the largest families of integral membrane proteins in eukaryotes. GPCR serve central roles in amplifying and regulating a wide range of intracellular responses to extracellular stimuli. Upon ligand activation, GPCR undergo conformational changes that influence coupling with intracellular partners, including heterotrimeric G proteins (Gα, Gβ, Gγ subunits), β-arrestins, G protein-coupled receptor kinases (GRK), and other effectors [1,2,3]. An estimated 27–50% of FDA-approved drugs interact with GPCR targets, including drug classes such as beta-blockers, antihistamines, and antipsychotics [4, 5].

All GPCR share a common topology featuring a core bundle of seven transmembrane (TM) α-helices, with the N-terminus and C-terminus located on the extracellular and intracellular sides of the cell membrane, respectively. The extracellular and intracellular loops (ECL1–3 and ICL1–3) are the protein segments that connect adjacent TM domains. The ECL and ICL of GPCR have lower sequence and structural conservation than the TM domains. Among the available GPCR structures (mostly Class A GPCR), ECL2 is generally the longest and the most diverse in amino acid sequence and three-dimensional structure (Fig. 1). Despite this low sequence conservation, an overwhelming majority of GPCR contain a disulfide bond between highly conserved cysteine residues in ECL2 and at the extracellular end of TM3. In an analysis of 367 GPCR sequences representing members of Classes A, B1, B2, C, and F, downloaded using the GPCRdb alignment tools [6, 7], 89% (327 of 367) of sequences contain the conserved ECL2/TM3 cysteine residues. In addition, the disulfide bond between the cysteine sidechains is observed in 94% (47 of 50) of representative crystal structures of unique GPCR (as of May 2018). Three lipid receptors with known crystal structures (LPAR1, S1PR1, CB1) lack the conserved ECL2/TM3 disulfide bond; instead, they contain an intra-loop disulfide bond that constrains the loop conformation.
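The sequence statistic above amounts to checking two alignment columns per sequence. The Python sketch below assumes the alignment is available as a plain dictionary of equal-length aligned sequences and that the 0-based column indices corresponding to generic positions 3.25 and 45.50 are already known; the toy sequences, receptor names, and column numbers are hypothetical, and this is not how the GPCRdb tools themselves work.

```python
def count_conserved_cys(alignment, col_tm3, col_ecl2):
    """Count aligned sequences with Cys at both the TM3 and ECL2 columns.

    alignment: dict mapping receptor name -> aligned sequence (gaps as '-').
    col_tm3, col_ecl2: 0-based alignment columns (hypothetical here) that
    stand in for generic positions 3.25 and 45.50.
    """
    with_pair = [
        name for name, seq in alignment.items()
        if seq[col_tm3] == "C" and seq[col_ecl2] == "C"
    ]
    return len(with_pair), len(alignment)


# Toy alignment (hypothetical); columns 1 and 6 play the role of 3.25 and 45.50.
toy = {
    "rec_a": "ACLLKTCG",
    "rec_b": "GCIVR-CA",
    "rec_c": "ASLLKTCG",  # lacks the TM3 cysteine
}
n_cys, n_total = count_conserved_cys(toy, col_tm3=1, col_ecl2=6)
print(f"{n_cys}/{n_total} = {100 * n_cys / n_total:.0f}%")  # prints 2/3 = 67%
```

With the counts reported above (327 of 367 sequences), the same percentage arithmetic gives 89%.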

Fig. 1

Extracellular loop comparison of seventeen superposed GPCR crystal structures. The PDB IDs of the superposed structures are as follows: Bovine RHO (1GZM, green), B2AR (2RH1, orange), B1AR (2VT4, magenta), Squid RHO (2Z73, gold), AA2AR (3EML, cyan), DRD3 (3PBL, pink), H1R (3RZE, maroon), ACM2 (3UON, light blue), CXCR4 (3ODU, brown), S1PR1 (3V2Y, dark green), PAR1 (3VW7, yellow), NTR1 (4GRV, blue), ACM3 (4DAJ, light brown), OPRK (4DJH, purple), OPRM (4DKL, dark blue), NOP (4EA3, salmon), OPRD (4EJ4, dark brown)

For many GPCR, ECL2 plays important roles in GPCR activation, orthosteric ligand binding, and allosteric ligand interactions [8, 9]. Mutagenesis of ECL2 in the complement C5a receptor and the thrombin receptor resulted in constitutively active GPCR [10, 11]. These findings suggest that in some GPCR, ECL2 functions as a negative regulator that dampens signaling by restricting the transition to active receptor states in the absence of endogenous ligand. However, ECL2 mutagenesis does not uniformly confer constitutive GPCR activity. For example, the A204E mutation in ECL2 of the ghrelin receptor resulted in diminished constitutive activity [12]. Thus, ECL2 is generally understood to play a role in GPCR function, but many of the details are receptor-dependent.

Because the binding pockets of closely related GPCR are relatively similar, other structural features likely give rise to the observed differences in receptor-ligand specificity. Indeed, the diversity of ECL2 amino acid sequences and structural features, even between closely related GPCR, contributes to receptor-specific ligand interactions [13]. Table 1 shows examples of GPCR crystal structures in which a variable number of ECL2 residues contact the crystallized ligand.

Table 1 GPCR structures with ECL2 contacts to the crystallized ligand

Template-based modeling methods rely on an underlying structural similarity, which is often lacking when ECL2 is compared between members of the GPCR family. GPCR loop segments display low sequence conservation relative to the TM domains and tend to have variable lengths, which inevitably introduces gaps in GPCR sequence alignments. Furthermore, some loops have low structural similarity despite relatively high sequence identity by homology modeling standards. For reliable homology-based prediction of ligand-receptor interactions, a pairwise target-template sequence identity of around 35–40% has been suggested based on the GPCR Dock assessments from 2008 to 2010 [30, 31]. For example, the dopamine receptor D2 (DRD2) homology modeling target shares 66.9% and 36.8% overall sequence identity with the GPCR templates DRD3 (PDB: 3PBL) and 5HT2C (PDB: 6BQH), respectively. Although these templates meet or exceed the suggested 35–40% sequence identity threshold for reliable homology modeling, the DRD3 and 5HT2C crystal structures would be poor templates for the ECL2 segment of the DRD2 target (Fig. 2).
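Percent identity can be defined in several ways, which matters when comparing numbers against a 35–40% guideline. The sketch below assumes identity is counted over aligned columns where neither sequence has a gap; that convention, the function names, and the toy sequences are assumptions for illustration. The optional `cols` argument restricts the calculation to a region such as the ECL2 columns, giving a local view of the alignment, although, as noted above, even high loop identity does not guarantee structural similarity.

```python
def percent_identity(target_aln, template_aln, cols=slice(None)):
    """Percent identity between two aligned sequences (gaps written as '-').

    Assumption: identity is counted over columns where neither sequence has a
    gap; other conventions divide by the target length instead. `cols` can
    restrict the calculation to a region such as the ECL2 columns.
    """
    target, template = target_aln[cols], template_aln[cols]
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def meets_guideline(identity, threshold=35.0):
    """Compare with the lower end of the ~35-40% guideline quoted above."""
    return identity >= threshold


# Toy aligned sequences (hypothetical), not real GPCR sequences.
target = "ACDEFG-HIK"
template = "ACDQFGWH-K"
overall = percent_identity(target, template)
print(overall, meets_guideline(overall))  # prints 87.5 True
```

With the overall identities quoted above, both DRD3 (66.9%) and 5HT2C (36.8%) would pass this check even though neither is a good ECL2 template, which illustrates why the overall figure alone is not sufficient.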

Fig. 2

ECL2 comparison of GPCR that share high percentage sequence identity. The DRD3 (3PBL, orange), DRD2 (6CM4, cyan), and 5-HT2C (6BQH, magenta) crystal structures are shown to highlight the structurally variable ECL2 segments

Although not a comprehensive list, Table 2 shows several GPCR for which more than two crystal structures were available at the beginning of this study. The dynamic nature of ECL2 is apparent in these sets of superposed crystal structures: across different structures of a single protein, the ECL2 Cα atoms have root mean square deviation (RMSD) values ranging from 0.3 to 1.5 Å. Traditionally, the gold standard of structure prediction is a top-ranking model with sub-angstrom accuracy (Cα RMSD under 1.0 Å) to the reference “native” structure. However, the experimental variability of ECL2 across crystal structures of the same GPCR can exceed this value. For the ECL2 targets in this benchmark, it is therefore more reasonable to use near-atomic accuracy (Cα RMSD within 2.5 Å) of the top-scoring models as the threshold for success [32]. Models with near-atomic accuracy are often sufficient for applications downstream of modeling [33].
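The accuracy thresholds above can be made concrete with a Kabsch superposition followed by an RMSD calculation. This is a minimal NumPy sketch, assuming two matched (N, 3) arrays of Cα coordinates; the function names are illustrative. It fits and measures over the same atoms, whereas the benchmark superposes on the residues outside the modeled loop (see "Results and discussion"), in which case the fitted rotation would be applied to the loop atoms instead.

```python
import numpy as np


def superposed_rmsd(mobile, reference):
    """Cα RMSD after optimal rigid-body superposition (Kabsch algorithm)."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    Pc = P - P.mean(axis=0)          # remove translation
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip one axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def accuracy_class(rmsd):
    """Thresholds used in this benchmark: < 1.0 Å and <= 2.5 Å."""
    if rmsd < 1.0:
        return "sub-angstrom"
    if rmsd <= 2.5:
        return "near-atomic"
    return "outside threshold"
```

Usage: `accuracy_class(superposed_rmsd(model_ca, reference_ca))`, where `model_ca` and `reference_ca` are hypothetical matched coordinate arrays.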

Table 2 GPCR with more than 2 crystal structures

Ab initio loop modeling can be described as a mini protein-folding problem whose success depends largely on two general components: sampling and scoring. First, an extensive search of loop conformational space must be performed for the target sequence. Second, a method is needed to evaluate, or score, the loop model conformations in order to select near-native models (those close to the reference structure). Prediction therefore involves generating model loop structures and ranking them by some criterion (e.g., the energy functions in Rosetta and MOE) that ideally correlates with closeness to experimentally determined structures. Our primary research question at the start of this benchmark study was: which of the available modeling software methods can accurately sample and score near-native loop models?

Methods

The reference GPCR crystal structures used in this benchmark for ECL2 modeling are listed in Table 3. Crystal structures with poorly resolved or completely missing residues in the ECL2 segment were excluded from the benchmark. The highest resolution structures were chosen as references for individual GPCR that had multiple crystal structures available. The preferred chain of each benchmark target PDB file was extracted using the GPCRdb structure browser application for further preparation.

Table 3 GPCR reference structures used in loop modeling benchmark

The molecular operating environment (MOE version 2016.08) software package was used for GPCR structure preparation and visualization. The Rosetta software suite (Rosetta version 3.8) and MOE were used for modeling the ECL2 segment of the benchmark reference structures. For each of the GPCR in the benchmark, the Robetta fragment server was used to generate the 9mer and 3mer fragment library files that are necessary for Rosetta CCD and KICF loop modeling methods [124]. Fragment generation for each reference GPCR excluded any PDB data from the GPCR crystal structure itself, as verified by manual inspection of PDB ID codes in each fragment file.

Structure preparation for GPCR targets using MOE

The benchmark GPCR structures listed in Table 3 were downloaded from the PDB [125] and prepared for loop modeling with MOE. Ligands and water molecules were deleted from the structure files. For each GPCR structure, the “QuickPrep” process was used to streamline structure preparation. QuickPrep corrects structural issues that often accompany structural data (e.g., residues with alternate locations, missing atoms, and chain breaks), adds explicit hydrogens and partial charges with “Protonate 3D,” and performs a tethered-receptor energy minimization using the Amber12EHT [126,127,128] forcefield (RMS gradient of 0.1 kcal/mol/Å). This final energy minimization step was performed to correct inaccurate geometries derived from the crystallographic data. During minimization, the receptor atoms were tethered so that changes to their initial positions remained modest.

Description of loop modeling software

Rosetta loop modeling protocols can be used for loop refinement or loop reconstruction. This study implements modeling only in the context of loop reconstruction: ab initio/de novo prediction of the “native” loop conformation from the amino acid sequence, with the initial backbone and sidechain conformations discarded prior to modeling. Loop refinement, by contrast, searches for lower-energy conformations starting from a given loop conformation that is potentially close to the “native” structure.

The sampling process was implemented in two stages with iterations of Monte Carlo simulated annealing: An initial low-resolution/coarse-grained stage where the sidechain atoms were represented as “centroids” and a high-resolution/full-atom stage where the sidechain atoms were explicitly represented. This process was coupled with one of several scoring functions. Figure 3 shows a schematic overview of the general Rosetta loop modeling process. The available loop modeling algorithms in Rosetta differ in conformational search (sampling) strategy and solutions to the loop closure problem.

Fig. 3

Overview schematic of Rosetta loop modeling process

The cyclic coordinate descent (CCD) algorithm optimizes the dihedral angles of consecutive loop residues, proceeding from the N- to the C-terminus, with the goal of minimizing the distance between the free C-terminal end of the loop and the fixed anchor position [129, 130]. The CCD algorithm in Rosetta uses experimentally derived fragment libraries to guide the conformational search during loop modeling. The fragment libraries contain the coupled phi/psi dihedrals of peptide segments with 9 and 3 residues (9mers, 3mers) from the PDB.
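The core CCD step has a closed-form solution: for each rotatable bond, the rotation angle that brings the moving end atoms closest to their targets is an `atan2` of two sums. The sketch below illustrates this on a simplified model, assuming a generic chain of points in which each bond acts as a rotation axis (standing in for backbone phi/psi) and the last three points must superimpose onto three fixed anchor positions. It is a toy illustration with hypothetical function names, not Rosetta's implementation, and it omits fragment insertion and scoring.

```python
import numpy as np


def rotate_about_axis(points, origin, axis, theta):
    """Rotate points by theta (radians) about the line through origin along axis."""
    u = axis / np.linalg.norm(axis)
    r = points - origin
    r_par = np.outer(r @ u, u)          # component along the axis (unchanged)
    r_perp = r - r_par                  # component that rotates
    return origin + r_par + r_perp * np.cos(theta) + np.cross(u, r_perp) * np.sin(theta)


def ccd_close(chain, targets, max_iter=200, tol=1e-3):
    """Cyclic coordinate descent on a generic chain of points.

    chain:   (N, 3) positions; chain[0] and chain[1] never move (N-anchor).
    targets: (k, 3) positions that the last k points should reach (C-anchor).
    Rotating about bond (i, i+1) moves every point from i+2 onward.
    """
    chain = np.array(chain, dtype=float)
    targets = np.asarray(targets, dtype=float)
    k, n = len(targets), len(chain)

    def deviation():
        return float(np.sqrt(np.mean(np.sum((chain[-k:] - targets) ** 2, axis=1))))

    for _ in range(max_iter):
        if deviation() < tol:
            break
        # Sweep the rotatable bonds from the N- to the C-terminal side; only
        # bonds whose rotation moves all k end points are used.
        for i in range(n - k - 1):
            origin = chain[i]
            u = chain[i + 1] - chain[i]
            u = u / np.linalg.norm(u)
            r = chain[-k:] - origin
            r_perp = r - np.outer(r @ u, u)
            f = targets - origin
            # Angle that minimizes the summed squared end-point distance.
            a = np.sum(f * r_perp)
            b = np.sum(f * np.cross(u, r_perp))
            theta = np.arctan2(b, a)
            chain[i + 2:] = rotate_about_axis(chain[i + 2:], origin, u, theta)
    return chain, deviation()
```

Because each step picks the optimal angle (with zero rotation always available), the end-point deviation never increases; bond lengths are preserved because every move is a rigid rotation.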

The kinematic closure (KIC) method selects three pivot atoms (the remaining loop backbone atoms are designated non-pivot), which divide the loop into two segments whose non-pivot phi/psi dihedral angles are sampled. The pivot dihedral angles (six phi/psi angles for the three pivots) are then solved analytically to position each rigid segment for loop closure [131]. The standard KIC protocol for loop modeling has since been superseded by the next generation KIC (NGK) and KIC with fragments (KICF) methods.

The next generation KIC (NGK) algorithm employs intensification strategies during non-pivot conformational sampling in both the low- and high-resolution stages of the loop modeling process [132]. In the high-resolution stage, NGK implements additional annealing strategies that modulate the energy function to overcome large energy barriers. The intensification strategies involve (1) using neighbor-dependent Ramachandran distributions (Rama2b term) to select phi/psi dihedral combinations during sampling and (2) independently sampling ω angles based on observations in high-resolution crystal structures. Traditionally, the planar character of the peptide bond restricts the ω dihedral angle to either 180° for the common trans configuration or 0° for the less common cis configuration. However, analyses of high-resolution protein structures concluded that trans peptide ω values can deviate by more than 25° from planarity in some cases and that non-planar peptide bonds are more common than previously recognized [133]. In the NGK method, ω is sampled independently of the phi/psi dihedrals from a Gaussian distribution around the observed mean of 179.1° ± 6.3° [132]. The annealing strategies implemented in the NGK method involve ramping the weights of (1) the repulsive component of the Lennard-Jones potential and (2) the Rama score (distinct from the Rama2b term used in the intensification strategy), which is the likelihood of a phi/psi combination occurring given an amino acid type. While the intensification strategies (Rama2b and ω sampling) are applied in both the low- and high-resolution stages of loop modeling, the annealing strategies are implemented only in the high-resolution stage. Overall, these intensification and annealing strategies were found to greatly improve loop modeling accuracy compared to the standard KIC method in a previous benchmark [132].
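Sampling ω from a Gaussian centered at 179.1° has one practical wrinkle: roughly 44% of draws exceed 180° and must be wrapped back into the standard dihedral range, so a naive arithmetic mean of the wrapped values is misleading. The sketch below draws ω independently of phi/psi, wraps it to (-180°, 180°], and checks the result with a circular mean. The mean and standard deviation are the values quoted above; the function names are illustrative and this is not Rosetta's code.

```python
import numpy as np

OMEGA_MEAN = 179.1  # degrees, observed mean quoted above
OMEGA_SD = 6.3      # degrees


def wrap_deg(angles):
    """Wrap angles (degrees) into the interval (-180, 180]."""
    a = (np.asarray(angles, dtype=float) + 180.0) % 360.0 - 180.0
    return np.where(a == -180.0, 180.0, a)


def sample_omega(n, rng):
    """Draw trans omega values from a Gaussian, independently of phi/psi."""
    return wrap_deg(rng.normal(OMEGA_MEAN, OMEGA_SD, size=n))


def circular_mean_deg(angles):
    """Mean of angles that may straddle the +/-180 degree boundary."""
    r = np.deg2rad(angles)
    return float(np.rad2deg(np.arctan2(np.sin(r).mean(), np.cos(r).mean())))
```

For example, `circular_mean_deg(sample_omega(100_000, np.random.default_rng(1)))` recovers a value close to 179.1, whereas a plain `mean` of the same wrapped samples would land far from it.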

The KIC with fragments (KICF) method combines the fragment library sampling strategy from the CCD method with the KIC loop closure method. The main difference between this method and the NGK method is the way in which loop backbone conformations are sampled. The fragment-based sampling of phi/psi/omega dihedral angles consists of four major steps: (1) one of the given fragment libraries is selected at random and searched for alignment frames where fragments overlap with subsegments of the loop; (2) one of the alignment frames and fragments within that frame is selected at random; (3) the phi/psi/omega dihedral angles of that fragment are applied to the loop subsegment; and finally (4) kinematic closure (KIC) calculations are performed to achieve loop closure.
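Steps (1) to (3) of the fragment-based move can be sketched with simple Python containers. The `FragmentLibrary` class, its frame layout (frame start residue mapped to candidate fragments), and the rule that a frame must lie entirely within the loop are assumptions made for illustration; they do not reflect Rosetta's internal data structures. Step (4), kinematic closure, is deliberately not performed here.

```python
import random
from dataclasses import dataclass, field

Torsion = tuple[float, float, float]  # (phi, psi, omega) in degrees


@dataclass
class FragmentLibrary:
    """Hypothetical container: frame start residue -> candidate fragments."""
    length: int                                   # e.g. 9 or 3
    frames: dict[int, list[list[Torsion]]] = field(default_factory=dict)


def fragment_move(torsions, loop_start, loop_end, libraries, rng):
    """One fragment-insertion move (steps 1-3); loop closure (step 4) is not done here.

    torsions: list of (phi, psi, omega) per residue, indexed by residue position.
    loop_start, loop_end: inclusive residue indices of the loop.
    """
    # Step 1: pick a library at random and find frames lying inside the loop
    # (assumption: frames must not extend past the loop termini).
    lib = rng.choice(libraries)
    frames = [s for s in lib.frames
              if s >= loop_start and s + lib.length - 1 <= loop_end]
    if not frames:
        return list(torsions), None
    # Step 2: pick a frame at random, then a fragment within that frame.
    start = rng.choice(frames)
    fragment = rng.choice(lib.frames[start])
    # Step 3: copy the fragment's phi/psi/omega onto the loop subsegment.
    new = list(torsions)
    new[start:start + lib.length] = fragment
    return new, start
```

In a full protocol, the returned torsions would then be handed to a closure step (KIC in KICF) before scoring; residues outside the loop are never altered by this move.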

The Rosetta all-atom energy function used to evaluate and score biomolecular structures and models has evolved over many versions (Score12, Talaris2013, Talaris2014), with the Rosetta Energy Function 2015 (REF2015) becoming the default scoring function as of July 2017 [134,135,136,137]. However, the most recent scoring function (REF2015) and loop modeling algorithm (KICF) had not been tested on the 12-residue or 14–17-residue loop modeling benchmark sets used by Rosetta developers (as of May 2018).

The MOE loop modeler application has a de novo search method and a PDB search method for sampling potential loop backbone conformations. For this study, only the de novo search method was used to model the ECL2 of the benchmark GPCR. MOE loop modeling protocols also consisted of distinct low-/high-resolution stages for loop modeling. The initial de novo search stage only deals with the loop backbone atoms, generating potential loop conformations that were ranked by an initial coarse scoring function before advancing to the full-atom stage. Figure 4 shows a schematic overview of the general MOE loop modeling process.

Fig. 4

Overview schematic of MOE loop modeling process

MOE loop modeling uses an extension of the CCD algorithm, full CCD (FCCD) [138]. FCCD differs from CCD by operating solely on the Cα atoms, using Cα pseudo bond angles and pseudo dihedral angles, and by optimizing both terms to achieve loop closure. Probability densities calculated from high-resolution PDB structures yield characteristic profiles for these Cα pseudo bond and dihedral angles. These profiles were used for random sampling of loop Cα conformational space during the de novo search stage, followed by FCCD loop closure.
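The Cα pseudo bond angle is the angle at one Cα formed by its two sequence neighbors, and the Cα pseudo dihedral is the torsion defined by four consecutive Cα atoms. The NumPy sketch below computes both along a Cα trace; it is generic geometry with illustrative function names, not MOE's implementation, and it does not reproduce the PDB-derived probability profiles.

```python
import numpy as np


def pseudo_bond_angle(a, b, c):
    """Angle (degrees) at Cα b formed by consecutive Cα atoms a-b-c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))


def pseudo_dihedral(p0, p1, p2, p3):
    """Dihedral (degrees) defined by four consecutive Cα atoms."""
    p0, p1, p2, p3 = (np.asarray(p, float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b1 /= np.linalg.norm(b1)
    b2 = p3 - p2
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def ca_trace_profile(ca):
    """Pseudo bond angles and pseudo dihedrals along an (N, 3) Cα trace."""
    angles = [pseudo_bond_angle(*ca[i:i + 3]) for i in range(len(ca) - 2)]
    dihedrals = [pseudo_dihedral(*ca[i:i + 4]) for i in range(len(ca) - 3)]
    return angles, dihedrals
```

An N-residue trace yields N - 2 pseudo bond angles and N - 3 pseudo dihedrals, which are the degrees of freedom a Cα-only sampler would draw from its profiles.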

After loop closure by FCCD, it is necessary to optimize the Cα backbone atoms. This is accomplished by using a component of the protein chain reconstruction algorithm (PULCHRA) method which employs a steepest descent gradient minimization and a simple harmonic potential to optimize the Cα positions before full backbone reconstruction [139]. To reconstruct backbone atom positions from the Cα loop traces generated, MOE uses the backbone building from quadrilaterals (BBQ) method which is based on proximal distance geometries for sets of four sequential Cα atoms in the loop. Additionally, MOE backbone packer performs a minimization to relieve any strained backbone geometries and atom clashes. This step is followed by a final geometry and duplicate check to ensure that non-redundant backbone conformations with reasonable bond and dihedral angles are being evaluated by the coarse scoring function. The top-ranking loop backbone conformations are advanced to the full-atom stage.
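To show what steepest-descent optimization of a Cα trace under a simple harmonic potential looks like, the sketch below restrains each consecutive Cα–Cα distance toward 3.8 Å (the standard value for trans peptides) while keeping the two loop anchors fixed. PULCHRA's actual Cα optimization uses additional terms; this keeps only the harmonic distance restraint, and the step size, iteration count, and function names are assumptions chosen for illustration.

```python
import numpy as np

CA_CA = 3.8  # typical distance (Å) between consecutive Cα atoms (trans peptides)


def harmonic_energy(ca, k=1.0, d0=CA_CA):
    """E = k * sum over consecutive pairs of (d - d0)^2."""
    d = np.linalg.norm(np.diff(ca, axis=0), axis=1)
    return float(k * np.sum((d - d0) ** 2))


def relax_ca_trace(ca, k=1.0, d0=CA_CA, step=0.05, n_steps=5000, fix_ends=True):
    """Steepest-descent relaxation of consecutive Cα-Cα distances toward d0."""
    x = np.array(ca, dtype=float)
    for _ in range(n_steps):
        bonds = np.diff(x, axis=0)                  # x[i+1] - x[i]
        d = np.linalg.norm(bonds, axis=1)
        # Gradient of E with respect to each bond vector.
        g_bond = (2.0 * k * (d - d0) / d)[:, None] * bonds
        grad = np.zeros_like(x)
        grad[1:] += g_bond     # dE/dx[i+1] = +dE/d(bond i)
        grad[:-1] -= g_bond    # dE/dx[i]   = -dE/d(bond i)
        if fix_ends:
            grad[0] = grad[-1] = 0.0   # keep the loop anchors in place
        x -= step * grad               # move downhill along the gradient
    return x
```

Fixing the end atoms mirrors the loop-modeling setting, where the anchors belong to the fixed receptor framework and only interior Cα positions may move.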

In the full-atom stage, sidechain atoms were added to the loop and optimized with respect to the sidechain orientations. The entire loop segment was energy minimized through multiple steps before the final scoring step. The full-atom loop conformations generated were scored using generalized-born volume integral (GBVI). The potential energy of the system using GBVI has been shown to recover loop conformations close to the native from the Jacobson Loop Decoy Dataset [140].

Rosetta and MOE loop modeling methods

For each GPCR benchmark target, 1000 ECL2 models were generated with each loop modeling method after structure preparation. All 28 reference structures in the benchmark were reconstructed with the Rosetta methods discussed previously, but only 7 of the 28 reference structures were modeled with MOE because of its non-competitive performance on the shorter ECL2 targets, which is discussed in the "Results" section.

The following descriptions cover the main Rosetta command-line options used to perform the NGK, KICF, and CCD loop modeling methods in the low-resolution and full-atom stages. The remodel stage is the initial coarse-grained modeling step. It samples loop backbone conformations using a reduced representation of amino acid sidechains and a Rosetta low-resolution scoring function, and it was initiated with one of the following settings of the ‘-loops:remodel’ option: ‘perturb_kic’, ‘perturb_kic_with_fragments’, or ‘perturb_ccd’. A loop definition file was generated separately for each reference GPCR to define the ECL2 residues to be remodeled. Enabling the “extend loop” field in the loop files idealized the bond lengths, angles, and omega torsions of the target loop segment and replaced all phi/psi values with random values from Ramachandran space to give an initial closed conformation at the start of the remodel stage. This ensured that loop reconstruction was not influenced by the initial loop conformation. Subsequently, the loop phi/psi dihedrals were sampled using the fragment data described previously, followed by KIC or CCD calculations to achieve loop closure. Finally, the loop underwent minimization using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm and a Metropolis criterion acceptance test using the Rosetta low-resolution scoring function, score4L. The number of Monte Carlo steps in both stages of loop modeling was determined by the numbers of outer and inner cycles (outer_cycles * inner_cycles). The default numbers of outer cycles (5) and inner cycles (min(1000, number_of_loop_residues * 20)) were used for all Rosetta loop modeling jobs in this benchmark study. At the end of each outer cycle, the loop pose was set to the lowest-energy conformation evaluated. The temperature decreased exponentially from 1.5 to 0.5 kT from the first step to the last.

The refine stage is the all-atom loop modeling step and was activated with one of the following settings of the ‘-loops:refine’ option: ‘refine_kic’, ‘refine_kic_with_fragments’, or ‘refine_ccd’. This stage follows a scheme similar to that of the perturb stage, with the major differences being the all-atom treatment of the loop during conformational sampling and the scoring function used to guide and evaluate loop conformations during model production. Of the three Rosetta loop modeling algorithms used in this benchmark, NGK was the quickest of the more accurate methods at generating a given set of loop models. Therefore, when additional sets of ECL2 models were produced for benchmark targets, loop modeling was performed first with the NGK algorithm.
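The cycle counts and temperature schedule described above can be sketched as a generic Monte Carlo simulated annealing loop. This is a minimal sketch, not Rosetta's loop mover: `energy` and `perturb` are hypothetical caller-supplied stand-ins for a scoring function and a loop move (such as a KIC or fragment move), temperatures are in the same units as the energy, and a geometric schedule is assumed for the exponential decrease from 1.5 to 0.5.

```python
import math
import random


def inner_cycles(n_loop_residues):
    """Default inner-cycle count described above: min(1000, 20 * loop length)."""
    return min(1000, 20 * n_loop_residues)


def temperature_schedule(n_steps, t_start=1.5, t_end=0.5):
    """Exponential (geometric) decrease from t_start to t_end over n_steps."""
    if n_steps == 1:
        return [t_start]
    ratio = (t_end / t_start) ** (1.0 / (n_steps - 1))
    return [t_start * ratio ** i for i in range(n_steps)]


def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)


def anneal(x0, energy, perturb, n_loop_residues, outer=5, seed=0):
    """Generic Monte Carlo simulated annealing with outer and inner cycles.

    energy(x) -> float and perturb(x, rng) -> new state are hypothetical
    stand-ins for a scoring function and a loop move.
    """
    rng = random.Random(seed)
    n_inner = inner_cycles(n_loop_residues)
    temps = temperature_schedule(outer * n_inner)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    step = 0
    for _ in range(outer):
        for _ in range(n_inner):
            trial = perturb(x, rng)
            e_trial = energy(trial)
            if metropolis_accept(e_trial - e, temps[step], rng):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x, e
            step += 1
        # End of outer cycle: reset to the lowest-energy state seen so far.
        x, e = best_x, best_e
    return best_x, best_e
```

For a 12-residue loop, this gives 5 * 240 = 1200 Monte Carlo steps; for loops of 50 residues or more, the inner-cycle count is capped at 1000.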

The following MOE loop modeling options were used during the de novo search stage, full-atom model generation, and final model scoring. The loop sequence for each GPCR ECL2 was selected in the sequence editor window, and the following options were set in the main MOE Loop Modeler window: only the de novo search method was enabled, the default RMSD limit was decreased from 0.50 Å to 0.25 Å, and the max iterations and energy window were set to 1000 and 10, respectively. The number of de novo search runs and final models built were set to 1000 total. SVL batch files were then created to run the loop modeler jobs on a high-performance computing cluster. Because of storage limitations of the molecular database (.mdb) files, the models were generated as 10 sets of 100 final models or 20 sets of 50 final models.

Filtering and optimization

The majority of GPCR share a conserved disulfide bond between Cys45.50 in ECL2 and Cys3.25 at the top of the third transmembrane domain. In the 28-receptor benchmark set used in this study, 25 of the receptors share this structural constraint. We assessed the utility of using the S–S distance as a filter for improving ECL2 models. Distance-based filtering used a cutoff of 5.1 Å. This distance is approximately double the van der Waals contact distance for sulfur and was selected to avoid selecting loop models in which other atoms lie between the two sulfur atoms expected to form a disulfide bond. Disulfide bonds were then constructed in the top 10 scored models meeting this distance cutoff, followed by geometry optimization of the loop using the MOE software.
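The filtering step amounts to computing one sulfur-sulfur distance per model, discarding models beyond the cutoff, and keeping the best-scored survivors. The sketch below assumes a hypothetical `LoopModel` record holding a score (lower is better, as for an energy) and the SG atom coordinates of Cys45.50 and Cys3.25; the subsequent disulfide construction and MOE geometry optimization are not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class LoopModel:
    """Hypothetical per-model record: score plus SG coordinates of the two cysteines."""
    name: str
    score: float                            # lower is better (energy-like score)
    sg_ecl2: tuple[float, float, float]     # Cys45.50 SG
    sg_tm3: tuple[float, float, float]      # Cys3.25 SG


def ss_distance(model):
    """Distance (Å) between the two cysteine sulfur atoms."""
    return math.dist(model.sg_ecl2, model.sg_tm3)


def disulfide_filter(models, cutoff=5.1, keep=10):
    """Keep models whose S-S distance is within the cutoff, then the best-scored `keep`."""
    passing = [m for m in models if ss_distance(m) <= cutoff]
    return sorted(passing, key=lambda m: m.score)[:keep]
```

Applying the distance filter before ranking means a well-scored model whose cysteines are too far apart is never passed on to disulfide construction.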

Results and discussion

Loop modeling performance throughout this benchmark study was assessed by comparing de novo models of ECL2 for the proteins in Table 3 to the crystallographic reference structures indicated in the same table. Before Cα RMSD values were calculated, each model was superposed on its reference structure using the residues that were not modeled de novo. The metrics reported throughout this section include the lowest RMSD model (LRM), the top scored model (T1), the lowest RMSD model in the top 10 scored (T10), and the lowest RMSD model in the top 25 scored (T25).
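Given one score and one RMSD per model, the four metrics follow from a single sort by score. The sketch below assumes lower scores rank better, as for energy-based scores, and that the RMSDs were already computed after framework superposition; the function name is illustrative.

```python
import numpy as np


def loop_metrics(scores, rmsds):
    """LRM, T1, T10 and T25 as defined above; lower score = better-ranked model."""
    scores = np.asarray(scores, dtype=float)
    rmsds = np.asarray(rmsds, dtype=float)
    order = np.argsort(scores, kind="stable")   # best-scored model first
    ranked = rmsds[order]
    return {
        "LRM": float(rmsds.min()),              # best model sampled, ignoring score
        "T1": float(ranked[0]),                 # RMSD of the top scored model
        "T10": float(ranked[:10].min()),        # best RMSD among the top 10 scored
        "T25": float(ranked[:25].min()),        # best RMSD among the top 25 scored
    }
```

By construction LRM <= T25 <= T10 <= T1, so the gap between LRM and T1 separates sampling failures from scoring failures.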

Rosetta scoring function comparison

To decide which scoring function to use with Rosetta loop modeling, the NGK loop modeling algorithm was used to sample ECL2 conformations for the first four targets in Group 1 (Table 3) with both the Rosetta Energy Function 2015 (REF2015) and its predecessor, Talaris2014. The LRM results show that models meeting the 2.5 Å near-atomic accuracy threshold were sampled in each run (Fig. 5). However, when modeling loops of unknown structure, the T1 is more relevant than the LRM. In 3 of 4 cases, REF2015 improved prediction accuracy (lower RMSD to the reference structure) for the top scored models. These data suggest that the REF2015 scoring function is more suitable for identifying models close to the reference ECL2 structures in the benchmark. Furthermore, the most recent energy function has been parametrized to estimate energies in units of kilocalories per mole, whereas all previous Rosetta energy functions used arbitrary units [137]. Therefore, REF2015 was used in the Rosetta loop modeling protocols for all benchmark targets and method comparisons.

Fig. 5

Energy function comparison with Rosetta NGK loop modeling. The influence of the energy function on loop modeling performance is shown for the shortest GPCR ECL2 targets in the benchmark. The lowest RMSD models (LRM), top scored models (T1), lowest RMSD models in the top 10 scored (T10), and lowest RMSD models in the top 25 scored (T25) are shown from 1000 models generated for each ECL2 target using NGK loop modeling. The number in parentheses represents the ECL2 length

Group 1 loop modeling results

Loop modeling was performed using three algorithms in Rosetta and one algorithm in MOE for the Group 1 ECL2 targets from Table 3. The results from Group 1 in Fig. 6a show that the NGK and KICF methods were able to sample models with better accuracy than either CCD or MOE based on the RMSD of the LRM to the reference ECL2 structures. Notably, ECL2 of the muscarinic acetylcholine receptor M4 (ACM4, Fig. 7) and dopamine receptor D3 (DRD3) targets were modeled with sub-angstrom accuracy using both the NGK (LRM = 0.34, 0.71 Å, respectively) and KICF (LRM = 0.35, 0.76 Å, respectively) algorithms. For the ECL2 of CXCR4 (Fig. 8), loop modeling using KICF also displayed sub-angstrom accuracy for the LRM (0.50 Å). However, none of the targets were modeled with sub-angstrom accuracy using the CCD algorithm or MOE. Six out of the seven ECL2 targets in Group 1 were modeled with near-atomic accuracy (RMSD ≤ 2.5 Å) using the NGK and CCD algorithms. Additionally, all seven of the targets were modeled with near-atomic accuracy using the KICF algorithm. From the Group 1 targets, there were only two cases where the LRM from MOE had RMSD values below 2.5 Å to the reference structure. The average LRM values for the NGK, KICF, and CCD algorithms were 1.66, 1.09, and 2.07 Å, respectively. In comparison, the average LRM using the MOE loop modeling algorithm was much higher, 3.37 Å.

Fig. 6

Group 1 loop modeling benchmark set. a The RMSD values of the LRM and T1 are shown out of 1000 models generated for Group 1 ECL2 targets using NGK, KICF, CCD, and MOE loop modeling algorithms. In total, the LRM had sub-angstrom accuracy in two and three cases when using the NGK and KICF algorithms, respectively. Additionally, the LRM had near-atomic accuracy in seven cases when using the KICF algorithm and in six cases when using the NGK or CCD algorithms. b The RMSD values of the T1 are compared to the T10 and T25 using NGK, KICF, CCD, and MOE

Fig. 7

ECL2 models of ACM4 superposed with reference structure. a The LRM (orange) and T1 (magenta) out of 1000 total models generated using NGK loop modeling had RMSD values of 0.34 Å and 0.40 Å to the reference structure (green). b The LRM and T1 out of 1000 total models generated using KICF loop modeling had RMSD values of 0.35 Å and 0.62 Å to the reference structure

Fig. 8

ECL2 models of CXCR4 superposed with reference structure. a The LRM (orange) and T1 (magenta) out of 1000 total models generated using NGK loop modeling had RMSD values of 3.08 Å and 7.02 Å to the reference structure (green). b The LRM and T1 out of 1000 total models generated using KICF loop modeling had RMSD values of 0.50 Å and 4.19 Å to the reference structure

While generating ECL2 models with sub-angstrom or near-atomic accuracy is desirable, sampling loop conformations is only one aspect of structure prediction when the target structure is unknown. Loop models with low RMSD values to the target must also be scored or ranked favorably so that they can be distinguished from the rest of the generated models. To evaluate the scoring component of the loop modeling protocols, the lowest RMSD to the reference ECL2 structure within the top 1, 10, and 25 scored models (T1, T10, T25) is shown for Group 1 in Fig. 6b. Overall, the MOE loop modeling algorithm had a substantially higher average T1 RMSD (7.96 Å) than the algorithms used within Rosetta. The NGK, KICF, and CCD algorithms had average T1 values of 3.61, 3.40, and 4.56 Å, respectively. The T1 produced by the CCD or MOE loop modeling algorithms had RMSD values outside the near-atomic accuracy threshold for all seven Group 1 ECL2 targets. The NGK and KICF algorithms each produced a T1 below the near-atomic accuracy threshold for two of the targets. For a majority of the Group 1 targets, reliably studying receptor-ligand interactions through docking experiments would therefore require selecting higher-quality loop models than the T1. We thus sought to establish selection guidelines for the number of final loop models to retain from the top scored models. Specifically, we assessed the accuracy of final models within subsets of the top 10 or 25 scored models produced. For every ECL2 target in Group 1, the RMSD values for T10 and T25 are lower than the RMSD value for the T1, regardless of sampling methodology. However, expanding the number of retained ECL2 models from the top 10 to the top 25 scored did not consistently yield substantially lower RMSD models.

In Group 1, there were only 3 of 28 total cases (7 targets × 4 methods) in which the T10 had an RMSD value above the 2.5 Å threshold but the corresponding T25 had an RMSD below it. Models with near-atomic accuracy that were not scored within the top 10 were scored within the top 25 for the targets CRFR1 (using KICF), ACM4 (using CCD), and US28 (using CCD). However, for most combinations of sampling algorithm and target, there was no substantial advantage to retaining the top 25 scored models rather than the top 10 out of 1000 models total.

Group 2 loop modeling results

All Group 2 ECL2 targets from Table 3 were modeled using the three loop modeling algorithms in Rosetta. Loop modeling results for Group 2 targets (Fig. 9) show that the NGK and KICF methods were able to sample loop conformations of the cannabinoid receptor type 1 (CB1) ECL2 with sub-angstrom accuracy overall (NGK/KICF LRM = 0.85/0.82 Å, Fig. 10). Loop modeling with the KICF method also achieved sub-angstrom accuracy with the longest loop in Group 2, CCR5 (LRM = 0.75 Å). In terms of models with near-atomic accuracy (including models with sub-angstrom accuracy), there were three and four cases where the NGK and KICF algorithms sampled loop conformations with RMSD values ≤ 2.5 Å, respectively. There were three cases where the CCD algorithm sampled loop conformations with near-atomic accuracy, but no models with sub-angstrom accuracy were generated.

Fig. 9

Group 2 loop modeling benchmark set. a The RMSD values of the LRM and T1 are shown out of 1000 models generated for Group 2 ECL2 targets using NGK, KICF, and CCD algorithms. In total, the LRM had sub-angstrom accuracy in one and two cases when using the NGK and KICF algorithms, respectively. Additionally, the LRM had near-atomic accuracy in four cases when using the KICF algorithm and in three cases when using the NGK or CCD algorithms. b The RMSD values of the T1 are compared to the T10 and T25 using NGK, KICF, and CCD. The upper and lower dotted lines represent the near-atomic and sub-angstrom accuracy thresholds, respectively

Fig. 10

ECL2 models of CB1 superposed with reference structure. a The LRM (orange) and T1 (magenta) out of 1000 models total using NGK loop modeling had RMSD values of 0.85 Å and 4.32 Å to the reference structure (green). b The LRM and T1 when KICF loop modeling was used had RMSD values of 0.82 Å and 1.93 Å to the reference structure

The T1 for a majority of Group 2 targets had a substantially larger RMSD value than the LRM, consistent with the results from Group 1. However, for the CB1 ECL2 target, the T1 from KICF displayed near-atomic accuracy, with an RMSD of 1.93 Å to the reference structure, and the LRM was scored within the top 10 models (Fig. 10b). In contrast, the T1 from NGK had an RMSD of 4.32 Å to the reference structure, and the LRM was not scored within the top 10 or 25 models. The lowest RMSD model within the top 10 scored models from NGK displayed near-atomic accuracy, with an RMSD of 1.39 Å to the reference structure. For the CCR5 ECL2 target, both methods that use fragment assembly (KICF and CCD) outperformed the NGK method in all four metrics. Notably, the LRM and T10 values from the KICF method for CCR5 were 0.75 and 0.81 Å, respectively.

In Group 2, the smoothened receptor (SMO) ECL2 was the most troublesome target for all three Rosetta loop modeling algorithms. KICF yielded the most accurate LRM, with an RMSD of 1.58 Å to the reference structure, whereas the NGK and CCD LRMs had RMSD values of 6.22 and 5.84 Å, respectively. However, the top-scored KICF model had an RMSD of ~20 Å. Although ECL2 sits just above the center of the TM bundle in the reference structure, SMO (a Class F GPCR) differs from the other benchmark structures in several ways. In particular, SMO has a longer ECL1 than other GPCR of known structure (mostly Class A), and this ECL1, together with an extracellular domain (ECD) linker region, essentially forms a lid over ECL2 and the center of the TM bundle (Fig. 11). The long ECL1 and ECD linker might sterically hinder the search for ECL2 conformations close to the reference structure; in other words, loop models that place ECL2 away from the TM bundle center, ECL1, and the ECD linker may score better. Because the SMO reference structure contained a co-crystallized ligand that contacts the ECLs (ECL2 contacts are shown in Table 1), it is also plausible that the reference ECL2 conformation is less energetically favorable in the absence of the ligand. To test the steric explanation, a second set of ECL2 models (n = 1000) was generated for SMO using NGK after deleting the N-terminal ECD linker prior to loop modeling. In this second set, the T1 and LRM were the same model, with an RMSD of 3.30 Å to the reference ECL2. This is a substantial improvement over the T1 and LRM from the initial NGK set (17.0 Å and 6.22 Å, respectively) and indicates that steric hindrance from the ECD linker is one impediment to sampling loop conformations similar to the reference SMO structure.

Fig. 11

ECL2 models of SMO superposed with reference structure. a Side view of the SMO crystal structure (PDB: 4JKV) highlighting the native ECL2 (green) buried underneath ECL1 (gray) and the ECD linker (salmon). b Top view of the extracellular side of the SMO crystal structure. c The LRM (orange), T1 (magenta), and T10 (cyan) using the NGK algorithm had RMSD values of 6.22 Å, 17.0 Å, and 12.7 Å to the reference structure, respectively. d The LRM, T1, and T10 using the KICF algorithm had RMSD values of 1.58 Å, 19.9 Å, and 6.89 Å to the reference structure, respectively. The ECD linker and ECL1 regions are hidden in panels c and d so that ECL2 and the models are clearly visible

Group 3 loop modeling results

Group 3 targets from Table 3 were modeled using the Rosetta loop modeling algorithms. The results from modeling the ECL2 of Group 3 targets (Fig. 12) show that KICF was the only algorithm capable of sampling ECL2 models with sub-angstrom accuracy relative to the reference structures (2 of 7 targets). The LRM generated by KICF for the P2Y1R (Fig. 13b) and AT2R ECL2 targets had sub-angstrom accuracy with RMSD values of 0.63 and 0.54 Å, respectively. The LRM of P2Y1R had near-atomic accuracy (RMSD ≤ 2.5 Å) to the reference structure for all three loop modeling algorithms, suggesting it is a relatively ‘easy’ case.

Fig. 12

Group 3 loop modeling benchmark set. a The RMSD values of the LRM and T1 are shown out of 1000 models generated for Group 3 ECL2 targets using the NGK, KICF, and CCD algorithms. In total, the LRM had sub-angstrom accuracy in two cases when using the KICF algorithm. Additionally, the LRM had near-atomic accuracy in three cases when using the KICF algorithm and in one case each when using the NGK or CCD algorithms. b The RMSD values of the T1 are compared to the T10 and T25 using NGK, KICF, and CCD. The upper and lower dotted lines represent the near-atomic and sub-angstrom accuracy thresholds, respectively

Fig. 13

ECL2 models of P2Y1R superposed with reference structure. a The LRM (orange), T1 (magenta), and T10 (cyan) out of 1000 models generated using NGK loop modeling had RMSD values of 2.44 Å, 6.53 Å, and 2.55 Å to the reference structure (green), respectively. b In this case, the LRM was also the T1 (magenta) when KICF loop modeling was used (LRM = T1 = 0.63 Å RMSD to reference structure)

For the P2Y1R ECL2 target, the LRM produced by KICF was also the T1. This was not the case for the AT2R ECL2 target, but the top-scored KICF model still had sub-angstrom accuracy (RMSD = 0.81 Å) relative to the reference structure. By contrast, KICF and NGK both produced top-scored models with high RMSD values (~18 Å for both methods) relative to the ECL2 reference structure of β2AR (Fig. 14). Among the top 25 scored β2AR ECL2 models, no method generated loop models with RMSD values below 4 Å. Because this target had one of the longer loops in the benchmark, more sampling may have been necessary to produce models closer to the reference structure. To determine whether increased sampling would substantially improve the models, a second set of 4000 ECL2 models was generated for β2AR using NGK. In this larger set, the LRM had an RMSD of 2.93 Å to the reference structure. Although this is only a slight improvement over the LRM from the initial set of 1000 models (3.84 Å), the T1 from the 4000-model set had an RMSD of 5.46 Å, substantially lower than the T1 from the initial set (18.1 Å). In the 4000-model set, the T10 and T25 both had an RMSD of 3.67 Å to the reference structure, also lower than the T10 and T25 from the initial set (6.81 and 5.83 Å, respectively).

Fig. 14

ECL2 models of β2AR superposed with reference structure. a The LRM (orange), T1 (magenta), and T10 (cyan) out of 1000 models generated using NGK loop modeling had RMSD values of 3.84 Å, 18.1 Å, and 6.81 Å to the reference structure (green), respectively. b The LRM, T1, and T10 generated using KICF loop modeling had RMSD values of 2.60 Å, 18.6 Å, and 5.04 Å to the reference structure, respectively

Group 4 loop modeling results

Group 4 targets from Table 3 were also modeled using the Rosetta loop modeling algorithms (Fig. 15). The average LRM values obtained with the NGK, KICF, and CCD algorithms were 4.55, 4.98, and 5.38 Å, respectively. None of the loop modeling methods generated models with sub-angstrom or near-atomic accuracy to the reference structures, which suggests that more extensive conformational sampling is needed. For these longer loops, the scoring function also had evident difficulty distinguishing accurate models, possibly compounded by inadequate sampling of conformational space. The average ECL2 RMSD values for the T1 using the NGK, KICF, and CCD algorithms were 13.18, 11.01, and 11.70 Å, respectively. The average T10 values for the same three algorithms were lower (7.65, 7.37, and 7.72 Å) but remained substantially above the 2.5 Å threshold for useful models.
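Why generating more models should help can be illustrated with a simple idealization: if each independently generated model has a fixed probability p of landing within 2.5 Å of the reference, the chance that at least one of n models does so is 1 − (1 − p)^n. Independence and a fixed, known p are assumptions (p is unknown for any real target), and the function names and the 1-in-1000 hit rate are hypothetical. The sketch also inverts the formula to estimate how many models are needed for a chosen success probability. It addresses sampling only; a near-native model that is sampled still has to be ranked highly by the scoring function.

```python
import math


def p_at_least_one_hit(p_hit: float, n_models: int) -> float:
    """Probability that at least one of n independent models is near-native."""
    if not 0.0 <= p_hit <= 1.0 or n_models < 0:
        raise ValueError("p_hit must be in [0, 1] and n_models non-negative")
    return 1.0 - (1.0 - p_hit) ** n_models


def models_needed(p_hit: float, target: float) -> int:
    """Smallest n with 1 - (1 - p_hit)**n >= target."""
    if not 0.0 < p_hit < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p_hit and target must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_hit))
    # Guard against floating-point round-off at the boundary.
    while p_at_least_one_hit(p_hit, n) < target:
        n += 1
    return n


# Hypothetical per-model hit rate of 1 in 1000.
for n in (1000, 4000):
    print(n, round(p_at_least_one_hit(0.001, n), 3))
print(models_needed(0.001, 0.95))
```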

Fig. 15

Group 4 loop modeling benchmark set. a The RMSD values of the LRM and T1 are shown out of 1000 models generated for Group 4 ECL2 targets using the NGK, KICF, and CCD algorithms. The T10 and T25 are shown out of 1000 models generated for Group 4 ECL2 targets using NGK, KICF, and CCD. b The RMSD values of the T1 are compared to the T10 and T25 using NGK, KICF, and CCD. The upper and lower dotted lines represent the near-atomic and sub-angstrom accuracy thresholds, respectively

Filtering and optimization results

Many GPCR structures characterized to date share a conserved disulfide bond between Cys3.25 at the top of TM3 and Cys45.50 in ECL2. The KICF models for 25 of the 28 receptors studied here were therefore filtered by the distance between the sulfur atoms of these two residues, to test whether such filtering selects a better set of top 10 models. In addition, forming the disulfide bond and then optimizing the ECL2 geometry was examined as a potential loop optimization strategy. Table 4 compares T10 results without filtering or optimization (T10), after filtering (T10SSdist), and after filtering, disulfide formation, and optimization (T10SSbond) across all four groups of loop modeling targets.
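The S–S distance filter amounts to discarding models whose Cys3.25 and Cys45.50 sulfur (SG) atoms are too far apart and then re-ranking the surviving models by score. The sketch below uses the 5.1 Å cutoff quoted in the Fig. 16 and 17 captions and assumes lower scores are better. The data class, its fields, and the toy models are hypothetical, and the sketch covers only T10SSdist; forming the disulfide bond and re-optimizing ECL2 (T10SSbond) would require a modeling package such as Rosetta.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]
SS_CUTOFF = 5.1  # Å, cutoff quoted in the Fig. 16/17 captions


@dataclass
class LoopModel:  # hypothetical container for one ECL2 model
    score: float    # total score, lower is better
    rmsd: float     # ECL2 RMSD to the reference structure (Å)
    sg_325: Coord   # SG atom of Cys3.25
    sg_4550: Coord  # SG atom of Cys45.50

    def ss_distance(self) -> float:
        return math.dist(self.sg_325, self.sg_4550)


def best_rmsd_in_top10(models: List[LoopModel],
                       cutoff: Optional[float] = None) -> Optional[float]:
    """T10 (cutoff=None) or T10SSdist (cutoff in Å); None if no model survives."""
    pool = [m for m in models if cutoff is None or m.ss_distance() <= cutoff]
    top10 = sorted(pool, key=lambda m: m.score)[:10]
    return min((m.rmsd for m in top10), default=None)


# Toy set: ten well-scored models with ECL2 pulled away from TM3 (S-S 12 Å),
# plus two poorer-scoring models that keep the cysteines close (S-S 4 Å).
origin = (0.0, 0.0, 0.0)
models = [LoopModel(-300.0 + i, 8.0, origin, (12.0, 0.0, 0.0)) for i in range(10)]
models += [LoopModel(-100.0, 1.5, origin, (4.0, 0.0, 0.0)),
           LoopModel(-99.0, 3.0, origin, (4.0, 0.0, 0.0))]
print(best_rmsd_in_top10(models), best_rmsd_in_top10(models, SS_CUTOFF))
```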

Table 4 Cys3.25–Cys45.50 distance and disulfide bond formation as filtering/optimization strategies within the KICF results for all four target groups

The results in Table 4 show that using the Cys3.25–Cys45.50 S–S distance as a filter produced modest improvements, for most receptors, in the lowest RMSD found within the top 10 scored models. The average improvement for the Group 1, 2, 3, and 4 targets was 0.33, 1.71, 0.55, and 1.27 Å, respectively. Forming the disulfide bond and then optimizing the ECL2 geometry gave slightly larger improvements of 0.37, 2.11, 0.86, and 1.41 Å across the same groups. A few targets showed larger changes: for PAR2, T10 dropped from 15.49 to 4.68 Å upon filtering, whereas for AA2AR, T10 changed in the opposite direction, from 7.05 to 9.88 Å. Figure 16 compares the PAR2 models selected solely by score (panel a) and those selected by score after S–S distance filtering (panel c) with the reference crystal structure (panel b). The ten top-scored loop models consistently show ECL2 packing against the transmembrane domain and intruding into the region occupied by the lipid bilayer, which was not represented during loop modeling. The S–S distance filter eliminated many of the models in which ECL2 occupies space that should be reserved for the bilayer, producing the > 10 Å improvement in the lowest RMSD found within the top 10 scored structures (Table 4). The models in Fig. 16a likely scored well because of burial of hydrophobic sidechains, an important driver of folding for the soluble proteins used to parameterize typical energy functions. Alternative filters (such as a minimum allowed distance between ECL2 atoms and the membrane-embedded sidechains of the TM segments) or hybrid scoring functions that appropriately treat the membrane-embedded region of transmembrane proteins could also filter out unwanted structures like those in Fig. 16a.
In contrast to the substantial benefit of filtering for selecting better PAR2 models, filtering lowered the quality of the best model among the top 10 scored structures for AA2AR. As shown in Fig. 17, most of the top 10 AA2AR ECL2 models did not place ECL2 in the region occupied by the lipid bilayer. In this case, and for others in the Group 3 target set, ECL2 models with the desired accuracy (< 2.5 Å RMSD) were not present among the 1000 models initially sampled, and more sampling would be required to obtain improved models.
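One of the alternative filters suggested for PAR2, a minimum allowed distance between ECL2 atoms and the membrane-embedded sidechains of the TM segments, could be sketched as a brute-force distance check. Which atoms count as membrane-embedded (for example, taken from a membrane-orientation annotation of the receptor), the 4.0 Å threshold, and the function names are all assumptions; the quadratic loop is adequate for an ECL2 of a few dozen residues.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def min_distance(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Smallest pairwise distance between two atom sets (inf if either is empty)."""
    return min((math.dist(p, q) for p in a for q in b), default=math.inf)


def passes_membrane_filter(ecl2_atoms: Sequence[Coord],
                           membrane_sidechain_atoms: Sequence[Coord],
                           min_allowed: float = 4.0) -> bool:
    """Reject models in which any ECL2 atom comes closer than min_allowed (Å)
    to a sidechain atom assumed to lie inside the bilayer."""
    return min_distance(ecl2_atoms, membrane_sidechain_atoms) >= min_allowed


# Toy coordinates: membrane-embedded sidechain atoms near z = 0,
# one loop that stays above the bundle and one that dips into the bilayer.
membrane = [(5.0, 0.0, 0.0), (-5.0, 0.0, 0.0)]
loop_above = [(5.0, 0.0, 20.0), (0.0, 0.0, 18.0)]
loop_buried = [(5.0, 0.0, 20.0), (4.0, 0.0, 1.0)]
print(passes_membrane_filter(loop_above, membrane),
      passes_membrane_filter(loop_buried, membrane))
```

Unlike the S–S distance filter, a check of this kind does not depend on the conserved disulfide, so in principle it could also be applied to receptors that lack Cys3.25–Cys45.50.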

Fig. 16

PAR2 ECL2 models (KICF) and reference structure. a Top 10 scored PAR2 ECL2 models produced within 1000 structures generated using KICF in Rosetta. b PAR2 crystallographic reference structure. c Top 10 scored PAR2 ECL2 models after filtering 1000 structures generated using KICF in Rosetta for Cys3.25–Cys45.50 S–S distances ≤ 5.1 Å

Fig. 17

AA2AR ECL2 models (KICF) and reference structure. a Top 10 scored AA2AR ECL2 models produced within 1000 structures generated using KICF in Rosetta. b AA2AR crystallographic reference structure. c Top 10 scored AA2AR ECL2 models after filtering 1000 structures generated using KICF in Rosetta for Cys3.25–Cys45.50 S–S distances ≤ 5.1 Å

Comparison to other GPCR structure prediction benchmarks

Advances in GPCR structure prediction have been assessed in the community-wide GPCR DOCK experiments of 2008, 2010, and 2013 [30, 31, 141]. In these experiments, researchers were tasked with modeling a target GPCR with a bound ligand prior to publication of the target crystal structure. In GPCR DOCK 2010, two of the targets were the crystal structures of the DRD3/eticlopride (PDB: 3PBL) and CXCR4/IT1t (PDB: 3ODU) receptor–ligand complexes [31]. No model submitted for either target had an ECL2 backbone RMSD within 2.5 Å of the crystal structure. The best DRD3 models had an ECL2 RMSD of 2.69 Å, whereas CXCR4 was a more difficult target, with the two best models having ECL2 RMSD values of 4.32 and 6.61 Å. Across all submitted models of DRD3 and CXCR4, the median ECL2 RMSD values were 4.11 and 9.19 Å, respectively. The RMSD values in the present benchmark show a marked improvement in sampling the ECL2 of DRD3 and CXCR4; more remarkably, the KICF LRMs for CXCR4 and DRD3 (RMSD = 0.52 and 0.86 Å, respectively) were ranked within the top 10 scored models after Cys3.25–Cys45.50 S–S distance filtering. However, starting loop modeling from a GPCR crystal structure carries a significant advantage over starting from a homology model. A template-based GPCR model (without additional refinement) generally reproduces the template backbone for the aligned segments. Because GPCR structures tend to diverge at their extracellular ends, homology models built on a template may not reflect real structural differences between the target and the template. Arora et al. showed that variations in loop anchor positions can strongly influence modeling accuracy for GPCR loops [142].
Therefore, the results obtained from this benchmark study represent a best-case scenario for modeling GPCR loops where the starting structure contained no errors in the anchor positions or in the structures of the geometrically proximal ECL1 or ECL3.

Conclusion

Overall, the results from modeling the Group 1 and 2 ECL2 targets showed that KICF sampled the most loop conformations with sub-angstrom and near-atomic accuracy to the reference structure. NGK followed closely behind KICF in building loop models with sub-angstrom accuracy, and both algorithms produced near-atomic models for the same number of targets. Although CCD did not build any loop models with sub-angstrom accuracy for the Group 1 and 2 targets, it produced models with near-atomic accuracy for 9 of the 14 targets. For the Group 3 ECL2 targets, only KICF sampled loop models with sub-angstrom accuracy. Overall, these data suggest that the KIC with fragments (KICF) and next-generation KIC (NGK) methods in Rosetta perform better than the cyclic coordinate descent (CCD) method or the de novo search method in the MOE loop modeler for loops of up to 21 residues (Groups 1 and 2). For the longer loops in Group 3 (20–24 residues), KICF outperforms all the other methods. Across all 28 GPCR loops modeled, KICF generated the most models under 2.5 Å RMSD out of the 1000 produced per target. For loop targets analogous to those in Group 4 (25–32 residues), where no models were within 2.5 Å RMSD regardless of loop modeling method, it is recommended that a larger number of models (i.e., > 4000) be produced to increase sampling of ECL2 conformational space. Applying loop modeling to predict unknown structures also requires selecting models from the sampled set. Regardless of loop length, the T1 RMSD was much higher than the LRM in most cases, as was also observed in another benchmark study targeting 13 GPCR ECL2 using the CABS modeling software [143]. Scores alone were less effective than scores combined with a filter based on the conserved disulfide bond between Cys3.25 in TM3 and Cys45.50 in ECL2.
Notably, selecting ten structures substantially improved the best model quality within the selected set compared with selecting only one structure, but including additional structures, up to 25, did not provide significant further gains. Overall, 11 of the 21 targets in Groups 1–3 had at least one model with an RMSD ≤ 2.5 Å among the top 10 scored models that passed the S–S distance filter. Four targets in Groups 1–3 sampled a model with an RMSD ≤ 2.5 Å that did not score in the top 10 after S–S distance filtering. In these cases, adjustments to the scoring function, refinement methods, or loop structure environment would be needed to improve loop structure prediction.