Introduction

The increasing application of fragment-based methods in drug discovery is motivated by the desire to develop smaller, ligand-efficient [1] therapeutics with more desirable pharmacological properties, such as bioavailability. To achieve potency comparable to that of larger molecules, however, fragment-based strategies must exploit hot spots, i.e., sub-sites of binding regions that contribute significantly to the free energy of ligand binding [2]. In the absence of either biophysical or structure–activity relationship (SAR) data, discernment of hot spots within a binding region is often not possible. Here we present the application of two fragment-based strategies, the multiple solvent crystal structures (MSCS) method and its computational alternative FTMap, that rely solely on three-dimensional structure for the identification of hot spots. In this study both techniques are applied to DJ-1 and glucocerebrosidase (GCase), therapeutic targets for Parkinson’s and Gaucher’s diseases, respectively.

Parkinson’s disease (PD, [3]) is a neurodegenerative disorder that results in the impairment of motor skills and cognitive abilities [4], eventually leading to death. Although most cases of PD are sporadic, multiple genes are implicated in the development of PD [5]. Among these is DJ-1, a recent target to emerge for the development of anti-PD agents [6]. DJ-1 consists of 189 amino acids and its function is currently unknown; however, several point mutations are associated with PD [7]. The best-characterized PD-associated mutation of DJ-1 is L166P, located on a C-terminal helix in the homodimer interface [8]. This mutant loses its ability to dimerize [9], resulting in structural instability and rapid degradation [10]. Hence, stabilization of DJ-1 may serve to prevent PD-associated degradation.

Gaucher disease (GD) is the most prevalent of the lysosomal storage disorders, i.e., those caused by deficient lysosomal glycolipid hydrolase activity [11]. GD affects one in 60,000 individuals in the general population and one in 800 in the Ashkenazi Jewish population [12]. Loss of activity of acid-β-glucosidase (GCase), the enzyme that hydrolyzes glucosylceramide in the lysosome, causes GD; current treatments include GCase enzyme replacement therapy (ERT, Cerezyme) or substrate reduction therapy (SRT, Zavesca). Missense mutations in GCase not localized to its active site exhibit defects in cellular trafficking [13], but do not abolish enzymatic activity of GCase [14]. Pharmacological chaperone (PC) therapy [15] aims to rescue partially active GCase variants, such as the N370S mutant, for lysosomal trafficking and activity. Although somewhat counterintuitive, enzyme inhibitors can increase steady-state cellular levels of active enzymes. To date, high affinity competitive inhibitors of GCase that bind the active site yet do not inhibit activity have been tested as PCs for GD [16–21]. One such pharmacological chaperone is isofagomine (IFG), a nanomolar inhibitor that binds the active site of GCase [22] and increases cellular trafficking of the N370S mutant to the lysosome [16, 22]. However, once in the lysosome, a stabilizing molecule that binds to an allosteric site on the mutant enzyme would be preferable in order to prevent competition with the substrate.

Here we aim to identify allosteric binding regions to serve as starting points for the development of pharmacological chaperones for DJ-1 and GCase. Two fragment-based methodologies validated previously for the detection of hot spot regions on protein surfaces were utilized in this study. Multiple solvent crystal structures (MSCS, [23–25]) were determined to identify consensus solvent binding regions on the surface of DJ-1 and GCase. In MSCS experiments a crystalline protein is exposed to a series of organic solvents. Previous studies utilizing MSCS have shown that consensus sites, i.e., regions of the protein surface where multiple solvent molecules co-localize, are indicative of ligand binding regions [24]. In conjunction with the MSCS experiments, FTMap [26], a computational fragment-based method for the detection of hot spots, was employed. Similar in spirit to MSCS, FTMap, an improved version of the computational solvent mapping (CSMap, [27–29]) algorithm, identifies regions of consensus binding of fragment-sized molecules [26]. Although similar programs such as GRID [30] and MCSS [31] are commonly utilized for the detection of binding regions, studies performed using both CSMap and FTMap have shown that, in addition to identifying ligand binding regions with a lower false positive rate than other approaches, both methods accurately detect hot spots determined by biophysical methods such as calorimetry and NMR [26, 32]. Here we report multiple hot spots on the surfaces of DJ-1 and GCase suitable for ligand design. Furthermore, comparison of data resulting from the MSCS experiments to those derived using FTMap shows that the computational method accurately reproduces experimental findings and further elucidates the binding regions detected via MSCS, providing a faster and cheaper alternative to crystallographic experiments.

Methods

Multiple solvent crystal structures of DJ-1 and GCase

Purification and crystallization. Recombinant human DJ-1 and GCase were expressed and purified to homogeneity according to published protocols [8, 22].

Crystallization of DJ-1 with solvents was accomplished using hanging drop vapor diffusion. The drops consisted of 2 μL concentrated protein (30 mg/mL) plus 2 μL mother liquor. The mother liquor consisted of 30% polyethylene glycol (PEG) 400 for cryoprotection, 100 mM Tris–HCl pH 8.5, 200 mM sodium citrate and, individually, 20% (w/w) acetonitrile, 20% ethanol, 10% dimethyl formamide (DMF), 20% dimethyl sulfoxide (DMSO), 10% phenol, or 5% trifluoroethane (TFE). Crystals with dimensions of approximately 200 μm × 200 μm appeared overnight at 20 °C.

Crystals of GCase were grown as described previously [22] using the hanging drop vapor diffusion technique. Crystals were crosslinked by placing a micro-bridge (Hampton Research) with 3 μL of a 25% glutaraldehyde solution (Sigma) in the well below a crystal-containing drop and incubating for 30–45 min. For methanol soaking, crosslinked crystals were then transferred briefly to a solution containing 50% methanol, 5 mM acetate pH 4.5 and 15% PEG 400, and flash cooled in liquid N2. For soaking with phenol, crystals were briefly soaked in a solution containing 23 mM phenol in 30% glycerol and 10 mM acetate buffer pH 4.5, and flash cooled in liquid N2.

Data collection and structure determination. Diffraction data for DJ-1 were collected at the Stanford Synchrotron Radiation Laboratory (SSRL) beamline 11-3. GCase diffraction data were collected at the GM/CA-CAT beamline at the Advanced Photon Source (Darien, IL). Data sets for both proteins were processed with the software program HKL2000 [33]. Solvent-bound DJ-1 structures were solved by molecular replacement with the program MolRep [34] using apo DJ-1 ([8], PDB entry 1SOA) as a search model, and subsequent refinement was performed using Refmac [35], both of the CCP4 package [36]. GCase structures were solved by rigid body refinement in Refmac5 using a model derived from PDB entry 2NT0 [22] after removing all non-protein molecules. Model building for all DJ-1 and GCase structures was carried out in Coot [37]. Ligand parameters for DJ-1 were obtained from the Dundee PRODRG2 server (http://davapc1.bioch.dundee.ac.uk/prodrg). Water molecules were added to the model of GCase after several rounds of fitting atomic models into electron density maps using Coot and including phosphates and carbohydrates. For the GCase structures, each non-protein molecule was subsequently inspected for additional Fo–Fc difference electron density. Methanol or phenol molecules were then modeled, as appropriate. Data processing and refinement statistics for DJ-1 and GCase are recorded in Tables 1 and 2, respectively.

Table 1 Data refinement and statistics for DJ-1 MSCS experiments
Table 2 Data refinement and statistics for GCase MSCS experiments

FTMap

The FTMap algorithm samples billions of fragment positions on a dense translational and rotational grid. Positions are scored using an energy function that includes attractive and repulsive van der Waals terms, electrostatic interaction energy based on Poisson–Boltzmann calculations, a cavity term to represent the effect of nonpolar enclosures, and a structure-based pair-wise interaction potential. In spite of its relative complexity, the energy expression is written as a sum of correlation functions with components defined on grids. This enables the use of the extremely efficient fast Fourier transform (FFT) correlation method for function evaluation [38]. The FTMap algorithm consists of five steps as follows.

1. Rigid-body fragment docking. Sixteen fragments (Fig. 1) are used to probe the binding surface of a protein of interest. This fragment set includes solvent types commonly employed in MSCS experiments as well as common chemical groups; previous analyses have shown that this set is sufficient to delineate hot spots within binding pockets [32]. Only atomic coordinates of the protein of interest are utilized for mapping simulations, i.e., no information on the binding site is required. Prior to FTMap simulations, all bound ligands and water molecules are removed from crystal structures. For each fragment a special purpose rigid-body docking algorithm using the FFT correlation approach samples billions of conformations. The 2,000 best poses for each fragment are retained for further processing.
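The FFT correlation idea underlying Step 1 can be illustrated with a minimal one-dimensional sketch. It is not the FTMap implementation: the actual method works on three-dimensional grids, samples rotations as well as translations, and sums several correlation terms. The grid values and the function names (`fft`, `correlate_direct`, `correlate_fft`) are hypothetical and chosen only for illustration. The sketch shows that the score of a probe at every translation, computed directly, equals the inverse transform of the product of the receptor transform and the complex conjugate of the probe transform.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def correlate_direct(receptor, probe):
    """score[t] = sum_j receptor[(j + t) mod n] * probe[j], computed in O(n^2)."""
    n = len(receptor)
    return [sum(receptor[(j + t) % n] * probe[j] for j in range(n))
            for t in range(n)]

def correlate_fft(receptor, probe):
    """Same circular correlation via the correlation theorem, O(n log n)."""
    n = len(receptor)
    r_hat = fft([complex(v) for v in receptor])
    p_hat = fft([complex(v) for v in probe])
    product = [r * p.conjugate() for r, p in zip(r_hat, p_hat)]
    return [v.real / n for v in fft(product, inverse=True)]

# Hypothetical 1D "energy grid" of a receptor and a probe occupying two cells.
receptor_grid = [0.0, -1.0, -3.0, -2.0, 0.0, 2.0, 0.0, 0.0]
probe_grid = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

scores = correlate_fft(receptor_grid, probe_grid)
best_shift = min(range(len(scores)), key=lambda t: scores[t])
print("lowest-scoring translation:", best_shift)
```

Because every translation is scored in a single pass, the cost of evaluating an energy term written as a correlation grows as n log n rather than n², which is what makes exhaustive sampling of the kind described above tractable.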

Fig. 1 Fragment set used in FTMap simulations

2. Minimization and re-scoring. The free energy of each of the 2,000 complexes generated in Step 1 is minimized using the CHARMM potential with the Analytic Continuum Electrostatic (ACE, [39]) model representing the electrostatics and solvation terms as implemented in version 27 of CHARMM [40] using the parameter set from version 19 of the program. The ACE model includes a surface area dependent term to account for the solute–solvent van der Waals interactions. Minimizations are performed using an adopted basis Newton–Raphson method. During the minimization the protein atoms are held fixed while the atoms of fragments are free to move.

3. Clustering and ranking. Using a simple greedy algorithm, minimized fragment conformations generated from Step 2 are clustered to create consensus sites. First, for each fragment type, the lowest energy conformer is selected. Fragment conformers within 3 Å RMSD of this conformer are then joined to create the first cluster. An earlier analysis of binding distributions of fragments for several proteins revealed that fragments bind to the same subsite of a binding pocket with an average RMSD of 2 Å [41]. Here subsites are defined as regions that bind either an amino acid side chain (i.e., of a peptide substrate) or a comparably sized functional group of a ligand. Adjacent subsites are typically separated from each other by at least a 5 Å center-to-center distance; hence, in light of these findings, the value of 3 Å for the clustering radius is a conservative choice. After the first cluster is formed, the next lowest energy conformer is selected to start the second cluster. This step is repeated until all 2,000 poses are assigned to clusters. Clusters with fewer than six members are excluded from further consideration, thereby avoiding narrow energy minima with low entropy [42]. This cutoff has been determined empirically to result in a low false positive rate. Retained clusters are ranked on the basis of their Boltzmann averaged energies.
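The greedy clustering and ranking of Step 3 can be sketched as follows. This is an illustrative sketch rather than the FTMap code: poses are represented as hypothetical `(energy, coordinates)` tuples, the RMSD is computed without any superposition, and the Boltzmann average is assumed to take the form Σ E·exp(−E/kT) / Σ exp(−E/kT) with kT ≈ 0.593 kcal/mol (roughly 298 K), since the text does not give the exact expression.

```python
import math

def rmsd(a, b):
    """RMSD between two poses given as equal-length lists of (x, y, z) atoms."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def greedy_cluster(poses, radius=3.0, min_size=6):
    """Greedy clustering of (energy, coords) poses of one fragment type.

    The lowest-energy unassigned pose seeds a cluster, every unassigned pose
    within `radius` of the seed joins it, and the procedure repeats until all
    poses are assigned. Clusters smaller than `min_size` are discarded.
    """
    remaining = sorted(poses, key=lambda p: p[0])
    clusters = []
    while remaining:
        seed = remaining[0]
        members = [p for p in remaining if rmsd(seed[1], p[1]) <= radius]
        remaining = [p for p in remaining if rmsd(seed[1], p[1]) > radius]
        clusters.append(members)
    return [c for c in clusters if len(c) >= min_size]

def boltzmann_average(energies, kT=0.593):
    """Boltzmann-weighted mean energy (assumed form; kT in kcal/mol)."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)

def rank_clusters(clusters, kT=0.593):
    """Order clusters from lowest to highest Boltzmann-averaged energy."""
    return sorted(clusters,
                  key=lambda c: boltzmann_average([p[0] for p in c], kT))

# Hypothetical single-atom poses: two populated regions and two stray poses.
poses = [(-10.0 + 0.1 * i, [(0.1 * i, 0.0, 0.0)]) for i in range(6)]
poses += [(-8.0 + 0.1 * i, [(10.0 + 0.1 * i, 0.0, 0.0)]) for i in range(7)]
poses += [(-9.0, [(20.0, 0.0, 0.0)]), (-7.0, [(20.5, 0.0, 0.0)])]

for rank, cluster in enumerate(rank_clusters(greedy_cluster(poses)), start=1):
    print(rank, len(cluster))
```

In this toy input the two stray poses form a cluster of only two members, which the size cutoff removes; this mirrors the rationale given above for excluding narrow, sparsely populated minima.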

4. Determination of consensus sites. Similar to MSCS experiments, FTMap utilizes a consensus clustering approach to detect hot spots, i.e., regions of the protein surface where clusters of different fragment types overlap. The 96 clusters generated in the previous step (6 clusters for each of 16 fragment types) are used to form consensus sites. Using the distance between the centers of mass of the cluster centers as the distance measure, each fragment cluster is assigned to a consensus site. A clustering radius value of 4 Å is used since subsites are separated by greater than 5 Å (see Step 3). As in Step 3, FTMap again employs a simple greedy algorithm to find the cluster with the maximum number of neighbors (defined as cluster centers within 4 Å of each other), which forms the first consensus site. Members of this site are then removed from consideration, and the procedure is repeated until all clusters are exhausted. Conformers are then redistributed among the consensus sites such that each conformer is closest to the center of its own consensus site, whereupon consensus sites are ranked based on cluster population size.
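A minimal sketch of the consensus-site procedure in Step 4 is given below. It rests on several assumptions that are not stated in the text: each fragment cluster is reduced to a single center point, the center of a consensus site is taken to be the centroid of its members, and redistribution is done in a single pass. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def consensus_sites(cluster_centers, radius=4.0):
    """Group fragment-cluster centers into consensus sites.

    Greedy step: the center with the most neighbors within `radius` founds a
    site together with those neighbors; they are removed and the step repeats.
    Redistribution step (single pass, an assumption): every center is assigned
    to the site whose centroid is nearest. Sites are returned largest first.
    """
    remaining = list(cluster_centers)
    sites = []
    while remaining:
        best = max(remaining,
                   key=lambda c: sum(math.dist(c, o) <= radius
                                     for o in remaining))
        sites.append([o for o in remaining if math.dist(best, o) <= radius])
        remaining = [o for o in remaining if math.dist(best, o) > radius]

    site_centers = [centroid(s) for s in sites]
    reassigned = [[] for _ in sites]
    for c in cluster_centers:
        nearest = min(range(len(sites)),
                      key=lambda i: math.dist(c, site_centers[i]))
        reassigned[nearest].append(c)
    return sorted((s for s in reassigned if s), key=len, reverse=True)

# Hypothetical cluster centers (in Å): a dense group, a pair and a singleton.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0),
           (10.0, 0.0, 0.0), (11.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

for rank, site in enumerate(consensus_sites(centers), start=1):
    print("CS%d" % rank, len(site))
```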

5. Characterization of the binding site. FTMap first selects the largest consensus site (CS1), which generally identifies the most important subsite (or hot spot) for ligand binding. CS1 forms the kernel of the ligand-binding region. The binding site is then expanded by adding any consensus site (irrespective of its size) within 7 Å of any consensus site already in the binding site. This procedure continues until no further expansion is possible. The resulting set of consensus sites is used to describe the binding site.
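The expansion rule of Step 5 amounts to growing a connected set outward from CS1. The sketch below assumes that the 7 Å criterion is applied to center-to-center distances between consensus sites, which the text does not specify, and uses a hypothetical function name and made-up coordinates.

```python
import math

def expand_binding_site(site_centers, cutoff=7.0):
    """Return indices of consensus sites that make up the binding site.

    `site_centers` is ordered by rank, so index 0 is CS1. Any site whose
    center lies within `cutoff` of a site already included is added, and the
    scan repeats until a full pass adds nothing.
    """
    included = [0]
    changed = True
    while changed:
        changed = False
        for i, center in enumerate(site_centers):
            if i in included:
                continue
            if any(math.dist(center, site_centers[j]) <= cutoff
                   for j in included):
                included.append(i)
                changed = True
    return included

# Hypothetical ranked consensus-site centers (in Å).
ranked_centers = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0), (6.0, 0.0, 0.0),
                  (12.0, 0.0, 0.0), (40.0, 0.0, 0.0)]

print(sorted(expand_binding_site(ranked_centers)))
```

In this toy example the site at 12 Å is more than 7 Å from CS1 but is reached through the intermediate site at 6 Å, illustrating why the expansion must be iterated rather than applied once from the kernel.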

Results and discussion

Detection of hot spots for DJ-1

Experimental solvent mapping of DJ-1 revealed a single hot spot bound by all solvents on each monomer (Fig. 2a, circled). This hot spot emerges in the region containing Cys 106 and is shown in close-up in Fig. 2b. Published studies have shown that oxidation of Cys 106 results in the localization of DJ-1 to the mitochondria, where it has a demonstrated neuroprotective effect [43]. Independent of the MSCS experiments, FTMap simulations (see “Methods”) were conducted on a dimeric crystal structure of apo DJ-1 (PDB entry 1SOA, [8]). Several consensus regions for fragment binding are detected using FTMap (magenta, Fig. 3; Table 3). In excellent agreement with the MSCS data, the most highly populated consensus site resulting from FTMap coincides with the region surrounding Cys 106 (Fig. 3b). In addition to reproducing the experimental solvent mapping data, multiple consensus sites unique to the FTMap simulations are formed near the dimer interface (magenta sticks, Fig. 3a, c). These hot spots may provide a novel starting point for the development of stabilizing agents of DJ-1. In Table 3 the ranks of the consensus sites are given with their size, i.e., the number of fragment clusters comprising each site. Interacting residues are defined as those residues located within 5 Å of a fragment cluster and were calculated using PyMOL version 0.99 ([44], http://pymol.sourceforge.net).
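The 5 Å criterion for interacting residues can be expressed as a simple distance filter. The sketch below is a plain Python illustration and not the PyMOL procedure used in this study; the residue labels and coordinates are invented placeholders, and the function name is hypothetical.

```python
import math

def interacting_residues(protein_atoms, fragment_atoms, cutoff=5.0):
    """Residues with at least one atom within `cutoff` Å of any fragment atom.

    `protein_atoms` is a list of (residue_label, (x, y, z)) tuples and
    `fragment_atoms` a list of (x, y, z) tuples for one fragment cluster.
    """
    hits = set()
    for residue, xyz in protein_atoms:
        if residue in hits:
            continue  # residue already counted through another atom
        if any(math.dist(xyz, f) <= cutoff for f in fragment_atoms):
            hits.add(residue)
    return sorted(hits)

# Placeholder data, not taken from any crystal structure.
protein = [("RES1", (0.0, 0.0, 0.0)), ("RES1", (1.5, 0.0, 0.0)),
           ("RES2", (7.0, 0.0, 0.0)), ("RES3", (20.0, 0.0, 0.0))]
fragment_cluster = [(3.0, 0.0, 0.0), (3.0, 1.2, 0.0)]

print(interacting_residues(protein, fragment_cluster))
```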

Fig. 2 Results of the DJ-1 MSCS experiments. a Solvents (magenta sticks, circled) bind to both monomers (green and blue cartoon) of DJ-1 in the region of cysteine 106. b Close-up view of the region surrounding the bound solvents. Cys 106 is shown in stick representation to illustrate its proximity to the bound solvents

Fig. 3 Hot spots detected for DJ-1 via FTMap simulations. a Fragment clusters (magenta sticks) populating the top six consensus sites resulting from the simulations are superimposed on dimeric DJ-1. The largest consensus site (circled) is located in the region also detected via MSCS. b Close-up view of circled region in (a). C106 is labeled. c Close-up view of additional hot spots formed close to the dimer interface. Based on these data, we hypothesize that a pharmacological chaperone could be targeted to this region

Table 3 Summary of FTMap-derived hot spots for DJ-1

Detection of novel hot spots for GCase via MSCS and FTMap

Structures of cross-linked GCase soaked with methanol (Structure A) and phenol (Structure B) are shown in Figs. 4 and 5, respectively. Four molecules are present in the asymmetric units of both structures, where the individual monomers are nearly identical to previously solved structures ([22], not shown). Fourteen sulfate and 12 methanol molecules were modeled in structure A, whereas nine sulfates and 18 phenol molecules were modeled in structure B. Four glycerol molecules, used as a cryoprotectant, were bound in the active site of structure A, a result that we obtained previously [22] and that will not be discussed further in this analysis. The modeled sulfate, phenol and methanol molecules are located throughout the surface of GCase (Figs. 4, 5, 6), and represent sites previously modeled with water and sulfates only. Clusters described in detail below are limited to those regions where a heteroatom of interest was modeled in at least two of the four monomers in the asymmetric unit of GCase. Inspection of the regions bound by the solvents yields several noteworthy points. First, two tight-binding anionic sites are found in structure A (Fig. 6). These clusters have been described previously [12, 45, 46] and are occupied by sulfate or phosphate anions in numerous other GCase crystal structures (not shown). Cluster A1 (Fig. 6b) resides on the same face of the TIM barrel domain as the active site, and sits at the end of the α-helix harboring N370, the site of the most common mutation among Ashkenazi Jews [20]. The sulfate anion forms electrostatic/hydrogen bonding interactions with Arg 353 and Ser 12. Cluster A2 (Fig. 6c) is located at the opposite end of the TIM barrel from the active site, and is stabilized by interactions with Arg 277, Trp 228 and His 306. The analogous site in structure B (Fig. 5a) is modeled as a mixture of sulfate and phenol. The observation that cluster A1 is best modeled with a sulfate anion in both structures, even after cross-linking and soaking with organic solvents, provides experimental evidence for cluster A1 as a phospholipid binding site. This hypothesis has been suggested previously [22], and is consistent with the notion that a destabilized helix harboring the N370S mutation would impair phospholipid binding [47].

Fig. 4 Bound methanols in GCase (Structure A). a Overall view of methanol clusters in GCase relative to active site. b Cluster A3, which is equivalent to cluster B in Fig. 5. c Cluster A4. Stabilizing interactions (<3.7 Å) between sulfate and side chains are shown with dashed lines

Fig. 5 Bound phenols in GCase (Structure B). a Overall view of modeled phenols after superposition of all monomers in asymmetric unit. b Cluster B at interface of three secondary structural elements. Top, ball-and-stick view. Bottom, surface representation. c Cluster C at antiparallel β-strand domain. Top, ball-and-stick view. Bottom, surface representation. Stabilizing interactions (<3.7 Å) between sulfate and side chains are shown with dashed lines

Fig. 6 Bound sulfates in methanol-bound GCase (Structure A). a Overall view of the location of the clusters relative to the active site after superposition of all four monomers in the asymmetric unit. b Cluster A1. c Cluster A2. Stabilizing interactions (<3.7 Å) between sulfate and side chains are shown with dashed lines

One cluster of interest is located at the interface of all three secondary structure elements in GCase, a groove shaped by amino acid side chains derived from the TIM barrel, β-barrel and anti-parallel β-sheet. In this site we have found phenol (Fig. 5b, cluster B), methanol (Fig. 4b, cluster A3), and previously, glycerol, which was introduced as a cryoprotectant (see for example PDB entry 2NT0). Another phenol appears bound close by in two of the four crystallographic monomers (Fig. 5b). A second site (Fig. 5c, cluster C), found on the anti-parallel β-sheet, is separated from the previous site by the N-terminus (residues 1–10). The functional significance of these sites is not yet clear, but it is possible to envision a scenario in which a signal triggers a conformational change in the N-terminus to expose this site for binding to a protein partner. Two partners are currently known: LIMP-II [48], which is proposed to traffic GCase to the lysosome, and saposin C, an activator protein [49–51]. However, the precise binding site for either of these proteins is not currently known. Finally, we have identified a cluster of methanol molecules in structure A (Fig. 4c, cluster A4) at the edge of the β-barrel that abuts the TIM barrel, particularly at the N370-containing helix (see above). N370 has been shown to be critical in stabilizing an active site loop in a conformation of GCase that is substrate-ready [22]. It may be possible to develop a small molecule stabilizer for N370S GCase at this interface, which would improve both cellular trafficking and lysosomal catalysis. This region is also a candidate for a binding site for saposin C, as N370S mutant GCase exhibits poor binding to this activator protein [47]. Further functional studies will be required to determine the binding surface on GCase for both saposin C and LIMP-II, but these experimental methods provide clues to sites of interest.

Computational mapping simulations using FTMap were performed on three GCase monomers (Fig. 7; Table 4). These monomers were chosen to ensure that all published conformations of GCase were sampled for this analysis. For this study two monomers of PDB entry 2NSX [22] were utilized: chain B, to which IFG is bound, and chain A, to which only glycerol is bound. When IFG is bound, a conformational change in loop 1 occurs, resulting in a shallower surface topology as compared to the apo structure [22]. The third structure utilized was the A chain of PDB entry 2NT1, a neutral pH apo structure of GCase where no bound ligands are present. Two regions containing well-populated consensus sites are found in all three structures (Fig. 7, circled in a). The largest concentration of fragment clusters contains the catalytic center and proposed hydrophobic subsites for binding of the substrate, glucosylceramide (large circle in Figs. 7a, 8). A comparison of the consensus sites for glycerol-bound GCase (2NSX chain A) and apo GCase (2NT1 chain A) shows high overlap at the site of catalysis (Fig. 8a). In addition to this region, a second cluster is observed only in glycerol-bound GCase (Fig. 8a). Some of these regions appear to clash with the protein side chains in GCase, but are clarified in a comparison with the IFG-bound structure (Fig. 8a). Overlay of all hot spots formed in the catalytic region points to a binding region that encompasses both the site of catalysis and the two emerging troughs; these data lend further credibility to the hypothesis that the induced fit conformation adopted by IFG-bound GCase is the substrate-ready form of the enzyme. Further, the hot spots that clash with GCase in the non-IFG-bound structures (Fig. 8a) appear to predict the conformational change observed experimentally with IFG-bound GCase. In terms of therapeutics, these data suggest that a larger molecule that not only mimics the substrate intermediate but can also bind to surrounding hydrophobic subsites may be a suitable chaperone.

Fig. 7 Hot spots detected on the surface of GCase using FTMap. Simulations were performed on (a) apo GCase (2NT1), (b) glycerol-bound GCase (2NSX chain A) and (c) IFG-bound GCase (2NSX chain B). Although differences in the location of consensus sites are evident (cluster representatives colored in cyan), two hot spot regions, circled in (a), are conserved in all three structures

Table 4 FTMap results for glucocerebrosidase
Fig. 8 Close-up view of hot spots formed in the catalytic center (Fig. 7a, large circle) and N-terminal region (Fig. 7a, small circle) of GCase from FTMap simulations. a Fragment clusters formed in catalytic center of the apo (magenta sticks) and glycerol-bound (cyan sticks) conformations of GCase. b Binding of IFG results in a conformational change that allows for the formation of additional hot spots in the active site region. c The largest number of fragment clusters is found in the N-terminal region of GCase for both the apo (fragment clusters colored in magenta) and glycerol-bound structures of GCase (fragment clusters colored in cyan). d Although smaller, a hot spot is also observed in the N-terminal region of the IFG-bound structure (fragment clusters colored in magenta)

Although the catalytic region contains the highest number of consensus sites, the most highly populated consensus site for two of the three structures is located near the N-terminus of GCase (smaller circle in Figs. 7a, 8c, d), in a groove at the interface between the β-sheet and β-barrel domains. This region coincides with cluster B identified in the MSCS experiments described above (Figs. 4b, 5b), in addition to glycerol observed previously [22]. Although not present in the apo structure (2NT1), fragment clusters are found in a site proximal to region 2 for both glycerol- and IFG-bound GCase; based on these data, we hypothesize that a drug-sized molecule could be accommodated in this region. Overall, we observe considerable consistency of mapping data across the three structures, in terms of the size and rank of consensus sites, i.e., hot spots, determined for each region (Table 4). These new regions can now be targeted for the discovery of new remote-binding chaperones.

Conclusions

Fragment-based methods are attracting increasing interest and application in the pharmaceutical industry. In this study we successfully applied two fragment-based approaches, MSCS and FTMap, to the identification of hot spots for DJ-1 and glucocerebrosidase, two therapeutic targets for the development of treatments for neurological disorders. Novel hot spots were identified on the surface of DJ-1 in two regions. One hot spot was found in the region containing a residue whose oxidation may protect against PD and the other in the dimer interface, where a pharmacological chaperone could be bound to increase the stability of the dimeric structure. Three regions of interest were identified for glucocerebrosidase, where multiple hot spots emerge in the catalytic region. While the catalytic function of GCase is known, interactions with substrates and other proteins are poorly understood on a structural level. Results of the MSCS experiments and FTMap simulations corroborate the location of the catalytic center, and substantiate claims of hydrophobic binding sites for the substrate ceramide. Additional hot spots found on the surface of GCase provide new hypotheses for the binding sites of key players in trafficking and catalysis, both of which are disrupted in patients with Gaucher’s disease. Taken together, these hot spots provide novel starting points for drug discovery efforts.

In addition to these findings, excellent agreement is observed between hot spots derived from the two methods, supporting the use of computational simulations in conjunction with or in lieu of expensive and time-consuming experiments. While MSCS is a powerful approach, it is costly and difficult to perform, and structural resolution is often too low to accurately determine solvent binding positions. As illustrated by this study, higher resolution structures will be required to more confidently model organic heteroatoms in the GCase structure, a goal that will continue to be challenged by the somewhat lipophilic nature of GCase and the consequent tendency of cross-linked GCase crystals to disintegrate when exposed to organic solvents. Besides successfully duplicating MSCS data using FTMap, we were able to uncover hot spots not observed in the MSCS experiments and to assess the binding of a much larger library of solvent-like fragments than is possible by MSCS. As is the case with rigid-body docking algorithms in general, one caveat of using FTMap is the conformational dependence of the findings; however, as illustrated by the analysis of GCase, consensus information gained from the use of multiple structures as input for FTMap simulations increases the robustness of the results. FTMap is available free to the academic community at http://FTMap.bu.edu/~FTMap/.