Main

Standard libraries for high-throughput screening (HTS)10 and virtual ligand screening (VLS)11,12,13 have been historically limited to fewer than 10 million available compounds, which is a small fraction of the enormous chemical space, estimated to be 10^20 to 10^60 drug-like compounds14,15. This limitation of standard HTS and VLS slows the pace of drug discovery, usually yielding initial hits with modest affinities, poor selectivity and ADMET profiles that require elaborate multistep optimization to gain lead- and drug-like candidate properties. Recently, ultra-large libraries of more than 100 million readily accessible (REAL) compounds have been developed and used in docking-based VLS, yielding high-quality hits for lead discovery5,6. The Enamine REAL library, which now comprises 1.4 billion compounds, and its REAL Space extension with more than 11 billion drug-like compounds, take advantage of modular parallel synthesis with a large set of optimized reactions and building blocks (synthons)6. This makes the synthesis of potential hit compounds fast (less than 4–6 weeks), reliable (>80% success rate) and affordable.

The modular nature of REAL libraries supports their further rapid growth way beyond 10 billion drug-like compounds16. However, with increasing library sizes, the computational time and cost of docking-based VLS itself become the next bottleneck in screening, even with massively parallel cloud computing capacities. For example, the docking of 10 billion compounds at a standard rate of 10 s per compound would take more than 3,000 years on a single CPU core, or cost over US$800,000 on a computing cloud. The ability to substantially reduce the computational burden of VLS without compromising the accuracy of docking or losing the best-hit compounds would remove this bottleneck and assure broad accessibility of gigascale screening. Recently, an iteration of docking and machine learning steps9, or stepwise filtering of the whole enumerated library using docking algorithms of increasing accuracy8, were suggested to tackle ultra-large libraries of 138 million and 1.4 billion compounds, respectively. However, these methods still require vast computational resources that scale linearly with the growing number of compounds.
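The single-core estimate quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, a 365.25-day year is assumed for the conversion, and the price per core-hour is simply back-calculated from the quoted US$800,000 figure rather than taken from any provider's price list.

```python
# Back-of-the-envelope check of the docking cost quoted in the text.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def single_core_years(n_compounds: float, seconds_per_compound: float) -> float:
    """Years needed to dock every compound sequentially on one CPU core."""
    return n_compounds * seconds_per_compound / SECONDS_PER_YEAR


def implied_price_per_core_hour(total_cost_usd: float,
                                n_compounds: float,
                                seconds_per_compound: float) -> float:
    """Price per core-hour implied by a total cloud bill (derived, not quoted)."""
    core_hours = n_compounds * seconds_per_compound / 3600
    return total_cost_usd / core_hours


years = single_core_years(10e9, 10.0)
price = implied_price_per_core_hour(800_000, 10e9, 10.0)
print(f"{years:,.0f} single-core years")       # a little over 3,000 years
print(f"US${price:.4f} per core-hour implied")  # lower bound implied by the text
```

Because both quantities are linear in the number of compounds, doubling the library doubles the cost, which is the scaling problem the following sections address.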

Here we present the virtual synthon hierarchical enumeration screening (V-SYNTHES) approach, which takes full advantage of the modular building block organization of the Enamine REAL Space, does not need full enumeration of the library and requires thousands of times less computational resources than standard VLS without compromising docking accuracy at any step. Moreover, the algorithm cost scales linearly with the number of synthons, or as the square or cubic root of the whole library size (O(N^(1/2)) and O(N^(1/3)) for two-component and three-component reactions, respectively). Such performance of V-SYNTHES relies on the initial docking of a prebuilt set of the fragment-like compounds representing all of the library reaction scaffolds and corresponding synthons. The best selected scaffold–synthon combinations are then enumerated, and the resulting focused library is docked again to select fully elaborated hits. These iterations help to focus on a small fraction (<0.1%) of the best synthons, therefore substantially reducing the combinatorial chemical space for docking.
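The root-of-library-size scaling can be illustrated with an idealized reaction in which every R position offers the same number of synthons. This is a simplifying assumption made only for the sketch (real reactions have unequal synthon sets), and the function names are hypothetical.

```python
# Idealized model: a k-component reaction with n synthons at each R position.

def full_library_size(n_synthons: int, components: int) -> int:
    """Number of fully enumerated products, N = n**k."""
    return n_synthons ** components


def fragment_library_size(n_synthons: int, components: int) -> int:
    """Fragments with one real synthon and caps elsewhere: k * n."""
    return n_synthons * components


n = 1000  # assumed synthons per position, for illustration
for k in (2, 3):
    N = full_library_size(n, k)
    frags = fragment_library_size(n, k)
    root = round(N ** (1 / k))
    print(f"k={k}: full library N={N:,}, fragments={frags:,}, N^(1/{k})={root:,}")
```

In this model the fragment set grows as k times N^(1/k), so a thousandfold increase in a two-component library adds only about a factor of 32 to the initial docking step.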

The approach is applied here to cannabinoid receptors, which are class A G-protein-coupled receptors (GPCRs), and are key targets in drug discovery for inflammatory disorders, neurodegenerative diseases and cancer17,18,19. V-SYNTHES enabled us to speed up prospective screening of the 11-billion-compound REAL Space library more than 5,000-fold by iteratively docking only around 2 million full compounds. Moreover, experimental validation showed that V-SYNTHES doubled the success rate in the discovery of CB hits as compared to a standard VLS screen of the REAL diversity subset of 115 million compounds (33% versus 15%). Similarly, application of V-SYNTHES to the kinase target ROCK1 yielded a 28.5% hit rate, including ligands with nanomolar affinity and potency. The new approach provides a practical alternative for fast screening of growing gigascale modular virtual libraries, helping to identify leads that are suitable for fast optimization in the same REAL Space.

The REAL Space virtual library

The V-SYNTHES approach has been implemented for the REAL Space virtual library, which comprises more than 11 billion readily accessible compounds based on optimized one-pot parallel synthesis developed by Enamine, involving 121 reaction protocols and 75,000 unique reagents. The reaction protocols include single and multistep procedures that involve two (102 reaction protocols) or three (17 reaction protocols) starting reagents. In this study, we used only two-component and three-component reactions, yielding around 500 million and around 10.5 billion compounds, respectively. The V-SYNTHES approach can easily be expanded to four-component and more reactions when they become a substantial part of REAL Space. Each reaction/scaffold in the library is presented in the form of a Markush scheme with two or more R groups representing synthons7,20.

The high diversity of the REAL Space is achieved by using diverse sets of starting reagents. The average numbers of starting reagents per protocol are as follows: for two-reagent reactions, 3,344 (reagent 1) and 2,068 (reagent 2); for three-reagent reactions, 939 (reagent 1), 1,308 (reagent 2) and 1,389 (reagent 3). The modular design of the library is based on well-established and optimized reactions and an automated one-pot parallel synthesis approach, enabling fast synthesis (less than 4–6 weeks) with a high success rate (>80%) and guaranteed high purity (>90%).

The V-SYNTHES screening approach

The V-SYNTHES approach involves iterative steps of library preparation, enumeration, docking and hit selection as outlined in Fig. 1. In preparatory step 1, we generate a library of fragment-like compounds representing all possible scaffold–synthon combinations for all reactions in the whole Enamine REAL Space, which we refer to as a minimal enumeration library (MEL). The MEL compounds are built from the reaction scaffolds, enumerated with the corresponding synthons at one of its R positions, while the other R position(s) are capped with a special minimal synthon according to the reaction specified for this R position (Fig. 1). This capping, which usually contains methyl or phenyl moieties, is needed to convert the reactive groups of the scaffold into a chemical form that corresponds to the full compounds (such as primary amine into methyl-amide or secondary amine), to better match the binding properties of the full compounds. As only one of the R groups is fully enumerated, and the others are just systematically capped, the MEL library size is of the same order as the number of synthons in the REAL Space, that is, only about 600,000 compounds. This MEL preparation step is performed once for the REAL Space library and does not depend on the target receptor.
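The construction of the MEL can be sketched with plain data structures. This is a toy illustration, not the ICM-Pro procedure used in the study: reactions are assumed to be stored as lists of synthon labels per R position, caps are given as labels such as "Me" or "Ph", and all names (`MelCompound`, `build_mel`) are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class MelCompound(NamedTuple):
    reaction: str
    groups: Tuple[str, ...]  # one label per R position: a real synthon or a cap


def build_mel(reactions: Dict[str, List[List[str]]],
              caps: Dict[str, List[str]]) -> List[MelCompound]:
    """Enumerate one R position at a time; cap every other position."""
    mel = []
    for rxn, positions in reactions.items():
        for i, synthons in enumerate(positions):
            for synthon in synthons:
                groups = tuple(
                    synthon if j == i else caps[rxn][j]
                    for j in range(len(positions))
                )
                mel.append(MelCompound(rxn, groups))
    return mel


# Toy two-component reaction: 3 synthons at R1, 2 synthons at R2.
toy_reactions = {"rxnA": [["a1", "a2", "a3"], ["b1", "b2"]]}
toy_caps = {"rxnA": ["Me", "Ph"]}  # assumed minimal synthons per position
mel = build_mel(toy_reactions, toy_caps)
print(len(mel))  # 5 fragments, versus 3 * 2 = 6 full products
```

The fragment count equals the total number of synthons, which is why the MEL stays near the synthon count while the full library grows multiplicatively.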

Fig. 1: V-SYNTHES approach to modular screening of Enamine REAL Space.

A general overview of the four-step algorithm (left) and examples for each step (right). Asterisks in step one show the attachment points of synthons; arrows show possible pairing of minimal synthons with real synthons.

In step 2, the MEL compounds are docked onto the target receptor using energy-based docking of the flexible ligand. The results of docking, including the predicted binding scores and ligand–receptor interaction information, typically for a few thousand top-scoring compounds, are then used to select the most promising fragments for the next enumeration. The selection is also filtered for diversity, including a rule that a single reaction cannot contribute more than 20% of the selection.
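The diversity rule can be expressed as a greedy selection over score-ranked fragments. The sketch below is one possible reading of the rule, not the authors' code: fragments are assumed to be `(id, reaction, score)` tuples with lower (more negative) scores being better, and the 20% limit is applied relative to the requested selection size.

```python
import math
from collections import Counter
from typing import List, Tuple

Fragment = Tuple[str, str, float]  # (fragment id, reaction id, docking score)


def select_diverse(fragments: List[Fragment], n_select: int,
                   max_fraction: float = 0.2) -> List[Fragment]:
    """Take the best-scoring fragments, capping each reaction's share."""
    per_reaction_limit = max(1, math.floor(max_fraction * n_select))
    counts: Counter = Counter()
    chosen: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f[2]):  # best score first
        if len(chosen) == n_select:
            break
        if counts[frag[1]] < per_reaction_limit:
            chosen.append(frag)
            counts[frag[1]] += 1
    return chosen


# Toy data: reaction A dominates the top of the ranking.
toy = [(f"A{i}", "rxnA", -40.0 + i) for i in range(10)]
toy += [(f"{r}{i}", f"rxn{r}", -25.0 + i) for r in "BCDE" for i in range(2)]
picked = select_diverse(toy, n_select=10)
print(Counter(f[1] for f in picked))  # no reaction contributes more than 2 of 10
```

If too few reactions are represented, the function returns fewer than `n_select` fragments rather than relaxing the cap; a production implementation might handle that case differently.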

Step 3 involves the iterative enumeration and docking of the best MEL compounds selected in step 2. On each iteration, the compounds are enumerated such that one of the capped R groups is replaced by a full range of corresponding synthons from the library. For example, for two-component reactions with only two R groups, a single step-3 iteration completes the molecule, representing a full compound from the REAL Space. For three-component and more reactions, two and more iterations are performed, replacing one by one the minimal caps with real R group synthons. Thus, each ‘hit’ MEL compound selected in the previous iteration step is combinatorially ‘grown’, resulting in fully enumerated compounds from the REAL Space.
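The growth step can be sketched as replacing one capped position at a time with every available synthon. In this toy representation, which is an assumption of the sketch and not the ICM-Pro implementation, a compound is a `(reaction, groups)` pair and a still-capped position is marked with `None`.

```python
from typing import Dict, List, Optional, Tuple

Compound = Tuple[str, Tuple[Optional[str], ...]]
CAP = None  # marker for a position still holding a minimal synthon


def is_complete(compound: Compound) -> bool:
    """True when no capped positions remain."""
    return CAP not in compound[1]


def grow(compound: Compound,
         reactions: Dict[str, List[List[str]]]) -> List[Compound]:
    """Replace the first capped position with each real synthon."""
    rxn, groups = compound
    idx = groups.index(CAP)  # raises ValueError if already complete
    return [
        (rxn, groups[:idx] + (synthon,) + groups[idx + 1:])
        for synthon in reactions[rxn][idx]
    ]


# Toy three-component reaction with 2, 3 and 2 synthons.
toy_reactions = {"rxn3": [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]]}
hit = ("rxn3", ("a1", CAP, CAP))            # fragment selected in step 2
first_round = grow(hit, toy_reactions)       # 3 compounds, one cap left each
best = first_round[0]                        # stand-in for docking-based selection
second_round = grow(best, toy_reactions)     # 2 fully enumerated compounds
print(len(first_round), len(second_round), all(map(is_complete, second_round)))
```

Between rounds, the real workflow docks the intermediate compounds and keeps only the best; here a single compound is carried forward to keep the example short.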

Finally, step 4 performs the docking screen on the final enumerated subset of the library. The several thousands of top-ranked VLS hits undergo postprocessing filtering for PAINS21, physico-chemical properties, drug likeness, novelty and chemical diversity to select a final limited set (typically 50–100) of compounds for synthesis and experimental testing.

The premise of this approach is to enrich the MEL library on step 2—and then each subsequent iteration library—with scaffold–synthon combinations that have high binding scores in the pocket and are suitable for further enumeration. Owing to the modular combinatorial nature of the REAL Space library, narrowing down the most promising scaffold–synthon combinations considerably reduces the enumerated chemical space for docking, for example, from 11 billion to 2 million compounds in our case.

Structure-guided selection of fragments

Selection of synthons in step 2, if based solely on binding scores, can already offer substantial library enrichment; for example, there are an estimated 40 times more high-scoring compounds in the final iteration library than in the random subset of the full REAL Space library (Extended Data Fig. 1). At the same time, we found that the performance of the iterative approach can be further improved by taking into account docking poses of the compounds and, specifically, positions of the minimal capping R group. Docking the fragments into a binding pocket can result in two conceptually different outcomes. The first, ‘productive’ outcome, is when the minimal capping group of the docked MEL ligand is positioned in the pocket in such a way that it can be replaced by real, bulkier synthons from the library in the next step of enumeration. This requires the cap to point towards the unoccupied part of the pocket and not be blocked by the pocket residues. A second, ‘non-productive’ outcome is when the minimal cap at one of the R positions points directly towards the receptor residues at a dead-end subpocket, where it does not have space to grow. Another non-productive situation is when the capping R group points outside of the pocket, where useful contacts are much less likely. To select productive hits, we used an automated procedure that checks the distance from the cap atoms to selected (dummy) atoms at the dead-end subpockets. The corresponding rules in implementation for the CB2 receptor are described in Extended Data Fig. 2. The docked MEL compounds whose cap atoms approached the dead-end atoms closer than 4 Å were excluded from further consideration even if they had high-ranked binding scores.
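The dead-end distance check can be sketched in a few lines. This is a minimal illustration assuming Cartesian coordinates in ångströms supplied as plain tuples; the function name and the coordinates are hypothetical, and only the 4 Å cutoff comes from the text. It covers the dead-end case only, not caps that point out of the pocket.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def is_productive(cap_atoms: Iterable[Point],
                  dead_end_points: Iterable[Point],
                  cutoff: float = 4.0) -> bool:
    """False if any cap atom lies closer than `cutoff` to a dead-end point."""
    dead_ends = list(dead_end_points)
    return all(
        math.dist(atom, point) >= cutoff
        for atom in cap_atoms
        for point in dead_ends
    )


dead_ends = [(0.0, 0.0, 0.0)]                 # toy dead-end marker
blocked_cap = [(3.0, 0.0, 0.0)]               # 3 Å away: non-productive
open_cap = [(5.0, 0.0, 0.0), (6.0, 1.0, 0.0)]  # all atoms beyond 4 Å
print(is_productive(blocked_cap, dead_ends), is_productive(open_cap, dead_ends))
```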

Screening CB receptors with V-SYNTHES

The V-SYNTHES approach was then applied to screen 11 billion REAL Space compounds using the recently solved representative CB2R structure in complex with an antagonist (Protein Data Bank (PDB): 5ZTY) as a template22. We performed separate screening for two-component and three-component reactions of the library, representing around 500 million and around 10.5 billion virtual compounds. Note that V-SYNTHES required docking of only 1 million and 0.5 million compounds, respectively, for these libraries in the last enumeration step, reducing the computational cost of screening more than 5,000-fold.

To computationally benchmark the performance of V-SYNTHES versus a standard VLS procedure, we also generated randomized 1 million and 0.5 million compound subsets from the same two-component and three-component REAL Space and assessed them in standard VLS using the same receptor model and same docking parameters. Note that the full 11-billion-compound REAL Space library is not amenable to standard VLS with any reasonable computational resources. Figure 2 compares the screening performance of V-SYNTHES with the standard VLS benchmark over the range of docking score thresholds. The results show that V-SYNTHES detected many more high-scoring compounds with much better scores than standard VLS that involved docking of the same number of compounds. Thus, the best two-component compound identified by V-SYNTHES scored 7 kJ mol^−1 better than the very best hit from standard VLS; the difference was 6.5 kJ mol^−1 for three-component compounds. Moreover, two-component REAL Space V-SYNTHES identified 84 compounds with binding scores that were better than the very best compound from standard VLS; this number was 136 for the three-component space.

Fig. 2: Assessment of VLS computational performance for V-SYNTHES and standard VLS.

a, b, The number of hits at each score threshold from V-SYNTHES and standard VLS in two-component (1,000,000 compounds) (a) and three-component (500,000 compounds) (b) cases. c, d, Enrichment in V-SYNTHES versus standard VLS at different score thresholds. The red X symbols represent thresholds that yield 100 hits in two-component (1,000,000 compounds) (c) and three-component (500,000 compounds) (d) cases.

To systematically characterize the enrichment for high-scoring compounds in the final step of V-SYNTHES versus a random subset of the whole library, we introduced the enrichment factor. At a given docking score threshold, the enrichment factor is calculated as a ratio of the number of candidate hits detected in the V-SYNTHES final-step enumerated library versus a random subset of REAL Space with the same number of compounds, as shown in Fig. 2c, d.

Note that, at the −30 kJ mol^−1 binding score threshold, V-SYNTHES already yields around a 40–50-fold higher number of potential hits from two-component (>10,000 hits) and three-component space (>5,000 hits) compared with standard VLS. This enrichment further increases for more restrictive thresholds, reflecting the focus of V-SYNTHES on the iterative selection of the very-best-scoring compounds. One relevant way of measuring the enrichment factor is to set the docking score threshold such that it selects the 100 top-scoring compounds (EF100), where 100 is a typical number of compounds selected in VLS campaigns for synthesis and experimental testing. For the two-component reaction, this enrichment factor was estimated as EF100 = 250. This is approaching the theoretical limit of ideal enrichment of around 500, which would be achievable if all possible hits from the full chemical space of 500 million compounds were present in the 1-million-compound final enumerated library. For the three-component reactions, the EF100 = 460 is even higher and sufficient for practical use, although further from the theoretical limit of 20,000.
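The enrichment factor and its theoretical ceiling can be written out explicitly. The sketch below assumes docking scores are plain floats with lower values being better; the function names are hypothetical, and returning infinity when the random subset has no hits is a convenience of the sketch (the paper reports estimated values instead).

```python
from typing import Sequence


def enrichment_factor(focused: Sequence[float], random_subset: Sequence[float],
                      threshold: float) -> float:
    """Ratio of hit counts at a score threshold for equally sized libraries."""
    hits_focused = sum(score <= threshold for score in focused)
    hits_random = sum(score <= threshold for score in random_subset)
    return hits_focused / hits_random if hits_random else float("inf")


def ef_top_n(focused: Sequence[float], random_subset: Sequence[float],
             n: int = 100) -> float:
    """Enrichment at the threshold that admits the n best focused compounds."""
    threshold = sorted(focused)[n - 1]
    return enrichment_factor(focused, random_subset, threshold)


def ideal_enrichment(full_space: float, docked_library: float) -> float:
    """Ceiling reached if every hit in the full space were in the docked set."""
    return full_space / docked_library


# Toy score lists of equal size.
focused_scores = [-40.0] * 4 + [-20.0] * 6
random_scores = [-40.0] * 1 + [-20.0] * 9
print(enrichment_factor(focused_scores, random_scores, -30.0))  # 4 hits vs 1
print(ideal_enrichment(500e6, 1e6))     # two-component ceiling from the text
print(ideal_enrichment(10.5e9, 0.5e6))  # three-component ceiling, ~20,000
```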

The enrichment factor evaluation did not take into account computational efforts for the initial docking of MEL compounds (and intermediate library for three-component). However, these initial steps add only limited computational costs to V-SYNTHES screens (~20% for two-component and 35% for three-component), as smaller fragment-like compounds in the MEL library dock much faster on average compared with the larger and more flexible compounds. Considering the full computational cost at all of the iterative steps, the acceleration of V-SYNTHES as compared with standard screening for the identification of the 100 top candidate hits at the same score threshold can therefore be evaluated as around 200-fold for two-component and 300-fold for three-component compounds in the current benchmark.
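One simple way to reconcile these figures is to divide EF100 by the total relative cost, that is, one plus the overhead of the preliminary docking steps. This formula is an assumption of the sketch, not a method stated in the text, and it reproduces the quoted values only approximately.

```python
def net_acceleration(ef100: float, overhead_fraction: float) -> float:
    """Enrichment discounted by the extra cost of the preliminary docking steps."""
    return ef100 / (1.0 + overhead_fraction)


two_component = net_acceleration(250, 0.20)    # about 208
three_component = net_acceleration(460, 0.35)  # about 341
print(round(two_component), round(three_component))
```

Under this reading the two-component value matches the reported figure of around 200, while the three-component value comes out somewhat above the reported figure of around 300, so the authors may have used a more conservative accounting.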

Selection and synthesis of candidate hits

To select the best V-SYNTHES hits for chemical synthesis and in vitro testing at CB receptors, we applied a standard post-processing procedure to the top-ranking 5,000 candidate hits, which included (1) filtering out compounds with potential PAINS properties and low drug-likeness; (2) filtering out compounds with high similarity to known CB1/CB2 ligands in ChEMBL; (3) redocking initial hits at a higher docking effort; and (4) clustering and selecting a limited number of the best compounds from each cluster to maintain a higher diversity of the final set. The final selected set included 80 compounds, of which 60 were synthesized with >90% purity and delivered by Enamine in less than 5 weeks. The details of this selection procedure are provided in Extended Data Fig. 3. A list of all of the synthesized compounds from the V-SYNTHES screening is provided in Supplementary Table 1, and details of compound synthesis and quality control are provided in the Supplementary Methods and source data.

Characterization of new CB ligands

Initial functional characterization of 60 new candidate ligands predicted by V-SYNTHES identified 21 compounds with antagonist activity (>40% inhibition at 10 μM concentration) at human CB1, CB2 or both in the β-arrestin recruitment Tango assay (Supplementary Figs. 1 and 2). Three compounds (673, 505 and 599) showed weak partial CB2 agonism at 10 μM and/or 3 µM; they also behaved like antagonists in the antagonism assays. The primary hits were tested for their antagonist potency in full 16-point dose–response assays at CB1 and CB2 in the presence of a fixed 100 nM concentration (EC80) of the dual CB1/CB2 agonist CP55,940, which submaximally activates the receptors (Extended Data Fig. 4). Among the 60 compounds predicted by V-SYNTHES, the Tango assays identified 21 hits with functional Ki values better than 10 μM, including 21 antagonists of CB1 and 20 antagonists of CB2 (Fig. 3 and Extended Data Table 1). This constitutes a high 33% hit rate for both receptors, on the high end of the range observed in prospective screening for GPCRs4. Among the identified hit compounds, 14 showed submicromolar functional Ki values as antagonists at the CB1 receptor and three compounds at the CB2 receptor. The same 60 compounds were also tested in radioligand binding assays with human CB2 and rat CB1 receptors and [3H]CP55,940 as the radioligand. Of these, nine compounds had affinities (Ki) better than 10 μM to the CB1 receptor and 16 compounds had affinities better than 10 μM to the CB2 receptor (Extended Data Table 1 and Extended Data Fig. 5).

Fig. 3: The top five CB2 hits identified by V-SYNTHES.

a, Chemical structures and measured antagonist potencies for CB1 and CB2 receptors. b–g, Crystal structure of CB2 receptor with AM10257 (b) and the predicted binding poses for hit compounds 505 (c), 523 (d), 610 (e), 665 (f) and 673 (g) in the CB2 receptor. Key subpockets of the binding pocket are marked as SP1, SP2 and SP3. h, i, Concentration–response curves for the top antagonists in β-arrestin recruitment Tango assays at the CB1 (h) and CB2 (i) receptors. The assays were performed in the presence of 100 nM (EC80) of the dual CB1/CB2 agonist CP55,940. The compounds rimonabant (h) and SR144528 (i) were used as positive controls. Data are mean ± s.e.m. from n = 3 independent experiments; each run was performed in triplicate.

To assess the broad off-target selectivity, the best compounds—523, 610, and 673—were also tested at 10 µM concentration in GPCRome–Tango assays with a panel of more than 300 human receptors23 (Extended Data Fig. 6). The initial panel shows only a few (3–5) potential off-target effects, with only negligible off-target activities in the follow-up dose–response assays.

Molecular determinants of the hits

Experimentally identified hit compounds showed a broad diversity in their chemical structures (Fig. 3b–g), representing new scaffolds with a Tanimoto distance of >0.3 from known CB1 and CB2 ligands found in ChEMBL24 (negative logarithm of the activity pAct > 5.0). The best hit compounds were predicted to largely fill the receptor orthosteric pocket, similar to antagonist AM10257 that was cocrystallized with CB2 receptor22 (Fig. 3c–g). These compounds occupy all three subpockets of the CB2 binding pocket, at which the benzene ring (subpocket 1), 5-hydroxypentyl chain (subpocket 2) and adamantyl group (subpocket 3) of AM10257 are bound in the crystal structure of the receptor. Similar to AM10257, these interactions suggest antagonistic profiles for our hit compounds, as compared to the recently solved cryo-electron microscopy structure of CB2 receptor with agonist WIN 55,212-2, which avoids interaction with subpocket 1 Trp194, Phe117 and Trp258 side chains25. Subpocket 1 preferably binds to the aromatic ring; however, two hit compounds (505 and 523) fill it with a non-aromatic ring and one compound with an aliphatic substituent (681). Interestingly, although most previously known CB1/CB2 ligands, including AM10257 and THC analogues, have an aliphatic moiety in subpocket 2, some of our hits have more bulky cyclic groups, whereas compound 505 avoids this pocket altogether. Notably, although the lipophilicity of the CB receptor pockets represents a challenge for developing high-affinity drug-like ligands, all of the V-SYNTHES-derived hits have low lipophilicity (cLogP < 5) and are smaller than 500 Da.

Comparison to standard VLS

In parallel to the V-SYNTHES screen, we performed a standard ultra-large VLS for a representative 115-million-compound diversity subset of the Enamine REAL library, using the same receptor model and the same parameters of the docking algorithm. As a result of this standard full-scale screening, 97 predicted hits were selected, synthesized and tested in the same functional and binding assays as the candidate hits from V-SYNTHES (Supplementary Table 2, Supplementary Figs. 3, 4). Out of 97 compounds from standard VLS, 16 compounds showed activity in functional assays (Extended Data Fig. 7), of which nine compounds were identified as antagonists at CB1 with functional Ki of better than or equal to 10 μM, and five at CB2. Of these, three compounds had a submicromolar antagonist Ki at CB1, and none at CB2. A binding affinity of better than 10 μM was detected for 8 compounds at CB1 and 15 at CB2 (8% and 15% hit rates, respectively) (Extended Data Fig. 8). Thus, hit rates for the standard VLS did not exceed 15% in any of the assays, as opposed to 33% hit rate obtained for candidate compounds selected by the V-SYNTHES approach.

Optimization of initial V-SYNTHES hits

Hits identified using V-SYNTHES have a great potential for further optimization because the combinatorial nature of the vast REAL Space of 11 billion compounds ensures thousands of close analogues for structure–activity relationship analysis (SAR). To assess this potential, we performed the first ‘SAR-by-catalogue’ search for three of the most prominent hits (523, 610 and 673) in REAL Space. A chemical similarity search using ChemSpace fast algorithms selected 920 compounds within a Tanimoto distance of 0.3 from the hits. The hits from the initial V-SYNTHES screening containing the same synthons as the selected hit compounds were also added to the list of similar compounds. On the basis of docking in the same CB2 structural model, 121 of these analogues were selected for synthesis, with 104 of the selected compounds synthesized within 5 weeks (Supplementary Table 3). Testing in functional assays detected 60 analogues with a potency that was better than 10 μM (Extended Data Fig. 9 and Supplementary Table 4) and 23 analogues with sub-µM antagonist potency at CB2 (13 for 523 analogues, 7 for 610 and 3 for 673) (Extended Data Figs. 10 and 11). A series of 523 analogues yielded the most potent antagonists, with at least five compounds (733, 736, 742, 747 and 749) in the low-nM range and more than 50-fold CB2 versus CB1 selectivity in their binding affinity and functional potency (Fig. 4). The highest affinity was shown for compound 747 (Ki = 0.9 nM). Similar to their parent V-SYNTHES hit 523, the best analogues 733 and 747 also demonstrated high selectivity against the GPCRome–Tango panel of more than 300 receptors23 (Extended Data Fig. 12). Thus, the V-SYNTHES screen and subsequent SAR-by-catalogue enabled the identification of a CB2-selective lead series with nanomolar activity, good chemical tractability and physico-chemical properties, without requiring custom synthesis.
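The similarity criterion used in the analogue search can be illustrated with a set-based Tanimoto distance. The sketch assumes each compound is represented by a set of integer fingerprint bits; the fingerprint type used by the ChemSpace search is not specified in the text, so the toy fingerprints and function names below are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto_distance(fp_a: Set[int], fp_b: Set[int]) -> float:
    """1 minus the ratio of shared bits to all bits set in either fingerprint."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def analogues_within(query: Set[int], library: Dict[str, Set[int]],
                     max_distance: float = 0.3) -> List[str]:
    """Identifiers of library compounds within the distance cutoff."""
    return [
        name for name, fp in library.items()
        if tanimoto_distance(query, fp) <= max_distance
    ]


query_fp = set(range(1, 11))                    # bits 1..10
toy_library = {
    "close": set(range(1, 10)) | {11},          # shares 9 of 11 bits
    "far": {2, 3, 40, 41, 42},                  # shares 2 of 13 bits
}
print(analogues_within(query_fp, toy_library))  # only "close" passes
```

The same distance, applied with the inequality reversed against known ligands, corresponds to the novelty criterion (distance > 0.3) mentioned for the hit compounds.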

Fig. 4: Selection and characterization of the best analogue series for CB2 hits from V-SYNTHES screening.

a, Chemical scaffold for the antagonist analogues of compound 523. b, Predicted binding poses of the best two analogues 733 and 747 in the CB2 pocket. c, Measured antagonist potencies and binding affinities for the best six analogues of compound 523. d, Dose–response curves for the best six analogues tested in functional β-arrestin recruitment Tango assays at CB2; SR144528 was used as a positive control. e, Dose–response curves for the best six analogues at CB2 tested in a radioligand-binding assay; compound AM10257 was used as a positive control. For d and e, data are mean ± s.e.m. from n = 3 independent experiments; each repeat was carried out in triplicate.

V-SYNTHES applied to ROCK1 inhibitor discovery

To assess the broader applicability of the V-SYNTHES approach, we tested its performance on the Rho-associated coiled-coil containing protein kinase 1 (ROCK1), which is an important and challenging target in cancer drug discovery26,27. We performed a V-SYNTHES screen on 11 billion compounds with minor modifications in the selection procedure (Methods). The benchmark comparing the docking of a random compound subset of two-component REAL Space with the docking of selected MEL fragments (Extended Data Fig. 13) suggests enrichment EF100 ≈ 180 for ROCK1, which is comparable to EF100 ≈ 250 obtained for CB screening.

We next selected and ordered 24 fully enumerated compounds, of which 21 were synthesized and tested for functional potency and binding affinity in human ROCK1 inhibition assays (Extended Data Fig. 14). Potencies of better than 10 µM were found for six compounds (28.5% hit rate), with five of these also showing binding affinities Kd < 10 µM in the competitive-binding assay. The best compound, RS-15, achieved potency IC50 = 6.3 nM and affinity Kd = 7.9 nM.

Discussion

We introduce V-SYNTHES, a new iterative approach for fast structure-based virtual screening of combinatorial compound libraries, and apply it here to discover new antagonist chemotypes of cannabinoid CB1 and CB2 receptors among >11 billion compounds of Enamine REAL Space. In the computational benchmark, the first iteration of V-SYNTHES enriched the enumerated library with high-scoring candidate hits as much as 250-fold for two-component and 460-fold for three-component reactions, as compared with a random subset of the REAL Space. Moreover, the experimental hit rate for V-SYNTHES (~33%) was twice as high compared with a standard VLS of a 115-million-compound diversity subset of Enamine REAL, which used ~100 times more computational resources to complete. Similarly, high hit rates and potent nanomolar antagonists were obtained by V-SYNTHES for the kinase target ROCK1, suggesting that the approach can be used for different classes of protein targets.

The benefits of the V-SYNTHES modular approach in screening gigasize libraries, although already substantial with current REAL Space, are expected to further increase in the future when the size of such libraries becomes even more prohibitive for conventional full screening. In the past year, the drug-like portion of Enamine REAL Space grew from about 11 billion to more than 21 billion compounds, increasing from 121 to 185 reactions and from 75,000 to 115,000 unique reactants, and will continue to grow polynomially. Thus, the library can grow as fast as a square of synthon numbers for the two-component reactions, and even faster for three- and higher-component reactions. By contrast, the V-SYNTHES computational cost increases only linearly with the number of synthons, and can therefore easily accommodate the further growth of REAL Space towards terascale and petascale libraries.

Conceptually, V-SYNTHES takes advantage of the same paradigm as fragment-based ligand discovery28,29,30, in which the binding of an anchor fragment serves as a core for growing the full drug-like compounds. However, classical fragment-based ligand discovery requires experimental testing of fragment binding by highly sensitive approaches such as nuclear magnetic resonance, X-ray or SPR, and is therefore limited to smaller libraries (~1,000 compounds) of smaller fragments (<200 Da). The validated fragments are then elaborated by expanding them to fill the binding pocket or connecting several fragments into one molecule, which requires elaborate custom chemistry. By contrast, V-SYNTHES avoids both the experimental testing of weakly binding fragments and custom synthesis of compounds by performing fragment enumeration in a very large but well-defined REAL chemical space, and yields drug-like compounds with affinities and potencies that are reliably measurable using standard biochemical assays. The apparent caveat of skipping experimental validation of initial fragments is a higher reliance on computational docking accuracy. However, this can be compensated for in several ways. First, the initial MEL compounds are relatively small (250–350 Da) and rigid, which is optimal for the performance of most docking algorithms, enabling better sampling and higher success rates31,32,33,34. Second, the detection of strong anchor fragments and their validation in the context of full drug-like molecules makes V-SYNTHES hits highly suitable for subsequent optimization. Thus, SAR-by-catalogue for several CB2 hit analogues here yielded low-nM compounds with strong CB2 selectivity, all achieved without requiring elaborate custom synthesis.

By design, V-SYNTHES is not limited to cannabinoid receptors (GPCRs) and ROCK1 (a kinase), but can potentially be applied to any target with a well-defined crystal or cryo-EM structure, including orphan receptors and allosteric pockets. Moreover, although this implementation uses ICM-Pro docking and applies to the Enamine REAL Space library, the iterative synthon-based screening algorithm can be implemented with any reliable docking-based screening platform and use any ultra-large modular library that can be represented as a combination of scaffolds and synthons. Such implementations may require custom adjustment of some parameters of the algorithm for optimal performance, opening many paths of further exploration of this approach.

Methods

Preparation of synthon and reaction libraries

The database of reactions and corresponding synthons was provided by Enamine (the version of May 2019). All of the reactions in the database can be separated into two categories: two-component and three-component reactions, based on the number of variable synthons. Synthons and reaction libraries were prepared for enumeration using ICM-Pro Molecular Modeling Software35 (Molsoft). For each reaction from the reaction database, a Markush structure representing a reaction scaffold with defined attachment points for substituent synthons was generated in SMILES format. Structures of possible synthons for each R group in each reaction were generated in 2D format with attachment points defined for enumeration. An example of a two-reagent reaction is the one-pot reductive amination of aldehydes with heteroaromatic amines36, as shown in Extended Data Fig. 15a. An example of a three-reagent reaction is the one-pot formation of thiazoles through asymmetrical thioureas37, shown in Extended Data Fig. 15b.

Enumeration of the combinatorial library

Enumeration of combinatorial libraries was performed using combinatorial chemistry tools implemented in ICM-Pro35. Markush structures for enumeration were derived from reaction SMARTS provided by Enamine.

Generation of the MEL

The MEL was generated to represent all possible scaffold–synthon combinations in Enamine REAL Space. Each compound in the MEL comprises a reaction scaffold enumerated with a single synthon, whereas the other attachment points are replaced with minimal synthons, or ‘caps’. The minimal chemically feasible synthon for every substituent in each reaction was selected as either methyl or phenyl, the latter when the reaction required an aromatic group. Minimal synthon atoms were labelled as 13C isotopes to facilitate computational analysis of docking poses (Extended Data Fig. 2).
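The size advantage of this construction can be illustrated with a small sketch. Because each MEL compound carries exactly one real synthon, the MEL for a reaction grows with the sum of the synthon counts, whereas the fully enumerated library grows with their product. The function names and the synthon counts below are hypothetical and chosen only for illustration.

```python
from math import prod


def full_library_size(synthon_counts):
    """Number of fully enumerated products for one reaction:
    every R group is varied independently."""
    return prod(synthon_counts)


def mel_size(synthon_counts):
    """Number of MEL compounds for one reaction: one real synthon at a
    time, with all other attachment points capped."""
    return sum(synthon_counts)


# Hypothetical three-component reaction (counts are illustrative only).
counts = [200, 300, 500]
print(full_library_size(counts))  # 30000000
print(mel_size(counts))           # 1000
```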

In two-component MEL generation, filters on molecular weight and cLogP were applied to remove MEL compounds with a molecular mass (MM) of >425 Da or cLogP > 5, which would be likely to result in fully enumerated compounds that violate Lipinski’s rule of 5. For three-component reactions, the size filters were set to MM < 350 Da on the first iteration of V-SYNTHES and to MM < 425 Da on the second.
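A minimal sketch of such a property filter is given below, assuming that molecular mass and cLogP have already been computed for each MEL compound. The class and function names are hypothetical, and the handling of values exactly at the cutoff is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MelCompound:
    name: str
    mol_mass: float  # molecular mass in Da (precomputed)
    clogp: float     # calculated logP (precomputed)


def passes_mel_filter(compound, max_mass=425.0, max_clogp=5.0):
    """Keep a MEL compound unless it exceeds the mass or cLogP cutoff.
    Defaults follow the two-component thresholds quoted in the text;
    treating values equal to the cutoff as passing is an assumption."""
    return compound.mol_mass <= max_mass and compound.clogp <= max_clogp


# Hypothetical compounds, for illustration only.
library = [
    MelCompound("frag_a", 310.4, 2.1),
    MelCompound("frag_b", 441.0, 3.0),  # too heavy
    MelCompound("frag_c", 398.2, 5.6),  # too lipophilic
]
kept = [c.name for c in library if passes_mel_filter(c)]
print(kept)
```

For the first iteration on three-component reactions, the same function could be called with `max_mass=350.0`.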

Generation of the random enumerated library

Random subsets of the REAL database for internal benchmarking were generated by enumeration of randomly selected synthons from each reaction. To create the 1-million-compound library for two-component reactions, 1% of synthons (a total of 6,418 synthons) were randomly selected, representing each R group in each reaction. For three-component reactions, 0.47% of synthons (a total of 512 synthons) were randomly selected for the 500,000-compound library, with no fewer than 1 synthon per Markush R group. The random libraries were filtered by Lipinski’s rule of five.
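The per-R-group sampling described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the data layout (a dictionary keyed by reaction and R group), the rounding rule and the function name are all hypothetical.

```python
import random


def sample_synthons(r_groups, fraction, seed=0):
    """Randomly pick a fraction of synthons for every R group,
    keeping at least one synthon per R group.

    r_groups maps a (reaction, r_group) key to a list of synthon IDs.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    sampled = {}
    for key, synthons in r_groups.items():
        k = max(1, round(len(synthons) * fraction))
        k = min(k, len(synthons))  # never request more than available
        sampled[key] = rng.sample(synthons, k)
    return sampled


# Hypothetical two-component reaction with 1,000 and 50 synthons.
r_groups = {
    ("rxn1", "R1"): list(range(1000)),
    ("rxn1", "R2"): list(range(50)),
}
subset = sample_synthons(r_groups, fraction=0.01)
print({key: len(ids) for key, ids in subset.items()})
```

The sampled synthons would then be enumerated against each other within each reaction to produce the random benchmark library.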

Selection of MEL candidates for CB1/CB2 for full enumeration

To select MEL candidates for further enumeration, the docking score and docking pose of each MEL candidate were analysed. The fragments were ranked by score and the top 1% were retained for further investigation. To distinguish productive from non-productive compound poses, the algorithm calculates the distances between the cap atoms of docked MEL candidates and the selected atoms (or dummy atoms) marking the dead-end subpockets in the protein-binding site. For the CB2 receptor pocket, three dead-end points were used to define potentially non-productive MEL ligands: the water molecule from the crystal structure and two dummy atoms, one placed between residues Phe106 and Lys109 and another between residues His95 and Leu182. MEL compounds with cap atoms closer than 4 Å to the ‘dead-end’ points were excluded from further consideration. Furthermore, to ensure the diversity of the final library, the best MEL candidates were filtered so that the final selection did not contain more than 20% of the MEL candidates from the same reaction.

For two-component reactions, the 819 best MEL candidates were selected for further enumeration resulting in a library of 1 million full compounds. For three-component reactions, two rounds of enumerations were required to arrive at full molecules. In the first round, the 1,043 best MEL candidates were used to produce 500,000 molecules with two real synthons and one minimal cap. After docking and analysis of these ligands, the 4,739 best molecules were selected for the final enumeration step resulting in 500,000 fully enumerated molecules.
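The selection logic of this section can be summarized in a short sketch. It assumes that docking scores and cap-atom coordinates are already available for each docked MEL candidate. The order of the filters, the interpretation of the 20% limit as a per-reaction cap on the selected set, and all names are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class DockedFragment:
    name: str
    reaction: str
    score: float      # docking score; lower (more negative) is better
    cap_coords: list  # (x, y, z) coordinates of cap atoms, in angstroms


def min_cap_distance(frag, dead_ends):
    """Shortest distance from any cap atom to any dead-end point."""
    return min(math.dist(a, d) for a in frag.cap_coords for d in dead_ends)


def select_fragments(frags, dead_ends, n_select,
                     top_fraction=0.01, cutoff=4.0, max_share=0.20):
    # 1. Rank by score and keep the top fraction.
    ranked = sorted(frags, key=lambda f: f.score)
    n_top = max(1, int(len(ranked) * top_fraction))
    # 2. Drop poses whose caps point into a dead-end subpocket.
    productive = [f for f in ranked[:n_top]
                  if min_cap_distance(f, dead_ends) >= cutoff]
    # 3. Cap the share of the selection taken by any one reaction.
    per_reaction_limit = max(1, int(n_select * max_share))
    chosen, counts = [], {}
    for f in productive:
        if len(chosen) == n_select:
            break
        if counts.get(f.reaction, 0) < per_reaction_limit:
            chosen.append(f)
            counts[f.reaction] = counts.get(f.reaction, 0) + 1
    return chosen
```

With `n_select=819` this would correspond to the two-component case described above, although the actual selection may have involved additional manual or software-specific steps not captured here.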

Receptor model preparation for CB2

Both V-SYNTHES and standard VLS used a structural model based on the CB2R crystal structure with an antagonist AM10257 at a resolution of 2.8 Å (PDB: 5ZTY)22. The structure was converted from PDB coordinates to the internal coordinates object using the ICM-Pro conversion tool by restoring missing heavy atoms and hydrogens, locally minimizing polar hydrogens, and optimizing His, Asn and Gln side-chain protonation states and rotamers. In the final step of selection, we also used ligand-optimized structural models for redocking of the top 1% hits. These refined models were generated in a ligand-guided receptor optimization procedure (LiBERO)38, which refined the side chains and water molecules within an 8 Å radius of the orthosteric binding pocket. Two binding modes for the CB2 receptor binding pocket were prepared: one guided by 20 known antagonists and another by 20 agonists, selected from ChEMBL high-affinity ligands for CB2 (CHEMBL253, affinity pKd > 8). These compounds, along with 200 decoy molecules that were selected from the CB2 receptor decoy database (GDD)39, were docked into the refined conformers. The conformers yielding the best area under the receiver operating characteristic curve were selected as the best LiBERO models. The two LiBERO models, along with the crystal structure model, were combined into one 4D model as described previously40. The 4D model was used for screening in both the V-SYNTHES iterative algorithm and standard VLS. In contrast to V-SYNTHES, standard VLS used a preassembled library of 115 million REAL compounds, including 100 million compounds from a lead-like subset of REAL and a diversity REAL subset of 15 million drug-like compounds41.

Docking and VLS for CB2

Docking simulations in both V-SYNTHES and standard VLS were performed using ICM-Pro molecular modeling software (Molsoft)35. Docking involves an exhaustive sampling of the molecule conformational space in the rectangular box that comprised the CB2 orthosteric binding pocket and was performed using the thoroughness parameter set to 2. Docking uses biased probability Monte Carlo optimization of the compound’s internal coordinates in the precalculated grid energy potentials of the receptor. The 4D model of the receptor pocket described above was used to sample three slightly different receptor conformations in a single docking run as implemented in ICM-Pro (Molsoft). Before the final selection of hits for experimental testing, the top 30,000 compounds from the screen were redocked into the model with higher thoroughness (5) to assure their comprehensive sampling.

V-SYNTHES enrichment factor for CB2

To evaluate the efficiency of the V-SYNTHES approach and compare it with standard VLS, we introduced an enrichment factor that provides a quantitative measurement of how the final library at step 4 of the algorithm is enriched in hits as compared to a library of the same size generated as a random subset of the Enamine REAL Space. For two-component reactions (500 million compounds), we compared random and enriched libraries of 1 million compounds. For three-component reactions (total 10.5 billion compounds), we compared random and enriched libraries of 0.5 million compounds. The enrichment is calculated for hits with docking scores equal to or better than a certain threshold X, and is defined as the following ratio:

$${\rm{Enrichment\; factor}}(X)=\frac{{\rm{No.}}\,{\rm{of}}\,{\rm{hits}}\,{\rm{with}}\,{\rm{scores}} < X\,{\rm{in\; SYNTHES}}\,}{{\rm{No.}}\,{\rm{of}}\,{\rm{hits}}\,{\rm{with}}\,{\rm{scores}} < X\,{\rm{in\; standard\; VLS}}}$$

The enrichment factor at the docking score threshold that selects 100 candidate hits in V-SYNTHES, designated EF100, can be used as a single-value practical metric of the algorithm performance.
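As an illustration, the enrichment factor and EF100 could be computed from two lists of docking scores as sketched below. The function names are hypothetical. The sketch counts scores equal to or better than the threshold, following the wording of the definition, and it returns infinity when the reference library contains no hits at that threshold, which is a convention chosen here rather than one stated in the text.

```python
import math


def count_hits(scores, threshold):
    """Count scores equal to or better (more negative) than the threshold."""
    return sum(1 for s in scores if s <= threshold)


def enrichment_factor(enriched_scores, reference_scores, threshold):
    """Ratio of hit counts in the enriched and reference libraries,
    which are assumed to be of the same size."""
    reference_hits = count_hits(reference_scores, threshold)
    if reference_hits == 0:
        return math.inf  # convention for an empty reference hit list
    return count_hits(enriched_scores, threshold) / reference_hits


def ef_at_n(enriched_scores, reference_scores, n=100):
    """Enrichment at the threshold that selects the n best enriched hits."""
    threshold = sorted(enriched_scores)[n - 1]
    return enrichment_factor(enriched_scores, reference_scores, threshold)


# Toy scores, for illustration only.
enriched = [-50.0, -45.0, -40.0, -35.0, -30.0]
reference = [-41.0, -30.0, -20.0, -10.0, -5.0]
print(ef_at_n(enriched, reference, n=3))  # 3.0
```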

Generating initial SAR for selected CB2 hits

Chemical search for analogues of the best compounds 523, 610 and 673 in REAL Space was performed using REALSpaceNavigator16. Compounds with a Tanimoto distance less than 0.3 (<0.4 for 673) were selected for docking. The following criteria were used to select top-scoring compounds for each parent molecule: docking scores better than −30 (−25 for 673), cLogP < 5, cLogS > −5, MM < 500 Da and Tanimoto distance to known CB1/CB2 ligands >0.3. Furthermore, the 20,000 top hits from initial V-SYNTHES screening were reanalysed and the best molecules generated from the same fragments as 523, 610 and 673 were added to the final list. The numbers of analogues selected for synthesis were as follows: 49 compounds for 523 (49 compounds synthesized), 42 compounds for 610 (38 compounds synthesized) and 30 compounds for 673 (17 compounds synthesized).
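The selection criteria above can be expressed as a small sketch. The Tanimoto distance is taken here as one minus the Tanimoto similarity of two fingerprints represented as sets of on-bits. The fingerprint type used in the study is not stated, so the representation, the function names and the example values are assumptions.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def passes_sar_criteria(score, clogp, clogs, mol_mass, fp, known_ligand_fps,
                        score_cutoff=-30.0, novelty_cutoff=0.3):
    """Apply the score, property and novelty filters quoted in the text.
    score_cutoff would be -25.0 for analogues of compound 673."""
    novel = all(tanimoto_distance(fp, known) > novelty_cutoff
                for known in known_ligand_fps)
    return (score < score_cutoff and clogp < 5 and clogs > -5
            and mol_mass < 500 and novel)


# Toy fingerprints, for illustration only.
candidate = {1, 2, 3, 4}
known = [{3, 4, 5, 6}]
print(round(tanimoto_distance(candidate, known[0]), 3))
print(passes_sar_criteria(-32.0, 3.0, -4.0, 400.0, candidate, known))
```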

Parallel synthesis

Parallel one-pot synthesis for all compounds in this study was performed by Enamine in 5 weeks with >90% purity guaranteed, as described in the Supplementary Methods. This includes (1) candidate CB compounds from the initial V-SYNTHES round (60 synthesized out of 80 ordered); (2) SAR-by-catalogue compounds (104 out of 121); (3) compounds from the benchmark full screen of the 115-million-compound REAL library (97 out of 109); and (4) ROCK1 candidate compounds (21 synthesized out of 24 ordered).

Functional potency in CB1/CB2 Tango assays

The Tango arrestin recruitment assays were performed as previously described23. In brief, HTLA cells were transiently transfected with human CB1 or CB2 Tango DNA construct overnight in DMEM supplemented with 10% FBS, 100 µg ml−1 streptomycin and 100 U ml−1 penicillin. The transfected cells were then plated into poly-l-lysine-coated 384-well white clear-bottom cell culture plates in DMEM containing 1% dialysed FBS at a density of 10,000–15,000 cells per well. After incubation for 6 h, drug solutions prepared in DMEM containing 1% dialysed FBS were added to the plates for overnight incubation. Specifically for the antagonist assay, 100 nM of CP55940 was added after 30 min of incubation with the drugs. On the day of the assay, medium and drug solutions were removed and 20 µl per well of BrightGlo reagent (Promega) was added. The plates were further incubated for 20 min at room temperature and counted using the Wallac TriLux Microbeta counter (PerkinElmer). The results were analysed using GraphPad Prism 9. Each experiment was performed in triplicate and functional Ki values were determined from three independent experiments and are expressed as the mean of the three values.

Radioligand binding in CB1/CB2-binding assays

The affinities (Ki) of the new compounds for rat CB1 receptors and human CB2 receptors were obtained using membrane preparations from rat brain or HEK293 cells, respectively, and [3H]CP-55,940 as the radioligand, as previously described42,43. Results from the competition assays were analysed using nonlinear regression to determine the IC50 values for the ligand; Ki values were calculated from the IC50 using GraphPad Prism 9. Each experiment was performed in triplicate and Ki values were determined from three independent experiments and are expressed as the mean of the three values.
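The text does not state which relation was used to convert IC50 to Ki. For competition binding with a radioligand, a commonly used conversion is the Cheng–Prusoff equation, shown here only as an assumed example, where [L] is the radioligand concentration and K_d is the dissociation constant of the radioligand for the receptor:

```latex
K_i = \frac{\mathrm{IC}_{50}}{1 + \dfrac{[L]}{K_d}}
```

Under this relation, Ki approaches IC50 when the radioligand concentration is well below its K_d.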

PRESTO-Tango GPCRome

Screening of the compounds in the PRESTO-Tango GPCRome was performed as previously described23 with modifications. First, HTLA cells were plated in poly-l-lysine-coated 384-well white plates in DMEM containing 1% dialysed FBS for 6 h. Next, the cells were transfected with 20 ng per well PRESTO-Tango receptor DNAs overnight. Drugs (10 µM) were then added to the cells without changing the medium and the cells were incubated for another 24 h. Each target was assigned four wells for basal signal and four wells for sample. The remaining steps of the PRESTO-Tango protocol23 were followed. The results were plotted as fold change over the average basal signalling activity for individual receptors in GraphPad v.9.0. For the receptors that had greater than threefold basal signalling activity, assays were repeated as a full dose–response assay and the results were plotted as a percentage of reference compounds.

V-SYNTHES applied to ROCK1 screen

The MEL was docked into the ROCK1 crystal structure (PDB: 2ETR)44 prepared in ICM-Pro. The 20,000 best-scoring fragments were then screened for their hydrogen bond interactions with the hinge region of ROCK1, residues Glu154 and Met156. To eliminate potentially non-productive fragments in the enumeration step, all fragments with capping atoms within 4.6 Å of these hinge-region residues were removed, leaving about 5,000 compounds for enumeration with full synthons. Docking of the 1 million fully enumerated compounds resulted in the top 30,000 compounds with docking scores ranging from −35 kJ mol−1 to −50 kJ mol−1. The vast majority of them (>99%) retained hydrogen bonding to hinge region residues, showing that the full molecules maintain the binding properties predicted for MEL fragment selection. The remaining compounds were filtered by PAINS score, drug-likeness properties, chemical diversity and ligand interaction diversity to sample different binding modes in the pocket. We selected 24 compounds for purchase from Enamine, of which 21 were successfully synthesized with a purity >90% and were delivered in under 6 weeks.

ROCK1 functional and binding assays

The HotSpot radiometric assay (Reaction Biology Corporation) measures inhibition of ROCK1 catalytic activity towards a specific peptide substrate (KEAKEKRQEQIAKRRRLSSLRASTSKSGGSQK), which is monitored by P81 filter-binding methods45. All compounds were tested in triplicate at a starting concentration of either 100 µM or 90 µM in the presence of 1 µM ATP and diluted threefold for a total of ten doses.
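The concentration range covered by such a dilution scheme follows from simple arithmetic, sketched below. The function name is hypothetical and the printed values are derived only from the starting concentration, dilution factor and number of doses quoted above.

```python
def dilution_series(start_um, fold=3.0, n_doses=10):
    """Concentrations (µM) for a serial dilution: each dose is the
    previous one divided by the dilution factor."""
    return [start_um / fold ** i for i in range(n_doses)]


series = dilution_series(100.0)
print(len(series))        # 10
print(series[0])          # 100.0
print(series[-1] * 1000)  # lowest dose in nM, about 5.08
```

Ten threefold steps from 100 µM therefore span roughly four orders of magnitude, down to about 5 nM.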

The KdELECT assay (Eurofins/DiscoverX) measures quantitative binding (Kd) of compounds to ROCK1 in competition with an immobilized active-site-directed ligand. Binding is determined by using qPCR to measure the amount of kinase captured by the immobilized ligand relative to control samples. Soluble compounds that bind specifically to ROCK1 prevent the kinase from binding to the immobilized ligand. Our compounds were tested in triplicate as an 11-point dose–response curve at a starting concentration of 30 µM. IC50 was calculated and graphed using a nonlinear regression curve in GraphPad Prism 8.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.