Introduction

Activity cliffs can be defined as pairs of similar compounds with very large differences in activity [1]. Such compound pairs are of interest in medicinal chemistry because they are associated with high SAR information content [2, 3]. Activity cliffs violate the similarity-property principle and are usually incorrectly predicted by quantitative structure–activity relationship (QSAR) models [1, 4, 5]; therefore, their visualization and quantification in screening datasets are of interest in SAR and QSAR studies [2]. Graphical methods designed to detect activity cliffs are inspired by the emerging concept of the activity landscape [2]. An activity landscape is defined as any representation that integrates similarity and potency relationships between compounds sharing the same biological activity [3, 6]. These methods include structure–activity similarity (SAS) maps [7], network-like similarity graphs (NSGs) [8], and structure–activity landscape index (SALI) graphs [9]. In addition to graphical methods, numerical SAR analysis functions have been introduced. These functions help to characterize the SAR nature of molecular databases based on indices, e.g., the Structure–Activity Landscape Index (SALI) and the SAR Index (SARI) [9–11]. Furthermore, some methods to characterize local SAR information and activity cliffs have been reported [12, 13]. In a recent work, Kayastha et al. proposed local SAS maps as representations formed by plotting all compound pairs formed by a specific compound [12]. This concept is related to our previous work, where an activity cliff generator was defined as a molecule with a high probability of forming activity cliffs with structurally similar molecules tested in the same biological assay [13].
Additionally, Hu and Bajorath proposed the structural categorization of activity cliffs as R-group cliffs (compound pairs with the same scaffold and different R-groups), chirality and topology cliffs (compound pairs with the same scaffold and R-groups), and scaffold cliffs (compound pairs with different scaffolds and the same R-groups) [14]. As part of our continued effort to further advance the concept of local activity landscapes and their characterization, i.e., activity cliff detection, herein we use a chemotype-based classification to define local activity landscapes. Chemotype-based activity landscapes were then analyzed employing the concepts of activity cliff enrichment factor (ACEF) and activity cliff generators. To illustrate the approach, three screening datasets of different sizes were analyzed, comprising compounds screened against three monoamine transporters (MATs): dopamine (DAT), norepinephrine (NET), and serotonin (SERT) transporters; two cyclooxygenases (COX-1 and COX-2); and three peroxisome proliferator-activated receptors (PPAR\(\upalpha \), PPAR\(\updelta \), and PPAR\(\upgamma \)). MATs have received considerable attention as targets for psychiatric and neurological disorders such as depression, attention deficit hyperactivity disorder, Parkinson’s disease, and schizophrenia [15, 16]. Cyclooxygenases are important since they are targeted by non-steroidal anti-inflammatory drugs and have been proposed as targets for cancer therapy in combination with other chemotherapeutic agents [17, 18]. Finally, the activation of PPARs offers a promising strategy for the treatment of diabetes mellitus, obesity, and related cardiovascular complications [19–21].

Materials and methods

Dataset

Three molecular databases previously used for activity landscape modeling studies were used in this work as benchmark datasets [21–24]. The first database includes 299 compounds tested against three monoamine transporters and was recently used by Dimova et al. [22]. Each molecule in the dataset has reported \(K_{i}\) values against DAT, NET, and SERT. The second set has 658 cyclooxygenase inhibitors obtained from the Binding Database [25–27]. These compounds have reported activity \((\hbox {IC}_{50})\) against COX-1 and COX-2. The third set comprises 168 PPAR ligands, with \(\hbox {IC}_{50}\) values against the \(\upalpha \), \(\updelta \), and \(\upgamma \) receptors, obtained from the Binding Database. Summary statistics of the activities for each target are presented in the Supporting Information (Table S1).

Molecular classification into chemotype classes

The chemotype of each molecule in the datasets was calculated using molecular equivalence indices (MEQI) at cyclic system resolution [28–32]. Compounds with the same chemotype were grouped together. The generation of chemotypes at cyclic system resolution consists of removing all side chains from the molecule, while bond orders and atom types are conserved, as shown in Fig. 1. Exocyclic bonds of carbonyls, imines, sulfones, and sulfoxides were considered part of the cyclic system if they are directly bonded to a ring, as were all functional groups forming linkers between two rings. Although cyclic system resolution was considered in this work, different resolutions or any other definition of molecular scaffold can be used as long as they are generated in a consistent manner.

Fig. 1 Example of the generation of chemotypes using cyclic systems resolution level. The resulting chemotypes were used to group compounds with the same molecular scaffold

Molecular similarity

Molecular similarities were calculated using eleven 2D fingerprints implemented in MayaChemTools [33] and two 3D shape methods available in Rapid Overlay of Chemical Structures (ROCS) [34]. For 2D and 3D representations, the Tanimoto coefficient was used to calculate \(N(N-1)/2\) pairwise structural similarities \((SS_{ij})\) for each pair of molecules i and j [35, 36]. The 2D fingerprints used, all implemented as Perl scripts in MayaChemTools, were atom neighborhoods [37], atom types, electrotopological state indices (EstateIndices) [38], extended connectivity (ECFP4) [39], MACCS (322 bits) [40], path length, topological atom pairs (TopAtomPairs) [41], topological atom torsions (TopAtomTorsions) [42], topological atom triplets (TopAtomTriplets), topological pharmacophore atom pairs (TopPh4AtomPairs) [43], and topological pharmacophore atom triplets (TopPh4AtomTriplets) [44]. To apply 3D similarity methods, the structures were optimized to obtain a single low-energy conformation using a modified Dreiding force field implemented in the Vconf software [45, 46]. Although the use of multiple conformations is desirable for activity landscape studies [47], we used a single low-energy conformation to calculate 3D similarity representations as an approximation, in order to simplify this work by introducing as few variables as possible. The 3D representations included the Tanimoto Shape Index (ShapeTanimoto) and ComboScore/2 (mean value of ShapeTanimoto and Color Score) [48, 49]. Despite the inherent conformational issues, the use of 3D structural representations is valuable in activity landscape studies [50].
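As an illustration of the pairwise calculation described above, the following minimal Python sketch computes the Tanimoto coefficient for binary fingerprints represented as sets of on-bit indices and enumerates all \(N(N-1)/2\) pairs. It is not the MayaChemTools or ROCS implementation; the function names, the toy fingerprints, and the convention of returning 0 for two empty fingerprints are assumptions made here for illustration only.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when both fingerprints are empty
    return len(a & b) / union


def pairwise_similarities(fingerprints):
    """Return {(i, j): SS_ij} for every pair with i < j, i.e. N(N-1)/2 values."""
    return {
        (i, j): tanimoto(fingerprints[i], fingerprints[j])
        for i, j in combinations(range(len(fingerprints)), 2)
    }


# Toy fingerprints with made-up bit indices (hypothetical data).
toy_fps = [{1, 2, 3, 4}, {2, 3, 4, 5}, {7, 8}]
toy_ss = pairwise_similarities(toy_fps)
```

For a dataset of 299 compounds, this enumeration yields 299 × 298 / 2 = 44,551 pairs.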

Since structural similarities depend strongly on the molecular representation used [50], three uncorrelated structure similarity representations, namely MACCS keys, TopPh4AtomPairs, and ComboScore/2, were combined into a mean similarity measure, following the principles of data fusion [36, 51]. Similar criteria for selecting fingerprint representations to obtain consensus models of the activity landscape have been used in several studies and extensively discussed [52].
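The mean fusion step can be sketched as follows, assuming each representation is available as a dictionary mapping a compound pair (i, j) to its similarity. The function name and the similarity values are hypothetical and serve only to illustrate the averaging.

```python
def mean_similarity(*similarity_maps):
    """Average several {(i, j): similarity} dictionaries pair by pair."""
    if not similarity_maps:
        raise ValueError("at least one representation is required")
    pairs = set(similarity_maps[0])
    if any(set(m) != pairs for m in similarity_maps[1:]):
        raise ValueError("all representations must cover the same pairs")
    n = len(similarity_maps)
    return {pair: sum(m[pair] for m in similarity_maps) / n for pair in pairs}


# Made-up similarity values for two compound pairs (hypothetical data).
maccs = {(0, 1): 0.90, (0, 2): 0.30}
ph4_pairs = {(0, 1): 0.60, (0, 2): 0.20}
combo_half = {(0, 1): 0.75, (0, 2): 0.40}
fused = mean_similarity(maccs, ph4_pairs, combo_half)
```

An unweighted mean treats the three representations equally; other fusion rules (e.g., maximum or rank-based fusion) would require a different aggregation function.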

Activity differences

For each dataset of N compounds tested against target T, the \(N(N-1)/2\) absolute pairwise activity differences, one for each possible pair of compounds in the dataset, were calculated from the following equation [21, 53]:

$$\begin{aligned} |\Delta A(T)_{i,j}| = |A(T)_{i} - A(T)_{j}| \end{aligned}$$
(1)

where \(A(T)_{i}\) and \(A(T)_{j}\) are the activities of the ith and jth molecules \((j > i)\), in \(\mathrm{pIC}_{50}\) or p\(K_{i}\) values, tested against the target T. In this work, T = COX-1, COX-2, NET, SERT, DAT, PPAR\(\upalpha \), PPAR\(\updelta \), and PPAR\(\upgamma \).
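Equation 1 can be sketched in Python as shown below. The conversion helper assumes that the raw \(K_{i}\) or \(\hbox {IC}_{50}\) values are given in nM, so that the negative logarithm of the molar value equals 9 minus the base-10 logarithm of the nM value; the function names and the toy activities are hypothetical.

```python
import math
from itertools import combinations


def to_p_value(value_nm):
    """Convert Ki or IC50 in nM to pKi or pIC50 (assumes nM input)."""
    return 9.0 - math.log10(value_nm)


def activity_differences(p_values):
    """Return {(i, j): |A_i - A_j|} for every pair with i < j (Eq. 1)."""
    return {
        (i, j): abs(p_values[i] - p_values[j])
        for i, j in combinations(range(len(p_values)), 2)
    }


# Made-up Ki values in nM (hypothetical data).
toy_ki_nm = [1.0, 10.0, 1000.0]
toy_p = [to_p_value(v) for v in toy_ki_nm]
toy_diffs = activity_differences(toy_p)
```

On this scale, a difference of 1 corresponds to a tenfold change in potency.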

Activity landscape with SAS maps

SAS maps were generated by plotting the structural similarity on the X-axis against the absolute value of the activity difference on the Y-axis for each pair of compounds; thus, for a set of N compounds tested against target T, each SAS map contains \(N(N-1)/2\) data points [13]. As shown in Fig. 2, SAS maps can be roughly divided into four zones (I–IV) by imposing activity difference and molecular similarity thresholds to aid their interpretation [13]. The X-axis was divided using, as a heuristic threshold, the lowest similarity value observed among the top 10 % of compound pairs with the highest similarity. The activity difference (Y-axis) threshold was set to 1 log unit, which defines the boundaries between regions I/III and II/IV. Other criteria to define structure similarity and activity difference thresholds can be employed, as reviewed by Medina-Franco [52]. Points that fall in each region can be associated with different SAR. Briefly, data points in region I are associated with scaffold hops, having low structure similarity and low activity differences [52]. Points in region II are associated with smooth SAR, having high structure similarity and low activity differences [52]. Region III corresponds to low structure similarity and high activity differences. Region IV contains molecular pairs that correspond to activity cliffs, having high structure similarity and high activity differences [1, 3, 52].
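The region assignment can be sketched as follows. The handling of values that fall exactly on a threshold (treated here as "high") and the truncation used to determine the size of the top 10 % are assumptions of this sketch, as are the function names and the toy similarity values.

```python
def similarity_threshold(similarities, top_fraction=0.10):
    """Lowest similarity among the top fraction of most similar pairs."""
    ranked = sorted(similarities, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))  # assumed rounding: truncate
    return ranked[k - 1]


def sas_region(similarity, activity_diff, sim_cut, act_cut=1.0):
    """Assign a pair to region I, II, III, or IV of a SAS map."""
    high_sim = similarity >= sim_cut      # assumed: boundary counts as high
    high_diff = activity_diff >= act_cut  # assumed: boundary counts as high
    if high_sim:
        return "IV" if high_diff else "II"
    return "III" if high_diff else "I"


# Made-up pairwise similarities (hypothetical data).
toy_sims = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
cut = similarity_threshold(toy_sims)
```

With these thresholds, a pair with similarity 0.95 and an activity difference of 2 log units is assigned to region IV, i.e., it is counted as an activity cliff.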

Fig. 2 General form of the structure–activity similarity (SAS) maps showing four major regions. Regions I and II are associated with scaffold hopping and smooth SAR, respectively. Region IV indicates discontinuous SAR and activity cliffs

SAS maps defined with chemotype information

All compound pairs in each database were distinguished depending on whether both molecules in the pair share the same chemotype or not. Data points where both compounds contain the same chemotype were further differentiated in the SAS maps using different colors. In this work, only chemotypes containing at least ten molecules were analyzed. Each chemotype \(\lambda \) comprises \(N_{\lambda }(N_{\lambda }-1)/2\) pairwise comparisons of structure similarity and activity differences, where \(N_{\lambda } (N_{\lambda } \ge 10)\) is the number of compounds classified in chemotype \(\lambda \).
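A minimal sketch of this bookkeeping is given below, assuming that each compound carries a chemotype label (for example, a MEQI code) stored in a list indexed by compound. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_chemotypes(chemotypes, min_count=10):
    """Return {chemotype: frequency} for chemotypes with enough compounds."""
    counts = Counter(chemotypes)
    return {c: n for c, n in counts.items() if n >= min_count}


def same_chemotype_pairs(chemotypes, min_count=10):
    """Return {chemotype: [(i, j), ...]} for pairs sharing a frequent chemotype."""
    keep = frequent_chemotypes(chemotypes, min_count)
    pairs = {c: [] for c in keep}
    for i, j in combinations(range(len(chemotypes)), 2):
        if chemotypes[i] == chemotypes[j] and chemotypes[i] in keep:
            pairs[chemotypes[i]].append((i, j))
    return pairs


# Toy labels: chemotype "A" has ten compounds, "B" only three (hypothetical data).
toy_labels = ["A"] * 10 + ["B"] * 3
toy_pairs = same_chemotype_pairs(toy_labels)
```

A chemotype with ten compounds contributes 10 × 9 / 2 = 45 same-chemotype pairs, whereas the three compounds of chemotype "B" are excluded by the frequency filter.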

Activity cliff enrichment factor

For each chemotype \(\lambda \), activity cliff enrichment factor (ACEF) was calculated using the following equation:

$$\begin{aligned} \mathrm{ACEF}(\lambda )=\frac{\mathrm{AC}(x_\lambda )}{\mathrm{AC}(x_\chi )} \end{aligned}$$
(2)

where \(\mathrm{ACEF}(\lambda )\) is the activity cliff enrichment factor for the \(\lambda \)th chemotype, which relates the fraction of activity cliffs formed with molecules that fall in the \(\lambda \)th chemotype \(\mathrm{AC}(x_\lambda )\) with respect to the fraction of activity cliffs in the entire database \(\mathrm{AC}(x_\chi )\).

\(\mathrm{AC}(x_\lambda )\) was calculated as follows:

$$\begin{aligned} \mathrm{AC}(x_\lambda )=\frac{|C_\lambda ^{IV} |}{|C_\lambda |} \end{aligned}$$
(3)

where \(|C_\lambda ^{IV} |\) is the number of activity cliffs (pairs of compounds that fall in region IV in the SAS maps) in the chemotype class \(\lambda \), and \(|C_\lambda |\) is the total number of pairs of compounds in the same chemotype class \((\lambda )\).

\(\mathrm{AC}(x_\chi )\) was calculated as follows:

$$\begin{aligned} \mathrm{AC}(x_\chi )=\frac{|C_\chi ^{IV} |}{|C_\chi |} \end{aligned}$$
(4)

where \(|C_\chi ^{IV} |\) is the total number of activity cliffs, and \(|C_\chi |\) is the total number of pairs in the database. Notably, \(\mathrm{AC}(x_\chi )\) is the fraction of all compound pairs in the database that form activity cliffs (as defined in the SAS maps), and it does not depend on the chemotype classification. Similar indices were previously reported to identify chemotypes enriched with active and selective compounds [23, 54].
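Equations 2–4 can be sketched in Python as follows, assuming that each compound pair has already been assigned a SAS map region label ("I" to "IV"). The function names and the toy region lists are hypothetical; raising an error when the database contains no activity cliffs is a choice made here, since the ratio is undefined in that case.

```python
def cliff_fraction(regions):
    """Fraction of pairs that fall in region IV (Eqs. 3 and 4)."""
    regions = list(regions)
    if not regions:
        raise ValueError("no pairs given")
    return sum(r == "IV" for r in regions) / len(regions)


def acef(chemotype_regions, database_regions):
    """Activity cliff enrichment factor of a chemotype (Eq. 2)."""
    baseline = cliff_fraction(database_regions)
    if baseline == 0:
        raise ValueError("database has no activity cliffs; ACEF is undefined")
    return cliff_fraction(chemotype_regions) / baseline


# Toy example: 5 % of all pairs are cliffs, 50 % of the chemotype pairs are.
toy_db = ["IV"] * 5 + ["II"] * 45 + ["I"] * 50
toy_chemotype = ["IV"] * 5 + ["II"] * 5
toy_acef = acef(toy_chemotype, toy_db)
```

In this toy case the chemotype is ten times richer in activity cliffs than the database as a whole, and applying the function to the whole database against itself gives a value of one.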

Activity cliff generators based on chemotypes

We computed the number of activity cliffs formed by each compound x with chemotype \(\lambda \) \((\hbox {NAC}_{x\lambda })\) as the number of times the compound appears in data points that fall in region IV of the SAS maps, provided that the corresponding chemotype frequency \((\hbox {CF}_{\lambda })\) is equal to or greater than 10. \(\hbox {NAC}_{x\lambda }\) was scaled by the chemotype frequency using the expression \(\hbox {NAC}_{x\lambda }/\hbox {CF}_{\lambda }\). Compounds with \(\hbox {NAC}_{x\lambda }/\hbox {CF}_{\lambda }\) values at least two standard deviations above the mean of the distribution of all \(\hbox {NAC}_{x\lambda }/\hbox {CF}_{\lambda }\) values were considered activity cliff generators.
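The selection of activity cliff generators can be sketched as follows. Several details are assumptions of this sketch rather than statements of the method: the cliff pairs passed in are taken as given (whether they are restricted to same-chemotype pairs is left to the caller), the sample standard deviation is used, and the function name and toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def cliff_generators(cliff_pairs, chemotype_of, min_count=10, n_sd=2.0):
    """Return {compound: NAC/CF} for compounds at least n_sd SDs above the mean."""
    freq = Counter(chemotype_of.values())  # CF for each chemotype
    nac = Counter()                        # NAC for each compound
    for i, j in cliff_pairs:
        nac[i] += 1
        nac[j] += 1
    # Only compounds whose chemotype is frequent enough are considered.
    ratios = {
        x: nac[x] / freq[c]
        for x, c in chemotype_of.items()
        if freq[c] >= min_count
    }
    if len(ratios) < 2:
        return {}
    cutoff = mean(ratios.values()) + n_sd * stdev(ratios.values())
    return {x: r for x, r in ratios.items() if r >= cutoff}


# Toy data: 12 compounds of chemotype "A"; compound 0 forms six cliffs.
toy_chemotype_of = {x: "A" for x in range(12)}
toy_cliffs = [(0, k) for k in range(1, 7)]
toy_generators = cliff_generators(toy_cliffs, toy_chemotype_of)
```

In the toy data, compound 0 has a ratio of 6/12 = 0.5, which exceeds the cutoff of roughly 0.36, so it is the only compound flagged.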

Results and discussion

Chemotype classification

Molecules in each database were classified into cyclic systems. Figure 3 shows the most common cyclic systems in each database. In this work, only cyclic systems found in at least ten molecules (frequency \(\ge \)10) were considered relevant, since they correspond to the most common molecular scaffolds. For compounds screened against monoamine transporters, 91 cyclic systems were identified, of which nine have high frequency (\(\ge \)10). Also, 191 cyclic systems were found for cyclooxygenase inhibitors, and 16 of them have high frequency. Furthermore, the PPAR ligands database contains 66 cyclic systems, of which only three have high frequency.

Fig. 3 Most frequent cyclic systems found in the analyzed databases (frequency \(\ge \)10). Chemotype identifier and frequency are displayed. a Compounds screened against monoamine transporters. b Cyclooxygenase inhibitors. c PPAR ligands

Molecular similarity

For each database, pairwise structural similarities were calculated using 13 structural representations. This resulted in 44,551 pairwise structural similarities for the 299 compounds tested against monoamine transporters, 216,153 for the 658 cyclooxygenase inhibitors, and 14,028 for the 168 PPAR ligands. Cumulative distribution functions for each representation and for the mean similarity are shown in Fig. 4. The results show that most of the 2D and 3D representations have different ranges of similarity values. This is consistent with previous studies, in which different ranges of similarity values were identified for molecular fingerprints based on different designs [13, 20, 45]. The statistics of structural similarities are summarized in Table S2 in the Supporting Information, which reports the maximum, third and first quartiles, median, mean, and standard deviation. The results indicate that atom neighborhoods, ExtendedConnectivity, and TopAtomTriplets have the lowest similarity values for all databases analyzed. Also, median values are close to mean values in all cases. Orthogonal representations were selected taking into account the correlation of pairwise similarities between different descriptors and their ability to capture molecular features by different methodologies. The correlation matrices for the different 2D and 3D representations for each database can be found in the Supporting Information (Tables S3–S5). Three weakly correlated representations, namely MACCS keys, TopPh4AtomPairs, and ComboScore/2, were selected to compute the mean similarity. Notably, all selected representations show low correlation with each other in all databases studied (correlation \(<\) 0.6). MACCS and TopPh4AtomPairs have correlations of 0.25, 0.31, and 0.53 for the MATs, COXs, and PPARs databases, respectively. Also, MACCS and ComboScore/2 have correlations of 0.28, 0.50, and 0.38 for the same databases, respectively.
Finally, TopPh4AtomPairs and ComboScore/2 have correlations of 0.33, 0.32, and 0.30, respectively. It is worth mentioning that the selected structure representations are able to capture different molecular information since they are based on different designs. MACCS is based on substructures, TopPh4AtomPairs is based on atom pairs, and ComboScore/2 is based on 3D molecular shape and color features (related to 3D pharmacophoric elements) [40, 43, 48, 49].
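The correlation check used to select weakly correlated representations can be sketched as follows. The type of correlation coefficient is not specified in the text; the Pearson coefficient is assumed here, and the absolute value is compared with the 0.6 cutoff so that strong negative correlations are also rejected. The function names and similarity vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences of similarities."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def weakly_correlated(representations, cutoff=0.6):
    """True if every pair of representations has |r| below the cutoff."""
    return all(
        abs(pearson(representations[a], representations[b])) < cutoff
        for a, b in combinations(representations, 2)
    )


# Made-up pairwise similarities for the same four compound pairs.
toy_reps = {
    "rep_a": [0.1, 0.2, 0.3, 0.4],
    "rep_b": [0.4, 0.1, 0.3, 0.2],
}
toy_r = pearson(toy_reps["rep_a"], toy_reps["rep_b"])
```

Each list must contain the similarities of the same compound pairs in the same order for the correlation to be meaningful.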

Fig. 4 Cumulative distribution functions of structural similarities using different 2D and 3D fingerprint representations. a 299 compounds screened against monoamine transporters (44,551 structural similarities), b 658 compounds with activity against COX-1 and COX-2 (216,153 structural similarities), c 168 compounds screened against PPAR \(\upalpha \), \(\updelta \), and \(\upgamma \) (14,028 structural similarities)

Activity landscape with SAS maps

Figure 5 shows a general overview of the activity landscape with SAS maps for all databases analyzed. It is worth noting that the point distributions differ among the databases under study. SAS maps for monoamine transporters and cyclooxygenases are characterized by compound pairs that cover a wide range of mean molecular similarity and potency differences. In other words, these databases are characterized by structurally diverse compounds that cover a wide potency range (see Tables S6–S7 in the Supporting Information). In contrast, SAS maps for PPARs are characterized by lower activity differences and the narrowest coverage of structure similarity. Therefore, this database is characterized by structurally similar compounds and low potency variations as compared with compounds tested against cyclooxygenases and monoamine transporters. This indicates that a smaller part of the activity landscape is captured by this database and that it has the most continuous SAR among the databases included in this study.

Fig. 5 SAS maps for databases analyzed in this study. Each data point indicates a pairwise comparison of 299 compounds (44,551 data points) screened against monoamine transporters (NET, SERT, and DAT), 658 compounds (216,153 data points) with activity against cyclooxygenases (COX-1 and COX-2), and 168 compounds (14,028 data points) screened against PPARs (\(\upalpha \), \(\updelta \), and \(\upgamma \)). Mean similarity was calculated using three selected similarity representations in all cases (MACCS keys, TopPh4AtomPairs, and ComboScore/2)

Mapping chemotypes in SAS maps

Although SAS maps give a general overview of the activity landscape, extracting specific SAR information from these representations is not straightforward; therefore, additional information, such as chemotype information, can be mapped onto SAS maps to simplify their interpretation. SAS maps including chemotype information visually depict the local activity landscape of subsets of compounds sharing the same chemotype. Using this approach, it is possible to analyze the distribution of the pairs of compounds for each frequent cyclic system and then identify the cyclic systems with the highest fraction of activity cliffs. Also of interest is the detection of the most common cyclic systems present in compounds that frequently form activity cliffs, that is, activity cliff generators. While the detection of chemotypes enriched in activity cliffs gives a general overview of the local activity landscape based on chemotypes, i.e., a continuous or discontinuous chemotype landscape, the detection of activity cliff generators highlights the small structural changes in the substituents of a specific scaffold that lead to large activity differences.

Figure 6 shows some examples of SAS maps in which the points where both molecules in the pair share the same chemotype are highlighted with a color code. As described in the Methods, SAS maps in Fig. 6 were divided into four major regions (I–IV) using a threshold value of 1 for activity difference and the top 10 % of pairs with the highest structure similarity for each database. Only the SAS maps for NET, COX-2, and PPAR\(\upgamma \) are shown in Fig. 6; however, other biological activities and databases can be analyzed similarly.

Fig. 6 SAS maps including chemotype information for databases analyzed in this study. Each data point indicates a pairwise comparison of 299 compounds (44,551 data points) screened against the norepinephrine transporter (a–c), 658 compounds (216,153 data points) with activity against cyclooxygenase-2 (d–f), and 168 compounds (14,028 data points) screened against PPAR\(\upgamma \) (g–i). Representative chemotypes are highlighted with a color code; each is represented by \(N_{\lambda }(N_{\lambda }-1)/2\) pairwise comparisons of structure similarity and potency differences, where \(N_{\lambda }\) is the number of compounds classified in chemotype \(\lambda \). The chemotype code and the number of data points where both compounds fall in a particular chemotype are indicated at the top of each map

Different distributions in regions I–IV were observed in the SAS maps for each chemotype class shown in Fig. 6. Note that, in general, most of the data points are in region II. This is not surprising since molecules sharing the same scaffold generally have high mean structure similarity (see Table S7 in the Supporting Information). However, some chemotypes have different distributions in regions I–IV, as illustrated in Fig. 6. Figure 6a–c shows local chemotype activity landscapes for compounds that fall in three chemotypes for the NET dataset. In this case, pairs that fall in chemotype KZZH1 are distributed between regions II and IV (Table 1). Notably, some pairs in region IV for KZZH1 present high activity differences, in some cases more than two logarithmic units; hence, this chemotype is characterized by a heterogeneous SAR and the presence of activity cliffs for the current database (Table S6). Chemotype TKV67 shows pairs distributed in regions I–IV. Although this observation is highly dependent on the structure similarity threshold used as reference, the overall lower structure similarity values for this chemotype (mean \(=\) 0.765 and median \(=\) 0.759) as compared with chemotype KZZH1 (mean \(=\) 0.870 and median \(=\) 0.865) are clear (see Table S7 in the Supporting Information). This last observation can be rationalized in terms of the structural diversity of the side chains of compounds with cyclic system TKV67 as compared with KZZH1. In contrast, chemotype HQRBX, where all molecular pairs fall in region II, is characterized by a flat SAR, having molecules with high structure similarity (mean \(=\) 0.914 and median \(=\) 0.918) and low activity differences (mean \(=\) 0.204 and median \(=\) 0.173). Interesting results were also found for COX-2 inhibitors; Fig. 6d shows that chemotype PP97T covers a broad area of the ‘landscape space.’ Also, chemotype L2U5P (Fig. 6e) shows an interesting pattern where molecular pairs were mainly found in two zones, comprising pairs with low and very high activity differences. For the same database, 4ZLWP (Fig. 6f) is an example of a chemotype characterized by a flat SAR. On the other hand, only chemotypes X1R61, A04KM, and X6V2V have high frequency for PPAR\(\upgamma \) ligands. The SAS maps including these chemotypes are shown in Fig. 6g–i. Data points for chemotypes X1R61 and A04KM are mainly located in regions II and IV, whereas pairs with chemotype X6V2V fall only in region II. It is worth mentioning that pairs in region IV for chemotypes X1R61 and A04KM have activity difference values lower than 2 units; therefore, changes in structure have only a low or moderate impact on potency for this database.

Table 1 Quantitative characterization of molecular pairs in regions I–IV for monoamine transporters database

The number of molecule pairs that fall in each region of the SAS maps was calculated for all databases. The results for the monoamine transporters database are shown in Table 1, whereas the same data for the COXs and PPARs databases can be found in Tables S8–S9 of the Supporting Information. As can be deduced from Table 1, quantitative comparison of each chemotype in SAS maps is meaningful only when comparing subsets with the same chemotype across regions I–IV; e.g., 50 % of the pairs with chemotype KZZH1 fall in region IV for the NET database, so activity cliffs having this chemotype are frequent. Also, the quantitative analysis in Table 1 enables the cross-comparison of molecules sharing the same scaffold and tested against multiple targets, employing the same thresholds in activity and molecular similarity. For example, molecule pairs with chemotype HQRBX have a totally flat SAR in the NET landscape, whereas some activity cliffs (32 pairs with activity differences \(\ge \) 1) can be found for the same chemotype when the molecules were evaluated against DAT. However, since the number of pairs in SAS maps for each chemotype is related to chemotype frequency, the quantitative characterization by the number of molecular pairs that fall in regions I–IV cannot be used for comparisons between different chemotypes.

Activity cliff enrichment factor for chemotype classes

The concept of activity cliff enrichment factor (ACEF) is introduced in this work to normalize the number of activity cliffs in each chemotype, taking the entire database as a reference. This factor makes it possible to compare the relative number of activity cliffs between chemotype classes for the most important chemotypes in the databases, that is, chemotypes frequent enough to be useful for extracting SAR information. The purpose of ACEF is to identify cyclic systems enriched in activity cliffs for the current databases. Calculated ACEF values for the most frequent chemotypes (frequency \(\ge \) 10) are shown in Table 2. Note that the ACEF of the entire database is one by definition; therefore, chemotypes with values higher than one are rich in activity cliffs as compared with the entire database. Similarly, chemotypes with values lower than one are poor in activity cliffs, thus representing a smoother landscape as compared to those scaffolds with ACEF \(>\)1. Chemotype KZHH1 has high ACEF against the three monoamine transporters (\(\ge \)6.52). In contrast, HQRBX has a high ACEF only for DAT (4.28). These ACEF values for individual chemotypes, along with the visual representations of the SAR in the SAS maps (Fig. 6a, c), can be interpreted in terms of SAR discontinuity. Chemotype KZHH1 is characterized by a heterogeneous SAR for all monoamine transporters, whereas HQRBX has a flat SAR for NET and SERT. Interesting chemotypes for COX inhibitors with contrasting ACEF values are PP97T and 4ZLWP: while PP97T shows the highest ACEF against both cyclooxygenases (\(\ge \)10.66), 4ZLWP is characterized by a flat SAR for both targets. Similar results were found for the PPARs dataset. In this case, X1R61 is the chemotype with the highest ACEF (\(\ge \)4.65), whereas X6V2V has a totally flat SAR for the three studied targets.

Table 2 ACEF for all databases studied using a threshold of 1 in activity differences and 0.702, 0.681, and 0.698 in structure similarity for MATs, COXs, and PPARs databases, respectively

The ACEF measure can provide useful information when choosing a scaffold for lead optimization. For example, compounds containing cyclic systems with high ACEF are interesting because large changes in activity are more likely with few structural modifications. On the other hand, compounds having cyclic systems with a low ACEF can be useful to develop predictive models, e.g., QSAR.

Chemotype-based activity cliff generators

In addition to the characterization of chemotypes enriched with activity cliffs, the detection of chemotypes associated with activity cliff generators is of interest. It is worth noting that the characterization of activity cliff generators in this work was carried out employing a slight modification of the original definition reported by Méndez-Lucio et al. [13]. In this case, we computed the fraction [number of activity cliffs]/[chemotype frequency] in order to compare results between compounds classified in chemotype classes with different frequencies (see Methods). For the NET database (Fig. 7), some compounds classified in chemotype KZHH1 tend to form a high number of activity cliffs, e.g., compounds 82, 140, 156, and 275. Note that this same chemotype also has a high ACEF for the NET database (Table 2). Similar figures highlighting chemotypes with activity cliff generators for SERT, DAT, COXs, and PPARs can be found in the Supporting Information as Figs. S1–S7. Additionally, Table 3 shows chemotypes that have at least one activity cliff generator, the number of compounds associated with each generator, and the ratio \(\hbox {NAC}_{x\lambda }/\hbox {CF}_{\lambda }\) (factor). Table 3 also shows that chemotypes KZHH1 and CK1V9 comprise most of the activity cliff generators for MATs. Similarly, chemotype PP97T is rich in activity cliff generators for the COXs database. Furthermore, for the COX-2 database, chemotype L2U5P is associated with three activity cliff generators. This result is in agreement with the SAS map presented in Fig. 6e, where several molecular pairs form deep activity cliffs (activity difference \(\ge \) 2 log units). For the PPARs database, a low number of activity cliff generators were identified. The most important chemotype identified is X1R61, which is present in two activity cliff generators for PPAR\(\updelta \) and in one generator for PPAR\(\upgamma \).

Fig. 7 Number of activity cliffs formed by each compound divided by the chemotype frequency for the NET database; only chemotypes with frequency \(\ge \)10 were considered in this analysis. Compounds with a fraction [number of activity cliffs]/[chemotype frequency] at least two standard deviations above the mean are considered activity cliff generators. Different chemotypes are labeled by the corresponding MEQNUM and are represented with different colors

Table 3 Activity cliff generators for MATs, COXs, and PPARs databases

Figure 8 shows the chemical structure of a representative activity cliff generator (82) with chemotype KZHH1 and the molecules that form activity cliffs with this compound for the NET set. All compounds in this figure are arranged in decreasing order of activity difference. Note that in the deeper activity cliffs (82_156, 82_140, 82_275, 82_49, and 82_266), the most active compounds have a specific stereochemistry (1R,2S,3R,5R) in the 8-azabicyclo[3.2.1]octane nucleus. Changes in this stereochemistry lead to a decrease in activity (compare compounds 204 and 275). Also, N-methylation appears to be unfavorable, as shown by the ‘shallow cliff’ 82_118 \(({\vert }\hbox {AD}{\vert } = 1.016)\) in comparison with the ‘deep cliff’ 82_275 \(({\vert }\hbox {AD}{\vert } = 3.108)\) [55]. Interestingly, some subtypes of activity cliffs (based on the classification proposed by Hu and Bajorath) can be observed in Fig. 8; e.g., 82_156 and 82_140 are examples of topology and R-group cliffs, respectively [14]. Additional conclusions can be drawn from the analysis of different activity cliff generators, chemotype classes, and biological activities.

Fig. 8 Chemical structures of compounds that form activity cliffs with the activity cliff generator 82 for the NET database. Each compound is labeled with its identification number, biological activity \(K_{i}\) (nM), activity difference \(({\vert }\hbox {AD}{\vert })\), and mean similarity

Conclusions and perspectives

In this work, activity landscape representations employing SAS maps annotated with chemotype information were generated for three structurally diverse databases with multiple biological activities. The addition of chemotype information to these representations gave a general overview of the SAR nature related to each cyclic system for the databases analyzed. In addition, it makes it easy to quantify the activity cliffs related to each chemotype. The abundance of activity cliffs in each chemotype was compared between chemotypes and entire databases employing a new index introduced in this work, the activity cliff enrichment factor. Using this approach, some chemotypes were identified as rich in activity cliffs for the MATs (e.g., KZHH1), COXs (e.g., PP97T), and PPARs (e.g., X1R61) databases. The ACEF values give information about chemotypes with high and low SAR discontinuity and can provide useful information when choosing a scaffold for lead optimization and predictive studies: whereas discontinuous landscapes are rich in SAR information (i.e., activity cliffs), continuous landscapes can be especially useful in predictive approaches (e.g., QSAR). Also, the activity cliff generator concept was applied to compounds grouped in each chemotype to highlight structural features that lead to large changes in activity. Some chemotypes were identified as cyclic systems that frequently occur in activity cliff generators for the MATs (e.g., KZHH1 and CK1V9), COXs (e.g., PP97T), and PPARs (e.g., X1R61) databases. The SAR analysis of activity cliffs leads to specific SAR conclusions for the datasets studied. In this context, the inclusion of chemotype information in activity landscape representations aids the SAR interpretation and characterization of diverse molecular databases.