Introduction

Most activity/toxicity prediction methods and screening tools rely on the molecular similarity principle, which suggests that compounds with similar structural features will exhibit similar activity/property/toxicity [1, 2]. Conventionally used methods such as quantitative structure–activity/property/toxicity relationships (QSAR/QSPR/QSTR), pharmacophore mapping and ligand-based virtual screening are based on this basic assumption. While we aim to develop a set of training or source compounds of maximum structural diversity, we compare the similarity of the test or query compounds to some of the source compounds. Structure-based methods are also based on similar principles when the similarity consideration is extended to ligand–receptor interactions in general. Molecular similarity provides a popular method for virtual screening through the application of clustering methods to chemical databases. On the other hand, molecular diversity analysis explores how molecules cover a given structural space. Molecular similarity, in general, has three components: descriptors (or fingerprints), coefficients or indices, and a suitable weighting scheme [3]. Quantitative molecular similarity is expressed in terms of different indices or metrics, which may eventually lead to clustering or grouping of compounds. One of the similarity-based methods, very popular in computational toxicology, is read-across, which can be applied for both quantitative and qualitative predictions [4, 5] even when only a limited amount of experimental data is available. Recently, read-across and QSAR methods have been combined, leading to a new field, read-across structure–activity relationship (RASAR), which appears to hold much promise in predictive toxicology [6].

Animal-based toxicity studies are resource intensive and involve ethical considerations. Most experimental toxicity studies evaluate the lethal potential of chemicals and do not sufficiently address the mechanistic aspects. To address this issue, the scientific community is now inclined to adopt the mechanistic approach of the Adverse Outcome Pathway (AOP), consisting of key events, molecular initiating events (MIEs) and an adverse outcome [7]. Starting from a conceptual framework, AOPs have rapidly evolved into a formalized framework for organizing biological and toxicological knowledge as per a set of principles and guidelines that are generally accepted by scientific and regulatory communities such as the Organization for Economic Co-operation and Development (OECD) [6, 7, 8]. To avoid the problems associated with animal experimentation, scientists now rely on computational toxicity prediction methods to bridge the data gaps [9], as it is practically impossible to gather toxicity information for thousands of chemicals against hundreds of different endpoints. In this sense, computational prediction methods that also involve mechanistic explanation, such as the consideration of AOPs, appear to be very promising [10]. Recently, machine learning based analysis of big data has led to the generation of RASAR models, which have been claimed to outperform the reproducibility of animal experiments [5]. RASAR can be applied simultaneously to a large number of endpoints and can be related to multiple toxicological targets addressing MIEs in the adverse outcome pathways of several toxicity endpoints [11]. Considering these aspects, RASAR appears to be a promising tool for developing expert systems for predicting the toxicity and ecotoxicity of drugs and organic chemicals in general.

The RASAR method has so far been used only in case of graded predictions or classification modeling. In this work, we attempt, for the first time, to apply RASAR for quantitative predictions (q-RASAR) using a case study of androgen receptor binding affinity data. Although data fusion RASAR allows the application of multiple endpoints simultaneously in an approach to relate them to different biological targets involving appropriate molecular initiating events in different adverse outcome pathways [10] (Fig. 1), we use here only the simple RASAR approach to model a particular endpoint of androgen receptor binding of endocrine disruptor chemicals. However, the strategy mentioned here should be extendable to more complex problems involving multiple endpoints. Please note that we have limited our analysis in the present work only to chemical similarity aspects, and we have not considered the biological adverse outcome pathways and/or biological similarities of compounds in this study.

Fig. 1 Generalized RASAR algorithm linked with AOP and MIEs using chemical similarity-based approaches

Materials and methods

For this report, we have used a data set of androgen receptor binding affinity (RBA) values that we recently used for QSAR model development and chemical read-across predictions [12]. The RBA data were originally collected from the Endocrine Disruptor Knowledge Base (EDKB) database (https://www.fda.gov/science-research/bioinformatics-tools/endocrine-disruptor-knowledge-base), and chemical curation of the compounds was performed by the application of a KNIME workflow (https://sites.google.com/site/dtclabdc/) taking the single.sdf file as input. Further details of the SMILES notation of the compounds and observed RBA values, along with other raw data, are available in Supplementary Material SI-1. The current work uses chemical read-across predictions obtained with the tool Read-Across v4.0 (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home) as originally reported in a previous publication [13]. The workflow of chemical read-across predictions is presented in Fig. 2. Finally, for the q-RASAR analysis, we have used the descriptors selected in the previous QSAR model as the important physicochemical measures of the compounds, in addition to the different similarity measures described below.

Fig. 2 Workflow of the chemical read-across predictions

Computation of similarity measures

We have used here the same division of training (source) and test (query) sets as used in our previous analysis [12]. However, one compound (no. 187) in the training set was an outlier and structurally significantly dissimilar from the rest of the training compounds. Thus, while computing similarity metrics, its close congeners could not be found, and hence this particular compound was not considered for q-RASAR model development. This ability to identify outlier compounds as influential observations may be considered an advantage of the q-RASAR methodology, as it enables further refinement of the models to derive statistically more meaningful relationships. We have derived read-across predictions for the query compounds (test set) as reported in the previous work. In addition to read-across predictions, our read-across tool generates various similarity and error measures for each query compound: the standard deviation and coefficient of variation of the activity of the similar training compounds; the average, standard deviation and coefficient of variation of the similarity levels of the similar training compounds (up to 10 in number); the maximum similarity level to positive and negative compounds (based on the “training set” response mean); a concordance measure indicating similarity to positive, negative or both classes of close training compounds [11, 14]; etc., as detailed in Table 1. We have used these measures along with the selected structural and/or physicochemical features (Table 2) as input for the q-RASAR analysis. Please note that the read-across tool reports the similarity measures for the query set compounds. In order to prepare the similarity descriptor matrix for the source compounds, the training set itself was used as a test set, and the derived descriptors were used for subsequent q-RASAR model development.
Here, the development of the similarity measures represents the unsupervised step (without using the response values) followed by application of the supervised learning step (statistical modeling) like QSAR. Although Read-Across v 4.0 computes similarity based on Euclidean distance, Gaussian kernel and Laplacian kernel-based functions, we have used here different measures computed from Gaussian kernel (GK)-based function only, as this approach gave the best read-across predictions in the previous analysis on this data set [12].
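The unsupervised step described above can be illustrated with a minimal Python sketch. It is not the Read-Across v4.0 implementation: the Gaussian kernel form exp(−d²/2σ²), the parameter `sigma`, the use of the sample standard deviation, the tie rule (activity equal to the mean counts as positive), the restriction of MaxPos and MaxNeg to the close source compounds, and all function and key names are assumptions made only for illustration. Whether a source compound is left out of its own neighbour list when the training set is used as the query set is not stated in the text and is not handled here.

```python
import math
from statistics import mean, stdev


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian-kernel similarity of two descriptor vectors (assumed form)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def similarity_measures(query, sources, activities, n_close=10, sigma=1.0):
    """Similarity and error measures for one query compound (illustrative)."""
    # Positive/negative split based on the training set response mean.
    threshold = mean(activities)
    ranked = sorted(
        ((gaussian_similarity(query, s, sigma), a)
         for s, a in zip(sources, activities)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    close = ranked[:n_close]          # the n closest source compounds
    sims = [s for s, _ in close]
    acts = [a for _, a in close]
    pos = [s for s, a in close if a >= threshold]  # assumption: ties are positive
    neg = [s for s, a in close if a < threshold]
    return {
        "MaxPos": max(pos, default=0.0),
        "MaxNeg": max(neg, default=0.0),
        "Avg.Sim": mean(sims),
        "SD_Similarity": stdev(sims),  # sample SD; needs n_close >= 2
        "SD_Activity": stdev(acts),
        "PosFrac": len(pos) / len(close),
    }


# Toy data: four source compounds described by two descriptors each.
sources = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
activities = [1.0, 2.0, -1.0, -2.0]
measures = similarity_measures([0.0, 0.0], sources, activities, n_close=3)
print(measures)
```

Because none of these quantities depends on the response of the query compound itself, they can be appended to the physicochemical descriptors before any supervised modeling is carried out.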

Table 1 List of similarity and various error measures generated for each query compound during read-across predictions
Table 2 List of physicochemical features selected from the previously reported QSAR model [12]

q-RASAR model development

In this work, we have used simple linear model building for easy interpretation of the selected features and easy transferability. The pooled set of descriptors was subjected to best subset selection using the tool available from http://teqip.jdvu.ac.in/QSAR_Tools/, and a number of multiple linear regression (MLR) models were selected based on their balanced performance in internal and external validation while maintaining diversity with respect to the selected descriptors in the final models. The descriptors selected in these models were then separately subjected to Partial Least Squares (PLS) regression. As PLS is a generalized and more robust version of MLR, we have relied on this approach for the final model development [15]. By using latent variables (LVs), PLS is able to handle a higher number of descriptors in the final model without compromising the degrees of freedom, and it can handle noisy data better than MLR. The number of LVs was selected based on leave-one-out (LOO) cross-validation [16]. PLS results can be presented in an MLR-like equation, with the advantage that the quantitative contribution of each descriptor can be explicitly depicted, which helps the user to make a clear diagnosis of the underlying mechanism for the response being modelled. We have also avoided using any machine learning regression technique here, to maintain the simplicity and transferability of the developed models for wide usability.
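The leave-one-out procedure mentioned above can be sketched in a few lines of Python. This is an illustration only: a single-descriptor least-squares line stands in for the PLS model (the function `fit_line` is a placeholder, not the tool used in this work), and the cross-validated coefficient is computed with the common definition Q² = 1 − PRESS/SS, where SS is taken about the mean of the full training response.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; a placeholder for the PLS model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def q2_loo(xs, ys, fit=fit_line):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Toy data with a small amount of noise around a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
print(q2_loo(x, y))
```

With a PLS routine substituted for `fit_line`, the same loop can be repeated for each candidate number of latent variables and the resulting Q² values compared.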

The flowchart of the present work is shown in Fig. 3.

Fig. 3 The generalized q-RASAR algorithm

The quality of the q-RASAR models has been evaluated based on the traditional quality and validation metrics of QSAR models like determination coefficient (R2), mean absolute error (MAE), leave-one-out cross-validated determination coefficient (Q2LOO), external predicted variance with different variants (R2pred or Q2ext_F1, Q2ext_F2, Q2ext_F3), etc. [17]. We have also applied MAE-based criteria for external predictions as described previously [18]. The PLS models have been explained and interpreted using various plots like score plot (allocation of the compounds in the LV space showing their distribution and similarity/diversity among the compounds), loading plot (the loadings of individual descriptors into the first two latent variables are plotted and the distance of the X-variables from the origin correlates with the importance of the descriptor), randomization plot (a plot of R2 and Q2 values of the random models (Y-axis) vs. the correlation coefficient between the original Y-values and the permuted Y values (X-axis)), scatter plot (a plot of predicted response values (Y-axis) versus observed response values (X-axis)), applicability domain (AD) plot using distance to model in X space (DModX) approach, etc. [15].
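For readers who wish to recompute the external validation metrics named above, the following Python sketch uses the definitions commonly given in the QSAR literature: Q²F1 scales the test residuals by the deviation of the test responses from the training mean, Q²F2 by their deviation from the test mean, and Q²F3 by the training set variance. The function names are illustrative and the exact formulas implemented in the tools used in this work are assumed, not confirmed, to match these definitions.

```python
def mae(y_obs, y_pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)


def _ss_res(y_obs, y_pred):
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def q2_f1(y_test, y_pred, y_train):
    """External Q2 referenced to the training set mean (also called R2pred)."""
    mean_train = sum(y_train) / len(y_train)
    ss_ref = sum((o - mean_train) ** 2 for o in y_test)
    return 1.0 - _ss_res(y_test, y_pred) / ss_ref


def q2_f2(y_test, y_pred):
    """External Q2 referenced to the test set mean."""
    mean_test = sum(y_test) / len(y_test)
    ss_ref = sum((o - mean_test) ** 2 for o in y_test)
    return 1.0 - _ss_res(y_test, y_pred) / ss_ref


def q2_f3(y_test, y_pred, y_train):
    """External Q2 scaled by the training set variance."""
    mean_train = sum(y_train) / len(y_train)
    var_train = sum((o - mean_train) ** 2 for o in y_train) / len(y_train)
    return 1.0 - (_ss_res(y_test, y_pred) / len(y_test)) / var_train


# Toy example (values are made up for illustration).
y_train = [0.0, 2.0, 4.0]
y_test = [1.0, 3.0]
y_pred = [1.5, 2.5]
print(q2_f1(y_test, y_pred, y_train), q2_f2(y_test, y_pred),
      q2_f3(y_test, y_pred, y_train), mae(y_test, y_pred))
```

Q²F1 and Q²F2 coincide whenever the training and test sets have the same mean response, as in this toy example.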

After developing the individual q-RASAR models, we attempted to pool the descriptors appearing in the best models to develop new models and further enhance the quality. An attempt has also been made to derive intelligent consensus predictions from the individual models, as these have previously been shown to enhance the quality of external validation [19].

Software used in the q-RASAR modeling study

The computation of different error and similarity-based measures was done using the tool Read-Across v4.0 available freely from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home. The best subset selection, PLS regression, and intelligent consensus predictions were done using MLR Best Subset Selection, Partial Least Squares and Intelligent Consensus Predictor tools available freely from http://teqip.jdvu.ac.in/QSAR_Tools/. The PLS plots were done using SIMCA-P v10.0 software (https://www.sartorius.com/) and the bubble plots were done using SigmaPlot v11 (http://www.sigmaplot.co.uk/).

Results and discussion

Four individual PLS models were finally derived from the chemical feature-based descriptors along with similarity measures, and these are presented in Table 3 and Fig. 4. The statistical quality and validation measures of the models are shown in Table 4 and Fig. 5. The regression coefficient bubble plots (bubble size is proportional to individual VIPs [20]) are shown in Fig. 4. It is evident from Fig. 4 that the similarity-based measures show higher importance in the models in most cases, while chemical descriptors such as N%, F10[C–O] and minsOH are less important as per the VIP values. This is also evident from the loading plots (Fig. 6), where in most cases MaxPos resides near logRBA (and away from the origin) while MaxNeg appears on the opposite side of the X-axis. The chemical descriptors F10[C–O], N% and minsOH are located near the origin, suggesting their less significant contributions to the models. The score plots show that most of the compounds are within the applicability domain (Figure S1 in Supplementary Materials SI-2), as is also evident from the AD plot based on the distance to model in X space (DModX) approach (Figure S2 in Supplementary Materials SI-2). Model randomization plots with low R2intercept and Q2intercept values also reveal that the models are not derived by chance (Figure S3 in Supplementary Materials SI-2). The scatter plots show that there is good concordance between the observed and predicted values (Fig. 7).

Table 3 List of q-RASAR models
Fig. 4 Bubble plot for regression coefficients of Models (a) M1, (b) M2, (c) M3 and (d) M4 [bubble size is proportional to individual VIP]

Table 4 Statistical quality and validation metric values of various q-RASAR models (nTraining = 102, nTest = 44) and comparison with previous QSAR and read-across predictions (nTraining = 103, nTest = 44) and also with works done by other researchers
Fig. 5 Comparison of prediction quality of q-RASAR models

Fig. 6 Loading plots of models (a) M1, (b) M2, (c) M3, (d) M4

Fig. 7 Scatter plots of models (a) M1, (b) M2, (c) M3, (d) M4

Comparison of the quality of q-RASAR models

We have developed four individual PLS q-RASAR models (M1, M2, M3 and M4) which are robust and predictive superseding the quality of our previous predictions using QSAR and Read-Across methodologies [12] in terms of the quality of external validation metrics (Table 4). However, in terms of the internal validation, the \(R^{2}\) and \(Q_{{\left( {{\rm LOO}} \right)}}^{2}\) values are comparable to the previous QSAR model. It is also important to note that the \({\text{MAE}}_{{\left( {{\rm TEST}} \right)}}\) values of these q-RASAR models are lower than our previous 2D-QSAR model; however, the \({\text{MAE}}_{{\left( {{\rm TEST}} \right)}}\) value of the previous Read-Across predictions was lower than the present individual q-RASAR models. We may note here that the Read-Across methodology does not involve the development of any model and thus interpretation of quantitative contributions of various contributing factors is not possible in Read-Across, but this can be done in case of q-RASAR models like any QSAR model. Therefore, one of the advantages of q-RASAR models over read-across predictions is the interpretability of physicochemical features along with their quantitative contributions. In search of a more improved model, we have pooled the descriptors and developed three different pooled descriptor PLS models (P1, P2, P3), and we found that the pooled descriptor PLS model P2 had better internal validation metric values as compared to the previous QSAR model, and there was an overall improvement observed in the internal validation of the pooled PLS models as compared to the individual PLS models (Table 4). Also, the predictivity of model P1 was even better than all of the individual PLS q-RASAR models as well as the previous 2D-QSAR and Read-Across approaches. The \({\text{MAE}}_{{\left( {{\rm TEST}} \right)}}\) values of the pooled descriptor PLS models were also significantly lower than the individual PLS models and the previous QSAR model [12]. 
We have then applied an intelligent consensus prediction method [19] in order to check the predictivity of the individual PLS models. This method selects the best model for a particular query compound using different consensus-based prediction methods with enhanced predictivity and reduced \({\text{MAE}}_{{\left( {{\rm TEST}} \right)}} \) values. The consensus-based predictions have the advantage of utilization of a greater number of features as compared to a single PLS model thus handling the complexity of a data set in a much more efficient way, which ultimately increases the applicability domain in terms of chemical space and leads to enhanced predictivity with reduced prediction errors. In addition, the shortcoming of an individual model may be overcome by the use of another model. Only those individual models qualify for a particular query compound in case at least three compounds from the training set have their Euclidean Distances with respect to the query compound within the threshold. This threshold value is derived from the training set compounds, and it corresponds to the mean Euclidean Distance + k × SD, where k = 3. Consensus Model 1 (CM1) is developed using the mean of predictions from all qualified individual models. Consensus Model 2 (CM2) is derived from the weighted average predictions (WAPs) from all qualified individual models. Consensus Model 3 (CM3) is involved in the compound-wise best selection of predictions from individual models [19]. Among the ICP models thus generated, the consensus model 3 (ICP3) shows a significant reduction in the \({\text{MAE}}_{{\left( {{\rm TEST}} \right)}}\) which is better than the individual PLS models, pooled PLS models, previous QSAR approach and even the Read-Across predictions.
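The qualification rule and the simplest consensus scheme (CM1) described above can be sketched as follows. This is a hedged illustration, not the Intelligent Consensus Predictor tool: the "mean Euclidean distance" is assumed here to be taken over all pairs of training compounds, each model is represented by a hypothetical dictionary holding its training descriptor matrix and a prediction function, and the weighting used in CM2 and the selection criterion used in CM3 are not reproduced because the text does not specify them.

```python
import math
from statistics import mean, stdev


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_threshold(train_x, k=3.0):
    """Mean + k * SD of the pairwise training distances (assumed reading)."""
    pairs = [euclidean(a, b)
             for i, a in enumerate(train_x)
             for b in train_x[i + 1:]]
    return mean(pairs) + k * stdev(pairs)


def model_qualifies(query_x, train_x, threshold, min_neighbours=3):
    """A model qualifies if at least three training compounds are close."""
    close = sum(1 for t in train_x if euclidean(query_x, t) <= threshold)
    return close >= min_neighbours


def consensus_mean(models, query_by_model, k=3.0):
    """CM1-style prediction: mean over all qualified individual models."""
    predictions = []
    for model, query_x in zip(models, query_by_model):
        threshold = distance_threshold(model["train_x"], k)
        if model_qualifies(query_x, model["train_x"], threshold):
            predictions.append(model["predict"](query_x))
    # None signals that no individual model qualified for this query.
    return mean(predictions) if predictions else None


# Two toy models sharing one descriptor space and returning fixed values.
train = [[0.0], [1.0], [2.0], [3.0]]
models = [
    {"train_x": train, "predict": lambda x: 1.0},
    {"train_x": train, "predict": lambda x: 3.0},
]
print(consensus_mean(models, [[1.5], [1.5]]))    # both models qualify
print(consensus_mean(models, [[1.5], [100.0]]))  # only the first qualifies
```

Each model receives the query compound expressed in its own descriptor space, which is why the query is passed as one vector per model.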

We have made an attempt here to develop simple, transferable, interpretable and reproducible q-RASAR models using a combination of physicochemical and similarity-based descriptors. Using a similar number of data points, Hong et al. [21] in 2003 utilized the Comparative Molecular Field Analysis (CoMFA) technique (a 3D-QSAR approach), and the internal validation metric thus obtained was q2 = 0.571 (while r2 = 0.902). Due to this large difference between the r2 and q2 values, it can be inferred that their model is not robust. Moreover, the technique is difficult to reproduce, as CoMFA involves conformational analysis and alignment of the compounds. These authors [21] did not report external validation metric values for their model. Binary and multi-class classification techniques were applied by Piir et al. [22] in 2021, but the limitation of their approach is that it yields qualitative or graded predictions only, whereas our novel q-RASAR models are able to generate quantitative predictions. The present q-RASAR models are robust (due to a very low difference between the q2 and r2 values), reproducible (one can easily recreate the models as there is no involvement of conformational analysis) and predictive (their good predictive ability is reflected in their external validation metrics), and they generate quantitative endpoint prediction values; hence, they supersede the previous models in all these aspects.

Interpretation of q-RASAR models

The descriptor minsOH signifies the minimum atom-type E-state value for the hydroxyl group, and it contributes positively to the binding affinity in the developed models. This can be justified by the fact that the hydroxyl group at C-17 of the steroid nucleus is able to form a hydrogen bond with the Asn705 residue of the androgen receptor, as supported by the work of Lill et al. [23], Bennett et al. [24] and also our recent molecular docking studies [12]. Moreover, a higher value of the minsOH descriptor signifies that the minimum E-state value of the OH group of a particular compound is higher than that of most other compounds, which in turn signifies that the compound of interest has a large hydrophobic moiety attached to the OH group, which also contributes positively to the receptor binding affinity. The compound Milbolerone (193) has a higher minsOH descriptor value and thus has a higher receptor binding affinity, whereas diethyl phthalate (34) is devoid of OH groups and is much smaller in size and thus has a very low receptor binding affinity. The descriptor F10[C–O] stands for the frequency of carbon and oxygen atoms at a topological distance of 10, and this descriptor contributes negatively to the androgen receptor binding affinity of endocrine disruptors. The descriptor signifies the presence of functionalities like ether, hydroxyl and ester groups, which induce polarity and contribute to faster elimination of the molecules, thus lowering their receptor binding affinity. It is important to note that hydroxyl groups, as encoded by minsOH, contribute positively to the receptor binding affinity, but their position should be such that they are able to form hydrogen bonds with the receptor residues. The presence of an OH group at positions other than the 17th carbon atom of the steroid nucleus does not allow the formation of a hydrogen bond, and thus such groups only induce polarity.
This can be exemplified by compounds like Cortisol (63), which has OH groups at positions other than C-17 of the steroid nucleus and thus shows a reduced receptor binding affinity, while compounds like o,p′-DDD (84) are devoid of polar functionalities and thus have a higher receptor binding affinity value. nCconj is a functional group count descriptor, which stands for the number of non-aromatic conjugated carbon atoms which are sp2 hybridized, and it also contributes positively to the receptor binding affinity in the developed models. With respect to the steroid nucleus, the descriptor nCconj signifies the importance of the conjugated enone moiety in Ring A. The keto group at C-3 acts as a hydrogen bond acceptor to the receptor residue, as shown by Banerjee et al. and others using molecular docking studies [12, 24]. The compound R1881 (207) possesses a conjugated enone moiety in Ring A and thus has a higher receptor binding affinity, while compounds like Linuron (194), which are devoid of the enone moiety and the steroid nucleus, possess a much lower receptor binding affinity. The descriptor SsssCH stands for the sum of sssCH E-state indices, and it encodes the presence of steroidal structures in the dataset. The bulkiness of the cyclopentanoperhydrophenanthrene nucleus induces hydrophobicity in the molecule, and thus SsssCH contributes positively to the receptor binding affinity. This can be shown in compounds like 5α-Androstan-17β-ol (23), which possesses a higher SsssCH value and thus has a higher receptor binding affinity, while compounds like 1-methoxy-4-[1-propenyl]benzene (57) are devoid of a steroid nucleus and thus possess a low receptor binding affinity. The MaxaaCH descriptor denotes the maximum aaCH E-state index, and it signifies the presence of CH groups in an aromatic ring.
Since aromaticity in a molecule decreases its hydrophobicity due to the formation of induced dipoles, this descriptor contributes negatively to the receptor binding affinity. This can be exemplified by molecules like 4-(3,5-Diphenylcyclohexyl)phenol (61), which has a higher MaxaaCH value but a lower receptor binding affinity, while compounds like Trenbolone (157), which do not have aromatic rings, possess a much higher receptor binding affinity.

N% denotes the percentage of Nitrogen in a particular compound, and this is shown to have a negative effect on the receptor binding affinity. In the previous study [12] using the same dataset, N% contributed positively to the androgen receptor binding affinity. This can be explained from the Variable Importance Plot of Model 1 that the descriptor MaxPos(GK) (similarity value of the closest positive source compound to the target compound) has the highest importance while N% has a smaller importance, and compounds like Hydroxylinuron (192), apart from possessing the highest MaxPos(GK) value, also possesses Nitrogen in their structures, whose small but significant importance is already included in the MaxPos(GK) descriptor itself, and thus it results in positive contribution toward the receptor binding affinity. Hence, in order to balance the importance, N% shows a negative contribution in the developed RASAR models. The descriptor MaxNeg(GK) demonstrates the similarity value of the closest negative source compound to the target compound. A higher value of MaxNeg(GK) indicates that the target compound is very similar to such a source compound whose activity value is less than the threshold value. This indicates that there is a high probability that the target compound has low receptor binding affinity. Thus, the descriptor MaxNeg(GK) contributes negatively to the receptor binding affinity. The compound Bis(n-octyl) phthalate (114) has the highest MaxNeg(GK) value while its MaxPos(GK) value is very low, and thus this compound has a very low receptor binding affinity. On the other hand, the compound Trenbolone (157) has the highest MaxPos(GK) value but its MaxNeg(GK) value is much lower, and so this compound possesses a higher receptor binding affinity. 
In compounds like 5,6-Didehydroisoandrosterone (139), the MaxNeg(GK) value is only slightly greater than MaxPos(GK) value and thus the observed response value is only slightly lower than the threshold or it can be termed as a marginally inactive compound. The descriptor Avg.Sim(GK) indicates the average similarity value of the close “n” source compounds with respect to a particular target compound where n is a positive integer value which is given as a user input in Read-Across-v4.0, and this descriptor contributes negatively to the developed models. This can be explained by careful observation of the data structure where there is a greater number of negative compounds as compared to the positive ones (with respect to the biological activity threshold) among the close “n” source compounds. A higher overall average similarity value of the close “n” source compounds indicates that there is a greater chance that the target compound is also negative. The compound 3-Chlorophenol (52) has a high Avg.Sim(GK) value, and thus it possesses a very low receptor binding affinity. If we study the first ten close source compounds for 3-Chlorophenol, we find that nine of them have their activity values below the threshold with their similarity levels ranging from 1 to 0.69. Thus, we may conclude that 3-Chlorophenol, which has very high similarity values with negative compounds, is less active, which is also evident from its experimental receptor binding affinity data. The descriptor SD_Similarity(GK) denotes the standard deviation of the similarity values of the close “n” source compounds, and this descriptor contributes positively to the receptor binding affinity values. The larger deviation of the similarity values indicates that some of the similarity values are very low which justifies that the target compound has properties to be positive. 
The compound R1881 (207) has a high SD_Similarity value; its MaxPos(GK) value is the highest in its class, its MaxNeg(GK) value is very low, and the compound possesses a high receptor binding affinity. In contrast, compounds like 4′-Chloroacetanilide (87) have a high SD_Similarity value and a high MaxNeg(GK) value but a low MaxPos(GK) value, and exhibit a much lower receptor binding affinity. The descriptor SD_Activity(GK) stands for the standard deviation of the biological activity values of the close “n” source compounds [25], and this contributes positively to the response value. Compounds having lower SD_Activity(GK) values tend to have their activities close to the average response value (threshold), as in p,p′-DDE (203), and such compounds cannot be conclusively classified as active or inactive, while compounds having a higher SD_Activity(GK) value along with higher MaxPos(GK) and lower MaxNeg(GK) values, as in 11-keto testosterone (108), are likely to have a higher receptor binding affinity. In the case of compounds like Bis(n-octyl) phthalate (114), where SD_Activity(GK) is high but MaxPos(GK) is low and MaxNeg(GK) is high, we may suggest that the descriptor SD_Activity(GK) balances the contributions of MaxPos(GK) and MaxNeg(GK). The descriptor g(GK) is a concordance measure whose value ranges from 0 to 1. A low value of g(GK) implies that the close source compounds and the corresponding target compound are predominantly of one class, either active or inactive. In the case of Milbolerone (193), the value of g(GK) is low, the MaxNeg(GK) value is either low or insignificant and the value of MaxPos(GK) is high, and the compound shows a higher receptor binding affinity, which explains the negative contribution of g(GK) in two of the developed models (M1 and P1).
Similarly, in the case of Flavanone (182), where the values of g(GK) and MaxPos(GK) are low but the value of MaxNeg(GK) is the maximum, the compound exhibits a much lower receptor binding affinity. This probably explains why g(GK) has positive regression coefficients in the other two developed models (pooled descriptor models P2 and P3).

A new concordance measure gm

The one drawback that arises in the use of g is that when the value of g is zero, it is unable to differentiate whether the close source compounds are positive or negative (Fig. 8). This is because when all the close source compounds are positive, i.e. when the positive fraction corresponds to 1, the value of g becomes 0. Similarly, when all the close source compounds are negative, the positive fraction corresponds to 0 and the value of g also becomes 0. In addition, g attains the same value at a particular level of either PosFrac or NegFrac (Fig. 8). Thus, to obviate this ambiguous nature of g, we have introduced a novel modified expression of g (let us call it gm or Banerjee-Roy coefficient), which is able to differentiate the maximum similarity to either positive or negative compounds, and the corresponding equation for calculating gm is:

$$ g_{m} = \left( { - 1} \right)^{n} \times 2\left| {{\rm PosFrac} - 0.5} \right| $$

Here, n is a positive integer that equals 1 when MaxPos < MaxNeg and 2 when MaxPos ≥ MaxNeg. Note that the directionality is applied with respect to the MaxPos and MaxNeg values and not with respect to the PosFrac value [14]. This is because read-across predictions are made using the Read-Across v4.0 tool [13] as weighted average predictions, in which compounds with a higher level of similarity carry a higher weight. Thus, a query compound with the highest level of similarity to a positive compound may have lower levels of similarity to several negative compounds. However, the compounds with a higher level of similarity make higher contributions to the read-across predictions in the weighted average prediction formula [13].
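The definition of gm given above translates directly into a short Python function. For comparison, the sketch also includes g in the form g = 1 − 2|PosFrac − 0.5|; this expression is an assumption inferred from the behaviour described in the text (g is 0 when PosFrac is 0 or 1) and is not quoted from the original definition [14]. The function names and the similarity values in the example are illustrative.

```python
def g_measure(pos_frac):
    """Concordance measure g (assumed form, inferred from its end points)."""
    return 1.0 - 2.0 * abs(pos_frac - 0.5)


def gm_measure(pos_frac, max_pos, max_neg):
    """Modified concordance measure gm = (-1)^n * 2 * |PosFrac - 0.5|."""
    # n = 2 (positive sign) when the closest positive source compound is at
    # least as similar as the closest negative one, otherwise n = 1.
    n = 2 if max_pos >= max_neg else 1
    return (-1) ** n * 2.0 * abs(pos_frac - 0.5)


# Two queries with the same g but opposite nearest-neighbour classes
# (the MaxPos and MaxNeg values below are made up for illustration).
print(g_measure(0.7), gm_measure(0.7, max_pos=0.4, max_neg=0.9))
print(g_measure(0.3), gm_measure(0.3, max_pos=0.9, max_neg=0.4))

# g cannot separate the all-positive and all-negative cases, gm can.
print(g_measure(1.0), g_measure(0.0))
print(gm_measure(1.0, 0.9, 0.0), gm_measure(0.0, 0.0, 0.9))
```

The sign of gm is therefore decided by the single most similar source compound, while its magnitude is decided by the class balance of the close source compounds.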

Fig. 8 Ambiguous values of g

The measure gm (GK) is also a concordance measure with values ranging from − 1 to + 1, and the novelty here is that the gm (GK) value of a particular target compound is positive only when the most similar source compound is positive with respect to the activity threshold and the value becomes negative when the most similar source compound is negative. Thus, compound 176 with the PosFrac value of 0.7 has a gm value of − 0.4 (as MaxNeg > MaxPos) while compound 126 with the NegFrac value of 0.7 has a gm value of + 0.4 (as MaxPos > MaxNeg) while both of them have the g value of 0.6 (Fig. 8). Model P1a is developed by modification of the g value based on the directionality applied with respect to MaxPos and MaxNeg, and we have denoted this modified g (GK) value expressed as gm (GK) in the equation.

$$ \log RBA = - 1.21 - 1.31{\text{MaxNeg}}\left( {{\text{GK}}} \right) + 0.58g_{m} \left( {{\text{GK}}} \right) + 0.21{\text{MaxPos}}\left( {{\text{GK}}} \right) + 2.23{\text{SD}}\,\, {\text{Similarity}}\left( {{\text{GK}}} \right) - 0.67{\text{Avg.Sim}}\left( {{\text{GK}}} \right) + 0.06\min {\text{sOH}} - 0.10{\text{N\% }} - 0.13F10\left[ {C - O} \right] $$
(P1a)
$$ n_{{{\text{Training}}}} = 102\ n_{{{\rm Test}}} = 44\ {\text{LV}} = 4 $$
$$ R^{2} = 0.753\ Q_{{\left( {{\rm LOO}} \right)}}^{2} = 0.698\ Q_{F1}^{2} = 0.674\ Q_{F2}^{2} = 0.674\ {\rm MAE}_{{\left( {{\rm TEST}} \right)}} = 0.461 $$
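
As a rough illustration of how equation P1a is applied, the following Python sketch evaluates the printed linear form for one query compound. The dictionary keys and the function name are hypothetical labels, the coefficients are the rounded values printed above, and the sketch assumes the descriptors enter the equation unscaled, which this section does not confirm. The descriptor values in the usage example are invented for illustration only.

```python
# Coefficients of model P1a as printed in the equation (rounded values).
P1A_INTERCEPT = -1.21
P1A_COEFFICIENTS = {
    "MaxNeg(GK)": -1.31,
    "gm(GK)": 0.58,
    "MaxPos(GK)": 0.21,
    "SD_Similarity(GK)": 2.23,
    "Avg.Sim(GK)": -0.67,
    "minsOH": 0.06,
    "N%": -0.10,
    "F10[C-O]": -0.13,
}


def predict_log_rba(descriptors: dict) -> float:
    """Evaluate the printed P1a equation for one compound.

    ASSUMPTION: descriptor values are used as-is (no scaling or centering).
    """
    missing = set(P1A_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return P1A_INTERCEPT + sum(
        coeff * descriptors[name] for name, coeff in P1A_COEFFICIENTS.items()
    )


# Hypothetical descriptor values for a single query compound.
example_compound = {
    "MaxNeg(GK)": 0.40,
    "gm(GK)": 0.40,
    "MaxPos(GK)": 0.85,
    "SD_Similarity(GK)": 0.10,
    "Avg.Sim(GK)": 0.55,
    "minsOH": 9.0,
    "N%": 0.0,
    "F10[C-O]": 1.0,
}
print(round(predict_log_rba(example_compound), 3))
```

Because the equation is linear, the sign of each coefficient gives the direction in which the corresponding descriptor shifts the predicted logRBA.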

The internal validation metrics of model P1a are better than those of the previous QSAR model and of all the developed q-RASAR models except P2, while its external validation metrics are the best among all the developed q-RASAR models, the previous QSAR model and the read-across predictions (Table 4). In particular, the \({\text{MAE}}_{{\left( {{\rm TEST}} \right)}}\) value of model P1a is the lowest among all the individual and pooled descriptor RASAR models, the consensus predictions, the previous QSAR model and the read-across predictions. As its positive coefficient shows, the descriptor gm contributes positively to the receptor binding affinity of a target compound. The importance of gm(GK) in P1a is also much higher than that of g(GK) in the other models, as reflected in the Variable Importance Plot of P1a (Figure S4 in Supplementary Materials SI-2). In contrast, the importance of the earlier g(GK) descriptor is much lower, as is evident from the Variable Importance Plots of M1 (Fig. 4), P1, P2 and P3 (Figures not shown). This indicates the influence of the newly developed descriptor gm(GK) on the modeled receptor binding affinity. It is expected that gm will also make a major contribution when other endpoints are modeled using the RASAR technique.
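
For readers who want to reproduce external validation metrics of the kind quoted for P1a, a minimal Python sketch is given here. It assumes the commonly used definitions, in which Q²F1 is referenced to the training-set mean response, Q²F2 to the test-set mean response, and MAE is the mean absolute error on the test set. This section does not spell out the formulas, so their exact correspondence to the authors' calculations is an assumption, and the function name and the data are hypothetical.

```python
def external_validation_metrics(y_test, y_pred, y_train_mean):
    """Return Q2F1, Q2F2 and MAE for a test set.

    ASSUMPTION: standard definitions are used.
      Q2F1 = 1 - sum((y - y_hat)^2) / sum((y - mean(y_train))^2)
      Q2F2 = 1 - sum((y - y_hat)^2) / sum((y - mean(y_test))^2)
      MAE  = mean(|y - y_hat|)
    All sums run over the test compounds.
    """
    if len(y_test) != len(y_pred) or not y_test:
        raise ValueError("y_test and y_pred must be non-empty and equal in length")
    n = len(y_test)
    y_test_mean = sum(y_test) / n
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    ss_train_ref = sum((obs - y_train_mean) ** 2 for obs in y_test)
    ss_test_ref = sum((obs - y_test_mean) ** 2 for obs in y_test)
    mae = sum(abs(obs - pred) for obs, pred in zip(y_test, y_pred)) / n
    return {
        "Q2F1": 1.0 - press / ss_train_ref,
        "Q2F2": 1.0 - press / ss_test_ref,
        "MAE": mae,
    }


# Hypothetical observed and predicted logRBA values for four test compounds.
observed = [-2.0, -1.0, 0.0, 1.0]
predicted = [-1.5, -1.0, -0.5, 1.0]
print(external_validation_metrics(observed, predicted, y_train_mean=-0.5))
```

When the training-set and test-set means coincide, Q²F1 and Q²F2 are identical, which is one way the two metrics can take the same value.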

It is also evident that the contributions of the measures SD_similarity, SD_activity and g are dependent on whether the MaxPos value is higher than MaxNeg or vice versa. Thus, contributions of these measures should be explained in a relative sense along with MaxPos and MaxNeg values. A heat map of the scaled (0–1) values of MaxPos, MaxNeg, gm and logRBA (observed) values of the query compounds is shown in Fig. 9.

Fig. 9 Heat map of the scaled (0–1) values of MaxPos, MaxNeg, gm and logRBA (observed) values of the query compounds

Conclusion

The importance of QSAR in designing bioactive chemicals and drugs has been well established in the biological chemistry community through classical examples such as Hansch analysis, the Free-Wilson model, the Fujita-Ban modification and topological descriptors, followed by the gradual evolution of high dimensional QSARs [26, 27]. The concept of read-across has mainly been applied in regulatory toxicology for data gap filling. RASAR modeling has evolved by combining the concepts of read-across and QSAR, leading to statistical relationships (QSAR-like) that use similarity-based measures (as in read-across) along with physicochemical descriptors. The previously reported RASAR models have mainly been applied to classification-based problems, and they have provision to consider multiple endpoints simultaneously through machine learning techniques (data fusion RASAR), which may eventually be related to relevant AOPs with underlying MIEs. In the present communication, we have developed, for the first time, quantitative RASAR (q-RASAR) models using a single data set of androgen receptor binding affinity. These models could supersede the previously developed QSAR models in the quality of external validation. q-RASAR models have an advantage over the read-across approach in that they allow convincing conclusions to be drawn about the quantitative contributions of different features to the quantitative response values, and thus enhance the interpretability of the physicochemical basis of the response. The workflow can also identify potential outliers in the training set that have few similar congeners in the data set, which helps in developing more robust models. It is also possible to apply the concept of consensus modeling, as has been done in the present study, which further enhanced the external validation quality even over the read-across predictions.

We have also developed in the current study a novel concordance measure (gm), which showed significant importance in the q-RASAR models. The model generated using this measure outperformed all other models (the other q-RASAR models, the previous QSAR model and the read-across predictions) in the external validation metric values. This concordance measure should be studied further in greater detail for its potential application in q-RASAR modeling of other endpoints, including pharmaceutical, property, toxicological and other diverse applications. The q-RASAR strategy described in this study should be extendable to more complex problems involving multiple endpoints, possibly with the application of more sophisticated machine learning techniques.