Introduction

Toxoplasma gondii is an obligate intracellular coccidian parasite of the phylum Apicomplexa that can infect any nucleated cell of a very wide range of warm-blooded vertebrates, including carnivores, omnivores, and herbivores, as well as humans throughout the world, causing toxoplasmosis [1, 2]. Global estimates indicate that approximately 30% of the human population is chronically infected, particularly in developing countries. Depending on the parasite strain and the immune status of the host, T. gondii infection ranges from asymptomatic to severe. In immunocompetent persons, toxoplasmosis is usually asymptomatic or associated with self-limiting symptoms and rarely needs treatment [3, 4]. In immunocompromised individuals with primary or acquired deficiencies in T cell function, such as AIDS patients and patients undergoing therapy for malignancies, transplantation, or lymphoproliferative disorders, reactivation of a latent infection may cause severe disease, including encephalitis, myocarditis, pneumonia, hepatitis, and severe ocular disease, and may lead to death [5]. In congenital toxoplasmosis, depending on when a seronegative pregnant woman becomes infected, the infection may cause fetal death in utero or ocular and neurological complications in surviving infants [6]. Furthermore, T. gondii causes considerable economic losses to the livestock industry, mainly among food-producing animals. This protozoan is therefore of great importance to both public health and livestock husbandry. Nevertheless, no fully effective and safe vaccine has yet been developed to prevent transmission or to reduce the serious complications of toxoplasmosis in humans and animals, and despite the severe consequences of the disease in humans, its therapy has not changed in recent years.
The current treatment for toxoplasmosis combines antifolate dihydrofolate reductase (DHFR) inhibitors (pyrimethamine and trimethoprim) with a sulfonamide (sulfadiazine); these drugs have numerous serious side effects, such as hypersensitivity, hematological toxicity, teratogenicity, and allergic reactions [7] (Fig. 1).

Fig. 1

DHFR inhibitors in the clinic for T. gondii infections

The discovery and development of less toxic, more efficacious, parasite-specific compounds that block any stage of the parasite’s life cycle in humans or other hosts is therefore crucial [8]. Among heterocyclic systems, thiazolidin-4-one is a biologically important scaffold found in drugs and drug candidates with antiviral, antibacterial, antifungal, antihistamine, and anti-inflammatory activity. Thiazolidin-4-one analogs show promising pharmacological properties for the treatment of T. gondii infections, together with high selectivity and a high therapeutic index [9, 10].

The apical complex of T. gondii serves essential functions both in host-cell invasion (attachment and penetration) and in parasite replication. It includes three important secretory organelles: rhoptries, micronemes, and dense granules. Rhoptries are club-shaped, with an anterior duct (neck) and a posterior bulb; they contain proteins with a conserved serine/threonine protein kinase domain and constitute approximately 1–30% of the total Toxoplasma cell volume. Rhoptry proteins (ROPs), e.g., members of the ROP2 family (ROP2, ROP4, ROP7, and ROP8), play an important role in multiple stages of T. gondii invasion and are also critical for survival within host cells [11]. ROP8 is considered a type I transmembrane protein; it has a conserved serine/threonine (S/T) kinase domain and contains both tyrosine-based and dileucine sorting signals in its cytoplasmic tail. ROP8 is expressed in both the tachyzoite and bradyzoite stages of T. gondii, is a major factor in acute virulence, and plays a key role in formation of the parasitophorous vacuole (PV) [12].

Quantitative structure-activity relationship (QSAR) analysis is a technique used in computer-assisted rational drug design to predict protein-ligand interactions and to explore the correlation between biological activity and molecular structure [13,14,15]. Three-dimensional QSAR (3D-QSAR) is a broad term encompassing QSAR methods that estimate specific interactions, i.e., how closely and how strongly a molecule can bind to the active site of an enzyme or protein [16, 17]. In recent years, comparative molecular field analysis (CoMFA), CoMFA region focusing (CoMFA-RF), comparative molecular similarity index analysis (CoMSIA), and hologram QSAR (HQSAR) have proven to be especially effective statistics-based QSAR methods [18,19,20,21,22,23].

In the present study, we performed a molecular modeling study combining 2D- and 3D-QSAR, molecular docking, and molecular dynamics (MD) simulation techniques. 2D-QSAR (HQSAR) and 3D-QSAR (CoMFA, CoMFA-RF, and CoMSIA) were used to identify the key structural factors influencing inhibitory activity. Molecular docking was used to identify key amino acid residues in the active site of the ROP8 protein and to investigate the binding modes between ROP8 and selected inhibitors. MD simulations were employed to examine the detailed interactions between ROP8 and the ligand and to validate the docking results. The results can guide further structural modification and the design and development of new, more potent anti-Toxoplasma drugs.

Materials and methods

Data set

QSAR studies were performed on a set of 59 thiazolidin-4-one derivatives, a new class of anti-T. gondii agents, whose biological activities (IC50 values) were recently reported by the Tenorio, de Aquino, Carvalho, Liesen, and Carradori groups [9, 24,25,26,27].

These activity values (IC50 in μM) were converted to the corresponding pIC50 (−log IC50) values, which were used as the dependent variable in the CoMFA, CoMFA-RF, CoMSIA, and HQSAR models. The data set was randomly divided into a training set (44 compounds, 75%) for QSAR model generation and a test set (15 compounds, 25%) for external validation of the models (Fig. 2).
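The two data-preparation steps above can be sketched as follows. This is a minimal illustration, not the authors' workflow: it assumes the IC50 is converted from μM to mol/L before taking the logarithm (the usual convention; the text does not state the unit inside the logarithm), and the seeded shuffle is hypothetical, since the actual training/test assignment is the one marked in Table 1.

```python
import math
import random

def pic50_from_ic50_um(ic50_um):
    """pIC50 = -log10(IC50 in mol/L); the input is IC50 in micromolar (assumed)."""
    return -math.log10(ic50_um * 1e-6)

def random_split(compound_ids, test_fraction=0.25, seed=0):
    """Shuffle compound IDs and split them into training and test sets."""
    rng = random.Random(seed)
    ids = list(compound_ids)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return sorted(ids[n_test:]), sorted(ids[:n_test])

# Hypothetical usage: compounds numbered 1..59, matching the data set size.
train_ids, test_ids = random_split(range(1, 60))
print(len(train_ids), len(test_ids))  # 44 15
```

With 59 compounds, a 25% test fraction rounds to 15 test compounds, which reproduces the 44/15 split reported above.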

Fig. 2

Distribution of experimental inhibitory activities (pIC50) of the training- and test-set compounds used in the QSAR models

Molecular modeling and alignment

The CoMFA, CoMFA-RF, CoMSIA, and HQSAR models were built with the SYBYL-X 1.2 molecular modeling software (Tripos, Inc., St. Louis, MO). Before modeling, the 3D structures of the compounds were drawn using ChemOffice Bio 3D Ultra (version 12.0, CambridgeSoft Corporation, Cambridge, UK, 2010). All compounds were energy-minimized using the standard molecular mechanics force field with a distance-dependent dielectric and the Powell conjugate gradient algorithm, with a convergence criterion of 0.05 kcal/(mol·Å) and a maximum of 5000 iterations [28]. Partial atomic charges for the electrostatic interactions were calculated by the Gasteiger-Hückel method. Structural alignment is one of the most important inputs in 3D-QSAR analysis, because the predictive accuracy of the models and the reliability of their contour maps depend strongly on how the molecules are aligned. In this study, rigid-body alignment of the molecules in a Mol2 database was performed on their maximum common substructure using Distill alignment. Compound 56, the most active compound in the data set, was selected as the template, and the other compounds were aligned to it on the common substructure. The structure of compound 56, with the common substructure in bold red, and the final superimposition of the compounds are shown in Fig. 3a, b.

Fig. 3

Compound 56, used as the template molecule for database alignment, with the common substructure for Distill alignment shown in bold red (a), and the aligned training- and test-set compounds (b)

CoMFA and CoMSIA analysis

The CoMFA method (Cramer et al.) describes the molecular properties of a set of aligned compounds through steric (Lennard-Jones) and electrostatic (Coulomb) energy fields sampled over a lattice of points and relates these fields to biological activity [29,30,31]. In CoMFA-RF, the steric and electrostatic fields are calculated for the aligned fragments at specific lattice points of the grid [32]. In the CoMFA method, the aligned molecules were placed in a 3D cubic lattice with a grid spacing of 2 Å in the x, y, and z directions, extending 4.0 Å beyond the aligned molecules in all Cartesian directions. The CoMFA steric and electrostatic fields were calculated for each molecule using an sp3-hybridized carbon probe atom with a van der Waals radius of 1.52 Å and a charge of +1.0. The Lennard-Jones and Coulomb potential functions were used to estimate the steric and electrostatic interactions, respectively. The energy cutoff for both the steric and electrostatic fields was set at 30 kcal/mol. To reduce noise and improve efficiency, column filtering was tested over the range 0.0–2.0 kcal/mol, and a threshold of 2.0 kcal/mol was adopted. CoMFA-RF, available in the “Advanced CoMFA” module, applies weights to the lattice points in a CoMFA region to increase or decrease their contribution to the subsequent analysis. “StDev*Coefficients” values were used as weighting factors, together with different grid spacings, to obtain better models. This increases the resolution and predictive capability (q2, the cross-validated r2) of the subsequent partial least squares (PLS) analysis [33].
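The CoMFA field calculation and column filtering described above can be sketched as follows. This is a simplified illustration under stated assumptions, not SYBYL's implementation: the 12-6 Lennard-Jones form, the probe well depth (`probe_eps`) and all per-atom Lennard-Jones parameters are hypothetical placeholders, the distance-dependent dielectric (ε = r) is assumed, and SYBYL's special handling of electrostatics at points inside the steric cutoff is omitted. The grid spacing (2 Å), margin (4 Å), probe radius (1.52 Å), probe charge (+1.0), 30 kcal/mol cutoff, and 2.0 kcal/mol column filter come from the text.

```python
import math
import statistics

COULOMB_K = 332.0636  # kcal·Å/(mol·e^2)
CUTOFF = 30.0         # kcal/mol energy cutoff for both fields

def make_grid(all_coords, spacing=2.0, margin=4.0):
    """Cubic lattice with `spacing` Å that extends `margin` Å beyond every atom."""
    lo = [min(c[d] for c in all_coords) - margin for d in range(3)]
    hi = [max(c[d] for c in all_coords) + margin for d in range(3)]
    axes = [[lo[d] + i * spacing for i in range(int((hi[d] - lo[d]) / spacing) + 1)]
            for d in range(3)]
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]

def comfa_fields(atoms, grid, probe_q=1.0, probe_r=1.52, probe_eps=0.107):
    """Steric (12-6 Lennard-Jones) then electrostatic (Coulomb) energies per grid point.

    atoms: (x, y, z, partial_charge, vdw_radius, lj_epsilon) tuples.
    probe_eps and all per-atom LJ parameters are hypothetical placeholders.
    """
    steric, elec = [], []
    for g in grid:
        e_s = e_q = 0.0
        for x, y, z, q, r_vdw, eps in atoms:
            r = max(math.dist(g, (x, y, z)), 1e-3)  # guard against r = 0
            ratio = (r_vdw + probe_r) / r
            e_s += math.sqrt(eps * probe_eps) * (ratio ** 12 - 2 * ratio ** 6)
            e_q += COULOMB_K * q * probe_q / (r * r)  # assumed dielectric eps = r
        steric.append(min(e_s, CUTOFF))
        elec.append(max(-CUTOFF, min(e_q, CUTOFF)))
    return steric + elec

def column_filter(rows, min_sd=2.0):
    """Keep descriptor columns whose SD across molecules is at least min_sd kcal/mol."""
    return [j for j in range(len(rows[0]))
            if statistics.pstdev([row[j] for row in rows]) >= min_sd]

# Toy usage with two hypothetical two-atom "molecules".
mol_a = [(0.0, 0.0, 0.0, -0.3, 1.7, 0.107), (1.5, 0.0, 0.0, 0.3, 1.2, 0.04)]
mol_b = [(0.0, 0.0, 0.0, 0.2, 1.7, 0.107), (1.5, 0.5, 0.0, -0.2, 1.5, 0.1)]
grid = make_grid([a[:3] for a in mol_a + mol_b])
X = [comfa_fields(m, grid) for m in (mol_a, mol_b)]
kept = column_filter(X)
print(len(grid), len(X[0]), len(kept))
```

Each row of `X` is one molecule's descriptor vector (steric columns followed by electrostatic columns), which is the form passed to PLS after column filtering.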

In the CoMSIA method, proposed by Klebe et al., a probe atom is used to calculate similarity indices at regularly spaced grid points around the aligned molecules. Unlike CoMFA, CoMSIA uses a Gaussian-type, distance-dependent function to assess five physicochemical property fields (steric, electrostatic, hydrophobic, hydrogen-bond donor, and hydrogen-bond acceptor) [34, 35]. Because the Gaussian function varies smoothly with distance and has no singularities at atomic positions, no arbitrary cutoff limits need to be defined.

The CoMSIA similarity index descriptors were calculated in the same lattice box as used in CoMFA. The steric, electrostatic, hydrophobic, hydrogen-bond donor, and hydrogen-bond acceptor fields were evaluated using a probe atom with a charge of +1.0, a radius of 1 Å, a hydrophobicity of +1.0, hydrogen-bond donor and acceptor properties of +1.0, an attenuation factor α of 0.3, and a grid spacing of 2.0 Å. A distance-dependent Gaussian function was applied between the probe atom and each atom of the molecule [28, 36]. In all models, regression analyses were performed in two steps (cross-validated and non-cross-validated) using the PLS method [37,38,39,40,41,42].
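The Gaussian similarity index used by CoMSIA can be written as A_k(q) = −Σ_i w_probe,k · w_i,k · exp(−α r_iq²), summed over the atoms i of a molecule for each property k at grid point q. The sketch below implements that formula with the probe values and α = 0.3 from the text; how atomic property values are scaled (for example, steric as radius cubed, donor/acceptor as 1 or 0) is an assumption supplied by the caller, not taken from the source.

```python
import math

ALPHA = 0.3  # attenuation factor from the text
PROBE = {"steric": 1.0, "electrostatic": 1.0, "hydrophobic": 1.0,
         "donor": 1.0, "acceptor": 1.0}

def comsia_indices(atoms, grid, alpha=ALPHA):
    """Similarity indices A_k(q) = -sum_i w_probe,k * w_i,k * exp(-alpha * r_iq^2).

    atoms: list of (xyz, props); props maps each field name to the atom's
    property value. The property scales are assumptions chosen by the caller.
    """
    fields = {k: [] for k in PROBE}
    for q in grid:
        for k, w_probe in PROBE.items():
            total = 0.0
            for xyz, props in atoms:
                r2 = sum((a - b) ** 2 for a, b in zip(q, xyz))
                total += w_probe * props[k] * math.exp(-alpha * r2)
            fields[k].append(-total)  # finite even at r = 0, so no cutoff is needed
    return fields

# Hypothetical single-atom example evaluated at two grid points.
atom = ((0.0, 0.0, 0.0),
        {"steric": 1.0, "electrostatic": -0.4, "hydrophobic": 0.5,
         "donor": 0.0, "acceptor": 1.0})
f = comsia_indices([atom], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(f["steric"], f["electrostatic"])
```

Unlike the CoMFA potentials, the Gaussian term stays bounded at the atom position, which is why no arbitrary cutoff is required.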

HQSAR analysis

Hologram QSAR (HQSAR) is a 2D-QSAR technique that relates biological activity to structural fragments: it uses the fragment fingerprints of molecular holograms, together with other molecular descriptors, to predict the biological activity of a series of molecules [43,44,45]. By transforming the chemical representation of a molecule into its corresponding molecular hologram, the method eliminates the need for 3D structures, molecular alignment, and conformational specification [46, 47]. 2D chemical database storage and searching technologies rely on linear notations that define chemical structures, such as Wiswesser line-formula notation (WLN), the simplified molecular input line entry system (SMILES), and SYBYL line notation (SLN). The process generates molecular fragments that are hashed into an integer array called the molecular hologram, and the bin occupancies serve as the descriptor variables [46, 48].

The HQSAR method uses several parameters for molecular hologram generation: the hologram length (HL; 53, 59, 61, 72, 83, 97, 151, 199, 257, 307, 353, and 401), the fragment distinction (atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch), and donor and acceptor (DA)), and the fragment size (2–5, 3–6, 4–7, 5–8, 6–9, and 7–10 atoms).
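A molecular hologram can be illustrated with the sketch below, which enumerates linear fragments of 4–7 atoms labelled by atoms (A) and bonds (B), hashes each fragment into one of HL = 151 bins, and counts bin occupancies. This is a simplified stand-in, not SYBYL's algorithm: real HQSAR fragments also include branched and cyclic substructures, and `zlib.crc32` is used here only as an assumed hashing function.

```python
import zlib

def linear_fragments(atoms, bonds, min_size=4, max_size=7):
    """Enumerate linear (path) fragments of min_size..max_size atoms.

    atoms: element symbols; bonds: {(i, j): bond_order}. Fragment labels encode
    atoms (A) and bond orders (B); each path is counted once.
    """
    adj = {i: [] for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        adj[i].append((j, order))
        adj[j].append((i, order))
    frags = []

    def extend(path, labels):
        # A path is found from both ends; keep it only when start index < end index.
        if len(path) >= min_size and path[0] < path[-1]:
            frags.append(min("-".join(labels), "-".join(reversed(labels))))
        if len(path) == max_size:
            return
        for nbr, order in adj[path[-1]]:
            if nbr not in path:
                extend(path + [nbr], labels + [str(order), atoms[nbr]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return frags

def molecular_hologram(fragments, length=151):
    """Hash each fragment into one of `length` bins and count bin occupancies."""
    holo = [0] * length
    for frag in fragments:
        holo[zlib.crc32(frag.encode()) % length] += 1
    return holo

atoms = ["C", "C", "C", "C", "O"]                      # hypothetical n-butanol skeleton
bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1}
frags = linear_fragments(atoms, bonds)
holo = molecular_hologram(frags, length=151)
print(len(frags), sum(holo))  # 3 3
```

The hologram length controls how often different fragments collide in the same bin, which is why several HL values are screened.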

Partial Least-Square analysis

In the 3D-QSAR studies, the PLS method [19, 49], an extension of multiple regression analysis, was used for model building. The calculated CoMFA and CoMSIA descriptors were used as independent variables and the pIC50 values as the dependent variable in the PLS regression. Before the PLS analysis, the CoMFA and CoMSIA columns were filtered with a column-filtering value of 2.0 kcal/mol. The predictive ability of the models was evaluated by leave-one-out (LOO) and leave-ten-out (L-10-O) cross-validation. LOO cross-validation was used as internal validation to determine the number of components that yields the optimal predictive model. A component is a linear combination of the explanatory (independent) column data; unlike the explanatory variables themselves, the components are mutually orthogonal. The number of components can be at most the number of degrees of freedom in the data set. The optimal setting for non-cross-validated analyses is generally determined by preliminary cross-validated runs, and the default maximum of 6 components is a reasonable starting point for cross-validation.

The optimal number of components (ONC), obtained by LOO cross-validation, was then used to derive the final QSAR models. The ONC is the number of components that gives the lowest standard error of prediction (SEP) and the highest cross-validated coefficient q2 (\( {r}_{cv}^2 \)), calculated by Eq. (1):

$$ {q}^2=1-\frac{\sum_{i=1}^{n}{\left(\hat{y_i}-{y}_i\right)}^2}{\sum_{i=1}^{n}{\left({y}_i-\overline{y}\right)}^2} $$
(1)

where \( \hat{y_i} \) and yi are the predicted and observed activity values, \( \overline{y} \) and \( \hat{y} \) are the observed and predicted mean activity values of the training set, respectively, and n is the number of training compounds [50]. The term \( {\sum}_{i=1}^{n}{\left(\hat{y_i}-{y}_i\right)}^2 \) is the predictive residual sum of squares (PRESS).
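The LOO procedure and Eq. (1) can be sketched with a small NIPALS PLS1 regression written in NumPy. This is an illustration, not SYBYL's PLS: the synthetic data are hypothetical, the component count is chosen by the highest q2 only, and SEP is computed with the common definition sqrt(PRESS / (n − c − 1)), which is an assumption because the text does not give the SEP formula.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector
        t = E @ w                        # scores (mutually orthogonal components)
        tt = t @ t
        p, q = E.T @ t / tt, f @ t / tt  # loadings
        E, f = E - np.outer(t, p), f - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # regression coefficients
    return lambda Xn: (Xn - x_mean) @ coef + y_mean

def loo_q2(X, y, n_comp):
    """Leave-one-out q2 (Eq. 1) and SEP for a given number of components."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        preds[i] = pls1_fit(X[mask], y[mask], n_comp)(X[i:i + 1])[0]
    press = np.sum((preds - y) ** 2)
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    sep = np.sqrt(press / (n - n_comp - 1))
    return q2, sep

def optimal_components(X, y, max_comp=6):
    """Component count with the highest LOO q2, searching 1..max_comp."""
    scores = {a: loo_q2(X, y, a) for a in range(1, max_comp + 1)}
    return max(scores, key=lambda a: scores[a][0]), scores

# Hypothetical synthetic descriptors: activity depends on two of ten columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = 1.5 * X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=30)
best, scores = optimal_components(X, y)
print(best, scores[best])
```

The final (non-cross-validated) model would then be refitted on all training compounds with `best` components.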

After cross-validation, the final PLS analysis was carried out with the optimal number of components and no validation to generate the final QSAR model. This non-cross-validated analysis was assessed by the conventional correlation coefficient r2 (\( {r}_{ncv}^2 \), Eq. (2)), the standard error of estimation (SEE), and the F value, calculated with the same column filtering. High q2 and r2 values (q2 > 0.5, r2 > 0.6) are regarded as evidence of the high predictive ability of a model, and for a good model r2 − q2 should not exceed 0.3 [51].

$$ {r}^2=\frac{{\left[\sum \left({y}_i-\overline{y}\right)\left(\hat{y_i}-\hat{y}\right)\right]}^2}{\sum {\left({y}_i-\overline{y}\right)}^2\times \sum {\left(\hat{y_i}-\hat{y}\right)}^2} $$
(2)
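Eq. (2) and the acceptance thresholds stated above can be checked with a few lines of plain Python. The function names are hypothetical; the CoMFA q2 and r2 values in the usage line are those reported in the Results.

```python
def r2_eq2(obs, pred):
    """Squared Pearson correlation between observed and predicted values (Eq. 2)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sum((o - mo) ** 2 for o in obs)
    sp = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (so * sp)

def passes_internal_criteria(q2, r2):
    """q2 > 0.5, r2 > 0.6 and r2 - q2 not more than 0.3, as stated in the text."""
    return q2 > 0.5 and r2 > 0.6 and (r2 - q2) <= 0.3

print(passes_internal_criteria(0.897, 0.933))  # True (CoMFA values)
```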

Bootstrapping analysis was performed for 100 runs to assess the statistical confidence of the derived models [29, 49, 52, 53]. Contour maps were generated graphically after models were developed in CoMFA/CoMFA-RF and CoMSIA using the field type “StDev*Coeff”, and the contour levels were set to default values.
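Bootstrapping, as described above, refits the model on training sets resampled with replacement and summarizes the fitted r2 over the runs. The sketch below shows the resampling loop for 100 runs; as an assumption, ordinary least squares (`numpy.linalg.lstsq`) stands in for the PLS model, the data are hypothetical, and the returned spread is the standard deviation of r2 across runs (the text does not define how SEEbs is computed).

```python
import numpy as np

def bootstrap_r2(X, y, runs=100, seed=0):
    """Refit on `runs` bootstrap resamples; return mean and SD of the fitted r2.

    OLS via numpy.linalg.lstsq is a stand-in for the PLS model.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])   # intercept + descriptors
    r2s = []
    for _ in range(runs):
        idx = rng.integers(0, n, size=n)    # sample rows with replacement
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        resid = y[idx] - A[idx] @ coef
        ss_tot = np.sum((y[idx] - y[idx].mean()) ** 2)
        r2s.append(1.0 - resid @ resid / ss_tot)
    return float(np.mean(r2s)), float(np.std(r2s))

# Hypothetical data: 40 compounds, 2 descriptors.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)
r2_bs, r2_sd = bootstrap_r2(X, y)
print(r2_bs, r2_sd)
```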

In HQSAR, LOO cross-validation was applied to determine the number of components that yields a good predictive model. PLS then yields a mathematical equation that relates the molecular hologram bin values to the inhibitory activity of the compounds in the database.

Validation of the QSAR model

A high q2 for the training set indicates good internal validation but does not guarantee high predictive ability, so external validation was essential. The predictive ability of the QSAR models was therefore validated by predicting the biological activities of compounds that were not included in the training set (the test set). Test-set compounds are marked with an asterisk in Table 1.

Table 1 Chemical structures and the corresponding experimental and predicted pIC50 values from the QSAR models

The predictive correlation coefficient \( {r}_{pred}^2 \), for which \( {r}_{pred}^2>0.6 \) is considered acceptable [54], was calculated for the test set using Eq. (3):

$$ {r}_{pred}^2=\frac{\mathrm{SD}-\mathrm{PRESS}}{\mathrm{SD}} $$
(3)

where SD is the sum of squared deviations between the biological activities of the test-set molecules and the mean activity of the training-set molecules, and PRESS is the sum of squared deviations between the predicted and actual activities of the test-set molecules.
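Eq. (3) translates directly into code. The point worth noting is that SD is measured around the training-set mean, not the test-set mean. The function name and the numbers in the usage line are hypothetical.

```python
def r2_pred(test_obs, test_pred, train_mean):
    """Eq. (3): (SD - PRESS) / SD, with SD taken around the training-set mean."""
    sd = sum((y - train_mean) ** 2 for y in test_obs)
    press = sum((y - p) ** 2 for y, p in zip(test_obs, test_pred))
    return (sd - press) / sd

# Hypothetical test set of three compounds; training-set mean pIC50 = 6.0.
print(r2_pred([5.0, 6.0, 7.0], [5.5, 6.0, 6.5], 6.0))
```

Here SD = 2.0 and PRESS = 0.5, so the result is 0.75.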

The performance of the regression models was evaluated for the training and validation sets using the root mean squared error (RMSE) and mean absolute error (MAE), both of which should be close to zero, the residual sum of squares (RSS), and the concordance correlation coefficient (CCC; CCC ≥ 0.85 is desirable) [55]. These metrics are calculated with Eqs. (4)–(7):

$$ \mathrm{RMSE}=\sqrt{\frac{\sum_{\mathrm{i}=1}^n{\left({y}_i-\hat{y_i}\right)}^2}{n}} $$
(4)
$$ \mathrm{MAE}=\frac{\sum_{i=1}^n\left|{y}_i-\hat{y_i}\right|}{n} $$
(5)
$$ \mathrm{RSS}={\sum}_{i=1}^n{\left({y}_i-\hat{y_i}\right)}^2 $$
(6)
$$ \mathrm{CCC}=\frac{2{\sum}_{i=1}^n\left({y}_i-\overline{y}\right)\left(\hat{y_i}-\hat{y}\right)}{\sum_{i=1}^n{\left({y}_i-\overline{y}\right)}^2+{\sum}_{i=1}^n{\left(\hat{y_i}-\hat{y}\right)}^2+n{\left(\overline{y}-\hat{y}\right)}^2} $$
(7)
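Eqs. (4)–(7) can be computed together as in the sketch below (plain Python; the function name and toy values are hypothetical). In Eq. (7), the last denominator term penalizes a systematic offset between the observed and predicted means, which is what distinguishes CCC from the plain correlation coefficient.

```python
import math

def regression_metrics(obs, pred):
    """RMSE, MAE, RSS and CCC as defined in Eqs. (4)-(7)."""
    n = len(obs)
    resid = [o - p for o, p in zip(obs, pred)]
    rss = sum(e * e for e in resid)                        # Eq. (6)
    rmse = math.sqrt(rss / n)                              # Eq. (4)
    mae = sum(abs(e) for e in resid) / n                   # Eq. (5)
    mo, mp = sum(obs) / n, sum(pred) / n
    num = 2 * sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    den = (sum((o - mo) ** 2 for o in obs) + sum((p - mp) ** 2 for p in pred)
           + n * (mo - mp) ** 2)
    return {"RMSE": rmse, "MAE": mae, "RSS": rss, "CCC": num / den}  # CCC: Eq. (7)

# Predictions shifted by a constant +1: perfectly correlated, but CCC < 1.
print(regression_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```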

For additional validation of the predictive ability of the models on the test set, the following criteria [54] were applied (Eq. (8)):

$$ \begin{array}{l}\frac{{r}^2-{r}_o^2}{r^2}<0.1\ \mathrm{or}\ \frac{{r}^2-{r}_o^{\prime 2}}{r^2}<0.1\\ 0.85\le k\le 1.15\ \mathrm{or}\ 0.85\le k^{\prime}\le 1.15\end{array} $$
(8)

where \( {r}_o^2 \) and \( {r}_o^{\prime 2} \) are the coefficients of determination of the regression lines through the origin between predicted (y) and observed (x) activities and vice versa, and k and k′ are the slopes of these regression lines, respectively.
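The through-origin quantities in Eq. (8) can be computed as in the sketch below. It follows one common convention (Golbraikh and Tropsha): k is the slope of observed regressed on predicted through the origin and k′ the reverse. Which direction is labelled primed varies between papers, so the labelling here is an assumption; the function names are hypothetical.

```python
def through_origin_stats(obs, pred):
    """k, k', r_o^2 and r_o'^2 for regression lines through the origin."""
    sxy = sum(o * p for o, p in zip(obs, pred))
    k = sxy / sum(p * p for p in pred)        # observed regressed on predicted
    k_prime = sxy / sum(o * o for o in obs)   # predicted regressed on observed
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    r0 = 1 - (sum((o - k * p) ** 2 for o, p in zip(obs, pred))
              / sum((o - mo) ** 2 for o in obs))
    r0p = 1 - (sum((p - k_prime * o) ** 2 for o, p in zip(obs, pred))
               / sum((p - mp) ** 2 for p in pred))
    return k, k_prime, r0, r0p

def eq8_passes(r2, r0, r0p, k, k_prime):
    """Both conditions of Eq. (8)."""
    ratio_ok = (r2 - r0) / r2 < 0.1 or (r2 - r0p) / r2 < 0.1
    slope_ok = 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15
    return ratio_ok and slope_ok

# Predictions that are exactly twice the observations: slopes 0.5 and 2.0.
print(through_origin_stats([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In that example both through-origin fits are exact, but neither slope lies in 0.85–1.15, so Eq. (8) correctly rejects a model that is perfectly correlated yet systematically biased.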

To further assess the models, the validation parameters \( {r}_m^2 \) and \( {\varDelta r}_m^2 \) were calculated using Eqs. (9) and (10):

$$ {r}_m^2={r}^2\left(1-\sqrt{\left|{r}^2-{r}_o^2\right|}\right) $$
(9)
$$ {\varDelta r}_m^2=\left|{r}_m^2-{r}_m^{\prime 2}\right| $$
(10)

where \( {r}_m^{\prime 2} \) is obtained from Eq. (9) by replacing \( {r}_o^2 \) with \( {r}_o^{\prime 2} \). Values of \( {r}_m^2>0.5 \) and \( {\varDelta r}_m^2<0.2 \) indicate good external predictability of the models.
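Eqs. (9) and (10) need only r2, \( {r}_o^2 \), and \( {r}_o^{\prime 2} \), and \( {r}_m^{\prime 2} \) reuses Eq. (9) with \( {r}_o^{\prime 2} \) in place of \( {r}_o^2 \). The function names and inputs below are hypothetical.

```python
import math

def rm2(r2, r0_sq):
    """Eq. (9): r_m^2 = r^2 * (1 - sqrt(|r^2 - r_o^2|))."""
    return r2 * (1 - math.sqrt(abs(r2 - r0_sq)))

def delta_rm2(r2, r0_sq, r0p_sq):
    """Eq. (10): |r_m^2 - r_m'^2|, with r_m'^2 from Eq. (9) using r_o'^2."""
    return abs(rm2(r2, r0_sq) - rm2(r2, r0p_sq))

# Hypothetical inputs: r^2 = 0.9, r_o^2 = 0.9, r_o'^2 = 0.81.
print(rm2(0.9, 0.81), delta_rm2(0.9, 0.9, 0.81))
```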

Molecular docking study

Molecular docking, one of the most frequently used methods in drug design, was used to investigate how the small molecules interact with the target binding site. The docking study of the most and least active compounds with the TgROP8 protein was performed using the Molecular Operating Environment (MOE) software (www.chemcomp.com). To prepare the ligands, their 2D structures were drawn in ChemOffice Ultra (version 12.0, CambridgeSoft Corporation, Cambridge, UK, 2010) and converted to 3D with HyperChem 7 (Hypercube Inc., USA) using the AM1 semi-empirical method. The ligands in the data set were docked into the active site of TgROP8 (PDB ID: 3BYV) with MOE. Poses were placed with the Triangle Matcher algorithm and scored with the London dG scoring function, followed by force-field refinement, and the resulting conformations were analyzed with the LigX module of MOE. The best docking pose of compound 56 was chosen for the MD simulation.

Molecular dynamics simulation

MD simulations, which integrate Newton’s equations of motion, were performed with the dynamics module of SYBYL to investigate the receptor-ligand interaction in atomic detail [56]. Compound 56 was used as the template molecule for the MD simulations. The docked ligand was energy-minimized with the Tripos force field and Gasteiger-Hückel charges in the absence of water, and initial velocities were assigned from a Boltzmann distribution.

The simulations were run in the NTV ensemble (constant number of atoms, temperature, and volume) at 300 K with a temperature coupling constant of 100 fs, using a time step of 2 fs for a total of 10,000 fs. Conformational snapshots were recorded every 100 fs to calculate the RMSD, radius of gyration (Rg), and potential energy, which were plotted.
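The run length and the per-snapshot analyses above can be sketched as follows. This assumes the time step is in femtoseconds, and the RMSD function assumes the frames are already superimposed on the reference (no fitting is performed). The function names and toy coordinates are hypothetical; SYBYL computes these quantities internally.

```python
import math

TIME_STEP_FS = 2.0
TOTAL_FS = 10_000.0
SNAPSHOT_EVERY_FS = 100.0

n_steps = int(TOTAL_FS / TIME_STEP_FS)            # integration steps
n_snapshots = int(TOTAL_FS / SNAPSHOT_EVERY_FS)   # recorded conformations

def rmsd(ref, frame):
    """RMSD between matched atoms; assumes frames are already superimposed."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(ref, frame)) / len(ref))

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    m_tot = sum(masses)
    com = [sum(m * c[d] for m, c in zip(masses, coords)) / m_tot for d in range(3)]
    return math.sqrt(sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords)) / m_tot)

ref = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(n_steps, n_snapshots, rmsd(ref, frame), radius_of_gyration(ref, [12.0, 12.0]))
# prints: 5000 100 1.0 1.0
```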

Results and discussion

CoMFA and CoMFA-RF statistical results

The statistical results of CoMFA and CoMFA-RF models are summarized in Table 2. The CoMFA analysis was carried out with steric and electrostatic fields at column filtering of 2.0 kcal/mol.

Table 2 Statistical parameters of QSAR models

PLS analysis of the CoMFA training set with LOO and L-10-O cross-validation gave, with an ONC of 3, a q2 of 0.897, an \( {r}_{cv}^2 \) (L-10-O) of 0.892, and an SEP of 0.242, indicating good predictive capability.

The non-cross-validated PLS analysis gave an \( {r}_{ncv}^2 \) of 0.933 with an SEE of 0.191, an F value of 194.977, r2 − q2 of 0.036, and an Rpearson of 0.932, supporting the statistical validity of the developed model. The steric and electrostatic field descriptors explained 0.709 and 0.291 of the total variance, respectively, indicating that the steric effect was more important than the electrostatic one.

Region focusing produced a CoMFA-RF model with improved statistical parameters; both the cross-validated and non-cross-validated PLS results were better than those of CoMFA. The q2 value increased from 0.897 to 0.900 with an ONC of 2, the \( {r}_{cv}^2 \) (L-10-O) increased from 0.892 to 0.902, and the SEP was 0.236. The non-cross-validated PLS analysis gave a high \( {r}_{ncv}^2 \) of 0.935 with a low SEE of 0.174, an F value of 242.439, r2 − q2 of 0.035, and an Rpearson of 0.940. In CoMFA-RF, the steric and electrostatic field descriptors contributed 0.730 and 0.270, respectively.

Bootstrapping gave \( {r}_{bs}^2 \) and SEEbs values of 0.984 and 0.005 for CoMFA and 0.980 and 0.006 for CoMFA-RF, respectively, suggesting good internal consistency and the absence of systematic errors within the training set.

CoMSIA statistical results

CoMSIA directly correlates ligand affinities with changes in molecular properties [30]. CoMSIA models were generated using combinations of five fields: steric (S), electrostatic (E), hydrophobic (H), hydrogen-bond acceptor (A), and hydrogen-bond donor (D). The statistical parameters of the CoMSIA model are summarized in Table 2. In the PLS analysis with a column filtering of 2.0 kcal/mol, a q2 of 0.910 with an ONC of 3, an SEP of 0.226, and an \( {r}_{cv}^2 \) (L-10-O) of 0.912 were obtained. The non-cross-validated PLS analysis gave an \( {r}_{ncv}^2 \) of 0.950 with an SEE of 0.169, an F value of 252.617, r2 − q2 of 0.040, and an Rpearson of 0.951.

A high bootstrapped r2 of 0.981 with an SEEbs of 0.006 indicates a high degree of confidence in the analysis. The contributions of the steric, electrostatic, hydrophobic, hydrogen-bond donor, and hydrogen-bond acceptor field descriptors were 0.143, 0.170, 0.206, 0.356, and 0.126, respectively. Because these fields are not completely independent of one another, all 31 possible combinations of the five fields were evaluated (Fig. 4).
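The count of 31 follows from choosing any non-empty subset of five fields (2^5 − 1 = 31). The short enumeration below lists them with the field letters used in the text.

```python
from itertools import combinations

FIELDS = "SEHDA"  # steric, electrostatic, hydrophobic, H-bond donor, H-bond acceptor

combos = ["".join(c) for size in range(1, len(FIELDS) + 1)
          for c in combinations(FIELDS, size)]
print(len(combos))  # 31
```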

Fig. 4

Results of the distribution of q2, \( {r}_{ncv}^2 \), \( {r}_{cv}^2 \), and \( {r}_{bs}^2 \) values that were obtained from 31 combinations of CoMSIA fields. s, steric; e, electrostatic; h, hydrophobic; d, H-bond donor; a, H-bond acceptor

Among the five single-field models, the acceptor field gave the highest q2 (0.910) and was therefore more important than the other four fields. Among all combinations, the steric, hydrophobic, and hydrogen-bond acceptor (SHA) combination was the best, giving a q2 of 0.932, an \( {r}_{ncv}^2 \) of 0.951, an \( {r}_{cv}^2 \) of 0.923, and an \( {r}_{bs}^2 \) of 0.982, which indicates good internal predictivity for the SHA combination.

In the CoMFA model, the steric field made the largest contribution, and in the CoMSIA model, the steric, hydrophobic, and hydrogen-bond acceptor fields did. The docking study also showed that steric, hydrophobic, and hydrogen-bonding effects around key active-site residues play a significant role in ligand binding to TgROP8. These results indicate that hydrophobic and steric properties should be considered in the design of new anti-T. gondii agents.

HQSAR statistical results

HQSAR is a QSAR technique that is useful for exploring the contribution of the structural fragments of each molecule to the biological activity, and it eliminates the need for alignment, 3D structure generation, and a putative binding conformation. The performance of an HQSAR model is affected by three parameters: fragment size, fragment type (fragment distinction), and hologram length. The statistical parameters of the HQSAR models are shown in Table 2.

The best HQSAR model was obtained with a hologram length (HL) of 151, a fragment distinction of atoms (A) and bonds (B), and a fragment size of 4–7 (Tables 3 and 4). It gave a q2 of 0.924 with an ONC of 4, an SEP of 0.210, an \( {r}_{cv}^2 \) (L-10-O) of 0.896, an \( {r}_{ncv}^2 \) of 0.953 with an SEE of 0.162, an F value of 226.360, r2 − q2 of 0.029, an \( {r}_{bs}^2 \) of 0.982 with an SEEbs of 0.006, and an Rpearson of 0.954. These results demonstrate that the HQSAR model is also highly predictive.

Table 3 HQSAR analysis for various fragment distinctions on the key statistical parameters using fragment size (4–7)
Table 4 Statistical results of QSAR model using the model 1–2 (including fragments A/B) with different fragment sizes

Validation of QSAR models

The predictive ability of the QSAR models was validated externally using the independent test set, which was not used for model generation [50], while the q2 and r2 parameters from internal validation confirmed the stability and predictive ability of the models. For the whole test set of 15 compounds, the models gave \( {r}_{pred}^2 \) and \( {r}_m^2 \) values of 0.938 and 0.771 (CoMFA), 0.988 and 0.725 (CoMFA-RF), 0.998 and 0.870 (CoMSIA), and 0.995 and 0.763 (HQSAR), and regression lines with slopes k and k′ close to 1: 0.995 and 1.002 (CoMFA), 1.000 and 0.997 (CoMFA-RF), 1.009 and 0.989 (CoMSIA), and 1.018 and 0.991 (HQSAR), respectively. The \( {r}_o^2 \) and \( {r}_o^{\prime 2} \) values were 0.909 and 0.931 (CoMFA), 0.895 and 0.925 (CoMFA-RF), 0.952 and 0.961 (CoMSIA), and 0.897 and 0.839 (HQSAR), respectively; from these, the ratios \( \left({r}^2-{r}_o^2\right)/{r}^2 \) and \( \left({r}^2-{r}_o^{\prime 2}\right)/{r}^2 \) were 0.035 and 0.012 (CoMFA), 0.060 and 0.025 (CoMFA-RF), 0.016 and 0.006 (CoMSIA), and 0.034 and 0.001 (HQSAR), respectively.

The QSAR models yielded the following RMSE, MAE, and CCC values for the training and test sets, respectively: CoMFA, 0.157, 0.119, and 0.967 (training) and 0.089, 0.040, and 0.961 (test); CoMFA-RF, 0.177, 0.132, and 0.888 and 0.091, 0.038, and 0.955; CoMSIA, 0.140, 0.111, and 0.951 and 0.157, 0.032, and 0.968; HQSAR, 0.133, 0.098, and 0.976 and 0.136, 0.056, and 0.904.

The performance criteria for the training and test data (Table 5) show that all models yielded low RMSE and MAE values and high CCC values, indicating that models built on the training set can be used to predict the activity of this chemotype.

Table 5 Statistical parameters of validation method for QSAR models

The external validation parameters are listed in Table 5. These results indicate that the QSAR models can be used to predict the biological activities of new compounds and their derivatives.

The correlation plots between the predicted and experimental activities are shown in Fig. 5. For all QSAR models, most compounds lie on or near the trend line, indicating good predictive ability for new compounds.

The residuals of the QSAR models are shown in Fig. 6. The CoMSIA and HQSAR models showed smaller residuals than the CoMFA and CoMFA-RF models and were therefore the better models.

Fig. 5

Predicted vs. experimental pIC50 values of the training- and test-set compounds for the QSAR models

Fig. 6

Residual plots between experimental and predicted values for QSAR models

Evaluation of the Y-randomization test, variance inflation factor, and applicability domain of the models

The QSAR models were further validated with the Y-randomization test to assess their robustness and to rule out chance correlation [57, 58]. For each original model, the dependent variable (biological activity) was randomly shuffled several times, and a new QSAR model was developed each time using the original independent-variable matrix; the results are shown in Table 6. The low q2 and \( {r}_{ncv}^2 \) values of the randomized models (q2 < 0.5 and \( {r}_{ncv}^2<0.6 \)) show that the good statistics of the final models were not obtained by chance.
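The Y-randomization procedure can be sketched as follows: shuffle the activity vector, refit on the unchanged descriptor matrix, and record r2 and q2. As an assumption, ordinary least squares stands in for PLS so that LOO q2 can use the exact hat-matrix shortcut (LOO residual = e_i / (1 − h_ii)). The data, seeds, and function names are hypothetical.

```python
import numpy as np

def ols_r2_q2(X, y):
    """Fitted r2 and LOO q2 for OLS via the hat-matrix shortcut e_i / (1 - h_ii)."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                 # hat (projection) matrix
    resid = y - H @ y
    ss_tot = np.sum((y - y.mean()) ** 2)
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1 - resid @ resid / ss_tot, 1 - press / ss_tot

def y_randomization(X, y, runs=10, seed=0):
    """(r2, q2) for models refitted on randomly permuted activities."""
    rng = np.random.default_rng(seed)
    return [ols_r2_q2(X, rng.permutation(y)) for _ in range(runs)]

# Hypothetical data: 40 compounds, 3 descriptors, real structure-activity signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
r2, q2 = ols_r2_q2(X, y)
shuffled = y_randomization(X, y)
print(round(float(r2), 3), round(float(q2), 3), max(q for _, q in shuffled))
```

A genuine model keeps high r2 and q2, whereas the shuffled models should fall well below the q2 < 0.5 and r2 < 0.6 thresholds.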

Table 6 q2 and \( {r}_{ncv}^2 \) values after several Y-randomization tests

In addition to the Y-randomization test, multicollinearity among the descriptors was assessed by calculating the variance inflation factor (VIF):

$$ \mathrm{VIF}=\frac{1}{1-{r}^2} $$

where r2 is the coefficient of determination of the multiple regression of one variable on the other variables in the model. A VIF of 1 indicates no intercorrelation for that variable, VIF values of 1–5 indicate an acceptable model, and VIF values larger than 10 indicate an unstable model that needs to be rechecked [59, 60]. The VIF values of the seven descriptors are shown in Table 2. All variables have VIF values below five, indicating that the descriptors are reasonably orthogonal and that the models are statistically stable and acceptable.
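The VIF formula above can be applied descriptor by descriptor: each column is regressed on all the others (with an intercept), and VIF = 1 / (1 − r2) is taken from that fit. The sketch below uses `numpy.linalg.lstsq`. The data are hypothetical, with one column constructed as a near-copy of another to show how collinearity inflates VIF.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing column j on the other columns."""
    n, m = X.shape
    out = []
    for j in range(m):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(2)
X_ind = rng.normal(size=(200, 3))                                     # independent
X_col = np.column_stack([X_ind, X_ind[:, 0] + 0.01 * rng.normal(size=200)])
print(np.round(vif(X_ind), 2), np.round(vif(X_col), 0))
```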

A predicted activity for a new compound with no experimental data is of little use without an estimate of its reliability. Therefore, defining the applicability domain of a regression or classification model is a very important step in QSAR model development for evaluating new compounds [33].

The Williams plot, i.e., standardized residuals (δ) vs. leverage values (hi), was used to visualize the predictive reliability and to define the applicability domain of the models for each compound [61, 62].

The standardized residuals (δ) value is calculated by Eq. (11) [63]:

$$ \delta =\frac{y_i-\hat{y_i}}{\sqrt{\sum \limits_{i=1}^n\frac{{\left({y}_i-\hat{y_i}\right)}^2}{\left(n-A-1\right)}}} $$
(11)

where \( {y}_i \) and \( \hat{y_i} \) are the observed and predicted values for the ith compound, respectively, n is the number of compounds, and A is the number of descriptors. The leverage value (hi) is defined by Eq. (12):

$$ {h}_i={x}_i^T{\left({X}^TX\right)}^{-1}{x}_i\left(i=1,...,n\right) $$
(12)

where xi is the descriptor vector of the ith compound, \( {x}_i^T \) is the transpose of xi, X is the descriptor matrix of the training-set compounds, and \( {X}^T \) is the transpose of X.

The warning leverage value (h*), as a prediction tool, is expressed as:

$$ {h}^{\ast }=\frac{3\left(k+1\right)}{n} $$

where k is the number of model descriptors and n is the number of training compounds.

The Williams plot shows the distribution of the data together with cutoff lines. The standardized residuals should lie within ±3 units (horizontal dotted lines), and the leverage value (hi) should be lower than the warning leverage (hi < h*). For the training set, the Williams plot is used to identify the molecules with the greatest structural influence (hi > h*) on the development of the QSAR models. Molecules with hi > h* may be unreliably predicted by the models because their prediction involves substantial extrapolation.
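The Williams-plot rules above can be sketched as a small classifier. It computes h* from k and n and then labels each compound by its standardized residual and leverage. The function names, label strings, and example values are hypothetical.

```python
def warning_leverage(k: int, n: int) -> float:
    """h* = 3(k + 1) / n for k model descriptors and n training compounds."""
    return 3.0 * (k + 1) / n


def williams_labels(delta, h, h_star, resid_cut=3.0):
    """Label each compound from its standardized residual and leverage."""
    labels = []
    for d, hi in zip(delta, h):
        outlier = abs(d) > resid_cut      # outside the +/-3 residual band
        high_lev = hi > h_star            # beyond the warning leverage
        if outlier and high_lev:
            labels.append("outlier, high leverage")
        elif outlier:
            labels.append("response outlier")
        elif high_lev:
            labels.append("high leverage")
        else:
            labels.append("inside AD")
    return labels


h_star = warning_leverage(k=2, n=30)   # hypothetical model, h* = 0.3
print(williams_labels([0.5, 3.5, 0.2], [0.10, 0.10, 0.50], h_star))
# ['inside AD', 'response outlier', 'high leverage']
```

A high-leverage training compound is read as influential. A high-leverage test compound is read as an extrapolation outside the applicability domain.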

Cook’s distance is used to estimate the influence of a single observation on the model [64] and is defined by Eq. (13):

$$ {D}_i=\frac{e_i^2}{p+1}\ast \frac{h_i}{1-{h}_i} $$
(13)

where \( {e}_i \) is the standardized residual of the ith compound, p is the number of descriptors, and hi is the leverage value of the ith compound. The cutoff for Cook’s distance is defined as \( \frac{4}{\left(n-p-1\right)} \). Compounds with a Cook’s distance above the cutoff are marked as highly influential points of the model.
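Below is a minimal sketch of Eq. (13) and its cutoff, with hypothetical function names. One assumption should be noted. In the classical form of Cook’s distance, the residual in this expression is the internally studentized residual, e/(s·sqrt(1 − h)). The sketch therefore derives that residual from raw residuals before applying the formula.

```python
import numpy as np


def cooks_distance(residuals, h, p):
    """Cook's distance for each compound, following the form of Eq. (13).

    residuals: raw residuals y_i - yhat_i of an OLS fit with an intercept
    h:         leverage values h_i of the same compounds
    p:         number of descriptors (the fit has p + 1 parameters)
    """
    e = np.asarray(residuals, dtype=float)
    h = np.asarray(h, dtype=float)
    n = e.size
    s = np.sqrt(np.sum(e ** 2) / (n - p - 1))   # residual standard deviation
    r = e / (s * np.sqrt(1.0 - h))               # internally studentized residual
    return r ** 2 / (p + 1) * h / (1.0 - h)


def cooks_cutoff(n, p):
    """Cutoff 4 / (n - p - 1); compounds above it are flagged as highly influential."""
    return 4.0 / (n - p - 1)
```

With this choice, the result matches the deletion definition of Cook’s distance: the scaled change in all fitted values when compound i is left out of the fit.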

In this work, most of the compounds fell within the applicability domain of the corresponding CoMFA, CoMFA-RF, and CoMSIA models. These results indicate that our QSAR models provide reliable activity predictions for these compounds.

As shown in the Williams plot of the CoMFA model for the data set (Fig. 7a), two training-set compounds (15 and 31) had leverage values greater than the warning leverage (h* = 0.206). These compounds had low standardized residuals. They can be considered influential in fitting the model but are not necessarily outliers to be deleted from the training set. The test compounds were within the applicability domain (AD), indicating that their predicted activity values were reliable. In the Cook’s distance plot of the CoMFA model (Fig. 7b), only two compounds in the training and test sets were highly influential, and these may slightly distort the regression. The distribution of the residuals is shown as a histogram in Fig. 7c.

Fig. 7

Williams plot describing the applicability domain of the CoMFA model for the training and test sets (h* = 0.206) (a); Cook’s distance plot (b); and histogram of the CoMFA model residuals (c)

In the Williams plot of the CoMFA-RF model for the data set (Fig. 8a), two training-set compounds (58 and 31) had leverage values greater than the warning leverage (h* = 0.206). As in the CoMFA model, these compounds had low standardized residuals. They can be considered influential in fitting the model, but they are not necessarily outliers to be deleted from the training set. The test compounds were within the AD, indicating that their predicted activity values were reliable. In the Cook’s distance plot of the CoMFA-RF model (Fig. 8b), four compounds in the training and test sets were highly influential. The distribution of the residuals is shown as a histogram in Fig. 8c.

Fig. 8

Williams plot describing the applicability domain of the CoMFA-RF model for the training and test sets (h* = 0.206) (a); Cook’s distance plot (b); and histogram of the CoMFA-RF model residuals (c)

In the Williams plot of the CoMSIA model for the data set (Fig. 9a), two training-set compounds could be regarded as structural outliers. According to the Cook’s distances of the compounds in the data set (cutoff = 0.0755), three highly influential compounds may distort the regression (Fig. 9b). The distribution of the residuals is shown as a histogram in Fig. 9c. Overall, the predictions of the CoMSIA model are reliable.

Fig. 9

Williams plot describing the applicability domain of the CoMSIA model for the training and test sets (h* = 0.410) (a); Cook’s distance plot (b); and histogram of the CoMSIA model residuals (c)

Interpretation of CoMFA and CoMSIA contour maps

The QSAR contour maps were used to visualize the regions of the 3D grid around the compounds where the different fields affect activity. The CoMFA and CoMSIA results were interpreted graphically as field contribution maps. At each grid point, the map shows the product of the standard deviation (StDev) of the field values and the coefficient from the PLS analysis (StDev*Coeff).

The CoMFA steric and electrostatic contour maps for the most active anti-T. gondii agent (compound 56) are shown in Fig. 10a, b. In the steric map, favorable regions (80% contribution) are shown in green and unfavorable regions (20% contribution) in yellow. Introducing bulky groups in these regions may enhance or diminish activity, respectively.
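The StDev*Coeff quantity and the 80%/20% contour levels can be sketched as follows. This is a hedged illustration, not the SYBYL implementation. The assumption is that the "80% contribution" and "20% contribution" levels are the 80th and 20th percentiles of the StDev*Coeff values over the grid; SYBYL’s own definition may differ. The function names are hypothetical.

```python
import numpy as np


def stdev_coeff(field_values, pls_coeffs):
    """StDev*Coeff at each grid point.

    field_values: (n_compounds, n_points) field values of the training compounds
    pls_coeffs:   (n_points,) PLS coefficients for the same grid points
    """
    sd = np.std(np.asarray(field_values, dtype=float), axis=0, ddof=1)
    return sd * np.asarray(pls_coeffs, dtype=float)


def contour_masks(contrib, favored=80.0, disfavored=20.0):
    """Boolean masks of grid points drawn as favored / disfavored contours.

    ASSUMPTION: the 80% / 20% contribution levels are read as the 80th and
    20th percentiles of the StDev*Coeff values; SYBYL may define them differently.
    """
    contrib = np.asarray(contrib, dtype=float)
    return (contrib >= np.percentile(contrib, favored),
            contrib <= np.percentile(contrib, disfavored))
```

Under this reading, grid points in the favored mask would be drawn as green (steric) or blue (electrostatic) contours, and points in the disfavored mask as yellow or red contours.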

Fig. 10

CoMFA StDev*Coeff contour plots in combination with compound 56. a Steric contour maps: green contours indicate regions where bulky groups increase activity and yellow contours indicate regions where bulky groups decrease activity. b Electrostatic contour maps: blue contours indicate regions where positive charges increase activity and red contours indicate regions where negative charges increase activity

In the CoMFA steric map, a green contour covers the naphthyl group at the N-3 position of the thiazolidin-4-one scaffold. This indicates that bulky groups at this position improve anti-T. gondii activity, and the compound bearing this group had the highest activity. Thiazolidin-4-ones substituted on the N-3 nitrogen with phenyl or naphthyl groups, such as compounds 13, 15, and 27–59, were more potent, whereas compounds 2–5, which lack these groups, had relatively low activity. In addition, in compound 56, which had the highest activity, the ferrocene group on the aryl hydrazone moiety attached to C-2 of the thiazolidin-4-one core is enclosed by a green contour. Compounds 1, 13, 15, 17, 22, and 30–59, which bear bulky substituents (e.g., aryl, thiophenyl, and butyl) in this region, were more potent.

Bulky substituents decreased activity at three locations, all of which fall within disfavored yellow contours: the C-5 position of the thiazolidin-4-one core, the aryl moiety at N-3, and the ethylidene group of the hydrazone moiety. These positions should therefore be occupied by moderately sized, less crowded substituents such as acetic acid and halogenated aryl groups (e.g., compounds 13, 15, 17, 22, 26, 41, 44, 47, and 53).

In the CoMFA electrostatic contour map (Fig. 10b), blue regions (80% contribution) favor electropositive groups and red regions (20% contribution) favor electronegative groups. The blue contour on the ferrocene group of compound 56 indicates that introducing electropositive groups at this position could improve biological activity. In addition, the red contours at the S-1 and C-5 positions of the thiazolidin-4-one core and at N-1 of the hydrazone moiety show that electronegative substituents there are beneficial to activity (compounds 25 < 24 < 23 < 26, 22 < 13, 15, 17).

In the CoMSIA model, the steric, electrostatic, hydrophobic, hydrogen-bond (H-bond) donor, and H-bond acceptor contour maps of compound 56 are shown in Fig. 11. Because the CoMSIA steric contours were nearly identical to the CoMFA ones, only the electrostatic, hydrophobic, and hydrogen-bond fields are described here. Blue contours located at the N-3 position of the thiazolidin-4-one core indicate that electropositive groups there increased anti-T. gondii activity (e.g., compounds 5, 6, 23, 54, 55, and 56). A red contour near the ethylidene hydrazineylidene moiety indicates that introducing electronegative groups there increased biological activity.

Fig. 11

CoMSIA StDev*Coeff contour plots in combination with compound 56. a Steric contour maps: green contours indicate regions where bulky groups increase activity; yellow contours indicate regions where bulky groups decrease activity. b Electrostatic contour maps: blue contours indicate regions where positive charges increase activity; red contours indicate regions where negative charges increase activity. c Hydrophobic contour maps: yellow contours indicate regions where hydrophobic substituents enhance activity; white contours indicate regions where hydrophobic groups decrease activity. d Hydrogen bond donor contour maps: cyan contours indicate regions where H-bond donor groups increase activity and purple contours indicate regions unfavorable for hydrogen bond donor substituents. e H-bond acceptor contour maps: magenta contours indicate regions where H-bond acceptor substituents increase activity; red contours indicate regions disfavored for H-bond acceptor groups

In the hydrophobic contour map, yellow regions (80% contribution) favor hydrophobic groups, while white regions (20% contribution) favor hydrophilic groups.

The white regions near the N-3 position of the thiazolidin-4-one core and the aryl moiety of the ferrocene group show that introducing hydrophilic groups at these positions might be beneficial for inhibitory activity (Fig. 11c). There are also yellow contours at the naphthyl ring on N-3 of the thiazolidin-4-one core and at the ethylidene hydrazineylidene moiety. These indicate that hydrophobic groups such as aryl and heterocyclic rings in these regions could increase the activity of the compounds. Compounds 54–59, which have hydrophobic substituents in this region, were more potent, while compounds 2–4, which lack this hydrophobic group, had relatively lower activity. These results show that the yellow contours of the hydrophobic map agree with the green contours of the steric map.

The CoMSIA H-bond donor and acceptor contour maps reflect hydrogen-bond interactions between the ligands and the target. Cyan and purple contours indicate favorable (80% contribution) and unfavorable (20% contribution) regions for H-bond donor groups. Magenta and red contours indicate favorable (80% contribution) and unfavorable (20% contribution) regions for H-bond acceptor groups (Fig. 11d, e). No unfavorable purple contour was observed. Two cyan contours appear, one near the C-5 and C-6 positions of the naphthyl ring and one near the N-3 position of the thiazolidin-4-one core. They indicate that H-bond donor groups there might improve anti-T. gondii activity.

Likewise, no unfavorable red contour for H-bond acceptor interactions was observed. A magenta contour near the substituents at the N-3 position of the thiazolidin-4-one core marks a region favorable for H-bond acceptors (compounds 57–59) (Fig. 11e).

Interpretation of HQSAR contribution map

HQSAR calculations are based on the contributions of molecular fragments to the biological activity of each molecule. The HQSAR contribution maps can be shown as color-coded structure diagrams in which the color of each atom reflects its contribution to the molecule’s overall activity. Colors at the red end of the spectrum (red, red-orange, and orange) reflect negative contributions to activity. Colors at the green end (yellow, blue, green-blue, and green) represent positive contributions, and intermediate contributions are colored white. The individual atomic contributions of the most active anti-T. gondii analog (compound 56) are displayed in Fig. 12.

Fig. 12

The HQSAR contribution map of the most active compound (56). The colors in yellow, blue, green-blue, or green indicate positive contributions, while colors with red, red-orange, or orange represent negative contributions and intermediate contributions are colored in white

The thiazolidin-4-one scaffold, the maximal common structural fragment, is colored green because it is common to all molecules and contributes in the same way to all inhibitors. The aryl substituents at the N-3 position of the thiazolidin-4-one core are highlighted in green and yellow, indicating the importance of these fragments to biological activity. The ferrocene group on the aryl hydrazone moiety is colored green, indicating a positive contribution to inhibitory activity. Finally, the structure-activity relationships and binding features obtained from the present QSAR models and molecular docking analysis are summarized in Fig. 13.

Fig. 13

Structure-activity relationship revealed by 3D- and 2D-QSAR and docking studies

Molecular docking studies

TgROP8 is a kinase from the rhoptries, secretory organelles that are unique to apicomplexan parasites. This protein contains a serine/threonine kinase domain. It is injected into the host cell at the precise moment of parasite internalization and manipulates the host immune response.

In this study, the MOE program was used to explore the possible binding modes of the anti-T. gondii agents. To validate the docking parameters, the co-crystallized ligand ethylene glycol was re-docked into the active site of the TgROP8 enzyme. The re-docked pose and the cognate ligand (orange red) were almost completely superimposed. The RMSD value (0.8566 Å) is below the commonly used 2.0 Å threshold and supports the reliability of the docking procedure (Fig. 14).
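The re-docking check reduces to an RMSD between matched atoms of the crystallographic and re-docked poses. Both poses sit in the same receptor coordinate frame, so no superposition is applied. This is a minimal sketch with hypothetical coordinates, and it is not necessarily how MOE computes its value. It assumes that atoms are listed in the same order in both poses. It also ignores symmetry-equivalent atoms; ethylene glycol is symmetric, and docking programs may correct for this.

```python
import numpy as np


def pose_rmsd(ref_xyz, pose_xyz):
    """RMSD (same units as input, e.g. Å) between two poses with matched atom order."""
    a = np.asarray(ref_xyz, dtype=float)
    b = np.asarray(pose_xyz, dtype=float)
    if a.shape != b.shape:
        raise ValueError("poses must have the same atoms in the same order")
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


# Hypothetical heavy-atom coordinates (Å) for a 4-atom ligand such as ethylene glycol
crystal = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [-0.5, 1.3, 0.0], [2.0, -1.3, 0.0]])
redocked = crystal + np.array([0.3, -0.2, 0.1])   # rigid shift of every atom
rmsd = pose_rmsd(crystal, redocked)
print(f"RMSD = {rmsd:.3f} Å, within 2.0 Å: {rmsd < 2.0}")
```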

Fig. 14

Comparison of the binding poses of the co-crystallized ligand ethylene glycol (orange red) and its re-docked pose (green) in the active site of TgROP8 (PDB: 3byv)

Docking studies were carried out using MOE software (Fig. 15a, b). The first aim was to gain functional and structural insight into the binding modes of the most potent (compound 56) and least potent (compound 10) inhibitors in TgROP8. The second aim was to validate the results of the QSAR contour maps.

Fig. 15

a 2D representation of the interactions of compound 56 (the most active compound) and b 2D representation of the interactions of compound 10 (the least active compound) in the crystal structure of TgROP8 (PDB ID: 3byv), generated using LigX in MOE

Docking gave a more favorable score for the most active compound 56 (−12.44 kcal/mol) than for the least active compound 10 (−8.36 kcal/mol). Compound 10 bound to fewer active-site residues and interacted less with TgROP8 than compound 56. Compound 56 was well stabilized in the active site of TgROP8 and formed significant interactions with key amino acid residues (Fig. S1 in the Supplementary file).

According to the docking studies, four types of interactions were involved in the binding of compound 56 to the active site of the receptor: π-π stacking, π-cation, hydrogen-bonding, and hydrophobic interactions. The binding modes of the best-scored ligands were analyzed with LigX in MOE. With few exceptions, they suggested that the compounds were oriented towards the gorge of the protein. In the TgROP8 active site, Arg 228 made two π-cation interactions with the naphthalene substituent, and Tyr 370 made π-π stacking interactions with the same substituent.

The carbonyl oxygen of the thiazolidin-4-one core formed a hydrogen bond with Pro 371, and the nitrogen atom of the hydrazine moiety could also form a hydrogen bond with Thr 439. In addition, the ferrocene, naphthalene, and thiazolidinone moieties made van der Waals interactions with Glu 275, Ile 337, Gly 440, Glu 442, Tyr 278, Tyr 280, Asn 376, Val 426, Asp 427, His 295, Val 429, and Met 373. Docking of all compounds into the TgROP8 active site showed a consistent pattern. The most potent compounds, bearing ferrocene and naphthalene moieties on the thiazolidin-4-one core, had more favorable interactions with TgROP8 and interacted with more active-site residues than the less potent compounds (Fig. S1 in the Supplementary file). These docking results are consistent with the contour maps of the QSAR models.

Molecular dynamics simulation

Molecular dynamics simulation was performed to elucidate the behavior of the TgROP8 protein upon ligand binding and the stability of the ligand-protein interactions throughout the simulation. The stability of the secondary structure elements and the conformational changes of the protein-ligand complex were compared with those of the unbound TgROP8 protein. The comparison used root-mean-square deviation (RMSD) and radius of gyration (Rg) plots over 10 ns simulations. Temperature, potential energy, kinetic energy, and total energy were also monitored, and the results showed that the system was in a stable state.

The overall simulation convergence and ligand-protein equilibration were assessed from the RMSD of the backbone atoms (Cα, C, and N), which measures the stability of the structures. The RMSD vs. time is shown in Fig. 16.

Fig. 16

RMSD of TgROP8 with and without ligand

The plot shows that the backbone (Cα, C, and N) RMSD stabilized after about 4 ns for the protein and about 2 ns for the protein-ligand complex (56). In both cases, the RMSD remained below 2.0 Å. The average RMSD of the protein-ligand complex (56), 1.51 ± 0.070 Å (mean ± SD), was close to, but slightly larger than, that of the unbound protein, 1.21 ± 0.112 Å. This suggests that ligand binding slightly increased the conformational flexibility of the TgROP8 protein.
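Backbone RMSD along a trajectory is usually computed after each frame has been optimally superimposed on a reference structure. The sketch below does this with the Kabsch algorithm and then reports mean ± SD over the frames that follow an equilibration time. It assumes that the backbone (Cα, C, N) coordinates have already been extracted into arrays. The function names, the reference frame, and the averaging window are illustrative assumptions, not the protocol used here.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)           # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def rmsd_trajectory(frames, reference):
    """RMSD of every frame relative to the reference structure."""
    return np.array([kabsch_rmsd(f, reference) for f in frames])


def mean_sd_after(values, times_ns, t_eq_ns):
    """Mean and sample SD of a time series from the equilibration time t_eq_ns onward."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(times_ns, dtype=float) >= t_eq_ns
    return values[mask].mean(), values[mask].std(ddof=1)
```

Superposition removes overall rotation and translation of the protein, so the remaining RMSD reflects internal conformational change only.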

Rg is a parameter that describes the equilibrium conformation of the native and the bound systems and is an indicator of the protein structure compactness.

The Rg values of the protein and the protein-ligand complex (56) over the simulation were 18.59 ± 0.056 and 18.54 ± 0.100 Å, respectively. As shown in Fig. 17, the Rg of the protein increased during the first 2 ns of the simulation and then remained constant. The Rg of the protein-ligand complex (56) was constant after 1.5 ns and larger than that of the protein.
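Rg can be computed from the atomic coordinates of a single frame. The sketch below uses the mass-weighted definition, which is a common choice in MD analysis. The document does not state whether the reported values are mass-weighted or which atoms were included, so both are assumptions here, and the function name is hypothetical.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Rg = sqrt(sum_i m_i |r_i - r_com|^2 / sum_i m_i) for one (N, 3) frame.

    With masses=None all atoms get equal weight.
    """
    coords = np.asarray(coords, dtype=float)
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    com = (m[:, None] * coords).sum(axis=0) / m.sum()     # center of mass
    sq_dist = np.sum((coords - com) ** 2, axis=1)          # squared distance to COM
    return float(np.sqrt(np.sum(m * sq_dist) / m.sum()))


# Hypothetical two-atom example: masses 3 and 1 placed 2 Å apart on the x axis
print(radius_of_gyration([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], masses=[3.0, 1.0]))  # about 0.866
```

Applying the function to every frame of a trajectory gives the Rg time series that a plot such as Fig. 17 displays.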

Fig. 17

Rg of TgROP8 with and without ligand

Potential energy is another simple measure of the stability of a ligand-protein complex. The potential energy of the ligand-bound protein was 202,304.8 ± 167.73 kcal/mol. Its small fluctuation relative to the mean indicates that the system was stable (Fig. 18).

Fig. 18

Potential energy of TgROP8 with and without ligand

The simulation results showed that the final structure and the initial docked structure occupied the same binding pocket and that the ligand-protein conformation was stable, which supports the reliability of the docking results. The 3D representation of the interactions of compound 56 after the 10 ns simulation is depicted in Fig. 19. It shows that compound 56 kept its interactions with most residues from the initial docked complex to the final protein-ligand complex (56). These residues are Glu 275, Ile 337, Asn 376, Val 426, Tyr 278, Tyr 280, Arg 228, His 295, Glu 442, Met 373, Pro 371, Thr 439, and Tyr 370. However, the number of active-site residues involved had changed in this 3D representation. As in the initial docking pose, the interactions of compound 56 with Pro 371, Thr 439, Arg 228, and Tyr 370 might contribute to its stability in the active site of the TgROP8 protein.

Fig. 19

3D representation of interactions between compound 56 and TgROP8 at the active site after 10 ns molecular dynamics simulations

Conclusion

The 2D-QSAR (HQSAR) and 3D-QSAR (CoMFA, CoMFA-RF, and CoMSIA) methods were employed to study a series of thiazolidin-4-one derivatives as anti-T. gondii agents. The CoMFA, CoMFA-RF, CoMSIA, and HQSAR models gave statistically significant results in internal and external validation. Their q2 values were 0.897, 0.900, 0.910, and 0.924, their \( {r}_{ncv}^2 \) values were 0.933, 0.935, 0.950, and 0.953, their \( {r}_{pred}^2 \) values were 0.938, 0.998, 0.998, and 0.995, and their \( {r}_m^2 \) values were 0.771, 0.725, 0.870, and 0.763, respectively. The CoMFA and CoMSIA contour maps and the HQSAR fragment contribution map explained the structure-activity relationship of this series of anti-T. gondii agents. Molecular docking and molecular dynamics simulations were also carried out to confirm the rationality of the derived models. The thiazolidin-4-one scaffold and the bulky, hydrophobic groups on the cyclic moieties were key factors for improving TgROP8 inhibitory activity. These results provide good predictive models for the rational design of novel anti-T. gondii agents for the treatment of toxoplasmosis.