Introduction

In modern society, cancer has become an increasingly prevalent disease, characterized by a gradual accumulation of genetic alterations leading to the malfunctioning of tumor suppressor genes and/or the hyperactivity of oncogenes [1,2,3]. The modulation of genes through epigenetic mechanisms is pivotal in reshaping the structure of nucleosomes by modifying the interaction between DNA and histones. Among various epigenetic modifications, histone deacetylation, which affects the fundamental packaging units of DNA, has been identified as a contributor to gene expression control. This process results in the formation of more condensed chromatin and subsequent transcriptional suppression [2, 4, 5]. The family of histone deacetylases (HDACs) is primarily responsible for catalyzing the deacetylation of lysine residues [6, 7]. Aberrant expression of various HDACs has been implicated in numerous human disorders, including cancer, making them promising targets for therapeutic intervention across a spectrum of human malignancies [8].

HDACs are categorized into four classes based on phylogenetic relationships: Class III enzymes are nicotinamide adenine dinucleotide-dependent lysine deacetylases, while Classes I, II, and IV are zinc-dependent [7, 9, 10]. Among HDAC enzymes, inhibition of histone deacetylase 8 (HDAC8) has emerged as a prominent therapeutic strategy for conditions such as parasite infections, cancer, and X-linked intellectual disability [5, 11]. HDAC8 is a class I HDAC found in both the cytoplasm and the nucleus in vital organs such as the heart, lung, kidney, and brain. It consists of 377 amino acids, has a molecular weight of approximately 42 kDa, and lacks a C-terminal protein-binding domain. One of its distinctive features is an unexpected susceptibility to negative regulation by cAMP-dependent protein kinase (PKA), suggesting the possibility of functional specialization. Notably, HDAC8 targets non-histone proteins, including the structural maintenance of chromosomes 3 (SMC3) cohesin protein, retinoic acid-induced 1 (RAI1), and the tumor suppressor protein p53 [8,9,10,11,12,13,14]. Moreover, HDAC8 is involved in promoting the proliferation of gastric adenocarcinomas, lung cancers, and cervical malignancies in humans [14, 15]. Additionally, it has been shown to catalyze the in vitro deacetylation of several acetylated histone variants [16]. Apart from its implications in cancer, HDAC8 has been recognized for its substantial involvement in conditions such as schistosomiasis and influenza A infections [17].

Ongoing studies suggest that HDAC8 holds promise as a therapeutic target for conditions such as neuroblastoma, T-cell leukemia, and acute myeloid leukemia [18]. At present, the majority of inhibitors targeting HDAC8 exhibit broad-spectrum activity, affecting multiple isoforms from class I, class II, and class IV [19]. However, the primary drawback of these clinically approved pan-HDAC inhibitors lies in their non-selective nature, resulting in various adverse effects. Despite their clinical approval, these medications fall short of fulfilling the criteria for a selective and potent inhibitor, crucial for the effective anticancer treatment of HDAC8-associated diseases [20].

In the recent past, a multitude of QSAR studies [21,22,23,24,25,26,27,28,29,30,31,32] have been employed to decipher the essential structural features that impact HDAC8 inhibition (Table S1). However, despite the widespread use of QSAR techniques, a persistent gap exists due to challenges in statistically significant model generation with a properly curated dataset of HDAC8 inhibitors and also in achieving robust predictions [33]. Furthermore, the diverse chemical structures of HDAC8 inhibitors pose a challenge in identifying a universally applicable set of descriptors that can comprehensively elucidate their inhibitory activity. To address these challenges, we have utilized Quantitative Read-Across Structure–Activity Relationship (q-RASAR) which involves the study of 121 HDAC8 inhibitors with distinct IC50 values sourced from published research [34,35,36,37,38,39,40,41,42,43,44,45].

q-RASAR is a statistical modeling technique that improves the external predictivity of QSAR/QSPR models by including similarity and error-based metrics as descriptors in addition to the standard structural and physicochemical ones [46, 47]. While using similarity-based considerations, this method can produce models that are straightforward, comprehensible, and transferable. The q-RASAR technique holds promise in data gap filling in materials science, food sciences, predictive toxicology, medicinal chemistry, agricultural sciences, nano-sciences, and so on [46]. The present study employs an integrated in silico approach to investigate the critical structural features required for effective HDAC8 inhibition. The research encompasses four primary components (depicted in Fig. 1) aimed at crafting potent HDAC8 inhibitors: (a) Utilizing 2D-QSAR modeling to pinpoint essential structural characteristics of HDAC8 inhibitors, (b) Employing q-RASAR modeling to enhance the external predictability of HDAC8 inhibition, (c) Conducting a pharmacophore mapping study to elucidate potential pharmacophoric features governing HDAC8 inhibitory activity, and (d) Validating these identified structural features through molecular docking and dynamics simulation-based methods. Combining these methods provides a powerful toolkit for understanding, predicting, and validating the structural features crucial for HDAC8 inhibition, offering valuable insights for discovering and developing selective HDAC8 inhibitors.

Fig. 1 The workflow of the current study, which combines 2D-QSAR, q-RASAR, pharmacophore mapping, molecular docking, and molecular dynamics simulation

Materials and methods

2D-QSAR modeling

Data set

This study focuses on the q-RASAR modeling of 121 HDAC8 inhibitors with specific IC50 values extracted from published research [34,35,36,37,38,39,40,41,42,43,44,45]. The dataset is considered as biologically curated since all the reported inhibitors underwent evaluation for HDAC8 inhibition using the same method. Table S2 displays the reported IC50 values for the 121 compounds. The logarithm of the reciprocal of the half maximal inhibitory concentration (pIC50) of HDAC8 inhibitors was taken as the response variable for the generation of the QSAR model.
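The response variable can be illustrated with a short sketch. It assumes that IC50 values reported in nM are converted to mol/L before taking the negative base-10 logarithm; the function name is illustrative. Under that assumption, the IC50 values quoted later in this paper for compounds 54 (21 nM) and 34 (12,170 nM) give the pIC50 values reported for them (7.678 and 4.915).

```python
import math

def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in nM."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_nm * 1e-9)  # 1 nM = 1e-9 mol/L

# IC50 values quoted in this paper for the most and least active compounds
print(round(pic50_from_ic50_nm(21), 3))      # compound 54 -> 7.678
print(round(pic50_from_ic50_nm(12170), 3))   # compound 34 -> 4.915
```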

Molecular descriptor calculation

For the development of QSAR models, we utilized a diverse array of descriptor types [48]. Specifically, we employed 0D−2D descriptors for model generation. This choice was made to reduce the computational load associated with energy minimization and conformational analysis [49]. Several classes of 2D descriptors, including ring descriptors, molecular features, constitutional index, functional group count, and electro-topochemical atom descriptors, were computed using the PaDEL-Descriptor program [50]. The DTC Lab's pre-treatment tool (using Data Pre-Treatment 1.2 from http://teqip.jdvu.ac.in/QSAR_Tools/) eliminated the intercorrelated descriptors (intercorrelation cut off > 0.99) and those with minimal variability in values (variance cut off < 0.0001) [51, 52].
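The two pre-treatment filters can be sketched in a few lines of Python. This is only an illustration of the idea, not the DTC Lab tool itself: the use of population variance, the absolute Pearson correlation, and the rule of keeping the first descriptor of a highly correlated pair are assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import pvariance

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pretreat(descriptors, var_cut=0.0001, corr_cut=0.99):
    """Return the names of descriptors that survive both filters.

    descriptors maps a descriptor name to its values over all compounds.
    """
    # Step 1: drop near-constant descriptors (variance below the cut-off)
    varied = [name for name, vals in descriptors.items()
              if pvariance(vals) >= var_cut]
    # Step 2: drop a descriptor if it is intercorrelated (|r| > cut-off)
    # with any descriptor that has already been kept
    kept = []
    for name in varied:
        if all(abs(pearson(descriptors[name], descriptors[k])) <= corr_cut
               for k in kept):
            kept.append(name)
    return kept

# Toy data: d2 duplicates d1 (r = 1), d3 is constant
toy = {"d1": [1, 2, 3, 4], "d2": [2, 4, 6, 8], "d3": [5, 5, 5, 5], "d4": [1, 0, 1, 0]}
print(pretreat(toy))  # ['d1', 'd4']
```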

Data set division

Splitting the data set is an important step in QSAR model development. We segregated the dataset into training and test sets. The test set was dedicated to validating the established model externally, whereas the training set primarily facilitated model development. To accomplish this, we utilized the "datasetDivisionGUI1.2_19Feb2019" program [52], which implemented the sorted activity-based division technique in our current work. The data points within a cluster were found to be comparable to each other but distinct from those found outside of it. After the organization of the complete data set by cluster number and related activity levels, we selected approximately 20% of the data points from each cluster to be used as test set compounds (\({N}_{\text{test}}\) = 24, pIC50 range = 4.915 to 7.066), with the remaining 80% treated as the training set compounds (\({N}_{\text{train}}\)= 97, pIC50 range = 4.988 to 7.678) for the QSAR study.

Feature selection and development of the QSAR model

For feature selection, we utilized the genetic algorithm (GA) technique [53], which relies on a fitness function based solely on mean absolute error (MAE) criteria. Using the Genetic Algorithm v4.1 [52], we identified descriptors with the strongest correlation to the response variable. Subsequently, we employed "Partial Least Squares (PLS) regression" to develop the initial QSAR models after pinpointing the significant descriptors.

q-RASAR model development

To increase the external predictivity of the QSAR model, a q-RASAR model was developed using RASAR descriptors along with the structural descriptors [54]. The most similar compounds were identified using the "Readacross v4.2" tool (available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home), which offers Euclidean distance (ED)-based similarity, Gaussian kernel (GK) function similarity, and Laplacian kernel (LK) function similarity [55]. The training set was split in an 80:20 ratio into sub-training and sub-test sets, and the following parameters were used to optimize the method: γ = 1, σ = 1, distance threshold = 1, number of close training compounds = 7, and similarity threshold = 0. According to the optimization results (Table S3), Laplacian kernel function similarity-based read-across was the least error-prone method, and it was therefore used for the RASAR descriptor calculation with the "RASAR-Desc-Calc-v3.0.2" tool (available from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home) [54,55,56]. Finally, the pooled set of RASAR descriptors and previously selected 2D descriptors was subjected to best subset selection using the Best Subset Selection 2.1 tool [52]. Partial Least Squares version 1.0 [52] was used to build the final PLS q-RASAR model.
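The similarity-based read-across step can be illustrated with a minimal sketch. The kernel forms below are the commonly used textbook definitions and are assumed here, since the exact expressions implemented in the Readacross tool are not given in this section; the distance and similarity thresholds are omitted, descriptors are assumed to be scaled beforehand, and all function names are illustrative. The defaults γ = 1, σ = 1, and seven close training compounds follow the settings listed above.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_similarity(a, b, sigma=1.0):
    """Assumed Gaussian kernel: exp(-d^2 / (2 * sigma^2))."""
    return math.exp(-euclidean(a, b) ** 2 / (2 * sigma ** 2))

def laplacian_similarity(a, b, gamma=1.0):
    """Assumed Laplacian kernel: exp(-gamma * L1 distance)."""
    l1 = sum(abs(x - y) for x, y in zip(a, b))
    return math.exp(-gamma * l1)

def read_across_predict(query, training, similarity=laplacian_similarity, n_close=7):
    """Similarity-weighted mean activity of the n_close most similar
    training compounds; training is a list of (vector, activity) pairs."""
    scored = sorted(((similarity(query, vec), act) for vec, act in training),
                    key=lambda pair: pair[0], reverse=True)[:n_close]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        raise ValueError("query has zero similarity to all training compounds")
    return sum(sim * act for sim, act in scored) / total

# Toy training set of two compounds with hypothetical activities
toy_training = [([0.0, 0.0], 6.0), ([1.0, 0.0], 5.0)]
print(read_across_predict([0.0, 0.0], toy_training))
```

The prediction is pulled toward the activity of the closest compound because its similarity weight dominates the weighted mean.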

ML-based q-RASAR model development

For the development of ML regression models, we employed supervised machine learning algorithms, namely AdaBoost (ADB), Extreme Gradient Boost (XGB), Linear Support Vector Machine (LSVM), and Support Vector Machine (SVM). These models were developed using the previously mentioned training (\({N}_{train}\)= 97) and test set data (\({N}_{test}\) = 24). To improve model performance, the hyperparameters of each algorithm were tuned using the GridSearchCV algorithm. For this optimization, we used DTC Lab’s Python-based tool “Optimization and Cross-validation v1.0” (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home/machine-learning-model-development-guis). Using the optimized hyperparameter settings alongside the training and test sets, we developed four machine learning-based q-RASAR models. This was achieved using a Python-based tool called Machine Learning Regressor v2.0, which can be accessed from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home/machine-learning-model-development-guis [57].

Pharmacophore mapping

In this investigation, Discovery Studio software [58] was used to create ligand-based 3D QSAR pharmacophore models. Here, the “3D QSAR Pharmacophore Model Generation” module of Discovery Studio 3.0 [59] was applied for model development. The dataset comprising 121 compounds was divided into a training set (Ntrain = 24) and a test set (Ntest = 97). The training set consists of molecules with IC50 values spanning from 21 nM to 12,170 nM, including highly active (< 350 nM), moderately active (350–2500 nM), and inactive (> 2500 nM) compounds. During the hypothesis generation procedure, the module determines the cost function. Equation 1 gives the formula used for estimating the total cost of a hypothesis:

$$ {\text{cost}} = eE + wW + cC, $$
(1)

where the coefficients for the error (E), weight (W), and configuration (C) components are denoted by the letters e, w, and c, respectively.

Two other important cost values are the fixed cost and the null cost that can be calculated using Eqs. 2 and 3, respectively.

$$ {\text{Fixed cost}} = eE\left( {x = 0} \right) + wW\left( {x = 0} \right) + cC, $$
(2)

where x represents the deviation from the expected values of weight and error.

$$ {\text{Null cost}} = eE\left( {\chi_{est} = \overline{\chi }} \right) + wW + cC, $$
(3)

where the estimated activity \({\chi }_{est}\) is set to \(\overline{\chi }\), the average scaled activity of the training set molecules.

The configuration cost is determined by the complexity of the pharmacophore hypothesis space. An increase in the root mean square (rms) value leads to a corresponding increase in the error cost value. The rms deviations serve as a metric for assessing the correlation quality, primarily between the estimated and actual activity data [60, 61].

Molecular docking study

The molecular docking study was conducted using the zinc metalloenzyme-optimized AutoDock4Zn [62] with the AutoDock Vina program. The crystal structure of HDAC8 (PDB ID: 1T64), sourced from RCSB (https://www.rcsb.org/), was employed to explore the interactions between the protein and ligands. The protein preparation included the removal of water molecules, the application of Gasteiger partial charges, and the addition of polar hydrogens. Five active compounds (44, 54, 82, 102, and 118) and the least active compound (34) were selected from the pharmacophore mapping study for docking analysis. The ligands were imported into the AutoDock Tools suite [63], where polar hydrogens were added. PDBQT files were generated for both the ligands and the HDAC8 enzyme. The grid box was positioned to cover the entire binding region of the protein (grid center: 61.3, 73.865, 11.247; grid box dimensions: 40 × 40 × 40). Molecular docking was performed using the Lamarckian genetic algorithm (LGA). The ligand-binding interactions within the HDAC8 cavity were visualized using Discovery Studio 3.0 [59].

Molecular dynamics study

In this study, we performed MD simulations using the GROMACS 2021.1 software [64, 65] on the docked complexes of the most active (54) and the least active (34) compounds with HDAC8, using the CHARMM-GUI web server [66] to prepare the inputs. The PDB Reader tool was first employed for protein pre-processing [67]. Additionally, using the OpenBabel program, the best-docked poses of the ligands were converted to the ".mol2" format [68]. The ligands were then imported into the Ligand Reader and Modeler tools to parameterize them and generate topology files [69]. Subsequently, each protein–ligand complex was combined into a ".pdb" file, which was then used in the "Solution Builder" function to create the GROMACS input system [70]. The entire protein structure was enclosed in a rectangular TIP3P water box. Each system was neutralized with a sufficient number of Na+ and Cl− ions placed using the Monte-Carlo method [71] and then subjected to 5000 steps of steepest descent energy minimization to remove steric overlaps [72]. The next step was NVT equilibration, in which the number of particles, volume, and temperature are held constant, run for 125000 steps with V-rescale temperature coupling (coupling constant of 1 ps at 310.15 K) [73]. The MD simulation was run with the CHARMM36m force field for 500 ns [74]. Using the gmx_rms, gmx_rmsf, and gmx_gyrate programs, the simulation results were further evaluated for root mean square deviation (RMSD), root mean square fluctuation (RMSF), and radius of gyration (Rg), respectively [75]. With the aid of the gmx_hbond program, the hydrogen bonds between the ligands of interest and the active site amino acid residues were analyzed [76, 77]. Subsequently, the production trajectories were visualized using PyMOL [78] to analyze the binding poses at various time intervals.
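To make the geometric quantities concrete, the following sketch shows what RMSD and the radius of gyration measure for a single frame. It is a simplified illustration rather than the GROMACS implementation: no least-squares superposition onto the reference structure is performed, the coordinates and masses are hypothetical, and the function names are illustrative.

```python
import math

def rmsd(coords, reference):
    """Root mean square deviation between two coordinate sets
    (lists of (x, y, z) tuples in the same atom order), without fitting."""
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords))

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the center of mass."""
    total = sum(masses)
    center = [sum(m * p[i] for m, p in zip(masses, coords)) / total
              for i in range(3)]
    weighted = sum(m * sum((p[i] - center[i]) ** 2 for i in range(3))
                   for m, p in zip(masses, coords))
    return math.sqrt(weighted / total)

# Two hypothetical atoms shifted by 1 unit along z relative to the reference
print(rmsd([(0, 0, 1), (1, 0, 1)], [(0, 0, 0), (1, 0, 0)]))    # 1.0
print(radius_of_gyration([(-1, 0, 0), (1, 0, 0)], [1.0, 1.0]))  # 1.0
```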

Results and discussions

QSAR and q-RASAR models

A dataset of 121 HDAC8 inhibitors was used for 2D-QSAR model development. Initially, a pool of 1444 2D descriptors was generated using the PaDEL-Descriptor tool. Pre-treatment of the data reduced this pool to 813 2D descriptors, which were then used for feature selection and model building. Finally, a PLS regression-based QSAR model (Eq. 4) with four latent variables was generated for the present study.

$$ pIC_{50} = \, 28.8362 \, + \, 0.0003 \, ATS6m \, - 0.1550 \, AATS0i \, + 0.3978 \, AATS7s \, - \, 0.4206 \, C3SP2 \, + 1.4312 \, maxHaaCH \, + \, 0.0040 \, TIC2 \, + \, 0.2586 \, VE1\_D $$
(4)

In line with the guidelines provided by the OECD, the model's performance was evaluated through stringent internal and external validation. The determination coefficient (R2 = 0.713), leave-one-out cross-validated correlation coefficient (Q2(LOO) = 0.654), and rm2 metrics of the training set [rm2 (train) = 0.540 and Δrm2 (train) = 0.177] are the internal validation metrics that demonstrate the model's robustness and goodness of fit. The mean absolute error of the training set (MAEtrain) for the model is 0.255. Additionally, the external validation metrics were computed, which include the external predicted variance (Q2F1 = 0.732 and Q2F2 = 0.727), mean absolute error of test set predictions (MAEtest = 0.249), rm2 (test) = 0.519, Δrm2 (test) = 0.248, and concordance correlation coefficient (CCC = 0.825).
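For readers who wish to recompute the external validation metrics, a minimal sketch is given below. It uses the commonly cited definitions of Q2F1, Q2F2, MAE, and CCC, which are assumed to correspond to those used by the validation tools in this study; the function names and the toy values are illustrative and are not taken from the dataset.

```python
from statistics import mean

def mae(obs, pred):
    """Mean absolute error of the predictions."""
    return mean(abs(o - p) for o, p in zip(obs, pred))

def q2_f1(obs, pred, train_mean):
    """1 - PRESS / SS of test observations about the training-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1 - press / sum((o - train_mean) ** 2 for o in obs)

def q2_f2(obs, pred):
    """1 - PRESS / SS of test observations about the test-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(obs, pred))
    test_mean = mean(obs)
    return 1 - press / sum((o - test_mean) ** 2 for o in obs)

def ccc(obs, pred):
    """Lin's concordance correlation coefficient."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    ss_obs = sum((o - mo) ** 2 for o in obs)
    ss_pred = sum((p - mp) ** 2 for p in pred)
    return 2 * cov / (ss_obs + ss_pred + len(obs) * (mo - mp) ** 2)

# Hypothetical test-set values, for illustration only
observed = [5.0, 6.0, 7.0]
predicted = [5.5, 6.0, 6.5]
print(q2_f1(observed, predicted, train_mean=6.0))  # 0.75
print(q2_f2(observed, predicted))                  # 0.75
print(ccc(observed, predicted))                    # 0.8
```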

The q-RASAR model was developed to improve the external predictability of the QSAR model. Utilizing PLS regression with four latent variables (4 LVs), we merged the pool of seven structural descriptors of Eq. 4 with the computed read-across-based RASAR descriptors for the optimal subset selection and final model development (Eq. 5). The identified structural descriptors of Eq. 5 and their definition are summarized in Table 1.

$$ pIC_{50} = \, 27.5777 \, {-} \, 0.1473 \, AATS0i \, + \, 0.5759 \, AATS7s \, - \, 0.3768 \, C3SP2 \, + \, 0.7308 \, maxHaaCH \, + \, 0.0043 \, TIC2 \, + \, 0.3828 \, sm1\left( {LK} \right) \, {-} \, 0.2546 \, sm2(LK) $$
(5)
Table 1 The structural descriptors identified in the study
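Equation 5 can be applied directly once the seven descriptor values of a compound are known. The sketch below encodes the intercept and coefficients exactly as printed in Eq. 5 and assumes they apply to unscaled descriptor values; the function name and the input dictionary layout are illustrative.

```python
# Intercept and coefficients as printed in Eq. 5
EQ5_INTERCEPT = 27.5777
EQ5_COEFFICIENTS = {
    "AATS0i": -0.1473,
    "AATS7s": 0.5759,
    "C3SP2": -0.3768,
    "maxHaaCH": 0.7308,
    "TIC2": 0.0043,
    "sm1(LK)": 0.3828,
    "sm2(LK)": -0.2546,
}

def predict_pic50(descriptors):
    """Evaluate Eq. 5 for one compound.

    descriptors maps each of the seven descriptor names to its value.
    """
    missing = set(EQ5_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptor values: {sorted(missing)}")
    return EQ5_INTERCEPT + sum(coef * descriptors[name]
                               for name, coef in EQ5_COEFFICIENTS.items())

# Raising sm1(LK) by one unit, all else equal, raises the prediction by 0.3828
baseline = dict.fromkeys(EQ5_COEFFICIENTS, 0.0)
shifted = dict(baseline, **{"sm1(LK)": 1.0})
print(round(predict_pic50(shifted) - predict_pic50(baseline), 4))  # 0.3828
```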

Following the OECD guidelines [79], this final model has undergone an exhaustive validation process using several internal and external validation criteria. Table 2 presents a comparison of validation parameters before (Eq. 4) and after (Eq. 5) the read-across strategy. A comparison of internal and external validation metrics before and after the implementation of the read-across strategy is illustrated using bar plots in Fig. 2.

Table 2 Comparison of internal and external validation parameters before and after the read-across strategy
Fig. 2 Bar plot illustrating the comparison of internal and external validation metrics before and after the implementation of the read-across strategy

The final q-RASAR model (Eq. 5) exhibits superior statistical significance in terms of both internal and external validation parameters when compared to the QSAR model (Eq. 4). The internal validation parameters like R2 (Train), Q2(LOO) and Scaled Average rm2 are increased, whereas Mean Absolute Errors (MAE-Fitted; Train and MAE-LOO; Train) as well as Scaled Delta rm2 are decreased in the qRASAR model (Eq. 5). The external validation parameters like Q2F1 and Q2F2 as well as concordance correlation coefficient (CCC) are also significantly increased compared to the previous model (Eq. 4). Moreover, Mean Absolute Error (MAE; Test) is decreased suggesting a better external predictivity of the q-RASAR model (Eq. 5).

Thus, the external validation metrics (Q2F1 and Q2F2), as well as the internal validation metrics such as R2, Q2(LOO), and the rm2 metrics, indicate better performance for the q-RASAR model. The coefficient plot, variable importance plot (VIP), score plot, and loading plot of the final q-RASAR model (Eq. 5) were generated using the SIMCA-P program [80]. The coefficient plot is presented in Fig. 3A, and it demonstrates that AATS7s, maxHaaCH, TIC2, and sm1(LK) are descriptors that contribute positively to the developed model, whereas AATS0i, C3SP2, and sm2(LK) contribute negatively. The variable importance plot (VIP) is shown in Fig. 3B to identify the significant descriptors in the q-RASAR model for HDAC8 inhibition. As per the VIP, the contributing descriptors have the following order of relative importance for HDAC8 inhibition: sm1(LK) > sm2(LK) > TIC2 > AATS7s > AATS0i > C3SP2 > maxHaaCH. Figure 3C displays the score plot, in which no molecule is recognized as an outlier. The loading plot in Fig. 3D further indicates that the descriptors sm1(LK), sm2(LK), TIC2, and AATS7s had the largest impact on predicting HDAC8 inhibition (in agreement with Fig. 3B) because of their distant placement from the origin.

Fig. 3 A Coefficient plot, B variable importance plot (VIP), C score plot, and D loading plot of the final q-RASAR model (Eq. 5)

In addition, we attempted to develop other machine learning-based q-RASAR models, namely AdaBoost, Extreme Gradient Boost, Support Vector Machine, and Linear Support Vector Machine, for the prediction of HDAC8 inhibitory activity using the seven descriptors identified as significant contributing features in our PLS q-RASAR model. The statistical parameters of the ML models and the hyperparameters selected for developing these models are shown in Table 3. A comparison of Table 2 with Table 3 shows that our PLS q-RASAR model outperforms the developed ML-based q-RASAR models. Therefore, we continued further studies based on the PLS q-RASAR model.

Table 3 Statistical parameters and selected hyperparameters of the ML-based q-RASAR models

Mechanistic interpretation of HDAC8 inhibition

Mechanistic interpretation is very important for any QSAR model as per OECD Principle 5. Five structural descriptors (AATS0i, AATS7s, C3SP2, maxHaaCH, and TIC2) and two RASAR descriptors, sm1(LK) and sm2(LK) [81], were employed for the development of the final model. The contribution of the structural descriptors is important for gaining insight into HDAC8 inhibition. Among the structural descriptors, TIC2 has the highest impact on HDAC8 inhibition (Fig. 3B). The descriptor TIC2, which is a function of molecular structure, encodes the total information content index based on second-order neighborhood symmetry. Higher values of this descriptor typically indicate a higher degree of symmetry or balanced relationships, while lower values suggest a more asymmetric or imbalanced structure. Greater values of TIC2 have a positive impact on HDAC8 inhibition. Thus, compounds 82, 102, and 118 exhibit higher HDAC8 inhibition due to their increased TIC2 values (Fig. 4).

Fig. 4 The seven 0D-2D descriptors used in the model [5 structural descriptors: AATS0i, AATS7s, C3SP2, maxHaaCH, and TIC2; 2 RASAR descriptors: sm1(LK) and sm2(LK)]. Among the 7 descriptors, AATS7s, maxHaaCH, TIC2, and sm1(LK) contribute positively, whereas AATS0i, C3SP2, and sm2(LK) contribute negatively

The descriptor C3SP2 exhibits the greatest negative contribution to the HDAC8 inhibition. The C3SP2 descriptor [82] basically interprets the number of carbons that are doubly bound, and it is further attached to three other carbon atoms. The presence of the above-mentioned type of carbon in the structure was found to be responsible for the decrease in HDAC8 inhibition (compounds 22, 27 and 126). Compound 54 showed significant HDAC8 inhibition due to the low value for the aforementioned descriptor (Fig. 4).

The variable AATS0i [83] also exhibits a significant negative contribution to the model, as observed from its coefficient value in the coefficient plot (Fig. 3A). The descriptor AATS0i is the averaged Moreau-Broto autocorrelation of lag 0 weighted by ionization potential. This descriptor combines the concepts of ionization potential and the Moreau-Broto autocorrelation to quantify the structural and electronic properties of compounds. In simpler terms, it represents how the ionization potential of a compound is related to its internal structural features, specifically focusing on the autocorrelation at lag 0. In the case of compounds 52 and 69, an increased value of the AATS0i descriptor is associated with a decrease in HDAC8 inhibition. Conversely, compound 44 exhibits significant HDAC8 inhibitory activity due to its lower value of the AATS0i descriptor (Fig. 4).

The descriptor maxHaaCH has the lowest standardized coefficient value and makes the smallest positive contribution to HDAC8 inhibition. The descriptor maxHaaCH indicates the maximum atom-type H value and focuses on the presence and arrangement of hydrogen atoms within the molecular structure [84]. It is used to quantify the maximum occurrence of a specific type of hydrogen atom or a specific configuration of hydrogen atoms in the molecule. Increasing the value of the maxHaaCH descriptor was found to promote HDAC8 inhibition, as seen for compounds 44, 112, and 89 (Fig. 4).

The final structural descriptor of the model, AATS7s descriptor indicates the average Broto-Moreau autocorrelation—lag 7/weighted by I-state [85]. Broto-Moreau autocorrelation is a mathematical concept used to analyze the arrangement of atoms in a molecule at different distances, and the lag 7 part indicates that it is specifically looking at the correlation between properties of atoms or substructures separated by a distance of 7 bonds in the molecular structure. The term "weighted by I-state" suggests that the autocorrelation values are adjusted or weighted based on the electronic state of the atoms involved. In the case of compounds 89 and 44, we found that increased values of the AATS7s descriptor promote HDAC8 inhibition, whereas decreased values of the descriptor reduce HDAC8 inhibition (compounds 34 and 52) (Fig. 4).

Notably, two RASAR-based descriptors, sm1(LK) and sm2(LK), emerged as significant in the final model. These descriptors represent similarity coefficients, offering a means to identify compounds or inhibitors with biological activity or HDAC8 inhibition [87]. The mathematical representations of these coefficients are shown below:

$$ {\text{sm}}1 = \frac{MaxPos - MaxNeg}{{\max \left( {MaxPos, MaxNeg} \right)}}, $$
(6)

where MaxPos and MaxNeg denote the similarity scores of the nearest positive source and negative source compounds, respectively, concerning a specific query compound [81].

$$ {\text{sm2}} = \frac{PosAvgSim - NegAvgSim }{{Avg. Sim}}, $$
(7)

where PosAvgSim signifies the average similarity values obtained from the positive close source compounds, whereas NegAvgSim indicates the average similarity values derived from the negative close source compounds [81].
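The two similarity coefficients can be computed as in the sketch below. Two points are assumptions rather than statements from the text: the denominator of sm1 is read as the larger of MaxPos and MaxNeg, and "Avg. Sim" in Eq. 7 is taken as the mean similarity over all close source compounds, positive and negative together. An empty group is treated as having an average similarity of zero, and the function names are illustrative.

```python
from statistics import mean

def sm1(max_pos, max_neg):
    """Eq. 6: normalized difference between the highest similarity to a
    positive and to a negative close source compound."""
    largest = max(max_pos, max_neg)
    if largest == 0:
        return 0.0  # no similarity to either group
    return (max_pos - max_neg) / largest

def sm2(pos_sims, neg_sims):
    """Eq. 7: difference of the average similarities to positive and
    negative close source compounds, divided by the overall average."""
    all_sims = list(pos_sims) + list(neg_sims)
    if not all_sims or mean(all_sims) == 0:
        return 0.0
    pos_avg = mean(pos_sims) if pos_sims else 0.0
    neg_avg = mean(neg_sims) if neg_sims else 0.0
    return (pos_avg - neg_avg) / mean(all_sims)

# Hypothetical similarity values for one query compound
print(sm1(0.8, 0.4))   # 0.5: closest neighbor is a positive compound
print(sm1(0.2, 0.8))   # -0.75: closest neighbor is a negative compound
print(sm2([0.8, 0.6], [0.2]))
```

With this reading, sm1 is bounded between -1 and 1, and its sign shows whether the closest source compound is positive or negative.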

The RASAR descriptor sm1(LK) [86] is positively correlated with HDAC8 inhibition. The higher values of the sm1(LK) coefficient are observed in the case of compounds 54 (pIC50 = 7.678), 78 (pIC50 = 7.456), 104 (pIC50 = 6.987), and 112 (pIC50 = 6.757). For an active compound to be considered ideal, its MaxPos value should be higher than the MaxNeg value. This condition leads to a positive sm1 value. Conversely, a negative sm1 value suggests that the MaxNeg value is higher than the MaxPos value. This indicates that the compound structurally resembles an inactive compound from a close source rather than an active one (e.g., compound 34, pIC50 = 4.915). The RASAR descriptor sm2(LK) shows a negative correlation with HDAC8 inhibition and is determined by the difference between positive and negative average similarity. The higher values of the sm2(LK) coefficient are identified in the case of compounds 60 (pIC50 = 6.076), 69 (pIC50 = 5.883), 124 (pIC50 = 5.699), and 129 (pIC50 = 6.452). This suggests that compounds similar to these in the training set are less effective at inhibiting HDAC8.

Pharmacophore mapping study

A total of 24 training set compounds were employed for the development of pharmacophore models by utilizing the “3D QSAR Pharmacophore Model Generation” module of Discovery Studio 3.0 [59]. Chemical features such as hydrogen bond acceptor (HBA), hydrophobic (HYP), ring aromatic (RA), hydrophobic ring aromatic (HY_RA), and zinc-binding group (ZBG) features were incorporated to generate ten pharmacophore hypotheses, with parameters adjusted for weight variation and uncertainty as detailed in Table 4.

Table 4 List of hypotheses for the generated pharmacophore using the modeling set of HDAC8 inhibitors

A thorough analysis of Table 4 reveals that HYP1 stands out as the best pharmacophore model, featuring the highest correlation coefficient (r = 0.969), the lowest root mean square (rms) of 0.944, a total cost value of 86.078, and the most significant cost difference value of 146.856. The inhibitory activity (IC50) of the modeling set molecules against HDAC8 using the HYP1 model is illustrated in Table 5.

Table 5 Inhibitory activities (IC50) and activity scale of the modeling set molecules against HDAC8 utilizing the HYP1 model

The statistical significance of the HYP1 model was evaluated using Fisher's randomization test, which involved generating 19 random pharmacophore hypotheses at a 95% confidence level for cross-validation. The cost value of HYP1 (Total cost = 86.078) was lower than that of each of the 19 randomly generated hypotheses (see Fig. S1, Supplementary file). This suggests that the superiority of the HYP1 model is statistically supported rather than a chance result.

A cost difference (ΔCost) exceeding 60 implies a correlation probability of over 90%. The correlation coefficient (rExternal) of the external set compounds demonstrates promising results, indicating the statistical significance and robustness of the best pharmacophore hypothesis, HYP1. Consequently, HYP1 is chosen as the final pharmacophore model for HDAC8 inhibitors. The identified pharmacophoric features from this model are distinct compared to prior studies [23, 32, 87,88,89,90,91,92,93,94,95,96] (refer to Table S4 of the Supporting Info). Figure 5A illustrates the 3D spatial relationship and geometric parameters of the HYP1 model, along with inter-feature distances. The presence of the ZBG is imperative for HDAC8 inhibitory activity, along with hydrogen bond acceptor features (HBA1, HBA2), and ring aromatic (RA) features. Compound 54, the most potent HDAC8 inhibitor (IC50 = 21 nM), adeptly matches all pharmacophore properties (Fig. 5B), while compound 34, the least potent HDAC8 inhibitor (IC50 = 12,170 nM), fails to align with all features (Fig. 5C).
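The cost-based check for HYP1 amounts to simple arithmetic, sketched below. The snippet assumes the usual convention that the cost difference equals the null cost minus the total cost; the null cost it prints is therefore only implied by the reported total cost and cost difference and is not a value stated in the text. The function names are illustrative.

```python
def implied_null_cost(total_cost, cost_difference):
    """Null cost implied by: cost difference = null cost - total cost."""
    return total_cost + cost_difference

def exceeds_threshold(cost_difference, threshold=60.0):
    """True when the cost difference is above the quoted threshold of 60."""
    return cost_difference > threshold

hyp1_total_cost = 86.078        # reported for HYP1
hyp1_cost_difference = 146.856  # reported for HYP1

print(round(implied_null_cost(hyp1_total_cost, hyp1_cost_difference), 3))  # 232.934
print(exceeds_threshold(hyp1_cost_difference))                             # True
```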

Fig. 5 A Representation of the best pharmacophore (HYP1) model. Inter-feature distances are shown in angstroms. B Mapping of the most active compound (54) onto the selected pharmacophore (HYP1). C Mapping of the least active compound (34) onto the selected pharmacophore (HYP1). The green and orange contours represent hydrogen bond acceptor (HBA) and ring aromatic (RA) features, respectively. The navy blue contours represent the zinc-binding group (ZBG) feature

Molecular docking study

To assess the significance of the different pharmacophoric features in the interaction with the HDAC8 enzyme (PDB: 1T64), a molecular docking study was conducted for the five potent HDAC8 inhibitors (44, 54, 82, 102, and 118). Figure 6 depicts the docking poses and ligand–enzyme interactions of these HDAC8 inhibitors. Notably, all five of these highly active compounds fit into the binding pocket when docked into the active site of HDAC8 using the AutoDock Vina program [62]. The 2D interactions and docking scores of all these compounds are presented in Figs. S2–S4 and Table S6, respectively.

Fig. 6

Interactions of A compound 44, B compound 54, C compound 82, D compound 102, E compound 118, F co-crystallized ligand (CCL) with important binding residues in the pocket of HDAC8 (PDB: 1T64)

The pharmacophore mapping study indicated that the zinc-binding group is essential for HDAC8 inhibition. The molecular docking study likewise shows that all the potent HDAC8 inhibitors coordinate the metal ion (Zn2+) in the HDAC8 binding site. In compound 44, both the amide oxygen and the hydroxyl oxygen of the hydroxamate moiety are involved in coordination with Zn2+, whereas in compounds 54, 82, 102, and 118 only the amide oxygen is involved. Y306 takes part in various interactions with all the compounds, suggesting that Y306 is a prerequisite for ligand binding to HDAC8. In compound 118, Y306 forms H-bonding interactions with the zinc-binding hydroxamate moiety through both the amide oxygen and the hydroxyl oxygen. In compounds 44 and 54, Y306 forms π–π interactions with F152. In compound 82, Y306 is involved in alkyl interactions with the trimethoxy group. Moreover, in compound 102, Y306 forms hydrogen bonding interactions with the NH group of the pyrrolidine ring and π–π stacking with the tetrahydroisoquinoline ring.

Another important residue, F152, contributes significantly to ligand–HDAC8 binding. In compound 54, F152 forms π–sulfur interactions with the sulfur atom of the dithiolane ring and π–π T-shaped interactions with the phenyl ring. According to the pharmacophore mapping study, the sulfur atom of the dithiolane ring and the phenyl ring act as the hydrogen bond acceptor (HBA1) and ring aromatic (RA) features, respectively, which are necessary for HDAC8 inhibition. The importance of the ring aromatic feature in the HDAC8 inhibitor structure is also highlighted by the π–π stacking interaction between residue F152 and the indole ring of compound 82. Residue I34 also forms π–alkyl interactions with the phenyl ring (ring aromatic feature) of compound 54.

In compound 44, H143 and K33 are involved in H-bonding interactions, R37 forms π–cation interactions, and P35 is involved in π–alkyl interactions. In compound 54, G304 forms an H-bond with the OH of the hydroxamate group. In compound 82, Y100 forms π–alkyl interactions with the pyrrolidine ring, and L308, P35, and Y306 are involved in alkyl interactions with the trimethoxy group. In compound 102, I34, K33, and F152 are involved in alkyl interactions, M274 forms π–sulfur interactions with the aromatic ring of the tetrahydroisoquinoline moiety, and H143 is involved in H-bonding interactions. In compound 118, residues I34, P35, F152, and Y306 participate in alkyl interactions with the trimethoxy group. The least active compound (34) bound poorly to the HDAC8 enzyme and showed fewer interactions with the active-site residues. Compound 34 displayed π–π stacking interactions with Y100 and M274. H-bonding interactions with Y100 and a van der Waals interaction with H143 were also observed.

Molecular dynamics simulation

A molecular dynamics (MD) simulation is an approach for assessing the fluctuation and motion of individual atoms or groups, as well as the conformational changes of a molecule bound to a receptor/protein, over a specified time [97]. Starting from a specific binding pose and its ligand–macromolecule interactions, the MD simulation is used to verify the stability of the receptor–ligand complex.

In the current study, MD simulations (500 ns) were performed for the most active HDAC8 inhibitor (54) and the least active HDAC8 inhibitor (34). Compound 54 maps all the pharmacophoric features and binds well in the active site of HDAC8. The least active compound (34) was also docked (Fig. S3) in the active site of HDAC8 and was examined through MD simulation (500 ns) for comparison. After the MD simulations, we compared the RMSD, Rg, and RMSF values of these protein–ligand complexes with those of the HDAC8 apo form (control system). The average RMSD values of the backbone atoms of the HDAC8 apo form and of HDAC8 complexed with compounds 54 and 34 were 0.20 ± 0.02 nm, 0.15 ± 0.01 nm, and 0.17 ± 0.02 nm, respectively, over 500 ns. The lower RMSD of HDAC8 complexed with compound 54 implies that binding of this compound stabilizes the protein backbone relative to the apo form. The RMSD of HDAC8 complexed with compound 34 is likewise lower than that of the apo form, indicating a stabilizing effect that is less pronounced than for compound 54 (Fig. 7A).
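The backbone RMSD values compared above follow the usual definition, RMSD(t) = sqrt((1/N) Σ |r_i(t) − r_i(ref)|²), evaluated after each frame has been superimposed on a reference structure. A minimal Python sketch is given below; it assumes the frames are already fitted to the reference and uses made-up coordinates, so it illustrates the metric rather than the analysis pipeline used in this work.

```python
import math
from statistics import mean, stdev

def rmsd(frame, reference):
    """RMSD between two equally long lists of (x, y, z) coordinates in nm.

    Assumes `frame` has already been least-squares fitted to `reference`.
    """
    if len(frame) != len(reference):
        raise ValueError("coordinate sets differ in length")
    sq = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(frame))

def summarize(trajectory, reference):
    """Mean and sample standard deviation of the RMSD over all frames."""
    values = [rmsd(f, reference) for f in trajectory]
    return mean(values), stdev(values)

# Toy two-atom system (hypothetical coordinates, nm)
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
traj = [
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],  # every atom shifted by 0.1 nm
    [(0.0, 0.3, 0.0), (1.0, 0.3, 0.0)],  # every atom shifted by 0.3 nm
]
avg, sd = summarize(traj, ref)
print(f"{avg:.2f} +/- {sd:.2f} nm")
```

Reporting the time-averaged value with its standard deviation, as done in the text, summarizes both the size of the deviation from the reference and how much it varies along the trajectory.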

Fig. 7

Molecular dynamics simulation analysis: A RMSD of the HDAC8 apo form and of HDAC8 with compounds 54 and 34, B Rg of the backbone atoms of HDAC8 without ligand and with the mentioned ligands, C RMSF of the amino acid residues of HDAC8 without ligand and with the mentioned ligands

To examine the compactness of HDAC8 in the presence and absence of the aforementioned ligands, the radius of gyration (Rg) is plotted in Fig. 7B. The average Rg was 2.01 ± 0.01 nm for the HDAC8 apo form and for both complexes (with compounds 54 and 34), demonstrating that the compactness of HDAC8 does not change significantly in the presence of compounds 54 and 34 over the 500 ns period (Fig. 7B). Furthermore, the RMSF of individual HDAC8 residues was determined to examine the flexibility or rigidity of different regions of the HDAC8 apo form and of its complexes with compounds 54 and 34 (Fig. 7C). The average RMSF values of the backbone atoms in the HDAC8 apo form and in the presence of compounds 54 and 34 were 0.09 ± 0.07 nm, 0.08 ± 0.05 nm, and 0.08 ± 0.05 nm, respectively. This suggests that, in the presence of these inhibitors, the average residue fluctuation of HDAC8 is slightly reduced relative to that of the apo form.
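The two quantities discussed above have simple definitions: Rg is the mass-weighted root-mean-square distance of the atoms from their centre of mass in one frame, and RMSF is the root-mean-square deviation of each atom from its own time-averaged position. The Python sketch below implements both under the assumption that the trajectory has already been fitted to a common reference; the coordinates are toy values, not data from the simulations.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one frame (same unit as coords)."""
    masses = masses or [1.0] * len(coords)  # equal masses if none are given
    total = sum(masses)
    com = [sum(m * p[i] for m, p in zip(masses, coords)) / total for i in range(3)]
    sq = sum(m * sum((p[i] - com[i]) ** 2 for i in range(3))
             for m, p in zip(masses, coords))
    return math.sqrt(sq / total)

def rmsf(trajectory):
    """Per-atom RMSF: RMS deviation of each atom from its mean position."""
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    out = []
    for a in range(n_atoms):
        avg = [sum(f[a][i] for f in trajectory) / n_frames for i in range(3)]
        msd = sum(sum((f[a][i] - avg[i]) ** 2 for i in range(3))
                  for f in trajectory) / n_frames
        out.append(math.sqrt(msd))
    return out

# Toy data (hypothetical): atom 0 is rigid, atom 1 oscillates along x
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # 1.0
print(rmsf(frames))                                             # [0.0, 1.0]
```

An unchanged Rg together with a slightly lower RMSF, as reported here, would correspond to a protein that keeps its overall shape while individual residues move a little less.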

During the MD simulations, snapshots of all complexes were visualized at different time points to investigate the stability of the HDAC8–inhibitor complexes within the binding pocket. Interestingly, compound 34 moves out of the catalytic pocket before 100 ns of the 500 ns simulation, whereas compound 54 (pIC50 = 7.678) remains in the binding site until 330 ns of the simulation (Fig. 8). To verify this observation, we plotted the distance between the oxygen atom of the hydroxamate functionality of each ligand and the zinc atom within the binding site as a function of time (Fig. 9). Additionally, to assess the impact of ligand size on HDAC8 inhibition, we conducted a 100 ns MD simulation of a moderately active ligand of intermediate size (compound 37, pIC50 = 6.180) (see Fig. S5, Supplementary file). Compound 37 displays a pattern similar to that of compound 34 (pIC50 = 4.915), exiting the binding pocket before 100 ns (Fig. S6, Supplementary file).
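Whether a ligand has left the catalytic pocket can be checked numerically from the zinc–oxygen distance trace described above. The Python sketch below flags the first time at which that distance stays above a cutoff for several consecutive frames. The 0.5 nm cutoff, the three-frame window, and the toy coordinates are assumptions chosen for illustration, not values taken from this study.

```python
import math

def first_exit_time(zn_xyz, o_xyz, dt_ns, cutoff_nm=0.5, window=3):
    """Time (ns) at which the Zn-O distance first stays above `cutoff_nm`
    for `window` consecutive frames, or None if that never happens.

    zn_xyz, o_xyz: per-frame (x, y, z) positions in nm; dt_ns: frame spacing.
    """
    run = 0  # number of consecutive frames above the cutoff
    for i, (zn, o) in enumerate(zip(zn_xyz, o_xyz)):
        run = run + 1 if math.dist(zn, o) > cutoff_nm else 0
        if run == window:
            return (i - window + 1) * dt_ns  # start of the sustained excursion
    return None

# Toy traces (hypothetical): zinc fixed at the origin, oxygen moving along x
zn = [(0.0, 0.0, 0.0)] * 8
o_leaving = [(x, 0.0, 0.0) for x in (0.2, 0.2, 0.6, 0.2, 0.7, 0.8, 0.9, 1.0)]
o_bound = [(0.2, 0.0, 0.0)] * 8

print(first_exit_time(zn, o_leaving, dt_ns=10))  # 40
print(first_exit_time(zn, o_bound, dt_ns=10))    # None
```

Requiring several consecutive frames above the cutoff avoids counting a brief fluctuation, such as the single 0.6 nm frame in the toy trace, as an unbinding event.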

Fig. 8

Binding stability of A the most active compound (54) and B the least active compound (34) with HDAC8 enzyme up to 330 ns simulation

Fig. 9

Distance between the oxygen atom of each ligand and zinc atom during the 500 ns simulation

Conclusions

This study sequentially employed validated QSAR and q-RASAR models (built with a PLS-regression-based method), pharmacophore mapping, molecular docking, and molecular dynamics approaches to identify the essential structural features of potential HDAC8 inhibitors. A statistically significant PLS regression-based QSAR model (\({Q}_{F1}^{2}\): 0.732, \({Q}_{F2}^{2}\): 0.727, MAEtest: 0.249) was developed using four latent variables. The q-RASAR strategy was then applied to increase the external predictivity for HDAC8 inhibition. The developed q-RASAR model has high statistical significance and predictive ability (\({Q}_{F1}^{2}\): 0.778, \({Q}_{F2}^{2}\): 0.775, MAEtest: 0.221). The descriptors in the final q-RASAR model were discussed to gain meaningful insight into the mechanistic aspects of HDAC8 inhibition. Pharmacophoric features were also identified through pharmacophore mapping, which revealed that HYP1 was the best pharmacophore model, with the highest correlation coefficient (r = 0.969) and the lowest rms (0.944). The pharmacophore predictions showed that the ring aromatic (RA) feature near the hydrogen bond acceptor feature (HBA2) plays a crucial role in HDAC8 inhibition, while other features, such as the presence of a ZBG, are also essential.

Based on the q-RASAR model and the pharmacophore mapping study, five HDAC8 inhibitors (compounds 44, 54, 82, 102, and 118) were chosen as potent inhibitors for assessing binding interactions by molecular docking. The molecular docking study supported the results of the pharmacophore mapping: the hydroxamate moiety (as ZBG) is involved in metal (Zn2+) coordination in the HDAC8 binding site in all five potent HDAC8 inhibitors. Moreover, for the most active inhibitor (54), the ring aromatic (RA) and hydrogen bond acceptor (HBA1) features necessary for HDAC8 inhibition were represented in the docking study by the phenyl ring and the sulfur atom of the dithiolane ring, respectively (Fig. 10). Lastly, the stability of the complexes of the most active (54) and the least active (34) inhibitors was analyzed using MD simulation, which indicated that inhibitor 54 forms a more stable complex with HDAC8 than inhibitor 34. The findings of this study could be useful for future HDAC8 inhibitor design, and the computational strategy used can be applied broadly to drug design against other targets.
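For readers who wish to reproduce external-validation statistics of the kind quoted above, the Python sketch below uses the commonly cited definitions: Q²F1 = 1 − Σ(y_obs − y_pred)² / Σ(y_obs − ȳ_train)², Q²F2 = 1 − Σ(y_obs − y_pred)² / Σ(y_obs − ȳ_test)², and MAE as the mean absolute test-set error. The numbers in the example are invented for illustration and are not data from this study.

```python
def external_metrics(y_train, y_test, y_pred):
    """Return (Q2_F1, Q2_F2, MAE) for test-set predictions.

    y_train: observed training responses (only their mean is used, for Q2_F1)
    y_test:  observed test responses; y_pred: predicted test responses
    """
    if len(y_test) != len(y_pred):
        raise ValueError("y_test and y_pred must have the same length")
    mean_train = sum(y_train) / len(y_train)
    mean_test = sum(y_test) / len(y_test)
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred))  # prediction error SS
    q2_f1 = 1 - press / sum((o - mean_train) ** 2 for o in y_test)
    q2_f2 = 1 - press / sum((o - mean_test) ** 2 for o in y_test)
    mae = sum(abs(o - p) for o, p in zip(y_test, y_pred)) / len(y_test)
    return q2_f1, q2_f2, mae

# Invented pIC50-like values, for illustration only
q2_f1, q2_f2, mae = external_metrics(
    y_train=[5.0, 6.0, 7.0],
    y_test=[5.0, 7.0, 9.0],
    y_pred=[5.5, 7.0, 8.5],
)
print(f"Q2_F1={q2_f1:.3f}  Q2_F2={q2_f2:.3f}  MAE={mae:.3f}")
```

The two Q² variants differ only in the reference mean used in the denominator, which is why they are usually reported together.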

Fig. 10

Structure of some of the active compounds (54, 82, and 102) along with the essential structural features obtained from different computational studies

Overall, this comprehensive exploration sheds light on the intricate molecular aspects influencing HDAC8 inhibition, offering a foundation for future research endeavors in the pursuit of novel and effective HDAC8 inhibitors.