1 Introduction

Multidrug-resistant strains of Mycobacterium tuberculosis (TB) pose a challenge to the treatment of tuberculosis in the global community. The World Health Organization (2017) reported 9.0 million people infected with tuberculosis, 360,000 HIV patients living with tuberculosis, the deaths of 230,000 children and the deaths of 1.6 million people worldwide [1]. Notable commercially available drugs administered to people infected with tuberculosis include isoniazid (INH), pyrazinamide (PZA), rifampicin (RMP) and para-aminosalicylic acid (PAS) [2]. The emergence of multidrug-resistant strains of M. tuberculosis against these drugs has driven the search for faster and more precise approaches to developing novel compounds with improved biological activity against M. tuberculosis [2, 3].

Quantitative structure-activity relationship (QSAR) modelling is a widely used theoretical and computational approach for predicting and designing new hypothetical drug candidates [2]. A multivariate QSAR model is a mathematical expression that relates the biological activity of each compound to its molecular structure. Several researchers [4,5,6,7] have successfully established QSAR models relating anti-M. tuberculosis inhibitors such as chalcone, quinolone, 7-methyljuglone and pyrrole to their respective biological activities. However, QSAR combined with molecular docking simulation has not been fully established to relate the structures and activities of the inhibitory compounds and their mode of interaction with the receptor (DNA gyrase). Hence, this research aimed to build a robust QSAR model with high predictability, carry out a molecular docking simulation and design new potent hypothetical compounds with better anti-tubercular activity against M. tuberculosis.

2 Materials and Methods

2.1 Data Collection

Fifty (50) derivatives of 1,2,4-triazole reported as anti-Mycobacterium tuberculosis agents were obtained from the literature [3] and used in this study. The compounds and their biological activities are presented in Table 1.

Table 1 Molecular structures of inhibitory compounds and their derivatives as anti-tubercular agents

2.2 Molecular Optimization

Spartan 14 software version 1.1.4 [https://down.cd/10055/buy-WaveFunction-Spartan-14-1.1.4-download] was used to optimize all the inhibitory compounds so that each attained a stable conformation at minimum energy. Strain energy was first removed from the molecules using the Molecular Mechanics Force Field (MMFF), and complete optimization was then achieved with Density Functional Theory (DFT) using the B3LYP/6-31G* basis set [5].

2.3 Generation of Molecular Descriptor

A descriptor is a mathematical representation that expresses the properties of a molecule in numerical terms, providing the link between the biological activity of each molecule and its molecular structure. Descriptors for all the inhibitory molecules were calculated with PaDEL-Descriptor software version 2.20 [http://www.yapcwsoft.com/dd/padeldescriptor/], and a total of 1879 molecular descriptors were generated.

2.4 Normalization and Pretreatment of Data

To give each variable (descriptor) the same chance of influencing the QSAR model at the outset, the descriptor values generated by PaDEL-Descriptor software version 2.20 were normalized using Eq. 1 [2, 8].

$$ D = \frac{{d_{1} - d_{{\min }} }}{{d_{{\max }} - d_{{\min }} }} $$
(1)

where D is the normalized descriptor value, \( d_{\max} \) and \( d_{\min} \) are the maximum and minimum values in each descriptor column, and \( d_{1} \) is the descriptor value for each molecule. After normalization, the data were subjected to pretreatment [http://teqip.jdvu.ac.in/QSAR_Tools/] to remove redundant descriptors.
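The scaling in Eq. 1 can be sketched as follows. This is a minimal illustration that assumes the descriptors are held in a NumPy array with one row per compound and one column per descriptor; the function name and the toy values are hypothetical and are not taken from PaDEL-Descriptor or the DTC Lab tools.

```python
import numpy as np


def min_max_normalize(descriptors):
    """Scale every descriptor column to the range [0, 1] as in Eq. 1."""
    X = np.asarray(descriptors, dtype=float)
    d_min = X.min(axis=0)
    d_max = X.max(axis=0)
    span = d_max - d_min
    # Assumption: a constant column (zero range) is mapped to 0 rather than
    # dividing by zero; such columns would normally be dropped in pretreatment.
    safe_span = np.where(span == 0, 1.0, span)
    return (X - d_min) / safe_span


# Hypothetical 3-compound x 2-descriptor matrix, for illustration only.
example = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
normalized = min_max_normalize(example)
```

Each column is scaled independently, so a descriptor with a large numerical range does not dominate the regression simply because of its units.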

2.5 Generation of Training and Test Sets

The complete data set was divided into a training set and a test set in a 70:30 ratio using the Kennard-Stone algorithm incorporated in the DTC Lab software [http://teqip.jdvu.ac.in/QSAR_Tools/]. The QSAR model was developed and internally validated on the training set, while the developed model was confirmed on the test set.
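The idea behind a Kennard-Stone split can be sketched as below. This is a minimal illustration of the textbook max-min selection rule using Euclidean distance, not the DTC Lab implementation, whose internal details are not described here; the function name and the random demonstration data are hypothetical.

```python
import numpy as np


def kennard_stone_split(X, n_train):
    """Return (train_indices, test_indices) using max-min distance selection.

    Assumes n_train >= 2 and Euclidean distance in descriptor space.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all compounds.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed the training set with the two most distant compounds.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining compound to its nearest selected one.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Pick the compound that is farthest from the current training set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


# Hypothetical data: 50 compounds x 6 normalized descriptors, split 35/15.
rng = np.random.default_rng(0)
X_demo = rng.random((50, 6))
train_idx, test_idx = kennard_stone_split(X_demo, n_train=35)
```

Because the training compounds are chosen to span the descriptor space, the test compounds tend to lie inside the region covered by the training set.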

2.6 Building of QSAR Models and Internal Validation Test

The QSAR models were built using the Genetic Function Approximation (GFA) technique incorporated in Materials Studio software version 8.0 [https://www.3dsbiovia.com/products/collaborative-science/biovia-materials-studio/] to select the optimum descriptors for the training set. Multi-linear regression (MLR) was used as the modelling tool to develop the multivariate equations: the activity data were placed in the last column of a Microsoft Excel 2013 spreadsheet, which was then imported into Materials Studio 8.0 to generate the QSAR model. The internal validation test, which checks that the built model is robust and has high predictability, was also performed in Materials Studio 8.0 and reported.

2.7 Evaluation of Leverage Values (Applicability Domain)

Influential and outlier molecules in both the training and test sets were identified using the applicability domain approach. The leverage \( h_{i} \) defined in Eq. 2 was used to define the applicability domain space, with a standardized residual limit of \( \pm \) 3 for outlier molecules [9, 10].

$$ h_{i} = M_{i} \left( {M^{T} M} \right)^{ - 1} M_{i}^{T} $$
(2)

where \( M_{i} \) is the descriptor row vector of compound i, M is the \( n \times d \) descriptor matrix of the training set, \( M^{T} \) is the transpose of M, and \( M_{i}^{T} \) is the transpose of \( M_{i} \). The warning leverage h*, defined in Eq. 3, is the boundary used to check for influential molecules.

$$ h^{*} = \, 3 \frac{{\left( {d + 1} \right)}}{N} $$
(3)

where d is the number of descriptors in the built model and N is the number of compounds in the training set.
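Eqs. 2 and 3 can be evaluated with a few lines of linear algebra. The sketch below is illustrative only: the function names and the toy matrix are hypothetical, and a pseudo-inverse is used in place of a plain inverse as a precaution against a singular \( M^{T} M \), which is an assumption rather than something stated in the method.

```python
import numpy as np


def leverages(M_train, M_query):
    """Leverage h_i = m_i (M^T M)^-1 m_i^T for each row m_i of M_query (Eq. 2)."""
    M_train = np.asarray(M_train, dtype=float)
    M_query = np.asarray(M_query, dtype=float)
    # Assumption: pinv instead of inv, so collinear descriptors do not crash.
    core = np.linalg.pinv(M_train.T @ M_train)
    # Row-wise quadratic form m_i * core * m_i^T.
    return np.einsum("ij,jk,ik->i", M_query, core, M_query)


def warning_leverage(n_descriptors, n_train):
    """Warning leverage h* = 3 (d + 1) / N (Eq. 3)."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical training matrix: 5 compounds x 2 descriptors.
M_demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
h_demo = leverages(M_demo, M_demo)
h_star_demo = warning_leverage(n_descriptors=2, n_train=5)
influential = h_demo > h_star_demo
```

Passing test set or newly designed compounds as `M_query` while keeping the training matrix as `M_train` gives their leverages relative to the training space.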

2.8 Y-Randomization Validation Test

The Y-randomization test [http://teqip.jdvu.ac.in/QSAR_Tools/] is one of the external validation criteria used to ascertain that the developed model was not obtained by chance [10, 11]. The training set data were randomly shuffled following the principle laid out in [10, 12]: the activity data (dependent variable) were shuffled while the descriptors (independent variables) were kept unchanged, and a new multi-linear regression (MLR) model was generated for each shuffle. For the developed QSAR model to pass the Y-randomization test, the \( R^{2} \) and \( Q^{2} \) values of the randomized models must be significantly low over a number of trials, while the Y-randomization coefficient (c\( R_{p}^{2} \)) given in Eq. 4 must be \( \ge \) 0.5 to establish the robustness of the model.

$$ {\text{c}}R_{p}^{2} = R \times \sqrt {R^{2} - \left( {R_{r} } \right)^{2} } $$
(4)

where c\( R_{p}^{2} \) is the Y-randomization coefficient, R is the correlation coefficient of the original model and \( R_{r} \) is the average R of the random models.
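The shuffling procedure can be sketched as follows. This is a minimal illustration on synthetic data, not the DTC Lab tool: the function names, the number of trials and the data are hypothetical, ordinary least squares stands in for the MLR step, and the coefficient is computed in the commonly used square-root form c\( R_{p}^{2} = R \sqrt{R^{2} - R_{r}^{2}} \), with \( R_{r} \) taken as the mean R of the random models.

```python
import numpy as np


def fit_r2(X, y):
    """R^2 of an ordinary least-squares fit of y on X with an intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def y_randomization(X, y, n_trials=50, seed=0):
    """Shuffle the activities n_trials times and return (R^2, random R^2s, cRp^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r2 = fit_r2(X, y)
    # Descriptors stay fixed; only the activity column is permuted.
    r2_random = np.array([fit_r2(X, rng.permutation(y)) for _ in range(n_trials)])
    # Assumption: R_r is the mean R (not the mean R^2) of the random models.
    r_random = np.sqrt(np.clip(r2_random, 0.0, None)).mean()
    c_rp2 = np.sqrt(r2) * np.sqrt(max(r2 - r_random ** 2, 0.0))
    return r2, r2_random, c_rp2


# Synthetic demonstration: 35 compounds, 6 descriptors, nearly linear activity.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.random((35, 6))
weights = np.array([-3.0, 4.0, 1.0, 0.5, -1.0, -4.0])  # hypothetical weights
y_demo = X_demo @ weights + 0.05 * rng_demo.standard_normal(35)
r2, r2_random, c_rp2 = y_randomization(X_demo, y_demo)
passes = c_rp2 >= 0.5
```

A genuine structure-activity relationship gives a high \( R^{2} \) for the original data and much lower values for the shuffled data, which pushes the coefficient above the 0.5 threshold.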

2.9 Affirmation of the Built Model

The internal and external validation parameters reported for the training and test sets were compared with the generally accepted threshold values for any QSAR model shown in Table 6 [2, 10,11,12,13] in order to affirm the reliability, fit, stability, robustness and predictability of the developed models.

2.10 Docking Studies

2.10.1 Preparation of receptor

The crystal structure of DNA gyrase shown in Fig. 1 was obtained from the Protein Data Bank with PDB code 31FZ [15]. The structure was prepared in Discovery Studio Visualizer by removing all bound substances (ligands and cofactors) and solvent molecules associated with the receptor. The prepared receptor was then saved in PDB format, which is the recommended input format for PyRx and Discovery Studio Visualizer, and imported into PyRx to be converted into a macromolecule [13].

Fig. 1 Crystal structure of DNA gyrase

2.10.2 Receptor (DNA Gyrase) Preparation

The crystal structure of the target protein (DNA gyrase) was downloaded from the Protein Data Bank with PDB code 31FZ [14, 15]. All foreign substances associated with the enzyme, such as solvent molecules, cofactors and ligands, were removed using Discovery Studio Visualizer software [https://www.3dsbiovia.com/products/collaborative-science/biovia-discovery-studio/]. The target protein was then saved in PDB format, which is the recommended format for PyRx and Discovery Studio Visualizer, imported into PyRx and converted into a macromolecule [5, 16].

2.10.3 Ligand Preparation

The stable, minimum-energy conformations of the triazole derivatives were obtained with Spartan 14 software at the Density Functional Theory (DFT) level, which served as the optimization tool. The optimized ligands were saved in PDB format, which is the recommended format for PyRx, and were then imported into PyRx and converted into ligands (micro molecules) [5, 16].

2.10.4 Docking of Receptor and Ligand

Ligand-receptor interactions between the triazole derivatives and the receptor (DNA gyrase) were studied by molecular docking using the PyRx virtual screening software. PyRx [https://pyrx.sourceforge.io/] is open-source software for virtual screening that uses AutoDock Vina [http://vina.scripps.edu/] and AutoDock 4.2 [http://autodock.scripps.edu/] as docking engines. Discovery Studio Visualizer version 2016 [https://www.3dsbiovia.com/products/collaborative-science/biovia-discovery-studio/] was used to visualize and analyze the docked results [5, 16].

3 Results and Discussion

3.1 QSAR Studies

An optimum QSAR model for predicting the activity of 1,2,4-triazole derivatives against M. tuberculosis was obtained by combining computational and theoretical methods. The data set of 50 compounds was partitioned into a training set of 35 compounds and a test set of 15 compounds using the Kennard-Stone algorithm. The 35 training set compounds were used to derive the QSAR model by multi-linear regression and also served as the data set for the internal validation test, while the external validation of the derived model was conducted on the test set.

Model 1

$$ \begin{aligned} {\text{pBA}} = & - 3.927401745 \times {\text{MATS2s}} + 4.730973152 \times {\text{nHBint3}} \\ & + 1.1035920582 \times {\text{maxtsC}} + 0.310934301 \times {\text{TDB9u}} \\ & - 0.791306892 \times {\text{RDF90i}} - 4.281096493 \times {\text{RDF110s}} \\ & + 8.840916286 \\ \end{aligned} $$

Model 2

$$ \begin{aligned} {\text{pBA}} = & - 2.418520845 \times {\text{MATS2s}} + 1.783195320 \times {\text{nHBint3}} \\ & + 1.310849563 \times {\text{maxtsC}} + 0.0280218642 \times {\text{TDB9u}} \\ & - 4.992450732 \times {\text{RDF150p}} - 4.59209513 \times {\text{Ds}} \\ & + 9.702350851 \\ \end{aligned} $$

Model 3

$$ \begin{aligned} {\text{pBA}} = & - 6.934102832 \times {\text{MATS2s}} + 1.760023432 \times {\text{nHBint3}} \\ & + 4.803387356 \times {\text{maxtsC}} + 2.7934152560 \times {\text{TDB9u}} \\ & + 0.950439041 \times {\text{RDF90i}} - 3.521095439 \times {\text{De}} \\ & + 7.4873028922 \\ \end{aligned} $$
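Applying one of these equations to a compound is a weighted sum of its descriptor values plus the intercept. The sketch below encodes the coefficients of Model 1 exactly as printed; the function name and the input values are hypothetical, and it is assumed (not confirmed here) that the descriptor values passed in are on the same scale as those used to fit the model.

```python
# Coefficients of Model 1 as printed in the text.
MODEL_1_COEFFICIENTS = {
    "MATS2s": -3.927401745,
    "nHBint3": 4.730973152,
    "maxtsC": 1.1035920582,
    "TDB9u": 0.310934301,
    "RDF90i": -0.791306892,
    "RDF110s": -4.281096493,
}
MODEL_1_INTERCEPT = 8.840916286


def predict_pba(descriptors):
    """Return the Model 1 pBA for a mapping of descriptor name -> value."""
    missing = set(MODEL_1_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return MODEL_1_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in MODEL_1_COEFFICIENTS.items()
    )


# Hypothetical descriptor values, for illustration only (not from Table 3).
zero_compound = {name: 0.0 for name in MODEL_1_COEFFICIENTS}
baseline = predict_pba(zero_compound)
shifted = predict_pba({**zero_compound, "MATS2s": 1.0})
```

With every descriptor at zero the prediction equals the intercept, and raising MATS2s by one unit lowers the prediction by the magnitude of its coefficient, which is how the sign of each term is read.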

The experimental activities reported in the literature, the predicted activities calculated for all the anti-tubercular compounds, the leverage values and the residuals are presented in Table 1. The residual is the difference between the experimental and predicted activity, and the residuals were observed to be low. The low residual values indicate that the built model has good predictive ability.

The optimum (2D and 3D) descriptors that efficiently describe the anti-tubercular compounds in relation to their biological activities were selected by the GFA approach. The information on the molecular structure of the anti-tubercular agents conveyed by each descriptor is reported in Table 2. For the purpose of reproducibility, all the calculated descriptors for both the training and test sets in model 1 are presented in Table 3.

Table 2 Name of selected descriptors used in the QSAR model 1
Table 3 Predicted descriptors for training set in generating model 1

Various statistical analyses were conducted on the calculated descriptors to check the validity of the built model, as reported in Table 4. The variance inflation factor (VIF) was evaluated for all the descriptors to determine the degree of correlation between them. Generally, a VIF value equal to 1, or falling between 1 and 5, signifies the absence of significant inter-correlation among the descriptors. A VIF value greater than 10 signifies that the developed model is unstable and should be re-checked. The VIF values of all the descriptors were found to be less than 5, as reported in Table 4, which affirms that the descriptors are significantly orthogonal to each other, with no inter-correlation between them.
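For reference, the VIF of a descriptor is commonly computed as \( 1/(1 - R_{j}^{2}) \), where \( R_{j}^{2} \) is obtained by regressing that descriptor on all the others. The sketch below follows that standard definition; it is not the routine used to produce Table 4, and the function name and the two toy matrices are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the remaining columns."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    vifs = []
    for j in range(d):
        target = X[:, j]
        # Remaining descriptors plus an intercept column.
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        # A perfectly collinear column gives r2 == 1 and an infinite VIF.
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Hypothetical orthogonal descriptors: every VIF should be 1.
orthogonal = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
vif_orthogonal = variance_inflation_factors(orthogonal)

# Hypothetical nearly collinear descriptors: VIF well above 10.
x1 = np.arange(6, dtype=float)
x2 = x1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
vif_collinear = variance_inflation_factors(np.column_stack([x1, x2]))
```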

Table 4 Statistical parameters that influence the model 1

The contribution of each descriptor to the built model was evaluated by determining the standard regression coefficient (\( b_{j}^{s} \)) and the mean effect (ME). The magnitudes and signs of the \( b_{j}^{s} \) and ME values reported in Table 4 indicate the strength and direction with which each descriptor influences the activity model. The relationship between the descriptors and the biological activity of each compound was assessed by one-way analysis of variance (ANOVA). The probability value of each descriptor at the 95% confidence level was found to be p \( < \) 0.05, as presented in Table 4. The null hypothesis, which proposes no direct relationship between the biological activity of each compound and the descriptors in the built model, is therefore rejected, and the alternative hypothesis that such a relationship exists is accepted. To further validate the descriptors in the activity model, a Pearson correlation analysis was conducted to check for inter-correlation between the descriptors. The absolute values of the correlation coefficients between all pairs of descriptors reported in Table 5 were below 0.8, which implies that the descriptors are free of multicollinearity.
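The Pearson check described above amounts to inspecting the off-diagonal entries of the descriptor correlation matrix. A minimal sketch is given below; the function name, the default 0.8 cut-off argument and the toy data are illustrative assumptions, not the software used to produce Table 5.

```python
import numpy as np


def max_pairwise_correlation(X):
    """Return the correlation matrix and the largest |r| between two descriptors."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # columns are descriptors
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    return corr, float(np.abs(corr[off_diagonal]).max())


def free_of_multicollinearity(X, cutoff=0.8):
    """True when every pairwise |r| is below the cut-off."""
    _, worst = max_pairwise_correlation(X)
    return worst < cutoff


# Hypothetical descriptor matrices, for illustration only.
independent = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
redundant = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]]
ok_independent = free_of_multicollinearity(independent)
ok_redundant = free_of_multicollinearity(redundant)
```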

Table 5 Pearson’s correlation coefficient for the descriptor used in the QSAR model

The results of the internal and external validation, which assess whether the built models are reliable and robust, are presented in Table 6. These results all agree with the general validation criteria also presented in Table 6, which supports the stability and robustness of the models. On the basis of these validation results, model 1 was selected as the best model and was used to predict the biological activities of the 1,2,4-triazole derivatives against M. tuberculosis.

Table 6 Validation parameters for each model using multi-linear regression (MLR)

The QSAR model generated in this research was compared with a model reported in the literature [10], shown below.

$$ \begin{aligned} {\text{pBA}} = & - 6.515153698 \times {\text{AATS5e}} + 0.056593117 \times {\text{VR1\_Dzs}} \\ & - 6.230058484 \times {\text{SpMin7\_Bhe}} + 0.016884210 \times {\text{TDB7e}} \\ & + 0.09232054 \times {\text{RDF90i}} + 43.764308643 \\ \end{aligned} $$

\( R^{2} \) = 0.9265, \( R_{adj}^{2} \) = 0.9045, \( Q_{cv}^{2} \) = 0.8324, and the external validation for the test set was found to be \( R_{pred}^{2} \) = 0.8034 [10].

The validation parameters reported in this work and those reported in the literature all agree with the threshold values presented in Table 6, which confirms that the generated model is predictive and robust.

The Y-randomization coefficient (c\( R_{p}^{2} \)) of 0.7849 reported in Table 7 is greater than the threshold value of 0.5, which provides reasonable support that the built model is robust and was not obtained by chance.

Table 7 Y-randomization parameters test for model 1

The correlation between the predicted and experimental activities of the training and test sets is shown graphically in Figs. 2 and 3. The correlation coefficient (\( R^{2} \)) values of 0.9579 for the training set and 0.8657 for the test set show a high correlation between the predicted and experimental activities, in agreement with the accepted QSAR threshold values reported in Table 6.

Fig. 2 Plot of predicted activity against observed activity of training set

Fig. 3 Plot of predicted activity against observed activity of test set

The residual plot in Fig. 4 shows no indication of systematic error in the derived QSAR model, as all the standardized residuals for both the training and test sets fall within the boundary of \( \pm \) 2 on the standardized residual axis [2, 10, 17, 18].

Fig. 4 Plot of standardized residual activity versus observed activity

The Williams plot showing the applicability domain (AD) space is presented in Fig. 5. Only compound 42 exceeds the warning leverage (h* = 0.60), so it can be inferred that this compound is an influential molecule. All the compounds fall within the defined space of \( \pm \) 3 standardized residuals, which indicates that no compound is an outlier.
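As a check on the reported threshold, Eq. 3 can be evaluated with the six descriptors of model 1 (d = 6) and the 35 training set compounds (N = 35):

$$ h^{*} = 3 \frac{{\left( {d + 1} \right)}}{N} = 3 \times \frac{6 + 1}{35} = \frac{21}{35} = 0.60 $$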

Fig. 5 The Williams plot of the standardized residuals versus the leverage value

3.2 Molecular Docking Studies

3.2.1 Assessment of Binding Affinity

The binding affinity between each ligand and the target enzyme was elucidated through docking studies. The docking results showed that the activity value of each docked ligand correlated with its binding affinity, which ranged from − 4.0 to − 21.9 kcal/mol (Table 8). Ligand 41 showed the strongest binding affinity, − 21.9 kcal/mol, compared with the recommended drug isoniazid (− 14.6 kcal/mol) and the other derivatives. This indicates that ligand 41 could serve as a better compound against tuberculosis.

Table 8 Molecular docking interactions between the M. tuberculosis target (DNA gyrase) and the ligands (1,2,4-triazole derivatives)

3.2.2 Bond Type and Bond Length in the Ligand-Receptor Complex of Compound 41

The most prominent ligand (compound 41), which has the highest binding affinity, was examined using Discovery Studio Visualizer software. The interaction of ligand 41 with the target enzyme DNA gyrase is presented in Fig. 6. Five hydrogen bonds (2.6234, 2.1123, 2.1922, 2.6012 and 2.6302 Å) were observed with GLN385, THR77, GLN385, ALA167 and ALA167 of the enzyme. The S=O group of the ligand acts as an H-bond acceptor, forming two hydrogen bonds with GLN385 and THR77 of the target. In addition, the N–H group of the ligand acts as an H-bond donor, forming three hydrogen bonds with GLN385, ALA167 and ALA167 of the enzyme. Hydrophobic interactions were detected with VAL78 and PHE168 of the enzyme. The regions of H-bond and hydrophobic interaction in the ligand-receptor complex are shown in Figs. 7 and 8. The hydrophobic interactions and the H-bond formation therefore provide evidence that ligand 41 has the highest efficiency against the DNA gyrase receptor among the ligands studied.

Fig. 6 a Interactions between ligand 41 and DNA gyrase. b Interactions between isoniazid and DNA gyrase

Fig. 7 Hydrophobic interactions between DNA gyrase and ligand 41

Fig. 8 H-bond interactions existing between DNA gyrase and ligand 41

3.2.3 Bond Type and Bond Length in the Ligand-Receptor Complex of Isoniazid

The two-dimensional binding interaction of the target enzyme with the recommended drug isoniazid is shown in Fig. 6. The amino acids SER279, ALA337 and ALA337 are the main binding sites through which the target enzyme bonds with isoniazid, via hydrogen bonds of length 2.52954, 2.29943 and 2.24657 Å. The amino acids CYS345 and PHE338 are the main binding sites through which the target enzyme interacts with isoniazid via hydrophobic interactions. On this basis, the larger number of hydrogen bonds formed by ligand 41 supports the claim that ligand 41 binds more efficiently in the binding pocket of the receptor than the recommended drug isoniazid.

3.3 Discussion on Designed Compounds

3.3.1 Computational Design of New Hypothetical Compound

A ligand-based design approach was used to design new hypothetical compounds with improved activity against tuberculosis. The best compound among the derivatives, compound 41, was used as the template structure, and the new compounds were obtained by deletion, substitution and insertion of active substituent(s) on the template, as shown in Fig. 9. Compound 41 was selected as the template because it falls within the defined applicability domain (AD) presented in Fig. 5. The modifications were made around the triazole and acetylene of the template structure at positions 12 and 8, as presented in Fig. 9. The molecular descriptors maxtsC, nHBint3, TDB9u and RDF90i in the built QSAR models indicated a positive influence on the activity of the compounds. Modification of the template at positions 12 and 8 with an alkyl group, H atom and methoxy group yielded sixteen new compounds with improved activity against tuberculosis, as described in Table 9.

To verify that the designed compounds lie within the AD space, the leverage value of each designed compound was calculated. The leverage values reported in Table 9 show that all the designed compounds fall below the warning leverage h* = 0.60, which implies that each designed compound lies within the AD space of the model. Based on the calculated activities in Table 9, compound 41p was observed to have the highest activity against tuberculosis. The prominent anti-tubercular activity of designed compound 41p is attributed to modification of the template at position 12 with an alkyl group (CH3), which releases electrons to the ring system through the positive inductive effect (+I), and modification at position 8 with 1H-triazole. Substituents with a +I effect attached to the template raise the electron density, which makes the triazole pharmacophore of compound 41p more basic. This gives a reasonable explanation for its high activity toward Mycobacterium tuberculosis.

Fig. 9 a The prime compound (41). b The design template structure

Table 9 Compound designed, calculated descriptors and predicted activities

3.3.2 Molecular Docking of Prominent Designed Ligands (Compound 41p)

The docking results affirmed the correlation between the activity value of ligand 41p and its binding affinity. The binding affinity of ligand 41p was found to be − 24.3 kcal/mol (Table 10), which is stronger than that of the template compound 41 (− 21.9 kcal/mol) stated in Table 8.

Table 10 Molecular docking interactions between the M. tuberculosis target (DNA gyrase) and ligand 41p

Ligand 41p formed eight H-bonds with DNA gyrase. Six H-bonds were formed by the triazole N–H group of the ligand, which acted as hydrogen donor to PRO B119, GLY B120, TRP B103 and VAL B278 of the target. The S=O group of the ligand acted as hydrogen acceptor, forming two hydrogen bonds with TRP B103 and SER B104 of the enzyme, as presented in Fig. 10.

Fig. 10 a, b 2D and 3D interactions existing between ligand 41p and DNA gyrase

The number of H-bonds and their distances have been reported to be key factors influencing the binding affinity of receptor-ligand interactions [5, 16, 19]. This provides structural insight into why the designed compound 41p was able to bind efficiently in the binding pocket of the target enzyme (Fig. 11).

Fig. 11 a, b H-bond and hydrophobic interactions existing between ligand 41p and DNA gyrase

4 Conclusion

Triazole derivatives were studied using a theoretical method to select molecular descriptors that relate the structures of the derivatives to their activity against M. tuberculosis. The internal and external assessments confirmed that the built QSAR model is reliable and robust. The molecular descriptors nHBint3, MATS2s, TDB9u, maxtsC, RDF110s and RDF90i were shown to be the prominent descriptors needed to predict the biological activities of the studied compounds. Furthermore, the docking study indicated that compound 41, which has promising biological activity, has the strongest binding affinity among the derivatives, − 21.9 kcal/mol as reported in Table 8, compared with − 14.6 kcal/mol for the recommended drug isoniazid. Compound 41 was then used as a structural template to design compounds with improved activity. Among the hypothetical compounds designed, compound 41p was found to have the highest activity against tuberculosis and a more pronounced binding affinity of − 24.3 kcal/mol. The outcome of this research could aid medicinal chemists and pharmacists in designing and synthesizing novel drug candidates against tuberculosis. In vitro and in vivo tests could be carried out to validate the computational results.