Introduction

Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis, has remained a global health burden for many decades [1]. With the emergence of multidrug-resistant (MDR-TB) and extensively drug-resistant (XDR-TB) strains, these infections have spread further and become difficult to cure with conventional anti-TB therapy [2]. Moreover, TB is closely associated with acquired immunodeficiency syndrome (AIDS) [3]; such co-infection accounted for 26% of all AIDS-associated deaths [2]. According to World Health Organization (WHO) statistics for 2020, 1.5 million people died from tuberculosis (including 214,000 AIDS-related deaths). Current pharmacotherapy for TB includes several drugs that have severe adverse effects. Moreover, these drugs are becoming ineffective against resistant strains [4]. Thus, there is an urgent need to develop new chemical entities (NCEs) with unique mechanisms of action.

Imidazopyridine motifs are important in rational drug design and in the development of newer drugs [5]. Recently, this scaffold has been extensively explored for central nervous system (CNS), antidiabetic, antitubercular, antiviral, and anticancer agents, among others [5]. Optimization of the imidazopyridine heterocyclic system would serve as an important medicinal chemistry tool to further enhance the potency of a drug moiety. Q203 and ND-09759 are two clinical drug candidates that showed strong activity against resistant strains of TB (Fig. 1) [1]. The candidate IMB-1402 also showed acceptable safety parameters (Fig. 1).

Fig. 1 Structures of ND-09759, Q203, and IMB-1402

For several decades, computer-aided drug design (CADD) has been widely used in new drug discovery, structural optimization, and target identification. The large body of supporting literature [6,7,8,9,10,11,12,13,14,15,16,17] makes clear that CADD offers an economical, less time-consuming, and productive route to drug discovery, supported by a variety of algorithms and ideas. Thus, this work incorporates the CADD approach into anti-TB drug discovery using both ligand-based and structure-based drug design techniques. The QSAR approach helps medicinal chemists identify the molecular characteristics required for biological activity and thus serves as a significant tool in drug design.

In the present study, we carried out PHASE-based (Schrödinger, 2021) pharmacophore generation and multiple QSAR analyses (Figs. 2, 3, 4, 5, 6, 7; Tables 1, 2, 3, 4, 5, 6, 7) [17,18,19,20]. The generated pharmacophore model identifies the key characteristics required for imidazopyridines to act as potent antimycobacterial inhibitors. Our atom-based and field-based 3D-QSAR models correlate these key features with the inhibitory potencies of the molecules. From this information, one can design more potent imidazopyridines. Moreover, with the help of the 2D- and 3D-QSAR models, we designed ten new imidazopyridine analogues (S1–S10; Table 8) with better in silico pharmacokinetics and good predicted potencies. Starting from the originally reported best hit (molecule 24), we screened the ZINC drug-like database and retrieved the top five imidazopyridines (Table 9). These hit molecules were then analyzed theoretically using the DFT approach (Figs. 8, 9, 10). Based on the theoretical properties, the most probable hit molecule, VS-4, is also reported herein (Table 10).

Fig. 2 (a) Pharmacophore model (HHPRR_1) generated by PHASE. (b) The HHPRR_1 model illustrates the hydrophobic features (H4, H5; green), the positive feature (P8; blue), and the aromatic ring features (R10, R11; brown). All active ligands are overlaid on the generated HHPRR_1 model

Materials and methods

Software

In the current study, the common pharmacophore hypothesis (CPH) and 3D-QSAR (quantitative structure–activity relationship) models were developed using the PHASE module (Schrödinger, LLC, New York, USA, 2021). For the GA-MLR (genetic algorithm-based multiple linear regression) QSAR models, we used QSARINS (version 2.2.2). For in silico ADMET (absorption, distribution, metabolism, excretion, and toxicity) analysis, we used the “admetSAR” webserver [21].

Dataset, structure drawing, optimization, and molecular descriptor calculations

For this work, a dataset of thirty-eight (38) substituted imidazopyridine compounds covering a wide chemical space and showing moderate to high antimycobacterial activity was selected (Table 1) [4]. All 38 imidazopyridine analogues were drawn using ChemBioDraw V.12.1. These 2D structures were then converted to 3D forms and optimized with the MMFF94 force field using the software “TINKER.” For QSARINS model development, the “Open3DAlign” program was used to align all dataset molecules. Molecular descriptors were calculated with PaDEL and ChemDes. Following the procedure known from the literature, we randomly split the dataset 70%:30% into training and test sets, respectively. For CPH development, we placed 27 compounds in the training set and 11 molecules in the test set. Furthermore, all CPH models were assessed for statistical significance. Throughout QSAR model development (Tables 1–7), pIC50 values (pIC50 = −log10 IC50) were used as the dependent variable.
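The activity transformation and the 70%:30% split described above can be illustrated with a minimal Python sketch. The function names, the random seed, and the molecule identifiers are illustrative assumptions, the IC50 is assumed to be expressed in mol/L, and the split actually used in the study may have been generated differently.

```python
import math
import random


def pic50(ic50_molar):
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def random_split(ids, train_fraction=0.70, seed=0):
    """Shuffle molecule identifiers and split them into training and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed (an assumption) for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 38 hypothetical identifiers standing in for the dataset molecules
molecule_ids = list(range(1, 39))
train_ids, test_ids = random_split(molecule_ids)
print(len(train_ids), len(test_ids))  # 27 11
```

With 38 molecules, a 70% training fraction rounds to 27 training and 11 test compounds, which matches the set sizes reported for the CPH and 3D-QSAR models.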

Pruning of molecular descriptors for GA-MLR-based QSAR models

Descriptor pruning is a key step in the development of QSAR models. Because “PaDEL” provides more than 30,000 molecular descriptors, we used the objective feature selection module of QSARINS ver. 2.2.2 [22,23,24]. Many descriptors were excluded because of high collinearity (|r| > 0.90) or nearly constant (> 95%) values. We also manually removed various esoteric descriptors. After the pruning step, 600 molecular descriptors (1D, 2D, and 3D) were retained.
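The two objective filters named above (near-constant values and pairwise collinearity) can be sketched in plain Python. This is a simple greedy illustration under stated assumptions, not the QSARINS implementation: the function names, the toy descriptor table, and the rule of keeping the first descriptor of a collinear pair are all hypothetical.

```python
from collections import Counter
from math import sqrt


def near_constant(values, max_fraction=0.95):
    """True if a single value accounts for more than max_fraction of the entries."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) > max_fraction


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either column has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def prune_descriptors(table, r_cutoff=0.90, max_fraction=0.95):
    """Greedily keep descriptors that are neither near-constant nor collinear with a kept one."""
    kept = []
    for name, values in table.items():
        if near_constant(values, max_fraction):
            continue
        if any(abs(pearson_r(values, table[k])) > r_cutoff for k in kept):
            continue
        kept.append(name)
    return kept


# Toy descriptor table (hypothetical values): d2 tracks d1, d3 is constant
table = {
    "d1": [1, 2, 3, 4, 5],
    "d2": [2, 4, 6, 8, 10.1],
    "d3": [0, 0, 0, 0, 0],
    "d4": [5, 1, 4, 2, 3],
}
kept = prune_descriptors(table)
print(kept)  # ['d1', 'd4']
```

Which member of a collinear pair is retained depends on the iteration order here; a production workflow would normally apply a more principled rule, for example keeping the descriptor that correlates better with the response.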

GA-MLR-based QSAR model building

Statistically robust GA-MLR-based QSAR models were developed and validated using the popular software QSARINS ver. 2.2.2. All QSARINS-based models were assessed by internal and external validation, applicability domain analysis, and Y-randomization. These validations were carried out as per the OECD guidelines. Initially, we divided all 38 molecules into training and test sets, following the known criterion [1, 7, 9] of 70% training:30% test. The training set molecules were used for model building and the test set molecules for external validation. We also allowed multiple splits in order to obtain models with good statistical significance. All QSARINS settings were kept at their defaults. Q2LOO was selected as the fitness function. The value of Q2LOO increased up to 6 variables, after which there were only minor increments. Thus, models with 3–6 variables were developed and checked for overfitting. The best GA-MLR-based QSAR model was analyzed and studied further (Table 7). Further information on GA-MLR-based QSAR model validation is available in the supporting information.
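To make the fitness function concrete, the leave-one-out cross-validated coefficient can be written as Q2LOO = 1 − PRESS/TSS, where PRESS sums the squared errors of predictions made for each molecule by a model fitted without it. The sketch below uses a single descriptor and ordinary least squares for brevity; the function names and data are hypothetical, and a multi-descriptor GA-MLR model would replace the line fit with a multiple linear regression.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(x, y):
    """Leave-one-out Q2 = 1 - PRESS / TSS, with TSS taken about the overall mean of y."""
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    tss = sum((b - mean_y) ** 2 for b in y)
    return 1 - press / tss


# Hypothetical descriptor values and activities
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = q2_loo(descriptor, activity)
print(round(q2, 3))
```

Tracking this value as descriptors are added is what reveals the plateau mentioned above: once extra variables raise Q2LOO only marginally, they mostly add overfitting risk.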

The CPH studies

For the development of common pharmacophore hypotheses, we divided the dataset molecules (38) into two sets, viz., actives and inactives (Fig. 2). CPH analysis was performed with PHASE (Schrödinger, Inc., 2021 release). The MacroModel utility (OPLS-2005 force field) was used to minimize all 38 IMPs (imidazopyridines). The LigPrep-minimized structures, prepared at pH 7.4 ± 0.0, were imported into the PHASE workflow. Thereafter, we set the criterion for splitting molecules into active and inactive sets (active: −log MIC > 5.40; inactive: −log MIC < 4.30; MIC in mol/L). The default pharmacophoric features comprise six types: positive (P), aromatic ring (R), negative (N), hydrogen bond acceptor (A), hydrogen bond donor (D), and hydrophobic (H). Throughout pharmacophore development, we used the default definitions of the PHASE module, with the pharmacophore box size set to 2 Å. After CPH generation, the top-ranked CPH was selected and used for 3D-QSAR analysis (grid spacing = 1 Å and PLS factors = 3).
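The activity thresholds above translate into a simple three-way labelling rule. The following sketch assumes the activity is supplied as −log MIC (mol/L); the function name and the "intermediate" label for values between the two cutoffs are illustrative choices, not PHASE output.

```python
def classify_activity(neg_log_mic, active_cutoff=5.40, inactive_cutoff=4.30):
    """Label a ligand by its -log MIC value using the cutoffs quoted in the text."""
    if neg_log_mic > active_cutoff:
        return "active"
    if neg_log_mic < inactive_cutoff:
        return "inactive"
    return "intermediate"


# Hypothetical activity values
for value in (6.1, 4.9, 4.0):
    print(value, classify_activity(value))
```

Only the actives and inactives contribute to hypothesis generation and scoring; ligands in the intermediate band are left out of that step.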

Flexible ligand alignment and 3D-QSAR studies

For ligand alignment, we superimposed all 38 LigPrep-minimized ligands (LigPrep, Schrödinger, 2021). Atom-based 3D-QSAR represents each molecule as a set of overlapping van der Waals spheres; we therefore first developed QSAR models with the atom-based 3D-QSAR option (Phase, Schrödinger, 2021) (Tables 1–3). Twenty-seven molecules were placed in the training set and 11 molecules in the test set for both atom-based and field-based 3D-QSAR analyses (Phase, Schrödinger, 2021) (Tables 4–6; Figs. 3–6). The random splitting pattern (70%:30%) was used for both types of model. The number of partial least squares (PLS) factors was set to 3, with a grid spacing of 1 Å. In field-based 3D-QSAR, Gaussian-based fields were employed (Fig. 6) [25]. These consist of five field features: Gaussian H-bond donor, Gaussian H-bond acceptor, Gaussian electrostatic, Gaussian steric, and Gaussian hydrophobic. The steric and electrostatic force fields were truncated at 30.0 kcal/mol as per the default settings. Moreover, variables with a standard deviation < 0.01 were eliminated. In field-based QSAR, the Gaussian intensities (as descriptors) were the independent variables. Finally, contour maps were visualized for both atom-based and field-based 3D-QSAR models (Figs. 3–6).

Theoretical method

Density functional theory (DFT) was used to optimize the gas-phase structures of the compounds under investigation (VS-1 to VS-5) [19]. The hybrid B3LYP functional was used with the 6-311++G** basis set for all DFT calculations. Harmonic vibrational frequencies (HVF) were calculated at the same level of theory for the converged geometries. The Gaussian 09 program was used for geometry optimizations and HVF analyses. Moreover, the quantum chemical descriptors derived from conceptual DFT were calculated for each molecule.
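As an illustration of how such a job could be specified, the sketch below assembles a Gaussian-style input file requesting a B3LYP/6-311++G** optimization followed by a frequency calculation. The helper name, checkpoint file name, charge and multiplicity, and the water geometry are placeholders chosen for this example; the inputs actually used for VS-1 to VS-5 are not given in the text.

```python
def gaussian_input(title, atoms, charge=0, multiplicity=1,
                   route="# B3LYP/6-311++G** Opt Freq", checkpoint="job.chk"):
    """Build the text of a Gaussian input file from a title and Cartesian coordinates."""
    lines = [
        f"%chk={checkpoint}",        # Link 0 command: checkpoint file (placeholder name)
        route,                       # route section: method, basis set, job types
        "",
        title,
        "",
        f"{charge} {multiplicity}",  # molecular charge and spin multiplicity
    ]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                 # the molecule specification ends with a blank line
    return "\n".join(lines) + "\n"


# Placeholder geometry (water), used only to show the file layout
water = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]
input_text = gaussian_input("placeholder molecule, gas phase", water)
print(input_text)
```

Running the frequency job at the same level of theory as the optimization, as described above, is what allows a converged geometry to be checked for imaginary frequencies.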

In silico ADMET predictions

The “admetSAR” tool was employed to calculate the ADMET properties of the newly designed hydrazones (S1–S10). We also evaluated toxicity endpoints (carcinogenicity, Ames toxicity, and CYP enzyme substrate/inhibitor assessments).

Results and discussion

CPH analysis, virtual screening, and 3D-QSAR

PHASE-generated pharmacophores usually consist of six feature types: hydrogen bond donor (D), aromatic ring (R), hydrogen bond acceptor (A), hydrophobic group (H), negatively ionizable (N), and positively ionizable (P) (Fig. 2). First, we aligned all ligands using flexible ligand alignment with the shape-based alignment method. Ligands were classified into actives, inactives, and intermediates. Finally, the 5-point pharmacophore hypothesis HHPRR_1 (2 hydrophobic (green), 1 positive (blue), and 2 aromatic (orange circles)) was generated and selected based on its ranking among 15 generated CPH models. HHPRR_1 was then visualized with excluded volumes to identify regions where ligand features should not clash. Finally, we superimposed the active dataset molecules on HHPRR_1 for visualization.

Using the 3D structure of the experimentally most active molecule, 24, we carried out ligand-based virtual screening with “Swiss similarity” (combined approach). From the virtual screening of ZINC drug-like compounds, which returned > 120 hits, we selected the top 5 hits (VS 1–5) for further theoretical analysis and QSAR-based activity prediction. Moreover, we designed a new series of IMPs (S1–S10) and predicted their probable QSAR-based potencies.
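The idea behind similarity-based screening, ranking a library by resemblance to a query and keeping the top hits, can be sketched with the Tanimoto coefficient on fingerprint bit sets. This is a generic stand-in chosen for illustration and does not reproduce the scoring used by the web tool; the fingerprints, library names, and function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_by_similarity(query_bits, library, top_n=5):
    """Return the top_n (name, score) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical fingerprints: sets of on-bit positions
query = {1, 2, 3, 5, 8}
library = {
    "cmpd_A": {1, 2, 3, 5, 9},
    "cmpd_B": {2, 4, 6},
    "cmpd_C": {1, 2, 3, 5, 8},
}
hits = rank_by_similarity(query, library, top_n=2)
print(hits)
```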

Atom based- and field-based 3D-QSAR studies (statistical validations)

Tables 1–6 present the statistical analysis of the developed atom-based and field-based 3D-QSAR models. The common pharmacophore hypothesis revealed that hydrophobic features and ring features are important for biological activity. The plots of predicted versus actual activity for the training and test sets indicated acceptable 3D-QSAR models (Fig. 3). In the atom-based 3D-QSAR analysis, 27 molecules were placed in the training set and 11 molecules in the test set (correlation coefficient R2 = 0.9291, standard deviation SD = 0.3239, Fisher coefficient F = 100.5, cross-validation correlation coefficient Q2 = 0.5906, RMSE = 0.63, Pearson R = 0.7972, P = 2.31E−13). The field-based 3D-QSAR model gave a comparably acceptable set of parameters (R2 = 0.9161, SD = 0.3293, F = 87.3, Q2 = 0.5039, RMSE = 0.83, Pearson R = 0.7149, P = 4.73E−13). The high R2 and F values, together with Q2 values above 0.5, indicate the statistical robustness of the developed 3D-QSAR models. These models were then subjected to contour map analysis to obtain more information on structural characteristics.
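For readers who wish to recompute such statistics from a table of actual and predicted pIC50 values, a minimal sketch is given below. It uses the textbook definitions of R2, RMSE, and the Pearson correlation coefficient; the function name and the numbers are hypothetical, and the exact conventions used by PHASE (for example, which mean enters the test-set Q2) may differ.

```python
from math import sqrt


def regression_stats(observed, predicted):
    """Return R2, RMSE, and Pearson r for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "pearson_r": cov / sqrt(ss_tot * ss_pred),
    }


# Hypothetical actual and predicted pIC50 values
actual = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
stats = regression_stats(actual, predicted)
print(stats)
```

Note that R2 computed this way can be lower than the squared Pearson coefficient, because it penalizes systematic offsets between predicted and actual values as well as scatter.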

Fig. 3 Graphical presentation of actual versus predicted pIC50 of (a) training and (b) test set molecules for the obtained atom-based 3D-QSAR model

Visualizations of 3D-QSAR models

Analysis of atom-based 3D-QSAR models

From the PHASE-generated atom-based 3D-QSAR models, we selected only the one with the highest statistical robustness. The contributions of the various features were then analysed by QSAR visualization. Blue occlusion maps (contours) signify an increase in biological activity (BA), whereas red occlusion maps (contours) suggest a decrease in BA. Figure 4 shows the contour maps for the different features of the atom-based QSAR visualization, observed for molecule 24.

Fig. 4 A–E Visual representation of the atom-based PHASE 3D-QSAR model. A Electron withdrawing, B hydrogen bond donor, C hydrophobic, D negative ionic, and E positive ionic. Blue cubes indicate a positive coefficient (increase in activity) and red cubes indicate a negative coefficient (decrease in activity)

Fig. 5 Graphical presentation of actual versus predicted pIC50 of (a) training and (b) test set molecules for the obtained field-based 3D-QSAR model

Fig. 6 Field contour maps based on test set compounds. A Gaussian electrostatic field: favored electropositive (blue) and disfavored electronegative (red). B Gaussian hydrogen bond acceptor field: favored (red) and disfavored (magenta). C Gaussian hydrogen bond donor field: favored (purple) and disfavored (cyan). D Gaussian hydrophobic field: favored (yellow) and disfavored (white). E Gaussian steric field: favored (green) and disfavored (yellow)

Fig. 7 a Graph of experimental vs predicted pIC50 values for model 1. b Williams plot for model 1. c Insubria graph and d Y-scramble plot for model 1

Fig. 8 The B3LYP-optimized geometries of VS-1 to VS-5 (1–5) (bond lengths in Å)

Fig. 9 The HOMO and LUMO of the studied compounds VS-1 to VS-5 (1–5) (isovalue = 0.02 a.u.)

Fig. 10 The MEP of the studied compounds VS-1 to VS-5 (1–5) (isovalue = 0.0004 a.u.)

From the best atom-based 3D-QSAR model, it was clear that electron-withdrawing groups adjacent to the 2,6-dimethylimidazo[1,2-a]pyridine feature would increase biological activity (BA), as indicated by the majority of blue occlusion maps. However, substitution at the amidic oxygen is represented by red occlusion maps, indicating a slight decrease in BA. Hydrogen bond donating features over 2,6-dimethylimidazo[1,2-a]pyridine are well tolerated and tend to increase BA, as represented by blue contours. Hydrophobic groups on the 2,6-dimethylimidazo[1,2-a]pyridine and 1-bromo-4-methoxybenzene features significantly increase biological activity, as shown by the large number of blue occlusion maps. Red contours for the negative feature over the 1-bromo-4-methoxybenzene feature indicate a decrease in BA, whereas positive ionic features show blue maps over the imidazo[1,2-a]pyridine region. The statistical parameters and atom-type fractions for the developed atom-based QSAR models are tabulated in Tables 2 and 3.

Analysis of field-based 3D-QSAR models

To study the Gaussian field-based 3D-QSAR models, we superimposed all dataset molecules on the best field-based 3D-QSAR model (Fig. 6). The Gaussian electrostatic field over the amidic feature attached to 2,6-dimethylimidazo[1,2-a]pyridine prominently showed red occlusion maps, indicating disfavored substitutions, while some substitutions associated with the 2,6-dimethylimidazo[1,2-a]pyridine feature showed blue occlusions (favored pattern). The Gaussian hydrogen bond acceptor field showed prominent favored (red) and disfavored (magenta) occlusions immediately next to the portion attached to 2,6-dimethylimidazo[1,2-a]pyridine, indicating intermediate effects on BA. Purple occlusions for the Gaussian hydrogen bond donor field indicated acceptable substitutions over the chemical bridge between the 2,6-dimethylimidazo[1,2-a]pyridine moiety and the 1-bromo-4-methoxybenzene ring feature. The Gaussian hydrophobic field prominently showed white occlusions over the same chemical bridge, indicating that substitutions there are disfavored with respect to BA. Green occlusions for the Gaussian steric field over the 1-bromo-4-methoxybenzene ring feature indicate that steric bulk is tolerated there. The statistical parameters and field fractions for the developed field-based models are tabulated in Tables 5 and 6.

The analysis and interpretation of QSARINS model

The present study used a small number of dataset molecules; our previous study made clear that QSAR modelling can be performed provided the analogues cover sufficient chemical space. The developed model satisfies the OECD (Organisation for Economic Co-operation and Development) criteria (Table 7). The statistical validation parameters also satisfy the standard criteria (Golbraikh and Tropsha criteria and a high CCCex). Thus, both internal and external validation parameters were recorded for the developed model and found to be statistically robust (see supplementary information) (Fig. 7). Further information on the applicability domain analysis and the basis for model selection is provided in the supplementary information.
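The applicability domain shown in a Williams plot is commonly assessed through leverage values compared against a warning threshold h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training molecules. The sketch below handles the single-descriptor case, where the leverage reduces to h = 1/n + (x − x̄)²/Σ(xj − x̄)²; the function names and data are hypothetical, and a multi-descriptor model would use h = xᵀ(XᵀX)⁻¹x instead.

```python
def leverages(train_x, query_x):
    """Leverage of each query point for a one-descriptor model with an intercept."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in query_x]


def warning_leverage(n_train, n_descriptors):
    """Commonly used threshold h* = 3 (p + 1) / n."""
    return 3 * (n_descriptors + 1) / n_train


# Hypothetical descriptor values
train_values = [1.0, 2.0, 3.0, 4.0, 5.0]
query_values = [3.0, 5.0, 9.0]
h_values = leverages(train_values, query_values)
h_star = warning_leverage(len(train_values), 1)
inside_domain = [h <= h_star for h in h_values]
print(h_values, h_star, inside_domain)
```

Under this convention, a five-descriptor model trained on 27 molecules would have h* = 18/27 ≈ 0.67; predictions for compounds with larger leverage are extrapolations and should be treated with caution.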

Table 1 Experimental dataset employed for the atom-based 3D-QSAR (PLS factor 3) study along with predicted and actual pIC50 values (against MTB H37Rv ATCC 27294)
Table 2 The partial least squares (PLS) statistics for atom-based 3D-QSAR models
Table 3 The atom-type fraction statistics for the developed atom-based 3D-QSAR models
Table 4 Experimental dataset employed for the field-based 3D-QSAR (PLS factor 3) study along with predicted and actual pIC50 (µM) values (against MTB H37Rv ATCC 27294)
Table 5 The partial least squares (PLS) statistics for field-based 3D-QSAR models
Table 6 The field-type fraction statistics for the developed field-based 3D-QSAR models
Table 7 Statistical parameters for developed QSAR model 1
Table 8 Dataset of ten newly designed imidazopyridine hydrazone analogues (S1–S10) used in the current study
Table 9 Dataset of five virtually screened and theoretically studied imidazopyridine analogues (1–5) used in the current study
Table 10 The quantum chemical descriptors calculated for the five virtually screened imidazopyridine analogues (1–5) used in the current study

From our analysis, we finalized our best developed models as shown below:

Multivariate model

Model 1 (70% training: 30% test set, five-parameter)

$$\mathrm{pIC}_{50} = 10.0372\,(\pm 3.8874) + 0.0517\,(\pm 0.0126)\,\mathrm{AATS8v} + 23.0654\,(\pm 1.8898)\,\mathrm{MATS2s} - 8.8033\,(\pm 2.7514)\,\mathrm{SaaaC} + 1.34\,(\pm 0.1022)\,\mathrm{minHBint3} - 1.8587\,(\pm 0.4181)\,\mathrm{IC2}$$
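Model 1 can be applied to a new compound by inserting its five descriptor values into the equation. The sketch below hard-codes the intercept and coefficients reported above; the function name and the descriptor values in the example are hypothetical and serve only to show the arithmetic.

```python
# Intercept and coefficients of model 1 as reported in the text
INTERCEPT = 10.0372
COEFFICIENTS = {
    "AATS8v": 0.0517,
    "MATS2s": 23.0654,
    "SaaaC": -8.8033,
    "minHBint3": 1.34,
    "IC2": -1.8587,
}


def predict_pic50(descriptors):
    """Evaluate model 1 for a dict holding all five descriptor values."""
    missing = set(COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * descriptors[name] for name in COEFFICIENTS)


# Hypothetical descriptor values for one compound
example = {"AATS8v": 100.0, "MATS2s": 0.1, "SaaaC": 0.5, "minHBint3": 1.0, "IC2": 4.0}
prediction = predict_pic50(example)
print(round(prediction, 4))  # 7.0173
```

Predictions of this kind are only meaningful for compounds whose descriptors fall within the range spanned by the training set.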

QSAR model interpretation

From our detailed analysis of the QSARINS-based model, it was observed that an increase in the autocorrelation descriptor AATS8v is associated with higher biological activity. Similar positive trends were observed for MATS2s (Moran autocorrelation, lag 2, weighted by I-state) and minHBint3 (minimum E-state descriptor of strength for potential hydrogen bonds of path length 3), both of which carry positive coefficients in model 1. In contrast, SaaaC (atom-type electrotopological state descriptor, i.e., sum of atom-type E-state :::C:) and the information content index IC2 (neighborhood symmetry of 2-order) carry negative coefficients, so an increase in either descriptor corresponds to a decrease in predicted biological activity.

Considering the limitations of the developed model, we believe that with a larger pool of descriptors and a larger number of dataset molecules, this model can be developed further and used for the design and prediction of newer imidazopyridine analogues as anti-TB agents. The MLR model was then applied to molecules VS 1–5 and S1–S10 (Tables 8 and 9).

The DFT studies of the virtually screened ZINC drug-like hits/compounds

Calculations of theoretical properties (the FMO approach)

Several aspects of frontier molecular orbital (FMO) theory must be taken into consideration, especially the HOMO and LUMO (the highest occupied and lowest unoccupied molecular orbitals) [19]. To determine electrophilic and nucleophilic sites, one must consider the LUMO and HOMO, respectively. The FMOs of the title compounds (VS-1 to VS-5) were examined in this study (Table 10). As depicted in Fig. 9, the HOMO-to-LUMO transition for VS-3 and VS-5 is a π→π* transition, while that for VS-1, VS-2, and VS-4 is a charge transfer. The B3LYP-converged geometries of the studied compounds are summarized in Fig. 8.

The energy of the highest occupied molecular orbital (EHOMO), the energy of the lowest unoccupied molecular orbital (ELUMO), the dipole moment (D), and the quantum chemical descriptors, including the chemical potential (μ), chemical hardness (η), softness (S), and electrophilicity index (ω), were calculated using the following equations:

$$\mu =\frac{I+A}{2}$$
(1)
$$\eta =\frac{I-A}{2}$$
(2)
$$S=\frac{1}{2\eta }$$
(3)
$$\omega =\frac{{\mu }^{2}}{2\eta }$$
(4)

where I and A are the ionization energy and electron affinity of a species, respectively [26].

Furthermore, the ionization energy (I) and the electron affinity (A) of a species can be estimated by applying Koopmans’ theorem [19] (I = −EHOMO and A = −ELUMO); the resulting quantum chemical descriptors are summarized in Table 10. Chemical hardness and global softness are related to the stability and reactivity of molecules: the smaller the hardness (the greater the softness) of a molecule, the more reactive it should be. From Table 10, VS-4 exhibited the lowest chemical hardness and the highest global softness among the studied compounds; therefore, it is chemically more reactive and less stable than the other compounds. Moreover, according to previous literature, the electrophilicity index (ω) can be related to the toxicity of molecules. From Table 10, VS-1 showed the lowest electrophilicity index (ω) among the tested compounds, which indicates that it should have the lowest toxicity among the studied compounds.
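Equations (1)–(4), combined with the Koopmans approximation, can be evaluated directly from the frontier orbital energies. The sketch below follows the definitions exactly as written in this section; the function name and the orbital energies are hypothetical. Note that many conceptual-DFT references define the chemical potential with the opposite sign, μ = −(I + A)/2, which leaves η, S, and ω unchanged.

```python
def conceptual_dft_descriptors(e_homo, e_lumo):
    """Compute I, A, mu, eta, S, and omega from orbital energies (same energy unit throughout)."""
    ionization = -e_homo             # Koopmans: I = -E_HOMO
    affinity = -e_lumo               # Koopmans: A = -E_LUMO
    mu = (ionization + affinity) / 2     # Eq. (1), sign convention as given in the text
    eta = (ionization - affinity) / 2    # Eq. (2)
    if eta <= 0:
        raise ValueError("hardness must be positive (E_LUMO must lie above E_HOMO)")
    softness = 1 / (2 * eta)             # Eq. (3)
    omega = mu ** 2 / (2 * eta)          # Eq. (4)
    return {"I": ionization, "A": affinity, "mu": mu,
            "eta": eta, "S": softness, "omega": omega}


# Hypothetical orbital energies in eV
descriptors = conceptual_dft_descriptors(e_homo=-6.0, e_lumo=-2.0)
print(descriptors)
```

Because η is half the HOMO–LUMO gap under this approximation, ranking compounds by hardness is equivalent to ranking them by gap, which is why the compound with the smallest gap is identified as the softest and most reactive.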

MEPs

Molecular electrostatic potentials (MEPs) can be used to assess the interaction strengths of nearby charges, nuclei, and electrons. These plots provide a visual representation of the charge distribution [26]. In general, red contours denote the lowest electrostatic potential, while blue contours indicate the highest. From Fig. 10, the oxygen atoms in the studied compounds are the electron-rich sites responsible for nucleophilic attack, owing to their higher electronegativity.

Theoretical prediction of ADMET properties

Calculation of ADMET properties is crucial for optimizing lead molecules. Although in silico methodologies have known limitations, they can still be used to predict pharmacokinetic properties before actual experiments. In the current study, we assessed these properties in silico (with the “admetSAR” tool) for the virtually screened ZINC drug-like hits (VS 1–5) and the newly designed molecules S1–S10. The virtually screened hits obeyed drug-like characteristics. The designed molecules S1–S10 fell into class III for acute oral toxicity. Moreover, the eye erosion, hERG, and corrosion predictions were negative. Our predicted hit molecule, S10, showed good in silico pharmacokinetic properties; we therefore recommend S10 for further in vitro analysis as future scope of this work.

Conclusion

In summary, a dataset of thirty-eight (38) substituted imidazo[1,2-a]pyridine-3-carboxamide compounds was used to develop a common pharmacophore hypothesis and atom-based as well as field-based 3D-QSAR models. The same dataset was also explored for GA-MLR QSAR model development. From the pharmacophore hypothesis HHPRR_1, we learned that hydrophobicity and ring functionality are the key features. For both the atom-based (correlation coefficient R2 = 0.9291, standard deviation SD = 0.3239, Fisher coefficient F = 100.5, cross-validation correlation coefficient Q2 = 0.5906, RMSE = 0.63, Pearson R = 0.7972, P = 2.31E−13) and field-based (R2 = 0.9161, SD = 0.3293, F = 87.3, Q2 = 0.5039, RMSE = 0.83, Pearson R = 0.7149, P = 4.73E−13) 3D-QSAR analyses, we obtained acceptable values for both internal and external validation parameters. Thus, CPH and 3D-QSAR visualization, together with the GA-MLR-based QSAR model, provide a reliable tool for designing newer analogues with better predicted biological activity. Moreover, from our ZINC drug-like screening based on the experimental hit molecule, we retained hits VS 1–5, which were further studied for their DFT properties. We also designed a new series of IMP-hydrazone molecules (S1–S10) and studied them by in silico ADMET analysis.

From the in silico ADMET analysis of both VS 1–5 and S1–S10, molecule S10 was found to be more potent (predicted pIC50 (μM) value: 13.64 against MTB H37Rv ATCC 27294) and to have good ADMET properties. The designed analogue S10 and the virtually screened molecule VS-4 (predicted pIC50 (μM) value: 7.96 against MTB H37Rv ATCC 27294) are therefore proposed as potent anti-mycobacterial agents on the basis of our combined theoretical analysis.