Introduction

Human immunodeficiency virus type 1 (HIV-1) is the major causative agent of acquired immune deficiency syndrome (AIDS) and has cost millions of lives over the past two decades [1, 2]. Discovering anti-viral therapeutics that inhibit the HIV-1 replication cycle remains a challenging task in drug discovery. Potent inhibitors have been identified against the three major HIV drug targets, reverse transcriptase, protease, and integrase, all of which play a vital role in viral infection [3]. Among these, HIV-1 integrase (IN) plays a pivotal role in integrating the viral DNA into the host genome during the early stage of retroviral replication. Integrase is a highly conserved and well-characterized protein encoded at the 3′ end of the HIV pol gene. IN consists of three structural domains: a zinc-binding N-terminal domain (NTD), a central RNase H-like catalytic core domain (CCD), and a DNA-binding C-terminal domain (CTD), which have been crystallized either individually or in complex with other proteins/factors. The integration process occurs in two catalytic steps, 3′-processing (a hydrolytic reaction) followed by strand transfer (a trans-esterification reaction). During 3′-processing, integrase removes two nucleotides (GT) from each 3′ end of the viral DNA in the cytoplasm; during strand transfer, the processed viral DNA bound to IN is inserted into the host cell chromosomal DNA in the nucleus. IN relies on cellular as well as viral proteins to complete integration and establish a productive infection, and most of these protein-protein interactions (PPIs) are specific to the progress of viral replication [4,5,6,7,8,9]. Of these, the host cellular co-factor LEDGF/p75 was identified in 2003 as a tight binding partner of integrase and is regarded as a promising target.
LEDGF/p75 comprises a chromatin-binding PWWP domain at its N-terminus and an integrase-binding domain (IBD) in its C-terminal region. LEDGF/p75 binds HIV-1 IN and tethers the pre-integration complex to the chromatin of the host genome. The IN-LEDGF/p75 PPI is formed between the CCD of IN and the IBD in the C-terminal region of LEDGF/p75. Identifying new molecules that inhibit a PPI is a major challenge in drug discovery, and research evidence suggests that targeting the IN-LEDGF/p75 PPI could be an important step towards the next generation of anti-viral inhibitors [10,11,12,13]. To date, many molecules with various structural features have been designed and reported as IN-LEDGF/p75 PPI inhibitors, with the aim of improving the lifespan of people living with HIV infection. Despite this progress, combating the viral infection has remained a considerable challenge in recent years. Hence, new classes of inhibitors targeting the IN-LEDGF/p75 PPI are needed to suppress HIV infection and to offer an improved quality of life to individuals suffering from AIDS.

The inhibitors already reported for a given target protein provide an opportunity to design novel therapeutic agents for the corresponding disease. Today, three-dimensional quantitative structure-activity relationship (3D-QSAR) modeling is a major computational approach in academia and industry for relating the biological activities of chemical substances to their physicochemical properties, yielding a consistent statistical model that can forecast the activities of newly designed chemical entities [14, 15]. On this basis, we developed an atom-based 3D-QSAR model of acylhydrazone, hydrazine, and diazene derivatives reported as IN-LEDGF/p75 inhibitors [16] to better understand the structural basis and inhibition mechanism, which could be further used to identify potent lead molecules. The atom-based 3D-QSAR model was built with the Phase application available in the Schrödinger suite. In addition, quantum-polarized ligand docking and binding free energy calculations were performed to understand the binding mode and energetic contributions of the most and least active compounds within the LEDGF/p75 binding pocket. Furthermore, density functional theory (DFT) calculations and molecular dynamics simulation (MDS) studies were performed to describe the electronic features of the compounds and to examine the stability and dynamic behavior of the complexes, respectively. These findings could support the future development of potential anti-HIV inhibitors for the treatment of viral infection.

Materials and methods

Configuration of system

The current study was carried out on a CentOS Linux system using the drug discovery software Schrödinger 2017 (https://www.schrodinger.com/). The academic version of Desmond 2015 was used for the simulation studies.

Protein preparation

The crystal structure of IN-LEDGF/p75 at 2.0 Å resolution (PDB code: 2B4J) was downloaded from the Protein Data Bank (PDB; https://www.rcsb.org/). The retrieved crystal coordinates were prepared in the Protein Preparation Wizard of the Schrödinger software [17,18,19,20] with default parameters: bond orders were assigned, and the structure was optimized and minimized with the OPLS_2005 force field to a root mean square deviation (RMSD) of 0.30 Å. Furthermore, Schrödinger's receptor grid generation module was applied to the prepared structure at the reported active site with the default van der Waals radius scaling factor of 1 and partial charge cutoff of 0.25 (both dimensionless).

Data sets

All 45 acylhydrazone, hydrazine, and diazene derivatives, with their experimental activities (IC50) in micromolar, are presented in Table 1 [16]. The activities were converted to pIC50 values using the formula pIC50 = −log IC50, as described by Selvaraj et al. [21]. Finally, all compounds were processed with the LigPrep module of Schrödinger [22] for 2D-to-3D conversion, charge neutralization, stereoisomer generation, and ionization-state assignment at pH 7.2 ± 0.2, applying the OPLS_2005 force field.
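
The conversion above can be illustrated with a minimal Python sketch. It assumes that the logarithm is base 10 and that IC50 is expressed in mol/L before taking the logarithm, so micromolar values are first multiplied by 1e-6; the function name and the example IC50 values are hypothetical and are not taken from Table 1.

```python
import math


def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_um * 1e-6  # assumption: 1 uM = 1e-6 mol/L
    return -math.log10(ic50_molar)


# Hypothetical IC50 values in micromolar (illustration only)
for ic50 in (1.0, 10.0, 100.0):
    print(f"IC50 = {ic50:6.1f} uM -> pIC50 = {pic50_from_micromolar(ic50):.2f}")
```

On this scale a tenfold drop in IC50 raises pIC50 by one unit, so more potent compounds have larger pIC50 values.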

Table 1 Compound structures with experimental and predicted activity

3D QSAR generation and visualization

The atom-based 3D-QSAR model was developed with the Phase module of Schrödinger from the prepared dataset of 45 compounds, which spans low to high inhibitory activity. The dataset was divided in a 4:1 ratio into a training set (36 compounds) and a test set (9 compounds), chosen to cover the biological and chemical diversity of the series, and the model was fitted with partial least squares (PLS) factors. The 3D-QSAR model was visualized as contour cubes marking regions that are favorable or unfavorable for properties such as H-bond acceptor and donor, hydrophobic, electron-withdrawing, and positive or negative ionic features [23,24,25,26,27].
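
One simple way to obtain a 36/9 split that spans the activity range is sketched below in Python. This is not the selection procedure used in Phase or in this study: it is an assumed activity-ranked scheme in which compounds are sorted by pIC50 and every fifth compound is assigned to the test set. The function name and the placeholder pIC50 values are hypothetical.

```python
def activity_ranked_split(pic50_by_id, step=5):
    """Sort compounds by pIC50 and send every `step`-th one to the test set.

    Returns (train_ids, test_ids); with step=5 the split is 4:1.
    """
    ranked = sorted(pic50_by_id, key=pic50_by_id.get)  # low to high activity
    test = ranked[step // 2::step]                     # one pick per block of `step`
    test_set = set(test)
    train = [cid for cid in ranked if cid not in test_set]
    return train, test


# Placeholder activities for 45 compound IDs (not the values of Table 1)
pic50 = {cid: 4.0 + 0.05 * cid for cid in range(1, 46)}
train_ids, test_ids = activity_ranked_split(pic50)
print(len(train_ids), len(test_ids))
```

Picking one compound from each block of five keeps the low, middle, and high ends of the activity range represented in both sets.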

Receptor grid generation and molecular docking

The receptor grid was generated using the grid generation panel in Glide, Schrödinger 2017 [28], on the basis of the selected active site residues Glu170, Gln168, His171, and Thr174 from the A chain and Thr125, Gln95, Leu102, Ala129, Ala128, and Trp132 from the B chain, with X, Y, and Z coordinates (−14.1122382553, 4.64841956383, −9.18968857979), and with the default partial charge cutoff (0.25) and van der Waals radius scaling factor (1). The most and least active compounds (7 and 1, respectively) were then docked into the LEDGF/p75 binding site of IN using the quantum polarized ligand docking (QPLD) protocol implemented in Schrödinger 2017 [29] to examine their binding mode and affinity. QPLD was used to identify conformations with strong protein-ligand interactions [30,31,32,33].

MM/GBSA calculation

Prime MM/GBSA [34, 35], which uses the generalized Born (GB) and surface area (SA) continuum solvent model, was applied to estimate the binding free energy of both compounds in their docked conformations. ΔGbind was calculated in kcal/mol with the following equations.

$$ \Delta G_{\mathrm{bind}}=\Delta E+\Delta G_{\mathrm{solv}}+\Delta G_{\mathrm{SA}} $$
(1)
$$ \Delta E=E_{\mathrm{complex}}-E_{\mathrm{protein}}-E_{\mathrm{ligand}} $$
(2)

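where Ecomplex, Eprotein, and Eligand denote the minimized energies of the protein-ligand complex, protein, and ligand, respectively; ΔGsolv and ΔGSA are the corresponding differences (complex minus protein minus ligand) in the GB solvation energy and the surface area energy.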
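
As a worked illustration of Eqs. 1 and 2, the following Python sketch assembles ΔGbind from per-species energy terms. It assumes that the solvation and surface area terms are differenced in the same way as ΔE (complex minus protein minus ligand); the class name, function name, and all numerical values are hypothetical placeholders, not Prime output.

```python
from dataclasses import dataclass


@dataclass
class Energies:
    """Energy terms for one species, in kcal/mol (hypothetical container)."""
    e_mm: float    # minimized molecular-mechanics energy, E
    g_solv: float  # GB solvation energy
    g_sa: float    # surface area energy


def delta_g_bind(cplx: Energies, protein: Energies, ligand: Energies) -> float:
    """Binding free energy following Eq. 1, with each term differenced as in Eq. 2."""
    delta_e = cplx.e_mm - protein.e_mm - ligand.e_mm              # Eq. 2
    delta_g_solv = cplx.g_solv - protein.g_solv - ligand.g_solv   # assumed analogous
    delta_g_sa = cplx.g_sa - protein.g_sa - ligand.g_sa           # assumed analogous
    return delta_e + delta_g_solv + delta_g_sa                    # Eq. 1


# Made-up numbers chosen only to make the arithmetic easy to follow
cplx = Energies(e_mm=-1050.0, g_solv=-500.0, g_sa=5.0)
protein = Energies(e_mm=-1000.0, g_solv=-480.0, g_sa=8.0)
ligand = Energies(e_mm=-20.0, g_solv=-30.0, g_sa=2.0)
print(delta_g_bind(cplx, protein, ligand))  # -30 + 10 - 5 = -25.0
```

A more negative ΔGbind indicates more favorable binding; in this toy case the favorable ΔE is partly offset by the desolvation penalty.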

DFT calculation

DFT, a quantum chemistry method, was applied to investigate the structural, electronic, and energetic properties of the ligands at the atomic level. Both the most and least active compounds were submitted to Jaguar in Schrödinger to compute the highest occupied molecular orbital (HOMO), the lowest unoccupied molecular orbital (LUMO), and the molecular electrostatic potential (MESP), using the hybrid B3LYP functional (Becke's three-parameter exchange combined with the Lee-Yang-Parr correlation functional) with the 6-31G* basis set. The optimized structures were examined in terms of the frontier molecular orbitals (HOMO and LUMO) and the MESP using the Maestro panel, Schrödinger [19, 36,37,38]. The electrostatic potential V(r) at a point r of a molecular system with nuclear charges ZA located at RA and electron density ρ(r), which is used to identify the reactive sites of a molecule, is given by the following equation [35].

$$ V(r)=\sum_{A=1}^{N}\frac{Z_{A}}{\left|r-R_{A}\right|}-\int \frac{\rho\left(r'\right)\,d^{3}r'}{\left|r-r'\right|} $$
(3)

Herein, N is the total number of nuclei in the molecule, \( \frac{Z_{A}}{\left|r-R_{A}\right|} \) is the bare nuclear potential contributed by nucleus A, and \( \int \frac{\rho\left(r'\right)\,d^{3}r'}{\left|r-r'\right|} \) arises from the electronic charge density.

Molecular dynamics simulation

The complexes of compound 7 and compound 1 were simulated with Desmond, Schrödinger, to characterize their dynamic behavior and stability [38,39,40,41,42,43]. Using the System Builder, both complexes were solvated with the SPC (simple point charge) water model in an orthorhombic box and neutralized with a suitable number of Na+/Cl− counter-ions at a fixed salt concentration of 0.15 M [44, 45], and overlapping water molecules were removed. Both systems were then simulated for 100 ns after the default relaxation protocol, under periodic boundary conditions in the NPT ensemble (constant number of atoms, pressure, and temperature), where the Nosé-Hoover thermostat and isotropic scaling were used to maintain the temperature at 300 K and the pressure at 1 atm. Finally, the results were analyzed by monitoring the RMSD and the radius of gyration using the Simulation Event Analysis tool [46,47,48,49,50].

Results and discussion

Atom-based 3D-QSAR model, validation, and visualization

The atom-based 3D-QSAR model was developed successfully with four PLS factors in the regression, and it was visualized with contour cubes marking favorable and unfavorable regions. These regions correspond to different properties: H-bond acceptor, H-bond donor, hydrophobic group, and electron-withdrawing group, together with positive or negative ionic features. The model was validated with the statistical parameters R2, SD, Q2, RMSE, and Pearson R. It achieved an excellent correlation coefficient for the training set (R2) of 0.98 with a standard deviation (SD) of 0.07, and a predictive coefficient for the test set (Q2) of 0.89 with a root mean square error (RMSE) of 0.14 and a Pearson R of 0.90. The statistical summary of the model is listed in Table 2. Scatter plots of actual versus predicted pIC50 are shown in Fig. 1a and b for the training and test set compounds, respectively.
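
For readers who wish to recompute such validation statistics from a table of actual and predicted pIC50 values, a minimal Python sketch is given below. It uses the textbook definitions of the coefficient of determination, RMSE, and Pearson correlation; the exact formulas applied internally by Phase (for example, how Q2 and SD are normalized) may differ, and the function names and example values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot; gives R2 on training data or a Q2-like value on test data."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between observed and predicted activities."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical test-set values (not the data of Table 1)
actual = [4.2, 4.8, 5.1, 5.6, 6.0]
predicted = [4.3, 4.7, 5.3, 5.5, 5.8]
print(f"Q2-like = {r_squared(actual, predicted):.2f}, "
      f"RMSE = {rmse(actual, predicted):.2f}, "
      f"Pearson R = {pearson_r(actual, predicted):.2f}")
```

The same `r_squared` function yields R2 when applied to the training set and a Q2-type statistic when applied to compounds held out of the fit.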

Table 2 Results of atom-based 3D-QSAR (quantitative structure-activity relationship)
Fig. 1

a Scatter plot with the XY axis of the actual vs. predicted activity for the training set. b Scatter plot with XY axis of the actual vs. predicted activity for the test set

The Phase 3D-QSAR technique was used to visualize contour cubes that mark the favorable and unfavorable regions of a compound in three-dimensional space and thus represent the features that matter for the interaction of the ligand with the target protein. These contour cubes show which positions are important for the physicochemical properties that improve the biological activity of a molecule. Blue cubes mark favorable regions, whereas red cubes mark unfavorable regions; both can guide modifications that increase activity. The cubes can be displayed for properties such as hydrogen bond donor, hydrogen bond acceptor (electron-withdrawing), hydrophobic, and positive and negative ionic features, which account for non-covalent interactions with the target receptor. The generated QSAR model was applied to the most active and least active compounds, 7 and 1 respectively, to analyze their significant favorable and unfavorable features, as shown in Fig. 2. On the most active compound 7, favorable blue cubes were identified for the hydrogen bond donor feature on the hydroxyl group of the benzene ring and on the amine group of the methylidenepyrazine, while favorable hydrogen bond acceptor (electron-withdrawing) and hydrophobic features were spread throughout the molecule; these features enhance the biological activity. In contrast, the unfavorable red cubes on the least active compound 1 mark features that reduce its biological activity.

Fig. 2

Graphical representation of contours generated using the three-dimensional QSAR model on the most active (compound 7) and least active (compound 1) compounds. Blue cubes represent regions favorable for the activity; red cubes represent regions unfavorable for the activity

Molecular docking

Both the most and least active compounds were docked into the integrase-binding site of the target protein using the quantum-polarized ligand docking application available in Schrödinger to gain better insight into the protein-ligand interactions; the results are presented in Table 3. For compound 7, hydrogen bonds were observed mainly with the active site residues Gln168 and Glu170 of the A chain and Gln95 of the B chain, whereas for compound 1 they were observed with Gln168, Glu170, and His171 (all from the A chain). These amino acid residues were the major contributors to the protein-ligand interactions.

Table 3 Results of QPLD docking on the most active and least active compounds

Interactive mode of compound 7 in the dimeric CCD of IN

Compound 7 gave the best docking score of −5.394 kcal/mol, with a Glide energy of −31.302 kcal/mol and a Glide Emodel of −35.454 kJ/mol. A total of three H-bond interactions were identified, with the backbone of Gln168 and Glu170 (A chain) and the side chain of Gln95 (B chain), at distances of 1.90, 2.51, and 1.92 Å, respectively. Hydrophobic residues surrounding the ligand include Met178 and Ala169 in the A chain and Leu102, Ala98, Trp131, Ala128, Ala129, and Trp132 in the B chain. A close-up view of the 3D interaction profile is shown in Fig. 3a.

Fig. 3

The binding site interactions of HIV-1 integrase with the most active compound 7 (a) and the least active compound 1 (b). H-bonds are shown as green dashed lines and Pi-Pi interactions as cyan dashed lines

Interactive mode of compound 1 in the dimeric CCD of IN

Compound 1 gave the weaker docking score of −4.980 kcal/mol, with a Glide energy of −30.2 kcal/mol and a Glide Emodel of −36.785 kJ/mol. A total of three H-bond interactions were observed, with the backbone of Gln168, Glu170, and His171 from the A chain, at distances of 2.34, 2.38, and 2.71 Å, respectively. Hydrophobic residues surrounding the ligand include Met178 and Ala169 from the A chain and Tyr99, Ala98, Leu102, Ala129, Ala128, and Trp132 from the B chain. A close-up view of the 3D interaction profile is shown in Fig. 3b.

Binding free energy

As a post-docking scoring approach, MM/GBSA binding free energies were calculated for the most and least active compounds to assess the reliability of the docking protocol. The binding free energy (ΔGbind) ranged from −40.06 to −43.38 kcal/mol. In addition to ΔGbind, the individual energy contributions, namely the Coulomb, covalent, H-bond, GB solvation (Solv GB), lipophilic (Lipo), and van der Waals (vdW) terms, were analyzed for both compounds; compound 7 was more stable and formed more interactions than compound 1. All the energies are presented in Table 4 and Fig. 4.

Table 4 Binding free energy results on both compounds
Fig. 4

The binding free energies of most active compound 7 (blue) and least active compound 1 (green)

DFT analysis

Both the most active (7) and least active (1) compounds were examined to understand their electronic and energetic states. These properties are represented as HOMO, LUMO, and MESP maps for compounds 7 and 1 in Fig. 5. The HOMO energies range from −0.228 to −0.238 and the LUMO energies from −0.048 to −0.074, giving HOMO-LUMO gaps (HLG) of 0.164 and 0.180, as tabulated in Table 5. The frontier orbitals describe the most loosely held electrons, and the energy gap between the HOMO and LUMO reflects the kinetic stability and chemical reactivity of both compounds: the compound with the smaller gap is more active than the compound with the larger gap. Based on the charge distribution, the HOMO and LUMO of compounds 7 and 1 were further analyzed to identify the nucleophilic and electrophilic sites, since the HOMO marks the electron-donating (nucleophilic) regions and the LUMO the electron-accepting (electrophilic) regions of a compound. The HOMO and LUMO of compound 7 were distributed over the 2-hydroxyphenyl, methylidene, and pyrazine moieties, and in compound 1 the HOMO and LUMO were distributed near the benzene ring of the hydroxyphenyl and the ethylidene, respectively. The MESP was then examined to map the electrostatic potential regions, which relate to physicochemical properties and hydrogen bonding interactions as well as to the shape and size of the ligands; in the color scheme, blue represents electropositive and red represents electronegative regions. The most electropositive region of compound 7 was found near the NH of the methylidene portion, whereas compound 1 showed a weaker positive potential near the nitrogen atom of the ethylidene and the oxygen atom of the hydrazide. Thus, the 3D MESP contours support the view that the nitrogen-containing group is chiefly responsible for the activity.
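
The gap arithmetic can be checked with a short Python sketch. Two points are assumptions rather than statements of the text: the pairing of the reported orbital energies, (−0.238, −0.074) and (−0.228, −0.048), is inferred because it is the only pairing that reproduces the reported gaps of 0.164 and 0.180, and the conversion to eV applies only if the values are in Hartree, which the text does not state. The function name is hypothetical.

```python
HARTREE_TO_EV = 27.211386  # 1 Hartree expressed in eV


def homo_lumo_gap(e_homo: float, e_lumo: float) -> float:
    """Return the HOMO-LUMO gap, E_LUMO - E_HOMO, in the units of the inputs."""
    return e_lumo - e_homo


# Inferred (HOMO, LUMO) pairs; see the assumptions stated above
orbital_energies = [(-0.238, -0.074), (-0.228, -0.048)]
for e_homo, e_lumo in orbital_energies:
    gap = homo_lumo_gap(e_homo, e_lumo)
    print(f"HOMO {e_homo:.3f}, LUMO {e_lumo:.3f} -> gap {gap:.3f} "
          f"({gap * HARTREE_TO_EV:.2f} eV if the energies are in Hartree)")
```

Under this pairing, the smaller gap of 0.164 belongs to the orbital set with the lower-lying LUMO, which is consistent with the smaller gap being associated with the more active compound.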

Fig. 5

Representation of HOMO, LUMO, and MESP on compound 7 and compound 1

Table 5 Single point energy (Jaguar) output value of frontier orbital energies and electrostatic potential

MDS analysis

MDS is a powerful approach for analyzing the interactions and stability of a protein-ligand complex under specific physiological conditions. Simulations were carried out on the complexes of the most active and least active compounds using Desmond, Schrödinger, and the structural stability and dynamic behavior of the complexes were analyzed via the RMSD and the radius of gyration. The RMSD of both complexes remained stable throughout the simulation, without large fluctuations, within the range of 1.4 to 3.0 Å, and the most active compound showed better stability than the least active one (Fig. 6a). Similarly, the compactness of both complexes showed no major changes (Fig. 6b). Overall, compound 7 (most active) showed a noteworthy binding interaction with good stability.
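
The two quantities monitored above can be written down compactly. The following Python sketch computes the RMSD of a frame against a reference and the mass-weighted radius of gyration from Cartesian coordinates. It is a simplified illustration and not the Desmond implementation: the frame is assumed to be already superimposed on the reference (no alignment is performed), and the function names and toy coordinates are hypothetical.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally ordered coordinate lists (no superposition)."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    sq = sum((a - b) ** 2
             for p, q in zip(coords, reference)
             for a, b in zip(p, q))
    return math.sqrt(sq / len(coords))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration about the center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses when none are given
    total = sum(masses)
    com = [sum(m * p[i] for m, p in zip(masses, coords)) / total for i in range(3)]
    sq = sum(m * sum((p[i] - com[i]) ** 2 for i in range(3))
             for m, p in zip(masses, coords))
    return math.sqrt(sq / total)


# Toy coordinates in angstrom (illustration only)
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(frame, reference))       # every atom shifted by 1 A along z -> 1.0
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # -> 1.0
```

A flat RMSD trace indicates that the complex stays close to its starting structure, while a steady radius of gyration indicates that its overall compactness is preserved.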

Fig. 6

Dynamic behavior of the complexes of compounds 7 (most active) and 1 (least active): RMSD (a) and radius of gyration (b)

Conclusion

The IN-LEDGF/p75 interaction has been identified as an attractive target for anti-HIV drug discovery. Herein, we applied a combined approach of 3D-QSAR, docking, binding free energy, DFT, and simulation studies to understand the structural basis and inhibition mechanism of acylhydrazone, hydrazine, and diazene derivatives as IN-LEDGF/p75 inhibitors. The developed 3D-QSAR model showed high predictive ability, with R2 = 0.98 and SD = 0.07 for the training set and Q2 = 0.89, RMSE = 0.14, and Pearson R = 0.90 for the test set, and it provides a strong basis for understanding the structure-activity relationship of these derivatives. Furthermore, molecular docking and MM/GBSA studies on the most and least active compounds revealed interactions with crucial amino acids in the LEDGF/p75 binding site of IN and explained the energetic contributions to binding. Finally, DFT and MDS studies were used to characterize the electronic energy states of the compounds and the stability and dynamic behavior of the complexes under simulated physiological conditions. Compound 7 was found to be the more stable of the two in the binding site of the target protein. We therefore suggest that these findings can support the development of new agents that may serve as potential anti-HIV leads for AIDS treatment.