Introduction

Anterior gradient 2 (AGR2), also known as secreted cement gland protein XAG-2, is a member of the protein disulfide isomerase (PDI) superfamily [1] and is overexpressed in multiple human cancers, including breast, prostate, lung, gastric, ovarian and pancreatic cancers [2,3,4,5,6]. It has both intracellular and extracellular roles. An N-terminal signal peptide (M1-K20) directs its import into the endoplasmic reticulum (ER). After maturation, it is released via the secretory pathway and acts as a signaling molecule in numerous pathways. By virtue of its C-terminal ER retention sequence (K172-L175), it also resides in the ER lumen, where it acts as a chaperone that assists other proteins in folding properly. Normally, it is expressed in secretory goblet cells of the intestine, where it is responsible for the secretion of MUC2, a cysteine-rich glycoprotein that protects the inner lining of the intestine [1]. AGR2 contains a conserved cysteine-containing thioredoxin-like domain, also called the PDI domain or CPHS domain (C81-S84), which forms mixed disulfide bonds with its client proteins. It thereby takes part in the maturation of mucins (e.g., MUC2, MUC1 and MUC5AC [7]) and other members of the secretory pathway, and ultimately plays a pivotal role in maintaining ER homeostasis [8]. AGR2 exists predominantly as a homodimer formed via a dimerization domain (E60-K64). Once dimerized, it helps fibroblast growth factor 2 (FGF2) and vascular endothelial growth factor (VEGF) to homodimerize and, by augmenting their activities, promotes angiogenesis (Fig. 1a and b) and the invasion of vascular endothelial cells and fibroblasts [9].

Fig. 1

a Pathway of AGR2 dimerization [5]; b dimerization domain (E60-K64) of AGR2 [3]

Because AGR2 acts both as an extracellular signaling molecule and as an intracellular ER chaperone, targeting it has been a bottleneck in the discovery of both small- and large-molecule anti-AGR2 agents for cancer therapy, and only a few approaches to inhibit AGR2 have been reported. Our collaborator group previously reported that the monoclonal antibody 18A4, which targets the dimerization domain (E60-K64) of AGR2, can inhibit different xenograft tumors in mice. 18A4 inhibits AGR2 by binding to amino acid sites spanning the regions E60-H76 and A86-E153. However, 18A4 was combined with bevacizumab (as an adjunct therapy) to produce a maximum effect in inhibiting angiogenesis and tumor growth in SKOV3 cell lines [9, 10]. Another approach uses a peptide: the H10 peptide was discovered by screening millions of peptides with an mRNA display library. H10 inhibits AGR2 by binding at the interface of the AGR2–AGR2 homodimer, around amino acid residues P41, E60 and E96 [11].

Drug repurposing (repositioning) offers a comparatively direct way to study already approved or investigational drugs for a disorder entirely different from their original indication. Because the safety of such drugs has already been studied, repurposing can yield better hits with fewer safety concerns and can save time as well as technical and financial resources compared with starting the drug discovery process from scratch [12].

Using a structure-based drug design protocol, Rani et al. identified several FDA-approved drugs as candidates against important enzymes of Mycobacterium tuberculosis. They virtually screened 1932 approved drugs from DrugBank and 1852 drugs from eLEA3D, using AutoDock Vina as the docking program and the MurB and MurE enzymes as the protein targets. The study found that sulfadoxine and pyrimethamine showed stable interactions with MurB, while lifitegrast and sildenafil (−9.1 kcal/mol) showed the most reliable interactions with MurE [13].

Dakshanamurthy et al. reported that mebendazole (an anti-parasitic drug) could structurally inhibit vascular endothelial growth factor receptor 2 (VEGFR2), a mediator of angiogenesis; this finding was also supported by experimental data. They screened 3,671 FDA-approved drugs across 2,335 human protein crystal structures using high-throughput computational docking [14].

Computer-aided drug design is now widely used for rational drug discovery. Virtual screening of compounds against a valid target can make novel drug discovery more cost- and time-effective [15]. For proteins whose crystal structure is not available, a 3D structure can be generated by homology modeling and then used as the target for virtual screening. For example, to discover novel antagonists of the endothelin-A receptor (ETAR), a 3D structure was first built by homology modeling because no X-ray crystal structure was available. The model was then virtually screened against the Traditional Chinese Medicine (TCM) database to identify novel natural ETAR antagonists, which resulted in the discovery of two potential antagonists; their binding to ETAR was validated by molecular dynamics simulation and molecular mechanics generalized Born surface area (MM-GBSA) calculations [16].

To discover novel inhibitors of the Nrf2-Keap1 protein–protein interaction, structure-based virtual screening was performed on the Specs database, which yielded compound 15 with an in vitro EC50 of 9.80 µM in the fluorescence polarization (FP) assay [17].

To the best of our knowledge, no small molecule has yet been discovered that inhibits the AGR2–AGR2 homodimer. The present study was therefore performed to prepare a validated 3D structure of AGR2 by homology modeling and, for the first time, to discover a set of small molecules by screening the FDA-approved drug library (https://www.selleckchem.com/screening/fda-approved-drug-library.html) against the AGR2–AGR2 homodimer as the target protein. Since no X-ray crystal structure of AGR2 was available, we prepared its structure by homology modeling and used the resulting model for virtual screening. We then compared the list of screened drugs with that obtained by screening the NMR structure of AGR2 (PDB ID: 2LNS). The resulting list was screened again with a molecular docking tool, which narrowed it down to 35 common drugs. The interactions of these drugs with AGR2 were determined, and the binding of the top 5 drugs was ultimately validated by molecular dynamics simulation. In summary, the present study reports, for the first time, the discovery by structure-based virtual screening of 5 FDA-approved drugs predicted to inhibit the AGR2–AGR2 homodimer.
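The comparison of the two screening runs amounts to taking the drugs common to both hit lists. A minimal Python sketch of that step is given below; the function name `common_hits` and the placeholder drug names are hypothetical and serve only as illustration, and the case-insensitive matching is an assumption rather than something stated in the text.

```python
def common_hits(hits_model, hits_nmr):
    """Return the drugs found in both hit lists.

    The order of the first list is preserved. Names are compared
    case-insensitively (an assumption made for this sketch).
    """
    nmr_names = {name.lower() for name in hits_nmr}
    return [name for name in hits_model if name.lower() in nmr_names]


# Placeholder hit lists (hypothetical names, not data from this study).
homology_model_hits = ["drug_A", "drug_B", "drug_C", "drug_D"]
nmr_structure_hits = ["Drug_D", "drug_B", "drug_E"]

print(common_hits(homology_model_hits, nmr_structure_hits))
```

Only the compounds retained by both structures would be passed on to the subsequent docking step.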

Materials and methods

Homology modeling of AGR2

In the absence of a suitable X-ray structure in the RCSB Protein Data Bank, the homology modeling module of YASARA Structure was used to generate the homology model of AGR2 [7, 18]. YASARA (Yet Another Scientific Artificial Reality Application) is a complete package for molecular modeling, molecular graphics and molecular dynamics, and it is comparatively easy to use and reliable [19]. Its most advanced version, YASARA Structure, includes a full homology modeling module that automates all steps from an amino acid sequence as input to a refined high-resolution model as output, including alignment of the amino acid sequence, loop building, rotamer selection, optimization of stereochemistry and subsequent validation of the homology model. It also generates a comprehensive scientific report covering each modeling step. The overall Z-scores obtained for AGR2 were within the range rated “GOOD.” The Z-score is a metric that indicates how far a system deviates from the average of standard reference structures. The YASARA and WHAT IF Twinset was used for subsequent visualization and analysis [20].
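As an illustration of the Z-score concept, the short Python sketch below expresses a quality value as the number of standard deviations by which it differs from the mean of a reference set. This is the generic statistical definition only; the reference set, the sign convention and the quality terms that YASARA combines are not described here, and the function name `z_score` and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_score(value, reference_values):
    """Number of standard deviations between `value` and the reference mean."""
    mu = mean(reference_values)
    sigma = pstdev(reference_values)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("reference values must not all be identical")
    return (value - mu) / sigma


# Hypothetical quality values for a reference set and for one model.
reference = [8.0, 12.0]
print(z_score(14.0, reference))
```

A Z-score near zero means the model is close to the reference average; larger magnitudes indicate larger deviations.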

Evaluation of the generated homology model

Following generation of the AGR2 homology models, the RAMPAGE and ModRefiner servers were used to evaluate and optimize them [21].

Structure-based virtual screening

The workflow of docking-based virtual screening is depicted in Fig. 2. In brief, all FDA-approved drugs were subjected to molecular docking calculations using FRED v3.2.0 from OpenEye Scientific Software. Prior to docking, the AGR2 structure was optimized at pH 7.0 using the pdb2receptor tool in OEDocking. OMEGA 2.5.1 [22] was used to generate multiple conformers of each compound with default settings (maximum of 200 conformers per molecule). For the docking calculations, the binding site was defined to encompass the entire AGR2 structure. Default FRED parameters were also used to predict the binding affinity of the compounds for AGR2 [23].

Fig. 2

Workflow of docking-based virtual screening

After protocol optimization, all FDA-approved compounds were docked using the previously described protocol [15]. A maximum of ten poses was generated for each compound, and the best hits were chosen based on the lowest Chemgauss4 score. Discovery Studio Visualizer [15] was used to illustrate the binding orientation of the docked poses within AGR2.
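The pose-selection rule described above (keep the lowest-scoring pose of each compound, then rank compounds by that score) can be sketched in a few lines of Python. This is not the FRED implementation; it assumes the docking results have already been exported as (compound, pose index, score) records, and the function names and example scores are hypothetical.

```python
from collections import defaultdict


def best_pose_per_compound(poses):
    """Map each compound to its (score, pose_id) with the lowest score.

    `poses` is an iterable of (compound, pose_id, score) records.
    """
    grouped = defaultdict(list)
    for compound, pose_id, score in poses:
        grouped[compound].append((score, pose_id))
    # min() on (score, pose_id) tuples selects the lowest (best) score.
    return {compound: min(scored) for compound, scored in grouped.items()}


def rank_compounds(best):
    """Return compound names ordered from lowest to highest best score."""
    return sorted(best, key=lambda compound: best[compound][0])


# Hypothetical docking records (not data from this study).
example_poses = [
    ("drug_A", 1, -6.2),
    ("drug_A", 2, -8.1),
    ("drug_B", 1, -7.9),
    ("drug_B", 2, -7.0),
]
best = best_pose_per_compound(example_poses)
print(rank_compounds(best))
```

Lower (more negative) scores are treated as better, matching the selection by lowest Chemgauss4 score.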

MD simulations

The Amber14 package was used for MD simulations to understand the dynamic behavior of each ligand-bound system. Each system was solvated with the TIP3P water model, and ions were added to neutralize it. Ligand topologies were generated with antechamber. The AMBER ff14SB force field was used for the protein, whereas GAFF2 was used for the drugs [24]. Two-step energy minimization was followed by heating of the system. We used default parameters such as temperature 300 K and 2 ps. The particle mesh Ewald (PME) algorithm with a cutoff distance of 10 Å was used for long-range interactions, and the SHAKE algorithm was applied to covalent bonds. A 100 ns simulation was performed for each system. CPPTRAJ and PYTRAJ were used for post-simulation analyses [22, 24, 25]. Before running the MD simulations, we prepared protein–ligand complexes of AGR2 with the top 5 hits, i.e., C1 (AGR2–AZD2281), C2 (AGR2–emtricitabine), C3 (AGR2–flumazenil), C4 (AGR2–ganetespib (STA-9090)) and C5 (AGR2–mercaptopurine).

Results and discussion

Homology models generation using YASARA

The homology modeling steps for generating the AGR2 model are depicted in Fig. S1. We used the amino acid sequence of the anterior gradient 2 (AGR2) protein, identified as “O95994” in UniProtKB/Swiss-Prot, for homology modeling [26, 27].

The homology models were generated as follows. PSI-BLAST [28], as built into YASARA, was used to locate the 13 closest templates in the PDB. Table 1 lists the PDB structures with the greatest degree of similarity to our target sequence.

Table 1 List of top 13 closest templates identified for generating the homology model of AGR2

We built AGR2 homology models of both the monomer and the dimer. The UniProt ID “O95994” was used to retrieve the full-length amino acid sequence of AGR2. A secondary structure prediction for the target was needed to assist in alignment correction and loop modeling. This was obtained by using PSI-BLAST to generate a target sequence profile and then feeding it into the PSI-Pred secondary structure prediction algorithm [29].

The resulting prediction, shown in Fig. S2, gives the estimated probability of each secondary structure class (helix, strand and coil). To help align the amino acid sequences of the target and the templates, a target sequence profile was constructed from a multiple sequence alignment built from similar UniRef90 sequences. Table 2 lists the 5 generated homology models sorted by their overall quality Z-scores.

Table 2 Five representative homology models of AGR2

Evaluation of the generated homology model

The SAVES structure assessment software [http://nihserver.mbi.ucla.edu/SAVES/] was used to validate the reliability and assess the quality of the modeled AGR2 structure, as shown in Fig. S3, S4 and S5. SAVES integrates various tools for evaluating protein structures, such as Procheck, WhatCheck, ERRAT, Verify 3D and PROVE. The Ramachandran plot in Fig. S3 showed that 90.3% of the amino acid residues were in the core region, 9.7% in the allowed region and 0% in the generously allowed regions. None of the residues was found in the disallowed region of the plot, confirming the stereochemical reliability of the AGR2 homology model.
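The Ramachandran statistics quoted above are simple proportions of residues per region. The Python sketch below shows how such percentages follow from per-region residue counts. The counts used here are purely illustrative: they were chosen only so that the resulting percentages match the reported 90.3% and 9.7%, and they are not taken from the SAVES output; the function name `region_percentages` is likewise hypothetical.

```python
def region_percentages(counts):
    """Convert per-region residue counts into percentages (one decimal)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one residue is required")
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Illustrative counts only (assumed, not from the actual Procheck report).
illustrative_counts = {
    "core": 93,
    "allowed": 10,
    "generously allowed": 0,
    "disallowed": 0,
}
print(region_percentages(illustrative_counts))
```

A model with no residues in the disallowed region, as reported here, yields 0% for that category regardless of the total residue count.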

Furthermore, a Verify-3D score of 88.88% (Fig. S4) and an ERRAT score of 97.04% (Fig. S5) verified the “structure sequence compatibility” and “non-bonded interactions” of the modeled structure, respectively. Similarly, the atomic volumes of the modeled AGR2 residues were compared with those of the corresponding residues in the PDB database, yielding an ideal mean Z-score of 0.366 for the best AGR2 model. RAMPAGE was used to generate Ramachandran plots of the models for evaluation, and the ModRefiner server was then used to refine the generated models. Figure S3 shows the Ramachandran plot after refinement. The absence of any residues in the outlier region indicates that the stereochemistry of each model improved after refinement.

Structure-based virtual screening

Molecular docking of all FDA-approved compounds was performed with the FRED docking software using blind docking, in which the whole structure was treated as the active site. The detailed docking methodology is illustrated in Fig. 2.

The FRED software has a strong track record in structure-based drug discovery. Hu et al. recently described the Chemgauss4 score of FRED as one of the best-performing among three separate docking score functions [30]. A maximum of 200 conformers was generated for each ligand and used as input. Molecular docking was carried out with the protocol described in the Methods. For each compound, a maximum of ten poses was obtained, and the pose with the lowest Chemgauss4 score was taken as the best pose. The Chemgauss4 score was used as the criterion for detecting actives in the large pool of ligands: a ligand was considered active if its score was below −7.8 kcal/mol, as shown in Table 3.
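The activity criterion can be expressed as a simple filter on the best score of each ligand. The Python sketch below applies the −7.8 cutoff stated above and optionally keeps the top N ligands; the function name `select_actives`, the strict "less than" comparison at exactly −7.8 and the example scores are assumptions made for illustration, not output from this study.

```python
ACTIVE_CUTOFF = -7.8  # Chemgauss4 threshold stated in the text


def select_actives(best_scores, cutoff=ACTIVE_CUTOFF, top_n=None):
    """Return (name, score) pairs scoring below the cutoff, best first.

    `best_scores` maps each ligand name to its lowest Chemgauss4 score.
    """
    actives = [(name, s) for name, s in best_scores.items() if s < cutoff]
    actives.sort(key=lambda item: item[1])  # most negative score first
    return actives if top_n is None else actives[:top_n]


# Hypothetical best scores (not data from this study).
example_scores = {"drug_A": -8.4, "drug_B": -7.8, "drug_C": -9.1, "drug_D": -6.5}
print(select_actives(example_scores))
print(select_actives(example_scores, top_n=1))
```

With this convention a ligand scoring exactly −7.8 is not counted as active, since the criterion is a score lower than the cutoff.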

Table 3 List of top 10 drugs with FRED Chemgauss4 score lower than −7.8 kcal/mol

Comparative analysis of binding interactions

Molecular docking was used to gain insight into the binding interactions of the best identified AGR2 ligands inside the active pocket of the target. Figures 3, 4 and 5 and S6-S10 display the detailed binding interactions of the top ten AGR2 ligands. Each of the top 10 hits was found to interact with residues of the dimerization domain of AGR2 (E60-K64).

Fig. 3

a-f Binding interactions of the top 5 hits docked in complex with AGR2 (b AGR2–AZD2281 complex, c AGR2–emtricitabine complex, d AGR2–ganetespib (STA-9090) complex, e AGR2–mercaptopurine complex, f AGR2–flumazenil complex). Hydrogen bonds are shown as black dotted lines

Fig. 4

RMSD and RMSF of the top five hits in complex with AGR2

Fig. 5

Radius of gyration (Rg) of the top five hits in complex with AGR2

Molecular dynamic simulation

The stability and flexible dynamic behavior of the AGR2, C1–C5 complexes

We aimed to characterize the dynamic behavior and internal motion of AGR2 in complex with the different ligands (C1, C2, C3, C4 and C5). All complexes were simulated in an explicit water environment. We then assessed the effect of the ligands on the stability of each complex using the root mean square deviation (RMSD), which is commonly used to quantify how far a protein backbone deviates from its initial conformation over the course of a simulation. The deviations observed during the simulation can be used to estimate the dynamic stability of a biomolecule relative to its starting conformation: a smaller deviation indicates a more stable structure, and a larger deviation a less stable one. Here, the Cα backbone RMSD was computed over the 100 ns trajectory of each AGR2–ligand complex (C1–C5), which indicated that the hits in complexes C1 and C2 bound AGR2 more strongly than the remaining hit compounds, as shown in Fig. 4. The RMSD of all complexes in comparison with the existing structures indicated that a total of 100 ns of MD simulation time was sufficient to achieve equilibration at 310 K.
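For N atoms, the RMSD of a frame relative to a reference structure is the square root of the mean squared distance between corresponding atoms. The Python sketch below implements this definition directly. It assumes the frame has already been superimposed on the reference (no fitting is performed), and it is an illustration of the formula rather than the CPPTRAJ/PYTRAJ routine used in the analysis; the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes both structures are already superimposed.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
        for (x, y, z), (xr, yr, zr) in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))


# Toy example: two atoms, each displaced by 1 unit along z.
reference_frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
later_frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(later_frame, reference_frame))
```

Evaluating this quantity for the Cα atoms of every frame against the starting structure gives an RMSD-versus-time curve of the kind plotted in Fig. 4.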

The RMSD analysis revealed that AGR2 adopted diverse conformations in each of the 5 AGR2–ligand complexes (Fig. 4). For C1, the RMSD remained low until 20 ns, then rose abruptly to about 1.0 and stayed uniform until 60 ns, after which it shifted again. For C2, the RMSD indicated that AGR2 reached equilibrium at ~80 ns of simulation time, with lower variation at a specific site; although it oscillated continuously, its behavior remained stable throughout the entire MD simulation. For the C3 complex, the backbone RMSD decreased steadily at first, then increased at 20 ns and remained constant for the rest of the simulation (Fig. 4). C4 and C5 showed a similar pattern of backbone deviation: the RMSD decreased gradually at the beginning and then rose abruptly to as much as 1.2 nm, with continued fluctuations throughout the simulation, reflecting the lower stability of these systems compared with the other complexes. Thus, the hits in the C1, C4 and C5 complexes had a high affinity for the targeted region but did not completely diminish the activity of AGR2, whereas the hits in C2 and, to some extent, C3 bound strongly to the dimerization cavity of AGR2, steadied the dynamic behavior of the protein and, to a larger degree, inhibited its activity. Overall, the RMSD findings showed that all top 5 candidate AGR2 inhibitors had a high affinity for the intended binding site, but the hit compound in the C2 complex bound the protein more strongly than the others and stabilized it effectively.

To gain a better understanding of the flexibility of specific residues in the AGR2–ligand complexes C1–C5, root mean square fluctuations (RMSFs) of the backbone Cα atoms were calculated and compared, providing insight into how drug binding affects protein motions. The greater the RMSF value, the more flexible the region; the smaller the RMSF value, the less the region moves from its average position during the simulation. We analyzed the Cα RMSF across all residues of the protein, which identifies regions of high or low flexibility and provides insight into per-residue flexibility (Fig. 4). To increase the accuracy of the analysis, we retrieved the minimum-energy structure coordinates from the equilibrium phase; this structure was then aligned and used as the reference for calculating the RMSF. Fluctuation is negatively correlated with residue stability, i.e., larger residue fluctuations indicate instability and vice versa. The RMSF results indicated that the fluctuation of AGR2 decreased significantly in the presence of the hit compounds in complexes C2 and C3, whereas the fluctuation patterns of C1, C4 and C5 differed and showed excessive flexibility and, hence, instability (Fig. 4). This may reflect the different binding modes in C1, C4 and C5, in which greater residue movement and flexibility allow the ligand to fit the binding site and reach its optimal binding mode. Thus, the overall results indicated that the hit compounds in complexes C2 and C3 acted as better inhibitors than the other compounds and had a higher binding affinity for the target protein.
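The per-residue RMSF can be illustrated with a short Python sketch: for each atom, it is the square root of the time-averaged squared displacement from a reference position. By default the sketch uses the time-averaged position as the reference; an explicit reference structure (such as the aligned minimum-energy structure mentioned above) can be passed instead. It assumes all frames are already aligned, and the function name `rmsf` and the toy trajectory are hypothetical rather than taken from the CPPTRAJ workflow.

```python
import math


def rmsf(trajectory, reference=None):
    """Per-atom RMSF for a trajectory of aligned frames.

    `trajectory` is a list of frames; each frame is a list of (x, y, z)
    tuples, one per atom. If `reference` is None, each atom's mean
    position over the trajectory is used as its reference.
    """
    n_frames = len(trajectory)
    if n_frames == 0:
        raise ValueError("trajectory must contain at least one frame")
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        if reference is None:
            center = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        else:
            center = reference[i]
        mean_sq = sum(
            sum((f[i][k] - center[k]) ** 2 for k in range(3)) for f in trajectory
        ) / n_frames
        values.append(math.sqrt(mean_sq))
    return values


# Toy trajectory: atom 0 is fixed, atom 1 moves between x = 1 and x = 3.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))
```

In the toy trajectory the fixed atom has an RMSF of zero while the moving atom has a nonzero value, which is the sense in which larger RMSF values mark more flexible residues.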

Moreover, the compactness of AGR2 in all complexes was assessed using the radius of gyration (Rg). Given the residue fluctuations and backbone variations described above, a more thorough examination of overall compactness in all complexes was needed. The Rg analysis showed that the AGR2 complexes C1–C5 each displayed a distinct pattern of compactness, as shown in Fig. 5. Notably, the Rg of the C1 and C2 complexes remained stable during the MD simulation, indicating a strongly compact conformation, whereas the C3–C5 complexes became less compact over time. The dynamic behavior of the protein–ligand complexes showed that ligand binding altered the stability and residue flexibility of AGR2, which may in turn produce a therapeutic effect.
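The radius of gyration is the mass-weighted root mean square distance of the atoms from their center of mass. A minimal Python sketch of this definition follows; equal masses are assumed when none are supplied, and the function name `radius_of_gyration` and the toy coordinates are hypothetical rather than part of the analysis pipeline used here.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of (x, y, z) coordinates."""
    if not coords:
        raise ValueError("at least one atom is required")
    if masses is None:
        masses = [1.0] * len(coords)  # equal masses assumed by default
    total_mass = sum(masses)
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy example: two equal-mass atoms 2 units apart.
toy_coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(radius_of_gyration(toy_coords))
```

A roughly constant Rg over the trajectory corresponds to a structure that stays compact, whereas an increasing Rg indicates that the structure is expanding.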

Conclusion

In this report, we showed that an in silico virtual screening framework is an important tool for identifying hit compounds against the AGR2–AGR2 homodimer. We reported a structure-based virtual screening protocol for identifying new AGR2–AGR2 homodimer modulators. Starting from homology modeling of the AGR2 protein, followed by virtual screening of FDA-approved compounds, only 35 compounds were identified as potential AGR2 ligands. Further validation by analysis of binding interactions and stability assessment narrowed these down to five FDA-approved drugs as new AGR2 modulators. The outcome of this study needs to be confirmed by in vitro experimental validation.