Introduction

In the next few decades, cancer is expected to become the leading cause of morbidity and mortality in many regions of the world (Ferlay et al. 2010; Jemal et al. 2011). In 2008, female breast, lung, colorectal and prostate cancers accounted for half of the total cancer burden in the regions with the highest human development index (HDI) (Farhood et al. 2019). In medium-HDI regions, esophageal, gastric and liver cancers were also common, and across the medium to very high HDI regions these seven cancer types together accounted for 62% of the total cancer burden. In low-HDI areas (Gersten and Wilmoth 2002), cervical cancer is more common than breast cancer and liver cancer. Across 184 countries, nine different cancers were the most frequently diagnosed in men, the most common being prostate, lung and liver cancer (Bray and Møller 2006; Kallab et al. 2020). Breast cancer and cervical cancer are the most common in women. In medium- and high-HDI settings, decreases in cervical and stomach cancer incidence seem to be offset by increases in the incidence of cancers of the female breast, prostate and colorectum. If the estimated cancer- and sex-specific trends continue, the number of new cancer cases is expected to increase from 12.7 million in 2008 to 22.2 million by 2030 (Bray et al. 2012).

It is easy to forget that cancer is not a single disease but many diseases. Over the past 70 years, the complexity of cancer has become increasingly obvious, and a great deal of work has been done to determine the common principles of its pathogenesis. Recently, several models have been proposed to explain the transformation of normal cells into cancer cells through discrete genetic changes, including the activation of oncogenes, the loss of telomerase and induction of aneuploidy, which are important initial events (Lee et al. 2016).
However, in addition to the genetic and epigenetic changes of the transformation process, another discrete step is needed to allow tumor proliferation and progression: the induction of a tumor vasculature, termed the “angiogenic switch” (Zhong et al. 2020). Like normal tissues, tumors require an adequate supply of oxygen and metabolites and an effective means of waste removal (Papetti and Herman 2002). These requirements differ between tumor types and change with tumor progression (Lee et al. 2020). Gaining access to the host vascular system and generating a tumor blood supply is therefore the rate-limiting step of tumor progression. Vascular endothelial growth factors (VEGFs) and their receptors (VEGFRs) regulate both vasculogenesis, the development of blood vessels from precursor cells during early embryogenesis, and angiogenesis, the formation of blood vessels from preexisting vessels at a later stage (Ferrara and Kerbel 2005). There are three major vascular endothelial growth factor receptors (VEGFR-1, VEGFR-2 and VEGFR-3), which are key mediators of tumor angiogenesis and new vascular network formation, providing nutrients and oxygen for tumor growth (Shibuya 2011). VEGFR-2 is the main functional receptor of VEGF in the regulation of angiogenesis (Roskoski 2007). Therefore, the search for effective, low-toxicity VEGFR-2 inhibitors remains an important direction in anticancer drug research and development. Some 6-amide-2-aryl benzoxazole/benzimidazole derivatives show strong inhibitory activity against VEGFR-2 kinase, and their inhibitory activity against HUVEC and HepG2 cells is higher than against the A549 and MDA-MB-231 cancer cell lines.

In our work, 44 compounds were collected for quantitative structure–activity relationship (QSAR) analysis, which is widely used as a valuable tool in drug design. The main advantage of a QSAR model is that it can predict the biological activity of new, untested compounds and provide physicochemical insight into the end point under study (Ancuceanu et al. 2020). Hologram quantitative structure–activity relationship (HQSAR) is a modern 2D-QSAR method based on specific molecular fragments (Li et al. 2020). In HQSAR, each molecule in the training set is decomposed into its unique structural fragments, which are arranged to form a molecular hologram, i.e., an extended form of fingerprint that can encode all possible molecular fragments. The only requirements for HQSAR model generation are the 2D structures and the corresponding property values of the compounds in the data set. Partial least squares (PLS) analysis is used to correlate the fragment pattern counts of the training set compounds with their experimental biological parameters in order to generate the HQSAR model. In general, biological or pharmacological data (e.g., Ki, IC50, EC50) are converted to their negative logarithms (pKi, pIC50, pEC50, respectively) and used as dependent variables in QSAR studies (Waller 2004). HQSAR explains the observed differences in activity across a series of molecules by quantifying changes in the molecular hologram (Wasko et al. 2015). Comparative molecular field analysis (CoMFA) is a useful 3D-QSAR method. It takes steric and electrostatic characteristics into account and displays the model through contour maps (Tong et al. 2019). Topomer CoMFA is the second generation of CoMFA (Cramer 2012). It is a fast, fragment-based 3D-QSAR method. Unlike traditional CoMFA, topomer CoMFA does not need subjective alignment of 3D ligand conformations and uses automatic alignment rules, so the analysis is faster (Li et al. 2017).

Because topomer CoMFA and HQSAR are easy to operate and can cross-validate each other in different dimensions, 44 6-amide-2-aryl benzoxazole/benzimidazole derivatives were analyzed in this research by HQSAR and topomer CoMFA to reveal the structural factors that govern activity. Molecular docking was also used to study the mechanism of drug action. In addition, to evaluate drug-likeness, standard calculated pharmacokinetic parameters (ADMET) and drug-likeness tests were performed for each designed compound. This work will help to guide the synthesis of new 6-amide-2-aryl benzoxazole/benzimidazole derivatives.

Computational methods

Preparation of data set

A total of 44 6-amide-2-aryl benzoxazole/benzimidazole derivatives were collected from the literature (Yuan et al. 2019), and their IC50 values were converted into the corresponding pIC50 (−logIC50). The structures and pIC50 values of the 44 compounds are shown in Table 1. When developing a QSAR model, training and test compounds must be selected so that the test set is distributed sufficiently uniformly over the chemical and structural space of the whole data set. The data set was divided by selecting one test compound for every three training compounds, so the training and test sets contain 33 and 11 molecules, respectively. The assignment of compounds to the training and test sets is shown in Table 1. For the HQSAR and topomer CoMFA studies, the 44 structures were built in SYBYL-X 2.0. The energy of each molecule in the data set was minimized with the Tripos force field and Gasteiger-Hückel charges (Purcell and Singer 1967) using a gradient descent method.
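The data preparation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IC50 values are hypothetical (not those of Table 1) and are taken to be in mol/L, and the split rule "every fourth compound after ranking by activity goes to the test set" is one plausible reading of the selection procedure that reproduces the 33/11 division; the function names are ours.

```python
import math


def pic50_from_ic50(ic50_molar: float) -> float:
    """pIC50 = -log10(IC50), with IC50 expressed in mol/L (assumed unit)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def split_three_to_one(compound_ids, pic50_values):
    """Rank compounds by activity and send every fourth one to the test set."""
    ranked = sorted(zip(compound_ids, pic50_values), key=lambda pair: pair[1])
    train = [cid for i, (cid, _) in enumerate(ranked) if i % 4 != 3]
    test = [cid for i, (cid, _) in enumerate(ranked) if i % 4 == 3]
    return train, test


# Hypothetical IC50 values in mol/L, NOT the experimental data of Table 1.
ids = list(range(1, 45))
ic50 = [1e-8 * (1.2 ** i) for i in ids]
pic50 = [pic50_from_ic50(value) for value in ic50]

train_ids, test_ids = split_three_to_one(ids, pic50)
print(len(train_ids), len(test_ids))  # 33 11
```

Ranking by activity before splitting is a simple way to spread the test compounds over the whole activity range; it does not by itself guarantee structural diversity.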

Table 1 Structures and biological activities (pIC50) of 6-amide-2-aryl benzoxazole/benzimidazole derivatives

Hologram quantitative structure–activity relationship (HQSAR)

As a two-dimensional QSAR method, HQSAR requires neither 3D structures nor putative binding conformations nor molecular alignment (Weida et al. 1998). In HQSAR, the molecules in the training set are decomposed into all possible linear and branched fragments of connected atoms, and a hashing algorithm then encodes these fragments into the bins of the hologram (Doddareddy et al. 2004). The bin occupancies of the hologram are then correlated with the experimental property or biological activity to generate HQSAR prediction models (Ugarkar et al. 2014). The HQSAR method uses several parameters to generate molecular holograms: the hologram length (HL; 53, 59, 61, 72, 83, 97, 151, 199, 257, 307, 353 and 401), the fragment distinction [atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch), donor and acceptor (DA)] and the fragment size (2–5, 3–6, 4–7, 5–8, 6–9, 7–10). Various combinations of these parameters are tested to obtain the best HQSAR model.
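To make the hologram idea concrete, the sketch below enumerates the linear fragments of a toy molecular graph and hashes them into a fixed-length array of bin counts. It is a deliberately simplified illustration, not the SYBYL implementation: it considers only linear paths of heavy atoms, distinguishes fragments by atom type alone, and uses CRC32 as a stand-in hash function. The toy atom chain and all function names are hypothetical.

```python
import zlib
from collections import Counter


def linear_fragments(atoms, bonds, min_size, max_size):
    """Count linear atom paths containing min_size..max_size atoms (min_size >= 2)."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    counts = Counter()

    def extend(path):
        # Every path is reached from both ends; keep the copy that starts
        # at the lower atom index so each occurrence is counted once.
        if len(path) >= min_size and path[0] < path[-1]:
            forward = "-".join(atoms[i] for i in path)
            backward = "-".join(atoms[i] for i in reversed(path))
            counts[min(forward, backward)] += 1
        if len(path) < max_size:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return counts


def hologram(fragment_counts, length):
    """Hash each fragment label into one of `length` bins and add its count."""
    bins = [0] * length
    for label, count in fragment_counts.items():
        bins[zlib.crc32(label.encode("ascii")) % length] += count
    return bins


# Toy heavy-atom chain N-C-C-O-C-C (hypothetical, not a compound from Table 1).
atoms = ["N", "C", "C", "O", "C", "C"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

fragments = linear_fragments(atoms, bonds, min_size=4, max_size=7)
bins = hologram(fragments, length=97)
```

Because several fragments may hash to the same bin, shorter holograms lose more information through collisions, which is one reason the hologram length is treated as a tunable parameter.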

Topomer CoMFA

Topomer CoMFA is a fast, fragment-based three-dimensional quantitative structure–activity relationship (3D-QSAR) method. Unlike traditional CoMFA, it does not need subjective alignment of 3D ligand conformers; it uses automatic alignment rules instead, so the analysis is faster (Li et al. 2017). The steps of topomer CoMFA are as follows:

  1. The three-dimensional molecular structure is split into fragments that share a common feature, namely an open valence (attachment bond).

  2. Each fragment is aligned by superimposing this common feature, which fixes the absolute orientation of the fragment.

  3. The steric and electrostatic fields of the topomer-aligned fragments are calculated.

  4. PLS regression is used to build the model, and the model is evaluated by the jackknife (leave-one-out) test.

The cross-validated correlation coefficient q2 and the non-cross-validated correlation coefficient r2 are used to evaluate the topomer CoMFA model (Roy et al. 2016). The values of r2 and q2 should be greater than 0.6 and 0.5, respectively. The optimal model is the one with the highest q2, and the validity of the model depends on the r2 value (Wang et al. 2015).
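The two statistics can be illustrated with a small calculation. In the sketch below, r2 is 1 − SSres/SStot for a model fitted to all compounds, and q2 is 1 − PRESS/SStot, where PRESS sums the squared errors of predictions made for each compound by a model fitted without it (leave-one-out). For brevity, a one-descriptor least-squares line stands in for the PLS model, and the descriptor and activity values are hypothetical; these are assumptions of the example, not part of the original workflow.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(y):
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def r_squared(x, y):
    """Non-cross-validated r2: fit and evaluate on the same compounds."""
    slope, intercept = fit_line(x, y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / total_sum_of_squares(y)


def q_squared_loo(x, y):
    """Leave-one-out q2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(y)


# Hypothetical descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
acceptable = r2 > 0.6 and q2 > 0.5
```

With this definition q2 can never exceed r2 for a least-squares fit, which is why q2 is the more demanding of the two criteria.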

Virtual screening

Topomer search is a tool for virtually screening large compound libraries for fragments similar to a specified structure. In our study, topomer search is used to screen R groups in the ZINC database: topomer similarity is used to filter the candidates, and topomer distance is used to compare them with the query fragments. Topomer distance is a parameter that estimates the similarity between query fragments and molecular fragments (Zhang et al. 2014); the smaller the value, the higher the similarity. The topomer distance (TOPDIST) was set to 185, and the other parameters were left at the SYBYL-X 2.0 defaults. The topomer search proceeds as follows: (1) the molecules in the database are cut into fragments, which are compared with the R groups of the template molecule by topomer similarity; (2) the topomer CoMFA model is used to predict the contribution of each fragment to the activity; (3) a series of R groups is obtained (Tong et al. 2016).
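The selection logic of such a search can be sketched as a simple filter. The example assumes that TOPDIST acts as an upper cutoff on topomer distance and that a candidate is kept only if its predicted contribution exceeds that of the template R group; the record type, the identifiers and all numerical values are hypothetical and do not come from the ZINC screening itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RGroupHit:
    identifier: str      # hypothetical fragment identifier
    topdist: float       # topomer distance to the query R group
    contribution: float  # predicted contribution of the fragment to pIC50


def select_r_groups(hits, template_contribution, max_topdist=185.0):
    """Keep fragments within the distance cutoff that beat the template R group."""
    kept = [
        hit for hit in hits
        if hit.topdist <= max_topdist and hit.contribution > template_contribution
    ]
    # Most promising fragments first.
    return sorted(kept, key=lambda hit: hit.contribution, reverse=True)


# Hypothetical search results, for illustration only.
hits = [
    RGroupHit("frag_A", topdist=120.0, contribution=1.40),
    RGroupHit("frag_B", topdist=190.0, contribution=1.90),  # too dissimilar
    RGroupHit("frag_C", topdist=160.0, contribution=0.80),  # weaker than template
    RGroupHit("frag_D", topdist=175.0, contribution=1.65),
]

selected = select_r_groups(hits, template_contribution=1.00)
selected_ids = [hit.identifier for hit in selected]
```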

Molecular docking

Molecular docking visualizes the possible binding orientation of a ligand relative to the important residues of VEGFR-2. We used the Surflex-Dock module of SYBYL-X 2.0 for docking. Surflex-Dock uses an empirical scoring function based on the binding affinities of protein–ligand complexes (Jain and Surflex 2003). The protein was prepared using the structure preparation tools, and the binding site residues were used to generate the protomol. The protomol is a distinctive and important element of the docking algorithm: it is an idealized representation of the interactions between a ligand and the protein binding site. Surflex-Dock combines Hammerhead's empirical scoring function with a molecular similarity method to generate poses of ligand fragments (Jain 2007). The docking results were evaluated by the total score: the larger the value, the better the binding between the small molecule and the protein. Generally, a total score greater than 4 indicates a strong interaction between the small molecule and the protein, and a total score greater than 6 suggests that the experimental activity can reach the micromolar level.
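The link between a total score of 6 and micromolar activity follows if the score is read as −log10(Kd) with Kd in mol/L, which is how Surflex-Dock scores are usually described. The short sketch below makes that arithmetic explicit; the conversion is an assumption of this example, and docking scores are only rough estimates of affinity.

```python
def kd_from_total_score(total_score: float) -> float:
    """Estimated dissociation constant in mol/L, assuming score = -log10(Kd)."""
    return 10.0 ** (-total_score)


def interpret_total_score(total_score: float) -> str:
    """Apply the two thresholds quoted in the text (4 and 6)."""
    if total_score > 6:
        return "micromolar or better"
    if total_score > 4:
        return "strong interaction"
    return "weak interaction"


kd_at_six = kd_from_total_score(6.0)   # about 1e-6 mol/L, i.e. 1 micromolar
kd_at_four = kd_from_total_score(4.0)  # about 1e-4 mol/L, i.e. 100 micromolar
```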

ADMET and drug-like prediction

To further determine whether the newly designed molecules can serve as drug candidates, their ADMET and drug-like properties were predicted to give an initial estimate of their pharmacokinetic, physicochemical and drug-likeness parameters. In silico prediction of ADMET (absorption, distribution, metabolism, excretion and toxicity) and drug-like properties is an important method in contemporary drug design and drug screening (Yadav et al. 2012). Early ADMET evaluation can help to address species differences, improve the success rate of drug development, reduce development costs, reduce drug toxicity and side effects and guide rational clinical drug use (Aarjane et al. 2020). ADMET properties were obtained from the admetSAR web server (Yang et al. 2019), and drug-like properties and synthetic accessibility were evaluated with the SwissADME online tool (Agahi et al. 2020).

Results and discussion

Results of QSAR models

Result analysis of HQSAR model

In HQSAR, parameters such as hologram length (HL), fragment distinction (FD) and fragment size (FS) may affect the quality of the model, so they should be specified and optimized. In our study, we first kept FS (4–7) and HL at their defaults and varied the combination of FD parameters (A, B, C, H, Ch, DA) to generate initial models. Table 2 shows the statistical results for the training set using different FD combinations. Among these combinations, atoms, bonds, connections and chirality (A/B/C/Ch) produced the highest q2 (0.576) and r2 (0.848). The impact of FS was then investigated, and the statistical results are shown in Table 3; the optimal FS is 7–10. According to Tables 2 and 3, the best HQSAR model (bold in Table 3) uses the following parameters: A/B/C/Ch for fragment distinction, 7–10 for fragment size and 97 for hologram length. Its q2 and r2 are 0.646 and 0.871, respectively, with a standard error of 0.046. The observed and predicted pIC50 values of the training and test sets are shown in Table 5. Their correlation plot (Fig. 1) shows a good linear relationship.

Table 2 Summary of hologram quantitative structure–activity relationship statistical parameters for various fragment distinction parameters using the default fragment size (4–7)
Table 3 Summary of hologram quantitative structure–activity relationship statistical parameters for various fragment size parameters using the fragment distinction (A/B/C/Ch)
Fig. 1 Plot of predicted pIC50 values versus the actual values for training and test set compounds using topomer CoMFA model and HQSAR model

Result analysis of topomer CoMFA model

To further verify the relationship between the structure and activity of the 6-amide-2-aryl benzoxazole/benzimidazole derivatives, the topomer CoMFA method was selected for 3D-QSAR analysis. This method has been widely used to assist the design of targeted drugs for avian influenza, HIV, central nervous system diseases and tumors (Kumar and Tiwari 2015). In a topomer CoMFA model, the predicted activity of the derivatives depends on how the molecules are cut into fragments. Once the cutting is completed, the input structures are standardized and topomers with the same substructure are generated (Zhang et al. 2014). The more identical substructures are identified in the test set, the better the predictive ability of the model. In this study, compound 31 (the most active compound) was divided into three parts, namely Ra (blue), Rb (red) and the skeleton (green). Two topomer CoMFA models were obtained, and their q2 and r2 values are shown in Table 4. For a reliable predictive model, q2 should be > 0.5 (Golbraikh and Tropsha 2002); model 2 meets this criterion (q2 = 0.659, r2 = 0.867). This indicates that the model has good predictive ability and can be applied to the design of new derivatives.

Table 4 Results of two topomer CoMFA models

Table 5 shows the biological activity of each molecule predicted by the topomer CoMFA model, and the corresponding linear regression plot is shown in Fig. 1. The abscissa is the actual activity and the ordinate is the predicted activity. Training set compounds are displayed as squares and test set compounds as circles. As shown in Fig. 1, all the molecules in the test set lie near the regression line, indicating that the model is reasonable and reliable and has good predictive ability.

Table 5 Predicted activities from QSAR models compared with the experimental activities

HQSAR contribution maps and topomer CoMFA contour maps

HQSAR contribution maps analysis

In an HQSAR contribution map, the color of each atom reflects the contribution of that atom to the total activity of the molecule. The contribution maps of compounds 31 and 41 (the compounds with the largest and smallest pIC50, respectively) are shown in Fig. 2.

Fig. 2 Contribution diagrams of compounds 31 and 41 obtained from the optimal hologram quantitative structure activity relationship model

Carbons 3 and 4 of the benzene ring at the Ra position of compound 41 make a negative contribution. When the methoxy group of compound 41 is replaced by the bromine of compound 31, carbon 4 of the benzene ring makes a positive contribution to pIC50. At carbon 3 of the benzene ring, the H of compound 41 makes a negative contribution; when this H is replaced by the F of compound 31, the carbon 3 position makes a positive contribution to pIC50. These findings indicate that the nature of the substituents at the Ra position is very important for the pIC50 of 6-amide-2-aryl benzoxazole/benzimidazole derivatives. Most of the atoms in compounds 31 and 41 are shown in blue-green, indicating a positive contribution to pIC50.

Topomer CoMFA contour maps analysis

Plotting the coefficients of the model generates the topomer CoMFA 3D contour maps around Ra and Rb (Fig. 3). The contours are easiest to interpret when the most active molecule is chosen as the reference. Of all the compounds, compound 31 showed the best biological activity, so the maps are shown with compound 31 as the reference structure.

Fig. 3 3D contour maps of topomer CoMFA model of compound 31. a steric field map of Ra; b electrostatic field map of Ra; c steric field map of Rb; d electrostatic field map of Rb

In the steric field, the green contours at carbon 3 and carbon 4 of the Ra group indicate that bulkier substituents are favorable, while yellow contours indicate regions where bulky substituents are unfavorable (Fig. 3a). In the electrostatic field, the red contour at the carbon 3 position of the Ra group indicates that electronegative groups are favorable, while blue contours indicate that electropositive groups are favorable (Fig. 3b). The green contour covers the Ra group in the steric field, whereas in the electrostatic field the blue contour lies in the middle of the substituent and the red contour at its end. This shows that a bulky group with negative potential at the end of the side chain at the C-3 position will be beneficial to the activity. For the Rb group, the green contour (Fig. 3c) is located near the C-3 site, while the yellow contour is located at the C-4 site. The red contour is located near the C-3 site, and the blue contour near the C-4 site (Fig. 3d). This shows that a bulky group with negative potential at the C-3 site of Rb is beneficial to the activity, and that a smaller group with positive potential at the C-4 site of Rb will improve the anti-tumor activity.

Finally, based on the 2D-QSAR contribution maps and the 3D-QSAR contour maps, we summarized the types of R-group modifications suggested for the template molecule (compound 31). The results are shown in Fig. 4.

Fig. 4 Structure–activity relationship revealed by 2D/3D-QSAR

Molecular screening and molecular design

Based on the analysis of the HQSAR contribution maps and the topomer CoMFA contour maps, we used topomer search to screen R groups in the ZINC database. The results were evaluated by the contribution value of the R group (TOPCOMFA_R) and by TOPDIST. In general, we chose R groups whose TOPCOMFA_R value is larger than that of the template molecule in the original training set and whose TOPDIST is close to 185. In this study, seven new Ra groups and six new Rb groups were selected, from which 42 new molecules can be formed by combination. These molecules were then optimized, and their activities were predicted with the topomer CoMFA model. The results show that the newly designed molecules have higher predicted activity than the original template molecule, and we retained the eight molecules with the highest activity. These results are consistent with the HQSAR contribution maps and the topomer CoMFA contour maps. The molecular structures and predicted activities are shown in Table 6.
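The combinatorial step is easy to reproduce: seven Ra groups and six Rb groups give 7 × 6 = 42 candidates, which are then ranked by predicted activity. In the sketch below the R-group names, the additive toy predictor and the template pIC50 are all hypothetical stand-ins; in the actual workflow the predictions come from the topomer CoMFA model.

```python
from itertools import product


def enumerate_candidates(ra_groups, rb_groups):
    """All pairings of one Ra group with one Rb group on the common skeleton."""
    return list(product(ra_groups, rb_groups))


def keep_best(candidates, predict, template_pic50, n_keep):
    """Rank candidates predicted to beat the template and keep the top n_keep."""
    scored = [(predict(ra, rb), ra, rb) for ra, rb in candidates]
    better = [row for row in scored if row[0] > template_pic50]
    return sorted(better, reverse=True)[:n_keep]


# Hypothetical per-group contributions standing in for model predictions.
ra_contrib = {f"Ra{i}": 0.10 * i for i in range(1, 8)}   # 7 Ra groups
rb_contrib = {f"Rb{i}": 0.05 * i for i in range(1, 7)}   # 6 Rb groups


def toy_predict(ra, rb):
    """Additive toy model: skeleton baseline plus the two group contributions."""
    return 6.0 + ra_contrib[ra] + rb_contrib[rb]


candidates = enumerate_candidates(list(ra_contrib), list(rb_contrib))
best = keep_best(candidates, toy_predict, template_pic50=6.5, n_keep=8)
```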

Table 6 Structures and predicted pIC50 of new designed molecules

Binding mode of VEGFR-2 inhibitors

Compounds need to bind to their target protein to exert activity. In this study, we retrieved the crystal structure (PDB ID: 6ET4) from the RCSB Protein Data Bank (https://www.rcsb.org/structure/6ET4) (Seal et al. 2011) and used it as the VEGFR-2 target for structure-based design. The protein was prepared by adding charges and hydrogen atoms, removing the remaining water molecules and extracting the ligand. To verify the reliability of the docking procedure, the co-crystallized ligand was redocked into the protein (6ET4): the ligand was removed from the protein–ligand complex, used as the reference ligand and docked back into its binding site. As shown in Fig. 5a, the redocked ligand almost coincides with the reference ligand, and their orientations are essentially the same. This shows that the method is reasonable and reliable. Figure 6a shows the docking result of the redocked ligand. As can be seen from Fig. 6a, the ligand is surrounded by residues Arg136, Thr360, Pro52, Phe62, Ala59 and Leu58.
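A common way to put a number on how closely a redocked pose coincides with the reference ligand is the root-mean-square deviation (RMSD) over matched heavy atoms. No RMSD is reported here, so the sketch below only illustrates the calculation with hypothetical coordinates; it assumes both poses are in the same protein frame and list their atoms in the same order, so no superposition is performed. An RMSD of about 2 Å or less is often used as a rule of thumb for a successful redocking.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the units of the input between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("both poses need the same, non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical heavy-atom coordinates in angstroms, for illustration only.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0)]
redocked_pose = [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0), (2.5, 1.2, 0.0)]

deviation = rmsd(reference_pose, redocked_pose)
redocking_ok = deviation <= 2.0
```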

Fig. 5 Superimposition of the reference ligand and the protomol. a Superimposition of the reference ligand (the green stick represents the redocked ligand, and the red stick represents the reference ligand). b The protomol (the green region represents the prototype molecule)

Fig. 6 a Docking result of the redocked ligand. b–d Docking results of the redocked ligand and newly designed inhibitors (the ligand is represented by sticks; the amino acid residues are represented by green sticks; the hydrogen bonds are represented by purple lines). b Hydrogen bond interaction between the newly designed molecule 01 and 6ET4; c Hydrogen bond interaction between the newly designed molecule 02 and 6ET4; d Hydrogen bond interaction between the template molecule (No. 31) in the original training set and 6ET4

In the SYBYL docking software, the scoring functions Total-score and C-score are the criteria for evaluating the binding of molecules to proteins. The Total-score function considers molecular polarity, hydrophobicity, enthalpy and solvation. The larger the value, the better the binding of the small molecule to the receptor protein; a Total-score greater than or equal to 6 is generally considered to indicate good activity. C-score is a consensus scoring function that combines the values of D-score, Chem-score, G-score and F-score; a value close to 5 is considered to indicate better activity. Similarity is a parameter that evaluates how closely a molecule resembles the co-crystallized ligand; the higher the value, the more similar the molecular conformation. In this study, the Total-score and C-score are used to evaluate the docking results.

We docked the co-crystallized ligand, the newly designed molecules 01 and 02 and the template molecule from the original training set into the protein (6ET4). The docking poses are shown in Fig. 6, and the docking results are listed in Table 7. The newly designed molecules 01 and 02 bind well to the protein, and the docking results of the original template molecule are not as good as those of the newly designed molecules. The docking results are in good agreement with the observed biological activity data, indicating that these docking conformations are suitable for analyzing the binding mode.

Table 7 Molecular docking results

In silico ADMET and drug-like prediction

The ADMET properties of a compound can determine whether it can be used as a medicine, because a small-molecule drug must reach its site of action in the human body and exert a reasonable and effective inhibitory effect there. Table 8 shows the predicted ADMET properties of the 8 newly designed compounds. The predicted intestinal absorption of the compounds is mostly between 70 and 100%, indicating that they can be well absorbed by the body. Although none of the designed compounds is predicted to affect CYP2D6, their clearance is low. Most of the newly designed compounds are predicted to be substrates of CYP3A4. In addition, their blood–brain barrier (BBB) permeability is low, which limits entry into the central nervous system (CNS). Moreover, all compounds are predicted to be non-toxic in the AMES test. Compound 08 is predicted to bind tightly to plasma proteins (PPB > 0.9), and its remaining ADMET parameters are close to ideal. Based on the ADMET predictions, these newly designed molecules have good theoretical potential as VEGFR-2 inhibitors.

Table 8 ADMET prediction results of novel designed compounds

Table 9 shows the drug-likeness of all newly designed compounds. Compounds that comply with Lipinski's rule tend to have better pharmacokinetic properties and higher bioavailability during metabolism in the body and are more likely to become oral drugs. According to Lipinski's rule, small molecules that can be used as oral drugs should meet the following conditions: (a) molecular weight (MW) ≤ 500 Da, (b) no more than 10 hydrogen bond acceptors (HBA), (c) no more than 5 hydrogen bond donors (HBD), and (d) an octanol/water partition coefficient (logP) ≤ 5 (Lipinski et al. 2001). In addition, some further criteria have been proposed, such as fewer than 10 rotatable bonds (Leeson and Oprea 2011). A drug molecule may violate at most one of these parameters. The compounds we designed all comply with Lipinski's rule and meet the requirements for oral drugs. We also evaluated the synthetic accessibility of the designed compounds, which was about 3.9 on a scale whose maximum of 10 corresponds to the most difficult synthesis. The smaller the value, the easier the compound is to synthesize, so these compounds should be relatively easy to synthesize.
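The rule-of-five check described above amounts to counting violations. The sketch below uses the usual statement of the rule, in which a property counts as a violation only when it exceeds its threshold, and accepts a molecule with at most one violation. The function names and the property values in the example are hypothetical and are not taken from Table 9.

```python
def lipinski_violations(mw, hba, hbd, logp):
    """Return the list of rule-of-five criteria that a molecule violates."""
    checks = {
        "MW > 500": mw > 500,
        "HBA > 10": hba > 10,
        "HBD > 5": hbd > 5,
        "logP > 5": logp > 5,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(mw, hba, hbd, logp, max_violations=1):
    """A molecule is accepted if it violates at most max_violations criteria."""
    return len(lipinski_violations(mw, hba, hbd, logp)) <= max_violations


# Hypothetical property values, for illustration only.
compliant = passes_rule_of_five(mw=450.5, hba=6, hbd=2, logp=3.8)
rejected_reasons = lipinski_violations(mw=620.0, hba=12, hbd=2, logp=3.0)
```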

Table 9 Results of the drug-likeness prediction for the newly designed compounds

Conclusion

In summary, HQSAR and topomer CoMFA were used as 2D- and 3D-QSAR methods to study a series of 6-amide-2-aryl benzoxazole/benzimidazole derivatives. Using the same training set, two models with good statistical parameters and reliable predictive ability were obtained, and the results of the two models corroborate each other. Guided by these models, we designed new compounds as potential VEGFR-2 inhibitors and predicted their pIC50 values. We then docked these new molecules into the protein to examine their binding to the receptor. The results showed good binding between the newly designed molecules and the receptor protein. Finally, the ADMET and drug-likeness predictions also gave favorable results. Our results therefore provide a structural and theoretical basis for the rational design of VEGFR-2 inhibitors.