Introduction

Cancer is a growing public health problem, with an estimated incidence of about six million new cases per year globally. It is the second most important cause of death around the world, and half of all cancer-related deaths occur in developed countries [1]. Medicinal plants are often investigated as a source of new drugs for treating cancer; indeed, 60% of the anticancer drugs that have been approved by the FDA originate from plants [2].

The brain and spinal cord comprise the central nervous system (CNS), where all of the vital functions of the body are controlled. Tumors that arise in the central nervous system are difficult to treat because the tissues surrounding the tumor may play a vital role in normal body function, so it is highly undesirable to risk affecting them through surgery or radiotherapy. Another type of cancer, lung cancer, is a leading cause of death in both men and women, and occurs most commonly between the ages of 45 and 70. Hence, the discovery of new drugs for treating CNS and lung cancers is an important task.

Triterpenes exist abundantly in the plant kingdom. Over the past few years, triterpenoids from higher plants have been shown to possess a wide range of biological activities [3], such as cytotoxic [4], antitumor [5], antiviral [6], anti-inflammatory [7], and anti-HIV [8] activities. Ursolic acid is a triterpenoid that is ubiquitous in the plant kingdom, including medicinal herbs, and is an integral part of the human diet [3]. It has shown significant cytotoxicity against various tumor cell lines [7, 8], and in recent years a large number of ursolic acid analogs with anticancer activity have been synthesized [9, 10]. Some ursane triterpenoids with modified A and C rings have been reported to possess high inhibitory activities against nitric oxide (NO) production. This suggests that these compounds could potentially be used as cancer chemopreventive drugs, as excessive production of NO, which is closely related mechanistically to carcinogenesis, can destroy functional normal tissues [11–15]. However, ursolic acid analogs tend to have high molecular weights and solubility issues [16], which is why they have not been thoroughly explored in terms of their cytotoxic activities. In this regard, a good understanding of their chemical properties at the molecular level (such as their lipophilic, steric, and electronic characteristics) may provide important information on the anticancer properties of these analogs.

The quantitative structure–activity relationship (QSAR) approach has emerged as a promising tool for the effective screening of potential drugs. The ultimate goal of QSAR studies is to correlate the biological activities of a series of compounds with some appropriate descriptors. Among the different descriptors that can be used to describe the electronic properties of molecules, the dipole vector, the ring count, and the solvent-accessible surface area have been found to be useful in several QSAR studies [7, 8]. However, the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) energies have been shown to correlate particularly well with various biological activities [17]. As part of our drug discovery program, in the study described in this paper, we developed QSAR models for predicting the activities of ursolic acid analogs against human lung (A-549) and CNS (SF-295) cancer cell lines. A total of 41 virtual analogs of ursolic acid were screened using these QSAR models, and the models predicted that a few of these analogs should possess high anticancer activities. To validate the predictions made by the models, we then carried out the semisynthesis of the analogs of interest in the wet lab, and experimentally evaluated their in vitro anticancer activities against the human lung (A-549) and CNS (SF-295) cancer cell lines. In this way, we designed ursolic acid analogs with enhanced anticancer activities using QSAR models and in silico screening for pharmacokinetic (ADME) compliance.

Materials and methods

Molecular modeling parameters and energy minimization

Molecular construction, geometry optimization, and energy minimization of ursolic acid analogs were carried out using SYBYL-X 1.3 (Tripos, St. Louis, MO, USA) on an HP xw4600 workstation with an Intel Core 2 Duo E8400 (3.2 GHz) processor and 4 GB of RAM, running the Red Hat® Enterprise Linux 4.0 (32-bit compatible) operating system (Silicon Graphics Inc., Mountain View, CA, USA). The Tripos force field [16] and Gasteiger–Hückel charges were used for energy minimization. 2D structures were converted to 3D structures using the program Concord 4.0. The maximum number of iterations performed in the minimization was set to 2000. Minimization was terminated when the energy gradient convergence criterion of 0.05 kcal mol−1 Å−1 was reached. Further geometry optimization was carried out with the MOPAC 6 package using the semiempirical PM3 Hamiltonian method [18, 19].

A total of 36 compounds/drugs (Table 1) were included in the training set that was used to develop the QSAR model for activity against the human lung cancer cell line (A-549), while 26 compounds/drugs (Table 2) were employed in the training set to develop the QSAR model for activity against the human CNS cancer cell line (SF-295), based on 50 chemical descriptors. Selection was made on the basis of structural/pharmacophore or chemical class similarity, in order to include a diverse set of data rather than only compounds from the same family. Similarly, in order to select the best subset of descriptors, highly correlated descriptors were excluded through covariance analysis using a correlation matrix (see the “Electronic supplementary information,” ESM, files 1 and 2). The remaining descriptors were used for model development utilizing a forward stepwise multiple linear regression method. The derived QSAR models had high regression coefficients. The QSAR models were successfully validated through the use of random test set compounds, and the robustness of their predictions was validated through the crossvalidation coefficient.
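The exclusion of highly correlated descriptors described above can be illustrated with a minimal sketch. The 0.9 cutoff, the function names, and the toy descriptor columns `x1` to `x3` are assumptions made for illustration only; the text does not state the threshold or the software that was used for this step.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / sqrt(sxx * syy)


def prune_correlated(descriptors, cutoff=0.9):
    """Keep descriptors in insertion order, dropping any column whose
    absolute correlation with an already kept column exceeds the cutoff
    (the cutoff value is an assumption, not taken from the paper)."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor columns (one value per compound), not real data.
toy = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 4.0, 6.2, 7.9, 10.1],   # nearly proportional to x1
    "x3": [3.0, -1.0, 4.0, -1.0, 5.0],  # only weakly related to x1
}
selected = prune_correlated(toy)
```

Here `x2` is dropped because it is almost perfectly correlated with `x1`, while `x3` is retained. Which member of a correlated pair is kept depends on the column order, so a real workflow would usually keep the descriptor that is easier to interpret.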

Table 1 Structures and experimental and predicted activities of the compounds included in the training set used to develop a QSAR model for activity against a human lung cancer cell line (A-549)
Table 2 Structures and experimental and predicted activities of the compounds included in the training set used to develop a QSAR model for activity against a CNS human cancer cell line (SF-295)

Selection of structural chemical descriptors for QSAR modeling

The biological activity of an ursolic acid analog can be expressed quantitatively as the concentration of that substance which is required to achieve a certain biological response. Additionally, when physicochemical properties or structures are expressed numerically, it is possible to form a mathematical relationship between the two. This mathematical expression can then be used to predict the biological responses to other chemical structures [20–23]. Before novel compounds are tested experimentally as potential drugs, predicting their toxicities/activities allows us to calculate the risk factors associated with administering them. A QSAR model ultimately helps to predict these important parameters (i.e., IC50 and LD50 values). Some of the important chemical descriptors used in multiple linear regression analysis were: atom count (all atoms), atom count (carbons), atom count (hydrogens), atom count (oxygens), bond count (all bonds), minimum energy of conformation (kcal mol−1), connectivity index (order 0, standard), connectivity index (order 1, standard), connectivity index (order 2, standard), dipole moment (debyes), dipole vector X (debyes), dipole vector Y (debyes), dipole vector Z (debyes), electron affinity (eV), dielectric energy (kcal mol−1), steric energy (kcal mol−1), total energy (hartrees), group count (amines), group count (carboxyls), group count (ethers), group count (hydroxyls), group count (methyls), heat of formation (kcal mol−1), HOMO energy (eV), ionization potential (eV), λ max UV–visible (nm), λ max far-UV–visible (nm), logP, LUMO energy (eV), molar refractivity, molecular weight, polarizability, ring count (all rings), size of smallest ring, size of largest ring, and solvent-accessible surface area (Å2).

QSAR model for cytotoxic activity against the lung cancer cell line (A-549)

To develop a QSAR model for predicting cytotoxic activity against the lung cancer cell line A-549, a training set containing 36 drugs/compounds was devised, and 50 chemical descriptors were included during model development (Table 1). Forward stepwise multiple linear regression QSAR modeling was performed using a leave-one-out approach to validation. It was observed that the cytotoxic drugs/compounds in the training set were fitted well by this model. Three molecular descriptors—LUMO energy (eV), ring count (all rings), and solvent-accessible surface area (Å2)—were significantly correlated with anticancer activity:

$$
\begin{aligned}
\text{Predicted } \log \text{IC}_{50}\ (\mu\text{M}) = {} & + 0.671301 \times \text{LUMO energy (eV)} \\
& - 0.31319 \times \text{ring count (all rings)} \\
& - 0.00276924 \times \text{solvent-accessible surface area } (\text{\AA}^{2}) \\
& + 4.07115 \\
& \left[ r^{2} = 0.852225 \text{ and } r_{\text{CV}}^{2} = 0.800499 \right]
\end{aligned}
$$
(1)

This QSAR model equation relates the in vitro experimental activity (IC50), as the dependent variable, to the three chemical descriptors mentioned above as independent variables. The regression coefficient r2 = 0.85 indicates that the model accounts for about 85% of the variance in the activities of the training data set compounds, while the crossvalidation regression coefficient rCV2 = 0.80 indicates that about 80% of that variance is still reproduced when each compound is predicted by leave-one-out crossvalidation (Fig. 1). It is evident from the above equation that, among the molecular descriptors, LUMO energy (eV) is positively correlated with the modeled activity value (log IC50), i.e., if the LUMO energy increases, the predicted log IC50 against the lung cancer cell line also increases. On the other hand, the ring count (all rings) and solvent-accessible surface area (Å2) are both negatively correlated with it, meaning that the predicted log IC50 decreases if these descriptors increase.
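For reference, the two statistics quoted with Eq. 1 are conventionally defined as shown below. These are the standard textbook forms and are assumed here, since the text does not spell out the formulas that were used. The symbols are introduced for illustration: y_i is the experimental log IC50 of training compound i, ŷ_i is the value fitted by the full model, ŷ_(i) is the value predicted for compound i by a model refitted without that compound (leave-one-out), and ȳ is the mean of the experimental values.

$$
r^{2} = 1 - \frac{\sum_{i} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum_{i} \left( y_{i} - \bar{y} \right)^{2}},
\qquad
r_{\text{CV}}^{2} = 1 - \frac{\sum_{i} \left( y_{i} - \hat{y}_{(i)} \right)^{2}}{\sum_{i} \left( y_{i} - \bar{y} \right)^{2}}
$$

Because each ŷ_(i) comes from a model that never saw compound i, the crossvalidated value is normally lower than r2, and a small difference between the two is usually read as a sign that the model is not overfitted.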

Fig. 1 Graph of experimental vs. predicted activities for the training and test set molecules from the multiple stepwise linear regression model. Training set is denoted by black dots and the test set by red dots

QSAR model for cytotoxic activity against the CNS cancer cell line (SF-295)

To develop a QSAR model for predicting cytotoxic activity against the CNS cancer cell line SF-295, a training set containing 26 drugs/compounds was produced, and 50 chemical descriptors were included during model development (Table 2). Forward stepwise multiple linear regression QSAR modeling was performed using a leave-one-out approach to validation. It was observed that the cytotoxic drugs/compounds in the training set were fitted well by this model. Two molecular descriptors—dipole vector Z (debyes) and solvent-accessible surface area (Å2)—were significantly correlated with anticancer activity:

$$
\begin{aligned}
\text{Predicted } \log \text{IC}_{50}\ (\mu\text{M}) = {} & + 0.0777154 \times \text{dipole vector Z (debye)} \\
& + 0.0118329 \times \text{solvent-accessible surface area } (\text{\AA}^{2}) \\
& - 4.11523 \\
& \left[ r^{2} = 0.987508 \text{ and } r_{\text{CV}}^{2} = 0.962561 \right]
\end{aligned}
$$
(2)

This QSAR model equation relates the in vitro experimental activity (IC50), as the dependent variable, to the two chemical descriptors mentioned above as independent variables. The regression coefficient r2 = 0.99 indicates that the model accounts for about 99% of the variance in the activities of the training data set compounds, while the crossvalidation regression coefficient rCV2 = 0.96 indicates that about 96% of that variance is still reproduced when each compound is predicted by leave-one-out crossvalidation (Fig. 2).
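To make the use of Eqs. 1 and 2 concrete, the following minimal sketch evaluates both regression equations for a set of descriptor values. The function names are hypothetical, the input values in the example are invented for illustration and do not correspond to any compound in Tables 1 to 4, and the conversion to IC50 assumes that "log" denotes the base-10 logarithm, which the text does not state explicitly.

```python
def predicted_log_ic50_a549(lumo_ev, ring_count, sasa_a2):
    """Eq. 1: predicted log IC50 (µM) against the A-549 lung cancer cell line."""
    return (0.671301 * lumo_ev
            - 0.31319 * ring_count
            - 0.00276924 * sasa_a2
            + 4.07115)


def predicted_log_ic50_sf295(dipole_z_debye, sasa_a2):
    """Eq. 2: predicted log IC50 (µM) against the SF-295 CNS cancer cell line."""
    return (0.0777154 * dipole_z_debye
            + 0.0118329 * sasa_a2
            - 4.11523)


def ic50_um(log_ic50):
    """Convert log IC50 to IC50 in µM (base-10 logarithm is an assumption)."""
    return 10.0 ** log_ic50


# Hypothetical descriptor values, chosen only to show the arithmetic.
log_a549 = predicted_log_ic50_a549(lumo_ev=0.0, ring_count=5, sasa_a2=500.0)
log_sf295 = predicted_log_ic50_sf295(dipole_z_debye=0.0, sasa_a2=500.0)
```

Note that the solvent-accessible surface area enters the two models with opposite signs, so the same structural change can shift the two predictions in different directions.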

Fig. 2 Graph of experimental vs. predicted activities for the training and test set molecules from the multiple stepwise linear regression model. Training set is denoted by black dots and the test set by red dots

HOMO–LUMO energy calculation of virtually active analogs of ursolic acid

The QSAR modeling was further supported by a theoretical approach that correlates electronic indices with biological activity, and which has been used to derive a simple rule for predicting the biological activities of ursolic acid derivatives, a novel class of inhibitors of osteoclast formation. This approach considers the energy separation of the frontier molecular orbitals and their relative contributions to the local density of electronic states in specific molecular regions [17]. In order to further explore structure–activity relationships, a preliminary study involving semiempirical and ab initio calculations of the locations and the relative energies of the frontier molecular orbitals, namely the HOMO (the highest occupied molecular orbital) and the LUMO (the lowest unoccupied molecular orbital), in all of the virtually active analogs of ursolic acid was performed by calculating optimized geometries in MO-G [20, 21, 23–25] using PM3 parameters.

Screening for druglikeness using pharmacokinetic properties

The ideal oral drug is one that is rapidly and completely absorbed in the gastrointestinal tract, is distributed specifically to its site of action in the body, is metabolized in a way that does not instantly remove its activity, and is eliminated in a suitable manner, without causing any harm. It is reported that around half of all developed drugs fail to make it to the market due to poor pharmacokinetic (PK) properties [26]. PK properties depend on the chemical properties of the molecule. PK properties such as absorption, distribution, metabolism, excretion, and toxicity (ADMET) are important factors in the success of the compound for human therapeutic use [27–29]. To screen for potential druglike leads, different PK properties of the ursolic acid analogs were analyzed. The importance of some of these ADME properties is summarized here to aid reader comprehension. Polar surface area is considered a primary determinant of fraction absorbed [30]. A relationship between the low molecular weight of a compound and its oral absorption has also been reported [31]. The distribution of the compound in the human body depends on factors such as the blood–brain barrier (BBB), permeability, volume of distribution, and plasma protein binding [32], so these parameters were calculated. The octanol–water partition coefficient has been implicated in BBB penetration and in permeability prediction, as has the polar surface area [33]. It has been reported that the excretion process, which eliminates the compound from the human body, depends on the molecular weight and the octanol–water partition coefficient. Similarly, rapid renal clearance is associated with small and hydrophilic compounds. The metabolism of most drugs in the liver is associated with large, hydrophobic compounds [34].
High compound lipophilicity leads to increased metabolism and poor absorption, along with an increased probability of binding to undesirable hydrophobic macromolecules, thereby increasing the potential for toxicity [33]. In spite of some observed exceptions to Lipinski’s rule, the values of the PK properties of the vast majority (90%) of orally active compounds are within their cut-off limits [35, 36]. Molecules that violate more than one of these rules may have problems with bioavailability. Lipinski’s “rule of five” was used to study the PK properties of the ursolic acid analogs considered here, in order to determine their druglikeness. Briefly, this rule is based on the observation that most orally administered drugs have a molecular weight (MW) of 500 or less, a logP of no higher than 5, five or fewer hydrogen-bond donor sites, and ten or fewer hydrogen-bond acceptor sites (N and O atoms). In addition, the bioavailability of each derivative was assessed through topological polar surface area analysis. We calculated the polar surface area (PSA) using a method based on summing the tabulated surface contributions of polar fragments, termed topological PSA (TPSA) (ChemAxon-Marvinview 5.2.6: PSA plugin [37]). The PSA is the surface area contributed by the polar atoms of the molecule. This descriptor has been shown to correlate well with passive molecular transport through membranes, so it allows the transport properties of drugs to be predicted, and has been linked to drug bioavailability. The percentage of the dose that reaches the circulation is called the bioavailability. Generally, passively absorbed molecules with PSA > 140 Å2 are thought to have low oral bioavailabilities [28, 37]. The number of rotatable bonds is another simple topological parameter used by researchers under an extended Lipinski’s rule as a measure of molecular flexibility. It has been shown to be a very good descriptor of oral drug bioavailability [38].
A rotatable bond is defined as any single nonring bond to a nonterminal, heavy (i.e., nonhydrogen) atom. Amide C–N bonds are not considered to be rotatable because of their high rotational energy barriers. Moreover, some researchers also include the sum of H-bond donors and H-bond acceptors as a secondary determinant of fraction absorbed. The primary determinant of fraction absorbed is polar surface area [30, 39]. According to the extended rule, the sum of H-bond donors and acceptors should be ≤12 or the polar surface area should be ≤140 Å2, and the number of rotatable bonds should be ≤10 [37]. Calculations of other important ADME properties of ursolic acid analogs were performed using QikProp (version 3.2, Schrödinger, LLC, San Diego, CA, USA, 2009). We also screened for active ursolic acid analogs using Jorgensen’s rule of three, which states that logS should be more than −5.7, P Caco should be >22 nm/s, and the number of primary metabolites should be <7 (Schrödinger). It is assumed that ursolic acid analogs that do not violate Jorgensen’s rule are more likely to be orally available.
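The rule-based screens described above can be summarized in a short sketch that counts violations from precomputed property values. The class and function names are hypothetical, the thresholds are the ones quoted in the text, and the example compound is invented for illustration; it is not taken from Tables 6 or 7. In practice the property values would come from tools such as Marvin or QikProp.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed properties of one compound (field names are illustrative)."""
    mol_weight: float          # molecular weight
    log_p: float               # octanol-water partition coefficient
    h_donors: int              # hydrogen-bond donor sites
    h_acceptors: int           # hydrogen-bond acceptor sites (N and O atoms)
    tpsa: float                # topological polar surface area, in Å^2
    rotatable_bonds: int
    log_s: float               # predicted aqueous solubility
    caco2_nm_s: float          # predicted Caco-2 permeability, in nm/s
    primary_metabolites: int   # number of likely primary metabolites


def lipinski_violations(p):
    """Number of rule-of-five criteria that the compound violates."""
    return sum([p.mol_weight > 500, p.log_p > 5,
                p.h_donors > 5, p.h_acceptors > 10])


def passes_extended_rule(p):
    """Extended rule: (donors + acceptors <= 12 or TPSA <= 140 Å^2)
    and no more than 10 rotatable bonds."""
    polar_ok = (p.h_donors + p.h_acceptors) <= 12 or p.tpsa <= 140
    return polar_ok and p.rotatable_bonds <= 10


def jorgensen_violations(p):
    """Number of rule-of-three criteria that the compound violates."""
    return sum([p.log_s <= -5.7, p.caco2_nm_s <= 22,
                p.primary_metabolites >= 7])


# Hypothetical lipophilic triterpenoid-like compound, for illustration only.
example = Properties(mol_weight=470.7, log_p=6.8, h_donors=1, h_acceptors=3,
                     tpsa=46.5, rotatable_bonds=2, log_s=-7.0,
                     caco2_nm_s=500.0, primary_metabolites=2)
report = (lipinski_violations(example),
          passes_extended_rule(example),
          jorgensen_violations(example))
```

For this invented compound the only Lipinski violation is the high logP, the extended rule is satisfied, and the single rule-of-three violation comes from the low predicted solubility.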

General experimental procedure

300 MHz 1H and 75 MHz 13C NMR spectra of the analogs were recorded on a Bruker (Billerica, MA, USA) 300 spectrometer in either C5D5N or CDCl3 solution. The chemical shifts are presented in this work as ppm with tetramethylsilane (TMS) used as the internal reference, and J values are reported in hertz. Carbon atom types (C, CH, CH2, CH3) were determined via DEPT pulse sequence. Silica gel G or H (Merck, Whitehouse Station, NJ, USA) was used for TLC, VLC, and flash chromatography. Reactions that required an inert atmosphere were carried out under N2 with oven-dried glassware. All amines were purchased from Spectrochem (Mumbai, India), and all alcohols were purchased from Thomas Baker Pvt. India Ltd. (Mumbai, India). All reactions were monitored by TLC on precoated Merck 60 F254 silica gel. All spots on the TLC plates were visualized with a spray reagent [vanillin–ethanol sulfuric acid (1 g: 95 ml: 5 ml)] and then heated for 5–10 min at 110 °C.

Plant material

To isolate ursolic acid, leaves of Eucalyptus hybrid (E. hybrid) were collected from the medicinal farm of the Central Institute of Medicinal and Aromatic Plants (CIMAP, Lucknow, Uttar Pradesh, India) during January 2008. A voucher specimen (CIMAP no. 12470) has been deposited in the herbarium section of the Botany Department of CIMAP.

Extraction and isolation of ursolic acid

The leaves of E. hybrid were air dried under shade and then powdered. Extraction and fractionation of the leaves were carried out as shown in Fig. 3. The powdered material (1.3 kg) was defatted with hexane (4 × 6 L, 24 h each) at room temperature, which yielded a hexane extract (4 g). The defatted material was then further extracted with methanol (4 × 5 L), leaving a residue (the marc). The combined methanol extract was subjected to complete solvent removal at 40 °C under vacuum. This dried methanolic extract was dissolved in distilled water (2 L) and successively extracted with hexane and ethyl acetate (4 × 400 ml). The combined hexane and ethyl acetate extracts were separately subjected to vacuum distillation at 40 °C, which yielded hexane (2 g) and ethyl acetate (45 g) extracts, respectively. To isolate the ursolic acid, the EtOAc extract (7 g) was separated using vacuum liquid chromatography (VLC) with silica gel H (150 g, average particle size 10 μm, G1 104 × 90 mm).

Fig. 3 Schematic procedure for the extraction and fractionation of Eucalyptus hybrid leaves

Gradient elution of the VLC column was carried out with hexane, chloroform, and methanol in various proportions. A total of 284 fractions of 50 ml each were collected. Fractions were pooled based on their TLC profiles (SiO2, chloroform:methanol 9:1 and 9:3; vanillin–sulfuric acid). The VLC fractions 175–182 (1.5 g) that eluted with CHCl3 (100%) were crystallized with the aid of chloroform, and the crystals were washed with hexane and filtered under vacuum, which resulted in the isolation of 1 g of pure white crystals. The TLC profile of the crystalline product was very similar to that of an authentic sample of ursolic acid in a different solvent system, and the product was therefore characterized as ursolic acid on the basis of its spectroscopic data [40].

Semisynthesis of virtually active analogs of ursolic acid

In order to validate the developed QSAR models, the predicted virtually active analogs of ursolic acid (2–9, Fig. 4) were semisynthesized in the lab according to the procedures reported [17, 41]. For the semisynthesis of ester and amide analogs of ursolic acid (UA-1) in alkaline conditions, the hydroxyl group of UA-1 was protected with acetate. The protected 3-O-acetylursolic acid was obtained by reacting UA-1 with acetic anhydride (2 equiv.) in the presence of dry pyridine. To prepare the acid chloride, the 3-O-acetylursolic acid was reacted with oxalyl chloride (1–2 equiv.) in dry dichloromethane (DCM) under an N2 atmosphere. After 3 h of stirring, the respective dry alcohols (1.5 equiv.) for esters or dry amines (1.5 equiv.) for amides were added under a nitrogen atmosphere. The resulting airtight reaction mixture was refluxed for 3-4 h, which resulted in the formation of the desired ester and amide analogs. The products were further purified by column chromatography, which afforded the desired analogs in yields of 65–80%. All the analogs were characterized on the basis of their 1H and 13C NMR spectroscopic data.

Fig. 4 Semisynthesis of ester and amide derivatives of ursolic acid

Cytotoxicity assay

The human lung (A-549) and CNS (SF-295) cancer cell lines were procured from the National Cancer Institute (Frederick, MD, USA). Cells were grown in tissue culture flasks in complete growth medium (RPMI-1640 medium with 2 mM glutamine, pH 7.4, supplemented with 10% fetal calf serum, 100 μg/mL streptomycin, and 100 IU/mL penicillin) in a carbon dioxide incubator (37 °C, 5% CO2, 90% RH). The cells at the subconfluent stage were harvested from the flask by treatment with trypsin [0.05% in PBS (pH 7.4) containing 0.02% EDTA]. Cells with a viability of more than 98%, as determined by trypan blue exclusion, were used to determine cytotoxicity. A cell suspension of 1 × 10⁵ cells/mL was prepared in complete growth medium. Stock solutions (2 × 10⁻² M) were prepared in 20% pyridine + 80% DMSO. A suitable control with appropriate concentrations of pyridine and DMSO was used for comparison. The stock solutions were serially diluted with complete growth medium containing 50 μg/mL of gentamycin to obtain a working test solution of 1 × 10⁻⁴ M.

The in vitro cytotoxicities of UA-1 and its analogs UA-2 to UA-9 against the cancer cell lines were determined using 96-well tissue culture plates [42]. One hundred microliters of cell suspension was added to each well of the 96-well tissue culture plates. The cells were allowed to grow in a CO2 incubator (37 °C, 5% CO2, 90% RH) for 24 h. Test materials in complete growth medium (100 μL) were added after 24 h incubation to the wells containing cell suspension. The plates were further incubated for 48 h (37 °C, 5% CO2 and 90% RH) in a carbon dioxide incubator. Cell growth was stopped by gently layering trichloroacetic acid (50% TCA, 50 μL) on top of the medium in all of the wells. The plates were incubated at 4 °C for 1 h to fix the cells attached to the bottom of the wells. The liquid from all of the wells was gently pipetted out and discarded. The plates were washed five times with distilled water to remove TCA, growth medium, low molecular weight metabolites, and serum proteins, and then air dried. Cell growth was measured by staining with sulforhodamine B dye (0.4% w/v in 1% acetic acid) [43]. The adsorbed dye was dissolved in Tris-HCl buffer (100 μL, 0.01 M, pH 10.4) and the plates were gently stirred for 10 min on a mechanical stirrer. The optical density (OD) was recorded on an ELISA reader at 540 nm. Anticancer activity results of UA-1 and its analogs (UA-2 to -9) are presented in Table 3 after deducting the cytotoxic effect of the vehicle (20% pyridine + 80% DMSO) at equivalent concentration.
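The quantities given in this assay description imply the following simple arithmetic. The last line is only a sketch under two assumptions that the text does not state: that the 100 μL of test material added to each well is the working test solution, and that the volumes are additive.

$$
\begin{aligned}
\text{cells per well} &= \left( 1 \times 10^{5}\ \text{cells/mL} \right) \times \left( 0.1\ \text{mL} \right) = 1 \times 10^{4}\ \text{cells} \\
\text{dilution factor (stock to working solution)} &= \frac{2 \times 10^{-2}\ \text{M}}{1 \times 10^{-4}\ \text{M}} = 200 \\
\text{concentration in the well} &= \left( 1 \times 10^{-4}\ \text{M} \right) \times \frac{100\ \mu\text{L}}{100\ \mu\text{L} + 100\ \mu\text{L}} = 5 \times 10^{-5}\ \text{M}
\end{aligned}
$$

The 200-fold dilution also reduces the pyridine and DMSO content of the vehicle by the same factor, which is why a vehicle control at the equivalent concentration is needed for comparison.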

Table 3 Predicted anticancer activities (IC50 in μM) of UA-1 and its virtual analogs (UA-2 to -14) against the lung cancer cell line A-549

Results and discussion

In the present work, we first calculated most of the physicochemical properties (descriptors) of compounds/drugs that have been experimentally shown to possess anticancer activity against human lung (A-549) and CNS (SF-295) cancer cell lines for the training set. Further, we carried out forward stepwise multiple linear regression analysis and identified the highly correlated properties responsible for the anticancer activity against the above lung and CNS cancer cell lines. To validate the derived QSAR models, we used a leave-one-out (LOO) approach and evaluated the models through test data sets (see ESM files 3 and 4), which indicated that the models have significant accuracy. These data were also supported by HOMO–LUMO energy calculations on the optimized geometries. Further, experiments were carried out to isolate ursolic acid from the leaves of E. hybrid. Later on, the isolated ursolic acid was used for the semisynthesis of the predicted virtual analogs of UA-1. All of the semisynthetic analogs of ursolic acid were characterized on the basis of their 1H and 13C NMR spectroscopic data. Finally, the semisynthetic analogs of UA-1 were evaluated in vitro for their anticancer activities against the human lung (A-549) and CNS (SF-295) cancer cell lines in order to validate their predicted activities.

Virtual screening of ursolic acid analogs for cytotoxic activity against the lung cancer cell line A-549

After developing a validated QSAR model for activity against the lung cancer cell line, we screened ursolic acid (UA-1) and 13 of its virtual analogs (UA-2 to -14) (Fig. 5), and the results are presented in Table 3. They show that all of the analogs are active against the human lung cancer cell line (A-549), but among the 13 analogs, eight (UA-2 to -9) were more active. Further, careful analysis of the most active analogs showed that the 4-bromoanilamideursolic acid analog UA-9 was the most active of all, and possesses higher cytotoxic activity than the control drug adriamycin.

Fig. 5 Structures of the predicted ursolic acid derivatives

Virtual screening of ursolic acid analogs for cytotoxic activity against the CNS cell line SF-295

After developing a validated QSAR model for activity against the CNS cancer cell line, we screened ursolic acid (UA-1) and 26 of its virtual analogs (UA-15 to -32) (Fig. 5), and the results are presented in Table 4. They showed that all of the analogs are active against the human CNS cancer cell line (SF-295), but among the 26 analogs, eight (UA-2 to -9) were more active. Further, careful analysis of the most active analogs showed that the methyl and ethyl ester analogs of ursolic acid (UA-2 and -3) were the most active of all, and possess higher cytotoxic activity than the control drug cisplatin.

Table 4 Predicted anticancer activities (IC50 in μM) of UA-1 and its virtual analogs (UA-15 to -32) against the CNS cancer cell line SF-295

HOMO–LUMO energy calculations for the virtually active analogs of ursolic acid

HOMO–LUMO energy calculations for all of the virtually active analogs of ursolic acid were performed via geometry optimization calculations in MO-G using PM3 parameters (Table 5). The results showed that two analogs of ursolic acid, UA-2 and UA-9, exhibited strong biological activities and higher orbital energies than the other analogs, but large differences in the locations and the relative energies of the HOMO and LUMO (E HOMO and E LUMO) were observed.

Table 5 Energies of the frontier molecular orbitals of ursolic acid derivatives with biological activities in μM against CNS and lung cancer cell lines

For UA-9, the HOMO was mainly located on the double bond of ring C and partially on ring D, while the LUMO was mainly located on the double bond of ring E and partially on the side chain of ring D. UA-9 possessed a higher E HOMO than UA-2 and UA-8 (Table 5). This compound, which has potent anticancer activity, has a high E HOMO, which accounts for its electron-donating ability. A graphical representation of the HOMO and LUMO of UA-9 is given in Fig. 6. These results suggest that converting ursolic acid into derivatives bearing electron-donating groups, as in UA-9, will have a strong impact on the energies and locations of the HOMO and LUMO; hence, the compound with the highest E HOMO would be expected to have the most potent activity.

Fig. 6 Comparison of the frontier molecular orbitals (HOMO and LUMO) of compound UA-9

From the above, we can conclude that the energy of the HOMO and the energy difference between the HOMO and LUMO are important and related to the anticancer activity of the analog. Further, the energy of the highest occupied molecular orbital (E HOMO) has a significant effect on the activity, as the energy of the HOMO is directly related to the ionization potential of the analog, and characterizes the susceptibility of the molecule to electrophilic attack. The above ursolic acid analogs tend to lose a pair of electrons to an electrophile, and are thus soft nucleophiles. It can also be concluded that the HOMO–LUMO energy gap plays a significant role in antitumor activity. The HOMO–LUMO energy gap is an important stability index. As the above ursolic acid analogs have large HOMO–LUMO energy gaps, these compounds are very reactive in interactions and have high excitation energy. Further studies may clarify the relationship between the electronic structure and activity, thus providing better guidance when synthesizing a more potent analog.
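The frontier-orbital quantities discussed above can be derived from E HOMO and E LUMO as in the following minimal sketch. It relies on Koopmans' approximation (ionization potential ≈ −E HOMO, electron affinity ≈ −E LUMO) and on the common definition of chemical hardness as half the gap; both are assumptions introduced here for illustration and are not necessarily how the descriptors in this study were computed. The function name and the orbital energies in the example are hypothetical and are not taken from Table 5.

```python
def frontier_orbital_indices(e_homo_ev, e_lumo_ev):
    """Simple reactivity indices from frontier orbital energies (in eV)."""
    gap = e_lumo_ev - e_homo_ev
    return {
        "gap_ev": gap,                          # HOMO-LUMO energy gap
        "ionization_potential_ev": -e_homo_ev,  # Koopmans' approximation
        "electron_affinity_ev": -e_lumo_ev,     # Koopmans' approximation
        "hardness_ev": gap / 2.0,               # (IP - EA) / 2
    }


# Hypothetical orbital energies, for illustration only.
indices = frontier_orbital_indices(e_homo_ev=-9.5, e_lumo_ev=0.5)
```

Within this approximation, raising E HOMO lowers the ionization potential, which is the sense in which a high E HOMO is associated with electron-donating ability.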

Pharmacokinetic studies of bioavailability

During the 1990s, the pharmaceutical industry noticed that too many compounds were being terminated during clinical development due to unsatisfactory pharmacokinetics (PK). Thus, it is essential to consider PK parameters during lead optimization. PK properties such as absorption, distribution, metabolism, excretion, and toxicity (ADMET) are important influences on the success of the compound for human therapeutic use. Therefore, we considered several physicochemical properties related to the PK while screening the active, druglike compounds. Lipophilicity (the ratio of the solubility of the analog in octanol compared to its solubility in water), as measured through logP, was found to be quite high for all of the designed compounds. LogP has been implicated in blood–brain barrier penetration, as well as permeability; the excretion process that eliminates the compound from the human body also depends on logP as well as the molecular weight. Except for ursolic acid, all of the analogs have high molecular weights, so they are likely to have low solubilities and to pass through cell membranes with difficulty. Ursolic acid, which has an intermediate value for the lipophilicity, has a better chance of arriving at the receptor site. The analogs have limited polarity to aid with permeation and absorption, as revealed by their H-bond donors and H-bond acceptors.

All of the studied analogs have low oral bioavailabilities because they violate Lipinski’s rule of five on two parameters: they have high logP values and high molecular weights (Table 6). Moreover, when we calculated the topological polar surface area (TPSA) as a chemical descriptor for passive molecular transport through membranes, the results showed that their TPSA values are <140 Å². Generally, passively absorbed molecules with TPSA values of >140 Å² have low oral bioavailabilities. Calculations related to aqueous solubility, serum protein binding, the blood–brain barrier (log BB and apparent MDCK cell permeability), the gut–blood barrier (Caco-2 cell permeability), predicted central nervous system activity, number of likely metabolic reactions, log IC50 for hERG K⁺ channel blockage, transdermal transport rate (J_m), skin permeability (K_p), and human oral absorption in the gastrointestinal tract showed that the active ursolic acid derivatives had values for these parameters that were within the standard ranges for drugs (Table 7). Based on bioavailability and in silico ADME screening (Table 6), we concluded that ursolic acid and its derivative UA-9 have marked cytotoxic activities.
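The druglikeness screen described above can be sketched as a simple filter. The following Python example assumes the commonly cited rule-of-five thresholds (molecular weight ≤ 500, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors) and the 140 Å² TPSA cutoff mentioned in the text. The class name, function names, and descriptor values are hypothetical and are not the values reported in Table 6.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (all values supplied by the user)."""
    name: str
    mol_weight: float   # g/mol
    log_p: float        # octanol-water partition coefficient (log10)
    h_donors: int       # number of H-bond donors
    h_acceptors: int    # number of H-bond acceptors
    tpsa: float         # topological polar surface area, in square angstroms


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the rule-of-five criteria that the compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("MW > 500")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def tpsa_within_limit(d: Descriptors, limit: float = 140.0) -> bool:
    """True if the TPSA does not exceed the passive-absorption cutoff."""
    return d.tpsa <= limit


# Hypothetical analog, for illustration only (not a compound from Table 6).
analog = Descriptors("hypothetical analog", mol_weight=610.0, log_p=7.2,
                     h_donors=2, h_acceptors=4, tpsa=66.0)
print(lipinski_violations(analog))
print(tpsa_within_limit(analog))
```

A compound like this hypothetical one reproduces the pattern described in the text: it fails the rule of five on molecular weight and logP while still satisfying the TPSA criterion.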

Table 6 Compliance of the ursolic acid derivatives with the recommended ranges of computed bioavailability parameters and druglikeness properties
Table 7 Compliance of the ursolic acid derivatives with the recommended ranges of computed pharmacokinetic parameters (ADME)

Chemistry

Chemical structure–activity relationship

A total of 31 virtual analogs of ursolic acid (UA-2 to UA-32) were evaluated for their anticancer activities using QSAR models of activity against human lung (A-549) and CNS (SF-295) cancer cell lines, followed by HOMO–LUMO energy minimization. From the results shown in Tables 3 and 4, it is evident that virtual analogs UA-2 to UA-9 are more active against both the lung (A-549) and CNS (SF-295) cancer cell lines. We therefore carried out the semisynthesis of these ursolic acid analogs (UA-2 to -9) in the wet lab. The pentacyclic base moiety of ursolic acid was used as a pharmacophore, and its 3-hydroxy and 28-oic acid groups were used to add flexibility to the molecule. The detailed method used for the semisynthesis of the ursolic acid analogs (UA-2 to -9) is described in the “Materials and methods” section. The 13C NMR chemical shift assignments for the derivatives are shown in Table 8, while the 1H NMR and MS data for the derivatives are shown in the ESM (file 5).

Table 8 13C NMR chemical shift assignments for UA-1 and its derivatives UA-1b and UA-2 to UA-9

The cytotoxic activities of ursolic acid (UA-1) and its semisynthetic ester (UA-2 to -7) and amide (UA-8 to -9) derivatives were tested against the cancer cell lines, and the results are presented in Table 9; the corresponding values for the standard anticancer drugs paclitaxel, adriamycin, and mitomycin are included in the table for comparison. All of the compounds showed cytotoxicity against the two cancer cell lines. Ursolic acid (UA-1) itself showed significant activity against both the human lung (A-549) and CNS (SF-295) cancer cell lines. From Table 9, it is evident that UA-9 is 5–7 times more active than the starting material UA-1 against the human lung (A-549) and CNS (SF-295) cancer cell lines, as calculated by the QSAR model.

Table 9 Cytotoxic activities of UA-1 and its ester and amide analogs (UA-2 to -9) against human lung (A-549) and CNS (SF-295) cancer cell lines

Further, it is worth mentioning that UA-9 is 1.7 times more active than the anticancer drug mitomycin against the human lung cancer cell line A-549, while it has almost the same level of activity against the human lung (A-549) and CNS (SF-295) cancer cell lines as the anticancer drug paclitaxel (Fig. 7).

Fig. 7 Cytotoxic activities of ursolic acid (UA-1) and its ester and amide derivatives (UA-2 to -9) against lung (A-549) and CNS (SF-295) human cancer cell lines
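Statements such as "5–7 times more active" or "1.7 times more active" are relative potencies. The sketch below shows one way such a ratio can be computed, assuming that cytotoxic activity is expressed as an IC50-type concentration for which a lower value means higher potency. The function name and the numbers are hypothetical and are not taken from Table 9.

```python
def fold_activity(ic50_reference: float, ic50_compound: float) -> float:
    """Return how many times more potent the compound is than the reference.

    Assumption: both inputs are IC50-type concentrations in the same unit,
    so a lower value means a more potent compound.
    """
    if ic50_reference <= 0 or ic50_compound <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_reference / ic50_compound


# Hypothetical concentrations, for illustration only (not data from Table 9).
print(fold_activity(ic50_reference=10.0, ic50_compound=2.0))
```

A result greater than 1 means the compound is more potent than the reference, and a result close to 1 indicates comparable activity.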

We can therefore conclude that UA-9 possesses potential activity against human lung (A-549) and CNS (SF-295) cancer cell lines. Its activity should help us to identify and prepare new active derivatives economically.

Conclusions

Molecular modeling calculations were used to predict the potential cytotoxic activities of ursolic acid analogs. Screening of the ursolic acid analogs using the derived QSAR models showed that some of the UA analogs possess significant anticancer activity, but these analogs violate Lipinski’s rule, indicating low oral bioavailability. Moreover, when we calculated the TPSA as a chemical descriptor for passive molecular transport through membranes, the results showed that the analogs complied with the standard range of TPSA < 140 Å². Based on bioavailability and in silico ADME screening, we concluded that ursolic acid (UA-1) and its 4-bromoanalamideursolic acid analog (UA-9) have marked cytotoxic activities. UA-9 was also prepared experimentally from UA-1 via semisynthesis and then evaluated for its anticancer potential in vitro; it demonstrated promising activity against the human lung (A-549) and CNS (SF-295) cancer cell lines. These results may be of great help in the development of anticancer drugs from a very common, inexpensive, and nontoxic natural product.