Introduction

General

Among the innumerable viruses that cause infection and disease in humans and animals is a large family called retroviruses, some of which cause cancers and leukemias. Some retroviruses, called immunodeficiency viruses, cause immune deficiency in cattle, monkeys, and humans. Retroviruses have several distinctive properties: once a person is infected, the infection is lifelong; in most people it causes no disease for a long period; yet the infected person can transmit the infection to others.

Human immunodeficiency virus type 1 (HIV-1) integrase is an enzyme required for viral replication (De Clercq, 1995). HIV integrase catalyzes the integration of viral DNA into the host genome in two separate but chemically similar reactions known as 3′ processing and DNA strand transfer (Craigie et al., 1990; Katz et al., 1990). In 3′ processing, integrase (IN) removes the dinucleotide lying 3′ of a conserved cytosine-adenine (CA) sequence from each 3′ end of the viral DNA. In the strand transfer reaction, IN then joins the processed 3′ ends of the viral DNA to the host cell DNA. Because there is no known human counterpart of HIV IN, it is an attractive target for antiretroviral drug design (Chen et al., 2000). A large number of HIV IN inhibitors have been discovered (Neamati, 2002); however, their mechanisms of action remain incompletely understood (Parril, 2003).
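The two reactions can be pictured as operations on a DNA strand written 5′→3′. The sketch below is a deliberately simplified cartoon, not a model of the enzyme: the function names and example sequences are hypothetical, the removed dinucleotide is assumed to sit immediately 3′ of the conserved CA, and strand transfer is reduced to joining the processed viral 3′ end to the downstream side of a single cut in one host strand (the staggered cut on the second host strand and the later gap repair are ignored).

```python
# Cartoon of the two integrase reactions as string operations.
# Assumptions (hypothetical, illustration only): strands are written 5'->3';
# the viral strand ends with the conserved "CA" plus one terminal dinucleotide;
# only a single host strand is modelled.

CONSERVED = "CA"


def three_prime_processing(viral_strand: str) -> str:
    """Remove the dinucleotide lying 3' of the conserved CA."""
    if len(viral_strand) < 4 or viral_strand[-4:-2] != CONSERVED:
        raise ValueError("expected ...CA followed by a terminal dinucleotide")
    return viral_strand[:-2]


def strand_transfer(processed_viral: str, host_strand: str, site: int) -> tuple[str, str]:
    """Join the processed viral 3' end to the host strand cut at `site`.

    Returns the upstream host fragment (left with a free 3' end) and the
    joined strand in which the viral CA end is linked to the downstream host DNA.
    """
    if not processed_viral.endswith(CONSERVED):
        raise ValueError("viral end must be 3'-processed first")
    if not 0 < site < len(host_strand):
        raise ValueError("cut site must lie inside the host strand")
    upstream = host_strand[:site]
    joined = processed_viral + host_strand[site:]
    return upstream, joined


if __name__ == "__main__":
    processed = three_prime_processing("ACTGCAGT")      # hypothetical viral end
    print(processed)                                    # ACTGCA
    print(strand_transfer(processed, "TTTTGGGG", 4))    # ('TTTT', 'ACTGCAGGGG')
```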

Several families of IN inhibitors have been identified. Most can be classified into three groups: DNA ligands, C-terminal domain ligands, and compounds that interfere with the catalytic domain of the protein. The first group contains nonspecific intercalating agents (Carteau et al., 1993; Fesen et al., 1993) as well as more specific oligonucleotides targeting the IN binding sites on both long terminal repeats (LTRs; Mouscadet et al., 1994).

While many IN inhibitors have now been developed, only a handful have displayed antiviral activity in cell culture. This group comprises lignanolides (Eich et al., 1996), curcumin (Majumder et al., 1995), aurintricarboxylic acids (Cushman and Sherman, 1992), dicaffeoyl quinic acids and analogues (Robinson et al., 1996a,b), diarylsulfones (Mazumder et al., 1996), and G-rich oligonucleotides (Hansch, 1969).

Computational chemistry has become an important contributor to rational drug design. Quantitative structure–activity relationship (QSAR) modeling establishes a quantitative correlation between chemical structure and biological activity (Parril, 2003). In this study, we performed a QSAR analysis of caffeoyl naphthalene sulfonamide derivatives using WIN CAChe 6.1 and STATISTICA.
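Such a relationship is most often written as a model that is linear in the descriptors, which is the form taken by Eqs. (1) and (2) below. The symbols here are generic placeholders, not values from this study:

$$ \log (1/{\rm IC}_{50}) = c_0 + \sum_{j=1}^{k} c_j\, x_j $$

where x_1 to x_k are the k molecular descriptors of a compound, and the intercept c_0 and coefficients c_j are estimated by least-squares multiple linear regression over the compounds in the data set.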

The software

WIN CAChe 6.1 was used to draw the molecules, minimize their energies, and calculate the physicochemical properties (Table 1) using the Project Leader module provided with the software. Regression analysis was then performed with STATISTICA (StatSoft). WIN CAChe 6.1 is a product of Fujitsu Private Limited (http://www.cachesoftware.com/contacts/japan.shtml and http://www.cachesoftware.com/techsupport/download.shtml).

Table 1 Physicochemical parameters and antiviral activity data used to derive the QSAR

Objective of the study

Features of anti-HIV agents include polyaromatic rings separated by a central linker and the presence of catechol moieties. It has been reported that the majority of natural products endowed with anti-IN activity are characterized by one or two 3,4-dihydroxycinnamoyl (caffeoyl) moieties (Santo et al., 2003). Despite these features, caffeoyl naphthalene sulfonamides do not prevent HIV replication at nontoxic concentrations. The present study was therefore conducted to design molecules with improved potency.

Molecular descriptors

A descriptor is a property calculated from a compound's molecular structure by mathematical and physical methods. Descriptor types include topological, thermodynamic, spatial, and electronic descriptors.

Spatial descriptors describe the molecule's solvent-accessible surface areas and their charges. Electronic descriptors describe the distribution of electrons and charge in the molecule.

Topological descriptors are based on graph-theoretical representations of molecular structure and encode features such as shape, size, and branching. Thermodynamic descriptors describe the energies of molecules and their transformations. Quantum mechanical descriptors are calculated using semiempirical methods, which are likely to be more accurate than purely empirical estimates.
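As an illustration of topological descriptors, the sketch below computes the simple (non-valence) zero- and first-order connectivity indices and Kier's first-order kappa shape index from a hydrogen-suppressed molecular graph given as a bond list. The formulas are the standard Randić/Kier definitions (χ0 = Σ δ_i^(−1/2), χ1 = Σ over bonds (δ_i δ_j)^(−1/2), ¹κ = A(A − 1)²/P1² for A heavy atoms and P1 bonds). Whether CAChe 6.1 uses exactly these conventions, for example rather than valence-corrected degrees, is an assumption, and the function names are hypothetical. The graph must contain at least one bond.

```python
import math
from collections import Counter

# Hydrogen-suppressed molecular graph: atoms are integers, bonds are pairs.
# Simple (non-valence) delta = number of heavy-atom neighbours of an atom.


def degrees(bonds):
    """Vertex degree (delta) of every atom that appears in the bond list."""
    deg = Counter()
    for i, j in bonds:
        deg[i] += 1
        deg[j] += 1
    return deg


def chi0(bonds):
    """Zero-order connectivity index: sum over atoms of delta**-0.5."""
    return sum(d ** -0.5 for d in degrees(bonds).values())


def chi1(bonds):
    """First-order (Randic) index: sum over bonds of (delta_i * delta_j)**-0.5."""
    deg = degrees(bonds)
    return sum((deg[i] * deg[j]) ** -0.5 for i, j in bonds)


def kappa1(bonds):
    """Kier first-order shape index: A*(A-1)**2 / P1**2 (A atoms, P1 bonds)."""
    n_atoms = len(degrees(bonds))
    n_paths = len(bonds)          # one-bond paths are simply the bonds
    return n_atoms * (n_atoms - 1) ** 2 / n_paths ** 2


if __name__ == "__main__":
    butane = [(0, 1), (1, 2), (2, 3)]
    cyclohexane = [(i, (i + 1) % 6) for i in range(6)]
    for name, graph in [("n-butane", butane), ("cyclohexane", cyclohexane)]:
        print(name, round(chi0(graph), 4), round(chi1(graph), 4), round(kappa1(graph), 4))
```

For n-butane this gives χ0 ≈ 3.4142, χ1 ≈ 1.9142 and ¹κ = 4.0; closing the chain into cyclohexane lowers ¹κ to about 4.1667 relative to a six-atom chain (6.0), which is how the kappa index reflects shape.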

The following list includes some of the experiments available in the Project Leader module of CAChe 6.1. These experiments can be performed on one or more chemical samples simultaneously to calculate the properties:

  • Optimization to find a low-energy structure by minimizing steric energy, heat of formation, or total energy

  • Net positive and negative charge for a molecule

  • Molecular formula, weight, and refractivity

  • Ring count and size

  • Investigation of molecular orbital energies such as HOMO and LUMO energies

  • Calculation of the dielectric, steric, total energy, and heat of formation of a structure at its current geometry

  • Investigation of UV-visible spectral data

  • Zero-order, first-order, and second-order molecular or valence connectivity indices

  • Dipole moment and dipole vector components x, y, and z

  • Electron affinity

  • Shape index order 1, 2, and 3

  • Octanol–water partition coefficient

In the present study, the descriptors calculated were the zero-order molecular or valence connectivity index (CI0), first-order molecular or valence connectivity index (CI1), dipole moment (DM), electron affinity (EA), total energy at the optimized geometry (TE), heat of formation at the optimized geometry (HF), highest occupied molecular orbital energy (HOMO), lowest unoccupied molecular orbital energy (LUMO), UV-visible spectral data (LMAX), octanol–water partition coefficient (LOGP), conformational minimum energy (ME), molar refractivity (MR), ionization potential (IP), first-order shape index or basic kappa order 1 (BKO1), and solvent-accessible surface area (SAS).

Results and Discussion

We searched for molecules with the same nucleus as the existing caffeoyl naphthalene sulfonamide derivatives but with better biological activity. Regression analysis in STATISTICA gave the following best equation (Eq. 1):

$$ \eqalign{ \log (1/{\rm IC}_{50}) =&\ (-3.3466 \pm 0.976)\,{\rm CI0} + (0.00131 \pm 0.00054)\,{\rm TE} \cr & + (0.5726 \pm 0.295)\,{\rm LOGP} + (2.5455 \pm 0.775)\,{\rm BKO1} \cr & + (8.8528 \pm 2.360)} $$
(1)

n = 20, r = 0.841, s = 0.2868, calculated F-ratio = 9.03, t-test value = 2.131 (95%), r² = 0.7073, Y-variance = 0.470, Y-mean = −1.526, variance in Y explained by the regression = 70.7%. IC50 is the molar concentration of the drug producing 50% inhibition of integrase; CI0 = zero-order connectivity index; TE = total energy; LOGP = octanol–water partition coefficient; BKO1 = basic kappa order 1 (shape index); n = number of data points; r = correlation coefficient; s = standard error of the regression; F-ratio = ratio of the variance explained by the regression to the residual variance; t-test = Student's t-test value for statistical significance.
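A minimal sketch of applying Eq. (1) to a new compound: the central coefficient values are copied from Eq. (1) (the ± standard errors are ignored), and the predicted log(1/IC50), assumed to be a base-10 logarithm, is converted back to IC50 in the units of the training data. The constant and function names are hypothetical; descriptor values must be computed in the same way, and in the same units, as for the 20 training compounds.

```python
# Central coefficient values of Eq. (1); the +/- standard errors are not used.
EQ1_COEFFS = {"CI0": -3.3466, "TE": 0.00131, "LOGP": 0.5726, "BKO1": 2.5455}
EQ1_INTERCEPT = 8.8528


def predict_log_inv_ic50(descriptors: dict[str, float]) -> float:
    """Predicted log10(1/IC50) from Eq. (1) for one compound."""
    missing = EQ1_COEFFS.keys() - descriptors.keys()
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return EQ1_INTERCEPT + sum(c * descriptors[name] for name, c in EQ1_COEFFS.items())


def ic50_from_log_inv(log_inv_ic50: float) -> float:
    """Invert log10(1/IC50) to IC50, in the units used for the training data."""
    return 10 ** (-log_inv_ic50)
```

Because each coefficient is a slope on the log scale, raising LOGP by one unit with the other descriptors held fixed raises the predicted log(1/IC50) by 0.5726, that is, lowers the predicted IC50 by a factor of about 10^0.5726 ≈ 3.7.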

The regression analysis of caffeoyl naphthalene sulfonamides, with calculated coefficients and estimates of error, is given in Table 2. Equation (1) indicates that biological activity can be increased if (1) the partition coefficient (LOGP) of the molecule is increased by attaching groups that raise it (alkyl groups, aromatic rings, trifluoromethyl (CF3), etc.); (2) the connectivity index is decreased; and (3) the shape index is increased.

Table 2 Regression analysis of caffeoyl naphthalene sulfonamides: calculated coefficients and estimates of error

Equation (2) was obtained from the QSAR of the training set of compounds:

$$ \eqalign{ \log (1/{\rm IC}_{50}) =&\ (-3.7721 \pm 1.161)\,{\rm CI0} + (0.00136 \pm 0.00061)\,{\rm TE} \cr & + (0.4101 \pm 0.355)\,{\rm LOGP} + (2.8629 \pm 0.915)\,{\rm BKO1} \cr & + (11.2711 \pm 3.387)} $$
(2)

n = 16, r = 0.841, s = 0.3162, calculated F-ratio = 6.35, t-test value = 2.201 (95%), r² = 0.698, Y-variance = 0.493, Y-mean = −1.495, variance in Y explained by the regression = 69.8%.
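The summary statistics quoted for Eqs. (1) and (2) follow standard multiple-regression definitions. The sketch below computes them from observed and fitted activities for a model with k descriptors, assuming the textbook forms r² = 1 − SSE/SST, s = √(SSE/(n − k − 1)) and F = (r²/k)/((1 − r²)/(n − k − 1)); STATISTICA's exact conventions (for example for the reported Y-variance) may differ. The function names are hypothetical, and the fitted values are assumed to come from a least-squares fit that includes an intercept, so that r² lies between 0 and 1.

```python
import math


def regression_statistics(observed, fitted, k):
    """r, r^2, standard error s and F-ratio for a linear model with k descriptors."""
    n = len(observed)
    if n != len(fitted) or n <= k + 1:
        raise ValueError("need matching lists with n > k + 1")
    mean_y = sum(observed) / n
    sst = sum((y - mean_y) ** 2 for y in observed)             # total sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual sum of squares
    if sst == 0 or sse == 0:
        raise ValueError("constant data or perfect fit: statistics undefined")
    r2 = 1.0 - sse / sst
    dof = n - k - 1                                            # residual degrees of freedom
    return {
        "r": math.sqrt(r2),
        "r2": r2,
        "s": math.sqrt(sse / dof),
        "F": (r2 / k) / ((1.0 - r2) / dof),
    }


def f_from_r2(r2, n, k):
    """F-ratio implied by r^2 for n data points and k descriptors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))
```

With the reported r² = 0.698, n = 16 and k = 4, `f_from_r2` gives F ≈ 6.36, close to the 6.35 reported for Eq. (2) (for Eq. (1) it gives about 9.06 against the reported 9.03). The residual degrees of freedom n − k − 1 are 15 and 11 for the two equations, and 2.131 and 2.201 are the tabulated two-sided 95% Student's t-values for exactly those degrees of freedom.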

The values of the variables in Eq. (2), together with the observed and predicted activities of the test set of compounds (Table 3), show that the activities predicted by the QSAR equation are close to the observed values.

Table 3 Physicochemical properties, observed and predicted activities of test set of compounds

All compounds of the newly designed series (Table 4) have predicted IC50 values between 1.0 and 1.11 μg/ml, whereas in the reported series of caffeoyl naphthalene sulfonamide derivatives (Xu et al., 2003) the most potent compound has an IC50 of 4.5 μg/ml.
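Expressed as a ratio of IC50 values, and as a difference on the base-10 log(1/IC50) scale used in Eqs. (1) and (2), the predicted gain of the designed series over the most potent reported compound works out as:

$$ \frac{{\rm IC}_{50}^{\rm reported}}{{\rm IC}_{50}^{\rm designed}} = \frac{4.5}{1.11} \approx 4.1 \ \ {\rm to}\ \ \frac{4.5}{1.0} = 4.5, \qquad \Delta \log (1/{\rm IC}_{50}) = \log 4.5 - \log 1.11 \approx 0.61 \ \ {\rm to}\ \ \log 4.5 \approx 0.65 $$

that is, roughly 0.6 log units, or about a four-fold lower predicted IC50.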

Table 4 Physicochemical properties and predicted activities of a designed series of compounds

Conclusion

Equation (1) predicts that increasing the partition coefficient, total energy, and shape index would increase the biological activity of the compounds. The effect of total energy on biological activity is smaller than that of the other three variables. Biological activity increases when the connectivity index decreases. We therefore conclude that attaching groups that bring about these changes should increase the biological activity of the molecule.

Experimental

Method

The biological activities of all 20 compounds were taken from the literature (Xu et al., 2003). All 20 compounds were built in the WIN CAChe 6.1 workspace, and their energies were minimized by geometry optimization using MM3 (Molecular Mechanics version 3). The physicochemical properties were calculated with the Project Leader module of the software (Table 1). Regression analysis was then performed in STATISTICA; it included the correlation matrix (Table 5), observed and estimated values with residuals (Table 6), and calculated coefficients and estimates of error (Table 2). Observed and estimated values are plotted in Fig. 1.

Table 5 Correlation matrix
Table 6 Observed and estimated values with residuals for a series of caffeoyl naphthalene sulfonamide derivatives
Fig. 1

Plot of observed versus estimated anti-HIV activity values for a series of caffeoyl naphthalene sulfonamide derivatives. The symbol ▲ indicates outliers.
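A correlation matrix such as Table 5 is typically used to check that the descriptors entering one equation are not strongly intercorrelated, and a residual table such as Table 6 lists observed minus estimated activity for each compound. The plain-Python sketch below performs both calculations. It assumes Pearson correlation coefficients; the 0.8 cut-off for flagging collinear descriptor pairs is a common rule of thumb chosen for illustration, not a value used in this study, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations between named columns (dict name -> values)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


def collinear_pairs(columns, threshold=0.8):
    """Distinct descriptor pairs whose |r| exceeds `threshold`."""
    names = list(columns)
    matrix = correlation_matrix(columns)
    return [
        (a, b, round(matrix[(a, b)], 3))
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[(a, b)]) > threshold
    ]


def residuals(observed, estimated):
    """Observed minus estimated activity for each compound."""
    return [o - e for o, e in zip(observed, estimated)]
```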

Model validation

The model was validated by using the first 16 compounds of the series as the training set and the last four as the test set. A QSAR was derived for the training set, and the resulting equation (Eq. 2) was used to predict the biological activities of the remaining four compounds. The observed and estimated values with residuals, and the calculated coefficients and estimates of error, are given in Tables 7 and 8, respectively.

Table 7 Observed and estimated values with residuals for the first 16 caffeoyl naphthalene sulfonamide derivatives
Table 8 Regression analysis: calculated coefficients and estimates of error for the data of the first 16 caffeoyl naphthalene sulfonamide derivatives
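The validation scheme (fit on compounds 1 to 16, then predict compounds 17 to 20) can be sketched as an ordinary least-squares fit solved through the normal equations, followed by prediction of the held-out rows. This plain-Python illustration assumes that STATISTICA's multiple regression is standard least squares with an intercept; the function names are hypothetical, and the self-check uses synthetic, noise-free data rather than the study's descriptors.

```python
def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]          # prepend the intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear descriptors or too few rows)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coeffs, X):
    """Apply fitted coefficients to descriptor rows."""
    return [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x)) for x in X]


def train_test_validate(X, y, n_train):
    """Fit on the first n_train compounds, predict the rest, return residuals."""
    coeffs = ols_fit(X[:n_train], y[:n_train])
    predicted = predict(coeffs, X[n_train:])
    residuals = [obs - pred for obs, pred in zip(y[n_train:], predicted)]
    return coeffs, predicted, residuals
```

For this study the call would be `train_test_validate(X, y, 16)` on the 20 descriptor rows. As in the study, the split follows the order of the series; a randomized split or cross-validation is a common alternative.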

Designed molecules

On the basis of Eq. (1), a series of 30 compounds (Table 9) was designed to find molecules more potent than the existing caffeoyl naphthalene sulfonamide derivatives. The independent variables were calculated for each compound and substituted into Eq. (1) to obtain the predicted biological activities of all 30 designed compounds. The structures of these 30 molecules show several relationships with the predicted activities. The general structure of the designed molecule is

Table 9 Series designed on the basis of Eq. (1)

The presence of aromatic groups at the R2 position increases the predicted activity of caffeoyl naphthalene sulfonamide derivatives. Alkyl groups such as ethyl, propyl, and isopropyl on this aromatic ring raise the partition coefficient of the molecule without decreasing its predicted activity. Similarly, groups such as CF3, SO2CF3, and SCF3 increase both the partition coefficient and the predicted activity of the compound. A caffeoyl group is essential for good predicted activity. Lower alkyl groups can be attached at positions R4 and R5 to raise the partition coefficient without any decrease in predicted activity.