Introduction

Quantitative structure-property and structure-activity relationships (QSPR/QSAR) remain the focus of many studies aimed at modeling and predicting the physicochemical and biological properties of molecules [1–5]. Such studies rest on a central paradigm of chemistry: the physicochemical and biological properties of organic compounds depend on their molecular structures.

The main contribution to the widespread use of QSPR/QSAR models comes from the development of novel structural descriptors and of statistical equations relating various physical, chemical, and biological properties to chemical structure. The success of the QSPR/QSAR approach can be explained by the insight it offers into the structural determination of chemical compounds, and by the possibility of estimating the properties of compounds without the need to synthesize and test them [6].

Several topological, geometric, electronic, and quantum chemical descriptors have been used in QSPR/QSAR research [7–9]. Topological descriptors have proved useful in predicting diverse physicochemical and biological properties of various types of compounds [10–23]. In general, these indices are numbers containing relevant information about the steric structure of molecules. Most of the measured physicochemical properties are steric properties, and consequently they may be reasonably well described by topological indices. However, in some cases, these indices also contain structural information related to the electronic and/or polar features of molecules [24, 25].

The molecular size, shape, polarity, and the ability of the molecule to participate in hydrogen bonding are among the different factors that can contribute to the physicochemical properties or biological activities of a molecule. It is well known that these factors are related to intermolecular interactions such as van der Waals forces and hydrogen-bonding interactions.

In recent years, our research group introduced a new topological descriptor, called the semi-empirical topological index (IET), to predict the chromatographic retention on low-polarity stationary phases for different classes of organic compounds [26–33]. This topological descriptor takes into account the contributions of each individual atom type or group to the property considered. The IET is able to encode information about structural features such as the presence and position of the heteroatom and the size and branching of the molecules.

In this context, alcohols, which show interesting biological properties, represent an attractive class of organic compounds for QSPR/QSAR studies considering the influence of the hydrogen-bonding interaction. In a recent study [31] the IET was extended to estimate the chromatographic retention of aliphatic alcohols, yielding high-quality structure-property relationships.

The aim of this study is to demonstrate the applicability of IET to predict physicochemical properties and especially biological activities that depend on the strength of intermolecular forces.

Methods

Calculation of the semi-empirical topological index

The development of IET for aliphatic alcohols has been described in a previous paper [31].

In the present approach, the molecules are represented by hydrogen-suppressed molecular graphs based on chemical graph theory, where the carbon atoms and the C–OH group are considered as vertices of the molecular graph. The contribution of each carbon atom and of the C–OH group to the property considered is represented by a single symbol, \(\text{C}_i\), as can be seen in Eq. 1. Thus, the IET is expressed as:

$$ I_{\text{ET}} = \sum\limits_i \left( \text{C}_i + \delta_i \right), \qquad (1) $$

where \(\text{C}_i\) is the value attributed to the C–OH fragment and/or to each carbon atom i in the molecule, and \(\delta_i\) is the sum of the logarithms of the values of each adjacent carbon atom (C1, C2, C3, and C4) and/or the logarithm of the value of the adjacent C–OH group [31].

The values attributed to the carbon atoms and to the functional groups (\(\text{C}_i\)) for alcohols were calculated by numerical approximation, based on the experimental data and supported by theoretical considerations. These \(\text{C}_i\) values are listed in Ref. [31].

The molecule 2-methyl-3-hexanol is taken as an example of the calculation of the semi-empirical topological index, using Eq. 1:

Structure 1. 2-Methyl-3-hexanol.

C1 + C3 (–CH3): 2(1.0 + log 0.75) = 1.7501

C2 (>CH–): 0.75 + log 1.0 + log 1.0 + log 1.78 = 1.0004

C4 (>CH–OH): 1.78 + log 0.75 + log 0.9 = 1.6093

C5 (–CH2–): 0.9 + log 1.78 + log 0.9 = 1.1047

C6 (–CH2–): 0.9 + log 0.9 + log 1.0 = 0.8542

C7 (–CH3): 1.0 + log 0.9 = 0.9542

IET = 1.7501 + 1.0004 + 1.6093 + 1.1047 + 0.8542 + 0.9542 = 7.2730
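The worked example can be reproduced with a short script. The following is a minimal Python sketch, assuming base-10 logarithms (which reproduce the printed terms) and using only the four fragment values that appear in the example (1.0 for –CH3, 0.9 for –CH2–, 0.75 for >CH–, and 1.78 for >CH–OH); the complete set of values is given in Ref. [31]. The function name `semi_empirical_index` and the graph encoding are illustrative choices, not part of the published method.

```python
import math

# Fragment values C_i taken from the worked example only (full list: Ref. [31]).
C_VALUES = {"CH3": 1.0, "CH2": 0.9, "CH": 0.75, "CHOH": 1.78}


def semi_empirical_index(fragments, bonds):
    """Return I_ET = sum_i (C_i + delta_i) for a hydrogen-suppressed graph.

    fragments: dict mapping a vertex label to a fragment type (key of C_VALUES)
    bonds: iterable of (vertex, vertex) pairs
    delta_i is the sum of log10 of the values of the vertices adjacent to i.
    """
    neighbours = {vertex: [] for vertex in fragments}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    total = 0.0
    for vertex, kind in fragments.items():
        delta = sum(math.log10(C_VALUES[fragments[n]]) for n in neighbours[vertex])
        total += C_VALUES[kind] + delta
    return total


# 2-methyl-3-hexanol, with vertices numbered as in the worked example
fragments = {1: "CH3", 2: "CH", 3: "CH3", 4: "CHOH", 5: "CH2", 6: "CH2", 7: "CH3"}
bonds = [(1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (6, 7)]

i_et = semi_empirical_index(fragments, bonds)
print(f"I_ET = {i_et:.4f}")  # I_ET = 7.2730
```

Storing the molecule as a vertex dictionary plus a bond list keeps the sum in Eq. 1 explicit: each vertex contributes its own value and the logarithms of its neighbours' values.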

Data set

The physicochemical properties and biological activities of aliphatic alcohols selected in this study were: boiling point, BP (°C); molar volume, MV (cm³/mol); molar refraction, MR (cm³/mol); molecular total surface area, TSA (Å²); water solubility, log (1/S), where S is the solubility in mol/L; octanol/water partition coefficient, log P; narcosis activity towards barnacle larvae and toxicities towards tomatoes and spiders, pC; and odor threshold, log T.

Physicochemical properties

The experimental boiling points of 130 alcohols, used to develop the structure-boiling point model, were taken from Ref. [15]. Molar volumes, molar refractions, and molecular total surface areas were obtained from Ref. [16]. The water solubility and the octanol/water partition coefficient, the most frequently used measure of the hydrophobicity (or lipophilicity) of organic compounds, were taken from the literature [15]. These properties are of interest because the toxic action of these compounds depends mainly on their solubility in water. The compounds and the experimental values for all these physicochemical properties are listed in Table S1 (Supplementary material).

Biological activities and toxicities

In this study, the narcosis activity of alcohols towards barnacle larvae and the toxicity of alcohol vapors towards tomatoes and spiders were selected to verify the applicability of the IET to estimating biological activity and toxicity. The narcosis activity data for 14 alcohols are recorded as pC values, where pC = log (1/C) and C is the molar concentration that elicits a constant biological response [15]. The toxicity of organic compounds is of particular interest to the scientific community because of its impact on environmental and human health. The toxicities of 14 alcohols towards tomatoes and spiders were taken directly from the literature [15], where the toxicity (pC) is the negative logarithm of the concentration causing 50% growth inhibition (−log LC50). The odor-threshold values for a set of 49 alcohols used in the present investigation were reported in Ref. [34] as high and low values in units of μmol/L. The average odor threshold value was selected as the dependent variable. The average thresholds spanned five orders of magnitude (0.047–980). It was therefore necessary to take the logarithm of the average threshold to avoid leverage problems and artificially good model statistics [34]. The biological activity and toxicity values for each compound are shown in Table S2 (Supplementary material).
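The two transformations described above are simple enough to state in code. The following minimal Python sketch assumes base-10 logarithms and takes the "average" threshold to be the arithmetic mean of the reported low and high values; both are assumptions, since the text does not state them explicitly. The function names are illustrative.

```python
import math


def pc_value(concentration_molar):
    """pC = log10(1/C) for a molar concentration C (mol/L)."""
    return math.log10(1.0 / concentration_molar)


def log_average_threshold(low, high):
    """log10 of the arithmetic mean of the low and high thresholds (umol/L).

    Using the arithmetic mean is an assumption made for this sketch.
    """
    return math.log10((low + high) / 2.0)


# Extremes of the average-threshold range quoted in the text (umol/L)
log_min = math.log10(0.047)
log_max = math.log10(980.0)
print(round(log_min, 2), round(log_max, 2))  # -1.33 2.99

# Hypothetical concentration, for illustration only: C = 1e-3 mol/L
print(round(pc_value(1e-3), 6))  # 3.0
```

On the logarithmic scale the extreme thresholds differ by a few units rather than by a factor of about 20,000, so no single compound dominates the least-squares fit.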

Regression analysis

The Origin and Bilin [35] program packages were used in the regression analysis. The quality of each regression equation was assessed using the coefficient of determination (\(r^2\)), the correlation coefficient (r), and the standard deviation (SD). To verify the validity and stability of the models obtained, a leave-one-out cross-validation test (\(r^2_{\text{cv}}\)) [36, 37] was performed. The external stability of the models was examined further by systematically dividing the entire data set into three subgroups and predicting each subgroup using the other two as the training set [38].
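The statistics named above can be illustrated with a small script. This is a minimal Python sketch, not the Origin or Bilin implementation: it assumes that SD is the residual standard deviation with n − 2 degrees of freedom and that the cross-validated coefficient is 1 − PRESS/SS_tot, and it uses synthetic data in place of the experimental values.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_stats(x, y):
    """Return r2, r, and SD (residual SD with n - 2 degrees of freedom)."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return {
        "r2": r2,
        "r": math.copysign(math.sqrt(r2), slope),
        "sd": math.sqrt(ss_res / (len(x) - 2)),
    }


def leave_one_out_r2(x, y):
    """Cross-validated r2: refit without each point, then predict that point."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / sum((yi - mean_y) ** 2 for yi in y)


# Hypothetical data: 40 index values and a noisy linear "property"
random.seed(0)
index_values = [2.0 + 0.25 * k for k in range(40)]
property_values = [15.0 * v + 30.0 + random.gauss(0.0, 3.0) for v in index_values]

stats = regression_stats(index_values, property_values)
r2_cv = leave_one_out_r2(index_values, property_values)
print(f"r2 = {stats['r2']:.4f}, r = {stats['r']:.4f}, "
      f"SD = {stats['sd']:.2f}, r2cv = {r2_cv:.4f}")
```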

Results and discussion

The good results obtained employing the IET to predict the chromatographic retention of aliphatic alcohols can be considered as an initial step towards forthcoming QSPR/QSAR studies.

Analyzing the influence of structural features on the chromatographic behavior of the organic compounds, it is possible to verify that the retention mainly depends on the number of carbon atoms, the degree of branching, the presence of heteroatoms and the position of functional groups in the carbon chain. Our topological index was able to reflect these structural factors, and it is therefore expected that physicochemical properties and biological activities, which are related to these factors, will also show good relationships using the IET.

To illustrate the potential of this index in QSPR/QSAR studies, two series of applications were analyzed. In the first, several representative properties, such as the normal boiling point (BP), water solubility, and octanol/water partition coefficient, were selected for alcohols covering a wide range of numbers of non-hydrogen atoms. The second series concerned the biological activities and toxicities of alcohols.

Correlations between the properties studied

The correlations between the properties and activities examined are shown in Table 1. Most of the properties are highly correlated with one another, with the exception of the odor threshold and the toxicities of alcohols towards tomatoes and spiders. The remaining seven properties/activities (BP, log (1/S), log P, TSA, MR, MV, and the narcosis activity of alcohols towards barnacle larvae, pC) have correlation coefficients greater than 0.95. The high collinearity between most of the properties/activities studied suggests that similar interactions play an important role in all of them.
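Because the individual data sets contain different numbers of alcohols, each pairwise correlation can only be computed over the compounds that two properties share. A minimal Python sketch of such a calculation follows; the compound labels and values are hypothetical and serve only to show the bookkeeping, and the function name is an illustrative choice.

```python
import math


def pearson_on_shared(prop_a, prop_b):
    """Pearson r over the compounds present in both property dictionaries.

    Returns (r, n), where n is the number of shared compounds.
    """
    shared = sorted(set(prop_a) & set(prop_b))
    n = len(shared)
    a = [prop_a[name] for name in shared]
    b = [prop_b[name] for name in shared]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb), n


# Hypothetical values for two properties measured on overlapping compound sets
property_x = {"alcohol_1": 1.0, "alcohol_2": 2.0, "alcohol_3": 3.0, "alcohol_4": 4.0}
property_y = {"alcohol_2": 4.1, "alcohol_3": 5.9, "alcohol_4": 8.2, "alcohol_5": 10.0}

r_xy, n_shared = pearson_on_shared(property_x, property_y)
print(f"r = {r_xy:.4f} over n = {n_shared} shared compounds")
```

Reporting n alongside r, as in Table 1, makes clear how many compounds support each coefficient.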

Table 1 Correlations between the physicochemical properties and biological activities of aliphatic alcohols (correlation coefficients and number of compounds)

Correlations between IET and physicochemical properties/biological activities

Physicochemical properties

We studied six structure-property models that had been reported previously in the literature [15, 16]. Boiling points and retention indices are typical “surface”-dependent properties, while molar volumes and molar refractions are “molecular volume”-dependent properties.

The boiling point of a compound is determined by the intermolecular interactions in the liquid and by the difference in the molecular internal partition function between the gas phase and the liquid at the boiling temperature. It is therefore expected to be related directly to the chemical structure of the molecule, and indeed numerous methods have been developed over the years for estimating the normal boiling point of a compound from its structure [1]. The molar refraction is related to the bulk and polarizability of a molecule and is a useful physical parameter in the chemical, biological, and pharmaceutical sciences. The molecular total surface area, defined as the cavity dimension of the solute when placed in an aqueous medium, is a valuable property in the estimation of the aqueous solubility of organic compounds. Aqueous solubility and the octanol/water partition coefficient are particularly important properties in medicinal chemistry, toxicology, and pharmaceutical and environmental science. They are also valuable in understanding drug transport and environmental impact. The octanol/water partition coefficient has become the preferred measure of lipophilicity in the development of biologically active molecules, in which transport across biological membranes is often critical. Lipophilicity is a measure of the degree to which a given molecule prefers hydrophobic nonpolar environments to water [39].

The simple linear regressions obtained using the IET for the six selected properties are of good quality, as can be seen in Table 2. The validity and stability of the QSPR models were tested by a cross-validation procedure, with the computation of \(r^2_{\text{cv}}\) (Table 2).

Table 2 Simple linear regressions for the properties/activities using IET and statistical parameters

To test the external stability of the QSPR models obtained for BP, log P, and log (1/S), we systematically divided the entire data set of alcohols into three subgroups. Each subgroup was predicted using the other two subgroups as the training set. The results are listed in Table 3, together with the average training \(r^2\) for the three properties tested. Comparison of the cross-validated coefficient (\(r^2_{\text{cv}}\)) with the coefficient of determination (\(r^2\)) indicates the stability of the QSPR models obtained.
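The three-subgroup procedure can be sketched as follows. This minimal Python example assumes that "systematically" means assigning every third compound (in the listed order) to the same subgroup, which the text does not specify; the data are synthetic and the function names are illustrative.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """1 - SS_res/SS_tot for the given line evaluated on (x, y)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def three_subgroup_validation(x, y):
    """Predict each of three systematic subgroups from the other two."""
    results = []
    for k in range(3):
        train = [i for i in range(len(x)) if i % 3 != k]
        held_out = [i for i in range(len(x)) if i % 3 == k]
        x_train, y_train = [x[i] for i in train], [y[i] for i in train]
        x_pred, y_pred = [x[i] for i in held_out], [y[i] for i in held_out]
        slope, intercept = fit_line(x_train, y_train)
        results.append({
            "subgroup": k + 1,
            "r2_training": r_squared(x_train, y_train, slope, intercept),
            "r2_prediction": r_squared(x_pred, y_pred, slope, intercept),
        })
    return results


# Hypothetical data: 40 index values and a noisy linear "property"
random.seed(0)
index_values = [2.0 + 0.25 * k for k in range(40)]
property_values = [15.0 * v + 30.0 + random.gauss(0.0, 3.0) for v in index_values]

results = three_subgroup_validation(index_values, property_values)
for row in results:
    print(row["subgroup"], f"{row['r2_training']:.4f}", f"{row['r2_prediction']:.4f}")
```

Taking every third compound keeps each subgroup spread across the whole range of the index, so no training set has to extrapolate far beyond its own data.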

Table 3 Verification of statistical validity of the QSPR/QSAR models

The correlation coefficient (r) and the standard deviation (SD), in general, measure the quality of QSPR models. Mihalic and Trinajstic [40] suggested that a good QSPR model must have r > 0.99 and, for BP, SD < 5.0 °C. In the present study, the QSPR models obtained through simple linear regression for BP, log P, and log (1/S) explain 98.2, 98.7, and 97.4% of the variance in the experimental values, respectively (the corresponding predicted variances are 98.1, 98.8, and 97.1%; Table 2). These results represent satisfactory QSPR models that can be used to predict these properties.
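For a simple linear regression the percentage of variance explained is 100·r², so the r > 0.99 criterion and the percentages quoted above can be compared directly. A minimal Python sketch of the conversion follows; the function names are illustrative, and the 98.2% figure is the BP value quoted in the text.

```python
import math


def explained_variance_percent(r):
    """Percentage of variance explained by a simple linear model with correlation r."""
    return 100.0 * r * r


def correlation_from_variance(percent):
    """Magnitude of r corresponding to a given percentage of explained variance."""
    return math.sqrt(percent / 100.0)


threshold = explained_variance_percent(0.99)  # the r > 0.99 criterion as a variance
r_bp = correlation_from_variance(98.2)        # BP model: 98.2% of variance explained

print(f"r > 0.99 corresponds to more than {threshold:.2f}% explained variance")
print(f"98.2% explained variance corresponds to |r| = {r_bp:.4f}")
```

On this basis the boiling-point model, at 98.2% explained variance, lies just above the 98.01% that corresponds to r = 0.99.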

The IET index was developed to take into account the contributions of each individual atom type or group to the property considered. Thus, the good results indicate the importance of the separate contributions to the physical properties of a molecule.

In our previous paper [31] it was observed that dispersive forces play a more important role than hydrogen-bonding interactions in the chromatographic retention of aliphatic alcohols. The same behavior can be attributed to the physicochemical properties studied in this work, such as BP, log P, and log (1/S), where the size and branching of the molecules are the dominant factors and the hydrogen bonding involving the –OH group may be considered a secondary factor. This is reflected in the good results obtained for these properties (Table 2). The semi-empirical topological index also encodes information on the hydrogen-bonding interaction, which is important in explaining processes such as aqueous solubility, octanol/water partitioning, and other biological processes.

The plots of calculated BP, log P, and log (1/S) versus experimental data for aliphatic alcohols are shown in Fig. 1.

Fig. 1 Plots of calculated versus experimental properties for aliphatic alcohols: (a) normal boiling point (BP); (b) octanol/water partition coefficient (log P); (c) water solubility (log (1/S))

The QSPR models obtained through simple linear regression for MR, MV, and TSA explain 95.1, 94.3, and 97.5% of the variance in the experimental values, respectively (the corresponding predicted variances are 94.4, 93.6, and 97.4%), as can be seen in Table 2.

Biological activities and toxicities

In this section, we provide further examples of the application of the new topological index, IET, to verify its applicability to predicting biological activities and toxicities.

The models and statistical parameters for predicting the toxicities of 14 alcohols towards tomatoes and spiders and the narcosis activities of alcohols towards barnacle larvae are listed in Table 2. In general, the correlation coefficients are very good: the models explain 92.7, 94.3, and 97.2% of the variance in the experimental values, respectively (the predicted variances are 89.2, 89.7, and 96.1%). The plots of calculated versus observed data for these biological activities are shown in Fig. 2. It is worth noting that the experimental values for biological activity generally carry more uncertainty than those for the physicochemical properties. As is well known, –OH groups in small molecules interact with biological macromolecules through hydrogen bonding. The strength of these hydrogen-bonding interactions is mainly controlled by the position and steric hindrance of the –OH group in the molecule. The good results obtained show that the IET was able to differentiate and characterize the –OH group in terms of the \(\text{C}_i\) values attributed to this fragment [31].

Fig. 2 Plots of calculated versus experimental biological activities for aliphatic alcohols: (a) toxicities of alcohols towards spiders (pCa); (b) toxicities of alcohols towards tomatoes (pCb); (c) narcosis activities of alcohols towards barnacle larvae (pCc)

The present study further verifies the high potential of this index for application to predicting biological activities or toxicities within the structural class studied in this work.

The correlation between the IET and the logarithm of the average odor threshold values for alcohols is shown in Table 2. The model explains about 76.5% of the variance in the experimental values (r = 0.8748, n = 49; the predicted variance is 74.7%). The statistical parameters obtained with this simple model are of a similar order and quality to those obtained by Anker et al. [34] using four descriptors in a multiple linear regression (r = 0.929, SD = 0.372, and n = 49). Four compounds (22, 35, 54, and 55) were considered outliers, as can be seen in Table S2. This anomalous behavior may be explained by conformational or steric effects in these compounds that our index was not able to encode.

The good predictive ability of the models obtained is indicated by the cross-validated coefficients, \(r^2_{\text{cv}}\), and the stability of the QSPR/QSAR models by the agreement of these values with the coefficient of determination, \(r^2\) (Table 2).

For the properties/activities studied, the QSPR/QSAR models using the IET in a simple linear regression show statistical qualities similar to those obtained in recent studies employing multiple linear regressions [15, 16]. These results indicate again that the IET is very suitable for modeling the properties/activities of aliphatic alcohols.

Conclusion

This study demonstrates the successful application of the semi-empirical topological index to predict selected physicochemical properties and biological activities of a large group of aliphatic alcohols.

Statistical analysis shows that the QSPR/QSAR models have high internal stability, as established by cross-validation (\(r^2_{\text{cv}}\)). The results indicate that the physicochemical properties of alcohols are dominated by molecular size, showing that structural features are very important in determining these properties.

This study indicates that the theoretical predictive models for the physicochemical and biological properties of alcohols based on the IET have strong predictive capability and general applicability, with the advantages of simplicity and of using only one parameter.

This method may be extended to QSPR/QSAR studies of other compounds. Further research along these lines is in progress.