Abstract
An artificial neural network (ANN) consists of nonlinear computational elements called neurons, which are linked by weighted connections. Typically, a neuron receives input information and performs a weighted summation, the result of which is passed through an activation function and propagated to other neurons in the ANN. Numerous ANN paradigms have been proposed for pattern classification, clustering, function approximation, prediction, optimization, and control. In this chapter, an attempt is made to review the main applications of ANNs in ecotoxicology. Our goal was not to catalog all the models in the field but only to show the diversity of the situations in which these nonlinear tools have proved their value for modeling the environmental fate and effects of chemicals.
1 Introduction
The last decade has witnessed a surge of interest in the use of artificial neural networks (ANNs) for modeling complex tasks in a variety of fields including data mining, speech, image recognition, finance, business, drug design, and so on [1–6]. The raison d’être of these powerful tools is to exploit the imprecision and uncertainty of real-world problems for deriving valuable and robust models.
The concepts of ANNs are directly inspired by neurobiology. Thus, the cerebral cortex contains about 100 billion neurons, which are special cells processing information. A biological neuron receives signals from other neurons through its dendrites and transmits information generated by its soma along its axon. In the brain, each neuron is connected to 1,000–11,000 other neurons via synapses in which neurotransmitters inducing different activities are released. The human brain contains approximately \(1{0}^{14} - 1{0}^{15}\) interconnections [7–9]. Consequently, the brain can be viewed as a nonlinear and highly parallel biological device characterized by robustness and fault tolerance. It can learn, handle imprecise, fuzzy, and noisy information, and can generalize from past and/or new experiences [10, 11]. ANNs can be defined as weighted directed graphs with connected nodes called neurons that attempt to mimic some of the basic characteristics of the human brain [11]. Consequently, it is not surprising to see that now these nonlinear statistical tools are widely used in numerous technical and scientific domains to process complex information. After a brief overview of the characteristics of ANNs, this chapter will review the main applications of ANNs for modeling the toxicity and ecotoxicity of chemicals as well as their environmental fate. Their advantages and limitations will be also stressed.
2 Characteristics of ANNs
A precise definition of learning is difficult to formulate but the fundamental questions that neurobehaviorists try to answer are: How do we learn? Which is the most efficient process for learning? How much and how fast can we learn? In a neurocomputing context, a learning process can be viewed as a method for updating the architecture as well as the connection weights of an ANN to optimize its efficiency in performing a specific task. The three main learning paradigms are the following: supervised, unsupervised (or self-organized), and reinforcement. Each category includes numerous algorithms. Supervised learning is the most commonly employed paradigm for developing classification and prediction applications. The algorithm takes the difference between the observed and calculated outputs and uses that information to adjust the weights in the network so that, the next time, the prediction will be closer to the correct answer (Fig. 1) [1]. Unsupervised learning is used when we want to perform a clustering of the input data. ANNs that are trained using this learning process are called self-organizing neural networks because they receive no direction on what the desired output should be. Indeed, when presented with a series of inputs, the output units self-organize by initially competing to recognize the input information and then cooperating to adjust their connection weights. Over time, the network evolves so that each output unit is sensitive to and will recognize inputs from a specific portion of the input space (Fig. 2) [1]. Reinforcement learning attempts to learn the input–output mapping through trial and error with a view to maximizing a performance index called the reinforcement signal (Fig. 3). Reinforcement learning is particularly suited to solving difficult temporal (time-dependent) problems [1].
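The supervised update described above can be sketched for a single linear neuron. This is only a minimal illustration of the delta (Widrow–Hoff) rule and not the algorithm of any study cited in this chapter; the function names, the learning rate, and the toy data are assumptions made for the example.

```python
def predict(weights, bias, inputs):
    """Weighted summation of the inputs (linear neuron, no activation)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def delta_rule_step(weights, bias, inputs, observed, learning_rate=0.1):
    """One supervised update driven by (observed - calculated)."""
    error = observed - predict(weights, bias, inputs)
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias, error


# Toy data (hypothetical): the target follows y = 2*x1 - x2 + 0.5.
samples = [
    ([0.0, 1.0], -0.5),
    ([1.0, 0.0], 2.5),
    ([1.0, 1.0], 1.5),
    ([0.5, 0.5], 1.0),
]

weights, bias = [0.0, 0.0], 0.0
first_error = abs(samples[0][1] - predict(weights, bias, samples[0][0]))
for epoch in range(2000):
    for inputs, observed in samples:
        weights, bias, error = delta_rule_step(weights, bias, inputs, observed)
last_error = abs(samples[0][1] - predict(weights, bias, samples[0][0]))
```

Each pass nudges the weights in the direction that reduces the difference between observed and calculated outputs, so that the prediction for the same input is closer to the correct answer the next time it is presented.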
ANNs are also characterized by their connection topology. The arrangement of neurons and their interconnections can have an important impact on the modeling capabilities of the ANNs. Generally, ANNs are organized into layers of neurons. Data can flow between the neurons in these layers in two different ways. In feedforward networks, no loops occur while in recurrent networks feedback connections are found.
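As an illustration of the feedforward case, the sketch below propagates an input vector through one hidden layer to one output layer, with no loops. The sigmoid hidden activation, the linear output, and the hand-picked 2/2/1 weights are assumptions chosen for the example; they are not taken from any model discussed in this chapter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """Each neuron: weighted sum of its inputs plus a bias, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def tlp_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Forward pass of an input/hidden/output perceptron; data flow one way only."""
    hidden = layer_forward(inputs, hidden_w, hidden_b, sigmoid)
    return layer_forward(hidden, output_w, output_b, lambda z: z)  # linear output


# Hypothetical 2/2/1 network with hand-picked weights.
hidden_w = [[1.0, -1.0], [0.5, 0.5]]
hidden_b = [0.0, 0.0]
output_w = [[2.0, -2.0]]
output_b = [1.0]
output = tlp_forward([0.0, 0.0], hidden_w, hidden_b, output_w, output_b)
```

A recurrent network would differ in that some outputs would be fed back as inputs at the next step, which this sketch deliberately omits.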
The description of the different ANN paradigms is beyond the scope of this chapter and the interested readers are invited to consult the rich body of literature on this topic (see e.g., [12–17]). However, Table 1 summarizes the main characteristics of the different types of ANNs cited in the following sections. It is also beyond the scope of this chapter to provide information on computer tools that can be used for deriving ANN models. However, it is noteworthy that a list of freeware, shareware, and commercial ANN software can be found in Devillers and Doré [18].
3 Use of ANNs in Quantitative Structure–Property Relationship (QSPR) Modeling
Knowing the physicochemical properties of xenobiotics is a prerequisite to estimate their bioactivity, bioavailability, transport, and distribution between the different compartments of the biosphere [19–22]. Unfortunately, there are very limited or no experimental physicochemical data available for most of the chemicals likely to contaminate the aquatic and terrestrial ecosystems. Consequently, for the many compounds without experimental data, the only alternative to using actual measurements is to approximate values by means of estimation models, which are generically termed quantitative structure–property relationships (QSPRs). The ingredients necessary to derive a QSPR model are given in Fig. 4. Although most of the QSPR models have been derived from simple contribution methods and regression analysis [23–27], attempts have been made to use ANNs for modeling the intrinsic physicochemical properties of organic molecules as well as their environmental degradation parameters linked to transformation processes. These models are discussed in the following sections.
3.1 Boiling Point
The normal boiling point (BP), corresponding to the temperature at which a substance presents a vapor pressure (VP) of 760 mmHg, depends on a number of molecular properties that control the ability of a molecule to escape from the surface of a liquid into the vapor phase. These properties are molecular size, polar and hydrogen bonding forces, and entropic factors such as flexibility and orientation [27]. Different types of ANNs have been used for computing BP models. Thus, a radial basis function (RBF) network was used by Lohninger [28] for predicting the BPs of 185 ethers, peroxides, acetals, and their sulfur analogs. Molecules were described by two sets of three topological and structural descriptors, yielding two models, both including 20 hidden neurons and cross-validated by a leave-25%-out procedure. Both models outperformed regression models obtained under the same conditions. Cherquaoui and coworkers [29] used the same data set of 185 molecules but their ANN was a three-layer perceptron (TLP) trained by the backpropagation algorithm, and the chemical structures were characterized by embedding frequencies. The ANN presented 20 input neurons and a bias, from 3 to 8 hidden neurons and a bias, and an output neuron. Their selected 20/5/1 (input/hidden/output) TLP after 4,000 iterations presented good statistics but undoubtedly this model presented a problem of overtraining, and it is noteworthy that the number of connections within the ANN is high. At that time, other TLP models allowing the estimation of BPs of chlorofluorocarbons with 1, 1–2, or 1–4 carbon atoms (n = 15, 62, and 276, respectively) as well as of halomethanes with up to four different halogen atoms (n = 48) were also proposed [30]. Egolf and coworkers [31] used a TLP trained by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton optimization method for deriving a model allowing for the prediction of the BP of industrial chemicals.
A database of 298 structurally diverse chemicals was first split into a learning set (LS), a cross-validation set (CVS), and an external testing set (ETS) of 241, 27, and 30 chemicals, respectively. It is noteworthy that the CVS is used to monitor the training of the ANN. Topological, geometrical, and electronic descriptors were generated for characterizing the molecules. The best configuration was an 8/3/1 ANN yielding RMS error values of 11.18, 9.17, and 10.69 K for the LS, CVS, and ETS, respectively. The same methodology was applied to a larger database [32]. The selected 6/5/1 ANN gave RMS error values of 5.7 K for the training and CVSs of 267 and 29 chemicals, respectively. The network model was validated with a 15-member external prediction set. The RMS error of prediction was 7.1 K. This was substantially better than the 8.5 K error obtained from a regression model derived under the same conditions and with the same descriptors. E-state indices [33] for 19 atom types were used [34] as input neurons of a TLP trained by the backpropagation algorithm for predicting the BPs of chemicals from a LS and ETS of 268 and 30 compounds, respectively. The best model included five neurons on the hidden layer. It produced a mean absolute error of 3.86 and 4.57 K for the LS and ETS, respectively. These authors applied the same strategy to a larger database of 372 chemicals but only including alkanes, alcohols, and (poly)chloroalkanes [35]. The usefulness of the TLP and a fuzzy ARTMAP ANN was tested by Espinosa et al. [36] on a limited database including 140 alkanes, 144 alkenes, and 43 alkynes. Even if this kind of study allows us to compare methods and/or descriptors, it is obvious that ANNs show their full value when models are derived from large sets of molecules for which it is not easy to relate the structure of the molecules to the property (or activity) under study with classical linear methods.
Thus, an interesting approach based on the use of a TLP and descriptors calculated using AM1 and PM3 semiempirical quantum-chemical methods was used by Chalk and coworkers [37] for deriving models from a database of 6,629 experimental BPs. The LS and ETS included 6,000 and 629 chemicals, respectively. The best results were obtained with an 18/10/1 ANN architecture. Ten separate ANNs with random starting weights were then trained with different LSs and ETSs, chosen such that each chemical appeared only once in an ETS. The standard deviations (means of the results for 10 nets) for the LS and ETS were 16.54 and 19.02 K with the AM1 approach and 18.33 and 20.27 K with the PM3 approach.
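The remark made in this section about the high number of connections in the 20/5/1 TLP can be made concrete by counting the adjustable weights and comparing them with the number of molecules. The sketch below assumes one bias per hidden and output neuron and, for [29], that all 185 molecules were used for training; the ratio of samples to weights is a common rule-of-thumb indicator rather than a criterion stated by the cited authors.

```python
def count_weights(n_input, n_hidden, n_output, with_bias=True):
    """Adjustable weights in an input/hidden/output perceptron."""
    bias = 1 if with_bias else 0
    return (n_input + bias) * n_hidden + (n_hidden + bias) * n_output


def samples_per_weight(n_samples, n_input, n_hidden, n_output):
    """Number of training samples available per adjustable weight."""
    return n_samples / count_weights(n_input, n_hidden, n_output)


# (training samples, input, hidden, output) as reported in this section.
architectures = {
    "20/5/1 TLP, 185 molecules [29]": (185, 20, 5, 1),
    "8/3/1 TLP, 241 molecules [31]": (241, 8, 3, 1),
    "18/10/1 TLP, 6,000 molecules [37]": (6000, 18, 10, 1),
}
for label, (n, i, h, o) in architectures.items():
    print(f"{label}: {count_weights(i, h, o)} weights, "
          f"{samples_per_weight(n, i, h, o):.1f} samples per weight")
```

Under these assumptions the 20/5/1 network has 111 weights for 185 molecules, that is, fewer than two molecules per weight, whereas the 18/10/1 network trained on 6,000 chemicals has 201 weights.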
3.2 Vapor Pressure
The VP determines the potential of a chemical to volatilize from its condensed or dissolved phases and to therefore exist as a gas [38]. VP strongly depends on the temperature as expressed in the classical Clausius–Clapeyron equation [24]. As previously seen, the BP of a chemical can be easily derived from its VP. Numerous methods can be used for estimating the VPs of chemicals, and among them, some are based on the use of ANNs. Thus, different regression and ANN models were tested by Liang and Gallagher [39] on a set of 479 chemicals described by various descriptors encoding the structure and physicochemical properties of the molecules. Standard errors of 0.534 and 0.522 (log units, Torr) were obtained for the regression model with seven independent variables and a 7/5/1 ANN, respectively. However, the interest of the results is very limited because of a total lack of information on the conditions in which the models were derived. More reliable models were designed by McClelland and Jurs [40]. TLP models were developed to relate the structural characteristics of 420 diverse organic compounds to their VP at 25°C expressed as log (VP in Pascals). The log (VP) values ranged over eight orders of magnitude from −1.34 to 6.68 log units. The database was split into a learning set (LS), a CVS, and an ETS of 290, 65, and 65 chemicals, respectively. An 8/3/1 TLP trained by a BFGS optimization algorithm and only including topological descriptors yielded RMS errors of 0.26, 0.29, and 0.37 for the LS, CVS, and ETS, respectively (log units, Pa). An alternative 10/4/1 TLP containing a larger selection of descriptor types (e.g., quantum mechanical descriptors) resulted in improved performance with RMS errors of 0.19, 0.24, and 0.33 for the LS, CVS, and ETS, respectively [40]. In the same way, Beck and coworkers [41] derived a 10/8/1 TLP trained by the backpropagation algorithm for estimating the log VP at 25°C.
Descriptors derived from quantum mechanical calculations were used for describing the 551 chemicals constituting the learning and testing sets. The leave-one-out (LOO) cross-validation gave a standard deviation of 0.37 log units (Torr) and a maximum absolute error of 1.65. A temperature-dependent model based on a TLP trained by the backpropagation algorithm and descriptors calculated using AM1 semiempirical MO theory was proposed by Chalk et al. [42]. A data set of 8,542 measurements at various temperatures for a total of 2,349 molecules was divided into a training set of 7,681 measurements and an external validation set of 861 measurements in such a manner that the validation set spanned the full range of VPs. The standard deviation of the error (log units, Torr) for the learning, LOO cross-validation, and validation sets obtained with the selected 27/15/1 TLP was equal to 0.32, 0.46, and 0.33, respectively. Yaffe and Cohen [43] also computed a temperature-dependent QSPR model for the VP of aliphatic, aromatic, and polycyclic aromatic hydrocarbons, ranging from 4 to 12 carbon atoms, using a TLP trained by the backpropagation algorithm with connectivity indices [44], molecular weight, and temperature as input parameters. The database of 274 molecules included 7,613 vapor pressure–temperature data. It was split into a learning set (LS), a CVS, and an ETS of 5,330, 754, and 1,529 data points, respectively. The best model was a 7/29/1 TLP yielding average absolute VP errors of 11.6% (0.051 log units or 34 kPa), 8.2% (0.036 log units or 23.2 kPa), 9.2% (0.039 log units or 26.8 kPa), and 10.7% (0.046 log P units or 31.1 kPa) for the training, test, validation, and overall sets, respectively.
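For reference, the Clausius–Clapeyron relationship mentioned at the beginning of this section is commonly written as follows. The integrated form assumes an ideal-gas vapor, a negligible molar volume of the condensed phase, and an enthalpy of vaporization \(\Delta H_{\mathrm{vap}}\) that does not vary with temperature; the symbols \(P\) (vapor pressure), \(T\) (absolute temperature), and \(R\) (gas constant) are introduced here for illustration and are not notation used elsewhere in this chapter.

\[
\frac{d\ln P}{dT} = \frac{\Delta H_{\mathrm{vap}}}{R\,T^{2}}
\quad\Longrightarrow\quad
\ln\frac{P_{2}}{P_{1}} = -\frac{\Delta H_{\mathrm{vap}}}{R}\left(\frac{1}{T_{2}} - \frac{1}{T_{1}}\right)
\]

Setting \(P_{2} = 760\) mmHg in the integrated form gives \(T_{2}\) equal to the normal BP, which is the link between VP and BP noted above.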
3.3 Water Solubility
Of the various parameters affecting the fate and transport of organic chemicals in ecosystems, water solubility is one of the most important. Highly soluble chemicals are easily and rapidly distributed in the environment. These chemicals tend to have relatively low adsorption coefficients in soils and sediments and also negligible bioconcentration factors in living species. They tend to be more readily biodegradable by microorganisms. The water solubility of chemicals also influences their photolysis, hydrolysis, oxidation, and volatilization [23]. A fairly large number of estimation methods have been proposed for modeling the water solubility of organic chemicals, and some of them are based on the use of ANNs. Thus, a database of water solubility values for 157 substituted aromatic hydrocarbons described by structural fragments was randomly split into a LS, a CVS, and an ETS of 95, 31, and 31 chemicals, respectively [45]. A TLP trained by the backpropagation algorithm was used as the statistical engine. The best model was a 9/11/1 ANN (learning rate 0.35, 276 cycles) yielding a mean square error (MSE) of 0.21 from 40 randomly selected test data sets. For comparison purposes, the MSE obtained with a regression analysis was 0.25. A rather similar approach was used by Sutter and Jurs [46] with solubility data for 140 organic compounds presenting diverse structures, which were divided into a LS, a CVS, and an ETS of 116, 11, and 13 chemicals, respectively. Chemicals were described by means of 144 descriptors encoding topological and/or physicochemical properties. This pool of descriptors was reduced to nine that were used for deriving a regression model from a LS of 127 (116 + 11) chemicals. An RMS error of 0.321 log units was found. However, four chemicals were identified as outliers, and their removal from the regression model reduced the RMS error to 0.277 log units. A 9/3/1 TLP including the nine descriptors as input neurons was then derived.
It gave RMS errors of 0.217, 0.282, and 0.222 log units for the LS (n = 112), CVS (n = 11), and ETS (n = 13), respectively. It is noteworthy that another 9/3/1 TLP model was computed by Sutter and Jurs [46] after exclusion of the polychlorinated biphenyls (PCBs). In that case, RMS error values of 0.145, 0.151, and 0.166 log units were obtained for the LS (n = 94), CVS (n = 13), and ETS (n = 13), respectively. Other TLP models for predicting the aqueous solubility of chemicals were proposed by Mitchell and Jurs [47], McElroy and Jurs [48], and Huuskonen et al. [49] from databases of limited sizes. Yaffe and coworkers [50] used a heterogeneous set of 515 organic compounds with their solubility data for comparing the performances of a TLP and fuzzy ARTMAP ANNs. The first ANN model derived from a large diverse set of aqueous solubility data was proposed by Huuskonen [51]. A database of 1,297 chemicals with their aqueous solubility values was split into a LS and an ETS of 884 and 413 chemicals, respectively. Another testing set (ETS+) of 21 chemicals was also considered. All the chemicals were encoded by the 30 following topological indices: 24 atom-type electrotopological state indices [33], path 1 simple and valence connectivity indices [44], flexibility index, the number of H-bond acceptors, and indicators for aromaticity and for aliphatic hydrocarbons. A 30/12/1 TLP trained by the backpropagation algorithm yielded standard deviation values of 0.47, 0.60, and 0.63 for the LS, ETS, and ETS+, respectively. A regression analysis performed under the same conditions gave standard deviation values of 0.67, 0.71, and 0.88 for the LS, ETS, and ETS+, respectively [51]. Liu and So [52] tried to derive an ANN with fewer connections but presenting similar performances by using a LS and an ETS of 1,033 and 258 chemicals, respectively.
A 7/2/1 TLP with the 1-octanol/water partition coefficient (log P), topological polar surface area (TPSA), molecular weight, and four topological indices as input neurons gave standard deviation values of 0.70 and 0.71 in log units for the LS and ETS, respectively. An interesting hybrid model was developed by Hansen and coworkers [53] for the prediction of the pH-dependent aqueous solubility of chemicals. It used a TLP ANN trained from 4,548 solubility values and a commercial software tool for estimating the acid/base dissociation coefficients.
It is important to note that the aqueous solubility estimations obtained from QSPR models have to be used with caution. Thus, for example, QSPR models generally calculate solubility in pure water at 25°C while it is well-known that the varying temperatures found in the environment change the solubility of chemicals. The degree of salinity of the aquatic ecosystems also influences the solubility of the chemicals in these media.
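Several of the studies reviewed in this section rely on a random partition of the database into a LS, a CVS, and an ETS. A minimal sketch of such a partition is given below, using the 95/31/31 sizes reported for the 157 aromatic hydrocarbons [45]. The function name, the fixed seed, and the placeholder compound identifiers are assumptions; the cited authors do not describe their splitting procedure in this chapter.

```python
import random


def split_dataset(items, n_ls, n_cvs, n_ets, seed=0):
    """Randomly partition items into learning, cross-validation, and external test sets."""
    if n_ls + n_cvs + n_ets != len(items):
        raise ValueError("subset sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    ls = shuffled[:n_ls]
    cvs = shuffled[n_ls:n_ls + n_cvs]
    ets = shuffled[n_ls + n_cvs:]
    return ls, cvs, ets


# 157 compounds split 95/31/31; the identifiers are placeholders.
compound_ids = [f"compound_{i:03d}" for i in range(157)]
ls, cvs, ets = split_dataset(compound_ids, 95, 31, 31)
```

The LS is used to adjust the weights, the CVS to monitor training and decide when to stop, and the ETS only to assess the final model, so the three subsets must not overlap.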
3.4 Henry’s Law Constant
The Henry’s law constant (Hc) of a chemical is defined as the ratio of its concentration in air to its concentration in water when these two phases are in contact and equilibrium distribution of the chemical is achieved [25]. Hc is of prime importance for assessing the environmental distribution of chemicals. The different methods for calculating this parameter have been reviewed by Dearden and Schüürmann [54]. Among them, two studies deal with the use of ANNs for modeling the Hc of chemicals at 25°C. A database of 357 organic chemicals with log H values ranging from −7.08 to 2.32 was used by English and Carroll [55] for deriving their ANN models. Chemicals were described by 29 descriptors including topological indices, physicochemical properties, and atomic and group contributions. The best results were obtained in 3,000 cycles with a 10/3/1 TLP. The standard errors for the LS (n = 261), CVS (n = 42), and ETS (n = 54) were equal to 0.202, 0.157, and 0.237 log units, respectively. Comparatively, the standard errors obtained with a regression analysis, performed under the same conditions, were 0.262 and 0.285 log units for the LS (n = 303) and ETS (n = 54), respectively.
Experimental Hc at 25°C for a diverse set of 495 chemicals were collected by Yaffe et al. [56]. The log H values ranged from−6.72 to 2.87. Six physicochemical descriptors (heat of formation, dipole moments, ionization potential, average polarizability) and the second-order valence molecular connectivity index were used as input parameters for a fuzzy ARTMAP ANN and a TLP ANN trained by the backpropagation algorithm. The average absolute error values obtained with the fuzzy ARTMAP ANN were 0.01 and 0.13 for the LS (n=421) and ETS (n=74). The selected 7/17/1 TLP yielded average absolute error values of 0.29, 0.28, and 0.27 for the LS (n=331), validation set (n=421) and ETS (n=74).
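The models reviewed so far are compared through RMS errors, mean (or average) absolute errors, and maximum absolute errors. The sketch below shows how these three statistics are usually computed from observed and calculated values. The toy numbers are hypothetical, and the exact definitions used in the individual studies (for instance, whether a standard error is corrected for degrees of freedom) may differ.

```python
import math


def rms_error(observed, calculated):
    """Root mean square of the residuals."""
    n = len(observed)
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(observed, calculated)) / n)


def mean_absolute_error(observed, calculated):
    """Average of the absolute residuals."""
    n = len(observed)
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / n


def max_absolute_error(observed, calculated):
    """Largest absolute residual."""
    return max(abs(o - c) for o, c in zip(observed, calculated))


# Hypothetical log-unit values for four chemicals.
observed = [1.0, 2.0, 3.0, 4.0]
calculated = [1.5, 2.0, 2.5, 4.0]
rms = rms_error(observed, calculated)
mae = mean_absolute_error(observed, calculated)
worst = max_absolute_error(observed, calculated)
```

Because the RMS error squares the residuals, it is more sensitive to a few large deviations than the mean absolute error, which is one reason why figures reported with different statistics should not be compared directly.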
3.5 Octanol/Water Partition Coefficient
In 1872, Berthelot [57] undertook the study of partitioning as a purely physicochemical phenomenon. He was the first to collect the evidence proving that the ratio of the concentrations of small solutes when distributed between water and an immiscible solvent (e.g., ether) remained constant even when the solvent ratios varied widely [58]. In 1891, Nernst [59] put this type of equilibrium on a firmer thermodynamic basis. About a decade later, Meyer [60] and Overton [61], who showed that the narcotic action of simple chemicals was reflected rather closely by their oil–water partition coefficients, initiated the use of this physicochemical property for deriving structure–activity relationships. In the first part of the twentieth century, many different organic solvent/water systems were tested to derive structure–activity relationships. However, in 1962–1964, the 1-octanol was adopted as solvent of choice after the pioneering works of Hansch and coworkers in quantitative structure–activity relationships (QSARs) [62, 63] demonstrating that the 1-octanol/water partition coefficient (Kow) could provide a rationalization for the interaction of organic chemicals with living organisms or for biological processes occurring in organisms [64]. Kow is simply defined as the ratio of a chemical’s concentration in the octanol phase to its concentration in the aqueous phase of a two-phase octanol/water system. Values of Kow are thus unitless and are expressed in a logarithmic form (i.e., log Kow or log P) when used in pharmaceutical and environmental modeling. There are numerous methods available for the experimental measurement of log P as well as for its estimation from contribution methods or from linear and nonlinear QSPRs [23, 24, 58, 64, 65]. Different ANN models for log P have been derived from a limited number of chemicals (see e.g., [66–68]). 
A database of 1,870 log P values for structurally diverse chemicals was used by Huuskonen and coworkers [69] for deriving a log P model based on atom-type electrotopological state indices [33] and a TLP. It was split into a LS and an ETS of 1,754 and 116 molecules, respectively. The best configuration included the molecular weight and 38 electrotopological state indices as input neurons, five hidden neurons, and bias neurons. Averaged results of 200 ANN simulations were used to calculate the final outputs. With this strategy, RMS (LOO) values of 0.46 and 0.41 were obtained for the LS and ETS, respectively. This model was further refined from an extended LS and is now called ALOGPS [70, 71]. A log P model was designed by Devillers and coworkers [72–74] from a TLP trained by the backpropagation algorithm using 7,200 log P values for the learning process. Experimental log P values were retrieved from original publications or unpublished results. The log P values of the LS ranged between−3.7 and 9.95 with a mean of 2.13 and a standard deviation of 1.65. Molecules were described by means of autocorrelation descriptors [75, 76] encoding lipophilicity (H) defined according to Rekker and Mannhold [65], molecular refractivity (MR), and H-bonding donor (HBD) and H-bonding acceptor (HBA) abilities. Prior to calculations, data were scaled with a classical min/max equation. The optimal architecture and set of parameters for the neural network model were determined by means of a trial and error procedure. The different training exercises were monitored with a validation set of 200 molecules presenting a high structural diversity but not deviating too much from the chemical structures included in the training set. This procedure showed that a neural network model with 35 input neurons (i.e., H0 to H14, MR0 to MR14, HBA0 to HBA3, and HBD0) was necessary to correctly describe the molecules and model the 7,200 experimental log P values. The hidden layer consisted of 32 neurons. 
It was found that a learning rate of 0.5 and a momentum term of 0.9 always gave good neural network generalization within ca. 5,500 cycles. A composite network made up of four configurations was selected as the final model \((\mathrm{RMS} = 0.37, r = 0.97)\) because it gave the best simulation results on an ETS of 519 chemicals \((\mathrm{RMS} = 0.39, r = 0.98)\). It has been shown that this model competed favorably with other log P models [77, 78] and was particularly suited for estimating the log P values of pesticides [79]. It is noteworthy that a commercial version of this model called AUTOLOGP™ is available [80, 81].
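The min/max scaling mentioned above for the log P model of Devillers and coworkers can be sketched as follows. The target interval actually used by the authors is not given in this chapter, so the [0, 1] default is an assumption, as are the function names; the three values are the minimum, mean, and maximum reported for the LS.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values so that min(values) -> low and max(values) -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant variable")
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


def min_max_unscale(scaled, v_min, v_max, low=0.0, high=1.0):
    """Inverse transformation, used to express network outputs in log P units."""
    return [v_min + (s - low) * (v_max - v_min) / (high - low) for s in scaled]


# Minimum, mean, and maximum log P of the learning set reported above.
log_p = [-3.7, 2.13, 9.95]
scaled = min_max_scale(log_p)
restored = min_max_unscale(scaled, min(log_p), max(log_p))
```

Scaling inputs and outputs to a common bounded interval keeps them within the working range of the activation functions and prevents descriptors with large numerical values from dominating the weighted sums.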
3.6 Degradation Parameters
Biodegradation is an important mechanism for eliminating xenobiotics by biotransforming them into simple organic and inorganic products. Two types of biodegradation can be distinguished. The primary biodegradation denotes a simple transformation not leading to a complete mineralization. The biodegradation products are specifically measured by chromatographic methods, and the results are expressed by means of kinetic parameters such as the biodegradation rate constant (k) and half-life \((T_{1/2})\). The ultimate (or total) biodegradation totally converts chemicals into simple molecules such as CO2 and H2O. Biodegradation tests are time consuming and expensive, and their results are difficult to interpret because they depend on numerous parameters linked to the experimental conditions such as the nature and concentration of the inoculum, the cultivation and adaptation of the microbial culture, and the concentration of the test substance [82–84]. Because ANNs are particularly suited for modeling noisy data, they have been successfully used to model biodegradation processes [85]. Thus, for example, 47 molecules presenting a high degree of heterogeneity were described in a qualitative way for their biodegradability (i.e., 0 = weak, 1 = high) from a survey made by 22 experts in microbial degradation [86]. They were encoded by 11 Boolean descriptors representing structural features associated with persistent or degradable chemicals. These descriptors are listed in Table 2. A TLP trained by the backpropagation algorithm was used as the statistical engine to find a relationship between the structure of the molecules and their biodegradation potential. The learning phase yielded 100% of good classifications (i.e., 47/47) with an 11/4/1 ANN in 500 cycles. The predictive power of this model was estimated from two ETSs. With the first ETS, 78% of good classifications (i.e., 18/23) were obtained while with the second, 94% (i.e., 16/17) of the chemicals were correctly classified.
The use of Boolean descriptors as input neurons in a TLP, especially for modeling a complex property, can induce problems of overfitting. To avoid this drawback without losing the benefit of fragment descriptors, the usefulness of correspondence factor analysis (CFA) [87] for reducing the dimensionality of a data matrix was tested. Thus, a CFA was used to scale the 47 × 11 Boolean matrix and the CFA factors were directly introduced as inputs in the ANN. The same results were also obtained in 500 cycles with only the first seven factors (87.9% of the total inertia). It is noteworthy that an intercommunicating hybrid system including this ANN model and a genetic algorithm [88] was then constructed for designing molecules with specific biodegradability characteristics [89].
TLPs with structural descriptors [90, 91] or autocorrelation descriptors [92] were used for modeling the biodegradability of other sets of aliphatic and aromatic chemicals. The field half-lives of 110 pesticides were modeled using a TLP trained by the backpropagation algorithm [93]. Because periodicities in agricultural calendars are measured in days, weeks, and months (i.e., seasons), the field half-lives \((T_{1/2})\) of pesticides were divided into the three following classes: class 1 (encoded 100 in the ANN output) contained pesticides with \(T_{1/2} \leq 10\) days, class 2 (encoded 010) included pesticides with \(10 < T_{1/2} \leq 30\) days, and class 3 (encoded 001) included pesticides with \(30 < T_{1/2} \leq 90\) days. Molecules were described by means of the frequency of 17 structural fragments. Different scaling transformations were tested but the best results were obtained with a CFA, which also allowed a reduction of the dimensionality of the descriptor matrix. The optimal results were obtained by using the first 12 factors (95.8% of the total inertia) as input neurons and seven neurons for the hidden layer. With this configuration, 95.5% of correct classifications were obtained with the LS. The performances of the selected ANN model were tested on an ETS of 13 pesticides representing the three classes of field half-lives. The testing phase with CFA gave 84.6% of correct predictions. A discriminant factor analysis with three classes was performed for comparison purposes. In that case, 60% and 53.8% of good classifications were obtained for the LS and ETS, respectively [93].
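The three-class output coding used for the field half-lives can be sketched as follows. The class boundaries and the 100/010/001 codes are those given above; the function names and the winner-takes-all decoding of the three output neurons are assumptions, since the decision rule applied in [93] is not described in this chapter.

```python
def encode_half_life(t_half_days):
    """Map a field half-life (days) to the three-neuron output code."""
    if t_half_days <= 0:
        raise ValueError("half-life must be positive")
    if t_half_days <= 10:
        return (1, 0, 0)  # class 1
    if t_half_days <= 30:
        return (0, 1, 0)  # class 2
    if t_half_days <= 90:
        return (0, 0, 1)  # class 3
    raise ValueError("half-life above 90 days is outside the three classes")


def decode_output(outputs):
    """Assumed decision rule: the class (1 to 3) of the most activated output neuron."""
    return max(range(len(outputs)), key=lambda i: outputs[i]) + 1


codes = [encode_half_life(t) for t in (4, 10, 21, 30, 60)]
predicted_class = decode_output([0.1, 0.7, 0.2])
```

Using one output neuron per class avoids imposing an artificial numerical distance between classes, which would occur if a single output neuron had to take the values 1, 2, and 3.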
4 Use of ANNs in Quantitative Structure–Activity Relationship (QSAR) Modeling
The knowledge about systematic relationships between the structure of chemicals and their biological activity dates back to the prime infancy of the modern pharmacology and toxicology. Thus, for example, Cros [94] stressed, in the last page of his thesis published in 1863, an empirical relationship between the number of carbon and hydrogen atoms in a series of alcohols and their solubility in water and toxicity. Until about the middle of the twentieth century, most of these structure–activity relationships were only qualitative. The dramatic change resulted from the systematic use, in the early 1960s, of linear regression analysis for correlating biological activities of congeneric series of molecules with their physicochemical properties or some of their structural features encoded by means of Boolean descriptors (i.e., 0/1). These contributions started the development of two QSAR methodologies later termed Hansch analysis [62, 63] and Free-Wilson analysis [95], respectively.
Nowadays, regression analysis remains the most widely used statistical tool for deriving QSARs, even if most of the basic statistical assumptions for its correct use are often not satisfied with numerous data sets [96]. In addition, the choice of regression analysis can also be problematic because it postulates that only linear relationships exist between the variables involved in the modeling process, which is generally not true. For about a decade, ANNs have been the focus of much attention in QSAR to find complex relationships between the structure of molecules and their toxicity. These models have been derived on various organisms such as the marine luminescent bacterium Vibrio fischeri (formerly known as Photobacterium phosphoreum) [97, 98], the freshwater protozoan Tetrahymena pyriformis [99–110], the waterflea Daphnia magna [111], the freshwater amphipod Gammarus fasciatus [112], the midge Chironomus riparius [113], the fathead minnow Pimephales promelas [114–122], the rainbow trout Oncorhynchus mykiss [123], the bluegill Lepomis macrochirus [124], and the honey bee Apis mellifera [125, 126]. All these models were recently analyzed [127]. Consequently, only the main characteristics of some of them are presented in Table 3.
It is interesting to note that, owing to their high flexibility and their ability to find complex relationships between variables, ANNs can be used to derive QSARs from sets of variables encoding not only, as usual, the structure and physicochemical properties of the molecules but also the experimental conditions in which the different tests are performed, such as the time of exposure [98] or the temperature, pH, hardness of the medium, and size of the organisms [112, 123, 124]. In the same way, owing to their purely nonlinear nature, ANNs can be used in synergy with other statistical tools, especially regression analysis. Devillers [122] showed that this kind of modeling approach was particularly interesting in the common situation in which the toxicity of molecules depends mainly on their log P. In that case, a classical regression equation with log P is derived in a first step. The residuals obtained with this simple linear equation are then modeled by a TLP that uses different molecular descriptors as input neurons. Finally, the results produced by the linear and nonlinear QSAR models are combined to calculate the toxicity values, which are then compared with the initial toxicity data.
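The two-step scheme described above can be illustrated with a minimal Python sketch. The data are synthetic, the single descriptor is hypothetical, and the small one-hidden-layer network with its deterministic initialization is an assumption chosen for reproducibility; it is not the architecture or the training procedure used in [122].

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    return a, my - a * mx

def train_tlp(inputs, targets, n_hidden=3, lr=0.05, epochs=2000):
    """One hidden layer (tanh) and a linear output, trained by full-batch
    gradient descent on the mean squared error."""
    n_in, n = len(inputs[0]), len(inputs)
    # Assumed deterministic start: distinct hidden weights, zero output weights.
    w = [[0.5 * (j + 1)] * n_in for j in range(n_hidden)]
    b = [0.0] * n_hidden
    v = [0.0] * n_hidden
    c = 0.0

    def hidden(x):
        return [math.tanh(sum(wj[i] * x[i] for i in range(n_in)) + bj)
                for wj, bj in zip(w, b)]

    for _ in range(epochs):
        gw = [[0.0] * n_in for _ in range(n_hidden)]
        gb, gv, gc = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, t in zip(inputs, targets):
            h = hidden(x)
            err = sum(vj * hj for vj, hj in zip(v, h)) + c - t
            gc += err
            for j in range(n_hidden):
                gv[j] += err * h[j]
                back = err * v[j] * (1.0 - h[j] ** 2)  # backpropagated error
                gb[j] += back
                for i in range(n_in):
                    gw[j][i] += back * x[i]
        c -= lr * gc / n
        for j in range(n_hidden):
            v[j] -= lr * gv[j] / n
            b[j] -= lr * gb[j] / n
            for i in range(n_in):
                w[j][i] -= lr * gw[j][i] / n

    return lambda x: sum(vj * hj for vj, hj in zip(v, hidden(x))) + c

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Synthetic data (not from the chapter): toxicity driven mainly by log P,
# plus a nonlinear contribution of one hypothetical descriptor.
log_p = [0.5 + 0.2 * k for k in range(20)]
descriptor = [math.sin(1.7 * k) for k in range(20)]
toxicity = [0.9 * lp + 0.4 + 0.6 * math.tanh(2.0 * d)
            for lp, d in zip(log_p, descriptor)]

# Step 1: classical regression on log P.
slope, intercept = fit_line(log_p, toxicity)
linear_pred = [slope * lp + intercept for lp in log_p]
residuals = [t - p for t, p in zip(toxicity, linear_pred)]

# Step 2: the network models the residuals from the descriptor.
net = train_tlp([[d] for d in descriptor], residuals)

# Step 3: both contributions are added to give the final estimate.
hybrid_pred = [p + net([d]) for p, d in zip(linear_pred, descriptor)]
print(rmse(linear_pred, toxicity), rmse(hybrid_pred, toxicity))
```

Keeping the linear term separate preserves the interpretability of the log P equation, while the network only has to account for what that equation leaves unexplained.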
5 Use of ANNs for Modeling Environmental Contaminations
5.1 Air Pollution
There is a large body of evidence suggesting that exposure to air pollution, even at the levels commonly encountered nowadays in industrialized countries, leads to adverse health effects. In particular, exposure to pollutants such as particulate matter and ozone has been found to be associated with increases in hospital admissions for cardiovascular and respiratory diseases and with the incidence of cancers [128]. Air pollution not only affects the quality of the air we breathe, but it also directly and indirectly impacts the biotopes and the biocenoses constituting the aquatic and terrestrial ecosystems. For the evaluation of air pollution events in a particular geographical area, it is crucial to have a powerful mapping technique for building typologies, comparing sampling sites, and so on. The Kohonen self-organizing map (KSOM) [16] is particularly suited to these tasks. For example, Ferré-Huguet and coworkers [129] used a KSOM to assess the environmental impact and human health risks of polychlorinated dibenzo-p-dioxins and dibenzofurans in the vicinity of a new hazardous waste incinerator in Spain after 4 years of regular operation of the facility. More specifically, a KSOM with a rectangular grid of 48 (8 × 6) neurons was applied to soil and herbage samples to establish pattern similarities among the samples as well as to identify hot spots near the plant. Lee and coworkers [130] used a KSOM of 150 (15 × 10) output neurons to examine the influence of urbanization on the assembly patterns of 52 breeding birds in 367 sites.
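As a minimal sketch of how a KSOM assigns samples to neurons, the following Python code trains a small map and lists the sites captured by each neuron. The 8 × 6 grid echoes the map size mentioned above, but the site names, the normalized concentrations, and the training schedule (linear decay of the learning rate and of the neighborhood radius) are illustrative assumptions, not the settings used in [129].

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position (row, col) of the neuron closest to x."""
    return min(weights,
               key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(weights[pos], x)))

def train_som(samples, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular self-organizing map with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # shrinking neighborhood radius
        for x in samples:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical normalized concentrations of three pollutants at six sites.
sites = {
    "site_A": [0.90, 0.80, 0.85],
    "site_B": [0.85, 0.90, 0.80],
    "site_C": [0.10, 0.15, 0.05],
    "site_D": [0.05, 0.10, 0.10],
    "site_E": [0.50, 0.45, 0.55],
    "site_F": [0.55, 0.50, 0.45],
}

som = train_som(list(sites.values()), rows=8, cols=6)

# "Loaded" neurons are those that capture at least one sample.
loaded = {}
for name, x in sites.items():
    loaded.setdefault(best_matching_unit(som, x), []).append(name)

for position, names in sorted(loaded.items()):
    print(position, names)
```

Sites sharing a neuron, or lying on neighboring neurons, have similar contamination profiles, which is what makes the map usable for typologies and for spotting outlying samples.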
Undoubtedly, KSOM offers an interesting tool for compressing p multivariate samples defined in an n-dimensional space into v clusters (loaded neurons). This reduction of the data to a few clusters provides a clear display of the data structure. However, in KSOM, information about the true distances between the neurons is lost during the projection onto the 1D, 2D, or 3D array of nodes. To overcome this problem, a minimum spanning tree (MST) [131] can be calculated between the loaded neurons of a trained KSOM to visualize the shortest distances between them. The hybridization of the KSOM and MST algorithms constitutes the basis of the 3MAP algorithm designed and used by Wienke for locating fine airborne particle sources [132–135]. Because the MST still does not represent the distances between all pairs of loaded neurons, a nonlinear mapping (NLM) [136] performed on these loaded neurons can be used to visualize all the distances separating them. The hybridization of the KSOM, MST, and NLM algorithms constitutes the basis of the N2M algorithm [137, 138] (Fig. 5). A rather similar hybridization approach in combination with a multilayer perceptron (MLP) was used by Kolehmainen and coworkers [139] to forecast urban air quality. Hourly airborne pollutant and meteorological averages collected during the years 1995–1997 were analyzed to identify air quality episodes characterized by typical and most probable combinations of air pollutants and meteorological variables. This modeling was performed with KSOM, NLM, and fuzzy distance metrics. Several overlapping MLPs were then applied to the clustered data, each representing a pollution episode.
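The MST step can be sketched in a few lines of Python with Prim's algorithm. The neuron labels and weight vectors below are hypothetical, and the code only illustrates how the shortest links between loaded neurons are obtained; it is not the 3MAP implementation of [132–135].

```python
import math

def minimum_spanning_tree(nodes):
    """Prim's algorithm. `nodes` maps a neuron label to its weight vector.
    Returns a list of (label_a, label_b, distance) edges."""
    labels = list(nodes)
    in_tree = [labels[0]]
    edges = []
    while len(in_tree) < len(labels):
        best = None
        for a in in_tree:
            for b in labels:
                if b in in_tree:
                    continue
                d = math.dist(nodes[a], nodes[b])  # Euclidean distance in weight space
                if best is None or d < best[2]:
                    best = (a, b, d)
        edges.append(best)
        in_tree.append(best[1])
    return edges

# Hypothetical weight vectors of four loaded neurons from a trained map.
loaded_neurons = {
    "n1": (0.0, 0.0),
    "n2": (1.0, 0.0),
    "n3": (5.0, 0.0),
    "n4": (5.0, 1.0),
}

tree = minimum_spanning_tree(loaded_neurons)
for a, b, d in tree:
    print(a, b, round(d, 3))
```

Drawing these edges on the map, labeled with their lengths, restores part of the distance information that the grid layout alone does not show.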
KSOM is not the only ANN clustering technique that has been used to visualize air pollution events. Owega and coworkers [140] used cluster analysis and an adaptive resonance theory (ART-2a) [141] ANN to classify back trajectories of air masses arriving in Toronto (Canada) into distinct transport patterns. Spencer and coworkers [142] also used an ART-2a ANN to analyze ambient aerosol particles in Riverside (California).
Numerous MLPs have been used alone, in combination with, or in competition with other statistical approaches for estimating various atmospheric pollution events. Some examples are given in Table 4 [143–150].
5.2 Aquatic Contaminations
The worldwide environmental problem of eutrophication in lentic ecosystems is caused by an unbalanced increase in the nutrient inflow due to human activities. Indeed, when the nutrient concentration increases under high-temperature conditions in a lake during the summertime, certain microalgae can overgrow and produce blooms, which can cause water discoloration and mortality in fish and invertebrates, as well as in humans, because of the production of harmful toxins [166]. These deleterious effects could be prevented or at least minimized if algal blooms could be predicted at an early stage. Different ANNs have been used to reach this goal. Recknagel and coworkers [167] used a TLP trained by the backpropagation algorithm for modeling algal blooms in three lakes and a river. The lakes, located in Japan and Finland, had different characteristics, including a variety of nutrient levels, light and temperature conditions, depths, and water retention times. The river was located in Australia. Four different ANNs were computed. Different parameters such as nitrate concentration, water temperature, chlorophyll-a concentration, and dissolved oxygen concentration were used as input neurons. The dominant algal species (in number of cells/mL or mg/L for the Finnish lake) were considered as output neurons. One or two hidden layers having a maximum of 20 neurons per layer were used to distribute the information within the networks. The ANNs were trained for 500,000 cycles with measured input and output data from 6 to 10 years. For the validation of model predictions, data from 2 independent years were used for each ANN model. More realistic and optimized models were proposed by Lee and coworkers [168] for predicting the algal bloom dynamics in two bays in the eutrophic coastal waters of Hong Kong. A TLP was also used as the statistical engine. Biweekly water quality data were tested as input neurons. The chlorophyll-a concentration or the cell concentration of Skeletonema was used as the output neuron in each ANN model. Data collected in different years were used to train (3,000 cycles) and test the two ANN models. Different combinations of parameters were tested as inputs but, in both cases, the best results were obtained by using only the time-lagged chlorophyll-a or log(Skeletonema (cells/L)) as input neurons. This work clearly suggested that the algal concentration in the eutrophic subtropical coastal waters was mainly dependent on the antecedent algal concentrations in the previous 1–2 weeks.
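The use of time-lagged values as input neurons amounts to a simple rearrangement of the monitoring series, sketched below in Python. The chlorophyll-a values are invented, and the choice of one and two sampling steps as lags is an assumption made for illustration; the actual lags and preprocessing used in [168] are not reproduced here.

```python
def make_lagged(series, lags):
    """Return (inputs, targets): each input row holds the values observed
    `lag` sampling steps before the target, for every lag in `lags`."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets

# Invented chlorophyll-a series (one value per sampling date).
chl_a = [2.1, 2.4, 3.0, 4.8, 7.5, 6.9, 5.2, 3.8]

X, y = make_lagged(chl_a, lags=(1, 2))
for row, target in zip(X, y):
    print(row, "->", target)
```

Each row of `X` can then be presented to the input layer of a network whose single output neuron is trained on the corresponding entry of `y`.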
Oh and coworkers [169] used a KSOM for patterning algal communities and then a TLP for identifying important factors causing algal blooms in the Daechung reservoir (Korea). Thirty-nine samples were used for the KSOM analysis. The patterns of the sample communities were investigated on the basis of community abundance data (Cyanophyceae, Chlorophyceae, Bacillariophyceae, and others) in percentages for 1999 and 2003. The best arrangement of the output layer of 24 (6 × 4) neurons was a hexagonal lattice. Interestingly, a hierarchical cluster analysis, based on Ward’s algorithm and using the Euclidean distance, was performed on the KSOM units. Analysis of the results showed that the clustering reflected the phytoplankton communities and the sampling time. A TLP was used to predict the chlorophyll-a concentration and abundance of Cyanophyceae from environmental factors including the total nitrogen, total dissolved nitrogen, total particulate nitrogen, total phosphorus, total dissolved phosphorus, total particulate phosphorus, temperature, DO, pH, conductivity, turbidity, Secchi depth, precipitation, and daily irradiance. Data were collected from 54 samples over 3 years. Gradient descent optimization was used for error reduction. The best models for chlorophyll-a concentration and abundance of Cyanophyceae were 14/3/1 and 14/6/1 TLPs, respectively. The predictive performances of the models were not estimated from an ETS. However, a sensitivity analysis was performed to determine the most influential variables. Results showed that these variables were different for the two TLPs.
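One common form of sensitivity analysis, sketched below in Python, perturbs each input in turn while the others are held at a baseline and records the change in the model output. The perturbation size, the three variables retained, and the stand-in model are all hypothetical; the procedure actually followed in [169] may differ.

```python
def sensitivity(model, baseline, delta=0.1):
    """One-at-a-time sensitivity: change in model output when each input is
    raised and lowered by a fraction `delta` of its baseline value."""
    result = {}
    for name, value in baseline.items():
        up = dict(baseline, **{name: value * (1 + delta)})
        down = dict(baseline, **{name: value * (1 - delta)})
        result[name] = model(up) - model(down)
    return result

def stand_in_model(v):
    """Hypothetical stand-in for a trained network (any callable works)."""
    return (3.0 * v["total_phosphorus"]
            + 0.2 * v["temperature"]
            - 1.0 * v["secchi_depth"])

# Hypothetical baseline conditions.
baseline = {"total_phosphorus": 1.0, "temperature": 20.0, "secchi_depth": 2.0}

effects = sensitivity(stand_in_model, baseline)
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
for name in ranking:
    print(name, round(effects[name], 3))
```

Because the effects depend on the baseline and on the perturbation size, rankings obtained in this way are best read as indicative rather than absolute.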
Lentic and lotic ecosystems are also contaminated by numerous xenobiotics resulting from agricultural and industrial activities. Pesticides are used to control weeds, insects, and other organisms in a wide variety of agricultural and nonagricultural settings, which leads to their release into the environment, including the aquatic compartment. Among the models available for predicting the environmental fate and effects of pesticides, some are based on nonlinear methods, especially ANNs. For example, Kim and coworkers [170] coupled wavelet analysis and a TLP trained by the backpropagation algorithm for modeling the movement behavior of Chironomus samoensis larvae in response to treatments with carbofuran at 0.1 mg/L in seminatural conditions. Various ANN paradigms have also been used for modeling the contamination of groundwater by pesticides and other anthropogenic pollutants [171–176].
Samecka-Cymerman and coworkers [177] used a KSOM to perform a typology of three species of aquatic bryophytes (Fontinalis antipyretica, Platyhypnidium riparioides, Scapania undulata) according to their concentrations of Al, Be, Ca, Cd, Co, Cr, Cu, Fe, K, Mg, Mn, Ni, Pb, and Zn. The sampling sites were divided into three groups depending on the type of rock basement of the stream. Sampling sites in group one were on granites and gneisses (n = 21), those in group two on sandstones (n = 5), and those in group three on limestones and dolomites (n = 26). The output layer of 5 × 5 neurons, visualized by hexagonal cells, showed that the bryophytes were clustered according to their sampling origin. There was no difference between the bryophytes from the three types of rock in terms of concentrations of Be, Fe, K, Co, and Cu. Conversely, bryophytes growing in streams flowing through granites/gneisses contained significantly higher concentrations of Cd and Pb, while bryophytes from streams flowing through sandstones contained significantly higher concentrations of Cr. Bryophytes from group three were characterized by high concentrations of Ca and Mg. These results were confirmed by a PCA.
Last, it is noteworthy that ANNs have been used in the areas of wastewater treatment and analyses [178–180].
5.3 Soil and Sediment Contaminations
Soils and sediments can be contaminated by various pollutants released into the environment from a number of anthropogenic sources. ANNs have proved useful for characterizing and/or quantifying these contaminations. For example, in winter 2002, 24 soil and 12 wild chard (Beta vulgaris) samples were collected by Nadal et al. [181] in Tarragona County (Catalonia, Spain). Soil sampling points were chosen as follows: 15 in the industrial complex (8 in the vicinity of chemical industries and 7 near petroleum refineries), 5 in Tarragona downtown and its residential area, and 4 in presumably unpolluted zones. The numbers of wild chard samples collected from industrial, residential, and unpolluted areas were 6, 3, and 3, respectively. The samples were analyzed for their concentrations of As, Cd, Cr, Hg, Mn, Pb, and V. In chard samples, significant differences between areas were only found for vanadium (V). For the soil samples, the differences in concentrations between the three zones were larger. A KSOM was successfully used to perform their typology according to their differences in metal concentrations. The same type of methodology based on KSOM was applied by Arias and coworkers [182] for evaluating the level of pollution by Cu, Mn, Ni, Cr, Pb, and Zn of the sediments dredged from the dry dock of a former shipyard in the Bilbao estuary (Bizkaia, Spain). KSOM was also compared with different cluster analysis algorithms for classifying 407 samples of various origins contaminated by polychlorinated dibenzodioxins and polychlorinated dibenzofurans [183].
Other ANN paradigms have been used to model soil and sediment contaminations. For example, Kanevski [184] tested the usefulness of general regression ANNs, based on kernel statistical estimators, for predicting the soil contamination by ¹³⁷Cs in the western part of the Briansk region following the Chernobyl accident.
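The kernel estimator underlying a general regression ANN can be sketched in Python as a Gaussian-weighted average of the training targets. The coordinates, activity values, and smoothing parameter `sigma` below are invented for illustration and do not come from [184].

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """General regression neural network output: a Gaussian-kernel weighted
    average of the training targets (Nadaraya-Watson estimator)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2.0 * sigma ** 2))
        for x in train_x
    ]
    # Note: a very small sigma far from all training points can make every
    # weight underflow to zero; this sketch does not guard against that case.
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Invented sampling locations (km) and soil activities (arbitrary units).
coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
activity = [10.0, 30.0, 20.0, 40.0]

center = grnn_predict(coords, activity, (1.0, 1.0), sigma=1.0)
near_first = grnn_predict(coords, activity, (0.0, 0.0), sigma=0.1)
print(center, near_first)
```

The single parameter `sigma` controls the smoothness of the interpolated surface: large values average over many sampling points, whereas small values reproduce the nearest measurement.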
6 Conclusion
Because they are based on a computing model similar to the underlying structure of a mammalian brain, ANNs share the brain’s ability to learn or adapt in response to external inputs. When exposed to a stream of training data, they can uncover previously unknown relationships and learn complex mappings in the data. Under these conditions, ANNs provide interesting alternatives to the well-established linear methods commonly used in ecotoxicology modeling. In this chapter, different ANN models computed for predicting the environmental fate and effects of chemicals have been presented. Our goal was not to catalog all the models in the field but only to show the diversity of the situations in which these nonlinear tools have proved useful. Their correct use requires some practical experience in setting the architecture and parameters as well as in interpreting the modeling results. Users also need to respect some rules dealing with the size of the data sets, the constitution of learning and testing sets, and so on. Despite these limitations, their use in ecotoxicology modeling is likely to continue to grow, especially in combination with other linear and nonlinear statistical methods to create powerful hybrid systems.
References
Bigus JP (1996) Data mining with neural networks. Solving business problems – From application development to decision support. McGraw-Hill, New York
Bengio Y (1996) Neural networks for speech and sequence recognition. International Thomson Computer Press, London
Zupan J, Gasteiger J (1993) Neural networks for chemists. An introduction. VCH, Weinheim
Devillers J (1996) Neural networks in QSAR and drug design. Academic Press, London
Guegan JF, Lek S (2000) Artificial neuronal networks: Application to ecology and evolution. Springer, New York
Christopoulos C, Georgiopoulos M (2001) Applications of neural networks in electromagnetics. Artech House Publishers, London
Lisboa PJG, Taylor MJ (1993) Techniques and applications of neural networks. Ellis Horwood, London
Wade N (1998) The science Times book of the brain. Lyons Press, New York
Müller B, Reinhardt J, Strickland MT (1995) Neural networks. An introduction. Springer, Berlin
Pal SK, Srimani PK (1996) Neurocomputing. Motivation, models, and hybridization. Computer March: 24–28
Jain AK, Mao J, Mohiuddin KM (1996) Artificial neural networks: A tutorial. Computer March: 31–44
Wasserman PD (1989) Neural computing: Theory and practice. Van Nostrand Reinhold, New York
Pao YH (1989) Adaptive pattern recognition and neural networks. Addison-Wesley Publishing Company, Reading
Eberhart RC, Dobbins RW (1990) Neural network PC tools. A practical guide. Academic Press, San Diego
Wasserman PD (1993) Advanced methods in neural computing. Van Nostrand Reinhold, New York
Kohonen T (1995) Self-organizing maps. Springer, Berlin
Fiesler E, Beale R (1997) Handbook of neural computation. IOP Publishing Ltd, Bristol
Devillers J, Doré JC (2002) e-statistics for deriving QSAR models. SAR QSAR Environ Res 13: 409–416
Domine D, Devillers J, Chastrette M, Karcher W (1992) Multivariate structure-property relationships (MSPR) of pesticides. Pestic Sci 35: 73–82
Samiullah Y (1990) Prediction of the environmental fate of chemicals. Elsevier, London
Mackay D, Di Guardo A, Hickie B, Webster E (1997) Environmental modelling: Progress and prospects. SAR QSAR Environ Res 6: 1–17
Hemond HF, Fechner EJ (1994) Chemical fate and transport in the environment. Academic Press, San Diego
Lyman WJ, Reehl WF, Rosenblatt DH (1990) Handbook of chemical property estimation methods. American Chemical Society, Washington, DC
Reinhard M, Drefahl A (1999) Handbook for estimating physicochemical properties of organic compounds. Wiley, New York
Boethling RS, Howard PH, Meylan WM (2004) Finding and estimating property data for environmental assessment. Environ Toxicol Chem 23: 2290–2308
Cronin MTD, Livingstone DJ (2004) Calculation of physicochemical properties. In: Cronin MTD, Livingstone DJ (eds) Predicting chemical toxicity and fate, CRC Press, Boca Raton
Dearden JC (2003) Quantitative structure-property relationships for prediction of boiling point, vapor pressure, and melting point. Environ Toxicol Chem 22: 1696–1709
Lohninger H (1993) Evaluation of neural networks based on radial basis functions and their application to the prediction of boiling points from structural parameters. J Chem Inf Comput Sci 33: 736–744
Cherqaoui D, Villemin D, Mesbah A, Cense JM, Kvasnicka V (1994) Use of a neural network to determine the normal boiling points of acyclic ethers, peroxides, acetals and their sulfur analogues. J Chem Soc Faraday Trans 90: 2015–2019
Balaban AT, Basak SC, Colburn T, Grunwald GD (1994) Correlation between structure and normal boiling points of haloalkanes \(\mathrm{C}_{1}\)–\(\mathrm{C}_{4}\) using neural networks. J Chem Inf Comput Sci 34: 1118–1121
Egolf LM, Wessel MD, Jurs PC (1994) Prediction of boiling points and critical temperatures of industrially important organic compounds from molecular structure. J Chem Inf Comput Sci 34: 947–956
Wessel MD, Jurs PC (1995) Prediction of normal boiling points of hydrocarbons from molecular structure. J Chem Inf Comput Sci 35: 68–76
Kier LB, Hall LH (1999) The electrotopological state; Structure modeling for QSAR and database analysis. In: Devillers J, Balaban AT (eds) Topological indices and related descriptors in QSAR and QSPR, Gordon and Breach Publishers, Amsterdam
Hall LH, Story CT (1996) Boiling point and critical temperature of a heterogeneous data set: QSAR with atom type electrotopological state indices using artificial neural networks. J Chem Inf Comput Sci 36: 1004–1014
Hall LH, Story CT (1997) Boiling point of a set of alkanes, alcohols and chloroalkanes: QSAR with atom type electrotopological state indices using artificial neural networks. SAR QSAR Environ Res 6: 139–161
Espinosa G, Yaffe D, Cohen Y, Arenas A, Giralt F (2000) Neural network based quantitative structural property relations (QSPRs) for predicting boiling points of aliphatic hydrocarbons. J Chem Inf Comput Sci 40: 859–879
Chalk AJ, Beck B, Clark T (2001) A quantum mechanical/neural net model for boiling points with error estimation. J Chem Inf Comput Sci 41: 457–462
Anonymous (1998) QSARs in the assessment of the environmental fate and effects of chemicals. Technical report no. 74, ECETOC, Brussels
Liang C, Gallagher DA (1998) QSPR prediction of vapor pressure from solely theoretically-derived descriptors. J Chem Inf Comput Sci 38: 321–324
McClelland HE, Jurs PC (2000) Quantitative structure-property relationships for the prediction of vapor pressures of organic compounds from molecular structures. J Chem Inf Comput Sci 40: 967–975
Beck B, Breindl A, Clark T (2000) QM/NN QSPR models with error estimation: Vapor pressure and log P. J Chem Inf Comput Sci 40: 1046–1051
Chalk AJ, Beck B, Clark T (2001) A temperature-dependent quantum mechanical/neural net model for vapor pressure. J Chem Inf Comput Sci 41: 1053–1059
Yaffe D, Cohen Y (2001) Neural network based temperature-dependent quantitative structure property relations (QSPRs) for predicting vapor pressure of hydrocarbons. J Chem Inf Comput Sci 41: 463–477
Hall LH, Kier LB (1999) Molecular connectivity Chi indices for database analysis and structure-property modeling. In: Devillers J, Balaban AT (eds) Topological indices and related descriptors in QSAR and QSPR, Gordon and Breach Publishers, Amsterdam
Chow H, Chen H, Ng T, Myrdal P, Yalkowsky SH (1995) Using backpropagation networks for the estimation of aqueous activity coefficients of aromatic organic compounds. J Chem Inf Comput Sci 35: 723–728
Sutter JM, Jurs PC (1996) Prediction of aqueous solubility for a diverse set of heteroatom-containing organic compounds using a quantitative structure-property relationship. J Chem Inf Comput Sci 36: 100–107
Mitchell BE, Jurs PC (1998) Prediction of aqueous solubility of organic compounds from molecular structure. J Chem Inf Comput Sci 38: 489–496
McElroy NR, Jurs PC (2001) Prediction of aqueous solubility of heteroatom-containing organic compounds from molecular structure. J Chem Inf Comput Sci 41: 1237–1247
Huuskonen J, Salo M, Taskinen J (1998) Aqueous solubility prediction of drugs based on molecular topology and neural network modeling. J Chem Inf Comput Sci 38: 450–456
Yaffe D, Cohen Y, Espinosa G, Arenas A, Giralt F (2001) A fuzzy ARTMAP based on quantitative structure-property relationships (QSPRs) for predicting aqueous solubility of organic compounds. J Chem Inf Comput Sci 41: 1177–1207
Huuskonen J (2000) Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. J Chem Inf Comput Sci 40: 773–777
Liu R, So SS (2001) Development of quantitative structure-property relationship models for the early ADME evaluation in drug discovery. 1. Aqueous solubility. J Chem Inf Comput Sci 41: 1633–1639
Hansen NT, Kouskoumvekaki I, Jørgensen FS, Brunak S, Jonsdottir SO (2006) Prediction of pH-dependent aqueous solubility of druglike molecules. J Chem Inf Model 46: 2601–2609
Dearden JC, Schüürmann G (2003) Quantitative structure-property relationships for predicting Henry’s law constant from molecular structure. Environ Toxicol Chem 22: 1755–1770
English NJ, Caroll DG (2001) Prediction of Henry’s law constants by a quantitative structure property relationship and neural networks. J Chem Inf Comput Sci 41: 1150–1161
Yaffe D, Cohen Y, Espinosa G, Arenas A, Giralt F (2003) A fuzzy ARTMAP-based quantitative structure-property relationship (QSPR) for the Henry’s law constant of organic compounds. J Chem Inf Comput Sci 43: 85–112
Berthelot M (1872) Sur les lois qui président au partage d’un corps entre deux dissolvants (Théorie). Ann Chim Phys 26: 408–417
Hansch C, Leo A (1995) Exploring QSAR. Fundamentals and applications in chemistry and biology. American Chemical Society, Washington
Nernst W (1891) Verteilung eines Stoffes zwischen zwei Lösungsmitteln und zwischen Lösungsmittel und Dampfraum. Z Phys Chem 8: 110–139
Meyer H (1899) Zur Theorie der Alkoholnarkose. Arch Exp Pathol Pharmakol 42: 109–118
Overton E (1901) Studien über die Narkose. Gustav Fischer, Jena
Hansch C, Maloney PP, Fujita T, Muir RM (1962) Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 194: 178–180
Hansch C, Fujita T (1964) \(\rho\)-\(\sigma\)-\(\pi\) Analysis. A method for the correlation of biological activity and chemical structure. J Am Chem Soc 86: 1616–1626
Sangster J (1997) Octanol-water partition coefficients: Fundamentals and physical chemistry. Wiley, Chichester
Rekker RF, Mannhold R (1992) Calculation of drug lipophilicity. The hydrophobic fragmental constant approach. VCH, Weinheim
Schaper KJ, Samitier MLR (1997) Calculation of octanol/water partition coefficients (log P) using artificial neural networks and connection matrices. Quant Struct Act Relat 16: 224–230
Bodor N, Ming-Ju H, Harget A (1994) Neural network studies. III: Prediction of partition coefficients. J Molec Struct Theochem 309: 259–266
Yaffe D, Cohen Y, Espinosa G, Arenas A, Giralt F (2002) Fuzzy ARTMAP and back-propagation neural networks based quantitative structure-property relationships (QSPRs) for octanol-water partition coefficient of organic compounds. J Chem Inf Comput Sci 42: 162–183
Huuskonen JJ, Livingstone DJ, Tetko IV (2000) Neural network modeling for estimation of partition coefficient based on atom-type electrotopological states indices. J Chem Inf Comput Sci 40: 947–955
Tetko IV, Tanchuk VY, Villa AEP (2001) Prediction of n-octanol/water partition coefficients from PHYSPROP database using artificial neural networks and E-states indices. J Chem Inf Comput Sci 41: 1407–1421
Tetko IV, Tanchuk VY (2002) Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program. J Chem Inf Comput Sci 42: 1136–1145
Devillers J, Domine D, Guillon C, Bintein S, Karcher W (1997) Prediction of partition coefficients (log Poct) using autocorrelation descriptors. SAR QSAR Environ Res 7: 151–172
Devillers J, Domine D, Guillon C (1998) Autocorrelation modeling of lipophilicity with a back-propagation neural network. Eur J Med Chem 33: 659–664
Devillers J, Domine D, Guillon C, Karcher W (1998) Simulating lipophilicity of organic molecules with a back-propagation neural network. J Pharm Sci 87: 1086–1090
Broto P, Devillers J (1990) Autocorrelation of properties distributed on molecular graphs. In: Karcher W, Devillers J (eds) Practical applications of quantitative structure-activity relationships (QSAR) in environmental chemistry and toxicology, Kluwer, Dordrecht
Devillers J (1999) Autocorrelation descriptors for modeling (eco)toxicological endpoints. In: Devillers J, Balaban AT (eds) Topological indices and related descriptors in QSAR and QSPR, Gordon and Breach Publishers, Amsterdam
Devillers J, Domine D (1997) Comparison of reliability of log P values calculated from a group contribution approach and from the autocorrelation method. SAR QSAR Environ Res 7: 195–232
Devillers J (2000) EVA/PLS versus autocorrelation/neural network estimation of partition coefficients. Perspect Drug Discov Design 19: 117–131
Devillers J (1999) Calculation of octanol/water partition coefficients for pesticides. A comparative study. SAR QSAR Environ Res 10: 249–262
Domine D, Devillers J (1998) A computer tool for simulating lipophilicity of organic molecules. Sci Comput Autom 15: 55–63
Devillers J (1999) AUTOLOGP™: A computer tool for simulating n-octanol-water partition coefficients. Analusis 27: 23–29
Pitter P, Chudoba J (1990) Biodegradability of organic substances in the aquatic environment. CRC Press, Boca Raton
Kuenemann P, Vasseur P, Devillers J (1990) Structure-biodegradability relationships. In: Karcher W, Devillers J (eds) Practical applications of quantitative structure-activity relationships (QSAR) in environmental chemistry and toxicology, Kluwer, Dordrecht
Vasseur P, Kuenemann P, Devillers J (1993) Quantitative structure-biodegradability relationships for predictive purposes. In: Calamari D (ed) Chemical exposure predictions, Lewis Publishers, Boca Raton
Devillers J (1996) On the necessity of multivariate statistical tools for modeling biodegradation. In: Ford MG, Greenwood R, Brooks CT, Franke R (eds) Bioactive compound design: Possibilities for industrial use. BIOS Scientific Publishers, Oxford
Cambon B, Devillers J (1993) New trends in structure-biodegradability relationships. Quant Struct Act Relat 12: 49–56
Devillers J, Karcher W (1990) Correspondence factor analysis as a tool in environmental SAR and QSAR studies. In: Karcher W, Devillers J (eds) Practical applications of quantitative structure-activity relationships (QSAR) in environmental chemistry and toxicology, Kluwer, Dordrecht
Devillers J (1996) Genetic algorithms in molecular modeling. Academic Press, London
Devillers J (1996) Designing molecules with specific properties from intercommunicating hybrid systems. J Chem Inf Comput Sci 36: 1061–1066
Devillers J (1993) Neural modelling of the biodegradability of benzene derivatives. SAR QSAR Environ Res 1: 161–167
Tabak HH, Govind R (1993) Prediction of biodegradation kinetics using a nonlinear group contribution method. Environ Toxicol Chem 12: 251–260
Devillers J, Domine D, Boethling RS (1996) Use of a backpropagation neural network and autocorrelation descriptors for predicting the biodegradation of organic chemicals. In: Devillers J (ed) Neural networks in QSAR and drug design, Academic Press, London
Domine D, Devillers J, Chastrette M, Karcher W (1993) Estimating pesticide field half-lives from a backpropagation neural network. SAR QSAR Environ Res 1: 211–219
Cros AFA (1863) Action de l’alcool amylique sur l’organisme. Thesis, University of Strasbourg, Strasbourg
Free SM, Wilson JW (1964) A mathematical contribution to structure-activity studies. J Med Chem 7: 395–399
Devillers J, Lipnick RL (1990) Practical applications of regression analysis in environmental QSAR studies. In: Karcher W, Devillers J (eds) Practical applications of quantitative structure-activity relationships (QSAR) in environmental chemistry and toxicology, Kluwer, Dordrecht
Devillers J, Bintein S, Domine D, Karcher W (1995) A general QSAR model for predicting the toxicity of organic chemicals to luminescent bacteria (Microtox® test). SAR QSAR Environ Res 4: 29–38
Devillers J, Domine D (1999) A noncongeneric model for predicting toxicity of organic molecules to Vibrio fischeri. SAR QSAR Environ Res 10: 61–70
Xu L, Ball JW, Dixon SL, Jurs PC (1994) Quantitative structure-activity relationships for toxicity of phenols using regression analysis and computational neural networks. Environ Toxicol Chem 13: 841–851
Serra JR, Jurs PC, Kaiser KLE (2001) Linear regression and computational neural network prediction of Tetrahymena acute toxicity of aromatic compounds from molecular structure. Chem Res Toxicol 14: 1535–1545
Burden FR, Winkler DA (2000) A Quantitative structure-activity relationships model for the acute toxicity of substituted benzenes to Tetrahymena pyriformis using Bayesian-regularized neural networks. Chem Res Toxicol 13: 436–440
Winkler D, Burden F (2003) Toxicity modelling using Bayesian neural nets and automatic relevance determination. In: Ford M, Livingstone D, Dearden J, van de Waterbeemd H (eds) EuroQSAR 2002. Designing drugs and crop protectants: Processes, problems and solutions, Blackwell Publishing, Malden
Devillers J (2004) Linear versus nonlinear QSAR modeling of the toxicity of phenol derivatives to Tetrahymena pyriformis. SAR QSAR Environ Res 15: 237–249
Yao XJ, Panaye A, Doucet JP, Zhang RS, Chen HF, Liu MC, Hu ZD, Fan BT (2004) Comparative study of QSAR/QSPR correlations using support vector machines, radial basis function neural networks, and multiple linear regression. J Chem Inf Comput Sci 44: 1257–1266
Ren S (2003) Modeling the toxicity of aromatic compounds to Tetrahymena pyriformis: The response surface methodology with nonlinear methods. J Chem Inf Comput Sci 43: 1679–1687
Panaye A, Fan BT, Doucet JP, Yao XJ, Zhang RS, Liu MC, Hu ZD (2006) Quantitative structure-toxicity relationships (QSTRs): A comparative study of various non linear methods. General regression neural network, radial basis function neural network and support vector machine in predicting toxicity of nitro- and cyano-aromatics to Tetrahymena pyriformis. SAR QSAR Environ Res 17: 75–91
Novic M, Vracko M (2003) Artificial neural networks in molecular-structures-property studies. In: Leardi R (ed) Nature-inspired methods in chemometrics: Genetic algorithms and artificial neural networks, Elsevier, Amsterdam
Niculescu SP, Kaiser KLE, Schultz TW (2000) Modeling the toxicity of chemicals to Tetrahymena pyriformis using molecular fragment descriptors and probabilistic neural networks. Arch Environ Contam Toxicol 39: 289–298
Kaiser KLE, Niculescu SP, Schultz TW (2002) Probabilistic neural network modeling of the toxicity of chemicals to Tetrahymena pyriformis with molecular fragment descriptors. SAR QSAR Environ Res 13: 57–67
Kahn I, Sild S, Maran U (2007) Modeling the toxicity of chemicals to Tetrahymena pyriformis using heuristic multilinear regression and heuristic back-propagation neural networks. J Chem Inf Model 47: 2271–2279
Kaiser KLE, Niculescu SP (2001) Modeling acute toxicity of chemicals to Daphnia magna: A probabilistic neural network approach. Environ Toxicol Chem 20: 420–431
Devillers J (2003) A QSAR model for predicting the acute toxicity of pesticides to gammarids. In: Leardi R (ed) Nature-inspired methods in chemometrics: Genetic algorithms and artificial neural networks, Elsevier, Amsterdam
Devillers J (2000) Prediction of toxicity of organophosphorus insecticides against the midge, Chironomus riparius, via a QSAR neural network model integrating environmental variables. Toxicol Meth 10: 69–79
Kaiser KLE, Niculescu SP, Schüürmann G (1997) Feed forward backpropagation neural networks and their use in predicting the acute toxicity of chemicals to the fathead minnow. Water Qual Res J Canada 32: 637–657
Kaiser KLE, Niculescu SP, McKinnon MB (1997) On simple linear regression, multiple linear regression, and elementary probabilistic neural network with Gaussian kernel’s performance in modeling toxicity values to fathead minnow based on Microtox data, octanol/water partition coefficient, and various structural descriptors for a 419-compound dataset. In: Chen F, Schüürmann G (eds) Proceedings of the 7th international workshop on QSAR in environmental sciences, SETAC Press
Eldred DV, Weikel CL, Jurs PC, Kaiser KLE (1999) Prediction of fathead minnow acute toxicity of organic compounds from molecular structure. Chem Res Toxicol 12: 670–678
Kaiser KLE, Niculescu SP (1999) Using probabilistic neural networks to model the toxicity of chemicals to the fathead minnow (Pimephales promelas): A study based on 865 compounds. Chemosphere 38: 3237–3245
Niculescu SP, Atkinson A, Hammond G, Lewis M (2004) Using fragment chemistry data mining and probabilistic neural networks in screening chemicals for acute toxicity to the fathead minnow. SAR QSAR Environ Res 15: 293–309
Espinosa G, Arenas A, Giralt F (2002) An integrated SOM-fuzzy ARTMAP neural system for the evaluation of toxicity. J Chem Inf Comput Sci 42: 343–359
Mazzatorta P, Benfenati E, Neagu CD, Gini G (2003) Tuning neural and fuzzy-neural networks for toxicity modeling. J Chem Inf Comput Sci 43: 513–518
Vracko M, Bandelj V, Barbieri P, Benfenati E, Chaudry Q, Cronin M, Devillers J, Gallegos A, Gini G, Gramatica P, Helma C, Mazzatorta P, Neagu D, Netzeva T, Pavan M, Patlewicz G, Randic M, Tsakovska I, Worth A (2006) Validation of counter propagation neural network models for predictive toxicology according to the OECD principles: A case study. SAR QSAR Environ Res 17: 265–284
Devillers J (2005) A new strategy for using supervised artificial neural networks in QSAR. SAR QSAR Environ Res 16: 433–442
Devillers J, Flatin J (2000) A general QSAR model for predicting the acute toxicity of pesticides to Oncorhynchus mykiss. SAR QSAR Environ Res 11: 25–43
Devillers J (2001) A general QSAR model for predicting the acute toxicity of pesticides to Lepomis macrochirus. SAR QSAR Environ Res 11: 397–417
Devillers J, Pham-Delègue MH, Decourtye A, Budzinski H, Cluzeau S, Maurin G (2002) Structure-toxicity modeling of pesticides to honey bees. SAR QSAR Environ Res 13: 641–648
Devillers J, Pham-Delègue MH, Decourtye A, Budzinski H, Cluzeau S, Maurin G (2003) Modeling the acute toxicity of pesticides to Apis mellifera. Bull Insect 56: 103–109
Devillers J (2008) Artificial neural network modeling in environmental toxicology. In: Livingstone D (ed) Artificial neural networks: Methods and protocols, Humana Press, New York
WHO (2003) Health aspects of air pollution with particulate matter, ozone and nitrogen dioxide, Bonn, Germany
Ferré-Huguet N, Nadal M, Schuhmacher M, Domingo JL (2006) Environmental impact and human health risks of polychlorinated dibenzo-p-dioxins and dibenzofurans in the vicinity of a new hazardous waste incinerator: A case study. Environ Sci Technol 40: 61–66
Lee J, Kwak IS, Lee E, Kim KA (2007) Classification of breeding bird communities along an urbanization gradient using an unsupervised artificial neural network. Ecol Model 203: 62–71
Devillers J, Doré JC (1989) Heuristic potency of the minimum spanning tree (MST) method in toxicology. Ecotoxicol Environ Safety 17: 227–235
Wienke D, Hopke PK (1994) Visual neural mapping technique for locating fine airborne particles sources. Environ Sci Technol 28: 1015–1022
Wienke D, Gao N, Hopke PK (1994) Multiple site receptor modeling with a minimal spanning tree combined with a neural network. Environ Sci Technol 28: 1023–1030
Wienke D, Hopke PK (1994) Projection of Prim’s minimal spanning tree into a Kohonen neural network for identification of airborne particle sources by their multielement trace patterns. Anal Chim Acta 291: 1–18
Wienke D, Xie Y, Hopke PK (1995) Classification of airborne particles by analytical SEM imaging and a modified Kohonen neural network (3MAP). Anal Chim Acta 310: 1–14
Domine D, Devillers J, Chastrette M, Karcher W (1993) Non-linear mapping for structure-activity and structure-property modelling. J Chemom 7: 227–242
Domine D, Devillers J, Wienke D, Buydens L (1996) Test series selection from nonlinear neural mapping. Quant Struct Act Relat 15: 395–402
Domine D, Wienke D, Devillers J, Buydens L (1996) A new nonlinear neural mapping technique for visual exploration of QSAR data. In: Devillers J (ed) Neural networks in QSAR and drug design, Academic Press, London
Kolehmainen M, Martikainen H, Hiltunen T, Ruuskanen J (2000) Forecasting air quality parameters using hybrid neural network modelling. Environ Monit Ass 65: 277–286
Owega S, Khan BUZ, Evans GJ, Jervis RE, Fila M (2006) Identification of long-range aerosol transport patterns to Toronto via classification of back trajectories by cluster analysis and neural network techniques. Chemom Int Lab Syst 83: 26–33
Wienke D, Domine D, Buydens L, Devillers J (1996) Adaptive resonance theory based neural networks explored for pattern recognition analysis of QSAR data. In: Devillers J (ed) Neural networks in QSAR and drug design, Academic Press, London
Spencer MT, Shields LG, Prather KA (2007) Simultaneous measurement of the effective density and chemical composition of ambient aerosol particles. Environ Sci Technol 41: 1303–1309
Ibarra-Berastegi G, Elias A, Barona A, Saenz J, Ezcurra A, Diaz de Argandonia J (2008) From diagnosis to prognosis for forecasting air pollution using neural networks: Air pollution monitoring in Bilbao. Environ Model Soft 23: 622–637
Kurt A, Gulbagci B, Karaca F, Alagha O (2008) An online air pollution forecasting system using neural networks. Environ Int 34: 592–598
Sahin U, Ucan ON, Bayat C, Oztorun N (2005) Modeling of SO2 distribution in Istanbul using artificial neural networks. Environ Model Ass 10: 135–142
Chelani AB, Singh RN, Devotta S (2005) Nonlinear dynamical characterization and prediction of ambient nitrogen dioxide concentration. Water Air Soil Pollut 166: 121–138
Niska H, Hiltunen T, Karppinen A, Ruuskanen J, Kolehmainen M (2004) Evolving the neural network model for forecasting air pollution time series. Eng Appl Artif Int 17: 159–167
Gardner MW, Dorling SR (2000) Statistical surface ozone models: An improved methodology to account for non-linear behaviour. Atmos Environ 34: 21–34
Balaguer Ballester E, Camps i Valls G, Carrasco-Rodriguez JL, Soria Olivas E, del Valle-Tascon S (2002) Effective 1-day ahead prediction of hourly surface ozone concentrations in eastern Spain using linear models and neural networks. Ecol Model 156: 27–41
Abdul-Wahab SA, Al-Alawi SM (2002) Assessment and prediction of tropospheric ozone concentration levels using artificial neural networks. Environ Model Soft 17: 219–228
Corani G (2005) Air quality prediction in Milan: Feed-forward neural networks, pruned neural networks and lazy learning. Ecol Model 185: 513–529
Sousa SIV, Martins FG, Pereira MC, Alvim-Ferraz MCM (2006) Prediction of ozone concentrations in Oporto city with statistical approaches. Chemosphere 64: 1141–1149
Lu HC, Hsieh JC, Chang TS (2006) Prediction of daily maximum ozone concentrations from meteorological conditions using a two-stage neural network. Atmos Res 81: 124–139
Karatzas KD, Kaltsatos S (2007) Air pollution modeling with the aid of computational intelligence methods in Thessaloniki, Greece. Simulat Model Pract Theor 15: 1310–1319
Salazar-Ruiz E, Ordieres JB, Vergara EP, Capuz-Rizo SF (2008) Development and comparative analysis of tropospheric ozone prediction models using linear and artificial intelligence-based models in Mexicali, Baja California (Mexico) and Calexico, California (US). Environ Model Soft 23: 1056–1069
Slini T, Kaprara A, Karatzas K, Moussiopoulos N (2006) PM10 forecasting for Thessaloniki, Greece. Environ Model Soft 21: 559–565
Papanastasiou DK, Melas D, Kioutsioukis I (2007) Development and assessment of neural network and multiple regression models in order to predict PM10 levels in a medium-sized Mediterranean city. Water Air Soil Pollut 182: 325–334
Perez P, Reyes J (2001) Prediction of particulate air pollution using neural techniques. Neural Comput Applic 10: 165–171
Ordieres JB, Vergara EP, Capuz RS, Salazar RE (2005) Neural network prediction model for fine particulate matter (PM2.5) on the US-Mexico border in El Paso (Texas) and Ciudad Juárez (Chihuahua). Environ Model Soft 20: 547–559
Dimopoulos I, Chronopoulos J, Chronopoulou-Sereli A, Lek S (1999) Neural network models to study relationships between lead concentration in grasses and permanent urban descriptors in Athens city (Greece). Ecol Model 120: 157–165
Nagendra SMS, Khare M (2006) Artificial neural network approach for modelling nitrogen dioxide dispersion from vehicular exhaust emissions. Ecol Model 190: 99–115
Jensen RR, Karki S, Salehfar H (2004) Artificial neural network-based estimation of mercury speciation in combustion flue gases. Fuel Process Technol 85: 451–462
Karakitsios SP, Papaloukas CL, Kassomenos PA, Pilidis GA (2006) Assessment and prediction of benzene concentrations in a street canyon using artificial neural networks and deterministic models. Their response to “what if” scenarios. Ecol Model 193: 253–270
De Vito S, Massera E, Piga M, Martinotto L, Di Francia G (2008) On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sensors Actuators B 129: 750–757
Devillers J, Guillon C, Domine D (1996) A neural structure-odor threshold model for chemicals of environmental and industrial concern. In: Devillers J (ed) Neural networks in QSAR and drug design, Academic Press, London
Guyot M, Doré JC, Devillers J (2004) Typology of secondary cyanobacterial metabolites from minimum spanning tree analysis. SAR QSAR Environ Res 15: 101–114
Recknagel F, French M, Harkonen P, Yabunaka KI (1997) Artificial neural network approach for modelling and prediction of algal blooms. Ecol Model 96: 11–28
Lee JHW, Huang Y, Dickman M, Jayawardena AW (2003) Neural network modelling of coastal algal blooms. Ecol Model 159: 179–201
Oh HM, Ahn CY, Lee JW, Chon TS, Choi KH, Park YS (2007) Community patterning and identification of predominant factors in algal bloom in Daechung reservoir (Korea) using artificial neural networks. Ecol Model 203: 109–118
Kim CK, Kwak IS, Cha EY, Chon TS (2006) Implementation of wavelets and artificial neural networks to detection of toxic response behavior of chironomids (Chironomidae: Diptera) for water quality monitoring. Ecol Model 195: 61–71
Ray C, Klindworth KK (2000) Neural networks for agrichemical vulnerability assessment of rural private wells. J Hydrol Eng 5: 162–171
Mishra A, Ray C, Kolpin DW (2004) Use of qualitative and quantitative information in neural networks for assessing agricultural chemical contamination of domestic wells. J Hydrol Eng 9: 502–511
Sahoo GB, Ray C, Wade HF (2005) Pesticide prediction in ground water in North Carolina domestic wells using artificial neural networks. Ecol Model 183: 29–46
Sahoo GB, Ray C, Mehnert E, Keefer DA (2006) Application of artificial neural networks to assess pesticide contamination in shallow groundwater. Sci Total Environ 367: 234–251
Stenemo F, Lindahl AML, Gärdenäs A, Jarvis N (2007) Meta-modeling of the pesticide fate model MACRO for groundwater exposure assessments using artificial neural networks. J Contam Hydrol 93: 270–283
El Tabach E, Lancelot L, Shahrour I, Najjar Y (2007) Use of artificial neural network simulation metamodelling to assess groundwater contamination in a road project. Math Comput Model 45: 766–776
Samecka-Cymerman A, Stankiewicz A, Kolon K, Kempers AJ (2007) Self-organizing feature map (neural networks) as a tool in classification of the relation between chemical composition of aquatic bryophytes and types of streambeds in the Tatra national park in Poland. Chemosphere 67: 954–960
Gagné F, Blaise C (1997) Predicting the toxicity of complex mixtures using artificial neural networks. Chemosphere 35: 1343–1363
Pigram GM, Macdonald TR (2001) Use of neural network models to predict industrial bioreactor effluent quality. Environ Sci Technol 35: 157–162
Lopez Garcia H, Machon Gonzalez I (2004) Self-organizing map and clustering for wastewater treatment monitoring. Eng Appl Art Int 17: 215–225
Nadal M, Schuhmacher M, Domingo JL (2004) Metal pollution of soils and vegetation in an area with petrochemical industry. Sci Total Environ 321: 59–69
Arias R, Barona A, Ibarra-Berastegi G, Aranguiz I, Elias A (2008) Assessment of metal contamination in dredged sediments using fractionation and self-organizing maps. J Hazard Mat 151: 78–85
Götz R, Lauer R (2003) Analysis of sources of dioxin contamination in sediments and soils using multivariate statistical methods and neural networks. Environ Sci Technol 37: 5559–5565
Kanevski MF (1999) Spatial predictions of soil contamination using general regression neural networks. Int J Syst Res Inf Syst 8: 241–256
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Devillers, J. (2009). Artificial Neural Network Modeling of the Environmental Fate and Ecotoxicity of Chemicals. In: Devillers, J. (eds) Ecotoxicology Modeling. Emerging Topics in Ecotoxicology, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0197-2_1
DOI: https://doi.org/10.1007/978-1-4419-0197-2_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0196-5
Online ISBN: 978-1-4419-0197-2
eBook Packages: Earth and Environmental Science (R0)