1 Introduction

The last decade has witnessed a surge of interest in the use of artificial neural networks (ANNs) for modeling complex tasks in a variety of fields including data mining, speech and image recognition, finance, business, and drug design [1–6]. The raison d’être of these powerful tools is to exploit the imprecision and uncertainty of real-world problems for deriving valuable and robust models.

The concepts of ANNs are directly inspired by neurobiology. The cerebral cortex contains about 100 billion neurons, which are specialized cells that process information. A biological neuron receives signals from other neurons through its dendrites and transmits information generated by its soma along its axon. In the brain, each neuron is connected to 1,000–11,000 other neurons via synapses, in which neurotransmitters inducing different activities are released. The human brain contains approximately \(10^{14}\)–\(10^{15}\) interconnections [7–9]. Consequently, the brain can be viewed as a nonlinear and highly parallel biological device characterized by robustness and fault tolerance. It can learn, handle imprecise, fuzzy, and noisy information, and generalize from past and/or new experiences [10, 11]. ANNs can be defined as weighted directed graphs with connected nodes called neurons that attempt to mimic some of the basic characteristics of the human brain [11]. It is therefore not surprising that these nonlinear statistical tools are now widely used in numerous technical and scientific domains to process complex information. After a brief overview of the characteristics of ANNs, this chapter reviews the main applications of ANNs for modeling the toxicity and ecotoxicity of chemicals as well as their environmental fate. Their advantages and limitations are also stressed.

2 Characteristics of ANNs

A precise definition of learning is difficult to formulate, but the fundamental questions that neurobehaviorists try to answer are: How do we learn? Which is the most efficient process for learning? How much and how fast can we learn? In a neurocomputing context, a learning process can be viewed as a method for updating the architecture as well as the connection weights of an ANN to optimize its efficiency in performing a specific task. The three main learning paradigms are supervised, unsupervised (or self-organized), and reinforcement learning. Each category includes numerous algorithms. Supervised learning is the paradigm most commonly employed to develop classification and prediction applications. The algorithm takes the difference between the observed and calculated output and uses that information to adjust the weights in the network so that, next time, the prediction will be closer to the correct answer (Fig. 1) [1]. Unsupervised learning is used when we want to perform a clustering of the input data. ANNs that are trained using this learning process are called self-organizing neural networks because they receive no direction on what the desired output should be. Indeed, when presented with a series of inputs, the outputs self-organize by initially competing to recognize the input information and then cooperating to adjust their connection weights. Over time, the network evolves so that each output unit is sensitive to, and will recognize, inputs from a specific portion of the input space (Fig. 2) [1]. Reinforcement learning attempts to learn the input–output mapping through trial and error with a view to maximizing a performance index called the reinforcement signal (Fig. 3). Reinforcement learning is particularly suited to solving difficult temporal (time-dependent) problems [1].

Fig. 1 Supervised learning paradigm (adapted from [1])

Fig. 2 Unsupervised learning paradigm (adapted from [1])

Fig. 3 Reinforcement learning paradigm (adapted from [1])
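The supervised weight adjustment described above can be illustrated with a minimal sketch. It assumes a single linear neuron trained with the delta rule, which is far simpler than the networks discussed in this chapter; the function name, learning rate, and toy data are hypothetical and chosen only for illustration.

```python
import random

def train_single_neuron(samples, targets, learning_rate=0.1, epochs=200, seed=0):
    """Train one linear neuron with the delta rule (illustrative assumption)."""
    rng = random.Random(seed)
    n_inputs = len(samples[0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for x, observed in zip(samples, targets):
            # Calculated output of the neuron for this input pattern.
            calculated = bias + sum(w * xi for w, xi in zip(weights, x))
            # Difference between observed and calculated output.
            error = observed - calculated
            # Adjust each weight in proportion to the error and its input.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Toy, noise-free data following y = 2x + 1 (hypothetical).
samples = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]
weights, bias = train_single_neuron(samples, targets)
```

After training, the weight and bias should be close to the slope and intercept of the toy relationship, showing how repeated error-driven corrections bring the prediction closer to the observed answer.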

ANNs are also characterized by their connection topology. The arrangement of neurons and their interconnections can have an important impact on the modeling capabilities of the ANNs. Generally, ANNs are organized into layers of neurons. Data can flow between the neurons in these layers in two different ways. In feedforward networks, no loops occur, whereas recurrent networks contain feedback connections.

The description of the different ANN paradigms is beyond the scope of this chapter, and interested readers are invited to consult the rich body of literature on this topic (see e.g., [12–17]). However, Table 1 summarizes the main characteristics of the different types of ANNs cited in the following sections. It is also beyond the scope of this chapter to provide information on computer tools that can be used for deriving ANN models. However, it is noteworthy that a list of freeware, shareware, and commercial ANN software can be found in Devillers and Doré [18].

Table 1 Taxonomy of the main types of ANNs (adapted from [1, 11])

3 Use of ANNs in Quantitative Structure–Property Relationship (QSPR) Modeling

Knowing the physicochemical properties of xenobiotics is a prerequisite to estimating their bioactivity, bioavailability, transport, and distribution between the different compartments of the biosphere [19–22]. Unfortunately, there are very limited or no experimental physicochemical data available for most of the chemicals likely to contaminate aquatic and terrestrial ecosystems. Consequently, for the many compounds without experimental data, the only alternative to using actual measurements is to approximate values by means of estimation models, which are generically termed quantitative structure–property relationships (QSPRs). The ingredients necessary to derive a QSPR model are given in Fig. 4. Although most QSPR models have been derived from simple contribution methods and regression analysis [23–27], attempts have been made to use ANNs for modeling the intrinsic physicochemical properties of organic molecules as well as their environmental degradation parameters linked to transformation processes. These models are discussed in the following sections.

Fig. 4 Ingredients for deriving a QSPR model

3.1 Boiling Point

The normal boiling point (BP), corresponding to the temperature at which a substance presents a vapor pressure (VP) of 760 mmHg, depends on a number of molecular properties that control the ability of a molecule to escape from the surface of a liquid into the vapor phase. These properties are molecular size, polar and hydrogen bonding forces, and entropic factors such as flexibility and orientation [27]. Different types of ANNs have been used for computing BP models. Thus, a radial basis function (RBF) network was used by Lohninger [28] for predicting the BPs of 185 ethers, peroxides, acetals, and their sulfur analogs. Molecules were described by two sets of three topological and structural descriptors, leading to two models, both including 20 hidden neurons and cross-validated with a leave-25%-out procedure. Both models outperformed regression models obtained under the same conditions. Cherquaoui and coworkers [29] used the same data set of 185 molecules but their ANN was a three-layer perceptron (TLP) trained by the backpropagation algorithm, and the chemical structures were characterized by embedding frequencies. The ANN presented 20 input neurons and a bias, from 3 to 8 hidden neurons and a bias, and an output neuron. The selected 20/5/1 (input/hidden/output) TLP presented good statistics after 4,000 iterations, but it undoubtedly suffered from overtraining; it is also noteworthy that the number of connections within the ANN is high. At that time, other TLP models allowing the estimation of BPs of chlorofluorocarbons with 1, 1–2, or 1–4 carbon atoms (n = 15, 62, and 276, respectively) as well as of halomethanes with up to four different halogen atoms (n = 48) were also proposed [30]. Egolf and coworkers [31] used a TLP trained by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton optimization method for deriving a model allowing for the prediction of the BP of industrial chemicals. 
A database of 298 structurally diverse chemicals was first split into a learning set (LS), a cross-validation set (CVS), and an external testing set (ETS) of 241, 27, and 30 chemicals, respectively. It is noteworthy that the CVS is used to monitor the training of the ANN. Topological, geometrical, and electronic descriptors were generated for characterizing the molecules. The best configuration was an 8/3/1 ANN yielding RMS error values of 11.18, 9.17, and 10.69 K for the LS, CVS, and ETS, respectively. The same methodology was applied to a larger database [32]. The selected 6/5/1 ANN gave RMS error values of 5.7 K for the training and CVSs of 267 and 29 chemicals, respectively. The network model was validated with a 15-member external prediction set. The RMS error of prediction was 7.1 K. This was substantially better than the 8.5 K error obtained from a regression model derived under the same conditions and with the same descriptors. E-state indices [33] for 19 atom types were used [34] as input neurons of a TLP trained by the backpropagation algorithm for predicting the BPs of chemicals from a LS and ETS of 268 and 30 compounds, respectively. The best model included five neurons on the hidden layer. It produced a mean absolute error of 3.86 and 4.57 K for the LS and ETS, respectively. These authors applied the same strategy to a larger database of 372 chemicals but only including alkanes, alcohols, and (poly)chloroalkanes [35]. The usefulness of the TLP and a fuzzy ARTMAP ANN was tested by Espinosa et al. [36] from a limited database including 140 alkanes, 144 alkenes, and 43 alkynes. Although this kind of study allows methods and/or descriptors to be compared, ANNs show their full value when models are derived from large sets of molecules for which the structure cannot easily be related to the property (or activity) under study by classical linear methods. 
Thus, an interesting approach based on the use of a TLP and descriptors calculated using AM1 and PM3 semiempirical quantum-chemical methods was used by Chalk and coworkers [37] for deriving models from a database of 6,629 experimental BPs. The LS and ETS included 6,000 and 629 chemicals, respectively. The best results were obtained with an 18/10/1 ANN architecture. Ten separate ANNs with random starting weights were then trained with different LSs and ETSs, chosen such that each chemical appeared only once in an ETS. The standard deviations (means of the results for 10 nets) for the LS and ETS were 16.54 and 19.02 K with the AM1 approach and 18.33 and 20.27 K with the PM3 approach.
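The remark that the 20/5/1 TLP contains many connections relative to a data set of 185 molecules can be checked with simple arithmetic. The sketch below assumes a fully connected three-layer perceptron with one bias unit feeding the hidden layer and one feeding the output layer, as described for that model; the helper name `count_weights` is hypothetical.

```python
def count_weights(n_inputs, n_hidden, n_outputs, use_bias=True):
    """Number of adjustable weights in a fully connected three-layer perceptron."""
    extra = 1 if use_bias else 0
    input_to_hidden = (n_inputs + extra) * n_hidden
    hidden_to_output = (n_hidden + extra) * n_outputs
    return input_to_hidden + hidden_to_output

# 20/5/1 architecture with bias units, as reported for the 185-molecule data set.
n_weights = count_weights(20, 5, 1)
n_molecules = 185
molecules_per_weight = n_molecules / n_weights
```

Under these assumptions the network has 111 adjustable weights, that is, fewer than two molecules per weight even if all 185 compounds were used for training, which is consistent with the concern about overtraining. By the same count, an 8/3/1 network has only 31 weights.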

3.2 Vapor Pressure

The VP determines the potential of a chemical to volatilize from its condensed or dissolved phases and therefore to exist as a gas [38]. VP strongly depends on the temperature, as expressed in the classical Clausius–Clapeyron equation [24]. As previously seen, the BP of a chemical can be easily derived from its VP. Numerous methods can be used for estimating the VPs of chemicals, and among them, some are based on the use of ANNs. Thus, different regression and ANN models were tested by Liang and Gallagher [39] from a set of 479 chemicals described by various descriptors encoding the structure and physicochemical properties of the molecules. Standard errors of 0.534 and 0.522 (log units, Torr) were obtained for the regression models with seven independent variables and a 7/5/1 ANN. However, the interest of the results is very limited because of a total lack of information on the conditions in which the models were derived. More reliable models were designed by McClelland and Jurs [40]. TLP models were developed to relate the structural characteristics of 420 diverse organic compounds to their VP at 25°C expressed as log (VP in Pascals). The log (VP) values ranged over eight orders of magnitude from −1.34 to 6.68 log units. The database was split into a learning set (LS), a CVS, and an ETS of 290, 65, and 65 chemicals, respectively. An 8/3/1 TLP trained by a BFGS optimization algorithm and only including topological descriptors yielded RMS errors of 0.26, 0.29, and 0.37 for the LS, CVS, and ETS, respectively (log units, Pa). An alternative 10/4/1 TLP containing a larger selection of descriptor types (e.g., quantum mechanical descriptors) resulted in improved performance with RMS errors of 0.19, 0.24, and 0.33 for the LS, CVS, and ETS, respectively [40]. In the same way, Beck and coworkers [41] derived a 10/8/1 TLP trained by the backpropagation algorithm for estimating the log VP at 25°C. 
Descriptors derived from quantum mechanical calculations were used for describing the 551 chemicals constituting the learning and testing sets. The leave-one-out (LOO) cross validation gave a standard deviation of 0.37 log units (Torr) and a maximum absolute error of 1.65. A temperature-dependent model based on a TLP trained by the backpropagation algorithm and descriptors calculated using AM1 semiempirical MO-theory was proposed by Chalk et al. [42]. A data set of 8,542 measurements at various temperatures for a total of 2,349 molecules was divided into a training set of 7,681 measurements and an external validation set of 861 measurements in such a manner that the validation set spans the full range of VPs. The standard deviation of the error (log units, Torr) for the learning, LOO cross-validation, and validation sets obtained with the selected 27/15/1 TLP was equal to 0.32, 0.46, and 0.33, respectively. Yaffe and Cohen [43] also computed a temperature-dependent QSPR model for the VP of aliphatic, aromatic, and polycyclic aromatic hydrocarbons ranging from 4 to 12 carbon atoms, using a TLP trained by the backpropagation algorithm with connectivity indices [44], molecular weight, and temperature as input parameters in the ANN. The database of 274 molecules included 7,613 vapor pressure–temperature data. It was split into a learning set (LS), a CVS, and an ETS of 5,330, 754, and 1,529 data points, respectively. The best model was a 7/29/1 TLP yielding average absolute VP errors of 11.6% (0.051 log units or 34 kPa), 8.2% (0.036 log units or 23.2 kPa), 9.2% (0.039 log units or 26.8 kPa), and 10.7% (0.046 log P units or 31.1 kPa) for the training, test, validation, and overall sets, respectively.
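For reference, the temperature dependence mentioned at the start of this section is commonly written in the integrated form of the Clausius–Clapeyron equation. The form below is a textbook approximation that assumes an ideal vapor phase, a negligible molar volume of the liquid, and an enthalpy of vaporization \(\Delta H_{\mathrm{vap}}\) that is constant between the two temperatures; the symbols are introduced here for illustration and do not come from the studies cited above.

\[
\ln\frac{P_2}{P_1} = -\frac{\Delta H_{\mathrm{vap}}}{R}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)
\]

Here \(P_1\) and \(P_2\) are the VPs at the absolute temperatures \(T_1\) and \(T_2\), and \(R\) is the gas constant. Setting \(P_2 = 760\) mmHg and solving for \(T_2\) gives an estimate of the normal BP, which is the sense in which the BP can be derived from VP data.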

3.3 Water Solubility

Of the various parameters affecting the fate and transport of organic chemicals in ecosystems, water solubility is one of the most important. Highly soluble chemicals are easily and rapidly distributed in the environment. These chemicals tend to have relatively low adsorption coefficients in soils and sediments and also negligible bioconcentration factors in living species. They tend to be more readily biodegradable by microorganisms. The water solubility of chemicals also influences their photolysis, hydrolysis, oxidation, and volatilization [23]. A fairly large number of estimation methods have been proposed for modeling the water solubility of organic chemicals, and some of them are based on the use of ANNs. Thus, a database of water solubility values for 157 substituted aromatic hydrocarbons described from structural fragments was randomly split into a LS, a CVS, and an ETS of 95, 31, and 31 chemicals, respectively [45]. A TLP trained by the backpropagation algorithm was used as the statistical engine. The best model was a 9/11/1 ANN (learning rate 0.35, 276 cycles) yielding a mean square error (MSE) of 0.21 from 40 randomly selected test data sets. For comparison, the MSE obtained with a regression analysis was 0.25. A rather similar approach was used by Sutter and Jurs [46] from solubility data for 140 organic compounds presenting diverse structures, which were divided into a LS, a CVS, and an ETS of 116, 11, and 13 chemicals, respectively. Chemicals were described by means of 144 descriptors encoding topological and/or physicochemical properties. This pool of descriptors was reduced to nine that were used for deriving a regression model from a LS of 127 (116+11) chemicals. An RMS error of 0.321 log units was found. However, four chemicals were detected as outliers, and their removal from the regression model reduced the RMS error to 0.277 log units. A 9/3/1 TLP including the nine descriptors as input neurons was then derived. 
It gave RMS errors of 0.217, 0.282, and 0.222 log units for the LS (n = 112), CVS (n = 11), and ETS (n = 13), respectively. It is noteworthy that another 9/3/1 TLP model was computed by Sutter and Jurs [46] after exclusion of the polychlorinated biphenyls (PCBs). In that case, RMS error values of 0.145, 0.151, and 0.166 log units were obtained for the LS (n = 94), CVS (n = 13), and ETS (n = 13), respectively. Other TLP models for predicting the aqueous solubility of chemicals were proposed by Mitchell and Jurs [47], McElroy and Jurs [48], and Huuskonen et al. [49] from databases of limited sizes. Yaffe and coworkers [50] used a heterogeneous set of 515 organic compounds with their solubility data for comparing the performances of a TLP and fuzzy ARTMAP ANNs. The first ANN model derived from a large diverse set of aqueous solubility data was proposed by Huuskonen [51]. A database of 1,297 chemicals with their aqueous solubility values was split into a LS and an ETS of 884 and 413 chemicals, respectively. Another testing set (ETS+) of 21 chemicals was also considered. All the chemicals were encoded from the 30 following topological indices: 24 atom-type electrotopological state indices [33], path 1 simple and valence connectivity indices [44], flexibility index, the number of H-bond acceptors, and indicators for aromaticity and for aliphatic hydrocarbons. A 30/12/1 TLP trained by the backpropagation algorithm yielded standard deviation values of 0.47, 0.60, and 0.63 for the LS, ETS, and ETS+, respectively. A regression analysis performed under the same conditions gave standard deviation values of 0.67, 0.71, and 0.88 for the LS, ETS, and ETS+, respectively [51]. Liu and So [52] tried to derive an ANN with fewer connections but similar performance by using a LS and an ETS of 1,033 and 258 chemicals, respectively. 
A 7/2/1 TLP with the 1-octanol/water partition coefficient (log P), topological polar surface area (TPSA), molecular weight, and four topological indices as input neurons gave standard deviation values of 0.70 and 0.71 in log units for the LS and ETS, respectively. An interesting hybrid model was developed by Hansen and coworkers [53] for the prediction of the pH-dependent aqueous solubility of chemicals. It used a TLP ANN trained from 4,548 solubility values and a commercial software tool for estimating the acid/base dissociation coefficients.
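The error statistics quoted throughout these sections (RMS error, mean absolute error) can be sketched as follows. This is a generic illustration that assumes a plain division by the number of observations; the cited studies may have used other denominators (for example, corrected for degrees of freedom), and the function names and toy values are hypothetical.

```python
import math

def rms_error(observed, calculated):
    """Root mean square of the residuals (observed minus calculated)."""
    residuals = [o - c for o, c in zip(observed, calculated)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def mean_absolute_error(observed, calculated):
    """Average of the absolute residuals."""
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / len(observed)

# Toy log-unit values (hypothetical), with one poorly predicted chemical.
observed = [1.0, 2.0, 3.0]
calculated = [1.0, 2.0, 5.0]
rms = rms_error(observed, calculated)
mae = mean_absolute_error(observed, calculated)
```

Because the RMS error squares the residuals, a single badly predicted chemical inflates it more than the mean absolute error, which is worth keeping in mind when comparing models reported with different statistics.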

It is important to note that the aqueous solubility estimations obtained from QSPR models have to be used with caution. Thus, for example, QSPR models generally calculate solubility in pure water at 25°C while it is well-known that the varying temperatures found in the environment change the solubility of chemicals. The degree of salinity of the aquatic ecosystems also influences the solubility of the chemicals in these media.

3.4 Henry’s Law Constant

The Henry’s law constant (Hc) of a chemical is defined as the ratio of its concentration in air to its concentration in water when these two phases are in contact and equilibrium distribution of the chemical is achieved [25]. Hc is of prime importance for assessing the environmental distribution of chemicals. The different methods for calculating this parameter have been reviewed by Dearden and Schüürmann [54]. Among them, two studies deal with the use of ANNs for modeling the Hc of chemicals at 25°C. A database of 357 organic chemicals with log H values ranging from −7.08 to 2.32 was used by English and Carroll [55] for deriving their ANN models. Chemicals were described by 29 descriptors including topological indices, physicochemical properties, and atomic and group contributions. The best results were obtained in 3,000 cycles with a 10/3/1 TLP. The standard errors for the LS (n = 261), CVS (n = 42), and ETS (n = 54) were equal to 0.202, 0.157, and 0.237 log units, respectively. Comparatively, the standard errors obtained with a regression analysis, performed under the same conditions, were 0.262 and 0.285 log units for the LS (n = 303) and ETS (n = 54), respectively.
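The definition given at the start of this section can be written compactly as follows. The notation is introduced here for illustration only, and it assumes that both concentrations are expressed in the same units so that the ratio is dimensionless; the cited studies may report Hc in other unit systems.

\[
H_c = \frac{C_{\mathrm{air}}}{C_{\mathrm{water}}}, \qquad \log H = \log_{10} H_c
\]

Under this assumption, a negative log H indicates a chemical that stays preferentially in the water phase at equilibrium, and a positive value indicates one that partitions preferentially into air.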

Experimental Hc values at 25°C for a diverse set of 495 chemicals were collected by Yaffe et al. [56]. The log H values ranged from −6.72 to 2.87. Six physicochemical descriptors (heat of formation, dipole moments, ionization potential, average polarizability) and the second-order valence molecular connectivity index were used as input parameters for a fuzzy ARTMAP ANN and a TLP ANN trained by the backpropagation algorithm. The average absolute error values obtained with the fuzzy ARTMAP ANN were 0.01 and 0.13 for the LS (n = 421) and ETS (n = 74), respectively. The selected 7/17/1 TLP yielded average absolute error values of 0.29, 0.28, and 0.27 for the LS (n = 331), validation set (n = 421), and ETS (n = 74), respectively.

3.5 Octanol/Water Partition Coefficient

In 1872, Berthelot [57] undertook the study of partitioning as a purely physicochemical phenomenon. He was the first to collect the evidence proving that the ratio of the concentrations of small solutes when distributed between water and an immiscible solvent (e.g., ether) remained constant even when the solvent ratios varied widely [58]. In 1891, Nernst [59] put this type of equilibrium on a firmer thermodynamic basis. About a decade later, Meyer [60] and Overton [61], who showed that the narcotic action of simple chemicals was reflected rather closely by their oil–water partition coefficients, initiated the use of this physicochemical property for deriving structure–activity relationships. In the first part of the twentieth century, many different organic solvent/water systems were tested to derive structure–activity relationships. However, in 1962–1964, 1-octanol was adopted as the solvent of choice after the pioneering works of Hansch and coworkers in quantitative structure–activity relationships (QSARs) [62, 63] demonstrating that the 1-octanol/water partition coefficient (Kow) could provide a rationalization for the interaction of organic chemicals with living organisms or for biological processes occurring in organisms [64]. Kow is simply defined as the ratio of a chemical’s concentration in the octanol phase to its concentration in the aqueous phase of a two-phase octanol/water system. Values of Kow are thus unitless and are expressed in a logarithmic form (i.e., log Kow or log P) when used in pharmaceutical and environmental modeling. There are numerous methods available for the experimental measurement of log P as well as for its estimation from contribution methods or from linear and nonlinear QSPRs [23, 24, 58, 64, 65]. Different ANN models for log P have been derived from a limited number of chemicals (see e.g., [66–68]). 
A database of 1,870 log P values for structurally diverse chemicals was used by Huuskonen and coworkers [69] for deriving a log P model based on atom-type electrotopological state indices [33] and a TLP. It was split into a LS and an ETS of 1,754 and 116 molecules, respectively. The best configuration included the molecular weight and 38 electrotopological state indices as input neurons, five hidden neurons, and bias neurons. Averaged results of 200 ANN simulations were used to calculate the final outputs. With this strategy, RMS (LOO) values of 0.46 and 0.41 were obtained for the LS and ETS, respectively. This model was further refined from an extended LS and is now called ALOGPS [70, 71]. A log P model was designed by Devillers and coworkers [72–74] from a TLP trained by the backpropagation algorithm using 7,200 log P values for the learning process. Experimental log P values were retrieved from original publications or unpublished results. The log P values of the LS ranged between −3.7 and 9.95 with a mean of 2.13 and a standard deviation of 1.65. Molecules were described by means of autocorrelation descriptors [75, 76] encoding lipophilicity (H) defined according to Rekker and Mannhold [65], molecular refractivity (MR), and H-bonding donor (HBD) and H-bonding acceptor (HBA) abilities. Prior to calculations, data were scaled with a classical min/max equation. The optimal architecture and set of parameters for the neural network model were determined by means of a trial and error procedure. The different training exercises were monitored with a validation set of 200 molecules presenting a high structural diversity but not deviating too much from the chemical structures included in the training set. This procedure showed that a neural network model with 35 input neurons (i.e., H0 to H14, MR0 to MR14, HBA0 to HBA3, and HBD0) was necessary to correctly describe the molecules and model the 7,200 experimental log P values. The hidden layer consisted of 32 neurons. 
It was found that a learning rate of 0.5 and a momentum term of 0.9 always gave good neural network generalization within ca. 5,500 cycles. A composite network constituted of four configurations was selected as the final model \((\mathrm{RMS} = 0.37, r = 0.97)\) because it gave the best simulation results on an ETS of 519 chemicals \((\mathrm{RMS} = 0.39, r = 0.98)\). It has been shown that this model competed favorably with other log P models [77, 78] and was particularly suited for estimating the log P values of pesticides [79]. It is noteworthy that a commercial version of this model called AUTOLOGP™ is available [80, 81].
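The "classical min/max equation" used to scale the data before training can be sketched as below. The target range of 0 to 1 is an assumption, since the range actually used for this log P model is not stated here, and the function name is hypothetical.

```python
def min_max_scale(values, lower=0.0, upper=1.0):
    """Linearly rescale values so that the minimum maps to lower and the maximum to upper."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; map it to the lower bound.
        return [lower for _ in values]
    span = v_max - v_min
    return [lower + (upper - lower) * (v - v_min) / span for v in values]

# Minimum, mean, and maximum log P of the learning set quoted in the text.
scaled = min_max_scale([-3.7, 2.13, 9.95])
```

With these three values, the extremes map to 0 and 1 and the mean log P of 2.13 falls at about 0.43 of the range. In practice the minimum and maximum would be taken from the learning set and reused unchanged for any new chemical.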

3.6 Degradation Parameters

Biodegradation is an important mechanism for eliminating xenobiotics by biotransforming them into simple organic and inorganic products. Two types of biodegradation can be distinguished. Primary biodegradation denotes a simple transformation not leading to a complete mineralization. The biodegradation products are specifically measured by chromatographic methods, and the results are expressed by means of kinetic parameters such as the biodegradation rate constant (k) and half-life (\(T_{1/2}\)). Ultimate (or total) biodegradation totally converts chemicals into simple molecules such as CO2 and H2O. Biodegradation tests are time consuming and expensive, and their results are difficult to interpret because they depend on numerous parameters linked to the experimental conditions such as the nature and concentration of the inoculum, cultivation and adaptation of the microbial culture, and concentration of the test substance [82–84]. Because ANNs are particularly suited for modeling noisy data, they have been successfully used to model biodegradation processes [85]. Thus, for example, 47 molecules presenting a high degree of heterogeneity were described in a qualitative way for their biodegradability (i.e., 0 = weak, 1 = high) from a survey made by 22 experts in microbial degradation [86]. They were encoded from 11 Boolean descriptors representing structural features associated with persistent or degradable chemicals. These descriptors are listed in Table 2. A TLP trained by the backpropagation algorithm was used as the statistical engine to find a relationship between the structure of the molecules and their biodegradation potential. The learning phase yielded 100% of good classifications (i.e., 47/47) with an 11/4/1 ANN in 500 cycles. The predictive power of this model was estimated from two ETSs. With the former ETS, 78% of good classifications (i.e., 18/23) were obtained while with the latter, 94% (i.e., 16/17) of the chemicals were correctly classified. 
The use of Boolean descriptors as input neurons in a TLP, especially for modeling a complex property, can induce problems of overfitting. To avoid this drawback without losing the benefit of fragment descriptors, the usefulness of correspondence factor analysis (CFA) [87] for reducing the dimensionality of a data matrix was tested. Thus, a CFA was used to scale the 47 × 11 Boolean matrix and the CFA factors were directly introduced as inputs in the ANN. The same results were obtained, also in 500 cycles, with only the first seven factors (87.9% of the total inertia). It is noteworthy that an intercommunicating hybrid system including this ANN model and a genetic algorithm [88] was then constructed for designing molecules with specific biodegradability characteristics [89].
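The two kinetic parameters mentioned above for primary biodegradation, the rate constant \(k\) and the half-life \(T_{1/2}\), are interchangeable when the degradation follows first-order kinetics. That kinetic order is an assumption made here for illustration; it is not stated for the studies cited in this section.

\[
C(t) = C_0\, e^{-kt} \quad\Longrightarrow\quad T_{1/2} = \frac{\ln 2}{k}
\]

Here \(C_0\) is the initial concentration of the test substance and \(C(t)\) its concentration at time \(t\). For other kinetic orders the half-life also depends on the initial concentration, so the simple relation above no longer holds.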

Table 2 Boolean descriptors used as input neurons in a TLP designed for predicting the biodegradability of chemicals

TLPs with structural descriptors [90, 91] or autocorrelation descriptors [92] were used for modeling the biodegradability of other sets of aliphatic and aromatic chemicals. The field half-lives of 110 pesticides were modeled using a TLP trained by the backpropagation algorithm [93]. Because periodicities in agricultural calendars are measured in days, weeks, and months (i.e., seasons), the field half-lives (\(T_{1/2}\)) of pesticides were divided into the three following classes: class 1 (encoded 100 in the ANN output) contained pesticides with \(T_{1/2} \le 10\) days, class 2 (encoded 010) included pesticides with 10 days \(< T_{1/2} \le 30\) days, and class 3 (encoded 001) included pesticides with 30 days \(< T_{1/2} \le 90\) days. Molecules were described by means of the frequency of 17 structural fragments. Different scaling transformations were tested but the best results were obtained with a CFA, which also allowed a reduction of the dimensionality of the descriptor matrix. The optimal results were obtained by using the first 12 factors (95.8% of the total inertia) as input neurons and seven neurons for the hidden layer. With this configuration, 95.5% of correct classifications were obtained with the LS. The performances of the selected ANN model were tested from an ETS of 13 pesticides representing the three classes of field half-lives. The testing phase with CFA gave 84.6% of correct predictions. A discriminant factor analysis with three classes was performed for comparison purposes. In that case, 60% and 53.8% of good classifications were obtained for the LS and ETS, respectively [93].
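The three-class output coding described above (100, 010, 001) can be sketched as follows. The class boundaries are those given in the text; the function names, the handling of out-of-range values, and the winner-take-all decoding of the three output activations are assumptions made for illustration.

```python
def encode_half_life_class(t_half_days):
    """Return the three-neuron target code for a field half-life given in days."""
    if t_half_days <= 0 or t_half_days > 90:
        # The classes in the text only cover half-lives up to 90 days.
        raise ValueError("half-life outside the range covered by the three classes")
    if t_half_days <= 10:
        return (1, 0, 0)   # class 1
    if t_half_days <= 30:
        return (0, 1, 0)   # class 2
    return (0, 0, 1)       # class 3

def decode_output(activations):
    """Assign the class (1, 2, or 3) of the most activated output neuron (assumed rule)."""
    winner = max(range(len(activations)), key=lambda i: activations[i])
    return winner + 1

code_short = encode_half_life_class(7)
code_medium = encode_half_life_class(30)
code_long = encode_half_life_class(45)
predicted_class = decode_output([0.12, 0.81, 0.20])
```

Using one output neuron per class, rather than a single numeric output, avoids imposing an artificial distance between the classes on the network.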

4 Use of ANNs in Quantitative Structure–Activity Relationship (QSAR) Modeling

Knowledge about systematic relationships between the structure of chemicals and their biological activity dates back to the very infancy of modern pharmacology and toxicology. Thus, for example, Cros [94] stressed, on the last page of his thesis published in 1863, an empirical relationship between the number of carbon and hydrogen atoms in a series of alcohols and their solubility in water and toxicity. Until about the middle of the twentieth century, most of these structure–activity relationships were only qualitative. The dramatic change resulted from the systematic use, in the early 1960s, of linear regression analysis for correlating biological activities of congeneric series of molecules with their physicochemical properties or some of their structural features encoded by means of Boolean descriptors (i.e., 0/1). These contributions started the development of two QSAR methodologies later termed Hansch analysis [62, 63] and Free-Wilson analysis [95], respectively.

Nowadays, regression analysis remains the most widely used statistical tool for deriving QSARs, even if most of the basic statistical assumptions for its correct use are often not satisfied with numerous data sets [96]. In addition, the choice of regression analysis can also be problematic because it postulates that only linear relationships exist between the variables involved in the modeling process, which is generally not true. For about a decade, ANNs have been the focus of much attention in QSAR to find complex relationships between the structure of molecules and their toxicity. These models have been derived on various organisms such as the marine luminescent bacterium Vibrio fischeri (formerly known as Photobacterium phosphoreum) [97, 98], the freshwater protozoan Tetrahymena pyriformis [99–110], the waterflea Daphnia magna [111], the freshwater amphipod Gammarus fasciatus [112], the midge Chironomus riparius [113], the fathead minnow Pimephales promelas [114–122], the rainbow trout Oncorhynchus mykiss [123], the bluegill Lepomis macrochirus [124], and the honey bee Apis mellifera [125, 126]. All these models were recently analyzed [127]. Consequently, only the main characteristics of some of them are presented in Table 3.

Table 3 Selected ANN QSAR models derived from noncongeneric data sets

It is interesting to note that, owing to their high flexibility and their ability to find complex relationships between variables, ANNs can be used to derive QSARs from sets of variables encoding, as usual, the structure and physicochemical properties of the molecules but also the experimental conditions in which the different tests are performed, such as the time of exposure [98] or the temperature, pH, hardness of the medium, and size of the organisms [112, 123, 124]. In the same way, owing to their purely nonlinear nature, ANNs can be used in synergy with another statistical tool, especially regression analysis. Devillers [122] showed that this kind of modeling approach was particularly interesting in the common situation in which the toxicity of molecules mainly depends on their log P. In that case, a classical regression equation with log P is derived in a first step. The residuals obtained with this simple linear equation are then modeled with a TLP including different molecular descriptors as input neurons. Finally, the results produced by the linear and nonlinear QSAR models are both considered for calculating the toxicity values, which are then compared with the initial toxicity data.
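The two-step scheme described above can be sketched as follows. This is only a minimal illustration on synthetic data, not the model published in [122]: the descriptor `d`, the toxicity values, the network size, the learning rate, and the helper names `fit_line` and `train_tlp` are all assumptions made for the example.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def train_tlp(X, t, n_hidden=4, epochs=2000, lr=0.1, seed=0):
    """Train a one-hidden-layer perceptron (tanh hidden units, linear output)
    by full-batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    n, n_in = len(X), len(X[0])
    # One row per hidden neuron; the last entry of each row is the bias.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # Output weights start at zero, so the untrained network predicts 0 and
    # the hybrid model initially coincides with the linear equation.
    v = [0.0] * (n_hidden + 1)

    def forward(xi):
        h = [math.tanh(sum(w * x for w, x in zip(Wj, xi)) + Wj[-1]) for Wj in W]
        return h, sum(vj * hj for vj, hj in zip(v, h)) + v[-1]

    for _ in range(epochs):
        gW = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        gv = [0.0] * (n_hidden + 1)
        for xi, ti in zip(X, t):
            h, out = forward(xi)
            err = out - ti
            for j in range(n_hidden):
                gv[j] += err * h[j]
                back = err * v[j] * (1.0 - h[j] ** 2)  # error backpropagated to neuron j
                for k in range(n_in):
                    gW[j][k] += back * xi[k]
                gW[j][-1] += back
            gv[-1] += err
        for j in range(n_hidden):
            for k in range(n_in + 1):
                W[j][k] -= lr * gW[j][k] / n
        for j in range(n_hidden + 1):
            v[j] -= lr * gv[j] / n

    return lambda xi: forward(xi)[1]

# Synthetic data (assumption): toxicity is linear in log P plus a nonlinear
# contribution from a second, hypothetical descriptor d scaled to [-1, 1].
log_p = [0.25 * i for i in range(20)]
d = [((7 * i) % 20) / 10.0 - 1.0 for i in range(20)]
toxicity = [0.9 * lp + 0.5 + 0.6 * di ** 2 for lp, di in zip(log_p, d)]

# Step 1: classical regression equation with log P.
a, b = fit_line(log_p, toxicity)
linear_pred = [a * lp + b for lp in log_p]
residuals = [y - p for y, p in zip(toxicity, linear_pred)]

# Step 2: model the residuals with the network, then add both contributions.
tlp = train_tlp([[di] for di in d], residuals)
hybrid_pred = [p + tlp([di]) for p, di in zip(linear_pred, d)]

sse_linear = sum(r ** 2 for r in residuals)
sse_hybrid = sum((y - p) ** 2 for y, p in zip(toxicity, hybrid_pred))
print(f"SSE linear: {sse_linear:.4f}  SSE hybrid: {sse_hybrid:.4f}")
```

Keeping the linear term outside the network leaves the log P coefficient directly interpretable, while the network only has to explain what the regression leaves unexplained. In practice the comparison should be made on an external testing set rather than on the training data used here.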

5 Use of ANNs for Modeling Environmental Contaminations

5.1 Air Pollution

There is a large body of evidence suggesting that exposure to air pollution, even at the levels commonly encountered nowadays in industrial countries, leads to adverse health effects. In particular, exposure to pollutants such as particulate matter and ozone has been found to be associated with increases in hospital admissions for cardiovascular and respiratory diseases and with the incidence of cancers [128]. Air pollution not only affects the quality of the air we breathe, but it also directly and indirectly impacts the biotopes and the biocenoses constituting the aquatic and terrestrial ecosystems. For the evaluation of air pollution events in a particular geographical area, it is crucial to have a powerful mapping technique that makes it possible to establish typologies, compare sampling sites, and so on. The Kohonen self-organizing map (KSOM) [16] is particularly suited to these tasks. Thus, for example, Ferré-Huguet and coworkers [129] used a KSOM to assess the environmental impact and human health risks of polychlorinated dibenzo-p-dioxins and dibenzofurans in the vicinity of a new hazardous waste incinerator in Spain 4 years after regular operation of the facility. More specifically, the KSOM, which was a rectangular grid of 48 (8 × 6) neurons, was applied to soil and herbage samples to establish pattern similarities among the samples as well as to identify hot spots near the plant. Lee and coworkers [130] used a KSOM of 150 (15 × 10) output neurons to examine the influence of urbanization on the assembly patterns of 52 breeding birds in 367 sites.
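To make the mapping step concrete, the sketch below trains a very small KSOM and assigns each sample to its best matching neuron. It is an illustrative assumption, not the configuration used in [129] or [130]: the 3 × 3 grid, the linear decay schedules, the diagonal initialization, and the two-variable site data are all hypothetical, and the function names are invented for the example.

```python
import math

def best_matching_unit(weights, sample):
    """Grid position of the neuron whose weight vector is closest to the sample."""
    return min(weights, key=lambda pos: math.dist(weights[pos], sample))

def train_ksom(samples, rows, cols, epochs=100):
    """Train a rectangular Kohonen map with a Gaussian neighbourhood.
    Returns a dict mapping each grid position (r, c) to its weight vector.
    Assumes rows + cols > 2."""
    dim = len(samples[0])
    lo = [min(s[k] for s in samples) for k in range(dim)]
    hi = [max(s[k] for s in samples) for k in range(dim)]
    # Deterministic initialisation: weights spread linearly along the grid diagonal.
    span = rows + cols - 2
    weights = {
        (r, c): [lo[k] + (hi[k] - lo[k]) * (r + c) / span for k in range(dim)]
        for r in range(rows) for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac) + 0.05 * frac    # learning rate decays linearly
        sigma = 1.5 * (1.0 - frac) + 0.3 * frac  # neighbourhood radius shrinks
        for s in samples:
            br, bc = best_matching_unit(weights, s)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    # Move the neuron toward the sample, more strongly near the winner.
                    w[k] += lr * h * (s[k] - w[k])
    return weights

# Hypothetical concentrations of two pollutants at six sampling sites.
sites = {
    "near_plant_1": (9.5, 9.8), "near_plant_2": (10.0, 10.0), "near_plant_3": (9.7, 9.4),
    "background_1": (0.0, 0.0), "background_2": (0.5, 0.2), "background_3": (0.2, 0.6),
}
som = train_ksom(list(sites.values()), rows=3, cols=3)
loaded = {name: best_matching_unit(som, x) for name, x in sites.items()}
for name, pos in loaded.items():
    print(name, "->", pos)
```

Sites that fall on the same or neighbouring neurons have similar contamination profiles, which is how hot spots can be read off the trained map. Real applications would first scale the variables so that no single pollutant dominates the distance.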

Undoubtedly, KSOM offers an interesting tool for compressing p multivariate samples defined in an n-dimensional space into v clusters (loaded neurons). This data reduction to a few clusters provides an optimal display of the data structure. However, the problem with KSOM is that information about the correct distances between the neurons disappears during the projection onto the 1D, 2D, or 3D array of nodes. To overcome this problem, a minimum spanning tree (MST) [131] can be calculated between the loaded neurons of a trained KSOM to visualize the shortest distances between them. The hybridization of the KSOM and MST algorithms constitutes the basis of the 3MAP algorithm designed and used by Wienke for locating fine airborne particle sources [132, 133–135]. It is noteworthy that, because the MST still does not represent the correct distances between all the loaded neurons, a nonlinear mapping (NLM) [136] performed on these loaded neurons can be used to visualize all the distances separating them. The hybridization of the KSOM, MST, and NLM algorithms constitutes the basis of the N2M algorithm [137, 138] (Fig. 5). A rather similar hybridization approach in combination with a multilayer perceptron (MLP) was used by Kolehmainen and coworkers [139] to forecast urban air quality. Hourly airborne pollutant and meteorological averages collected during the years 1995–1997 were analyzed to identify air quality episodes having typical and most probable combinations of air pollutants and meteorological variables. This modeling was performed with KSOM, NLM, and fuzzy distance metrics. Several overlapping MLPs were then applied to the clustered data, each representing a pollution episode.
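The MST step described above can be illustrated with Prim's algorithm applied to the weight vectors of the loaded neurons. This is a sketch under stated assumptions and not the 3MAP or N2M implementation: the four weight vectors and their grid positions are hypothetical, Euclidean distance is assumed, and `minimum_spanning_tree` is a name chosen for the example.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph of `points`.
    `points` maps a label to a coordinate tuple; the result is a list of
    (label_a, label_b, distance) edges."""
    labels = list(points)
    in_tree = {labels[0]}
    edges = []
    while len(in_tree) < len(labels):
        # Shortest edge linking a node already in the tree to a node outside it.
        best = min(
            ((a, b, math.dist(points[a], points[b]))
             for a in in_tree for b in labels if b not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Hypothetical weight vectors of four loaded neurons, keyed by grid position.
loaded_neurons = {
    (0, 0): (0.0, 0.0),
    (0, 1): (1.0, 0.0),
    (1, 1): (1.0, 1.0),
    (3, 2): (5.0, 1.0),
}
mst_edges = minimum_spanning_tree(loaded_neurons)
total_length = sum(dist for _, _, dist in mst_edges)
for a, b, dist in mst_edges:
    print(a, "-", b, f"{dist:.2f}")
```

Note that the distances are computed in the original variable space, not on the grid: neurons (1, 1) and (3, 2) are close on the map but their weight vectors are far apart, which is exactly the information the MST restores.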

Fig. 5 N2M algorithm flow diagram (adapted from [140])

KSOM is not the only ANN clustering technique that has been used to visualize air pollution events. Thus, Owega and coworkers [140] used cluster analysis and an adaptive resonance theory (ART-2a) [141] ANN to classify back trajectories of air masses arriving in Toronto (Canada) into distinct transport patterns. Spencer and coworkers [142] also used an ART-2a ANN to analyze ambient aerosol particles in Riverside (California).

Numerous MLPs have been used alone, in combination, or in competition with other statistical approaches for estimating various atmospheric pollution events. Some examples are given in Table 4 [143–150].

Table 4 Examples of MLP models designed for estimating atmospheric pollution events

5.2 Aquatic Contaminations

The worldwide environmental problem of eutrophication in lentic ecosystems is caused by an unbalanced increase in nutrient inflow due to human activities. Indeed, when the nutrient concentration increases under high-temperature conditions in a lake during the summertime, certain microalgae can overgrow and produce blooms, which can cause water discoloration and, because of the production of harmful toxins, mortality in fish and invertebrates as well as in humans [166]. It is obvious that these deleterious effects could be prevented or at least minimized if the algal blooms could be predicted at an early stage. Different ANNs have been used to reach this goal. Thus, Recknagel and coworkers [167] used a TLP trained by the backpropagation algorithm for modeling algal blooms in three lakes and a river. The lakes, located in Japan and Finland, had different characteristics, including a variety of nutrient levels, light and temperature conditions, depths, and water retention times. The river was located in Australia. Four different ANNs were computed. Different parameters such as the concentration of nitrate, the water temperature, the concentration of chlorophyll-a, and the concentration of dissolved oxygen were used as input neurons. The dominating algal species (in number of cells/mL or mg/L for the Finnish lake) were considered as output neurons. One or two hidden layers having a maximum of 20 neurons per layer were used to distribute the information within the networks. The ANNs were trained for 500,000 cycles with measured input and output data covering 6 to 10 years. For the validation of model predictions, data from 2 independent years were used for each ANN model. More realistic and optimized models were proposed by Lee and coworkers [168] for predicting the algal bloom dynamics in two bays in the eutrophic coastal waters of Hong Kong. A TLP was also used as the statistical engine. Biweekly water quality data were tested as input neurons. The concentration of chlorophyll-a or the cell concentration of Skeletonema was used as the output neuron in each ANN model. Data collected in different years were used to train (3,000 cycles) and test the two ANN models. Different combinations of parameters were tested as inputs but, in both cases, the best results were obtained by using only the time-lagged chlorophyll-a or log (Skeletonema (cells/l)) as input neurons. This work clearly suggested that the algal concentration in the eutrophic subtropical coastal waters was mainly dependent on the antecedent algal concentrations in the previous 1–2 weeks.
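Using time-lagged values of the output variable as inputs amounts to turning a single time series into input/target pairs. A minimal sketch of that preparation step is given below; the chlorophyll-a values, the choice of lags, and the helper name `make_lagged` are assumptions for illustration and are not taken from [168].

```python
def make_lagged(series, lags):
    """Build (inputs, targets) from a time series. Each input row holds the
    values observed `lag` sampling steps before the target, for every lag in
    `lags`; the first max(lags) observations have no complete history and
    are therefore skipped."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets

# Hypothetical chlorophyll-a concentrations at successive sampling dates.
chl_a = [2.1, 2.4, 3.0, 4.2, 6.5, 5.8, 4.1, 3.3]

# Each target is predicted from the two preceding observations.
X, y = make_lagged(chl_a, lags=(1, 2))
for row, target in zip(X, y):
    print(row, "->", target)
```

The rows of `X` would then be presented to the input neurons and `y` to the output neuron. The training and testing sets should be split by date, as in the study above, so that no test target appears among the training inputs.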

Oh and coworkers [169] used a KSOM for patterning algal communities and then a TLP for identifying important factors causing algal blooms in Daechung reservoir (Korea). Thirty-nine samples were used for the KSOM analysis. The patterns of the sample communities were investigated on the basis of community abundance data (Cyanophyceae, Chlorophyceae, Bacillariophyceae, and others) in percentages for 1999 and 2003. The best arrangement of the output layer of 24 (6 × 4) neurons was a hexagonal lattice. Interestingly, a hierarchical cluster analysis, based on Ward's algorithm and using the Euclidean distance, was performed on the KSOM units. Analysis of the results showed that the clustering was based on the phytoplankton communities and sampling time. A TLP was used to predict the chlorophyll-a concentration and the abundance of Cyanophyceae from environmental factors including the total nitrogen, total dissolved nitrogen, total particulate nitrogen, total phosphorus, total dissolved phosphorus, total particulate phosphorus, temperature, DO, pH, conductivity, turbidity, Secchi depth, precipitation, and daily irradiance. Data were collected from 54 samples over 3 years. Gradient descent optimization was used for error reduction. The best models for chlorophyll-a concentration and abundance of Cyanophyceae were 14/3/1 and 14/6/1 TLPs, respectively. The predictive performances of the models were not estimated from an ETS. However, a sensitivity analysis was performed to determine the most influential variables. Results showed that they were different for the two TLP ANNs.
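The study above does not detail how its sensitivity analysis was carried out, so the sketch below shows one common variant only: a one-at-a-time perturbation, in which each input is shifted in turn and the mean change in the model output is recorded. The stand-in model, the sample values, and the step size `delta` are hypothetical, and inputs are assumed to be scaled to comparable ranges.

```python
def sensitivity(predict, samples, delta=0.05):
    """One-at-a-time sensitivity: mean absolute change in the model output
    when each input is shifted by +delta, all other inputs held fixed."""
    n_inputs = len(samples[0])
    scores = []
    for k in range(n_inputs):
        total = 0.0
        for x in samples:
            shifted = list(x)
            shifted[k] += delta
            total += abs(predict(shifted) - predict(x))
        scores.append(total / len(samples))
    return scores

def toy_model(x):
    """Hypothetical stand-in for a trained TLP with three scaled inputs;
    the second input deliberately has no effect on the output."""
    return 3.0 * x[0] + 0.0 * x[1] - 1.0 * x[2]

# Hypothetical scaled input samples (three variables each).
scaled_samples = [(0.2, 0.5, 0.1), (0.6, 0.1, 0.9), (0.4, 0.8, 0.3)]
scores = sensitivity(toy_model, scaled_samples)
ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
print("scores:", scores)
print("inputs ranked by influence:", ranking)
```

Because the scores depend on the region of the input space covered by the samples, two networks trained for different outputs can legitimately rank the same variables differently.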

Lentic and lotic ecosystems are also contaminated by numerous xenobiotics resulting from agricultural and industrial activities. Thus, pesticides are used to control weeds, insects, and other organisms in a wide variety of agricultural and nonagricultural settings, which leads to their release into the environment, including the aquatic compartment. Among the collection of models available for predicting the environmental fate and effects of pesticides, some are based on nonlinear methods, especially ANNs. Thus, for example, Kim and coworkers [170] coupled wavelet analysis and a TLP trained by the backpropagation algorithm for modeling the movement behavior of Chironomus samoensis larvae in response to treatments of carbofuran at 0.1 mg/L in seminatural conditions. Various ANN paradigms have also been used for modeling the contamination of groundwater by pesticides and other anthropic pollutants [171–176].

Samecka-Cymerman and coworkers [177] used a KSOM to perform a typology of three species of aquatic bryophytes (Fontinalis antipyretica, Platyhypnidium riparioides, Scapania undulata) according to their concentrations of Al, Be, Ca, Cd, Co, Cr, Cu, Fe, K, Mg, Mn, Ni, Pb, and Zn. The sampling sites were divided into three groups depending on the type of rock basement of the stream. The rock basement consisted of granites and gneisses in group one (n = 21), of sandstones in group two (n = 5), and of limestones and dolomites in group three (n = 26). The output layer of 5 × 5 neurons visualized by hexagonal cells showed that the bryophytes were clustered according to their sampling origin. There was no difference between the bryophytes from the three types of rock in terms of concentrations of Be, Fe, K, Co, and Cu. Conversely, bryophytes growing in streams flowing through granites/gneisses contained significantly higher concentrations of Cd and Pb, while bryophytes from streams flowing through sandstones contained significantly higher concentrations of Cr. Bryophytes from group three were characterized by high concentrations of Ca and Mg. These results were confirmed by a PCA.

Last, it is noteworthy that ANNs have been used in the areas of wastewater treatment and analyses [178–180].

5.3 Soil and Sediment Contaminations

Soils and sediments can be contaminated by various pollutants released into the environment from a number of anthropogenic sources. ANNs have proved useful for characterizing and/or quantifying these contaminations. Thus, for example, in winter 2002, 24 soil and 12 wild chard (Beta vulgaris) samples were collected by Nadal et al. [181] in Tarragona County (Catalonia, Spain). Soil sampling points were chosen as follows: 15 in the industrial complex (8 in the vicinity of chemical industries and 7 near petroleum refineries), 5 in Tarragona downtown and its residential area, and 4 in presumably unpolluted zones. The numbers of wild chard samples collected from industrial, residential, and unpolluted areas were 6, 3, and 3, respectively. The samples were analyzed for their concentrations of As, Cd, Cr, Hg, Mn, Pb, and V. In chard samples, significant differences between areas were only found for vanadium (V). Regarding the soil samples, the differences and concentrations between the three zones were higher. A KSOM was successfully used to perform their typology according to their differences in metal concentrations. The same type of methodology based on KSOM was applied by Arias and coworkers [182] for evaluating the pollution level in Cu, Mn, Ni, Cr, Pb, and Zn of the sediments dredged from the dry dock of a former shipyard in the Bilbao estuary (Bizkaia, Spain). KSOM was compared with different cluster analysis algorithms to classify 407 samples of various origins contaminated by polychlorinated dibenzodioxins and polychlorinated dibenzofurans [183].

Other ANN paradigms were used to model soil and sediment contaminations. Thus, for example, Kanevski [184] tested the usefulness of general regression ANNs, based on kernel statistical estimators, for predicting the soil contamination by \(^{137}\)Cs in the western part of the Briansk region following the Chernobyl accident.
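A general regression ANN predicts a value as a kernel-weighted average of the training targets. The sketch below assumes a Gaussian kernel with a single smoothing parameter `sigma` and uses hypothetical coordinates and activities; it is meant to show the estimator itself and not the model or data of [184].

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """General regression network output: a Gaussian-kernel weighted average
    of the training targets. If sigma is very small and the query is far from
    every training point, all weights underflow to zero and the division fails."""
    weights = [
        math.exp(-math.dist(x, query) ** 2 / (2.0 * sigma ** 2))
        for x in train_x
    ]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Hypothetical sampling locations (x, y in km) and measured soil activities.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
activity = [10.0, 20.0, 30.0, 40.0]

# At the centre all four points are equidistant, so the estimate is their mean.
centre_estimate = grnn_predict(coords, activity, (0.5, 0.5), sigma=1.0)

# With a narrow kernel the estimate at a sampled location approaches its own value.
local_estimate = grnn_predict(coords, activity, (0.0, 0.0), sigma=0.1)

print(centre_estimate, local_estimate)
```

The only parameter to tune is `sigma`: large values smooth the map toward the global mean, while small values reproduce the measurements almost exactly, so it is usually chosen by cross-validation.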

6 Conclusion

Built on a computing model similar to the underlying structure of a mammalian brain, ANNs share the brain's ability to learn or adapt in response to external inputs. When exposed to a stream of training data, they can uncover previously unknown relationships and learn complex mappings in the data. Under these conditions, ANNs provide interesting alternatives to the well-established linear methods commonly used in ecotoxicology modeling. In this chapter, different ANN models computed for predicting the environmental fate and effects of chemicals have been presented. Our goal was not to catalog all the models in the field but only to show the diversity of the situations in which these nonlinear tools have proved useful. Their correct use requires some practical experience in setting the architecture and parameters as well as in interpreting the modeling results. Some rules must also be respected regarding the size of the data sets, the constitution of the learning and testing sets, and so on. Despite these limitations, it is obvious that their use in ecotoxicology modeling will continue to grow, especially in combination with other linear and nonlinear statistical methods to create powerful hybrid systems.