Introduction

Birch is an anemophilous tree with high pollen production (Moore and Webb 1978; Lewis et al. 1983), whose allergenic capacity has been cited by numerous authors (Spieksma 1990; Norris-Hill and Emberlin 1991; D’Amato and Spieksma 1992). Its pollen is considered to be the main cause of pollinosis in North and Central Europe (Wihl et al. 1998; Spieksma et al. 1995), not only during its pollen season but also during previous and subsequent periods, since birch pollen can easily be transported over long distances (Wallin et al. 1991; Hjelmroos 1991). In such cases, the antigenic activity seems to be linked to allergens, characteristic of birch pollen grains, deposited on dust particles inside houses which can trigger the onset of allergic processes, even up to 2 months after maximum pollen concentrations in the air were recorded (Ekebom et al. 1996; Rantio-Lehtimäki et al. 1996). The prevalence of birch pollen allergy reaches 13 to 60% in populations affected by pollinosis in some localities in Galicia, Spain (Arenas et al. 1996; Aira et al. 2001), and 19% in Santiago de Compostela (Dopazo 2001).

Several researchers have carried out aeropalynological studies on this taxon, in order to construct a model of the seasonal and daily behaviour of birch pollen and to ascertain the influence of different meteorological parameters on pollen concentration (Spieksma et al. 1989; Atkinson and Larsson 1990; Norris-Hill and Emberlin 1991; Spieksma et al. 1995; Aira et al. 1998; Jato et al. 2000; Latalowa et al. 2002). In this way, models can be established in order to predict both the beginning and the severity of the pollen season. Different factors were used as predictors of the start of the pollen season in these models. The sum of the temperatures from a given date was utilized by Clot (2001), Caramiello et al. (1994), and Ruffaldi and Greffier (1991). In other works phenological factors such as chilling units and growing degree days have been used as predictors (Andersen 1991). Larsson (1993) used the method of accumulative activity and Laaidi (2001) both the sum of temperatures and a multiple regression model. Other works have aimed to produce models that predict the mean pollen concentration 1 day in advance by using linear regression (Rodríguez-Rajo 2000; Méndez 2000) or time series (Moseholm et al. 1987). However, there are no studies which aim to establish the risk of allergy being caused by a given quantity of pollen grains in the air.

The symptoms of Betula pollinosis can be provoked by different levels of Betula pollen in the air, depending on individual differences. Nevertheless, several threshold values have been mentioned as limiting for the provocation of symptoms. Ninety percent of clinically sensitive subjects showed symptoms when 80 pollen grains/m3 was reached, and the onset of symptoms was recorded with concentrations higher than 30 pollen grains/m3 (Negrini et al. 1992). Corsico (1993) considered the same level as the threshold for the onset of allergic symptoms. Birch pollen is very abundant in the air of Santiago de Compostela during March and April, and concentrations higher than 100 pollen grains/m3 are frequent. The daily maximum levels are registered in the afternoon (between 12 h and 18 h) and coincide with the highest frequency of allergy symptoms (Dopazo 2001).

Birch is represented in Galicia by one species, Betula alba L. (Moreno 1990). This species is widely distributed in the area and, as the dominant tree, it forms altimontane oro-Cantabrian acidophilic forests, with a clearly Euro-Siberian distribution. They are found above an altitude of 1,150 m, being the last tree formations of the altitudinal sequence, with montane thermo-climates and hyper-humid ombro-climates (Izco 1994). Their limit, although fairly controversial, is situated in the Galician mountain ranges of Ancares and Caurel (Costa et al. 1990). In this same area, but on siliceous soils and with a greater Mediterranean influence, there are birch forests in the Galician-Portuguese altimontane layer and in the Ourense-Sanabrian supra-Mediterranean layer. In Galicia’s mountains and foothills non-climatic birches may be found, in substitution for montane oak groves, which are located on acidic soils and with altitudinal limits between 600 and 1,100 m. In the Euro-Siberian region, birch may form part of riparian forests, along with Alnus glutinosa, Salix atrocinerea and Frangula alnus. Ornamental Betula trees grow near the spore trap, and this is one of the reasons why the concentrations of Betula pollen in Santiago are among the highest in Galicia.

The aim of this research is to predict the days of high allergenic risk during Betula pollination, using artificial neural networks, in order to alert allergists and the population with allergy problems to a potential risk situation. Artificial neural networks (ANNs) are a complete statistical tool for data analysis (Bishop 1995). The ANN’s origin dates back to the middle of the last century when an interdisciplinary group of biologists, psychologists and engineers interested in understanding the functioning of the human brain was created (Rosenblatt 1958). ANNs try to artificially reproduce the human ability to take decisions by simulating the human brain’s basic unit, the neuron, and the interconnections between neurons that allow them to work together and save information.

Recently, ANNs have been extended successfully to very different fields, from hydrology to finance. Neural networks have also been used in aerobiological studies, to build predictive models that improve daily pollen concentration forecasts (Ranzi 2000; Hidalgo et al. 2002; Sánchez-Mesa et al. 2002).

Materials and methods

The study was conducted in the city of Santiago de Compostela, located in northwest Spain (Fig. 1). Pollen monitoring was carried out from 1993 to 2001 by means of a 7-day volumetric air sampler (Lanzoni VPPS 2000) situated approximately 25 m above the ground level. The methodology recommended by the Spanish Aerobiological Network (REA) was used to process and interpret the samples (Domínguez 1995).

Fig. 1 Location of Santiago de Compostela in Europe

Three data series were considered (Chakraborty 1992): the daily pollen concentration, \( \text{pollen}_{t} \), expressed as grains/m3, and two exogenous meteorological series, the daily rainfall, \( \text{DR}_{t} \), expressed as l/m2, and the daily mean temperature, \( \text{DMT}_{t} \), expressed as °C.

The aim of this work was not to forecast the pollen concentration but the level of allergenic risk. Given a level of pollen, lev, a new binary variable \( Y_{t} \) can be defined. This variable takes the value 1 if \( \text{pollen}_{t} \) is over the quantity lev and 0 otherwise. The selected levels were 20, 30, 70 and 80 grains/m3. For levels 20 and 30 the variable \( Y_{t} \) measures the risk of the onset of allergic symptoms, and for levels 70 and 80 \( Y_{t} \) measures the risk of severe symptoms for 90% of the most allergic population. The dependent or target variables are the \( Y_{t} \) values associated with each level.
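As an illustrative sketch (not the authors' code), the binary target for each level could be derived from a daily pollen series as follows. The function name `binary_target` and the sample concentrations are hypothetical.

```python
import numpy as np

# Alert levels from the text, in grains/m3.
LEVELS = (20, 30, 70, 80)

def binary_target(pollen, lev):
    """Return Y_t = 1 where pollen_t is over lev, and 0 otherwise."""
    pollen = np.asarray(pollen, dtype=float)
    return (pollen > lev).astype(int)

# Hypothetical daily concentrations (grains/m3), for illustration only.
pollen = [0, 12, 25, 31, 85, 70, 3]
targets = {lev: binary_target(pollen, lev) for lev in LEVELS}
```

Here "over the quantity lev" is read as a strict inequality, so a day at exactly 70 grains/m3 is not flagged for level 70.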

The selected independent variables were the previous day’s rainfall, \( \text{DR}_{t-1} \), the previous day’s mean temperature, \( \text{DMT}_{t-1} \), and the previous day’s pollen concentration, \( \text{pollen}_{t-1} \).
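The pairing of previous-day inputs with the current-day target can be sketched as below. This is a minimal illustration under assumed names (`lagged_design` is hypothetical) and made-up values; it only shows how the one-day lag aligns the series.

```python
import numpy as np

def lagged_design(rain, temp, pollen, lev):
    """Pair day t-1 inputs with the day-t binary target.

    Returns X with columns (DR_{t-1}, DMT_{t-1}, pollen_{t-1})
    and y with Y_t = 1 if pollen_t > lev, else 0.
    """
    rain, temp, pollen = (np.asarray(a, dtype=float) for a in (rain, temp, pollen))
    X = np.column_stack([rain[:-1], temp[:-1], pollen[:-1]])
    y = (pollen[1:] > lev).astype(int)
    return X, y

# Hypothetical values, for illustration only.
rain = [0.0, 5.2, 0.0, 0.0]      # l/m2
temp = [9.5, 8.1, 12.3, 14.0]    # degrees C
pollen = [10, 4, 35, 90]         # grains/m3
X, y = lagged_design(rain, temp, pollen, lev=30)
```

With four days of data the lag leaves three usable input/target pairs, since the first day has no predecessor.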

Artificial neural networks

The statistical method used in this study to forecast the risk level associated with Betula pollen is the artificial neural network technique (Ripley 1996). ANNs are “spoken data” methods, i.e., the structure of an ANN, and thus the relationship between the input and the output, depends on a set of historical observations, named the training set, used for network ‘learning’. The training set is a collection of data on past situations, each paired with the correct answer the neural network should give, or with a variable closely related to the unknown correct answer.

During the training phase the ANN will ‘learn’ the underlying relationship between the inputs and outputs by means of a learning algorithm that compares the network’s outputs with the real outputs. The learning algorithm used was the backpropagation algorithm (Rumelhart et al. 1986; Chauvin et al. 1995). After the training phase, when the ANN is presented with a new situation, it will behave in line with the training set. ANNs are therefore attractive for forecasting phenomena whose data source has an unknown or complex structure. Pollen dispersion is a very complex problem that involves a large amount of meteorological (wind direction, wind velocity, rainfall...), ecological (forest situation and selected species concentration in the vicinity of the prediction location...), and topographic (hills, valleys, rivers, towns, exact location) information, which is not always available. ANNs are a useful tool because it is not necessary to determine all of these characteristics. Instead, an ANN with a general structure and a diverse and extensive training data set can be used. The performance of general ANN models is based on their universal approximation properties (Cybenko 1989; Hornik et al. 1989; Park and Sandberg 1991).
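To make the training phase concrete, the following is a minimal sketch of full-batch backpropagation for a small network with one hidden layer. It is not the authors' implementation: the tanh hidden activation, the cross-entropy loss, the learning rate, the number of hidden nodes and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=3, lr=0.1, epochs=500, seed=0):
    """Full-batch gradient descent with backpropagation.

    One tanh hidden layer and one sigmoid output node; the loss is the
    mean cross-entropy (an assumed choice for this sketch).
    Returns the weights and the loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    losses = []
    n = len(y)
    eps = 1e-12  # guards against log(0)
    for _ in range(epochs):
        # Forward pass: inputs -> hidden values -> output probability.
        h = np.tanh(X @ W1 + b1)
        o = sigmoid(h @ W2 + b2)
        losses.append(-np.mean(y * np.log(o + eps) + (1 - y) * np.log(1 - o + eps)))
        # Backward pass: with sigmoid output and cross-entropy loss,
        # the gradient at the output pre-activation is (o - y) / n.
        d_out = (o - y) / n
        grad_W2 = h.T @ d_out
        grad_b2 = d_out.sum()
        d_hidden = np.outer(d_out, W2) * (1.0 - h ** 2)
        grad_W1 = X.T @ d_hidden
        grad_b1 = d_hidden.sum(axis=0)
        # Gradient step on every weight.
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return (W1, b1, W2, b2), losses

# Toy, hypothetical data: the target is 1 when the first input is positive.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
params, losses = train_mlp(X, y)
```

The recorded `losses` list gives a simple way to check that the comparison between network outputs and real outputs is actually driving the weights toward a better fit.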

As the relationship between input variables and output or target variables, in this case meteorological or ecological features of the area, may change over time, it is useful to re-train the ANN periodically, increasing the training set with new data that reflect the changes in the variables with time.

One of the most popular ANN architectures is the multilayer perceptron (MLP) (Rosenblatt 1958). In an MLP, the nodes, which are the basic information-processing units, are distributed in layers. Only the nodes of consecutive layers can be connected. The first layer is named the input layer; the last is named the output layer, while the layers between them are known as hidden layers. An MLP with one hidden layer, represented in Fig. 2, has been considered.

Fig. 2 Diagram of a multilayer perceptron with one hidden layer (\( N_{i} \)-\( N_{h} \)-\( N_{o} \)) with \( N_{i} \) input variables \( X_{i} \), \( N_{h} \) nodes in the hidden layer with processed information \( h_{j} \), \( N_{o} \) predictions or outputs \( o_{k} \), \( N_{o} \) target variables \( y_{k} \) and the ANN’s weights between connections w

ANN for binary response data

The target variable \( Y_{t} \) is a binary variable, i.e., it takes only two values, 1 or 0. We have focused on a special family of ANNs, with sigmoidal output nodes, appropriate for binary target variable data processing.
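A forward pass through a one-hidden-layer perceptron with a sigmoidal output node can be sketched as follows. The function name `mlp_forward`, the tanh hidden activation and all weight values are assumptions made for illustration; the fitted parameters of the paper are those reported in its tables.

```python
import numpy as np

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass of an N_i - N_h - 1 perceptron.

    The hidden activation (tanh) is an assumption of this sketch; the
    output node is sigmoidal, so the result lies strictly between 0 and 1.
    """
    h = np.tanh(W_hidden @ x + b_hidden)   # hidden-layer values h_j
    z = w_out @ h + b_out                  # weighted sum at the output node
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid output o

# Hypothetical weights for 3 inputs and 2 hidden nodes.
W_hidden = np.array([[0.1, -0.2, 0.05], [0.0, 0.3, -0.1]])
b_hidden = np.array([0.0, 0.1])
w_out = np.array([1.5, -0.7])
b_out = -0.2
o = mlp_forward(np.array([0.0, 12.0, 40.0]), W_hidden, b_hidden, w_out, b_out)
```

Because the sigmoid maps any real number into the interval (0, 1), the output can be read directly as an estimated probability for the binary target.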

For prediction problems, the aim is to approximate the expected value of the target variable, conditioned by the independent variables. For binary target variables this expectation is the probability of \( Y_{t} \) taking the value 1, conditioned by the input variable values (Goldberger 1973; Agresti 1990). This conditional probability can be considered as an unknown function of the independent variables that takes values from 0 to 1 (McCullagh and Nelder 1989). Estimating this function is the aim of the ANN. Given a probability predictor, a family of classifiers \( F = \{ C_{p} : p \in [0,1]\} \) can be constructed. Each of these classifiers, determined by a given p, builds a binary variable from the probability predictor by means of the following procedure. If the predicted probability is less than p the predicted value for \( Y_{t} \) will be 0, and otherwise it will be 1. In order to obtain an estimation of \( Y_{t} \), \( \hat{Y}_{t} \), we must select a classifier from this family by choosing a threshold p. The selected p was 0.5.

Using the ANN probability prediction, the forecast of \( Y_{t} \) may be obtained as follows: if the predicted probability is less than 0.5, the predicted value for \( Y_{t} \) will be 0, and otherwise it will be 1.
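The thresholding rule can be written in a few lines. The function name `classify` is hypothetical; the default threshold of 0.5 is the one selected in the text.

```python
import numpy as np

def classify(prob, p=0.5):
    """Classifier C_p: 0 if the predicted probability is below p, else 1."""
    return (np.asarray(prob, dtype=float) >= p).astype(int)

y_hat = classify([0.10, 0.49, 0.50, 0.93])
```

Choosing a different p simply selects a different member of the same family of classifiers, which trades false alarms against missed alerts.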

Error function for the binary target variable

The density function estimation, as well as the probability estimation, are unsupervised learning problems. The real probabilities, as well as the real density, are not available; instead there is a binary variable, Y, with some kind of information about the probability value.

As explained above, network learning is based on a training algorithm. This kind of algorithm compares the model output with the real target variable values and modifies the network parameters in order to minimize the differences between them. These differences are measured using an error function. Selecting the appropriate error function for the data is essential for training the network successfully. In binary dependent variable problems, during the training phase, the training algorithm will compare the binary target variable with the continuous network output, which is the estimate of the probability that the target variable takes the value 1.

When working with binary targets, the usual error function is the deviance (Hastie 1987), dev (Table 1), which measures how credible the probability estimate is given the observed value of the binary variable.

Table 1 Equations
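For orientation, the standard binomial deviance can be computed as in the sketch below. This is the textbook form and is an assumption here: the exact expression and scaling used for dev in Table 1 may differ. The function name and sample values are hypothetical.

```python
import numpy as np

def deviance(o, y, eps=1e-12):
    """Binomial deviance: -2 * sum(y*log(o) + (1-y)*log(1-o)).

    This is the standard form; the scaling used in Table 1 may differ.
    eps guards against log(0).
    """
    o = np.clip(np.asarray(o, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return -2.0 * np.sum(y * np.log(o) + (1.0 - y) * np.log(1.0 - o))

y = [1, 0, 1, 0]
confident_and_right = deviance([0.9, 0.1, 0.8, 0.2], y)
confident_and_wrong = deviance([0.1, 0.9, 0.2, 0.8], y)
```

Probability estimates that agree with the observed binary values give a small deviance, while confident estimates that contradict them are penalized heavily.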

As we have explained, a probability estimation provides us with a binary variable \( \hat{Y} \), an estimation of the target binary variable Y. This is a two-class classification problem, so the total misclassification probability, mcp, can be considered as the error function. The mcp is the probability of having a target variable valued at 0 and the estimation \( \hat{Y} \) valued at 1, MCI, added to the probability of having a target variable valued at 1 and the estimation \( \hat{Y} \) valued at 0, MCII (Table 1).

We can consider separately the type I error, errorI, and the type II error, errorII (Table 1). The type I error is the probability of the estimation \( \hat{Y} \) being 0 conditioned on the target variable Y taking the value 1; in our problem this is the probability of predicting a below-threshold level day on an over-threshold level day (false security). The type II error is the probability of the estimation \( \hat{Y} \) being 1 conditioned on the target variable Y taking the value 0, that is, the probability of predicting an over-threshold level day on a below-threshold level day (false alarm). Balancing both errors is necessary. Usually one of the errors is fixed, and the classifier that minimizes the other error is selected. All these error measures are estimated using the empirical probability formula (Table 1) over a collection of observations, F. Several equations involve a function called the cardinal, card. The cardinal of a set A is the number of elements that belong to the set A.
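The empirical estimates described above can be sketched with simple counts, where each count plays the role of a cardinal. The function name `empirical_errors` and the sample labels are hypothetical, and the formulas follow the verbal definitions in the text rather than Table 1 itself.

```python
import numpy as np

def empirical_errors(y, y_hat):
    """Empirical error measures using set cardinalities (counts).

    Assumes both classes occur at least once in y.
    """
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = y.size                                    # card(F)
    mc1 = np.sum((y == 0) & (y_hat == 1)) / n     # MCI: target 0, estimate 1
    mc2 = np.sum((y == 1) & (y_hat == 0)) / n     # MCII: target 1, estimate 0
    error1 = np.sum((y == 1) & (y_hat == 0)) / np.sum(y == 1)  # false security
    error2 = np.sum((y == 0) & (y_hat == 1)) / np.sum(y == 0)  # false alarm
    return {"mcp": mc1 + mc2, "errorI": error1, "errorII": error2}

# Hypothetical labels: 2 over-threshold days among 10.
y     = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_hat = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
errors = empirical_errors(y, y_hat)
```

In this example one missed alert out of two over-threshold days gives a type I error of 0.5, even though the total misclassification probability is only 0.2, which illustrates why the two error types are worth tracking separately when classes are unbalanced.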

In many real problems, e.g., ecological, epidemiological or medical problems, the two error types, errorI and errorII, do not have the same importance and one of them must be penalized more heavily. One reason may be the different proportion of cases with a target variable valued at 0. In many problems, the economic or health consequences of a false negative are very different from the consequences of a false positive, so it is necessary to penalize the two errors separately.

The deviance does not distinguish between the two kinds of error, so it leads the ANN to reach an equal balance between both empirical errors. In this problem, the proportion of 1-valued days is very small compared to the proportion of 0-valued days, so the deviance is not the best choice of error function and another loss function must be considered. We have defined the following error function: \( {\text{error}}_{K_{1} ,K_{2} } (o,Y) = K_{1} \cdot Y \cdot (1 - o) + K_{2} \cdot (1 - Y) \cdot o,\;{\text{with}}\,K_{1} ,K_{2} \geqslant 0 \), with o representing the ANN output, i.e., the estimated probability, and Y the binary target variable.

This function penalizes both errors separately. The constants \( K_{1} \) and \( K_{2} \) determine which error is more serious. During the training process this function can be minimized or, equivalently (dividing by \( K_{2} > 0 \) and writing \( K = K_{1} /K_{2} \)), the following one: \( {\text{error}}_{K} (o,Y) = K \cdot Y \cdot (1 - o) + (1 - Y) \cdot o,\;{\text{with}}\,K \geqslant 0 \)

The value of the constant K determines the penalization. If K>1 the misclassification of class 1 observations will be penalized more heavily; on the other hand, if K<1 the misclassification of class 0 observations will be penalized more heavily; finally, if K=1 both misclassifications will be considered as equally serious.
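The effect of K can be checked numerically with the short sketch below. The function names are hypothetical, and summing the per-observation error over a data set is an assumption of this illustration.

```python
import numpy as np

def error_k(o, y, K=1.0):
    """Weighted error K*Y*(1-o) + (1-Y)*o, summed over observations."""
    o, y = np.asarray(o, dtype=float), np.asarray(y, dtype=float)
    return np.sum(K * y * (1.0 - o) + (1.0 - y) * o)

def error_k1k2(o, y, K1, K2):
    """Two-constant form K1*Y*(1-o) + K2*(1-Y)*o, summed over observations."""
    o, y = np.asarray(o, dtype=float), np.asarray(y, dtype=float)
    return np.sum(K1 * y * (1.0 - o) + K2 * (1.0 - y) * o)

y = np.array([1, 0, 0, 0])
miss_class_1 = np.array([0.2, 0.0, 0.0, 0.0])  # low probability on the class 1 day
false_alarm = np.array([1.0, 0.8, 0.0, 0.0])   # high probability on a class 0 day

# With K = 1 both mistakes cost the same; with K = 5 only the missed
# class 1 day becomes more expensive.
costs = {
    "miss_K1": error_k(miss_class_1, y, K=1.0),
    "miss_K5": error_k(miss_class_1, y, K=5.0),
    "alarm_K1": error_k(false_alarm, y, K=1.0),
    "alarm_K5": error_k(false_alarm, y, K=5.0),
}
```

The two-constant form is just a rescaling of the one-constant form: `error_k1k2(o, y, K1, K2)` equals `K2 * error_k(o, y, K1 / K2)` for any positive `K2`, so both share the same minimizer.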

Results and discussion

Four levels of pollen concentration were considered, lev = 20, 30, 70, 80 grains/m3. We have generated four artificial neural networks, one for each alert level. Explicit expressions of the four neural networks are shown in Tables 2 and 3.

Table 2 Artificial neural network equations. Matrix of parameters
Table 3 ANN selected matrix of parameters

The data from the period between 2 January 1993 and 11 March 2000 form the data set used to train the neural networks. In order to evaluate the performance of the ANNs when faced with new situations, we have considered a validation set that started on 12 March 2000 and ended on 1 December 2001. The selected validation set contained two consecutive Betula pollination periods because of the biennial behaviour of the studied pollen.
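A date-based split matching the periods given above might look like the following sketch. The record layout (a list of `(day, value)` tuples) and the function name `split_by_date` are assumptions; the placeholder records carry a dummy value and are not real observations.

```python
from datetime import date, timedelta

TRAIN_START, TRAIN_END = date(1993, 1, 2), date(2000, 3, 11)
VALID_START, VALID_END = date(2000, 3, 12), date(2001, 12, 1)

def split_by_date(records):
    """Split (day, value) records into training and validation lists."""
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    valid = [r for r in records if VALID_START <= r[0] <= VALID_END]
    return train, valid

# Placeholder records: one per day, with a dummy value of 0.
n_days = (VALID_END - TRAIN_START).days + 1
records = [(TRAIN_START + timedelta(days=i), 0) for i in range(n_days)]
train, valid = split_by_date(records)
```

Splitting chronologically, rather than at random, keeps every validation day strictly later than every training day, which mirrors how the forecast would be used in practice.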

The parameter K involved in the error function takes different values for the different levels. The K values have been selected around the value of the empirical ratio of below-threshold level pollen days to over-threshold level pollen days, \( Z_{lev} \) (Table 4, section 4.1). In fact, if \( Z_{lev} \) is selected as K the ANN minimizes the misclassification probability. During Betula pollination the number of days with pollen in the air is less than the number of days when the pollen concentration is zero, so K is greater than one. Higher values of lev give higher \( Z_{lev} \) values, and thus higher K values.

Table 4 Equations part II
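The empirical ratio of below-threshold to over-threshold days can be computed as in this sketch. It follows the verbal description in the text; the exact expression in Table 4 is not reproduced here, and the function name `z_lev`, the treatment of days exactly at lev as below-threshold, and the sample series are assumptions.

```python
import numpy as np

def z_lev(pollen, lev):
    """Ratio of below-threshold days to over-threshold days for a level."""
    pollen = np.asarray(pollen, dtype=float)
    over = np.sum(pollen > lev)
    below = pollen.size - over
    if over == 0:
        raise ValueError("no over-threshold days for this level")
    return below / over

# Hypothetical series (grains/m3), for illustration only.
pollen = [0, 0, 0, 0, 0, 0, 25, 40, 90, 10]
ratios = {lev: z_lev(pollen, lev) for lev in (20, 30, 70, 80)}
```

Because raising lev can only move days from the over-threshold group to the below-threshold group, the ratio never decreases as lev grows, which is why higher levels call for higher K values.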

In order to show, compare and discuss the results we must consider two different, complementary concordance measures: the proportion of good classification over the observations whose target variable equals 1, GCI, and the proportion of good classification over the observations whose target variable equals 0, GCII (Table 4). Table 5 shows the results obtained.

Table 5 Note that GCI represents the empirical probability of correctly classifying an over-level pollen day, and GCII represents the empirical probability of correctly classifying a below-level pollen day
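The two concordance measures can be computed from observed and predicted labels as sketched below. The function name `good_classification` and the sample labels are hypothetical; the values in Table 5 are not reproduced here.

```python
import numpy as np

def good_classification(y, y_hat):
    """Return (GCI, GCII): correct-classification rates for classes 1 and 0.

    Assumes both classes occur at least once in y.
    """
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    gc1 = np.sum((y == 1) & (y_hat == 1)) / np.sum(y == 1)
    gc2 = np.sum((y == 0) & (y_hat == 0)) / np.sum(y == 0)
    return gc1, gc2

# Hypothetical labels: 4 over-level days and 6 below-level days.
y     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
gc1, gc2 = good_classification(y, y_hat)
```

Under the verbal definitions used in the text, GCI equals 1 minus the type I error and GCII equals 1 minus the type II error.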

Figure 3 shows the predicted conditional probabilities against the target variable, for the levels 30 and 80 over a training set section and the validation set.

Fig. 3 Predicted conditional probabilities against the target variable, for pollen levels 30 and 80 grains/m3 over a training set section and the validation set. The dotted horizontal line separates the two prediction zones. If the predicted conditional probability is above the line, the binary prediction will take the value 1, and if it is below the dotted line, the binary prediction will take the value 0

Pollen forecasting has become an important aim in aerobiology. The objective is to provide accurate information on pollen in the air to sensitive users in order to help them optimize their medication.

Usually, aerobiological information is disseminated using fixed categories defined by threshold values. In this sense, our goal was to find a model that gives the probability of the Betula daily mean pollen concentration rising above several threshold values, some of which have previously been cited as responsible for allergic symptomatology (Negrini et al. 1992; Corsico 1993).

Neural networks gave good results in forecasting the probability that a given value of Betula pollen concentration occurs. Between 83 and 92% of the occurrences in the year 2000, and 100% of the occurrences in the year 2001, of pollen concentration values exceeding the thresholds considered (>20, >30, >70 and >80 grains/m3) were predicted in advance. Similarly, in the year 2000 between 92 and 93% of the occurrences of pollen concentrations below a threshold value were correctly predicted, while for 2001 this figure was between 96 and 97%. Therefore, neural networks are a good tool to predict the probability of pollen concentrations exceeding a threshold value, and thus help the dissemination of aerobiological information to the population suffering from allergic problems. This is a first step towards the automatization of the prediction system.