1 Introduction

Allergic respiratory symptoms induced by pollen grains are progressively on the rise in industrialized countries owing to environmental, anthropogenic, and climate change factors (Jato et al. 2013; Ziello et al. 2012). Consequently, the growth pattern of urban plants, their growing seasons, the length of the phenological period, and their metabolic mechanisms could respond in different ways to these changes (Gonzalez-Parrado et al. 2014a). Recent research has pointed out that, in some species of the genus Plantago (mainly Plantago lanceolata L.), metabolic changes can modify the morphology of some plant structures, such as the size and shape of leaves and flowers, pollen grain size, and aperture number (Gonzalez-Parrado et al. 2014b). Plantago is one of the most prolific herbaceous genera in Spanish urban areas, growing in dry and downtrodden soils. In natural environments, these plants are widely distributed in degraded pastures, roadsides, ditches, and abandoned fields. The most abundant species are P. lanceolata L. (ribwort plantain), P. coronopus L. (erba stella), and P. major L. (common plantain).

The special allergenic capacity of Plantago pollen grains has been demonstrated (Izco et al. 1972; González Minero et al. 1998). Even though Plantago pollen is found in the atmosphere at low concentrations, its allergenic proteins cause significant allergic symptoms (Osvath 1991), affecting nearly 45 % of sensitive patients (Feo Brito et al. 1998). Clinical examination of people monosensitized to Plantago in the north of Spain (Gonzalez-Parrado et al. 2014b) disclosed that when these individuals begin to show respiratory symptoms, few or no Plantago pollen grains are present in the atmosphere, with airborne levels remaining <5 grains/m³ for a long time. The real role of Plantago in respiratory allergies is traditionally underestimated and difficult to assess because these plants bloom during the same period as the grasses but produce fewer pollen grains (Detandt and Nolard 1991). Nevertheless, in Germany this pollen type is recognized as one of the major allergens (Kersten et al. 1991), while in Barcelona pollinosis to Plantago ranks third among herbaceous plants, behind Parietaria and Artemisia (Ranea Arroyo 2002). In southern Europe, Plantago pollen allergens show a high incidence, affecting 42–51 % of patients with pollinosis (García Ortiz et al. 1995; Feo Brito et al. 1998; Carretero et al. 2005; Blanco et al. 2008). Arenas et al. (1996) suggest that Plantago pollen is the second most frequent producer of positive reactions in skin tests in Ourense (after Poaceae pollen). Similar findings were reported in the nearby areas of Coruña (Ferreiro et al. 2002) and Vigo (Belmonte et al. 1998), with 9 % of patients showing positive reactions to skin tests. Calabozo et al. (2001) identified and characterized the major allergenic components of P. lanceolata pollen, demonstrating that Pla l 1 is the major allergen. This protein has a similar sequence to the Ole e 1 allergen, which may indicate the existence of cross-reactions between P. lanceolata and Olea europaea. Moreover, cross-reactivity with group 5 grass allergens and some foods (such as melon) was also detected (Asero et al. 2000).

In recent years, neural networks have been used in aerobiology and epidemiology to improve daily pollen concentration forecasts (Puc 2012; Rodríguez-Rajo et al. 2010). Artificial neural networks (ANNs) are a comprehensive tool for data analysis (Cid et al. 2011), originally developed by an interdisciplinary group interested in understanding the functioning of the human brain (Rosenblatt 1958). ANNs try to reproduce the human ability to make decisions by simulating the brain's basic unit, the neuron, and the interconnections between neurons that allow them to work together and store information from experience (Astray et al. 2013a, b; Montoya et al. 2012). ANNs have demonstrated their ability to solve problems in nonlinear systems, where other tools and methods struggle to obtain good results. The relationship between pollen concentration and meteorological data studied in this work is one such situation.

The aim of this research was the short-term prediction of airborne Plantago pollen concentrations by means of ANNs in order to check whether allergic individuals could take preventative measures to protect themselves from the severity of the pollen season. Short-term models were traditionally developed in aerobiological studies making use of statistical tools such as time series analysis (Cotos-Yañez et al. 2004; Rodriguez-Rajo et al. 2006) or linear parametric statistics (Chuine and Belmonte 2004; Rodríguez-Rajo et al. 2004; Sánchez-Mesa et al. 2005).

2 Materials and methods

The study was carried out in the city of Ourense, situated in a depression at 139 m a.s.l. in northwestern Spain (42°20′N, 7°52′W) (Fig. 1). The climate of Ourense is oceanic, with a strong Mediterranean influence. Records for the last 30 years show a mean annual temperature of 14.2 °C, a mean maximum temperature of 20.2 °C, and a mean minimum temperature of 8.2 °C. Annual rainfall is 794 mm, distributed very irregularly over the year; average summer rainfall is only 21.6 mm (Martínez and Pérez 1999).

Fig. 1 Location of Ourense in Europe

A Hirst-type LANZONI VPPS 2000 volumetric 7-day recording sampler (Hirst 1952) was used to collect airborne pollen from 1993 to 2010. The sampler was situated on the roof of the Sciences Faculty (approximately 20 m above ground level). The Lanzoni sampler is calibrated to handle a flow of 10 L of air per minute, thus matching the human breathing rate. Pollen grains are impacted on a cylindrical drum covered by a Melinex film coated with a 2 % silicone solution as the trapping surface. The drum was changed weekly, and the exposed tape was cut into seven pieces, which were mounted on separate glass slides. Daily values were expressed as the number of pollen grains per cubic meter of air. Pollen grain identification was performed using a NIKON OPTIPHOT II microscope equipped with a 40×/0.95 lens. Pollen counts were conducted using the method proposed by the R.E.A., consisting of four continuous longitudinal traverses along the 24-h slide (Galán et al. 2007). The following atmospheric pollen season characteristics were analyzed: first and last dates, length, maximum pollen count and the date on which it was recorded, and total pollen. The pollination period was calculated as the period containing 95 % of the annual total pollen: the first days, which produce up to 2.5 % of the total, and the final days (from 97.5 to 100.0 % of the total) were removed from the calculations (Andersen 1991). The meteorological data used for the study were recorded by the Spanish Meteorological Agency.
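The 95 % criterion for the pollination period can be illustrated with a minimal sketch. The function name `main_pollen_season`, the toy series, and the handling of days that fall exactly on a threshold are assumptions made here for illustration; they are not taken from the study.

```python
def main_pollen_season(daily_counts, lower=0.025, upper=0.975):
    """Return (start_index, end_index) of the main pollen season.

    daily_counts holds the daily pollen concentrations (grains/m3) of one
    year, ordered by day. The season starts on the first day on which the
    cumulative sum reaches `lower` of the annual total and ends on the
    first day on which it reaches `upper` (assumed boundary convention).
    """
    total = sum(daily_counts)
    if total <= 0:
        raise ValueError("no pollen recorded in this series")
    start = end = None
    cumulative = 0.0
    for day, count in enumerate(daily_counts):
        cumulative += count
        if start is None and cumulative >= lower * total:
            start = day
        if cumulative >= upper * total:
            end = day
            break
    return start, end


# Hypothetical toy series (not data from the study); annual total = 100
counts = [0, 1, 0, 2, 10, 30, 40, 10, 5, 1, 1, 0]
start, end = main_pollen_season(counts)
season_length = end - start + 1
```

With a series that begins on January 1, the Julian day is the returned index plus one.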

Artificial neural networks are a tool for modelling and forecasting nonlinear processes, including airborne pollen concentration. In this work, ANNs were used to obtain predictive values of the atmospheric Plantago pollen concentration in the city of Ourense. The method works by relating the different input variables or parameters to an already registered output value (training process). In a later step, called the validation process, the system is given new input values and provides the corresponding output values.

Formally, the operation of an ANN is described as follows:

The information contained in an input vector (Eq. 1) is propagated to the first intermediate layer by a function called the propagation function (Eq. 2), which adds up all the excitatory signals that reach each neuron.

$$x^{p} = \left( {x_{1}^{p} ,x_{2}^{p} , \ldots ,x_{N}^{p} } \right)^{T}$$
(1)
$$s_{i}^{p} = \sum\limits_{j = 1}^{N} {w_{ij} x_{j}^{p} } + b_{i}$$
(2)

where N is the number of input neurons of the network, $w_{ij}$ is the weight of the connection between neuron j of the input layer and neuron i of the intermediate layer, and $b_{i}$ is the value of the bias associated with neuron i. The activation function processes this value and generates the neuron's response to the signals received (Eqs. 3, 4). Different activation functions are available, but in this case, following recommendations from the literature (Haykin 2008; Hilera and Martínez 1995; Haykin 2009), the sigmoid function was chosen (Eq. 4):

$$y_{i}^{p} = F_{i} \left( {s_{i}^{p} } \right)$$
(3)
$$F_{k} \left( {s_{k}^{p} } \right) = \frac{1}{{1 + e^{{ - s_{k}^{p} }} }}$$
(4)

The error term for the output neurons is calculated by means of Eq. 5:

$$E^{p} = \frac{1}{2}\sum\limits_{k = 1}^{M} {\left( {d_{k}^{p} - y_{k}^{p} } \right)^{2} }$$
(5)

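where M is the number of output neurons, $d_{k}^{p}$ is the desired output, and $y_{k}^{p}$ is the output produced by the network for neuron k. The process is iterated until $E^{p}$ reaches the desired error value.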
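As a minimal sketch of Eqs. 1–5, the fragment below propagates one input pattern through a single layer (weighted sum plus bias, followed by the sigmoid) and evaluates the error term. The function names, weights, and input values are hypothetical and chosen only so that the arithmetic can be checked by hand; the training step that updates the weights is not shown.

```python
import math


def sigmoid(s):
    """Sigmoid activation function (Eq. 4)."""
    return 1.0 / (1.0 + math.exp(-s))


def layer_forward(x, weights, biases):
    """Propagate the input pattern x through one layer.

    weights[i][j] connects input j to neuron i; biases[i] belongs to neuron i.
    """
    outputs = []
    for w_i, b_i in zip(weights, biases):
        s_i = sum(w * x_j for w, x_j in zip(w_i, x)) + b_i  # weighted sum plus bias (Eq. 2)
        outputs.append(sigmoid(s_i))                        # activation (Eqs. 3, 4)
    return outputs


def pattern_error(desired, produced):
    """Half the sum of squared differences for one pattern (Eq. 5)."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, produced))


# Hypothetical two-input, two-neuron layer (illustrative values only)
x = [0.5, -1.0]
outputs = layer_forward(x, weights=[[1.0, 0.5], [0.0, 0.0]], biases=[0.0, 0.0])
error = pattern_error([1.0, 0.5], outputs)
```

Both weighted sums are zero in this toy case, so each neuron returns 0.5 and the error term equals 0.125.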

The structure or architecture of an ANN refers to how many and which variables are used as inputs and outputs and to the number of intermediate layers. The aim of the research process was to find the ANN structure that provides the best results, and several tests were carried out for this purpose. The procedure is essentially trial and error: the number of layers, the variables used, the maximum admissible error, and the number of training cycles are modified in turn. The intention is to optimize the work in terms of time and memory consumption.

This work uses stored meteorological data and airborne Plantago pollen concentrations from 1993 to 2010. Data from 1993 to 2008 were used for training, and data from 2009 and 2010 were used for validation. After the research process, it was determined that the following ANN structure provides good results: five inputs (Julian day, daily precipitation, daily humidity, daily number of hours of insolation, and daily Plantago airborne pollen concentration), one intermediate layer (with 3 to 12 neurons), and three output neurons, corresponding to the predicted Plantago pollen concentration 1, 2, and 3 days ahead, respectively. Depending on the value pursued (prediction 1, 2, or 3 days ahead), a specific structure is used to determine the predicted pollen concentration. This structure is named 5-x-3, where x is the number of neurons in the intermediate layer (Fig. 2).

Fig. 2 Diagram of the perceptron network, consisting of five neurons in the input layer, a hidden layer with an undetermined number of neurons (x), and three neurons in the output layer
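One possible way to arrange the daily records into training patterns for a 5-x-3 network is sketched below: each day's five inputs are paired with the pollen concentrations of the following three days. The function name `build_patterns` and the toy values are hypothetical, and any scaling or normalization of the variables, which the text does not describe, is omitted.

```python
def build_patterns(julian_day, rain, humidity, sun_hours, pollen):
    """Pair each day's five inputs with the pollen values of the next 3 days.

    All arguments are equally long daily series ordered in time.
    Returns (inputs, targets) with 5 and 3 values per pattern, respectively.
    """
    horizon = 3  # outputs: 1, 2, and 3 days ahead
    inputs, targets = [], []
    for t in range(len(pollen) - horizon):
        inputs.append([julian_day[t], rain[t], humidity[t], sun_hours[t], pollen[t]])
        targets.append([pollen[t + 1], pollen[t + 2], pollen[t + 3]])
    return inputs, targets


# Hypothetical five-day record (illustrative values, not data from the study)
julian_day = [100, 101, 102, 103, 104]
rain = [0.0, 1.2, 0.0, 0.0, 3.4]
humidity = [60, 75, 58, 55, 80]
sun_hours = [8.0, 2.5, 9.1, 10.0, 1.0]
pollen = [1, 2, 3, 4, 5]

inputs, targets = build_patterns(julian_day, rain, humidity, sun_hours, pollen)
```

The last three days of a series yield no pattern because their targets would lie beyond the end of the record.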

3 Results and discussion

Plantago pollination showed a marked phenological pattern in the atmosphere of Ourense during the years of the study. Concentrations rose during the second fortnight of March and reached their highest values during spring and summer, mainly in June (Fig. 3). Plantago pollen appears in the air of the northern half of the Iberian Peninsula from April to October, and from March to September in the southern part (Gutiérrez et al. 1999; Rodriguez de la Cruz 2009).

Fig. 3 Evolution of the concentration of Plantago pollen throughout the year during the studied period (years 1993–2010)

The atmospheric Plantago pollination period is long and characterized by wide oscillations in pollen concentration, a result of the successive and overlapping flowering periods of the various Plantago species. The annual total amount of pollen ranges between the 103 pollen grains recorded in 2003 and the 1,008 reached in 2007, with a mean of 480 pollen grains (Table 1). The total Plantago pollen counts obtained in our study were similar to those reported for other Spanish cities (Gutiérrez et al. 1999). Studies conducted by González-Parrado et al. (2014a) showed that the total Plantago pollen amounts in the atmosphere vary from year to year depending mainly on weather and soil conditions. The growth of herbaceous plants such as Plantago appears to be influenced by accumulated rainfall during the three months preceding the pollination period (Trigo et al. 1997). Moreover, the years with high pollen values corresponded to lower precipitation values during the months of the previous flowering season (González-Parrado et al. 2014a).

Table 1 Characteristics of the main pollen season (MPS) of Plantago (1993–2010): starting julian day (S), ending julian day (E), length in days (L), average value (Av), maximum value (Max), day of the maximum (DMax), total concentration in the main pollen season (Ts), and annual concentration (T)

The study of the evolution of the most important characteristics of the atmospheric pollen season during the years of study (Fig. 4) showed a trend toward an increase in the total annual amount of pollen grains and in the peak daily pollen concentration (as well as an advance of the date on which this peak occurs). By contrast, no clear trend in the annual total pollen count was noted for Plantago in other northern areas (González-Parrado et al. 2014a; Ziello et al. 2012). Variations in the timing of the peak pollen concentrations may also be related to variable weather conditions (Tormo Molina et al. 2001; Puc 2009). Recent experimental studies (Shea et al. 2008) describe a direct relationship between increases in temperature and atmospheric CO2 and greater biomass development in certain herbaceous plants, which consequently leads to higher pollen production. Furthermore, an earlier start of the atmospheric pollen season accompanied by an increase in its length was detected in Ourense, although no clear trend in the timing of the pollen season was noted for Plantago in a 17-year study of nearby areas (González-Parrado et al. 2014a). Changes in air temperature in recent years could affect the Plantago atmospheric pollen season, inducing an earlier onset when the weather becomes warmer (Rodríguez de la Cruz 2009). Some studies focused on the possible effect of climate change on the start of flowering have pointed out that spring-pollinating species tend to start progressively earlier (Recio et al. 2010). Studies conducted by Gutiérrez et al. (1999) showed that Plantago had a longer pollination period in northern Spain than in the Mediterranean area.

Fig. 4 Evolution of the characteristics of the main pollen season (MPS) of Plantago (years 1993–2010)

All the aforementioned variations make it difficult to obtain prediction models. Previous research has tackled this subject in aerobiology by developing ANN models to predict pollen data (Arizmendi et al. 1993; Sánchez-Mesa et al. 2005; Castellano-Méndez et al. 2005). Furthermore, studies conducted by Poot et al. (1997) indicated that, in the particular case of Plantago, different individuals show considerable variation in phenology, even under homogeneous conditions.

The correlations obtained between the pollen concentrations and the daily meteorological parameters are shown in Table 2. Different studies have also analyzed the impact of meteorological parameters on Plantago airborne pollen concentration (Gutiérrez et al. 1999; Puc 2009; Tormo Molina et al. 2001; Trigo et al. 1997). Spearman's test showed positive correlations, at a significance level of 99 % (p ≤ 0.01), between the concentration of Plantago pollen and sun hours, maximum and mean temperatures, and wind coming from the N–NE sector. As befits a taxon that flowers mainly in summer, the best correlations were obtained with sunshine. The positive effect of sun hours on atmospheric Plantago pollen concentration was noted by Trigo et al. (1997), who found that heat parameters (such as temperature and sun hours) have the greatest impact on its pollen concentration, mainly at the beginning of the pollen season. Likewise, maximum temperatures have been associated with peaks in Plantago pollen concentration in the atmosphere (Puc 2009; Tormo Molina et al. 2001; Trigo et al. 1997). By contrast, negative correlations were registered with rainfall, relative humidity, and southern winds. Studies conducted by González-Parrado et al. (2014a) also showed that heat parameters such as temperature, sun hours, and evaporation contribute to the increase in the Plantago pollen amount, whereas relative humidity and precipitation exert a negative effect on pollen dispersion (Tormo Molina et al. 2001). As a consequence, these plants require a threshold number of sun hours accompanied by high temperatures to release their pollen into the atmosphere (Tormo Molina et al. 2001). Studies conducted by González-Parrado et al. (2014a) noted that the cumulative sum of maximum temperature was the best parameter for predicting the onset of the Plantago pollination period. Bricchi et al. (1995) suggest that this is the typical behavior of plants that bloom in late spring or early summer, which are greatly influenced by photoperiod and temperature. Hyde and Williams (1945) observed that the flowers may remain in a stationary state, with no pollen release, when temperatures fall during the pollination period. Finally, the correlation obtained with winds allows us to conclude that the source of the atmospheric Plantago pollen lies in the northern areas close to the city, where suitable habitats for these plants are located: nitrophilous ruderal and abandoned areas and gardens (Rodríguez de la Cruz 2009). Wind increases or decreases the Plantago pollen concentration in the atmosphere of cities depending on its direction of origin (González-Parrado et al. 2014a).

Table 2 Spearman’s correlation coefficients among Plantago pollen concentration and meteorological parameters (** 99 % confidence)
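For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on the ranks of the two series, with tied values sharing their average rank. The sketch below is a plain illustration of that definition; the helper names and the toy series are hypothetical, and the significance test (p value) is not computed.

```python
def average_ranks(values):
    """Rank the values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical daily series (illustrative values, not data from the study)
pollen = [1, 3, 8, 20, 50]
sun_hours = [2, 5, 7, 9, 11]   # rises with pollen
rainfall = [10, 4, 3, 1, 0]    # falls as pollen rises
rho_sun = spearman(pollen, sun_hours)
rho_rain = spearman(pollen, rainfall)
```

Because only the ranks matter, the monotonic toy series give coefficients of 1 and -1 even though the underlying relationships are not linear.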

The objective of many techniques used for forecasting pollen in the air is to provide accurate information about airborne allergenic pollen to sensitive patients. Sanchez-Mesa et al. (2005) compared different methodologies and showed that ANNs can predict Poaceae pollen concentration with great accuracy. The combination of meteorological and pollen data in neural network models yields the best results (Sanchez-Mesa et al. 2005). Therefore, in the present paper, we tried to forecast daily Plantago pollen concentrations by developing an ANN model. After a laborious trial-and-error process, the ANN structure that provided the best results in terms of time and memory consumption was found to be 5-x-3, where the five inputs are the Julian day, meteorological values, and the previous Plantago pollen concentration, and the three outputs are the predicted values for the next 1, 2, and 3 days. The number of neurons in the intermediate layer was varied from 3 to 11 in search of the most appropriate network scheme. The period 1993–2008 was used for training, while the 2009–2010 data set was used for validating the model. For an initial evaluation of the results, all days from 2009 and 2010 were used together, taking into account the R² fitting parameter between real and forecasted values. The values predicted 1, 2, and 3 days in advance were analyzed. Table 3 shows the specific network structures with their R² values for each prediction horizon. The structure 5-5-3 offers good results for prediction 1 day in advance (taking the first of the three outputs), while the structure 5-3-3 was applied for the forecast 2 days in advance (taking the second output).

Table 3 Values of the fitting parameter R² between real and predicted values of Plantago pollen concentration (2009 and 2010) for the different neural network structures
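The text does not state how the R² fitting parameter was computed. One common reading, assumed here, is the squared Pearson correlation between real and predicted values, which equals the R² of a straight-line fit of one series on the other. The function name and the toy values are hypothetical.

```python
def r_squared(real, predicted):
    """Squared Pearson correlation between real and predicted values.

    Assumption: R2 is taken as the coefficient of determination of a
    straight-line fit between the two series.
    """
    n = len(real)
    mean_r, mean_p = sum(real) / n, sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(real, predicted))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_r * var_p)


# Hypothetical real and forecasted concentrations (illustrative values only)
real = [1, 2, 3, 4]
forecast = [1, 3, 2, 4]
fit = r_squared(real, forecast)
```

For this toy pair the covariance sum is 4 and both sums of squares are 5, so the result is 16/25 = 0.64.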

Table 4 shows the sensitivity and importance of the variables. Sensitivity expresses how an output changes when the inputs are slightly modified. For the two structures proposed in this work (5-5-3 and 5-3-3), rainfall is the most sensitive variable, so it is the input whose values should be altered least when fed to the ANN. The importance of a variable represents the relative weight of each input variable. In absolute terms, it corresponds to the sum of the absolute values of the weights of the connections from the input layer to the first hidden layer. Julian day is the most important variable for both architectures employed in the present work. It is worth noting that, in percentage terms, all the variables used have importance values that must be taken into account.

Table 4 Sensitivity (%) and importance (%) of the used variables in the ANNs
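The importance measure described in the text (sum of the absolute values of the weights leaving each input toward the first hidden layer, expressed as a percentage) can be sketched as follows. The function name and the weight matrix are hypothetical; the sensitivity measure is not reproduced because its exact computation is not given.

```python
def input_importance(hidden_weights):
    """Relative importance (%) of each input variable.

    hidden_weights[i][j] is the weight from input j to hidden neuron i.
    The importance of input j is the sum of |weight| over all hidden
    neurons, normalized so that the importances add up to 100 %.
    """
    n_inputs = len(hidden_weights[0])
    absolute = [sum(abs(row[j]) for row in hidden_weights) for j in range(n_inputs)]
    total = sum(absolute)
    return [100.0 * a / total for a in absolute]


# Hypothetical weights for 3 inputs and 2 hidden neurons (illustrative only)
weights = [
    [0.5, -1.0, 0.5],
    [-0.5, 1.0, 1.5],
]
importance = input_importance(weights)
```

The absolute sums per input are 1.0, 2.0, and 2.0, which gives importances of 20 %, 40 %, and 40 %.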

While the ANN system was being implemented, and with the aim of confirming the partial results obtained during validation, an additional test was conducted. This additional test consists of a second validation step using a set of previously employed training data. We verified that the predicted results agree with the real ones in both verification steps. When forecasted values were compared with the real ones during the main pollen season (MPS), the prediction 1 day in advance achieved R² = 0.549 (year 2009) and R² = 0.443 (year 2010). For the forecast 2 days ahead, the R² values obtained were 0.372 and 0.319, respectively. The predicted and real values of Plantago pollen concentration during the main pollen season of the years used for the ANN model validation (2009 and 2010) are shown in Fig. 5.

Fig. 5 Real and forecasted values of Plantago pollen concentration in the MPS of 2009 (left) and 2010 (right) for 1 day of anticipation (top) and 2 days of anticipation (bottom)

The most common traditional models for short-term concentration forecasting in aerobiology rely on statistical analyses such as linear regression equations (Chuine and Belmonte 2004; Laaidi 2001) based only on meteorological variables as predictors. Other estimators should be included to improve forecast accuracy. One option is to use as predictor variables other parameters such as the day number counted from January 1 or the daily mean pollen concentration of the previous days (Rodríguez-Rajo et al. 2010). Although an ANN trained only with measured values of pollen concentration was able to predict near-future values (Arizmendi et al. 1993), the failure to include environmental parameters as estimators makes it difficult to predict the start date of the pollen season, which is a serious drawback (Puc 2012). An ANN based only on meteorological data (with the meteorological factors ordered according to their statistical importance obtained from the ANN parametric sensitivity analysis and Spearman's correlation) eliminates this drawback, as the method can be used at any time and provides a forecast for the beginning, the course, and the end of the pollen season (Puc 2012). Therefore, the combination of meteorological and pollen data in neural network models yields the best results (Sanchez-Mesa et al. 2005).

Our results are better than those obtained by classical aerobiological forecasting methodology, both in the prediction horizon (since the pollen data are available 72 h before the day whose pollen concentration we want to determine) and in the pollen concentration predicted. We also tried to predict the Plantago pollen concentrations by means of a multilinear regression model (Eqs. 6, 7). Data from 1993 to 2008 were used to estimate the pollen concentrations for 2009 and 2010. This model was built by defining:

$$y = \log [c + 1]$$
(6)

where c is the daily pollen concentration, so that the resulting linear fit has the form:

$$y = r_{0} + r_{1} x_{1} + r_{2} x_{2} + r_{3} x_{3} + \cdots$$
(7)

where $r_{i}$ are the constants and $x_{i}$ the values of the predictors (in our case, the variables with the highest significance in the statistical correlation: NE wind, SW wind, and sun hours). The results obtained for Plantago are very far from the real values, possibly because of the high randomness of the meteorological data. Hence, it is necessary to opt for other predictive methods. The ANN model developed in this paper offers possibilities for forecasting daily Plantago pollen concentrations with a high degree of accuracy, providing accurate information to sensitive patients and medical doctors to help them optimize the treatment process.
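For reference, the multilinear baseline of Eqs. 6 and 7 can be sketched as an ordinary least-squares fit on the log-transformed concentration. The base of the logarithm is not stated in the text, so base 10 is assumed here; the function names, the use of the normal equations, and the toy data are likewise illustrative assumptions.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_linear(predictors, concentrations):
    """Least-squares estimate of r0, r1, ... in y = log10(c + 1) = r0 + r1*x1 + ..."""
    rows = [[1.0] + list(p) for p in predictors]          # leading 1 for r0
    y = [math.log10(c + 1.0) for c in concentrations]     # Eq. 6 (base 10 assumed)
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_t for r, y_t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)                  # normal equations


def predict_concentration(coefficients, x):
    """Evaluate Eq. 7 and invert Eq. 6 to return a concentration."""
    y = coefficients[0] + sum(r * v for r, v in zip(coefficients[1:], x))
    return 10.0 ** y - 1.0


# Hypothetical data with two predictors, built so that y = x1 + 2*x2 exactly
predictors = [(0, 0), (1, 0), (0, 1), (1, 1)]
concentrations = [0, 9, 99, 999]
coefficients = fit_log_linear(predictors, concentrations)
estimate = predict_concentration(coefficients, (1, 1))
```

On these noise-free toy data the fit recovers the constants 0, 1, and 2. Real meteorological predictors would not be fitted this cleanly, which is consistent with the poor regression results reported above.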

4 Conclusions

Artificial neural networks are a precise methodology for forecasting short-term Plantago airborne pollen concentrations with good accuracy. The proposed model provides predictions using data sets of Plantago pollen concentration and meteorological variables. Although considerable meteorological variability was detected, the model is able to offer good estimates with a prediction horizon of 2 days. This subject is very important to public health systems, which can take measures for the population suffering from allergies and other respiratory diseases.