Introduction

Finding the relationship between past values of meteorological elements and the actual pollen season is an issue of great importance in aerobiology, but only a few papers have been published on this topic. In an early detailed analysis, temperature and relative humidity showed positive and negative correlations, respectively, with Ambrosia pollen release, although the effect of past meteorological conditions was not considered (Bianchi et al. 1959). In contrast, Spieksma et al. (1995), using linear correlations, found that air temperature during the preceding 40 days had a decisive influence on the start date of the Betula pollen season. In Melbourne, Ong et al. (1997) used the July rainfall total in a linear regression model to predict the onset of the grass pollen season. Käpylä (1984) and Galán et al. (1998) observed that diurnal pollen concentrations of trees were much more irregular than those of herbaceous taxa, without a clear and recurrent annual pattern. Furthermore, once anthesis has started in trees, it is relatively independent of weather variables (Käpylä 1984). Emberlin and Norris-Hill (1991) established that annual differences in the cumulative Urticaceae pollen concentration were primarily due to weather conditions in the period of pollen formation and only secondarily due to weather conditions in the pollen release season. Moreover, relative humidity, temperature, wind speed and rainfall were the most important factors in daily variations, but their relative importance varied from year to year. Giner et al. (1999), applying linear correlations, associated daily Artemisia pollen concentrations with global solar flux, minimum temperature and rainfall in the preceding weeks. They found that once pollination had begun, meteorological factors (except wind direction) did not seem to significantly influence pollen concentrations.
Although the above-mentioned papers cover the effects of past meteorological elements on phyto-physiological processes of certain species, to the best of our knowledge no related studies exist for ragweed. Furthermore, no study has separated the weights of current and past meteorological elements in influencing the current pollen concentration, either for Ambrosia or for other taxa.

Observations show that the Earth is experiencing global warming, which exerts regional impacts on natural and human environments. These impacts include changes in the characteristics of allergenic pollen (start and end dates of the pollen season, length of the pollen season, daily peak pollen counts, total annual pollen count) in the mid- and high latitudes of the Northern Hemisphere (IPCC 2013).

Ambrosia favours disturbed land where it has no competitors, and it does not tolerate shading. Closed stands and grass cover are unfavourable to its propagation. Owing to its high climate tolerance, an expansion of its potential distribution is anticipated as a result of global warming; in particular, this genus adapts well to dry and hot conditions. If more fallow areas appear in the landscape, a further increase is expected, especially on sandy soils (Makra et al. 2011; Ziska et al. 2008).

Daily ragweed pollen concentration is influenced by numerous factors and processes. Ten such factors are introduced below; of these, only soil conditions in the root zone and current and preceding weather variables, including the height of the planetary boundary layer (PBL), are analysed and discussed in detail. Each factor is explained in a separate paragraph, numbered as follows:

(1) Genetic attributes. These are taxon-dependent and have evolved through adaptation over many climate cycles (Prentis et al. 2008; Hodgins and Rieseberg 2011).

(2) Soil type, including location-specific nutrient availability. Ambrosia favours sandy soils (Szigetvári and Benkő 2004). However, land eutrophication, which facilitates higher pollen production (Szigetvári and Benkő 2004), is not characteristic of the Szeged agglomeration, an agricultural area consisting of small private plots (Deák et al. 2013). (Szeged is the largest settlement in south-eastern Hungary; see section ‘Location and data’ for more information on the city and its surroundings.)

(3) Soil conditions in the root zone. Soil temperature and soil humidity may be key influencing parameters for plant development and hence for pollen production. The depth of Ambrosia seeds in the soil modifies the effect of soil temperature and soil moisture: for seeds lying deeper, soil temperature rises more slowly and soil moisture exceeds its optimum value, so a poorer germination rate can be expected. Laboratory experiments show that germination is best directly at the surface and gradually worsens with increasing depth. The optimum germination depth of Ambrosia seeds in Hungary is 3 cm; however, even if germination is successful, a plant from a seed deeper than 7 cm in the soil cannot reach the surface (Kazinczi et al. 2008).

(4) Land-use change. Ragweed pollen concentrations are influenced by agricultural and social factors (Deák et al. 2013). Withdrawing agricultural land for building purposes could expand neglected areas, which increases the habitat available to weeds and hence pollen production (Makra et al. 2011).

(5) Current and preceding weather variables. Pollen grains are removed from the anther when the weather-dependent forces acting to remove them exceed the binding force (Jones and Harrison 2004). Ragweed pollen concentration is positively related to the daily mean air temperature and inversely related to the daily relative humidity (Puc 2006). Solar radiation also influences pollen levels (Laaidi et al. 2003). These associations are, however, highly complex. For instance, lack of available water during the hottest summer period can limit the pollination capability of Ambrosia, because the plant concentrates on preserving water and maintaining its vegetative functions at the expense of its generative processes (Makra et al. 2011). The role of rainfall and wind speed is unclear. Higher pollen concentrations may result from light rain and a gentle breeze, both of which shake the plants moderately and thus facilitate pollen release. A moderate wind speed, together with high solar radiation and low relative humidity, promotes the drying and subsequent rupturing of anthers, increasing pollen release. However, heavy showers, which wash pollen out of the air, and winds blowing from pollen-free areas both decrease local pollen levels (Jones and Harrison 2004). At higher wind speeds, pollen concentrations may also be diluted by stronger turbulence (Jones and Harrison 2004). In addition, the autumn frost that kills the adult plants plays a major role, as it ends pollen production (Ziska et al. 2008; Chapman et al. 2014).

(6) Height of the planetary boundary layer (PBL). The PBL height, which characterises vertical mixing in the air, may indicate how strongly the pollen concentration is diluted: an afternoon increase in pollen concentration may be associated with a reduction in the height of the mixing layer, whereas a thicker mixing layer ensures more efficient dilution (Jones and Harrison 2004).

(7) Long-range pollen transport. Ambrosia pollen, with an aerodynamic diameter of around 20 μm, can be transported over long distances (Sofiev et al. 2006a; Makra et al. 2010).

(8) Resuspension of pollen grains. Resuspension, facilitated by the strong winds preceding storms, may increase the local pollen level (Venables et al. 1997; Sofiev et al. 2006b).

(9) Disruption of pollen grains. Rain may rupture pollen grains, producing smaller starch particles (Venables et al. 1997). These small protein-containing particles may account for up to 11–15 % of the total fine particles in the air (Womiloju et al. 2003).

(10) Pollen grains as condensation nuclei. This process may substantially reduce the airborne pollen level. Particles of a certain size can act as efficient condensation nuclei at higher relative humidity or dew point (Pope 2010). Long-range transport is less efficient for particles that have become condensation nuclei during transport.

The novelty of the present paper is twofold. First, previous studies of the relationship between past values of meteorological elements and actual pollen season characteristics (phenological characteristics: start, end and duration of the pollen season; quantity-related characteristics: annual total pollen counts, peak pollen counts) used linear correlations (Spieksma et al. 1995; Ong et al. 1997; Giner et al. 1999; see the first paragraph of the Introduction); hence their results may be unclear or distorted when the relationships are nonlinear. Here, instead of the traditional Pearson correlation, we apply a generalised correlation that measures not only linear but any kind of relationship between variables. Second, we examine not pollen season characteristics but daily airborne ragweed pollen levels, relating them both to the current daily values of meteorological variables and to past meteorological conditions. The results are evaluated with special attention to the interactions between phyto-physiological processes and meteorological elements.

Materials and methods

Location and data

Szeged (46.25°N; 20.10°E), the largest settlement in south-eastern Hungary, is located at the confluence of the Rivers Tisza and Maros (Fig. 1). The area is characterised by the extensive flat landscape of the Great Hungarian Plain (Pannonian Plain), at an elevation of 79 m above sea level. The city is the centre of the Szeged region with 203,000 inhabitants. The climate of Szeged belongs to Köppen’s Ca type (warm temperate climate) with relatively mild and short winters and hot summers (Köppen 1931).

Fig. 1 Location of Szeged in Europe/Hungary (upper panel) and the urban web of Szeged with the positions of the data sources (lower panel). 1: meteorological station; 2: aerobiological station. The distance between the aerobiological and the meteorological station is 2 km.

The choice of Szeged is justified by the following considerations. (1) Szeged is located almost at the centre of the Pannonian Plain within the Carpathian Basin, which is the area with the highest pollen concentrations in Europe (Makra et al. 2010); (2) Ambrosia pollen counts measured at Szeged are mainly of local origin, supplemented by medium-range transport, while the role of long-range transport is small (Makra et al. 2010); (3) Ambrosia pollen counts measured at Szeged are representative of the whole Carpathian Basin (Makra 2012); accordingly, one station (e.g. Szeged) is sufficient to characterise the associations between Ambrosia pollen counts as the target variable on the one hand and meteorological variables as influencing variables on the other; (4) Szeged has the largest Ambrosia pollen data set in the Carpathian Basin, commencing in 1989.

The pollen content of the air was measured using a 7-day recording Hirst-type volumetric spore trap (Hirst 1952) (Fig. 1). The air sampler is located on top of the building of the Faculty of Arts at the University of Szeged, approximately 20 m above the ground surface (Makra et al. 2010). We used daily ragweed pollen concentrations (pollen grains/m³ of air) and daily values of the following meteorological variables: mean temperature (T, °C), mean dew point depression (T − Td, °C) (where Td, the dew point, is the temperature to which the air must be cooled at constant pressure to reach saturation), mean sea level pressure (SLP, hPa), mean wind speed (WS, m/s), maximum air temperature (T max, °C), minimum air temperature (T min, °C), rainfall amount (RF, mm), soil temperature (T soil, °C) at 10 cm depth, soil moisture (H soil, mm) at 10 cm depth and mean planetary boundary layer height (PBL, m). The first seven meteorological variables were measured at a monitoring station in the inner city of Szeged (Fig. 1). T soil and H soil were measured at the Meteorological Observatory of Szeged on the outskirts of the city. PBL data for Szeged were available as 3-hourly values from the ECMWF ERA-Interim database. ERA-Interim is a reanalysis in which a forecast model assimilates observations, so the PBL height is a model-derived quantity constrained by measured data rather than a direct measurement. The majority of the assimilated data originate from satellites. Measurements of atmospheric refraction obtained from GPS radio occultation began to be used in ERA-Interim in 2001. The conventional observing system, despite its much lower data volumes, still serves as an indispensable constraint on the reanalysis. In situ measurements of upper-air temperature (T), wind (u/v) and specific humidity (q) are available from radiosondes, pilot balloons, aircraft and wind profilers. Observations of surface pressure (P), 2 m temperature, 2 m relative humidity (RH) and near-surface (10 m) winds (u/v) from ships, drifting buoys and land stations have also been assimilated (Dee et al. 2011).

The 19-year data sets of the above-mentioned parameters were used in our analysis for the period 1989–2007.

The pollen season is defined by its start and end dates. For the start (end) of the season, we used the first (last) date on which at least one pollen grain/m³ of air is recorded and at least five consecutive following (preceding) days also show one or more pollen grains/m³ of air (Galán et al. 2001). As the pollen season varies from year to year, the longest observed season (July 15 to October 15), comprising 93 days, was used for each year.
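The season definition can be sketched in Python as follows. This is one reading of the rule, assumed here: the start is the first day with at least one pollen grain/m³ whose next five days also reach that level, and the end is the last such day whose previous five days do. The function name `pollen_season` and its parameters are illustrative, not taken from the study.

```python
import numpy as np


def pollen_season(counts, threshold=1.0, run=5):
    """Start and end indices of the pollen season (one reading of Galán et al. 2001).

    counts: daily pollen concentrations (grains/m3) for one year.
    Returns (start, end) as day indices, or None where no such day exists.
    """
    present = np.asarray(counts, dtype=float) >= threshold
    n = len(present)
    # start: first day reaching the threshold whose next `run` days also reach it
    start = next((i for i in range(n - run) if present[i:i + run + 1].all()), None)
    # end: last day reaching the threshold whose preceding `run` days also reach it
    end = next((i for i in range(n - 1, run - 1, -1) if present[i - run:i + 1].all()), None)
    return start, end
```

For example, `pollen_season([0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 2, 2, 0])` returns `(3, 8)`: the isolated counts on days 1, 10 and 11 are not part of a long enough run to open or extend the season.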

Arrays of the daily values of the above-mentioned influencing variables in the pollen season of Ambrosia were labelled as current variables, while arrays of these variables in the period starting from the first pollen-free day following the previous pollen season (October 16) to the last pollen-free day preceding the actual pollen season (July 14) were labelled as past variables.
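The window lengths used for the current and past variables can be checked with a short sketch based on Python's standard `datetime` module. July 15 to October 15 inclusive gives 93 days; October 16 to July 14 gives 272 days in a non-leap year and 273 days when 29 February falls in the period. The `drop_feb29` option is a hypothetical illustration of how a fixed 272-day vector could be formed; it is our assumption, not a documented choice of this study.

```python
from datetime import date, timedelta


def daily_range(first, last):
    """All dates from first to last, inclusive."""
    return [first + timedelta(days=k) for k in range((last - first).days + 1)]


def current_period(year):
    """Fixed pollen-season window for the current variables (July 15 to October 15)."""
    return daily_range(date(year, 7, 15), date(year, 10, 15))


def past_period(year, drop_feb29=True):
    """Pollen-free window for the past variables (October 16 of year-1 to July 14 of year).

    drop_feb29 is a hypothetical option: removing 29 February keeps the window at 272 days.
    """
    days = daily_range(date(year - 1, 10, 16), date(year, 7, 14))
    if drop_feb29:
        days = [d for d in days if (d.month, d.day) != (2, 29)]
    return days


print(len(current_period(1990)), len(past_period(1990)), len(past_period(1992, drop_feb29=False)))
# 93 272 273
```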

Methodology

A simple way to measure the strength of the relationships between daily pollen concentrations and daily values of meteorological elements is to calculate correlations between them. The Pearson correlation, however, characterises only linear relationships, whereas relationships between pollen levels and meteorological conditions may be nonlinear (Jones and Harrison 2004). Therefore, a generalised correlation R that measures any kind of relationship between random variables is needed. Such a correlation satisfies |R| ≤ 1, with |R| = 1 when the two random variables X and Y have a deterministic relationship and R = 0 when X and Y are independent.

Let Y be a 93-dimensional random vector representing daily pollen concentrations during the 93-day pollen season of Ambrosia. When the relationship between current pollen levels and current meteorological conditions is examined, let X be a 93-dimensional random vector representing the daily values of a given meteorological element during the same period. When the relationship between current pollen levels and past meteorological conditions is examined, X is instead a 272-dimensional random vector, because the period from the first pollen-free day following the previous pollen season to the last pollen-free day preceding the actual pollen season covers 272 days.

Székely et al. (2007) introduced a distance correlation that is able to measure any kind of relationship between random vectors of equal or different dimensions. Székely and Rizzo (2013a) developed a modified distance correlation R(X, Y), its sample estimate \( \widehat{R}\left(X,Y\right) \) and a test for zero correlation in high dimensions. This modified sample correlation performs much better in high dimensions than the original one, even for a small sample size n. It is therefore applied here because of our high dimensions (93 and 93, or 93 and 272) and small sample sizes (n = 19 for the current and n = 18 for the past variables, from the 19-year record). Let \( \left({x}_1,{y}_1\right),\dots, \left({x}_n,{y}_n\right) \) be a sample of (X, Y). The modified sample distance correlation is defined as

$$ \widehat{R}\left(X,Y\right)=\frac{\mathsf{v}^{\prime}\left(X,Y\right)}{\sqrt{\mathsf{v}^{\prime}\left(X,X\right)\mathsf{v}^{\prime}\left(Y,Y\right)}} $$
(1)

with modified sample distance covariance

$$ \mathsf{v}^{\prime}\left(X,Y\right)=\frac{1}{n\left(n-3\right)}\left(\sum_{i,j=1}^n{A}_{ij}^{\prime }{B}_{ij}^{\prime }-\frac{n}{n-2}\sum_{i=1}^n{A}_{ii}^{\prime }{B}_{ii}^{\prime }\right), $$
(2)

where

$$ {A}_{ij}^{\prime }=\begin{cases}\frac{n}{n-1}\left({A}_{ij}-\frac{{a}_{ij}}{n}\right), & i\ne j,\\ \frac{n}{n-1}\left({a}_i-a\right), & i=j,\end{cases}\qquad {A}_{ij}={a}_{ij}-{a}_i-{a}_j+a, $$
(3)

and

$$ {a}_{ij}=\left|{x}_i-{x}_j\right|,\quad {a}_i=\frac{1}{n}\sum_{j=1}^n{a}_{ij},\quad {a}_j=\frac{1}{n}\sum_{i=1}^n{a}_{ij},\quad a=\frac{1}{n^2}\sum_{i,j=1}^n{a}_{ij}. $$
(4)

The quantities \( {B}_{ij}^{\prime } \) and \( {B}_{ij} \) are calculated in the same way, but from

$$ {b}_{ij}=\left|{y}_i-{y}_j\right|,\quad {b}_i=\frac{1}{n}\sum_{j=1}^n{b}_{ij},\quad {b}_j=\frac{1}{n}\sum_{i=1}^n{b}_{ij},\quad b=\frac{1}{n^2}\sum_{i,j=1}^n{b}_{ij}. $$
(5)

The test for whether \( \widehat{R}\left(X,Y\right) \) differs significantly from zero is based on the fact that

$$ Q=\sqrt{m-1}\frac{\widehat{R}\left(X,Y\right)}{\sqrt{1-{\widehat{R}}^2\left(X,Y\right)}} $$
(6)

asymptotically has a t distribution with m − 1 degrees of freedom, where m = n(n − 3)/2; the convergence of the true distribution of Q to this asymptotic distribution is fast.
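Equations (1)–(6) translate directly into array code. The Python sketch below, using NumPy and SciPy, is our own illustration rather than the implementation used in this study. The function names are hypothetical, samples are passed as arrays with one row per year (e.g. shape (19, 93) for the current variables), |x_i − x_j| is taken as the Euclidean distance between rows, and the one-sided upper-tail p-value is an assumption about how the test of eq. (6) is applied.

```python
import numpy as np
from scipy import stats


def _distances(x):
    """Euclidean distance matrix a_ij = |x_i - x_j| between the n rows (years) of x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    return np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))


def _modified_centre(d):
    """Matrix A'_ij of eq. (3) built from a distance matrix d (eq. 4)."""
    n = d.shape[0]
    row = d.mean(axis=1)     # a_i; equals a_j for the same index because d is symmetric
    grand = d.mean()         # a
    centred = d - row[:, None] - row[None, :] + grand       # A_ij
    out = n / (n - 1) * (centred - d / n)                   # case i != j
    np.fill_diagonal(out, n / (n - 1) * (row - grand))      # case i == j
    return out


def _modified_dcov(a, b):
    """Modified sample distance covariance v'(X, Y) of eq. (2)."""
    n = a.shape[0]
    full = np.sum(a * b)                      # sum over all i, j
    diag = np.sum(np.diag(a) * np.diag(b))    # sum over i of A'_ii B'_ii
    return (full - n / (n - 2) * diag) / (n * (n - 3))


def modified_dcor(x, y):
    """Modified sample distance correlation R-hat(X, Y) of eq. (1)."""
    a = _modified_centre(_distances(x))
    b = _modified_centre(_distances(y))
    denom = np.sqrt(max(_modified_dcov(a, a) * _modified_dcov(b, b), 0.0))
    return _modified_dcov(a, b) / denom if denom > 0 else 0.0


def dcor_t_test(x, y):
    """R-hat, statistic Q of eq. (6) and its upper-tail p-value from t with m - 1 d.f."""
    n = len(x)
    r = modified_dcor(x, y)
    m = n * (n - 3) / 2
    q = np.sqrt(m - 1) * r / np.sqrt(1 - r ** 2) if r ** 2 < 1 else np.copysign(np.inf, r)
    return r, q, stats.t.sf(q, df=m - 1)
```

With x a 19 × 93 array of a meteorological variable and y the 19 × 93 array of pollen concentrations, `dcor_t_test(x, y)` returns \( \widehat{R} \), Q and the p-value. An affine copy such as `2 * x + 3` gives \( \widehat{R} = 1 \), since all distances are merely rescaled.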

It is useful to determine the order of importance of the meteorological variables in influencing pollen concentrations. This is based on partial distance correlations (Székely and Rizzo 2013b), since partial correlation helps both to spot spurious correlations (i.e. correlations explained by the effect of other variables) and to reveal hidden correlations (i.e. correlations masked by the effect of other variables). The method is a natural extension of the well-known stepwise regression (Draper and Smith 1981) in multivariate linear regression, but uses distance correlations. Our procedure can be summarised as follows: (1) The meteorological variable \( {X}_S \) with the highest \( \left|\widehat{R}\left(X,Y\right)\right| \) among all candidate variables X is selected as the most informative variable. Note that \( \widehat{R}\left(X,Y\right) \) can be viewed as a partial distance correlation without conditioning on other variables. (2) Set \( Z={X}_S \). (3) Calculate the partial distance correlation \( {\widehat{R}}_p\left(X,Y;Z\right) \) between X and Y conditioned on Z for every X not yet included in Z. (4) The variable with the highest \( \left|{\widehat{R}}_p\left(X,Y;Z\right)\right| \) is selected as the next most informative variable and merged with Z to form a new Z. (5) Steps 3 and 4 are repeated until only one (the least informative) meteorological variable remains.
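The greedy ordering in steps (1)–(5) can be sketched as below. It is a hypothetical illustration, not the authors' code: `rank_variables` and the other names are ours, each variable is an array with one row per year, "merging" a variable into Z is taken to mean concatenating its columns, and the partial distance correlation uses the formula \( {\widehat{R}}_p\left(X,Y;Z\right)=\frac{\widehat{R}\left(X,Y\right)-\widehat{R}\left(X,Z\right)\widehat{R}\left(Y,Z\right)}{\sqrt{1-{\widehat{R}}^2\left(X,Z\right)}\sqrt{1-{\widehat{R}}^2\left(Y,Z\right)}} \) given by Székely and Rizzo (2013b) for the bias-corrected distance correlation, which we assume applies to the \( \widehat{R} \) of eq. (1).

```python
import numpy as np


def _centre(x):
    """Modified centred distance matrix A' (eqs. 3-4) for the rows of x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    n, row, grand = len(d), d.mean(axis=1), d.mean()
    out = n / (n - 1) * (d - row[:, None] - row[None, :] + grand - d / n)
    np.fill_diagonal(out, n / (n - 1) * (row - grand))
    return out


def _vcov(a, b):
    """Modified sample distance covariance v' of eq. (2)."""
    n = len(a)
    return (np.sum(a * b) - n / (n - 2) * np.sum(np.diag(a) * np.diag(b))) / (n * (n - 3))


def dcor(x, y):
    """Modified sample distance correlation R-hat of eq. (1)."""
    a, b = _centre(x), _centre(y)
    denom = np.sqrt(max(_vcov(a, a) * _vcov(b, b), 0.0))
    return _vcov(a, b) / denom if denom > 0 else 0.0


def partial_dcor(x, y, z):
    """Partial distance correlation R_p(X, Y; Z) via the product formula (assumed)."""
    rxy, rxz, ryz = dcor(x, y), dcor(x, z), dcor(y, z)
    denom = np.sqrt(max((1 - rxz ** 2) * (1 - ryz ** 2), 0.0))
    return (rxy - rxz * ryz) / denom if denom > 0 else 0.0


def rank_variables(variables, y):
    """Greedy ordering of steps (1)-(5); returns [(name, |R| or |R_p|), ...]."""
    remaining = dict(variables)
    order, z = [], None
    while remaining:
        # step (1) uses plain R-hat; steps (3)-(4) condition on the merged Z
        scores = {
            name: abs(dcor(x, y) if z is None else partial_dcor(x, y, z))
            for name, x in remaining.items()
        }
        best = max(scores, key=scores.get)
        order.append((best, scores[best]))
        chosen = np.asarray(remaining.pop(best), dtype=float).reshape(len(y), -1)
        z = chosen if z is None else np.hstack([z, chosen])   # merge into Z
    return order
```

Called with a dictionary such as `{"T": t, "WS": ws, ...}` and the pollen array, it returns the variables from most to least informative together with the |\( \widehat{R} \)| or |\( {\widehat{R}}_p \)| value at which each was selected.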

When the relationship between current pollen levels and the past values of a meteorological variable is examined, it is unrealistic to expect this variable to be uniformly informative about future pollen concentrations over the entire 272-day period. It is therefore useful to determine the subperiods (hereafter called key periods) in which meteorological conditions have the greatest influence on pollen concentrations. These periods should be neither too long nor too short, as meteorological conditions over only a few days should not be crucial for daily pollen production far in the future. Hence, the key period for a given meteorological variable is found as the maximum of \( \left|{\widehat{R}}_1\right|,\dots, \left|{\widehat{R}}_m\right| \), where \( {\widehat{R}}_1,\dots, {\widehat{R}}_m \) are the correlations estimated for period lengths from 7 to 70 days (i.e. the dimension of X varies from 7 to 70) with every possible starting date. A problem here is that, unlike the distribution of \( \widehat{R} \) under \( {H}_0:R=0 \) (Székely and Rizzo 2013a), the probability distribution of \( \max \left|\widehat{R}\right| \) under the null hypothesis \( {H}_0:{R}_1={R}_2=\dots ={R}_m=0 \) is not known. Its critical value is therefore estimated by randomly re-pairing the sample. Let \( \left({x}_1,{y}_1\right),\dots, \left({x}_n,{y}_n\right) \) (n = 18) be the sample of (X, Y) for a given period. (1) Form a sample \( \left({x}_{i(1)},{y}_1\right),\dots, \left({x}_{i(n)},{y}_n\right) \), where the indices i(j) ≠ j, j = 1, …, n, are chosen randomly such that i(j) ≠ i(k) for k ≠ j, and calculate \( \widehat{R} \). (2) Perform step 1 for each period selected. (3) Take the maximum of these \( \left|\widehat{R}\right| \) values. (4) Repeat steps 1–3 N = 1000 times to obtain a sample of maxima. The 1 − ε quantile of this sample is the critical value of \( \max \left|\widehat{R}\right| \) for rejecting \( {H}_0:{R}_1={R}_2=\dots ={R}_m=0 \) at significance level ε.
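A vectorised sketch of the key-period search and its permutation critical value follows. It is our illustration, not the study's code: `key_period` and its parameters are hypothetical, rows of x (e.g. 18 × 272) and y (e.g. 18 × 93) are years, and a single random derangement per replicate is applied to all candidate windows at once, which is our reading of steps (1)–(2). Window distance matrices are obtained from cumulative sums of squared daily differences, and re-pairing the sample is done by permuting the rows and columns of the centred pollen matrix, which yields the same \( \widehat{R} \) as permuting the x values.

```python
import numpy as np


def _centre(d):
    """Modified centred matrices A' (eq. 3) for one or a stack of distance matrices."""
    n = d.shape[-1]
    row = d.mean(axis=-1)
    grand = d.mean(axis=(-2, -1), keepdims=True)
    out = n / (n - 1) * (d - row[..., :, None] - row[..., None, :] + grand - d / n)
    idx = np.arange(n)
    out[..., idx, idx] = n / (n - 1) * (row - grand[..., 0])
    return out


def _vcov(a, b):
    """Modified sample distance covariance v' (eq. 2), batched over leading axes."""
    n = a.shape[-1]
    full = np.sum(a * b, axis=(-2, -1))
    diag = np.sum(np.diagonal(a, axis1=-2, axis2=-1) * np.diagonal(b, axis1=-2, axis2=-1), axis=-1)
    return (full - n / (n - 2) * diag) / (n * (n - 3))


def _derangement(n, rng):
    """Random permutation with no fixed point, i.e. i(j) != j for every j."""
    while True:
        p = rng.permutation(n)
        if np.all(p != np.arange(n)):
            return p


def key_period(x, y, lengths=range(7, 71), n_perm=1000, eps=0.05, seed=0):
    """(start, length) of the window of x with max |R-hat| against y, its R-hat,
    and the permutation critical value for max |R-hat| at level eps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(x), -1)
    n, days = x.shape
    # cumulative sums of squared day-wise differences give every window's distances cheaply
    sq = (x[:, None, :] - x[None, :, :]) ** 2
    cum = np.concatenate([np.zeros((n, n, 1)), np.cumsum(sq, axis=-1)], axis=-1)
    windows = [(s, L) for L in lengths for s in range(days - L + 1)]
    dist = np.stack([np.sqrt(np.maximum(cum[:, :, s + L] - cum[:, :, s], 0.0)) for s, L in windows])
    a = _centre(dist)
    b = _centre(np.sqrt(((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)))
    denom = np.sqrt(np.clip(_vcov(a, a) * _vcov(b, b), 0.0, None))

    def correlations(bm):
        """R-hat for every window against the (possibly re-paired) pollen matrix bm."""
        return np.divide(_vcov(a, bm), denom, out=np.zeros_like(denom), where=denom > 0)

    r = correlations(b)
    best = int(np.argmax(np.abs(r)))
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_perm)
    for k in range(n_perm):
        q = _derangement(n, rng)
        # pairing x_i with y_q(i) equals permuting rows and columns of B'
        maxima[k] = np.max(np.abs(correlations(b[np.ix_(q, q)])))
    return windows[best], float(r[best]), float(np.quantile(maxima, 1 - eps))
```

With the defaults (lengths of 7–70 days, N = 1000, ε = 0.05), the function returns the (start index, length) of the key period, its \( \widehat{R} \) and the critical value for \( \max \left|\widehat{R}\right| \); if the columns of x start on October 16, start index 0 corresponds to that date. A key period is significant when its |\( \widehat{R} \)| exceeds the critical value.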

Results and discussions

Correlations between arrays of daily values of the 10 meteorological elements in question and the array of daily ragweed pollen concentrations during the pollen season are in general highly significant (Table 1). T, T max, T min and T soil display significant positive correlations, while T − Td, SLP, WS and RF show significant negative correlations. The role of H soil and PBL, however, is negligible. T min, WS and T − Td, in decreasing order, are the most important variables influencing daily ragweed pollen concentrations, while SLP, H soil and T max are the least informative. The temperature parameters T, T max and T min are positively associated with ragweed pollen counts. This association is expected, as Ambrosia is warm-tolerant (Kazinczi et al. 2008): as temperature increases during the vegetation period, metabolic and generative processes become more active, which contributes to plant growth and pollen production. The importance of T min (indicated by its highest correlation) is underlined by two facts: the vegetation period starts after the early spring frosts, and pollen production ceases immediately after the first autumn frost (Ziska et al. 2008).

Table 1 Correlations between daily values of the meteorological elements and the daily ragweed pollen concentrations during the entire pollen season. Last column is based on partial correlations

The role of WS is ambiguous, as it can be either positive or negative. If air masses arrive from an area with lower (higher) pollen counts over a region of high (low) ragweed pollen production, the role of WS is negative (positive). As the Szeged area generally has extremely high ragweed pollen loads, WS should play a negative role, bringing air with lower pollen content over the city and hence diluting the locally released ragweed pollen.

T − Td is a temperature difference, used as a humidity measure, that shows how much the air must be cooled at constant pressure to reach its dew point. Small values of T − Td, indicating high relative humidity, are associated with lower pollen concentrations. This can be explained by the fact that, at higher humidity, pollen grains cohere more easily, resulting in fewer airborne particles. A secondary consequence reinforces this effect: pollen grains that stick together are heavier and therefore fall out of the air faster, which further decreases the pollen concentration. Furthermore, relative humidity is also higher (T − Td is lower) when a warm front is approaching or after a rain shower, which again leads to a decrease in the pollen count. In contrast, at low relative humidity (high values of T − Td) the anthers dry out, which aids their rupturing and pollen release and increases the airborne pollen level. These points explain the association between air humidity, expressed by T − Td, and the pollen concentration in the actual pollen season (Láng 1998; Haraszty 2004).
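To make the link between T − Td and relative humidity concrete, relative humidity can be approximated from temperature and dew point depression with the Magnus formula, \( RH=100\exp \left(\frac{b{T}_d}{c+{T}_d}-\frac{bT}{c+T}\right) \). The Python sketch below assumes the commonly used coefficients b = 17.62 and c = 243.12 °C over water; the function name is ours.

```python
import math

# Magnus coefficients over water (assumed values: b dimensionless, c in deg C)
B, C = 17.62, 243.12


def relative_humidity(t, depression):
    """Relative humidity (%) from air temperature t and dew point depression T - Td (deg C)."""
    td = t - depression
    return 100.0 * math.exp(B * td / (C + td) - B * t / (C + t))
```

At T = 20 °C, a depression of 10 °C corresponds to a relative humidity of about 52.6 %, and the value rises towards 100 % as T − Td approaches zero.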

The role of rainfall is somewhat less clear. Drizzle or short moderate rain associated with a passing warm front can shake plants slightly and encourage pollen release. In addition, water is required for the photosynthetic and generative processes that are indispensable for pollen production. However, heavy local showers during the period of substantial pollen release (high summer and early autumn) wash pollen out of the air. These showers are nevertheless important for refilling the water storage of the soil and vegetation, contributing to plant survival. In extremely dry periods, the plant reduces its pollen production for the sake of survival; if sufficient water is available for photosynthetic and generative processes, the plant can use it for pollen production (Láng 1998; Haraszty 2004). In short, the role of rainfall in pollen release is complex.

The order of importance of the variables based on partial correlations (last column of Table 1) differs considerably from the order of the correlations themselves. This is because correctly ranking the importance of influencing variables for a target variable is not only difficult but probably unrealistic, owing to the multiple relationships among the variables. Our ordering procedure therefore relies on partial distance correlations (section ‘Methodology’), which can expose spurious correlations and reveal hidden ones.

The main results of the correlations between arrays of daily values of meteorological variables in the period from the first pollen-free day following the previous pollen season (October 16) to the last pollen-free day preceding the actual pollen season (July 14) and the array of daily pollen counts of the actual pollen season are as follows (Table 2). Past values of T soil and H soil display significant positive associations, while those of T − Td, SLP, WS and RF display significant negative associations with actual ragweed pollen counts. T, T max, T min and PBL show no significant association with current ragweed pollen concentrations. Past SLP, T and T min are the most influential elements (in this order), while RF, WS and T − Td are the least informative for determining current pollen concentrations.

Table 2 Correlations between daily values of the meteorological elements in the ragweed pollen-free season preceding the actual pollen season and the daily ragweed pollen concentrations during the actual pollen season. Last column is based on partial correlations

SLP is a complex meteorological variable that partly reflects the effects of all the other variables, as variations in SLP are related to marked weather phenomena that affect almost every meteorological element.

Past values of T, T max and T min show no significant correlations with the actual pollen counts. However, the last column of Table 2 clearly shows their importance. For instance, temperature conditions in winter and early spring are important for seed survival and influence subsequent germination.

The plant survives the winter as seed, so the soil variables are important for the subsequent vegetation phases. Soil temperature (T soil) above 0 °C and the replenishment of soil water storage (H soil) by the beginning of the vegetation period are very important, as they are basic conditions for the start of germination; this is reflected in the definite positive associations between the past soil variables on the one hand and the Ambrosia pollen concentration on the other. Note that H soil is determined not only by rainfall in the pollen-free period preceding the actual pollen season but also by rainfall in earlier periods. H soil is also influenced by groundwater flows and has a much longer-lasting effect on current groundwater levels.

The significant negative association between past T − Td and current pollen levels is hard to account for. A plausible explanation is another influencing variable that correlates significantly with both the current pollen counts and the past T − Td values; however, this variable is difficult to identify. A similar argument applies to WS. The role of RF is much more important in the pollen season than in the preceding pollen-free season. A large amount of rain in the pollen-free season preceding the actual pollen season can favour competitors. This effect can substantially reduce the population size of ragweed and lead to lower actual pollen counts. As a result, the negative value of R for rainfall (RF) in Table 2 appears reasonable.

Both the past and the current values of T and T min are highly influential for the current pollen season. T matters because nearly optimal or even rather warm conditions are essential for ragweed, while T min is important because of ragweed’s poor frost tolerance (Ziska et al. 2008). The largest differences in the order of importance between the past and current values of the influencing variables are observed for T − Td, WS, RF and SLP. T − Td, WS and RF are all important in the actual pollen season but unimportant in the preceding pollen-free season. In winter, the plant exists as seeds, and only survival matters. In the actual pollen season, T − Td (through relative humidity) and RF help remove pollen grains from the air; in the preceding pollen-free season, the air is free of pollen grains, so these factors are irrelevant. The survival of the seeds is independent of WS, but WS substantially influences pollen concentration through transport processes in the pollen season. RF is less important in winter, since it falls mostly as snow, which becomes available to plants only after melting, when it increases H soil (Láng 1998; Haraszty 2004). SLP is the most important of all the influencing variables in the pollen-free season preceding the actual pollen season (Table 2), while it is the least relevant in the actual pollen season (Table 1). This is because in winter only the survival of the seeds is crucial, and the role of the past meteorological variables essential for this survival partly appears via SLP (Table 2).
However, in the actual pollen season, the influence of the remaining current meteorological variables is more visible as phyto-physiological processes occur only in the actual pollen season and hence multiple associations of the current meteorological variables with these processes are much more complicated than the mere survival of Ambrosia (Table 1).

In winter, meteorological variables affect the survival of the seeds partly via SLP (Table 2); that is, both their own direct effects and their indirect effects influence seed viability. Likewise, during the current pollen season, the same relationship affects Ambrosia pollen concentration (Table 1).

We should again emphasise that both SLP and the other meteorological parameters influence the target variable through their own direct effects as well as through indirect effects via the other parameters. Distinguishing these two effects is difficult, whether the target is Ambrosia pollen concentration in the actual pollen season or the survival of the seeds in the pollen-free season preceding it. Nevertheless, an attempt can be made to separate them, since both SLP (with its direct effect) and the other meteorological parameters (with their indirect effects) are represented in the data. With an appropriate statistical procedure (such as factor analysis with a special transformation; Jahn and Vahle 1968), the loadings of the SLP-related direct components and of the indirect components related to the other climate parameters could be estimated, both for seed survival in the pollen-free season and for Ambrosia pollen concentration in the actual pollen season. However, such an analysis is beyond the scope of this paper.

Table 3 shows the key periods of the meteorological variables in the pollen-free season preceding the actual pollen season, that is, the periods most strongly associated with pollen concentrations during the actual pollen season. RF does not exhibit a significant correlation at the 0.05 probability level. This seems to contradict Table 2, where the correlation between the past values of this variable and pollen concentrations in the actual pollen season is significant. However, the critical value for a correlation to differ significantly from zero is substantially higher for Table 3, because each key period is selected as the one that maximises the correlation (see section ‘Methodology’). Table 3 therefore suggests that rainfall has no well-defined key period influencing pollen concentrations. The key periods can be classified into four groups according to their timing in the year: key periods in June–July form group 1, those in January–February group 2, those in November–December group 3, and those in April–May group 4 (Table 3). The highest significant correlations appear in group 3, which consists of WS, T min and PBL. The key periods (Table 3) can be interpreted phyto-physiologically. Soil temperature and humidity thresholds must be exceeded by mid-spring for plant development (germination) to start. Accordingly, T soil and H soil have their greatest impact on pollen concentrations in the actual pollen season during their key periods of April 12–23 and April 27–May 8, respectively. For T max, the period April 25–May 5 is of key importance for vegetative processes. At the turn of June and July (June 27–July 7), temperature plays a dominant role in the vegetative processes and in the onset of the generative processes. This is when the first Ambrosia pollen grains can be recorded and also when a continental rainfall peak may occur.
This rainfall peak may be delayed until this period, and such a delay may suppress plant development. As regards T min, the period November 19–28 appears to be of key importance because of early frosts, which may reduce the survival potential of the seeds. Although the remaining variables with their key periods, namely TT d (January 27–February 5), SLP (June 12–24), WS (December 6–15) and PBL (December 2–13), significantly influence pollen concentration in the current pollen season, only the association for SLP can be interpreted phyto-physiologically. The significant negative association between SLP in the period June 12–24 and the pollen counts in the current pollen season may again be due to the continental rainfall peak: frequent weather fronts, associated with low sea level air pressure and heavy showers at this time, ensure sufficient water for photosynthetic and generative processes and thus give rise to more intense pollen production.
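Why a maximum-selected correlation needs a higher critical value can be illustrated with a minimal sketch. It rests on assumptions that are ours, not necessarily those of the ‘Methodology’ section: each past variable is summarised by its mean over a candidate window of days, that mean is correlated (Pearson, for simplicity) across years with an annual pollen index, and the window with the largest absolute correlation is kept. The window lengths, the permutation test and all function names (`window_means`, `corr_rows`, `key_period`) are hypothetical. Because the best of many windows is retained, the critical value is taken from the distribution of the maximum correlation under the null hypothesis, estimated here by shuffling the years.

```python
import numpy as np


def window_means(daily, min_len, max_len):
    """Mean of every candidate window (hypothetical lengths), for each year.

    daily: array (n_years, n_days) of one past meteorological variable.
    Returns W with shape (n_windows, n_years) and a list of (start, length).
    """
    daily = np.asarray(daily, dtype=float)
    n_years, n_days = daily.shape
    # Prefix sums: csum[:, j] is the sum of the first j days of each year.
    csum = np.concatenate([np.zeros((n_years, 1)), np.cumsum(daily, axis=1)], axis=1)
    blocks, meta = [], []
    for length in range(min_len, max_len + 1):
        sums = csum[:, length:] - csum[:, :-length]  # (n_years, n_days - length + 1)
        blocks.append((sums / length).T)
        meta.extend((start, length) for start in range(n_days - length + 1))
    return np.vstack(blocks), meta


def corr_rows(W, t):
    """Pearson correlation of every row of W with the vector t."""
    Wc = W - W.mean(axis=1, keepdims=True)
    tc = t - t.mean()
    denom = np.linalg.norm(Wc, axis=1) * np.linalg.norm(tc)
    denom = np.where(denom == 0, np.inf, denom)  # constant rows give r = 0
    return (Wc @ tc) / denom


def key_period(daily, target, min_len=5, max_len=15, n_perm=999, alpha=0.05, seed=0):
    """Best window by |r| plus a permutation critical value for the maximum |r|."""
    W, meta = window_means(daily, min_len, max_len)
    t = np.asarray(target, dtype=float)
    r = corr_rows(W, t)
    best = int(np.argmax(np.abs(r)))
    rng = np.random.default_rng(seed)
    # Null distribution of the *maximum* |r| over all windows: shuffle the years.
    null_max = np.array([np.abs(corr_rows(W, rng.permutation(t))).max()
                         for _ in range(n_perm)])
    crit = float(np.quantile(null_max, 1 - alpha))
    start, length = meta[best]
    return start, length, float(r[best]), crit


# Synthetic check: 19 years x 120 days, pollen index driven by days 40-49.
rng = np.random.default_rng(42)
daily = rng.normal(size=(19, 120))
target = daily[:, 40:50].mean(axis=1) + 0.05 * rng.normal(size=19)
start, length, r, crit = key_period(daily, target)
print(start, length, round(r, 2), round(crit, 2))
```

With 19 years, a single pre-specified Pearson correlation is significant at the 0.05 level once |r| exceeds about 0.46; the permutation critical value for the best of roughly 1200 overlapping windows is considerably higher, which is the effect described above for RF in Table 3.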

Table 3 Key periods of daily meteorological variables in the ragweed pollen-free season preceding the actual pollen season. Key periods are classified into four groups (last column). Rainfall is not classified, as its correlation is significant only at the 0.12 level

The present paper analysed the association of current Ambrosia pollen counts with current and past meteorological elements. The value of this kind of analysis is that knowledge of the past values of the meteorological variables may help us prepare better for potentially severe ragweed pollen episodes. Previous studies of the relationship between past values of meteorological elements and the actual pollen season used linear correlations, so their results may be unclear or distorted wherever the relationships are nonlinear (see the first paragraph of the ‘Introduction’: Spieksma et al. 1995; Ong et al. 1997; Giner et al. 1999). Here, a generalised correlation was applied that measures not only linear but any kind of relationship between variables.
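The difference between a linear and a generalised correlation can be illustrated with a minimal sketch of the standard (V-statistic) sample distance correlation, assuming NumPy; the function names are hypothetical and the code is only an illustration of the measure, not the implementation behind our results. For a symmetric U-shaped relationship (e.g. a response that depends on the distance from an optimal value), the Pearson coefficient is essentially zero although y is fully determined by x, whereas the distance correlation is clearly positive.

```python
import numpy as np


def _double_centred_distances(x):
    """Euclidean distance matrix of the observations in x, double-centred."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as n observations of one variable
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form), a value in [0, 1]."""
    a = _double_centred_distances(x)
    b = _double_centred_distances(y)
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dvar2_x = (a * a).mean()   # squared distance variances
    dvar2_y = (b * b).mean()
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


# Symmetric U-shaped relationship: y depends entirely on x, but not linearly.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
print("Pearson:", np.corrcoef(x, y)[0, 1])
print("distance correlation:", distance_correlation(x, y))
```

Distance correlation is zero only when the variables are independent (given finite first moments), which is why it can register relationships that a Pearson coefficient misses.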

As mentioned in the ‘Introduction’, daily ragweed pollen concentration is influenced by a variety of factors and processes. (1) Genetic attributes are constant for a given taxon, and differences between taxa are manifested in their different adaptations to climate cycles. (2) Soil type is also constant for a given area. However, (3) nutrient availability generally changes over time and is difficult to determine exactly. (4) Land use changes in the Szeged area can be described using the CORINE Land Cover database. However, this database is available only for the years 1990, 2000 and 2006, so ragweed pollen levels cannot be regressed on changing land use. Nevertheless, land use in the Szeged area changed only negligibly between 1990 and 2006, so land use changes did not influence Ambrosia pollen concentration over the Szeged area in the period in question (Deák et al. 2013). (5) The role of long-range pollen transport can be quantified by statistical models (e.g. Makra et al. 2010) and numerical simulation models (e.g. Sofiev et al. 2006a; Zink et al. 2012). It depends strongly on the pollen production of the target area and of the source area, and on the route taken by the air parcels. Hence, it is difficult to determine the daily share of long-range transported pollen in the measured pollen concentration.

Conclusions

Instead of the Pearson correlation, which measures the strength of the linear relationship between two random variables, our study used distance correlation (Székely and Rizzo 2013a), which measures any kind of relationship between two random vectors. Distance and partial distance correlations (Székely and Rizzo 2013b) were calculated between arrays of daily values of the 10 meteorological elements and the array of daily ragweed pollen concentrations during the current pollen season. For the current values of these meteorological variables, the six most important variables were two temperature variables (T min, T), two wetness variables (TT d, RF) and two variables characterising the mixing of the air (WS, PBL). For the past meteorological conditions, the six most important variables comprise four temperature variables (T, T min, T max, T soil) and two variables characterising large-scale weather patterns (SLP, PBL). Identifying the key periods of the past meteorological variables makes it possible to develop a statistical model that reduces the uncertainty about the upcoming ragweed pollen season and hence contributes to more accurate ragweed pollen forecasts, including forecasts of days critical for pollen-sensitised people, well before the pollen season. Such a model would, of course, be strictly valid only for the city of Szeged, but the approach could be useful for other locations as well.
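The idea behind partial distance correlation can be sketched as follows, assuming the bias-corrected construction with U-centred distance matrices and the partial-correlation-style formula of Székely and Rizzo, as we understand it; the function names are hypothetical and the sketch only illustrates the measure. In the synthetic example, x and y both depend on a common driver z, so they are strongly distance-correlated, while their partial distance correlation given z is close to zero.

```python
import numpy as np


def _u_centred_distances(x):
    """U-centred Euclidean distance matrix (diagonal set to zero)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    if n < 4:
        raise ValueError("U-centring needs at least 4 observations")
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    row = d.sum(axis=1, keepdims=True) / (n - 2)
    col = d.sum(axis=0, keepdims=True) / (n - 2)
    total = d.sum() / ((n - 1) * (n - 2))
    u = d - row - col + total
    np.fill_diagonal(u, 0.0)
    return u


def _u_inner(a, b):
    """Inner product of two U-centred matrices."""
    n = a.shape[0]
    return (a * b).sum() / (n * (n - 3))


def bias_corrected_dcor2(x, y):
    """Bias-corrected squared distance correlation R*(x, y); may be slightly negative."""
    a, b = _u_centred_distances(x), _u_centred_distances(y)
    denom = np.sqrt(_u_inner(a, a) * _u_inner(b, b))
    return 0.0 if denom <= 0 else float(_u_inner(a, b) / denom)


def partial_dcor(x, y, z):
    """Partial distance correlation R*(x, y; z) of x and y, controlling for z."""
    r_xy = bias_corrected_dcor2(x, y)
    r_xz = bias_corrected_dcor2(x, z)
    r_yz = bias_corrected_dcor2(y, z)
    denom = np.sqrt(max((1 - r_xz ** 2) * (1 - r_yz ** 2), 0.0))
    return 0.0 if denom < 1e-12 else float((r_xy - r_xz * r_yz) / denom)


rng = np.random.default_rng(1)
z = rng.normal(size=300)            # common driver
x = z + 0.5 * rng.normal(size=300)  # x and y depend on each other only via z
y = z + 0.5 * rng.normal(size=300)
print("R*(x, y):", bias_corrected_dcor2(x, y))
print("R*(x, y; z):", partial_dcor(x, y, z))
```

In the setting of this study, x could stand for one meteorological array, y for the pollen array and z for the remaining meteorological variables, so that the partial measure isolates the contribution of a single variable; how the variables were actually combined is described in the ‘Methodology’ section.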

It should be added that no previous studies numerically evaluating the role of current and past weather conditions in influencing the current local pollen concentration are available in the literature, so our results could not be compared with earlier findings.

The 19-year ragweed pollen data set used in our analysis is long compared with the typical length of ragweed pollen data sets in Europe; it is the longest available for the whole Carpathian Basin. Although PBL and the soil variables (soil temperature and soil humidity) were used as new elements, not every possible influencing variable could be considered. For those ‘influencing parameter−pollen count’ relationships that cannot be interpreted from a phyto-physiological point of view, we presume a hidden impact of other, unconsidered influencing variables.

We should mention here that the results of our analysis depend on the environmental conditions of the given region, as the metabolic features of Ambrosia have adapted to a particular climate zone. In particular, our results for Ambrosia should be interpreted only for the temperate climate zone.