Introduction

The spatial distribution of airborne pollutants in ecosystems can be studied using passive moss biomonitoring (Markert et al. 2003). This low-cost monitoring technique is well-recognized in studies of atmospheric deposition and transboundary pollution all over Europe (Schröder et al. 2008; Harmens et al. 2010). Regular European surveys have been carried out every 5 years since 1990 (Harmens et al. 2015). According to the Monitoring Manual by the International Cooperative Programme on Effects of Air Pollution on Natural Vegetation and Crops, only the apical segments of the moss are to be collected during the passive monitoring surveys (ICP Vegetation 2014). In the field, this usually translates to collecting the green part or green-brown shoots with a maximum length of 3–4 cm, representing the last 2 to 3 years of growth depending on the species. The lack of standardization of the exposure time has been criticized, and collecting only the green parts of the same length was recommended in order to minimize the age-related cation uptake of tissues (Boquete et al. 2014). Nevertheless, the relationship between the elemental concentrations in mosses and atmospheric deposition was deemed to be obscure despite following these recommendations (Fernández et al. 2015b). In order to assess the association between the atmospheric conditions and tissue concentrations, data on air pollution at the sampling site are needed. These can be obtained by performing the biomonitoring survey at the sites of technical monitoring stations (e.g. Motyka et al. 2015) or by calculating the air pollution situation at the sites of interest using air pollution modelling. The spatial distribution of airborne pollution is closely related to the transport of atmospheric particles, which depends on emissions and meteorological conditions (Connan et al. 2013; Fang et al. 2014; Omrani et al. 2017; Siudek and Frankowski 2017). When these are known, air pollution modelling can be employed.
Gaussian dispersion models are common air pollution models and are used for modelling complex real-world environments with a high number of different kinds of pollution sources. The models assume emission transport from continuous pollution sources in a homogeneous wind field without spatial limits. In the model, the transport itself is provided by convection by wind and by turbulent diffusion, which is described statistically by a Gaussian distribution. Spatial limitations, mainly the terrain, are included in the model by correction coefficients. Gaussian dispersion models are commonly used for modelling long-term (e.g. annual) average concentrations. The dispersion is calculated for a set of standard meteorological conditions and summed, weighted by the probability of occurrence of such conditions. Gaussian models can also incorporate the dry deposition velocity of particles, allowing the calculation of dry deposition (Zanetti 1990).
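The two ideas in the paragraph above (a Gaussian concentration profile and a probability-weighted sum over meteorological conditions) can be illustrated with a minimal sketch. It uses the textbook plume equation with ground reflection, not the specific formulation of any reference methodology; the function names, the argument layout and the treatment of the dispersion parameters as plain inputs are assumptions made for illustration only.

```python
import math


def plume_concentration(q, u, sigma_y, sigma_z, y, z, h):
    """Textbook Gaussian plume with ground reflection (illustrative only).

    q: emission rate, u: wind speed, sigma_y / sigma_z: lateral and vertical
    dispersion parameters at the receptor's downwind distance, y: crosswind
    offset, z: receptor height, h: effective source height.
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


def long_term_average(q, y, z, h, classes):
    """Probability-weighted sum over meteorological classes.

    classes: iterable of (probability, u, sigma_y, sigma_z) tuples;
    this tuple layout is a hypothetical convention, not a model input format.
    """
    return sum(p * plume_concentration(q, u, sy, sz, y, z, h)
               for p, u, sy, sz in classes)
```

In practice the dispersion parameters depend on downwind distance and atmospheric stability, and terrain corrections would be applied on top of this basic form.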

The study area is characterized by specific air pollution problems connected with its history, topography and local meteorological conditions (Blažek 2013). It is situated in the eastern part of the Czech-Polish borderland, in the Moravian-Silesian region. The region is burdened with black coal mining and heavy industry: energy industry, coking plants and ironworks (Klusáček 2005; Cabala 2004). The concentration of industrial activities led to a high population density, which in turn is associated with substantial emissions from domestic boilers, especially in the Polish part of the area, where coal is still a frequently used fuel. Thus, the area is among the most polluted regions in Europe. The concentrations of particles (PM10, PM2.5), benzo[a]pyrene and ozone repeatedly exceed the limit values (European Environmental Agency 2017) set in the Directive on ambient air quality and cleaner air for Europe (Directive 2008/50/EC). The air pollution limit values set for harmful metals such as Pb, As, Cd and Ni in particulate matter are usually not exceeded in the region (Czech Hydrometeorological Institute 2016), but since these metals tend to accumulate in the environment and are connected to the anthropogenic sources present in the area for a long time (Vojtěšek et al. 2009; Voutsa and Samara 2002), they represent a significant health and ecosystem risk in the area. This presumption was previously confirmed by systematic biomonitoring performed in the framework of the International Co-operative Programmes (ICPs) under the Convention on Long-Range Transboundary Air Pollution (CLRTAP) (Suchara and Sucharová 2004; Sucharová et al. 2008; Suchara et al. 2015; Suchara et al. 2017). Some other biomonitoring surveys have also partially covered this area (Grodzińska et al. 2003; Kapusta et al. 2014; Kłos et al. 2011); however, none of the gathered data are detailed enough either to reveal the specific local pollution sources or to be compared with air pollution modelling.

The partial aims of this study were (1) to identify the origin of air pollution in the Moravian-Silesian region, (2) to determine the spatial distribution of trace elements in the Moravian-Silesian region and (3) to verify the air pollution model Symos’97 against the biomonitoring survey results and vice versa.

The hypothesis tested was that the results of moss biomonitoring reflect the air pollution situation (as determined by air pollution modelling) prior to the sampling and not the immediate situation at the time of the sampling.

Materials and methods

Sampling and analysis

The sampling network for this study was designed to cover the area where the PM concentrations continuously exceed the annual average limit (Jančík et al. 2013). In accordance with the standards and critical reviews (EN 16414:2014; Fernández et al. 2005; Fernández et al. 2015a), a regular grid was used to design the sampling network. Sampling sites were located at the nodes of a regular 10 × 10 km grid with extra points within every grid cell. The grid numbered 41 points covering an area of 1600 km² (40 km × 40 km).
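The reported point count can be reproduced with a short sketch. It assumes one extra point per cell placed at the cell centre; the text only states that extra points lay within every grid cell, so the exact positions are hypothetical, but this reading gives 25 nodes plus 16 cell points, matching the 41 points reported.

```python
def sampling_grid(side_km=40, step_km=10):
    """Return (nodes, centres) of a regular square grid, coordinates in km.

    Placing exactly one extra point at each cell centre is an assumption
    made for illustration; the survey's actual in-cell positions may differ.
    """
    ticks = range(0, side_km + 1, step_km)
    nodes = [(x, y) for x in ticks for y in ticks]
    half = step_km / 2
    centres = [(x + half, y + half) for x in ticks[:-1] for y in ticks[:-1]]
    return nodes, centres


nodes, centres = sampling_grid()
total_points = len(nodes) + len(centres)  # 25 nodes + 16 cell points
```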

The sampling was carried out according to the ICP Vegetation Monitoring Manual (ICP Vegetation 2014). The moss samples were collected within 1 week in October 2015 to minimize the influence of the intra-annual variability (Fernández et al. 2015b). Although just one moss species should be sampled to avoid interspecific variation in element concentrations (Fernández et al. 2015b; Schröder et al. 2008), this prerequisite could not be met since no single species was present at every site. Given the design of the study, this was an unavoidable trade-off; nevertheless, the supposed variation between the species had no effect on the eventual grouping of results. In the cases when two species were available at one site, the concentrations were assessed according to the recommendations of Halleraker et al. (1998). The necessary requirements for disregarding the inter-species differences (significant correlation between concentrations, species ratio around 1) were satisfied. The most frequently sampled moss species in the area was Brachythecium rutabulum (Hedw.) (66% of all samples). This moss grows in areas affected by anthropogenic activity (Sucharová et al. 2008). Other pleurocarpous mosses sampled were (in descending order of frequency): Cirriphyllum piliferum (Hedw.) (12% of samples), Hypnum cupressiforme (Hedw.) (10%), Hylocomium splendens (Hedw.), Brachythecium salebrosum Schimp. and Eurhynchium hians (Hedw.).

The samples were transported to a laboratory on a daily basis; there, they were left at constant ambient temperature (20 °C) for 24 h and then manually cleaned. All extraneous material (plant remains, visible particles) was removed, and green apical segments (representing approx. 3 years of growth) were separated from shoots. The cleaned samples were transported for instrumental neutron activation analysis (INAA) to the Frank Laboratory of Neutron Physics, Joint Institute for Nuclear Research in Dubna (Frontasyeva 2011). The samples were analysed for the concentrations of Na, Mg, Al, Cl, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Zn, As, Se, Br, Rb, Sr, Mo, Cd, Sb, I, Cs, Ba, La, Ce, Nd, Sm, Tb, Tm, Hf, Ta, W, Au, Th and U.

NAA at the IBR-2 reactor provides activation with thermal and epithermal neutrons at low temperatures, which is convenient for biological samples, and the facility is equipped with an automatic system for sample transportation and measurement (Pavlov et al. 2016). Neutron flux characteristics and other technical details can be found in the work of Frontasyeva (2005). To determine the elemental content in moss, samples (approx. 300 mg each) were dried at 40 °C to constant weight, pelletized and packed in polyethylene and aluminium cups for short-term and long-term irradiation, respectively. Complete information about the automation of the process and the improvement of the quality of analysis (labelling, storage and recording of analysed samples, irradiations, measurements and systematization of the results of analysis) can be found in Dmitriev and Pavlov (2013) and Pavlov et al. (2016).

For short-term irradiation (Al, Br, Ca, Cl, I, In, Mg, Mn, Ti and V isotopes), Channel 2 was used with an irradiation time of about 3 min. Samples were measured immediately after irradiation for 15 min. For long-term irradiation (Cd), Channel 1 was used with an irradiation time of around 4 days (epithermal neutrons, flux density φepi = 3.6 × 10¹¹ n cm⁻² s⁻¹). After cooling for 4 days, the samples were repacked and measured twice: first, directly after repacking, for 45 min to determine As, Br, Dy, K, La, Na, Mo, Sm, U and W, and second, 20 days after the irradiation, for 1.5 h to determine Ba, Ce, Co, Cr, Cs, Eu, Fe, Hf, Ni, Rb, Sb, Sc, Se, Sr, Ta, Tb, Th, W, Yb, Zn and Zr. Gamma spectra of activated samples were measured on HPGe detectors (resolution of 1.9 keV for the ⁶⁰Co 1332 keV line). All the gamma spectra obtained were processed using GENIE software (CANBERRA 2009), and the content of each element in moss was calculated using the certified reference materials and flux comparators via software developed in the FLNP (Pavlov et al. 2016).

The quality control of NAA results was ensured by performing a simultaneous analysis of the reference material. As nuclear reactions and decay processes are virtually unaffected by the chemical and physical structures of the material during and after irradiation, standards with different compositions can be employed (Frontasyeva 2011). The following standard reference materials were used: 2711 Montana II Soil from the National Institute of Standards and Technology (NIST), 1633b Constituent Elements in Coal Fly Ash (NIST) and BCR-667 Estuarine sediment (trace elements) from the Institute for Reference Materials and Measurements (IRMM). The reference materials and 10–12 moss samples were packed together in each transport container. Thus, four measurements of the reference materials were done for each set of samples.

Air pollution modelling

In biomonitoring studies, the elemental content in moss tissues is compared with the European Monitoring and Evaluation Programme (EMEP) deposition modelling results (e.g. Schröder et al. 2014; Schröder et al. 2013; Schröder et al. 2017; Harmens et al. 2012; Pacyna et al. 2009). Nowadays, EMEP provides data on the atmospheric deposition of PM and selected metals on a 0.1° × 0.1° longitude-latitude grid. This resolution of the atmospheric deposition data is not detailed enough to be compared with the present biomonitoring survey.

Therefore, appropriate air pollution modelling in the area was performed. The Czech reference methodology Symos’97 was applied (Bubník 1998). The Symos’97 model is a Gaussian plume model developed by the Czech Hydrometeorological Institute (compare to Benson (1979) or Cambridge Environmental Research Consultants (2017)). This methodology is based on the application of the statistical theory of turbulent diffusion formulated by Sutton (Sutton 1947). Input meteorological data are derived from real meteorological observations (wind direction, wind speed and the average vertical temperature gradient in the mixing layer). Annual average data on the respective sources (industry, transport, households) and annual average meteorological data are used. The respective pollution sources are computed separately, which later enables the evaluation of their contributions to the total annual concentration at the calculation point. To get more accurate concentration values, modelling results are calibrated against the pollution monitoring data (Merbitz et al. 2012; Hoek et al. 2008). As a result, the modelled concentrations characterize the pollution distribution more realistically, and the influence of different pollution sources on the air quality in a specific location can be estimated.

Symos’97 enables the computation of the dispersion of both particulate and gaseous pollutants as well as dry deposition in the mesh of receptor points. The model was implemented in the Python programming language using the numpy, pandas and multiprocessing modules, with a gravitational settling speed of 0.5 cm s⁻¹ (Lapple 1961).

The air pollution modelling was performed for PM10 at receptors located on the moss sampling sites for the years 2012 and 2015. The emission data for 2012 were obtained from the emission inventory carried out within the Air Silesia project (AIR SILESIA n.d.), updated within the Air Progress Czecho-Slovakia project (Air Progress Czecho-Slovakia). The emission data for 2015 were acquired from the database of the AIR TRITIA project.

The terrain and meteorological data needed for the modelling were also extracted from the results of these projects. The outputs of the modelling were the annual average PM10 concentration and the annual average PM10 dry deposition at each sampling point. Only the results of dispersion modelling were taken into account for further analyses, since the results of the deposition modelling were found to be highly underestimated. This was due to insufficiently detailed input PM characteristics and to the impossibility of calibration, as only four deposition monitoring sites are present in the area (Czech Hydrometeorological Institute 2016). At each receptor, the contribution of the respective pollution sources to the air pollution at the site was quantified. This comprised the contribution of industrial sources, domestic boilers and traffic. According to these contributions, the prevalent origin of air pollution was determined.
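The last step described above, choosing the prevalent origin from the per-source contributions, amounts to comparing shares at each receptor. The following is a minimal sketch under the assumption that the prevalent origin is simply the source category with the largest modelled contribution; the function name, the dictionary layout and the numerical values are hypothetical.

```python
def prevalent_origin(contributions):
    """Return (dominant source, shares) for one receptor.

    contributions: mapping of source category to modelled annual PM10
    concentration. Taking the largest share as the prevalent origin is an
    assumed decision rule, used here for illustration.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total modelled concentration must be positive")
    shares = {name: value / total for name, value in contributions.items()}
    return max(shares, key=shares.get), shares


# Hypothetical receptor, concentrations in µg m^-3 (not data from the study).
receptor = {"industry": 6.0, "domestic boilers": 3.0, "traffic": 1.0}
origin, shares = prevalent_origin(receptor)
```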

Statistical analyses

All statistical analyses, as well as the visualization of the results, were performed in the R environment (R Core Team 2015). Measurements containing sub-limit values (rounded zeros) have to be removed from the dataset, or suitable values have to be imputed instead (Dray and Josse 2015), in order to meet the principal component analysis assumption of a complete dataset. For the imputation, expectation-maximization-based replacement of rounded zeros in compositional data was applied using the impRZilr algorithm present in the robCompositions package (Templ et al. 2011).

The lowest observed non-zero concentrations were taken as the detection limits, since neutron activation analysis detection limits vary from sample to sample. The ilr-EM algorithm allows imputation of unique non-zero values below the detection limit (or the lowest observed value). This ensures that the results of the multivariate analysis are not distorted either by inappropriate imputation or by undesirable removal of information from the dataset (when the values are denoted as NA).

The dataset with imputed values was further transformed following the principles of compositional data analysis (CoDa) in order to allow relevant multivariate analysis. Since compositional data are non-Euclidean, their transformation into the Euclidean space is required (Pawlowsky-Glahn and Buccianti 2011). The isometric log-ratio (ilr) transformation (Egozcue et al. 2003) was used since it allows the expression of the composition in orthonormal coordinates (hence it better represents distances between points). Although, in comparison with other transformation methods, it prevents the identification of the individual variables (by reducing the n-dimensional space to n − 1 dimensions), it is an ideal approach for the analysis of the overall similarity between the elemental compositions of the samples collected at the individual sampling sites.
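The dimension reduction mentioned above can be made concrete with a small sketch of an ilr transformation. It uses one common orthonormal basis (sequential, Helmert-type coordinates); many valid bases exist, and the one used by the R packages in this study is not stated in the text, so the choice here is an assumption. The function name is hypothetical.

```python
import math


def ilr(parts):
    """Isometric log-ratio coordinates of a composition with positive parts.

    Coordinate i contrasts the geometric mean of the first i parts with
    part i + 1; a composition with D parts yields D - 1 coordinates.
    The specific basis is an illustrative choice.
    """
    logs = [math.log(p) for p in parts]
    coords = []
    running = 0.0  # running sum of the logs of the first i parts
    for i in range(1, len(parts)):
        running += logs[i - 1]
        coords.append(math.sqrt(i / (i + 1)) * (running / i - logs[i]))
    return coords
```

Because only ratios between parts enter the coordinates, rescaling a sample (for example from mg/kg to percent) leaves its ilr coordinates unchanged, and no single coordinate corresponds to a single element.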

The calculated results of dispersion modelling relevant to the sampling sites (with PM10 concentrations predicted for traffic, local heating and industrial sources) were transformed using the centred log-ratio (clr) transformation (Aitchison 2003). This transformation is not orthogonal; on the other hand, it allows identification of the individual variables, which was desirable in this case. Principal component analysis (PCA) followed by hierarchical clustering on principal components (HCPC) was performed on the transformed data using the FactoMineR package (Husson et al. 2015) to discover the clusters of sampling sites. The initial clustering based on Ward’s method was supplemented by k-means consolidation to obtain more robust clusters and a better partition in terms of the inertia criterion; the maximum number of iterations for k-means was set to ten (Le Ray et al. 2009).
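The clr transformation referred to at the start of this paragraph keeps one coordinate per original variable, which is what makes the individual sources identifiable. A minimal sketch follows; the function name and the example values are hypothetical and are not taken from the modelling results.

```python
import math


def clr(parts):
    """Centred log-ratio transform: log of each part minus the mean log.

    Returns one coordinate per input part; the coordinates always sum to
    zero, so they are not linearly independent.
    """
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical PM10 contributions at one site: traffic, local heating, industry.
site = [2.0, 4.0, 8.0]
site_clr = clr(site)  # positive values mark sources above the geometric mean
```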

For the characterization of the clusters, clr-transformed variables were also used since the ilr transformation, though better suited for the distinction of the clusters, leads to loss of the information on the individual variables; hence, the characterization would be impossible. The values predicted by both dispersion models (for the years 2012 and 2015) were assigned to the identified clusters, disregarding the cluster comprising only one site, and one-way analysis of variance (ANOVA) was performed in order to assess whether the clusters based on the biomonitoring data can be characterized by the values predicted by the models.
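The ANOVA step compares the between-cluster and within-cluster variability of the modelled values. The sketch below computes the F statistic for hypothetical groups; it is not the R routine used in the study, and the function name and example numbers are assumptions for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of modelled values per cluster.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((value - m) ** 2
                    for g, m in zip(groups, means) for value in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three clusters (not data from the study).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The p-value would then be read from the F distribution with the returned degrees of freedom.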

Results and discussion

Principal component analysis (PCA) showed that the first two principal components account for more than 47% of the total variation and that point 40 is the most distinctive observation. No outliers, defined as measurements with a contribution to the plane higher than three times the standard deviation, were detected in the dataset. The first nine principal components, accounting for 82.8% of the explained variation, had eigenvalues higher than one. On these nine principal components (more precisely, on the scores of the measurements on these components), agglomerative hierarchical clustering (HCPC) was performed; the rest of the variation was regarded as random fluctuation (statistical noise).
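The retention rule used above (keep the components whose eigenvalues exceed one) can be sketched in a few lines. The eigenvalues in the example are hypothetical and chosen only to show the calculation; they are not the eigenvalues obtained in this study.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Count components above the threshold and their share of variance (%)."""
    total = sum(eigenvalues)
    kept = [ev for ev in eigenvalues if ev > threshold]
    return len(kept), 100.0 * sum(kept) / total


# Hypothetical eigenvalues of a five-variable PCA.
n_kept, share = retained_components([3.0, 2.0, 1.5, 0.8, 0.7])
```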

Five distinct clusters could be observed, with one sampling point (the aforementioned unique point 40) forming its own cluster. The five-cluster cut of the dendrogram was the most reasonable, mainly because of the highest relative inertia loss. The resulting clusters are plotted on the results of the PCA in Fig. 1. All the clusters are distinctly divided along the first axis (Dim 1); only Clusters 3 and 4 are distinguished more by their scores on the second axis (Dim 2). Cluster 2 appears to be the most heterogeneous, while Cluster 3 is the most homogeneous of all the clusters (disregarding the one site forming a unique Cluster 1).

Fig. 1 Hierarchical clustering on principal components (HCPC): resulting clusters

Interestingly, the species of the moss collected had no influence on the clustering. This may be because the interspecies differences were negligible or because they were eliminated either by the transformation of the data or during the first step of the analyses, the PCA pre-treatment. Indeed, when the PCA was performed on untransformed data, a significant relationship between species and the first component was revealed; this was, however, not the case when transformed data were assessed. Moreover, performing the clustering on principal components makes it possible to disregard the less important sources of variation in the dataset.

The characterization of the resulting clusters is presented in Table 1.

Table 1 Elements most characteristic for the clusters (clr-transformed values)

Site 40 formed a unique cluster (Cluster 1), and, since the cluster had too few observations to make a conclusive comparison with the modelling results, it was excluded from further analysis. The sample at this site was taken after a rainy period, which could explain why elements connected with the crustal composition are lower than average and physiological elements (K, Mg) higher than average. The higher than average concentrations of Rb, Cs and Zn indicate an association with primary ferrous metallurgy (Hlínová 2005; Alleman et al. 2010; Larsen et al. 2008). Higher relative concentrations of Zn imply a possible connection to Cluster 3, which is also supported by the geographical proximity of the site to the sites comprising this cluster.

Cluster 3 is characterized by elevated contents of Fe, Mn, Cr and W. These elements are typical of pollution related to iron- and steelworks (Hlínová 2005; Alleman et al. 2010). Mn is a common element in austenitic steels produced in local steelworks, while Cr and W are important solutes for steel alloying in order to obtain special properties of steel (Ghosh and Chatterjee 2010). Thus, the cluster can be deemed to be the most affected by the metallurgical industry in the surveyed region. This is further shown in Fig. 2, where the sampling sites and their corresponding clusters are plotted along with the dominant sources of the pollution in the area (iron and steelworks). Apart from site 9 (and partially site 23), all sites belonging to Cluster 3 are in the vicinity of these pollution sources. In the case of sampling points 19, 41 and 24, the dispersion of the pollution from the steelworks in the city of Třinec is further strongly corroborated by the wind rose (Fig. 2) displaying the general direction of the wind in 2012. In the valley delimited by two mountain ranges, on the northeast and northwest, the respective sites belong to the same cluster affected by the iron and steelworks industry. Sampling point 23, located on the mountain slope in the Protected Landscape Area Moravskoslezské Beskydy, somewhat apart from the sub-cluster forming around Třinec, may be influenced by both the Ostrava and Třinec steelworks. Since there are no other sources of pollution in the vicinity, long-range transport could be the source of pollution at this sampling point.

Fig. 2 Map of the surveyed area. Sampling points are coloured according to their clusters. Wind roses demonstrate prevailing winds surrounding the primary sources of pollution

In the case of Cluster 2, the origin of the pollution is not as clear, although the correlation analysis revealed statistically significant positive correlations between the relative concentration of Fe and those of Cr, Co and Zn (Pearson correlation coefficients of 0.82, 0.83 and 0.74, respectively). This could once again indicate a relation with the metallurgical industry in the region (Raclavská et al. 2014). The relative concentration of Ca was significantly positively correlated with the relative concentrations of Mg and Ti (r = 0.78 and 0.7, respectively), which could also imply a connection with the metallurgical industry. Ca and Mg constitute base additives used in almost every step of the steelmaking process, from agglomeration and blast furnaces (dolomitic limestone, limestone, dolomite) to steelmaking (lime, magnesite), and Ti is an important solute (Geerdes et al. 2015; Sylvestre et al. 2017); this interpretation is further supported by the absence of correlation with other elements connected with the crustal layer.
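The Pearson coefficients quoted for the clusters follow the usual definition, sketched here for two lists of relative concentrations. The function name and the example values are hypothetical; the study itself used R for these calculations.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical relative concentrations of two elements at four sites.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```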

Cluster 4 seems to consist of well-prospering mosses, as the concentrations of elements related to proper vital function are high (K is the highest of all clusters) and the respective sampling sites were in green localities (woods, clearings, etc.) and, hence, were less influenced by anthropogenic activities. The bivariate correlation assessment revealed a statistically significant positive correlation between the relative concentrations of Na and both K and Cl (r = 0.6 in both cases). The higher than average content of Na and Cl, together with lanthanoids (Nd, Tm), can also imply the influence of crustal contamination or mining (Matýsek et al. 2014).

Cluster 5 seems to represent sites contaminated by mineral dust, as the elements connected with the crustal composition are significantly positively correlated: Al with Ta, Ti, V, Hf and La (r = 0.92, 0.9, 0.85, 0.83 and 0.8, respectively).

In the case of the 2012 model (Fig. 3), ANOVA showed a significant difference between the identified clusters in the modelled PM10 concentrations attributed to industrial pollution sources (p = 0.0102). In the case of the 2015 model, no significant relationship between the observed clusters and the predicted PM10 values was revealed at all.

Fig. 3 PM10 concentrations as predicted by the 2012 (left) and 2015 (right) model. a, b Linear sources. c, d Local sources. e, f Industrial sources

Apparently, biomonitoring data (in particular, the characterization of the sampling sites by their clustering) reflect the pollution in the studied region with a delay. The 2012 model revealed an association between the predicted PM10 concentration values and the biomonitoring-derived clusters, while the model for the year of the biomonitoring survey sampling (2015) revealed no association at all. This accords with and confirms the most elemental assumption of moss biomonitoring methodology (Frontasyeva and Harmens 2014), namely that moss accumulates pollutants from the atmosphere over a prolonged period of time. Given that the recommendations of the ICP Vegetation Monitoring Manual (ICP Vegetation 2014) lead to the collection of material up to 3 years old, the lack of association between the biomonitoring results and the model based on the situation in the same year (2015) was to be expected. Although dispersion modelling for all 3 years prior to the biomonitoring was not possible, mainly due to the absence of historical records for the Polish part of the study area, the modelled year 2012 can be deemed to represent the preceding pollution load well. The years 2013–2015 were, according to the Czech Hydrometeorological Institute (2016), rather similar in terms of air pollution, while the pollution was significantly lower than in the years before (2010–2012). The association with the modelled air pollution for the year 2012 hence reflects the ability of moss to retain pollutants accumulated in the years prior to the biomonitoring survey. Furthermore, as the most important and specific local pollution sources define the concentrations of particular elements in the moss tissue over a longer period, this pollution can be traced back even after emission reductions or a shutdown.

Conclusions

Multivariate analysis of the results of the biomonitoring survey in a heavily polluted region performed on the properly transformed data revealed clusters of sampling sites closely related to the known pollution sources and the geographical aspects of the assessed region. When compared with the dispersion model-predicted PM10 concentrations related to the three prevailing sources of pollution, the resulting clusters associated with the industry, specifically iron and steelworks, were identified. The comparison of the modelling and biomonitoring in this study is novel, and it confirms the presumed relationship between the accumulated pollutants in the moss and the pollution in the surveyed region. Since the moss reflected the pollution state years prior to the sampling and not the state contemporary to the sampling, this study brings further confirmation of the fact that moss biomonitoring reveals atmospheric conditions typical for a period of time prior to the sampling.