Introduction

Concentrations of airborne particulate matter (PM) and their attribution to specific sources are an important and topical research subject. The ability to discriminate between different types of sources, and between natural and anthropogenic contributions, is of utmost importance for planning efficient remediation and mitigation strategies, especially in areas where legal limit values are exceeded. In addition, Directive 2008/50/EC (EC 2008) states that if natural contributions to atmospheric pollutants in ambient air can be determined with sufficient certainty, and where exceedances of a limit value are due entirely (or in part) to these natural contributions, they may be subtracted when assessing compliance with air quality standards. This is particularly relevant in countries where natural contributions frequently influence PM levels, such as those of the Mediterranean region. Advection of Saharan dust can significantly increase aerosol levels in the Mediterranean area, and the contribution of this source increases, on average, from north to south Europe and from west to east (Querol et al. 2009; Pey et al. 2013). Saharan dust advection is particularly frequent in southern Italy (Contini et al. 2014a), and it is often accompanied by relevant contributions of marine aerosol in the coarse fraction (Contini et al. 2014b). Therefore, to design effective mitigation strategies, it is important to discriminate between different types of sources (local or non-local) and then between natural and anthropogenic ones.

Source apportionment (SA), applied to atmospheric aerosol, is the practice of deriving information about aerosol sources and their contributions to measured ambient PM concentrations. This task can be accomplished using direct methods (source-oriented models) or inverse methods (receptor models, RMs). RMs have been extensively used to estimate the contribution of emission sources to atmospheric PM at specific sites (Bove et al. 2014; Belis et al. 2013; Argyropoulos and Samara 2011; Alleman et al. 2010; Viana et al. 2008a; Koçak et al. 2009; Mazzei and Prati 2009; Nicolás et al. 2008; Pandolfi et al. 2008; Karar and Gupta 2007; Hopke et al. 2006; Watson and Chow 2005; Rodríguez et al. 2002). A large variety of receptor models, based on different mathematical approaches, has been used over the last 30 years; in the period 2000–2012, however, a shift from principal component analysis and classical factor analysis to positive matrix factorization was observed (Belis et al. 2013).

Applications of RMs to atmospheric PM show considerable variability in the analyses performed, particularly in terms of the “working variables” used (the number of samples in the dataset and the number of chemical species employed). The chemical species included in the SA play an important role in source identification, because sources with similar chemical profiles are difficult to characterize. For example, particles of crustal origin in residential/urban areas are often due to the simultaneous presence of a local contribution (such as re-suspension of unpaved soil or road dust) and, at least in the Mediterranean region, a contribution from long-range advection of Saharan dust (Cesari et al. 2012; Pietrodangelo et al. 2013; Contini et al. 2014a). These sources, although different, have largely similar chemical profiles, which can make the characterization of the crustal contribution very difficult. Further, marine aerosol can be a source of ambiguity in SA studies because it can be present both as a “fresh” and as an “aged” contribution, with a different role of nitrate (Contini et al. 2014b). Moreover, Cl depletion can strongly modify the Cl/Na ratio, depending on the characteristics and typical meteorological conditions of the site (Zhao and Gao 2008).

Previous studies comparing the results of different RMs on the same datasets showed that the number of sources identified and their estimated contributions may differ between models (Contini et al. 2012; Favez et al. 2010; Hopke et al. 2006; Larsen et al. 2008; Viana et al. 2008b; Stortini et al. 2009; Amato et al. 2009a; Tauler et al. 2009). This variability has been associated with the different theoretical approaches behind the models. The need to standardize the application of RMs to source apportionment of atmospheric PM led the European Commission’s Joint Research Centre (JRC) to organize an inter-comparison exercise to better understand the performance of different source apportionment methodologies and the comparability of their outputs (Karagulian et al. 2012).

The present work aims to inter-compare PM10 SA results obtained at three measurement sites: an Italian urban background site (Lecce), a Spanish urban background site (Barcelona), and a Spanish industrial site (Algeciras). The inter-comparison uses two receptor models, principal component analysis (PCA) and positive matrix factorization (PMF), to investigate their performance in source identification (chemical profiles), in the quantification of source contributions, and in the stability and robustness of results as a function of the number of chemical species included in the source apportionment.

Methods

Description of the datasets used

Three different datasets were used in this work. Two of them, obtained in Spain, were collected by the Department of Geosciences of the Institute of Environmental Assessment and Water Research (IDAEA)-Spanish Council for Scientific Research-CSIC (Barcelona, Spain). The third one, obtained in Italy, was collected by the Institute of Atmospheric Sciences and Climate, ISAC-CNR (Lecce, Italy).

The first Spanish dataset was collected in Barcelona (hereafter also BCN) from 2003 to 2007 at an urban background monitoring station, using MCV high-volume (30 m³/h) samplers equipped with DIGITEL PM10, PM2.5, and PM1 inlets. Particles were collected daily on quartz fiber filters and chemically analyzed following the procedures described by Querol et al. (2001): an elemental analyzer for total carbon (TC), inductively coupled plasma mass spectrometry (ICP-MS) and atomic emission spectrometry (ICP-AES) for elemental concentrations, ion chromatography for NO3− and Cl−, and a specific ion electrode for NH4+. In this work, the analysis focused on the PM10 fraction, and the chemical species used in source apportionment were Al, Ca, K, Na, Mg, Fe, Mn, Ti, P, S, V, Cr, Ni, Cu, Zn, As, Rb, Sr, Cd, Sn, Sb, Pb, NH4+, NO3−, Cl−, and TC. In total, the dataset has 243 samples and 26 chemical species, accounting for 55 % of PM10 mass. Further details about this dataset are available in Amato et al. (2009a).

The second Spanish dataset was collected in the Bay of Algeciras (hereafter also AL) from 2003 to 2007 at four urbanized areas classified as urban background sites with industrial influence, using high-volume samplers (TISCH or Graseby-Andersen, 68 m³/h) equipped with PM10 and PM2.5 inlets. Particles were collected daily on quartz filters, and major and trace elements were determined by a combination of analytical techniques, including ICP-MS, ICP-AES, ion chromatography, selective electrode, and elemental analysis, following the methodology described by Querol et al. (2008). As for BCN, the analysis focused on the PM10 fraction, characterized by the following chemical species: Al, Ca, K, Na, Mg, Fe, Mn, Ti, P, V, Cr, Ni, Cu, Zn, As, Se, Rb, Sr, Sn, Sb, Pb, Li, La, Cl−, NH4+, NO3−, SO42−, and TC. In total, the dataset has 567 samples and 28 chemical species, accounting for 60 % of PM10 mass. Further details about this dataset are available in Pandolfi et al. (2011).

The third dataset was collected in Lecce (hereafter also LE) at an urban background site. PM10 was collected simultaneously on Teflon and quartz filters. Soluble ionic species (SO42−, NO3−, NH4+, Cl−, Na+, K+, Mg2+, and Ca2+) were analyzed by high-performance ion chromatography (HPIC), the elements Ni, Cu, V, Mn, As, Pb, Cr, and Sb by graphite furnace atomic absorption spectroscopy (GF-AAS), and Fe, Al, Zn, and Ti by ICP-AES. In total, the dataset has 91 daily samples, collected between January 2007 and January 2008, with 17 chemical species accounting for 40 % of PM10 mass. Further details about this dataset are available in Contini et al. (2010). The number of samples at LE is considerably lower than at AL and BCN; however, it is large enough to obtain stable factor analysis results according to typical thresholds given by different statistical criteria (Henry et al. 1984; Thurston and Spengler 1985).

It is worth noting that source apportionment results for the three datasets have already been published (Amato et al. 2009a; Contini et al. 2010; Pandolfi et al. 2011). The main difference between this work and those publications is that this study focuses only on the PM10 fraction, the only fraction available at all three sites; thus, the PM2.5 and PM1 fractions were excluded for the BCN dataset and the PM2.5 fraction for the AL dataset. For each sample collected at the three sites, the uncertainties of species concentrations were estimated taking into account errors from the analytical procedure, from the subtraction of blank filters for the different chemical species, and from the sampling procedure, as described in Amato et al. (2009a) for Barcelona, Pandolfi et al. (2011) for Algeciras, and Contini et al. (2010) for Lecce.

Table 1 reports the average concentrations of PM10 and of the different chemical species, together with their minimum and maximum values.

Table 1 Average, minimum, and maximum concentrations of PM10 and of the different chemical species analyzed in the three sites

Datasets handling for sensitivity tests

The number of chemical species differs among the three datasets; in particular, LE has considerably fewer chemical species than BCN and AL. Therefore, two types of receptor model analysis were performed: the first on the complete (original) datasets, and the second on incomplete datasets in which the BCN and AL datasets were reduced to obtain more comparable sets of chemical species across the three sites. Specifically, the species excluded from the BCN dataset were TC, Ti, P, Rb, Sr, As, Cd, Sn, and Sb, which together accounted for 20.8 % of PM10 mass (TC alone accounting for 20.6 %). The species excluded from the AL dataset were TC, Li, P, Ti, As, Se, Rb, Sr, Sn, Sb, and La, which together accounted for 12.9 % of PM10 mass (TC alone accounting for 12.7 %). Comparing SA results on complete and incomplete datasets made it possible to study how the presence or absence of specific chemical species influences the SA results and whether the models provide stable solutions in terms of both source profiles and source contributions.
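The construction of the incomplete datasets can be checked with a short sketch. The species lists are copied from the dataset descriptions above (ions written as plain strings), and the helper name `reduce_species` is hypothetical. The check simply confirms that both reduced Spanish datasets end up with 17 species, the same number used for LE.

```python
# Species lists as given in the "Description of the datasets used" section.
BCN_SPECIES = ["Al", "Ca", "K", "Na", "Mg", "Fe", "Mn", "Ti", "P", "S", "V", "Cr",
               "Ni", "Cu", "Zn", "As", "Rb", "Sr", "Cd", "Sn", "Sb", "Pb",
               "NH4+", "NO3-", "Cl-", "TC"]
AL_SPECIES = ["Al", "Ca", "K", "Na", "Mg", "Fe", "Mn", "Ti", "P", "V", "Cr", "Ni",
              "Cu", "Zn", "As", "Se", "Rb", "Sr", "Sn", "Sb", "Pb", "Li", "La",
              "Cl-", "NH4+", "NO3-", "SO42-", "TC"]

# Species removed to build the incomplete (reduced) datasets.
BCN_EXCLUDED = {"TC", "Ti", "P", "Rb", "Sr", "As", "Cd", "Sn", "Sb"}
AL_EXCLUDED = {"TC", "Li", "P", "Ti", "As", "Se", "Rb", "Sr", "Sn", "Sb", "La"}


def reduce_species(species, excluded):
    """Return the species kept in the incomplete dataset, preserving order."""
    missing = excluded - set(species)
    if missing:
        raise ValueError(f"excluded species not in dataset: {sorted(missing)}")
    return [s for s in species if s not in excluded]


bcn_reduced = reduce_species(BCN_SPECIES, BCN_EXCLUDED)
al_reduced = reduce_species(AL_SPECIES, AL_EXCLUDED)
print(len(BCN_SPECIES), len(bcn_reduced))  # 26 17
print(len(AL_SPECIES), len(al_reduced))    # 28 17
```

The two reduced sets differ only in how sulfur enters: as total S at BCN and as SO42− at AL.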

The PCA and PMF receptor models

Source apportionment was performed using receptor models, which are based on the mass conservation principle:

$$ x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n \tag{1} $$

where $x_{ij}$ is the concentration of the jth species measured in the ith sample, $g_{ik}$ is the contribution of the kth source to the ith sample, $f_{kj}$ is the concentration of the jth species in the kth source, $e_{ij}$ is the residual for each sample/species, and p is the number of sources. When both the number/nature of the aerosol sources ($f_{kj}$) and their contributions ($g_{ik}$) are unknown, factor analysis approaches such as PCA (Henry and Hidy 1979) or PMF (Paatero and Tapper 1994; Paatero 1997) are useful tools to solve Eq. (1). Although PCA has been applied to aerosol data since the 1960s (Blifford and Meeker 1967), PMF is now widely used in SA studies: about 36 % of the European SA studies performed between 2001 and 2010 were based on this technique (Karagulian and Belis 2012). The principal differences between PCA and PMF are the non-negativity of the factors (both loadings and scores), which is built into the PMF model, and the use of individual data uncertainties. Moreover, PMF does not rely on information from the correlation matrix but uses a point-by-point least squares minimization scheme. Therefore, the profiles produced by PMF can be compared directly with the input matrix without transformation. By contrast, the PCA loading matrix is dimensionless and must be coupled with multi-linear regression analysis (Thurston and Spengler 1985) to obtain the contributions of the different principal components to the measured concentrations.
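Eq. (1), together with the PMF idea of weighting each residual by its own uncertainty under non-negativity constraints, can be illustrated with a minimal sketch. It assumes NumPy and uses weighted multiplicative-update non-negative matrix factorization, a simpler algorithm than the solver inside EPA PMF; the function name `weighted_nmf` and the synthetic data are hypothetical.

```python
import numpy as np


def weighted_nmf(X, sigma, p, n_iter=2000, seed=0, eps=1e-12):
    """Factor X (m samples x n species) as G @ F with G, F >= 0.

    Minimizes Q = sum_ij ((X_ij - (G @ F)_ij) / sigma_ij) ** 2 with
    weighted multiplicative updates. This is a simple stand-in for the
    solver used inside EPA PMF, not a reimplementation of it.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = 1.0 / sigma**2                       # point-by-point weights
    G = rng.uniform(0.1, 1.0, size=(m, p))   # contributions g_ik
    F = rng.uniform(0.1, 1.0, size=(p, n))   # profiles f_kj
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative and do not increase Q.
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)
    Q = float(np.sum(((X - G @ F) / sigma) ** 2))
    return G, F, Q


# Synthetic example: two hypothetical sources and four species.
rng = np.random.default_rng(1)
G_true = rng.uniform(0.0, 5.0, size=(60, 2))
F_true = np.array([[0.6, 0.3, 0.1, 0.0],
                   [0.0, 0.1, 0.4, 0.5]])
X_clean = G_true @ F_true
sigma = 0.05 * X_clean + 0.01                # hypothetical uncertainties
X = np.clip(X_clean + rng.normal(0.0, sigma), 0.0, None)

_, _, Q_short = weighted_nmf(X, sigma, p=2, n_iter=10)
G, F, Q_long = weighted_nmf(X, sigma, p=2)
print(f"Q after 10 iterations: {Q_short:.1f}, after 2000: {Q_long:.1f}")
```

Because the weights are 1/σ², a point with a large uncertainty pulls the solution less than a well-measured one, which is the feature that distinguishes PMF from PCA on Z-scores.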

In this work, both PCA and PMF were used to estimate source profiles and source contributions to PM10 for the different datasets, and the results were inter-compared. The PCA method used is essentially that described by Thurston and Spengler (1985), coupled with multi-linear regression analysis (PCA-MLRA); it was applied to Z-scores using STATISTICA v. 11, with varimax rotation. Finally, the uncertainties of the reconstructed PM10 concentrations were evaluated by propagating the standard deviations of the fit coefficients.
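The PCA-MLRA chain described above can be sketched in NumPy. This is a minimal sketch that assumes the absolute principal component score (APCS) formulation of Thurston and Spengler (1985), least-squares component scores, and a textbook varimax; it is not the STATISTICA implementation used in this work, the function names are hypothetical, and the uncertainty propagation from the fit coefficients is omitted.

```python
import numpy as np


def varimax(L, n_iter=100, tol=1e-8):
    """Varimax rotation of a loading matrix L (species x components)."""
    n, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(n_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr * np.sum(Lr**2, axis=0) / n))
        R = u @ vt
        d_new = np.sum(s)
        if d > 0 and d_new < d * (1 + tol):  # criterion no longer improving
            break
        d = d_new
    return L @ R


def pca_apcs_mlra(C, pm, k):
    """PCA on Z-scores, varimax, APCS, then MLRA of PM10 on the APCS.

    C: species concentrations (samples x species); pm: measured PM10.
    Returns rotated loadings, intercept, and mean source contributions.
    """
    mu, sd = C.mean(axis=0), C.std(axis=0, ddof=1)
    Z = (C - mu) / sd                                  # Z-scores
    evals, evecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    top = np.argsort(evals)[::-1][:k]                  # k largest eigenvalues
    L = varimax(evecs[:, top] * np.sqrt(evals[top]))   # rotated loadings
    proj = np.linalg.pinv(L.T)                         # least-squares scores
    scores = Z @ proj
    z0 = (0.0 - mu) / sd                               # artificial zero sample
    apcs = scores - z0 @ proj                          # absolute PC scores
    A = np.column_stack([np.ones(len(pm)), apcs])
    coef, *_ = np.linalg.lstsq(A, pm, rcond=None)      # MLRA
    b0, b = coef[0], coef[1:]
    return L, b0, b * apcs.mean(axis=0)


# Synthetic example with two hypothetical sources and five species.
rng = np.random.default_rng(2)
G = rng.gamma(2.0, 2.0, size=(200, 2))
F = np.array([[1.0, 0.8, 0.1, 0.0, 0.05],
              [0.0, 0.1, 0.9, 0.7, 0.3]])
C = (G @ F) * rng.lognormal(0.0, 0.05, size=(200, 5))
pm = 3.0 * G.sum(axis=1) + rng.normal(0.0, 0.5, size=200)
L, b0, contrib = pca_apcs_mlra(C, pm, k=2)
print("intercept:", round(b0, 2), "mean contributions:", np.round(contrib, 2))
```

Since the regression includes an intercept, the intercept plus the mean contributions reproduces the mean measured PM10 exactly; the intercept is one place where "unexplained mass" appears in PCA-MLRA.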

Positive matrix factorization was performed using the software EPA PMF 3.0 (Norris et al. 2008). Based on the analysis of the S/N ratio (Paatero and Hopke 2003), all species were classified as “strong” at the three sites. The parameters IM (the maximum individual column mean) and IS (the maximum individual column standard deviation), both obtained from the scaled residual matrix, and Q (the objective function) support the choice of the number of factors in PMF applications (Lee et al. 1999; Viana et al. 2008b). However, the final solution is a compromise between the trends of these parameters and the physical meaning of the factors obtained. For each site, solutions with four to nine factors were examined to find the optimal PMF solution. For the AL and BCN datasets, IM was the parameter most sensitive to the number of factors, with a sudden drop at six factors and a relatively constant value from seven factors onward. Considering the bootstrap mapping (Results of source apportionment of complete datasets section) and the interpretability of the factors/profiles, seven factors were retained as the final solution for both complete and incomplete datasets. For the LE dataset, IM was not conclusive in the choice of the number of factors; however, PMF solutions became unstable for more than four factors, with several non-convergent runs. A sudden drop of the Q/Qexpected ratio was observed from three to four factors, and the bootstrap mapping worsened when more than four factors were included (Results of source apportionment of complete datasets section). Since the four factors also had a reasonable physical interpretation, the four-factor solution was chosen as final.
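The factor-number diagnostics mentioned above can be computed from the scaled residuals. The sketch follows our reading of Lee et al. (1999): IM is the largest column mean and IS the largest column standard deviation of the scaled residual matrix. The expression used for Qexpected (number of data values minus number of fitted factor elements) is the one commonly given in the EPA PMF documentation when all species are strong; the helper names and example residuals are hypothetical, and the AL dimensions are used only for the arithmetic.

```python
from statistics import mean, stdev


def scaled_residuals(X, X_hat, sigma):
    """r_ij = (x_ij - xhat_ij) / sigma_ij for nested lists (samples x species)."""
    return [[(x - xh) / s for x, xh, s in zip(row_x, row_h, row_s)]
            for row_x, row_h, row_s in zip(X, X_hat, sigma)]


def im_is_q(R):
    """IM and IS over the species columns of scaled residuals R, plus Q."""
    columns = list(zip(*R))
    im = max(mean(col) for col in columns)
    is_ = max(stdev(col) for col in columns)   # sample std; needs >= 2 rows
    q = sum(r * r for row in R for r in row)
    return im, is_, q


def q_expected(m, n, p):
    """Expected Q for m samples, n strong species, and p factors."""
    return m * n - p * (m + n)


R = [[1.0, -1.0], [1.0, 1.0], [1.0, 0.0]]
print(im_is_q(R))              # (1.0, 1.0, 5.0)
print(q_expected(567, 28, 7))  # 11711
```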

Afterwards, to investigate the rotational ambiguity of the PMF results, solutions with Fpeak values between −1 and 1 were explored. However, no significant improvement was obtained, so the default value Fpeak = 0.0 (no rotation) was retained. The uncertainties of the PMF results were obtained with the bootstrap method (Heidam 1987), using the same settings for all sites: 100 runs with random seed, a correlation threshold of 0.6, and a block size of 6.
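The bootstrap settings above (block size 6, correlation threshold 0.6) can be illustrated with a simplified sketch of block resampling and factor mapping in plain Python. The helper names are hypothetical and the sketch only approximates what EPA PMF does internally: each bootstrap factor is assigned to the base-run factor whose contributions correlate best with it, and it is counted as unmapped when that Pearson correlation does not exceed the threshold.

```python
import random


def block_bootstrap_indices(m, block_size=6, seed=None):
    """Resample m sample indices as contiguous blocks (moving-block bootstrap)."""
    rng = random.Random(seed)
    idx = []
    while len(idx) < m:
        start = rng.randrange(0, m - block_size + 1)
        idx.extend(range(start, start + block_size))
    return idx[:m]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def map_factor(boot_series, base_series_list, threshold=0.6):
    """Map a bootstrap factor to the best-correlated base factor.

    base_series_list holds the base-run contributions of each factor,
    evaluated on the same resampled days. Returns the index of the
    matched base factor, or None if no correlation exceeds the threshold.
    """
    rs = [pearson(boot_series, base) for base in base_series_list]
    best = max(range(len(rs)), key=rs.__getitem__)
    return best if rs[best] > threshold else None


def percent_unmapped(assignments):
    """Share of bootstrap runs (in %) in which a factor was left unmapped."""
    return 100.0 * sum(a is None for a in assignments) / len(assignments)


idx = block_bootstrap_indices(91, block_size=6, seed=42)
print(len(idx), min(idx) >= 0, max(idx) < 91)                 # 91 True True
print(map_factor([1, 2, 3, 4], [[4, 3, 2, 1], [1, 2, 3, 5]]))  # 1
print(percent_unmapped([0, None, 0, 0]))                       # 25.0
```

Resampling in blocks rather than day by day preserves the short-term autocorrelation of daily PM data, which is the usual motivation for a block size larger than one.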

Results of source apportionment of complete datasets

This section reports the SA results for the three original datasets. The profiles identified by PMF are shown in Fig. 1, and the PCA loading matrices in Fig. 2.

Fig. 1 Chemical profiles of the sources identified using PMF on complete datasets

Fig. 2 Chemical profiles of the sources identified using PCA on complete datasets

BCN dataset

PCA and PMF models identified, respectively, six and seven different sources contributing to PM10 mass, explaining about 81 % of the total variance (for the PCA solution):

  • Crustal, characterized by tracers Al, Ca, Ti, and, to a lesser extent, Fe, K, Mg2+, Mn, P, Rb, and Sr.

  • Traffic, characterized by TC, Sb, Sn, Cu, and Cr.

  • Industrial, characterized by Pb, Cd, Zn, and As. This could be associated with a mixed influence of activities in the area such as smelters and cement kilns (Amato et al. 2009a).

  • Secondary inorganic aerosol (SIA), characterized by NH4+ and NO3− and by S, likely present as sulfate. PMF separated the contributions of secondary sulfate and secondary nitrate, whereas PCA identified a single SIA source. Therefore, for inter-comparison purposes, the PMF secondary sulfate and secondary nitrate factors were combined into a single SIA profile.

  • Marine, characterized by Na+, Cl−, and Mg2+.

  • Heavy oil combustion, characterized by V and Ni. These elements may originate from industrial sources as well as from ship emissions (Viana et al. 2009; Cesari et al. 2014).

The two receptor models applied to the BCN dataset show very similar component/factor profiles; the main discrepancy is that PMF splits SIA into two separate factors, “secondary sulfate” (with high levels of S and NH4+) and “secondary nitrate” (with high levels of NO3−). Such differences are often found in SA studies using different receptor models (Amato et al. 2009a; Stortini et al. 2009) and could be related to the different mathematical approaches of the models. The bootstrap mapping showed that all factors were well matched; the percentages of unmapped runs for the complete and incomplete datasets were, respectively: crustal, 0 and 0 %; traffic, 0 and 0 %; industrial, 4 and 0 %; sulfate, 4 and 1 %; nitrate, 0 and 0 %; marine, 0 and 0 %; and heavy oil combustion, 0 and 1 %.

AL dataset

PCA and PMF models identified, respectively, six and seven sources contributing to PM10, explaining about 75 % of the total variance (for the PCA solution):

  • Crustal, with major elements/tracers such as Al, Ca, Ti, K, Fe, Rb, Sr, and Li.

  • Industrial, characterized by Cr, Mn, Ni, Pb, and Zn.

  • Marine, characterized by Na+, Cl−, and Mg2+.

  • Traffic, characterized by TC, Sb, Sn, and Cu.

  • Secondary inorganic aerosol, characterized by NH4+, NO3−, and SO42−. Also in this case, PMF identifies two separate profiles (secondary sulfate and secondary nitrate), whereas PCA finds a single SIA profile.

  • Industrial_2/heavy oil combustion, characterized by key species such as As, Se, Sn, and Pb in the PCA component (labeled industrial_2) and by V and Ni in the PMF factor (labeled heavy oil combustion).

As in the BCN dataset, the RMs differ in the component/factor profiles associated with SIA. Specifically, PMF splits the SIA principal component into two factors, labeled NaNO3 (with high levels of Na+ and NO3−) and secondary sulfate (with high levels of SO42− and NH4+). Another difference concerns industrial emissions: PCA found a component labeled industrial_2 that differs substantially from the factor found by PMF, labeled heavy oil combustion. Again, these differences could be related to the different mathematical approaches of the RMs. The bootstrap mapping for the AL dataset showed that each bootstrap factor was reasonably matched; the percentages of unmapped runs for the complete and incomplete datasets were, respectively: crustal, 0 and 0 %; industrial, 0 and 0 %; marine, 0 and 0 %; traffic, 0 and 0 %; sulfate, 0 and 0 %; nitrate, 3 and 1 %; and industrial_2/heavy oil combustion, 23 and 0 %. Interestingly, the largest variability in the bootstrap mapping was observed for the mixed industrial_2/heavy oil combustion contribution.

LE dataset

PCA and PMF models identified five and four sources, respectively, explaining about 76 % of the total variance (for the PCA solution):

  • Crustal, with major elements such as Al, Ca2+, K+, Fe, and Mn.

  • Marine, characterized by Na+, Cl−, and Mg2+.

  • Traffic, characterized by Cu and Pb.

  • SIA, characterized by NH4+, NO3−, and SO42−, with a contribution from V and Ni.

  • Industrial, characterized by Zn, Cr, and, to a lesser extent, V. This component was found only with PCA, not with PMF. It is the most difficult to interpret: it is likely of anthropogenic origin, possibly due to transported industrial emissions, even if the Zn loading could also suggest some contribution from traffic. However, its absolute contribution to measured concentrations (0.5 ± 0.6 μg/m³, Table 2) is essentially negligible.

The bootstrap mapping for the LE dataset indicated a reasonable match for SIA (3 % unmapped), crustal (6 % unmapped), and marine (0 % unmapped). A worse result was obtained for the traffic factor (28 % unmapped), probably because of the lack of specific markers for this source (such as TC).

In this case, PCA and PMF identified different numbers of factors/components. Further, unlike at the Spanish sites, PMF does not separate SIA into secondary sulfate and secondary nitrate. It is also interesting to observe that, unlike for the BCN and AL datasets, PMF applied to the LE dataset does not separate V and Ni into a heavy oil combustion factor; instead, they are included in secondary inorganic aerosol. This indicates common sources of SIA and of V and Ni, which could be identified as long-range transport of emissions from the industrial area of Brindisi and ship emissions (Contini et al. 2010). This difference with respect to the Spanish sites could be site-specific: the LE site is much farther from heavy oil combustion sources (about 30 km from the Brindisi harbor-industrial area and about 80 km from the Taranto harbor-industrial area) than the BCN and AL sites, where these elements mainly come from the city harbors. Therefore, the contribution of V–Ni sources at LE is smaller and more difficult to separate from SIA.

Table 2 shows the types of sources identified, with their contributions to PM10 (in μg/m³) and uncertainties. The table also includes the unexplained mass, which is reasonably small in all cases. This work does not focus on analyzing the source contributions at the three sites but on the inter-comparison of receptor model outputs. However, to make the identification of the factors/sources more robust, the seasonal trends of the factors and the differences between weekdays and weekends were analyzed. At the BCN and AL sites, the typical patterns of nitrate and sulfate were observed: sulfate contributions were larger during the warm seasons (spring and summer) as a consequence of increased photochemical activity, whereas nitrate, which is thermally unstable, showed lower contributions during the warm seasons. At the LE site, nitrate and sulfate could not be investigated separately because a single SIA factor was obtained. However, the seasonal trends reported by Contini et al. (2010) for this dataset showed that NO3− was lower and SO42− higher during summer, while the sum of secondary nitrate and secondary sulfate was almost constant. Moreover, the heavy oil combustion factors identified for the BCN and AL datasets (for AL, more properly “industrial_2/heavy oil combustion”) showed higher contributions during spring and summer, probably because of a stronger influence of tourist ship traffic in the warm period. Finally, the traffic contribution was higher on weekdays than on weekends for the AL and LE datasets. The BCN dataset contained only weekday samples, so this comparison was not possible.
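The seasonal and weekday/weekend analysis amounts to grouping daily factor contributions by date. A minimal sketch follows; the season boundaries (meteorological seasons), the helper names, and the example values are hypothetical and are not taken from the datasets.

```python
from datetime import date
from statistics import mean

# Meteorological seasons (an assumption; the text does not define them).
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def season(d):
    return SEASON_OF_MONTH[d.month]


def day_type(d):
    return "weekend" if d.weekday() >= 5 else "weekday"   # Sat=5, Sun=6


def group_means(dates, contributions, key):
    """Average daily factor contributions grouped by key(date)."""
    groups = {}
    for d, c in zip(dates, contributions):
        groups.setdefault(key(d), []).append(c)
    return {k: mean(v) for k, v in groups.items()}


# Hypothetical daily traffic-factor contributions (ug/m3).
dates = [date(2007, 1, 6), date(2007, 1, 8), date(2007, 7, 9)]
traffic = [2.0, 4.0, 6.0]
print(group_means(dates, traffic, day_type))  # {'weekend': 2.0, 'weekday': 5.0}
print(group_means(dates, traffic, season))    # {'winter': 3.0, 'summer': 6.0}
```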

Table 2 Absolute average contributions and uncertainties obtained with PCA and PMF receptor models for the three complete datasets

Table 3 Comparison of measured and modeled daily PM10 concentrations using PCA and PMF

The agreement between measured daily PM10 concentrations and those reconstructed by the PCA and PMF models is quite good in all cases, as shown in Table 3. Table 3 reports the Pearson correlation coefficients, which are commonly used in the literature to compare source contributions (Hopke et al. 2006; Pandolfi et al. 2008), together with the slopes and the coefficients of determination (R²) obtained from a linear fit through the origin between measured and modeled PM10. Table 3 also includes three statistical parameters that quantify the differences between measured and modeled data more precisely: the root mean square error (RMSE), the absolute fractional bias (AFB), and the weighted difference (WD), defined as

$$ \mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{N=1}^{m} \left( X_N - Y_N \right)^2} \tag{2} $$

$$ \mathrm{AFB} = \frac{2}{m} \sum_{N=1}^{m} \frac{\left| X_N - Y_N \right|}{X_N + Y_N} \tag{3} $$

$$ \mathrm{WD} = \frac{1}{m} \sum_{N=1}^{m} \frac{\left| X_N - Y_N \right|}{\sqrt{s_N^2 + r_N^2}} \tag{4} $$

In Eqs. (2), (3), and (4), m is the total number of samples, $X_N$ and $Y_N$ are the measured and modeled PM10 concentrations, and $s_N$ and $r_N$ are their uncertainties. The RMSE is an indicator of the scatter between the two time series; the AFB is an indicator of the agreement between the average concentrations (its acceptability range is considered to be 0–2); and the WD is commonly used to test the distance between two time series relative to their uncertainties (its acceptability range is also considered to be 0–2). In Table 3, slopes and correlation coefficients close to 1 indicate good average agreement of the PM10 reconstruction for both models at all sites. However, the RMSE values indicate some scatter in the data, while WD and AFB are within the acceptable ranges.
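Eqs. (2)–(4), together with the Pearson coefficient and the slope of a fit through the origin, can be computed with a few lines of plain Python. This is a sketch: it assumes the fit regresses modeled on measured values (the text does not state the orientation), and the function names and numbers are hypothetical.

```python
import math


def rmse(X, Y):
    """Eq. (2): root mean square error between measured X and modeled Y."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)) / len(X))


def afb(X, Y):
    """Eq. (3): absolute fractional bias, between 0 and 2 for X, Y >= 0."""
    return 2.0 / len(X) * sum(abs(x - y) / (x + y) for x, y in zip(X, Y))


def wd(X, Y, s, r):
    """Eq. (4): weighted difference using the uncertainties s (of X) and r (of Y)."""
    return sum(abs(x - y) / math.sqrt(si ** 2 + ri ** 2)
               for x, y, si, ri in zip(X, Y, s, r)) / len(X)


def slope_through_origin(X, Y):
    """Least-squares slope b of Y = b * X with no intercept."""
    return sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)


def pearson_r(X, Y):
    mx, my = sum(X) / len(X), sum(Y) / len(Y)
    cov = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in X) *
                           sum((y - my) ** 2 for y in Y))


# Hypothetical measured (X) and modeled (Y) PM10, ug/m3, with uncertainties.
X, Y = [10.0, 20.0, 30.0], [12.0, 18.0, 30.0]
s, r = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
print(round(rmse(X, Y), 3), round(afb(X, Y), 3), round(wd(X, Y, s, r), 3))
# 1.633 0.096 0.943
```

The AFB normalizes each difference by the pair's mean, so it is scale-free, whereas the WD compares each difference with the combined uncertainty of that day.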

Similar patterns were observed at all sites: the crustal contributions found by PCA are larger than those found by PMF, and the SIA contributions found by PCA are lower than those found by PMF. Interestingly, this happens not only at the BCN and AL sites, where PMF separates secondary sulfate and secondary nitrate, but also at the LE site, where this separation does not take place. This trend is consistent with the results reported by Callén et al. (2009) and Viana et al. (2008b), and it is presumably related to the different mathematical approaches of the models and/or to the different role of measurement uncertainties in PMF. The crustal contribution can be calculated from measured concentrations by summing the elements generally associated with mineral dust, expressed as oxides (Al, Si, Ti, and Fe plus the insoluble fractions of K and Ca, indicated with asterisks), as 1.15 (1.89 Al + 2.14 Si + 1.67 Ti + 1.4 Ca* + 1.2 K* + 1.36 Fe), plus carbonates calculated from non-sea-salt calcium and magnesium as 1.5 nss-Ca2+ + 2.5 Mg2+ (Marcazzan et al. 2001; Perrino et al. 2014). The factor 1.15 accounts for sodium and magnesium oxides. The soluble-to-total concentration ratios of Ca, K, and Mg measured at the LE site (Contini et al. 2010) were also applied to the BCN and AL sites. Since Si was not measured, it was estimated assuming SiO2 = 3 Al2O3; with the oxide factors above (2.14 Si = 3 × 1.89 Al), this gives Si = 2.65 Al. These estimates gave 6.3 μg/m³ (±0.7 μg/m³) for LE, 13.3 μg/m³ (±1.1 μg/m³) for BCN, and 9.2 μg/m³ (±0.8 μg/m³) for AL. Comparing these values with the results in Table 2, PCA agrees well with the stoichiometric estimate at the AL site, while PMF gives a slight underestimation; at BCN, the stoichiometric value lies between the PCA and PMF estimates; at LE, both models overestimate with respect to the stoichiometric calculation. Therefore, no general conclusion on model performance can be drawn from this comparison, because the results appear to be site-dependent.
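The stoichiometric crustal estimate can be written as a small function. This is a sketch under our reading of the text: the total is taken as the sum of the oxide term and the carbonate term, Si is derived from Al when not measured, and the function and argument names are hypothetical.

```python
# Si is not measured: SiO2 = 3 Al2O3, with Al2O3 = 1.89 Al and SiO2 = 2.14 Si,
# gives 2.14 Si = 3 * 1.89 Al, i.e. Si = 2.65 Al (rounded).
SI_PER_AL = round(3 * 1.89 / 2.14, 2)


def crustal_mass(al, ti, ca_insol, k_insol, fe, nss_ca_sol, mg_sol, si=None):
    """Mineral dust mass (ug/m3) from element concentrations (ug/m3).

    ca_insol, k_insol: insoluble Ca and K (Ca*, K* in the text);
    nss_ca_sol, mg_sol: the nss-Ca2+ and Mg2+ terms of the carbonate expression.
    """
    if si is None:
        si = SI_PER_AL * al
    oxides = 1.15 * (1.89 * al + 2.14 * si + 1.67 * ti
                     + 1.4 * ca_insol + 1.2 * k_insol + 1.36 * fe)
    carbonates = 1.5 * nss_ca_sol + 2.5 * mg_sol
    return oxides + carbonates


print(SI_PER_AL)                                      # 2.65
print(round(crustal_mass(1.0, 0, 0, 0, 0, 0, 0), 3))  # 8.695
```

With Si inferred from Al, aluminum alone carries a factor of about 8.7 in this estimate, so the result is quite sensitive to the Al measurement.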

Site-dependent differences between the models are also observed for traffic and marine contributions. The traffic contribution estimated with PMF is lower than that estimated with PCA at the BCN and AL sites but comparable at the LE site. The marine contribution estimated with PMF is higher than that estimated with PCA at the BCN and LE sites and lower at the AL site. These differences should be interpreted taking into account the details of the source profiles identified. Even if the source types found by PCA and PMF are quite similar, and are consequently labeled similarly, some differences can be observed in the chemical profiles provided by the two RMs. For the marine source, for example, the different source profiles can explain the differences in the estimated contributions. At AL, the PCA marine component is loaded with NO3−, suggesting an aged marine aerosol that includes a portion of NaNO3, which the PMF model assigns to secondary nitrate. The PMF marine factor is instead a fresh marine aerosol without a significant NO3− contribution. This could explain why the PCA contribution is higher than the PMF contribution. At LE, both RMs identify an aged marine aerosol loaded with nitrate, but the PMF profile also includes traces of Ca2+, Cr, and other metals, suggesting a possible mix with the crustal source. At BCN, both RMs seem to identify a fresh marine aerosol.

These differences could be related to the different characteristics of the measurement sites. The BCN site, which is close to the seaside, is probably directly influenced by fresh sea spray, and the RMs captured this behavior. The LE site, instead, is quite distant from the coastline (minimum distance about 13 km), so fresh sea spray ages during transport to the measurement site. Finally, although the AL site lies in a bay very close to the sea, it may be influenced by emissions from the harbor area, so fresh sea spray could react with pollutants commonly emitted by the nearby industrial sites, such as NOx, producing sea salt enriched in NO3− (aged sea spray). Sea spray contributions can be estimated from measured concentrations as Cl− + 1.4468 Na+, assuming that all Na+ and Cl− come from marine sources (Contini et al. 2010). This gave 1.9 μg/m³ (±0.1 μg/m³) for LE, 2.1 μg/m³ (±0.3 μg/m³) for BCN, and 4.3 μg/m³ (±0.5 μg/m³) for AL. Comparing these values with the results in Table 2, the PCA estimates of the marine contribution agree well with the stoichiometric calculations at BCN and LE, where PMF overestimates; at AL, the opposite occurs. Hence, as for the crustal contribution, it is not possible to conclude in general that one model is better than the other for evaluating marine contributions, because the results are site-specific.
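The stoichiometric sea-spray estimate is a one-line formula. The sketch below also includes a signed relative difference, one simple (hypothetical) way to express how far a model estimate lies above or below it; the names and inputs are illustrative only.

```python
def sea_salt(cl, na):
    """Sea-spray mass (ug/m3) as Cl- + 1.4468 Na+, assuming all Na+ and Cl- are marine."""
    return cl + 1.4468 * na


def relative_difference(model, reference):
    """Signed relative deviation of a model estimate from a reference value."""
    return (model - reference) / reference


print(round(sea_salt(1.0, 0.5), 4))    # 1.7234
print(relative_difference(2.5, 2.0))   # 0.25
```

A positive relative difference corresponds to the "overestimation" wording used above and a negative one to underestimation.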

Fig. 3 compares the PM10 daily concentrations reconstructed by PCA and by PMF. Despite the differences in the single source contributions, the daily PM10 concentrations reconstructed by the two models are in good agreement, with high determination coefficients for all datasets: R² = 0.87 for LE, 0.97 for BCN, and 0.95 for AL.
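The determination coefficients, and the slopes and intercepts mentioned in the conclusions, can be obtained with an ordinary least-squares fit of one reconstructed series against the other. A minimal sketch in plain Python, assuming the PCA series is taken as x and the PMF series as y (which axis the study used is not stated here); `regression_stats` is a hypothetical helper name.

```python
def regression_stats(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("constant series: regression is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit with intercept, R^2 equals the squared Pearson correlation.
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

# Hypothetical daily PM10 reconstructions (ug/m3), not the study's data.
pca_pm10 = [20.0, 35.0, 28.0, 41.0]
pmf_pm10 = [21.0, 34.0, 29.5, 40.0]
slope, intercept, r2 = regression_stats(pca_pm10, pmf_pm10)
```

A slope close to 1 and a small intercept, together with a high R², indicate that the two models reconstruct essentially the same daily PM10 mass.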

Fig. 3

Comparison of PM10 daily concentrations reconstructed by PCA and by PMF for the three datasets. The estimated average uncertainty for both models ranges between 4 and 6 %, with a minimum–maximum interval of 3–19 %.

Effect of chemical species included in the datasets on source apportionment

The Spanish datasets (BCN and AL) were analyzed with PCA and PMF using both the complete datasets and reduced (incomplete) datasets, chosen as discussed in the Datasets handling for sensitivity tests section, in order to test the sensitivity of the results to the number of species available in the datasets.

Identification and characterization of source profiles

The inter-comparison of the factors/components found using complete and incomplete datasets showed that PMF identifies the same factors in both cases, demonstrating good stability at both sites. However, some chemical profiles, such as the traffic source (Fig. 4 for BCN and Fig. 5 for AL), are more difficult for investigators to identify given the more limited number of tracers available. This difficulty is due to the absence of chemical markers such as TC, Sn, and Sb, which are important traffic markers both for direct emissions (TC) and for non-exhaust emissions, such as Sb and Sn from brake wear (Wåhlin et al. 2006; Amato et al. 2009b; Gietl et al. 2010). Identifying the industrial source is also difficult, because its chemical profile (with Zn and Pb as key species) resembles that of emissions typically associated with traffic (Fig. 4). This agrees with Hopke et al. (2006), who reported that sources such as crustal, sulfate (or, more generally, secondary aerosol), oil, and sea salt were identified most unambiguously by RMs, regardless of the investigators and methods involved. In contrast, traffic sources were less consistently identified across the results in the literature. This is probably related to the multiplicity of traffic markers (depending on fuel type or engine type), which makes this source harder to identify.

Fig. 4

Comparison of traffic profiles (a, b) and industrial profiles (c, d) obtained with PCA (only positive values are shown) and PMF using the complete and incomplete datasets for the Barcelona site. The species in the black rectangle are those removed from the incomplete dataset.

Fig. 5

Comparison of traffic profiles (a, b) and industrial profiles (c, d) obtained with PCA (only positive values are shown) and PMF using the complete and incomplete datasets for the Algeciras site. The species in the black rectangle are those removed from the incomplete dataset.

For the AL dataset, PCA of the complete and incomplete datasets identifies different sources. While the crustal, SIA, marine, and traffic components are the same in both solutions, the component labeled industrial_2 in the complete dataset (with high loadings of As, Se, Sb, and Pb, possibly indicating emissions from coal power plants close to the site) disappears in the incomplete dataset and is replaced by a new component labeled heavy oil combustion, with V and Ni as chemical markers. This shift is accompanied by a change in the industrial source, whose loadings of V and Ni are reduced (Fig. 5).

The results show that, for a reasonable identification of the main sources at this type of site (i.e., coastal urban and urban background sites with some influence from industrial and heavy oil combustion emissions), a minimum set of chemical species is recommended for the application of receptor models. Specifically, (1) TC (possibly split into elemental and organic carbon), Cu, Zn, and Sb (and possibly Sn) appear important for identifying traffic contributions; (2) the major ions (Na+, NH4+, Cl−, SO42−, and NO3−) are important to understand marine and SIA contributions as well as their interaction, with formation of NaNO3; (3) the elements Fe, Al, Ca, Mg, and K (and/or their soluble ions Ca2+, Mg2+, and K+) are important to identify crustal and resuspended dust; (4) the trace elements V and Ni are important tracers of the heavy oil combustion contribution; (5) Pb, Cr, and Mn are relevant for characterizing industrial emissions and separating them from crustal and traffic sources.

Estimates of source contributions

Fig. 6 compares the contributions, with related uncertainties, of the sources identified by PCA and PMF for the complete and incomplete Spanish datasets. The unexplained mass is comparable in the RM results for complete and incomplete datasets. At both sites, the source contributions calculated with PMF from the complete and incomplete datasets are comparable within the estimated uncertainties. This confirms that the good stability of the PMF outputs observed for the chemical profiles of the different sources (see Identification and characterization of source profiles section) also holds for the calculated contributions. The PCA outputs are more sensitive to the chemical species used: with the incomplete datasets, PCA gives higher crustal contributions at both sites and significantly lower traffic contributions.
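"Comparable within the estimated uncertainties" can be made operational in several ways. The sketch below assumes one illustrative criterion, not necessarily the one used in the study: two estimates agree if their difference does not exceed k times their combined standard uncertainty, with independent uncertainties added in quadrature. The function names and the input format `{source: (value, uncertainty)}` are hypothetical.

```python
import math

def comparable(a, ua, b, ub, k=1.0):
    """True if |a - b| <= k * sqrt(ua**2 + ub**2) (independent uncertainties assumed)."""
    if ua < 0 or ub < 0:
        raise ValueError("uncertainties must be non-negative")
    return abs(a - b) <= k * math.hypot(ua, ub)

def compare_runs(complete, incomplete, k=1.0):
    """Map each source present in both runs to True/False.

    Inputs are {source: (mean contribution, uncertainty)} in ug/m3.
    """
    shared = complete.keys() & incomplete.keys()
    return {s: comparable(*complete[s], *incomplete[s], k=k) for s in shared}

# Hypothetical PMF results for one site (not the study's values).
complete = {"crustal": (5.0, 0.5), "traffic": (6.0, 0.6)}
incomplete = {"crustal": (5.4, 0.5), "traffic": (5.7, 0.7)}
agreement = compare_runs(complete, incomplete)
```

The coverage factor k controls how strict the test is; k = 2 corresponds roughly to a 95 % interval if the uncertainties are Gaussian standard uncertainties.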

Fig. 6

Comparison between contributions of sources identified by PCA (a, c) and PMF models (b, d) for the Spanish complete and incomplete datasets.

Regarding the site-dependent variability of the PCA results, the SIA contribution calculated at AL with the incomplete dataset is significantly higher than that calculated with the complete dataset. This was not observed at the BCN site (the profiles are shown in Fig. 7). The reduction of the traffic contribution estimated from the incomplete dataset, with respect to the complete dataset, seems comparable to the increases in the crustal and SIA contributions, suggesting a certain degree of mixing among these sources when the incomplete dataset is used.
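The mass-redistribution argument compares the drop in one source with the summed rise of others. A minimal sketch of that bookkeeping, using hypothetical helper names and invented mean contributions (not the study's values), could be:

```python
def contribution_shifts(complete, incomplete):
    """Change in mean contribution (incomplete minus complete) for every shared source."""
    shared = complete.keys() & incomplete.keys()
    return {source: incomplete[source] - complete[source] for source in shared}

def loss_vs_gains(shifts, lost="traffic", gained=("crustal", "SIA")):
    """Return (mass lost by one source, total mass gained by the listed sources)."""
    return -shifts[lost], sum(shifts[s] for s in gained)

# Hypothetical mean PCA contributions in ug/m3, for illustration only.
complete = {"traffic": 6.0, "crustal": 4.0, "SIA": 7.0, "marine": 2.0}
incomplete = {"traffic": 3.5, "crustal": 5.2, "SIA": 8.2, "marine": 2.0}
loss, gain = loss_vs_gains(contribution_shifts(complete, incomplete))
```

A loss close to the summed gains is consistent with mass being reassigned among sources rather than with a change in the total reconstructed PM10; it does not by itself prove that mixing occurred.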

Table 4 shows the statistical parameters R, RMSE, AFB, and WD for the complete and incomplete datasets at the BCN site; Table 5 reports the same analysis for the AL site. The R values are relatively high, and, in general, the lowest correlations are observed for the traffic source in both RMs. In the PCA model, this source also presents the largest RMSE and WD. All AFB values fall within the range 0–2 in both RMs, with the highest values observed for the traffic and industrial/heavy oil combustion sources. For several sources, RMSE values are larger in PCA than in PMF, and the highest values are observed when comparing the PM10 modeled with the complete dataset to that modeled with the incomplete dataset. This is compatible with the greater stability of PMF with respect to PCA. The largest WD values are observed for PCA at both sites.
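Three of the four parameters can be computed directly from the paired daily series of a source (complete vs incomplete run). The sketch below assumes the usual absolute fractional bias, AFB = |2(x̄ − ȳ)/(x̄ + ȳ)|, which lies between 0 and 2 for positive data; WD is left out because its definition is not given in this section. Function names are hypothetical.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired series."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(x, y):
    """Root-mean-square difference between paired values (same units as the data)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def afb(x, y):
    """Absolute fractional bias |2 (mean_x - mean_y) / (mean_x + mean_y)|."""
    mx, my = _mean(x), _mean(y)
    return abs(2.0 * (mx - my) / (mx + my))

def compare_series(complete, incomplete):
    """R, RMSE, and AFB for one source's daily contributions from two model runs."""
    if len(complete) != len(incomplete) or len(complete) < 2:
        raise ValueError("need two equal-length series with at least two points")
    return {"R": pearson_r(complete, incomplete),
            "RMSE": rmse(complete, incomplete),
            "AFB": afb(complete, incomplete)}
```

R measures how well the day-to-day pattern is preserved, RMSE the typical absolute difference in μg/m3, and AFB the relative difference of the period means, so a source can show a high R and still a large AFB.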

Table 4 Statistical parameters R, RMSE, AFB, and WD obtained comparing the results for complete and incomplete Barcelona dataset
Table 5 Statistical parameters R, RMSE, AFB, and WD obtained comparing the results for complete and incomplete Algeciras dataset

Conclusions

In this work, an inter-comparison of PM10 source apportionment performed with the PCA and PMF receptor models has been reported for three European sites. The study analyzed the performance of the two RMs in source identification, in the evaluation of source contributions, and in terms of the sensitivity and robustness of RM outputs to the working variables (chemical species) included in the source apportionment.

Regarding the identification of aerosol sources, the results show that, in general, both models identify the same source categories. However, the PMF model appears better suited than PCA to separating secondary sulfate and secondary nitrate (AL and BCN sites). Further, PCA showed some difficulties in separating industrial and heavy oil combustion contributions (AL site). The number of factors/sources identified was the same at the AL and BCN sites, but at the LE site PCA found one source more than PMF, as it was able to separate industrial and secondary inorganic aerosol. Therefore, the effective number of sources identified by the two RMs can differ.

Fig. 7

Comparison of secondary inorganic aerosol profiles obtained with PCA (only positive values are shown) and PMF using the complete and incomplete datasets for the Algeciras site (a, b) and for Barcelona (c, d). The species in the black rectangle are those removed from the incomplete dataset.

The analysis of source contributions suggests that both RMs efficiently reconstruct the daily PM10 concentrations at all sites, with small unexplained concentrations (between 0.1 and 1.5 % of PM10). Furthermore, the PM10 daily concentrations reconstructed by PCA and PMF are in very good agreement, with determination coefficients greater than or equal to 0.87 at all sites, essentially unit slopes, and small intercepts. Looking at the specific source contributions, a trend common to all sites emerged: the crustal contributions found by PCA are larger than those found by PMF, and the SIA contributions found by PCA are lower than those found by PMF. However, the comparison with stoichiometric calculations of crustal contributions did not identify a model that gives better results at all sites, because of site-dependent differences.

Site-dependent differences are also observed for traffic and marine contributions. The traffic contribution estimated with PMF is lower than that estimated with PCA at the BCN and AL sites, but it is comparable at the LE site. The marine contribution estimated with PMF is higher than that estimated with PCA at the BCN and LE sites, and lower at the AL site. These differences should be interpreted taking into account the details of the identified source profiles. Even if the source typologies in PCA and PMF are quite similar and are consequently labeled with the same name, some differences can be observed in the chemical profiles provided by the two RMs.

The inter-comparison of source apportionment performed on complete datasets (using all the available chemical species) and on incomplete datasets made it possible to investigate the sensitivity of SA results to the working variables used in the RMs. At both sites, the profiles and contributions of the different sources calculated with PMF are comparable within the estimated uncertainties, indicating good stability and robustness of the PMF results. In contrast, PCA outputs are more sensitive to the chemical species present in the datasets: with the incomplete datasets, PCA gives higher crustal contributions at both sites and significantly lower traffic contributions.