Introduction

Particulate matter (PM) and gaseous pollutants have caused major environmental problems in southern and eastern Asia over the last few decades (Huang et al. 2014; Zhang et al. 2017). In recent years, China has experienced extremely severe and persistent haze pollution, especially in the North China Plain (Wang et al. 2016; Cheng et al. 2016; Zou et al. 2017). Severe pollution is accompanied by extremely poor air quality and poor visibility, which threaten human health (Jerrett et al. 2017; Li et al. 2018; Almetwally et al. 2020). Considerable research has shown that the mechanisms are very complex and differ between pollution episodes (Wu et al. 2019). However, the pollution mechanisms and influence factors during different pollution episodes are still not well understood (Cheng et al. 2016; Xie et al. 2019). To investigate these mechanisms, researchers have captured and studied severe pollution episodes.

PM can be emitted from primary sources or formed through chemical reactions of precursors, and its origins may change rapidly during a haze episode (Notario et al. 2013; Zheng et al. 2014; Wu et al. 2020). For source apportionment of PM, receptor models based on highly time-resolved species data have been used to investigate temporal variations of source contributions (Eatough et al. 2008; Pancras et al. 2013; Gao et al. 2016; Han et al. 2016), which is fundamental to understanding the influence factors of heavy pollution episodes. Previous studies have confirmed that gaseous pollutants and meteorological parameters play important roles in haze (Kuwata et al. 2017; Wang et al. 2019a). Gaseous pollutants can act as precursors of secondary aerosol (Schelden et al. 2017; Yao et al. 2018). Adverse meteorological conditions can lead to the accumulation of pollutants and favor the formation of secondary aerosol (Wang et al. 2019b). Thus, gaseous pollutants and meteorological parameters are important for heavy PM2.5 pollution and the formation of secondary aerosol, but their relationships are very complex (Trivedi et al. 2014; Wang et al. 2014a). In particular, the relationship between the decisive influencing factors (e.g., gaseous pollutants and meteorological parameters) and PM2.5 concentration variation is nonlinear. Machine learning algorithms can therefore be used to better identify the decisive influencing factors on PM2.5 levels and secondary formation, and they provide a novel framework for data analysis. In this study, the random forest algorithm, a classification and regression tool (Svetnik et al. 2003), is introduced to study the formation mechanism and important influencing factors of air pollutants based on gaseous pollutants and meteorological parameters.
In brief, the main contribution of this study is to investigate, from a macroscopic perspective and with a machine learning method, the relationships between PM2.5 concentrations and gaseous pollutants and between PM2.5 concentrations and meteorological factors.

Comprehensive observations of chemical species, gases, PM mass concentrations, and meteorological parameters at a 1-h time resolution were conducted to explore the influence factors of haze episodes. Different types of pollution episodes were observed during the sampling period. The levels of PM2.5-bound species (such as NO3−, Cl−, SO42−, NH4+, Ca2+, Na+, Mg2+, K+, OC, EC), pollutants (such as PM1, PM2.5, PM10, TSP, SO2, O3, NH3, NOx, CO), and meteorological parameters (such as wind speed, relative humidity, temperature) were investigated. Positive Matrix Factorization (PMF) was applied for source apportionment of PM2.5 based on chemical species. Then, the random forest method was applied to estimate the impacts of gaseous pollutants and meteorological parameters on PM2.5 concentrations, sulfur oxidation ratio (SOR), and nitrogen oxidation ratio (NOR). This work used the machine learning algorithm to reveal the characteristics of typical heavy pollution episodes and the reasons for their formation. The severe pollution episodes provide an opportunity to study different pollution mechanisms. The methods and influence factors reported in this paper are applicable to other emerging economies or developing countries and are relevant to the design of effective management strategies.

Methodology

Sampling site and study period

Highly time-resolved measurements of chemical species, gases, and PM mass concentrations were conducted on the rooftop of a five-story building (the Tianjin Environmental Protection Bureau) in downtown Tianjin, which is a municipality directly under the Central Government of China. Tianjin, near the capital of China (Beijing), is in the Bohai Economic Circle. Tianjin is a megacity that has undergone rapid industrialization and urbanization in the last few decades and currently has a population of over 14 million and an area of 11,947 km2. The sampling site was located in a mixed residential and commercial area surrounded by few direct industrial sources, high vehicular emissions, and construction areas. The observations were conducted in February, March, June, July, August, and September of 2015 and produced data for 4,344 h. The details of sampling have been reported in our previous publication (Tian et al. 2018a).

Sampling methods and instrumentation

To obtain highly time-resolved speciated data for PM2.5, online instruments were used to measure ambient PM2.5. Hourly concentrations of elemental carbon (EC) and organic carbon (OC) were detected through a semi-continuous EC/OC carbon aerosol analyzer (Model-4, Sunset Laboratory Inc., USA) (Dall'Osto et al. 2014) using the basic thermal/optical transmittance measurement protocol of the National Institute for Occupational Safety and Health (NIOSH). Two temperature stages were used to determine OC and EC: an aliquot of sample filter (2.1 cm2) was heated stepwise to 820 °C in a furnace in a non-oxidizing atmosphere (100% He); the oxidizing oven was then cooled to 550 °C, and the filter was again gradually heated to 870 °C in an oxidizing atmosphere (98% He, 2% O2). Evolved carbon was oxidized to CO2 and detected by a non-dispersive infrared detector (NDIR) during each temperature step. The split point was quantified as the carbon evolved after the introduction of oxygen but before the point where transmittance became equal to its initial value. Calibration was performed by introducing a known amount of methane into the oven and measuring its constant response. The carbon that evolved before the split point was OC, whereas EC was measured as the carbon evolved after this point but prior to the methane calibration peak (Tiwari et al. 2013).

Hourly concentrations of ionic species in PM2.5, including NO3−, Cl−, F−, SO42−, NH4+, Ca2+, Na+, Mg2+, and K+, were measured using a URG 9000D ambient ion monitor (AIM) (Chapel Hill, NC). The AIM separated and analyzed each anion and cation through a particle collection system and ion chromatographs (ICs). The PM2.5 samples were collected at a flow rate of 3 L min-1 by using a sharp cut cyclone (Manigrasso et al. 2010). Then, a liquid diffusion parallel-plate denuder was used to separate the gases from the aerosol samples. The water-soluble components of the aerosol and the gaseous pollutants were collected through four syringes installed into pre-concentrators and then injected into the ICs. Anion detection was conducted in a gradient elution program using a KOH solution at a flow rate of 1.0 mL min-1, and the cation analyzer was run with methanesulfonic acid at a flow rate of 0.5 mL min-1.

Gaseous pollutants included SO2, O3, NO, NO2, and CO, and PM mass concentrations were of four sizes: PM1, PM2.5, PM10, and TSP. Mass concentrations of PMs were continuously measured by a tapered element oscillating microbalance (TEOM) mass sensor. The SO2 in the atmosphere was determined by pulsed fluorescence technology. The O3 was analyzed by a dual cell photometer, a concept adopted by the US NIST as a national standard. Atmospheric NOx was measured by chemiluminescence technology; the analyzer had isolated outputs for NO and NO2 that could be individually calibrated. The CO was quantified based on the absorption of infrared radiation at a 4.6-μm wavelength.

For quality assurance and quality control (QA/QC), the particle stream entered an aerosol super-saturation chamber to promote particle growth and thereby improve collection efficiency. After collection through four syringes, the aerosols and the gaseous pollutants were injected into the ICs within 1 h. Anion/cation calibration solutions were used for the calibration of the ICs on the AIM for at least 1 month. The minimum detection limits (MDLs) were as follows: 0.2 μg m-3 (Cl−), 0.2 μg m-3 (F−), 0.2 μg m-3 (NO3−), 0.2 μg m-3 (NO2−), 0.3 μg m-3 (SO42−), 1.8 μg m-3 (NH4+), 2.3 μg m-3 (Ca2+), 0.8 μg m-3 (Mg2+), 0.5 μg m-3 (K+), and 0.6 μg m-3 (Na+). The EC/OC carbon aerosol analyzer was calibrated each month through a blank punch of a pre-heated quartz fiber filter and standard sucrose solutions (3.2 μg C μl-1). The quartz fiber filter was changed each week during the analysis process. The MDLs were 0.45 μg cm-2 and 0.06 μg cm-2 for OC and EC, respectively. The MDLs for PM, NO, NO2, SO2, CO, and O3 were 0.1 μg m-3, 0.40 ppb, 0.40 ppb, 0.5 ppb, 0.04 ppm, and 0.50 ppb, respectively. The flow calibration, gas tightness test, blank filter test, and standard sample calibration were all conducted for QA/QC.

Meteorological parameters, including temperature (T), relative humidity (RHU), air pressure (P), wind speed (WS), and wind direction (DD), were used to evaluate the impact of meteorological conditions and were acquired from an online database at http://rp5.by. The meteorological parameters were available for 2:00 AM, 5:00 AM, 8:00 AM, 11:00 AM, 2:00 PM, 5:00 PM, and 8:00 PM each day. They were recorded at the Beimalu Weather Station in downtown Tianjin, which is about 7 km from the Tianjin Environmental Protection Bureau Building.

In addition to chemical species, gases, and PM mass concentrations, SOR and NOR were calculated as follows:

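$$ \mathrm{SOR}=\frac{{\mathrm{SO}}_4^{2-}}{{\mathrm{SO}}_4^{2-}+{\mathrm{SO}}_2} $$

$$ \mathrm{NOR}=\frac{{\mathrm{NO}}_3^{-}}{{\mathrm{NO}}_3^{-}+{\mathrm{NO}}_x} $$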

where SO42- is the S mass in SO42-, SO2 is the S mass in SO2, NO3- is the N mass in NO3-, and NOx is the N mass in NOx. TS and TN were used to indicate the total emissions of S and N (Chen and Xie 2014). SOR and NOR were used to indicate the degree of oxidation of sulfur and nitrogen (Chen and Xie 2014; Wang et al. 2018).
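The two ratios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: concentrations are taken to be in μg m-3, NOx is supplied as separate NO2 and NO concentrations so that its N mass can be computed, and the function names and rounded molar masses are not from the original work.

```python
# Molar masses in g/mol (standard values, rounded; an assumption of this sketch).
M_S, M_N = 32.06, 14.007
M_SO4, M_SO2 = 96.06, 64.07
M_NO3, M_NO2, M_NO = 62.00, 46.01, 30.01


def sor(so4, so2):
    """Sulfur oxidation ratio from sulfate and SO2 mass concentrations (ug m-3)."""
    s_in_so4 = so4 * M_S / M_SO4  # S mass carried by sulfate
    s_in_so2 = so2 * M_S / M_SO2  # S mass carried by SO2
    return s_in_so4 / (s_in_so4 + s_in_so2)


def nor(no3, no2, no=0.0):
    """Nitrogen oxidation ratio from nitrate, NO2 and NO mass concentrations (ug m-3)."""
    n_in_no3 = no3 * M_N / M_NO3  # N mass carried by nitrate
    n_in_nox = no2 * M_N / M_NO2 + no * M_N / M_NO  # N mass carried by NOx
    return n_in_no3 / (n_in_no3 + n_in_nox)


# Hypothetical inputs chosen so that both species carry the same S (or N) mass.
example_sor = sor(so4=96.06, so2=64.07)
example_nor = nor(no3=62.00, no2=46.01)
```

Both functions return a value between 0 and 1 for positive inputs; hours in which all inputs are zero would need to be filtered out beforehand to avoid division by zero.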

In this research, the aerosol pH was calculated with the thermodynamic model ISORROPIA-II (Fountoukis and Nenes 2007), which predicts the physical state and composition of atmospheric inorganic aerosols. The ISORROPIA-II model can solve forward problems, in which T, relative humidity, and the total (gas + aerosol) concentrations are known (e.g., NH3 + NH4+), and reverse problems, in which T, relative humidity, and the concentrations of aerosol (but not gas) species are known. The pH calculation used measurements of NH3, NH4+, Na+, K+, Mg2+, Ca2+, SO42-, NO3-, and Cl- for Tianjin from February to September of 2015. In this study, the ISORROPIA-II model was run in the forward mode with the aerosol solutions assumed to be metastable, because this configuration has been shown to be highly accurate when evaluated against measurements of the semivolatile partitioning of certain species (e.g., NH3/NH4+) (Guo et al. 2015; Song et al. 2018).
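The text does not state how pH was derived from the ISORROPIA-II output. A convention commonly used with this model takes the equilibrium H+ concentration per volume of air and the aerosol liquid water content, both in μg m-3, and converts them to a molar concentration in the aerosol water. The sketch below assumes that convention, an activity coefficient of 1 for H+, and hypothetical argument names; it is not necessarily the calculation used in this study.

```python
import math


def aerosol_ph(h_air, alwc):
    """Aerosol pH from H+ per air volume and liquid water content (both ug m-3).

    Assumes 1 ug of H+ is 1 umol and 1 ug of water is 1e-9 L, so the molar
    concentration in the aerosol water is 1000 * h_air / alwc (mol L-1).
    """
    if h_air <= 0 or alwc <= 0:
        raise ValueError("h_air and alwc must be positive")
    h_aq = 1000.0 * h_air / alwc  # mol of H+ per litre of aerosol water
    return -math.log10(h_aq)


# Hypothetical values, not measurements from this study.
example_ph = aerosol_ph(h_air=1e-3, alwc=10.0)
```

Because the result depends on the ratio of the two outputs, hours with very low liquid water content give unstable pH values and are often excluded in practice.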

Source apportionment

The EPA PMF 5.0 model (Paatero and Tapper 1994; Paatero 1997; EPA 2014) was applied for PM2.5 source apportionment based on hourly measured PM2.5-bound chemical species. PMF solves the mass balance between the observed species concentrations X, the estimated source profiles F, and the estimated contributions G (Ogulei et al. 2006; Tian et al. 2014):

$$ {x}_{ij}=\sum_{h=1}^{p}{g}_{ih}{f}_{hj}+{e}_{ij} $$

where xij is the measured concentration of the jth species in the ith sample, fhj (g g-1) is the estimated fraction of the jth species in the profile of the hth source, gih is the estimated contribution of the hth factor to the ith sample, eij is the residual, and p is the number of factors. Moreover, the multilinear engine (ME-2), the solver underlying EPA PMF 5.0, can incorporate prior information such as chemical properties to apportion sources through auxiliary equations (Tian et al. 2018b).
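To make the mass balance concrete, the sketch below evaluates the residuals eij and the uncertainty-weighted sum of squares Q that PMF minimizes (Paatero and Tapper 1994) for a given pair of G and F. The measurement uncertainties uij are not described in the text and, like all numbers here, are hypothetical. The sketch only evaluates a candidate solution; it does not perform the constrained fit that EPA PMF carries out.

```python
def pmf_residuals(x, g, f):
    """Residuals e_ij = x_ij - sum_h g_ih * f_hj for nested-list matrices."""
    n_samples, n_species, n_factors = len(x), len(x[0]), len(f)
    return [
        [
            x[i][j] - sum(g[i][h] * f[h][j] for h in range(n_factors))
            for j in range(n_species)
        ]
        for i in range(n_samples)
    ]


def pmf_q(x, g, f, u):
    """Objective Q = sum_ij (e_ij / u_ij)^2, with u_ij the data uncertainties."""
    e = pmf_residuals(x, g, f)
    return sum(
        (e[i][j] / u[i][j]) ** 2
        for i in range(len(x))
        for j in range(len(x[0]))
    )


# Hypothetical toy case: 2 samples, 2 species, 2 factors.
G = [[1.0, 2.0], [3.0, 4.0]]   # contributions g_ih
F = [[0.5, 0.5], [0.2, 0.8]]   # profiles f_hj
X = [[1.0, 2.1], [2.3, 4.5]]   # observations x_ij
U = [[0.1, 0.1], [0.1, 0.1]]   # uncertainties u_ij (assumed)
q_value = pmf_q(X, G, F, U)
```

In this toy case only two entries of X deviate from the product of G and F, by one and two uncertainty units, so Q is close to 5.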

In this work, the random forest algorithm, a non-linear, data-driven model, is employed to study the influence factors of PM2.5, NOR, and SOR. A random forest is an ensemble of Classification and Regression Trees (CART). Each regression tree maps an m-dimensional input x to a leaf index and returns the weight of that leaf (Chen and Guestrin 2016):

$$ f(x)={\omega}_{q(x)}\quad \left(q:{\mathbb{R}}^m\to T,\ \omega \in {\mathbb{R}}^T\right) $$

Here, f is parameterized by the learned tree structure q (with T leaves) and the leaf weights ω. Each tree in the forest is trained independently and has its own structure q and leaf weights ω; for regression, the forest prediction is the average of the individual tree predictions.
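As an illustration of how relative importances can be obtained from a random forest regression, the sketch below uses scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_`. The text does not state which software or importance measure was used, so both are assumptions, and the data are synthetic rather than the Tianjin measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Predictor names follow the factors listed in this paper.
FEATURES = ["T", "RHU", "P", "WS", "DD", "pH", "CO", "SO2", "NOx", "O3", "NH3"]


def relative_importance(X, y, names, n_trees=200, seed=0):
    """Fit a random forest and return feature importances, largest first."""
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    model.fit(X, y)
    # Impurity-based importances; scikit-learn normalizes them to sum to 1.
    pairs = zip(names, model.feature_importances_)
    return dict(sorted(pairs, key=lambda kv: kv[1], reverse=True))


# Synthetic data: the target depends strongly on CO and weakly,
# non-linearly, on RHU. These are not observations from this study.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, len(FEATURES)))
y = (
    3.0 * X[:, FEATURES.index("CO")]
    + np.sin(X[:, FEATURES.index("RHU")])
    + 0.1 * rng.normal(size=300)
)
importance = relative_importance(X, y, FEATURES)
```

Multiplying the returned values by 100 gives percentages comparable in form to those reported for PM2.5, SOR, and NOR. Impurity-based importances can be biased when predictors are correlated, so permutation importance is a common cross-check.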

Results and discussion

Concentrations

From February 1, 2015, to September 30, 2015, the average concentrations of PM1, PM2.5, PM10, and TSP were 57, 96, 104, and 152 μg m−3, respectively. The PM1/PM2.5, PM2.5/PM10, and PM10/TSP ratios were 0.59, 0.92, and 0.70, respectively. These ratios usually vary between pollution processes and can indicate different pollution mechanisms. As reported by Alastuey et al. (2004), the fine particles can be caused by a low proportion of secondary and primary combustion-emitted species, and the high coarse part can be caused by the high load of other contributions such as natural and anthropogenic mineral dust.

The important components of PM2.5 were NO3- (13%), SO42- (11%), NH4+ (14%), OC (9.5%), EC (2.7%), and Cl- (2.5%), based on hourly data collected from February 1, 2015, to September 30, 2015. The annual average concentrations of Cl-, NO3-, NO2-, SO42-, NH4+, Ca2+, Mg2+, K+, Na+, OC, EC, CO, SO2, NOx, O3, and NH3 were 2.5 μg m-3, 11 μg m-3, 0.40 μg m-3, 9.7 μg m-3, 13 μg m-3, 0.18 μg m-3, 0.07 μg m-3, 0.87 μg m-3, 0.77 μg m-3, 9.4 μg m-3, 2.7 μg m-3, 1.3 μg m-3, 28 μg m-3, 56 μg m-3, 103 μg m-3, and 29 μg m-3, respectively. Moreover, RHU, pH, and WS were recorded, and their annual means were 58%, 4.2, and 2.7 m/s, respectively.

Influence factors on PM2.5, SOR, and NOR

To comprehensively analyze the roles of meteorological factors and gaseous pollutants in local PM2.5 concentration, NOR, and SOR, we employed the random forest model. The following factors were selected: T, RHU, P, WS, DD, pH, CO, SO2, NOx, O3, and NH3. By analyzing the variables through multiple nonlinear regression, the model quantified the relative importance of the different factors for PM2.5, NOR, and SOR, as shown in Table 1.

Table 1 Different influence factors on PM2.5, SOR and NOR during the whole period

Influence factors on PM2.5

Based on the random forest results, CO had the most important influence on PM2.5 concentration and contributed 17% to PM2.5 concentration variations. SO2 and NOx (24% in total) were also strongly linked with PM2.5 concentration. It has been reported that under stable meteorological conditions and with sufficient residence time, SO2 and NOx can be transformed into sulfate and nitrate. Among the meteorological parameters, P (9.9%) made a large contribution to PM2.5 concentration. A possible explanation is that under high pressure, downdrafts impede the upward dispersion of PM2.5 and lead to the accumulation of particles. T (10%) was also an important factor for PM2.5 concentration variation. Luo et al. (2017) noted that air convection depends on temperature: high temperature can lead to dilution of PM2.5, and the opposite holds when temperature falls; at the same time, high temperature can increase the formation rates of secondary aerosols. RHU (6.7%) and pH (4.8%) could accelerate the formation of secondary aerosol and increase the chance of collision and adsorption between particles, which resulted in decreasing PM2.5 concentration. DD (4.7%) and WS (1.1%) can also influence PM2.5 concentration variations. Higher wind speed favors the dispersion and diffusion of PM2.5 and lowers PM2.5 concentration (Wang et al. 2019b; Karimian et al. 2019). Different wind directions had a significant effect on the spatial distribution of PM2.5 and could change the transport of atmospheric pollutants.

Influence factors on SOR and NOR

The relative importance of gaseous pollutants (CO, SO2, NOx, O3, NH3) and meteorological parameters (T, RHU, P, DD, WS, and pH) for SOR and NOR was also assessed with the random forest method. pH (35%), SO2 (15%), RHU (15%), T (9.8%), and O3 (9%) played important roles in SOR. Consistent with this result, related work has shown that SO2 oxidation in the atmosphere is closely related to pH (Fuzzi 1978; Brimblecombe and Spedding 1972). Sakamoto et al. (2004) found that O3 had a noticeable effect on the amount of SO2 oxidized. RHU was significantly positively correlated with SOR and NOR (Yao et al. 2020). In addition, higher temperature could increase the rate of sulfur oxidation. Meanwhile, NOx (19%), O3 (14%), NH3 (13%), and RHU (15%) had a large influence on NOR. Seinfeld and Pandis (1998) reported that nitrate formation is dominated by the following reactions:

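$$ \begin{array}{l}{\mathrm{NO}}_2+\mathrm{OH}+\mathrm{M}\to {\mathrm{HNO}}_3+\mathrm{M}\\ {\mathrm{NO}}_2+{\mathrm{O}}_3\to {\mathrm{NO}}_3+{\mathrm{O}}_2\\ {\mathrm{NO}}_2+{\mathrm{NO}}_3+\mathrm{M}\to {\mathrm{N}}_2{\mathrm{O}}_5+\mathrm{M}\\ {\mathrm{N}}_2{\mathrm{O}}_5+{\mathrm{H}}_2\mathrm{O}\to 2{\mathrm{HNO}}_3\\ {\mathrm{NH}}_3+{\mathrm{H}}_2\mathrm{O}\to {\mathrm{NH}}_4^{+}+{\mathrm{OH}}^{-}\end{array} $$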

Characteristics of typical pollution episodes

During the sampling period, three typical pollution episodes were captured and selected for further analysis. The concentrations of species in PM2.5 during each episode were input into the USEPA PMF 5.0 model for source apportionment, and the gaseous pollutants and meteorological parameters were input into the random forest model to explore the influence factors. The factor profiles of the three episodes estimated by USEPA PMF 5.0 are shown in Fig. 1. Coal combustion, traffic emission, and resuspended dust were consistently identified in all three pollution episodes. Tian et al. (2020) explored how to better conduct PMF during haze episodes and showed that PMF performance was poor for some episodes in the whole-based mode (using all data of the whole sampling period as input). Thus, the episode-based mode (using the data of each episode) was used in this work to conduct PMF. In addition, the consistent results of PMF and random forest based on different datasets support the reliability of the two methods. Bootstrapping (BS) was used to estimate the uncertainty of the PMF solution. BS involves resampling the input dataset, fitting PMF model parameters for this resampled dataset, and then using the variations among these resampled or “bootstrapped” fitted profiles to estimate the uncertainty of the initial PMF solution (Norris et al. 2014). The BS results of each episode are listed in Table 2. Accordingly, 4, 6, and 5 factors were selected for episodes I, II, and III, respectively, because the BS values were all higher than 75%, indicating that the BS uncertainties can be interpreted. Coal combustion was identified by high loadings of Cl-, OC, and EC (Shen et al. 2010). Traffic emission was characterized by OC and EC but low Cl- (Liu et al. 2017). Resuspended dust was identified by relatively high weights of Ca2+ and Mg2+ (Shen et al. 2011). Factors associated with NO3-, SO42-, and NH4+ were identified as secondary sources, or as nitrate and sulfate (Guan et al. 2019). The OC and EC in the secondary sources may be linked with secondary organic carbon and transport of combustion emissions (Guan et al. 2019).

Fig. 1

The factor profiles of the three episodes estimated by USEPA PMF 5.0. Episode I is characterized by heavy coal combustion; the five factors (coal combustion, traffic emission, resuspended dust, secondary sources, and other) make different contributions to the various components in episode I. Episode II is characterized by the Spring Festival; the seven factors (coal combustion, traffic emission, resuspended dust, sulfate, nitrate, firework, and other) make different contributions to the various components in episode II. Episode III is characterized by high inorganic ions and high RH; the six factors (coal combustion, traffic emission, resuspended dust, sulfate, nitrate, and other) make different contributions to the various components in episode III

Table 2 The bootstrapping (BS) results of three typical pollution episodes

Process I

The first episode of interest lasted from February 14, 2015, 12:00 AM to February 15, 2015, 4:00 AM, with PM2.5 concentrations ranging from 157 to 313 μg m−3; then, after a short decrease, PM2.5 returned to relatively high levels (February 15, 2015, 4:00 PM–February 16, 2015, 12:00 PM). As shown in Fig. 2(b), the abundances of OC and EC were higher than those in the other sampling periods. In general, OC and EC are markers of combustion (Chow et al. 2004). According to the PMF results in Fig. 1 (episode I), the average contributions of coal combustion, traffic emission, resuspended dust, and secondary sources to PM2.5 concentration were 32%, 17%, 5.5%, and 38%, respectively. As demonstrated in Fig. 2(a), in the first stage, coal combustion (88 μg m−3, 40%) and secondary sources (98 μg m−3, 41%) were the two most important contributors, and in the second stage, the highest contribution was from secondary sources (63 μg m−3, 38%). Coal combustion in the first stage emitted large amounts of SO2, which formed secondary SO42- under the high RH in the second stage. Secondary sources were therefore the main pollution source in the second stage.

Fig. 2

The abundance and concentration of chemical species, PMF result, and random forest result during process I (episode characterized by heavy coal combustion). a the concentration contribution from coal combustion, traffic emission, resuspended dust, and secondary sources change over time in episode I; b the difference between the average mass percent of PM2.5 components in episode I and all year; c gases and meteorological parameters have different contributions to PM2.5 variation in episode I; d gases and meteorological parameters have different contributions to SOR in episode I; e gases and meteorological parameters have different contributions to NOR in episode I

Based on the random forest results in Fig. 2(c), CO (40%) made a high contribution to PM2.5 concentration variation. Previous studies noted that CO mainly results from coal combustion for heating during the winter in northern China (Du et al. 2016; Wang et al. 2010). The results indicated that P (20%) and T (16%) also had a great influence on PM2.5 concentration variation in this heavy pollution process. Therefore, the PMF and random forest results were consistent, and both demonstrated that this heavy pollution was characterized by coal combustion.

In addition, SOR was analyzed by inputting the gaseous pollutants and meteorological parameters into the random forest model; the results are shown in Fig. 2(d). Among all the factors, SO2 (43%) was the most important for SOR. In this pollution process, DD (22%) greatly affected the SOR and was the second most important factor. A suggested explanation is that wind direction determined the transport pathway of gaseous pollutants. pH (6%) and RHU (7%) also played a role. For NOR (Fig. 2(e)), NOx (24%) and O3 (17%) were the most important factors. Moreover, in this process, CO (11%), DD (10%), and pH (12%) also had a notable impact on NOR.

Process II

The second episode of interest was observed from February 18, 2015, 12:00 AM to February 21, 2015, 11:00 PM, which covered the Spring Festival. During this episode, the concentration of PM2.5 ranged from 31 to 407 μg m-3. The chemical composition of PM2.5 during this process was characterized by high K+ (3.9%) and Cl- (4.0%), which were much higher than the averages of the entire period (1.0% for K+ and 2.7% for Cl-). Compared with episode I, NH4+ and SO42- were much lower; however, both were slightly higher than their annual averages (as shown in Fig. 3(b)). As reported in our previous work and the literature (Tian et al. 2014; Sarkar et al. 2010), Cl- and K+ are firework-related species and can serve as markers for fireworks. K+ and Cl-, commonly in the form of perchlorate or chlorate, are major components of black powder and act as the main oxidizers during burning. The corresponding chemical equations are 2KClO3 = 2KCl + 3O2 and KClO4 = KCl + 2O2.

Fig. 3

The abundance and concentration of chemical species, PMF result, and random forest result during process II (episode characterized by the Spring Festival). a The concentration contribution from coal combustion, traffic emission, resuspended dust, sulfate, nitrate, and firework change over time in episode II; b the difference between the average mass percent of PM2.5 components in episode II and all year; c gases and meteorological parameters have different contributions to PM2.5 variation in episode II; d gases and meteorological parameters have different contributions to SOR in episode II; e gases and meteorological parameters have different contributions to NOR in episode II

Based on the PMF results in Fig. 1 (episode II), the average contributions of traffic emission, coal combustion, nitrate, resuspended dust, sulfate, firework, and unknown sources to PM2.5 concentration in process II were 18%, 24%, 20%, 6.6%, 14%, 8.8%, and 7.5%, respectively. As demonstrated in Fig. 3(a), coal combustion (24 μg m−3) and nitrate (20 μg m−3) were important contributors in this process, followed by traffic emission (18 μg m−3), sulfate (13 μg m−3), firework (8.6 μg m−3), and resuspended dust (6.4 μg m−3). Firework was thus an important pollution source. The impact of gaseous pollutants and meteorological parameters on PM2.5 concentration variation was further examined with the random forest; as visualized in Fig. 3(c), SO2 (24%) made a high contribution to the heavy pollution. Previous studies noted that large numbers of fireworks are set off during the Spring Festival and become a major pollution source (Wang et al. 2007; Li et al. 2017). CO (12%) and NOx (13%) were also major contributors to PM2.5 concentration variation, and both were closely related to coal combustion emissions and vehicle exhaust (Kota et al. 2014). Besides, P (25%) and pH (11%) also had a great influence on the heavy pollution. Therefore, the combined PMF and random forest results showed that this heavy pollution was characterized by firework combustion.

In terms of the random forest results for SOR in Fig. 3(d), among all the gaseous pollutants and meteorological parameters, the precursor SO2 (27%) was the most important factor; in this pollution process, P (22%) also greatly affected the SOR. RHU (13%) and pH (7%) also played a role. For NOR, as visualized in Fig. 3(e), NOx (35%) was the most important factor, followed by P (20%), RHU (10%), and O3 (8%).

Process III

The third pollution episode was characterized by high inorganic ions and high RHU (51–95%) in summer. The PM2.5 mass concentrations were relatively high, ranging from 81 to 291 μg m−3. The values of SOR and NOR were 0.55 and 0.27, respectively. As visualized in Fig. 4(b), the fractions of NO3- (15%) and SO42- (16%) were high during this episode, and their mass concentrations were much higher than those during the other sampling periods.

Fig. 4

The abundance and concentration of chemical species, PMF result and random forest result during process III (episode characterized by high inorganic ions and high RH). a The concentration contribution from coal combustion, traffic emission, resuspended dust, sulfate, and nitrate change over time in episode III; b the difference between the average mass percent of PM2.5 components in episode III and all year; c gases and meteorological parameters have different contributions to PM2.5 variation in episode III; d gases and meteorological parameters have different contributions to SOR in episode III; e gases and meteorological parameters have different contributions to NOR in episode III

Based on the PMF results, as demonstrated in Fig. 1 (episode III), the average contributions of traffic emission, coal combustion, resuspended dust, nitrate, sulfate, and other sources for PM2.5 mass concentration were 13%, 12%, 4.1%, 22%, 44%, and 3.9%, respectively. As shown in Fig. 4(a), sulfate (82 μg m−3) and nitrate (42 μg m−3) were the most important contributors. Other source contributors were traffic emission (25 μg m-3), coal burning (22 μg m-3), and resuspended dust (7.7 μg m-3).

According to the random forest results in Fig. 4(c), the relative importance of CO, SO2, NH3, and RHU for PM2.5 concentration variation in process III was 40%, 13%, 8.3%, and 15%, respectively. SO2 (24%), NH3 (28%), and T (19%) were the main influence factors for SOR. NOR was greatly affected by NOx and NH3, with importances of 12% and 45%, respectively. NOx, NH3, and SO2 mainly came from vehicle exhaust and coal burning (Xie et al. 2005; Meng et al. 2011). NH3 is an important alkaline gas in the atmosphere, and under high RHU it can react with acid gases (SO2, NOx, etc.) to form secondary aerosols (Zhang et al. 2012). This episode can therefore be attributed to reactions promoted by high RH. The strong influence of RH on the reactions of inorganic ions is noteworthy. High RH was favorable for the formation of sulfate and nitrate in the aqueous phase and may also enhance the partitioning into the liquid phase of sulfate and nitrate formed by gas-phase homogeneous reactions of precursors. Alkaline aerosol particles in northern China have been reported to lead to high pH values (Kulshrestha et al. 1998), which could also enhance the equilibrium reactions. Therefore, relative humidity was a key promoter of the chemical reactions that form secondary aerosol. This episode was a common type of haze in northern China, with characteristic features of high inorganic aerosols, high RH, stagnant meteorological conditions with low mixing heights, and large emissions of primary air pollutants causing secondary pollution.

Conclusions

Hourly measured PM2.5-bound species, gases, and meteorological data were analyzed by the PMF receptor model to quantify source contributions and by the random forest regression method to identify the contributions of gaseous pollutants and meteorological parameters to the changes in PM2.5 concentrations, SOR, and NOR. For the whole sampling period, CO, SO2, NOx, P, and T played an important role in PM2.5 variation; pH, SO2, RHU, and T were the dominant influence factors for SOR; and NOx, O3, NH3, and RHU were the most important factors for NOR. The proposed PMF-random forest method was used to quantify the source contributions to PM2.5.

Three types of pollution episodes were captured. The first pollution episode (process I) was characterized by heavy coal combustion due to heating in the winter in northern China. CO remained at a high level and contributed 40% of PM2.5 concentration variation. According to the PMF results, coal combustion and secondary sources were the most important contributors in the first stage, and then, the highest contribution was from secondary sources. The second pollution episode (process II) happened during the Spring Festival and was characterized by high K+ and Cl-. Firework combustion contributed 8.6 μg m−3 (8.8%) as estimated by PMF. High SO2 during process II, especially on Chinese New Year's Eve, was observed due to the firework displays, and SO2 made a high contribution (24%) to PM2.5 concentration variation. The third pollution episode (process III) was characterized by high inorganic ions and high RH in summer. Sulfate (82 μg m−3, 44%) and nitrate (42 μg m−3, 22%) were the most important contributors. The relative importance of CO, SO2, NH3, and RHU for PM2.5 concentration variation was 40%, 13%, 8.3%, and 15%, respectively. Comparing the influence factors for SOR and NOR during episode I in winter and episode III in summer, DD was more important in episode I, indicating that this episode was strongly influenced by transport, whereas NH3 was more important in episode III, indicating that the alkaline precursor strongly influenced the summer episode.