Introduction

Air pollution has become a major concern around the world since it is highly correlated with a variety of adverse effects on public health, especially in some developing countries. Nearly 91% of the world's population inhales polluted air, and about 4.2 million deaths occur every year because of ambient air pollution, according to the World Health Organization (WHO) air pollution programme (Krishan et al. 2019). Among the six conventional air pollutants, PM2.5 and O3 pose the greatest harm to human health. Long-term exposure to PM2.5 (fine particles measuring less than 2.5 μm in diameter) is an important environmental risk factor for cardiopulmonary and lung cancer mortality (Arden Pope III 2002). O3 (formed as a secondary pollutant through photochemical reactions of NOx and VOCs) is known to significantly decrease crop yields and is a key ingredient of smog that is potentially toxic to animal and plant life (Wang et al. 2019). High O3 concentrations and persistent haze pollution are also major concerns for China at present, especially for Chengdu, Southwest China (Wang et al. 2017a, b). Hence, considering the serious harm of air pollution to human health and the agricultural economy, accurate air quality forecasting is of great significance: it assists the government in city atmospheric early warning, crisis response, and emergency planning, and helps citizens plan travel in advance (Corani and Scanagatta 2016; Fan et al. 2018; Lauret et al. 2016; Prasad et al. 2016; Yang et al. 2018; Yang and Christakos 2015). In particular, short-term air pollution forecasts (with a lead time of 1 or 2 days) play an important role in the daily operation of local emergency management agencies.

Generally, air quality forecasting models can be classified into two main types: numerical prediction models and statistical prediction models (Liu et al. 2018; Yang and Wang 2017). Numerical prediction models traditionally simulate dispersion and transformation mechanisms using emission source data and knowledge of transformations in the atmosphere (Zhang et al. 2012; Cobourn 2010; Hoshyaripour et al. 2016). The chemical transport model (CTM) is a type of atmospheric composition model relevant to air quality forecasting, which has been used successfully since the mid-1990s in the USA, China, Europe (Jakobs et al. 2002), and Canada (Pudykiewicz et al. 1997). This type of model can forecast the concentrations of air pollutants under both typical and atypical scenarios without depending on large quantities of measurement data (Stern et al. 2008; Sun et al. 2012). Being physically based, it provides scientific insight into pollutant formation processes and can therefore address issues that other forecasting models cannot handle, such as long-range transport of air pollutants, emissions, and changes in air quality under different meteorological and emission scenarios (Vautard et al. 2001; Mchenry et al. 2010). By contrast, statistical approaches usually require a large quantity of historical measured data under a variety of atmospheric conditions, and they can produce good estimation results in short-term forecasting (Wang et al. 2015a, b; Konovalov et al. 2009). They are easy to set up and fast to compute, and they can handle the nonlinear and chaotic chemistry at a given site (Pires and Martins 2011). However, statistical models have several common drawbacks. First, the nature of statistical modeling does not yield a better understanding of chemical and physical processes (Gao et al. 2018). Second, they cannot predict concentrations under extreme air pollution conditions that deviate significantly from the historical records (Zhang et al. 2012). Third, they are usually confined to a given site and cannot be generalized to other sites (Stockwell et al. 2002). Over the course of air quality prediction research, scores of statistical prediction models have been used, such as linear or nonlinear regression models, time series models, Markov models, gray models, and artificial neural networks (ANNs).

Among statistical prediction methods, artificial neural networks (ANNs) are easier and faster to establish, have more flexible nonlinear modeling capability, and offer strong adaptability and massive parallel computing ability; they have proved effective for predicting environmental parameters, especially wind speed, water temperature, and air quality (Feng et al. 2015; Taylan 2017; Wang et al. 2015a, b). For instance, the adaptive neuro-fuzzy inference system (ANFIS), Elman neural network (ENN), long short-term memory networks (LSTMs), multi-layer perceptron (MLP), backpropagation neural network (BPNN), and support vector machine (SVM) have been extensively used for modelling air quality (Noori et al. 2009; Krishan et al. 2019; Li et al. 2018; Taylan 2017; Voukantsis et al. 2011; Wang et al. 2014, 2016; Wen and Yuan 2020; Zhou et al. 2020). The atmospheric environment is an extremely complex, large, nonlinear, and dynamically changing system, and the concentration of air pollutants is influenced by a number of interacting factors such as human activity, atmospheric pressure, wind direction, wind speed, temperature, humidity, temperature inversion, and rainfall (Wang et al. 2019). Some physicochemical processes also have significant impacts. As for chemical processes, ozone is formed through the reaction of nitrogen oxides and volatile organic chemicals under strong sunshine and high temperatures (Wei et al. 2021), and nitrogen oxides and sulfur oxides can react with other pollutants in the air to generate particles such as nitrate and sulfate, turning gaseous pollutants into solid pollutants and increasing the PM2.5 concentration in the air (Luo et al. 2020). As for physical processes, amines combine with sulfuric acid to form highly stable aerosol particles at typical atmospheric concentrations (Almeida et al. 2013). Moreover, most of the interrelationships among the various factors remain uncharted. This complexity is well matched by the multilayer mapping and nonlinear modeling ability of the back propagation neural network (BPNN), a commonly used ANN (Guo et al. 2011; Wang et al. 2006), also referred to as the error back propagation network because it minimizes an error backward while information is propagated forward (Wang et al. 2015a, b). However, one apparent shortcoming of BPNN is that it easily gets trapped in local minima, owing to its randomly allocated initial connection weights and thresholds, which leads to inaccurate results.

To overcome the aforementioned shortcomings, researchers have proposed a variety of improved methods to raise precision, which fall into three broad sub-categories. First, single intelligent evolutionary algorithms, such as particle swarm optimization (PSO) (Huang et al. 2020; Jin et al. 2012; Qiu et al. 2020; Ren et al. 2014; Wen and Yuan 2020) and the genetic algorithm (GA) (Feng et al. 2011; Wang et al. 2016; Zhang et al. 2020), have been used to select the initial connection weights and thresholds of BPNN. Second, two intelligent evolutionary algorithms have been applied jointly; for example, Hu et al. (2019) proposed a forecasting model based on a hybrid GA-PSO-BPNN algorithm to avoid the prediction result falling into a local optimum. Third, BPNN has been combined with another algorithm, such as Bayesian regularization (Tang et al. 2020), support vector machine (SVM) (Sun et al. 2020), or empirical mode decomposition (EMD) (Liu et al. 2016). Results demonstrate that these proposed models are superior to traditional BPNN models in terms of convergence rate or prediction precision.

Nevertheless, to our knowledge, the optimization of BPNNs using the ant colony algorithm, which possesses fast global searching ability and strong robustness, has not yet been attempted in air quality forecasting. Last but not least, only limited attention appears to have been given to the input parameters, which largely determine the level of prediction precision (Hu et al. 2019). Too many input variables prolong the training time, and irrelevant or noisy variables may adversely affect the training process, leading to an unacceptable convergence speed and poor generalization power. Consequently, a quantifiable random forest (RF) method is used here to carefully select the input parameters and achieve the desired results.

In summary, this paper proposes a novel air quality forecasting method based on a hybrid of RF and an IACA-BPNN neural network. The RF method was used to remove irrelevant factors after the original data were processed, and the chosen input variables were then fed into the improved IACA-BPNN model for air quality forecasting at two sites. Furthermore, horizontal and longitudinal comparisons were carried out to assess the forecasting performance of the proposed model against a series of models comprising BPNN, ACA-BPNN, IACA-BPNN, PSO-BPNN, GA-BPNN, and RF-IACA-BPNN.

The remainder of the paper is organized as follows. The second section gives a general introduction to the three algorithms concerned, the two improvements to the ant colony algorithm, and the procedure of the IACA-BPNN prediction model. The third section covers data description and pre-processing, followed by the screening results of the input variables. The fourth section presents the results and discussion of the predictive simulation experiments at two sites (a traffic site and a park site) in Chengdu, as well as the results under four evaluation indicators comparing observed and predicted values. The last section concludes the paper.

Method

The hybrid RF-IACA-BPNN method and five other artificial neural network (ANN) models were adopted for air quality forecasting in this paper. The models were programmed in MATLAB R2018a (MathWorks Inc., Natick, USA). The random forest (RF) algorithm involved in this study was implemented in Python 3.6 to calculate the importance evaluation values.

Related theoretical basis

BPNN

By mimicking the behavioral characteristics of biological neural networks, an ANN is in essence an algorithmic mathematical model endowed with parallel and distributed information processing ability (Wang et al. 2016). As one of the most widely used ANNs, BPNN, proposed by Rumelhart in 1986, is an adaptive, nonlinear dynamic system composed of interconnected neurons, known for its forward propagation of information and backward propagation of error. Generally, the steepest descent method is adopted during the error propagation process to minimize the network error; meanwhile, the weights and thresholds are adjusted successively to form a model that properly reflects the mapping between input and output values. BPNN usually consists of three layers (an input layer, a hidden layer, and an output layer) (Park et al. 2017); theoretically, a BPNN with a single hidden layer is able to approximate any nonlinear function with satisfactory accuracy (Aslanargun et al. 2007). The basic structure of the BPNN is shown in Fig. 1.

Fig. 1 Structure of one possible BPNN model used in this study
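To make the forward and backward passes concrete, the following is a minimal NumPy sketch of a one-hidden-layer BPNN trained by steepest descent. It is illustrative only: the models in this study were built in MATLAB, and the layer sizes, sigmoid activation, and learning rate below are assumptions rather than the paper's settings.

```python
import numpy as np

class TinyBPNN:
    """Minimal one-hidden-layer BPNN sketch (illustrative; sizes and learning rate assumed)."""

    def __init__(self, n_in=8, n_hidden=10, n_out=1, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialised weights and thresholds -- the values IACA later optimises.
        self.W1 = rng.uniform(0, 1, (n_in, n_hidden)); self.b1 = rng.uniform(0, 1, n_hidden)
        self.W2 = rng.uniform(0, 1, (n_hidden, n_out)); self.b2 = rng.uniform(0, 1, n_out)
        self.lr = lr

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, X):
        h = self._sigmoid(X @ self.W1 + self.b1)   # hidden-layer activations
        y = self._sigmoid(h @ self.W2 + self.b2)   # output in [0, 1] (inputs are normalised)
        return h, y

    def train_step(self, X, t):
        """One steepest-descent update of weights and thresholds on the mean squared error."""
        h, y = self.forward(X)
        err = y - t                                          # forward-pass error
        delta_out = err * y * (1 - y)                        # error propagated backward
        delta_hid = (delta_out @ self.W2.T) * h * (1 - h)    # through the sigmoid derivatives
        self.W2 -= self.lr * h.T @ delta_out / len(X); self.b2 -= self.lr * delta_out.mean(0)
        self.W1 -= self.lr * X.T @ delta_hid / len(X); self.b1 -= self.lr * delta_hid.mean(0)
        return float(np.mean(err ** 2))
```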

The random forest

Random forest (RF) is an ensemble supervised learning method based on the bagging algorithm; it is composed of multiple mutually independent decision trees, which combine classification and regression trees (Wei et al. 2018). Owing to its fast calculation speed, its effectiveness on a wide range of nonlinear problems involving complex high-order interaction effects, and its ability to reliably identify relevant predictors from a large set of candidate variables, random forests have been widely applied for prediction and variable selection in fields such as bioinformatics (Strobl et al. 2007) and the environmental sciences (Wen and Yuan 2020). Accordingly, the variable importance score (VIS) derived from RF is adopted in this research to measure the relative importance of the factors affecting PM2.5 and O3. Using PM2.5 as an example, D represents the overall training set, and the vector X represents the set of 10 factors affecting PM2.5: X = X1, X2, …, Xj, …, X10 ∣ j = {1, 2, …, 10}.

The bootstrap method was adopted to randomly sample K training subsets from the overall training set; the k-th training subset is represented as Dk (k = 1, 2, …, K).

$$VIS={VIS}_1,{VIS}_2,\dots, {VIS}_j,\dots, {VIS}_{10}\mid j=\left\{1,2,\dots, 10\right\}$$
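A minimal sketch of how such importance scores can be obtained in Python is shown below (the study used Python 3.6 for RF; the column names, number of trees, and the target label "PM2.5" are illustrative assumptions, not the exact configuration of the paper):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumed column names for the 10 candidate predictors of PM2.5 (illustrative only).
FEATURES = ["PM10", "NO2", "SO2", "O3", "CO", "WS", "WD", "RH", "TEMP", "ATM"]

def variable_importance(df: pd.DataFrame, target: str = "PM2.5") -> pd.Series:
    """Fit a random forest on bootstrap samples (bagging) and return the
    impurity-based variable importance scores, sorted in descending order."""
    X, y = df[FEATURES].values, df[target].values
    rf = RandomForestRegressor(n_estimators=500, bootstrap=True, random_state=0)
    rf.fit(X, y)
    vis = pd.Series(rf.feature_importances_, index=FEATURES)
    return vis.sort_values(ascending=False)

# Variables whose importance exceeds 0.005 would then be kept as model inputs,
# mirroring the screening rule applied later in this paper.
```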

ACA

The ant colony algorithm (ACA) was originally proposed by Dorigo in 1991 (Dorigo and Stützle 2003; Liu et al. 2019), inspired by observation of ant foraging behavior; it is in essence a stochastic search algorithm that has been applied successfully to many optimization problems (Liu et al. 2007). Because ACA can find the optimal path through positive feedback and distributed cooperation, it has been widely combined with BPNN to improve convergence speed and to avoid trapping in local minima.

ACA improvements

Improved content

Generally, the better a solution is, the more likely it is that the optimal solution lies in its vicinity (Yu and Zhou 2006). Hence, the basic idea of the improvement is to reinforce excellent solutions and weaken inferior ones. As the pheromone difference between excellent and inferior solutions grows, the ants' search paths become more likely to concentrate in the vicinity of the optimal solution.

(a) Improvement on the pheromone updating rule

Ants are sorted according to the lengths of their paths; the better an ant's path, the stronger the pheromone it leaves behind. The pheromone is adjusted according to Eq. (1):

$$\tau \left(r,s\right)=\left(1-\rho \right)\tau \left(r,s\right)+{\varepsilon}_0\frac{L_{worst}}{L_n}$$
(1)

where τ(r, s) represents the pheromone intensity on the edge between nodes r and s; ε0 ∈ (0,1] is the pheromone penalty factor; Lworst is the tour length of the worst ant; and Ln is the path length of the n-th ant.
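A small sketch of this rank-based update is given below, assuming a pheromone matrix indexed by (r, s) and a list of tour lengths for the current iteration; the function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def update_pheromone(tau, tours, lengths, rho=0.3, eps0=0.5):
    """Apply Eq. (1): evaporate all trails, then reward each ant's edges in
    proportion to L_worst / L_n, so shorter (better) tours deposit more pheromone."""
    tau *= (1.0 - rho)                      # evaporation term (1 - rho) * tau(r, s)
    L_worst = max(lengths)                  # length of the worst ant's tour
    for tour, L_n in zip(tours, lengths):
        deposit = eps0 * L_worst / L_n      # larger deposit for shorter tours
        for r, s in tour:                   # tour given as a list of (r, s) edges
            tau[r, s] += deposit
    return tau
```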

(b) Adaptive improvement on the evaporation rate of pheromone

The pheromone volatility factor (ρ) measures the extent of pheromone reduction during the evaporation process, which greatly affects the global searching ability and convergence speed of the algorithm. To avoid trapping into local optima, the pheromone volatility factor was improved as

$$\rho\ (nc)=\frac{\varphi }{1+\lambda \frac{nc}{e^{nc_{max}}}}$$
(2)

where φ and λ are both constant coefficients, nc is the current iteration number, and ncmax is the maximum number of iterations. At the beginning, the large value of ρ(nc) accelerates the convergence of the algorithm; thereafter, the accumulation of ∆τij(t) would tend to trap the algorithm in a local optimum, but as can be seen from Eq. (2), ρ(nc) gradually decreases, making it easier for the algorithm to jump out of local optima.
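A one-line sketch of Eq. (2) as written above, with φ and λ as assumed constants, follows; note that with a large ncmax the denominator changes only slowly with nc.

```python
import math

def rho_nc(nc, nc_max, phi=0.7, lam=5.0):
    """Adaptive evaporation rate of Eq. (2): starts near phi and decreases as the
    iteration counter nc grows, easing escape from local optima. phi and lam are
    assumed values, not the paper's calibrated constants."""
    return phi / (1.0 + lam * nc / math.exp(nc_max))
```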

IACA-BPNN prediction model

Suppose the BPNN possesses M weights and thresholds in total, and each weight or threshold has n candidate values, randomly generated within [0, 1], thus forming the sets Ai (1 ≤ i ≤ M).

Based on the above improvements to ACA, the process structure of the developed RF-IACA-BPNN prediction model is shown in Fig. 2. The specific steps of IACA optimizing the initial weights and thresholds of the BPNN are as follows:

Step 1: Set the initial pheromone value τ(Ai)(0), the number of ants (m), the maximum iteration number (ncmax), and the other parameters (including the parameters of the BPNN).

Step 2: Each of the m ants selects one element from every set Ai according to Eq. (2); the elements selected by one ant constitute a candidate set of initial weights and thresholds of the neural network.

Step 3: When the m ants complete one cycle, the m sets of initial weights and thresholds selected in Step 2 are used to train the BPNN model, and the output error of the network is calculated according to Eq. (2). Record the set of weights and thresholds with the smallest error and compare this error with the expected error ε. If it is less than ε, output the algorithm results and go to Step 6; otherwise, go to Step 4.

Step 4: The pheromone of each element in the sets Ai (1 ≤ i ≤ M) is updated according to Eqs. (4) and (5).

Step 5: Repeat Steps 2 and 3 until all ants converge to the same path or the maximum iteration number ncmax is reached.

Fig. 2 Flow chart of the proposed RF-IACA-BPNN based forecasting model

Step 6: Take the optimal initial solution selected by IACA as the initial weights and thresholds of the BPNN. The neural network is then further trained until the exit condition is reached.
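The overall loop can be outlined as in the sketch below. The BPNN training/error function and the element-selection and update rules referenced above as Eqs. (2), (4), and (5) are not reproduced in this section, so simple proportional rules stand in for them here; this is an illustrative outline under those assumptions, not the exact implementation used in the study.

```python
import numpy as np

def iaca_optimise_bpnn(bpnn_error, M, n=20, m=30, nc_max=100, eps=1e-3,
                       phi=0.7, lam=5.0, eps0=0.5, seed=0):
    """Illustrative outline of Steps 1-6. `bpnn_error(solution)` is a placeholder
    that trains the BPNN with the given initial weights/thresholds and returns
    its output error."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(0, 1, (M, n))        # candidate sets A_i, i = 1..M (Step 1)
    tau = np.ones((M, n))                         # initial pheromone tau(A_i)(0)
    best_err, best_sol = np.inf, None

    for nc in range(nc_max):
        rho = phi / (1.0 + lam * nc / np.exp(nc_max))   # adaptive evaporation, Eq. (2)
        errors, picks = [], []
        for _ in range(m):                        # Step 2: each ant builds one solution
            prob = tau / tau.sum(axis=1, keepdims=True)
            idx = np.array([rng.choice(n, p=prob[i]) for i in range(M)])
            errors.append(bpnn_error(candidates[np.arange(M), idx]))   # Step 3
            picks.append(idx)
        k = int(np.argmin(errors))
        if errors[k] < best_err:
            best_err, best_sol = errors[k], candidates[np.arange(M), picks[k]]
        if best_err < eps:                        # error small enough -> stop early
            break
        tau *= (1.0 - rho)                        # Step 4: evaporate, then reward good picks
        worst = max(errors)
        for err, idx in zip(errors, picks):
            tau[np.arange(M), idx] += eps0 * worst / max(err, 1e-12)
    return best_sol                               # Step 6: initial weights/thresholds for BPNN
```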

Data

Data source

Chengdu, the capital city of Sichuan province and one of the largest cities in west China, is regarded as a commercial and cultural center owing to its strong economic growth and numerous cultural heritage sites. An inland city located in the west of the Sichuan Basin, with diversified landforms, its area of 14,335 km2 is mainly composed of fertile plains surrounded by mountains. Despite effective measures taken by the local government, air pollution remains problematic in certain seasons, especially PM2.5 and O3. Many distinctive factors affect air pollution levels in Chengdu, such as a large population (20 million in 2020), a large number of industrial enterprises and an ever-increasing number of motor vehicles within the city, and unfavorable local meteorological conditions arising from its distinct geographical location and topography, such as a high frequency of calm winds throughout the year and notable temperature inversions in autumn and winter, which weaken atmospheric dispersion and transport (Alimissis et al. 2018). In particular, as the site of the FISU World University Games in 2022, it is indispensable and beneficial to accurately forecast PM2.5 and O3 concentrations so that regional air quality can be managed and controlled appropriately.

Fifteen citywide air quality stations were established by the Ministry of Ecology and Environment of China to monitor air quality trends, together with twelve meteorological monitoring stations (Fig. 3). In this study, two different air quality monitoring stations were selected to validate the stability of the prediction model: station A1 (Dashi Road West, a downtown station located in an area of heavy traffic) and station A2 (Long-Quan, a suburban station near the forest park). The A1 station lies in the inner part of Chengdu, where the population density is substantially higher, whereas the A2 station is far from the downtown area, located in the Chengdu Forest Park, with a country lane and a higher percentage of open and green space. The concentrations of air pollutants at A1 station are obviously higher than at A2 station, except for O3, whose concentration at A1 is about 1.5 times lower than at A2 (Table 1). The hourly data of six air quality factors (PM2.5, PM10, NO2, SO2, O3, CO) are acquired automatically by each monitoring station, forming a dataset ranging from 8 January 2019 to 8 November 2021, accompanied by five meteorological parameters for the same period, namely wind speed (WS), wind direction (WD), relative humidity (RH), temperature (TEMP), and atmospheric pressure (ATM), obtained from the China Weather Website Platform maintained by the China Meteorological Administration. The air quality data were matched to the closest meteorological observations (Feng et al. 2015). The forecasting models under investigation forecast PM2.5 and O3 concentrations over the next 24 h.

Fig. 3 Location of the air quality and meteorological measurement sites in Chengdu

Table 1 Basic statistics for the air quality and meteorological parameters of the two stations

Data preprocessing

Due to power failures or equipment breakdowns, some data were missing. These missing data needed to be properly processed to develop a sound model. Rows within consecutive gaps of more than 4 h of missing data were discarded; the other missing data were supplemented. Given that the datasets are sequences of random variables indexed by time, it is feasible to employ the nearest neighbor interpolation method to re-fill them, i.e., a missing value is replaced with the value of the nearest available point (Hu et al. 2019). After that, a total of 24,200 hourly records remained for each of the A1 and A2 stations. The air quality and meteorological data of each station were randomly divided into training, validation, and test sets at a ratio of approximately 6:2:2. Subsequently, the random forest method was used to calculate the relative importance of the input variables (meteorological and air quality factors) with respect to the output variables (PM2.5, O3).
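A sketch of this preprocessing is given below, assuming the hourly records are rows of a pandas DataFrame with a regular integer index (SciPy is needed for nearest-neighbour interpolation); the gap threshold and split ratio follow the text, while the function and variable names are assumptions. The limit-based fill only approximates the "discard gaps longer than 4 h" rule.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, max_gap_h: int = 4, seed: int = 0):
    """Fill short gaps by nearest-neighbour interpolation, drop rows inside
    longer gaps, then randomly split into train/validation/test at ~6:2:2."""
    # Nearest-neighbour fill, limited to gaps of at most max_gap_h hours.
    filled = df.interpolate(method="nearest", limit=max_gap_h, limit_direction="both")
    filled = filled.dropna()                      # remaining NaNs sit in long gaps -> discard

    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(filled))            # random split, as described in the text
    n_train, n_val = int(0.6 * len(filled)), int(0.2 * len(filled))
    train = filled.iloc[idx[:n_train]]
    val = filled.iloc[idx[n_train:n_train + n_val]]
    test = filled.iloc[idx[n_train + n_val:]]
    return train, val, test
```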

To eliminate the adverse effects of singular data and improve the convergence speed of the model, the datasets are linearly scaled to the range [0, 1] through the normalization formula (3):

$${X}_{norm}=\frac{X-{X}_{min}}{X_{max}-{X}_{min}}$$
(3)

where Xnorm is the normalized value, X is the original value, and Xmax and Xmin are the maximum and minimum values of the series, respectively.
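In code, the min-max scaling of Eq. (3) and its inverse (needed to map predictions back to concentration units) could look like this short sketch; the function names are illustrative.

```python
import numpy as np

def minmax_fit(x_train):
    """Store the minimum and maximum of the training series, per Eq. (3)."""
    return np.min(x_train, axis=0), np.max(x_train, axis=0)

def minmax_transform(x, x_min, x_max):
    return (x - x_min) / (x_max - x_min)          # scaled to [0, 1]

def minmax_inverse(x_norm, x_min, x_max):
    return x_norm * (x_max - x_min) + x_min       # back to the original units
```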

Data description

Based on the aforementioned datasets of the two sites, the basic statistics of the meteorological and air quality data were calculated (Table 1).

The suburban station exhibited significantly lower pollutant concentrations, except for O3, whose concentration was 1.5 times higher than at the downtown station. A couple of factors help to explain this: one is the consumption of O3 through oxidation by the nitrogen oxides emitted by vehicles downtown; another is that ozone precursors are carried by the wind to the suburbs. The mean PM2.5 and PM10 concentrations at the downtown station were 1.3 and 1.5 times higher than at the suburban station, respectively, on account of its many emission sources, while the standard deviations were similar between the two stations. Besides, the four statistical indicators of the meteorological parameters were nearly the same at the two sites.

Model performance metrics

To date, many researchers have employed various statistical indices to verify the predictive performance of models (Alimissis et al. 2018; Li et al. 2018; Yildirim and Bayramoglu 2006). In this study, four statistical indices were adopted: the mean absolute error (MAE), the mean absolute percentage error (MAPE), the coefficient of determination (R2), and the root mean square error (RMSE), defined as

$$MAPE=\frac{1}{n}\sum \nolimits_ {i=1}^n\left|\frac{y_o-{y}_p}{y_o}\right|\ast 100\%$$
(4)
$$MAE=\frac{1}{n}\sum \nolimits_ {i=1}^n\left|{y}_o-{y}_p\right|$$
(5)
$$RMSE=\sqrt{\frac{1}{n}\sum \nolimits_ {i=1}^n{\left({y}_o-{y}_p\right)}^2}$$
(6)
$${R}^2=1-\frac{\sum_{i=1}^n{\left({y}_o-{y}_p\right)}^2}{\sum_{i=1}^n{\left({y}_o-{\overline{y}}_o\right)}^2}$$
(7)

where yo and yp represent the observed and predicted values, respectively, \({\overline{y}}_o\) is the average of the observed values, and n is the number of observations. Better model performance is indicated by smaller values of MAPE, MAE, and RMSE and by R2 values closer to one; on all of these indices, a model with high forecasting accuracy closely tracks the observed values in the testing data.
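The four indices of Eqs. (4)-(7) can be computed directly from the observed and predicted series, as in the following sketch (function name and return format are illustrative):

```python
import numpy as np

def evaluate(y_obs, y_pred):
    """Return MAPE (%), MAE, RMSE, and R^2 as defined in Eqs. (4)-(7)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    mape = np.mean(np.abs((y_obs - y_pred) / y_obs)) * 100.0
    mae = np.mean(np.abs(y_obs - y_pred))
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    r2 = 1.0 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "R2": r2}
```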

Screening of prediction indicators

The visual map of each input variable's importance is shown in Fig. 4. It is important to note that this test was repeated several times to ensure that the selection was not biased. Based on the RF results, the first eight indicators were retained as the input variables of each model in this study, since the importance of each of these indicators was greater than 0.005 and the eight together accounted for nearly 99% of the total importance.

Fig. 4 The visual graph of input variables' importance. a O3 at A1 station. b PM2.5 at A1 station. c O3 at A2 station. d PM2.5 at A2 station

The reason why PM10 had a strong positive correlation with PM2.5 is that PM2.5 is included in PM10 and they can transform to each other under specified conditions. Higher RH is favorable for particulate matter to adhere to water vapor, which increases the mass concentration of particles (Zhang et al. 2017). Meanwhile, increases in RH favor nitric acid partitioning to the aerosol phase and therefore can lead to nitrate concentration increases (Dawson et al. 2007). Increasing TEMP can lead to elevated sulfate concentrations due to the increased rate of SO2 oxidation (Jacob and Winner 2009). TEMP also has a significant indirect effect on secondary organic aerosol (SOA) concentrations (Megaritis et al. 2014). CO concentrations had significant positive effects on PM2.5, and this positive correlation was likely due to industrial emissions and exhaust fumes, which produce large amounts of PM2.5 as well as CO. On the contrary, given that PM2.5 can reduce the radiation flux during photochemical reactions, there was a significant negative nonlinear relationship between O3 and PM2.5 (Cheng et al. 2021).

O3 is a secondary product of the oxidation of hydrocarbons (CH4 and NMHCs) and CO via reactions catalyzed by HOX and NOX radicals (Jacob 2000), which helps to explain the high correlation of O3 with NOX and its weak correlation with CO. The correlation of O3 with NOX at A2 station is relatively low compared with A1 station, which can be attributed to the low NOX concentration there. PM can affect atmospheric photochemistry by scattering solar and terrestrial radiation, indirectly altering the air temperature and subsequently affecting the formation of O3 (Sharma et al. 2016). The relationship between O3 and temperature is indirect and is realized through higher downward solar radiation; high temperature promotes the propagation rate of the radical chains and hence the formation of O3 (Tu et al. 2007; Martins et al. 2012). Furthermore, RH is also a vital factor in O3 formation; an appropriate RH can promote the formation of O3 (Xu and Zhu 1994).

Generally, O3 and PM2.5 showed little correlation with WD and WS at both sites. One reason that may explain this phenomenon is that the variation ranges of WD and WS during the study period were too limited (the annual average wind speed in Chengdu is less than 1.2 m/s) to show distinctive effects on O3 and PM2.5 (Ahmad et al. 2019). AP affects the diffusion of O3 and PM2.5: when the surface pressure field is mainly controlled by a large cold high-pressure system, a downdraft appears in its center, which inhibits the upward diffusion of O3 and PM2.5.

Results and discussion

Six models based on different artificial intelligence algorithms and different input variables were adopted to predict PM2.5 and O3 at the two stations. There are four experiments in total, and each experiment involves the six different models. The input variables of each experiment were selected solely by the RF method and applied to all six models. It should be mentioned that each experiment was independently carried out 15 times, and the performance metrics (MAPE, MAE, RMSE, and R2) were subsequently calculated from the predicted and observed values.

Case study 1

Accuracy of various models for O3 forecasts at A1 station

The input variables screened out by RF for predicting the hourly O3 concentration at A1 station were NO2 (0.378), TEMP (0.360), HM (0.182), PM2.5 (0.020), PM10 (0.016), AP (0.014), WS (0.009), and WD (0.008). The results showed that NO2 made the greatest contribution, as it is an important precursor of O3, producing O3 together with VOCs in the presence of heat and sunlight (Abdul-Wahab and Al-Alawi 2002; Heo and Kim 2004). Furthermore, temperature and humidity were the two main meteorological factors affecting the atmospheric photochemical formation of O3.

Table 2 shows the performance of the different models; the proposed RF-IACA-BPNN model provides significantly better forecasts than the other five models for O3 prediction in case study 1. The order of R2 among the models, from highest to lowest, was RF-IACA-BPNN, IACA-BPNN, ACA-BPNN, GA-BPNN, PSO-BPNN, and BPNN. The GA-BPNN and PSO-BPNN models yielded roughly similar results in terms of the four metrics. Except for RMSE, the three remaining statistical criteria of IACA-BPNN outperform those of ACA-BPNN, which demonstrates that the improvements to the pheromone updating rule and the evaporation rate of pheromone enhance the fitting ability of ACA-BPNN. Moreover, the prediction results of the six models are presented in Fig. 5. It can be seen that the scatter points of the RF-IACA-BPNN model are closest to, and most concentrated on, the regression line.

Table 2 Performances of different models for O3 during the testing phases at A1 station
Fig. 5 Comparison between predicted and observed O3 concentration using different models at A1 station during testing periods

Performance of various models for PM2.5 forecasts at A1 station

As can be seen from Fig. 4b, the most influential factors for PM2.5 were PM10 (0.692), CO (0.152), TEMP (0.093), NO2 (0.020), O3 (0.011), HM (0.010), SO2 (0.008), and AP (0.008). PM10 contains PM2.5, the difference between the two being the aerodynamic diameter, which explains the high correlation of PM10 with PM2.5 (Biancofiore et al. 2017). As an important component of automobile exhaust, CO also affects the PM2.5 concentration; meanwhile, temperature can influence the PM2.5 concentration by affecting the boundary layer height.

The statistical metrics are listed in Table 3 and shown more clearly in Fig. 6; the optimal indicator is marked in bold. Three of the metrics of RF-IACA-BPNN outperform those of the other five models. In the case of MAE, the RF-IACA-BPNN model achieves a 2.06% decrease compared with BPNN and a 0.87% decrease compared with IACA-BPNN. It can be noted that the performance of the model with the two ACA optimizations, together with the screening process, was significantly improved.

Table 3 Performances of different models for PM2.5 during the testing phases at A1 station
Fig. 6 Comparison between predicted and observed PM2.5 concentration using different models at A1 station during testing periods

Case study 2

Performance of various models for O3 forecasts at A2 station

The factors highly associated with O3 included HM (0.492), TEMP (0.234), AP (0.112), NO2 (0.084), PM2.5 (0.025), PM10 (0.020), WS (0.013), and WD (0.009), which are nearly the same as those for O3 at A1 station but in a different order (Fig. 4c). Although the smallest MAE (11.534) was obtained by ACA-BPNN, the RF-IACA-BPNN model is 4.82%, 22.26%, and 2.66% superior in terms of RMSE, MAPE, and R2, respectively, compared with the BPNN model (Table 4).

Table 4 Performances of different models for O3 during the testing phases at A2 station

The prediction results of the RF-IACA-BPNN model are the closest to the actual values, even at peaks and valleys where the concentration fluctuates greatly, indicating that the RF-IACA-BPNN model performs best (Fig. 7). In the scatter correlation figure, the regression line represents the case where the observed value exactly equals the predicted value; therefore, the closer the scatter points are to this line, the better the performance (Sun and Li 2020a, b). It can be observed that the scatter points of the RF-IACA-BPNN model are closest to, and most concentrated on, the regression line.

Fig. 7 Comparison between predicted and observed O3 concentration using different models at A2 station during testing periods

Performance of various models for PM2.5 forecasts at A2 station

At this suburban station, the factors most closely related to PM2.5 were PM10 (0.740), CO (0.138), NO2 (0.043), TEMP (0.041), HM (0.011), O3 (0.008), AP (0.008), and SO2 (0.006). The factors and the order of their relative importance are very similar to those at A1 station.

The observed and predicted PM2.5 values during the testing phase at A2 station are presented in Fig. 8, together with the four performance metrics of the six models (Table 5). Models with improvements performed better and converged more easily and faster to a good solution than the plain BPNN model; the screening of input variables also helps to improve prediction performance. For example, the RF-IACA-BPNN model exhibits 36.99% and 29.80% decreases in MAPE compared with the base models IACA-BPNN and BPNN, respectively. Although the best MAE (3.280) was not achieved by the RF-IACA-BPNN model, its R2 value is 1.39%, 1.61%, 1.61%, 4.06%, and 5.68% higher than those of the IACA-BPNN, ACA-BPNN, GA-BPNN, PSO-BPNN, and BPNN models, respectively.

Fig. 8 Comparison between predicted and observed PM2.5 concentration using different models at A2 station during testing periods

Table 5 Performances of different models for PM2.5 during the testing phases at A2 station

Comparison of the same pollutant between two stations

To verify the performance of the same model on data from different kinds of monitoring sites, the forecasting performance for PM2.5 and O3 is compared between the two stations.

With regard to the forecasting results for O3, the MAPE of each model at A2 station (0.319, 0.347, 0.252, 0.491, 0.335, and 0.248) is smaller than that of the corresponding model at A1 station; the same is true for R2: for example, the R2 of RF-IACA-BPNN at A1 station (0.912) is obviously higher than that at A2 station (0.887). As for PM2.5, the MAE and RMSE values at A1 station are significantly higher (p < 0.05) than those of the corresponding models at A2 station, while the R2 values at A2 station are generally higher than those at A1; furthermore, the R2 of RF-IACA-BPNN at A2 station is 0.74% higher than at A1 station.

Overall, for PM2.5, the prediction results of the models at A2 station are more acceptable than those at A1 station, while for O3 the models performed better at A1 station. Part of the reason lies in the variability of the data: the mean and standard deviation of PM2.5 at A1 station are much higher than those at A2 station (Table 1), and the situation for O3 is just the opposite.

Stability of the proposed model

To comprehensively test the robustness of the new air quality forecasting model, 10 additional sites were selected. Five of them are downtown sites (Di, i = 1, 2, …, 5) close to the city center, and the others are suburban sites (Si, i = 1, 2, …, 5) far from highways and population centers. The stability test was implemented according to Eq. (8). The stability of the forecasting performance can be indicated by the variance (Var) of the forecasting error (Sun et al. 2020; Hao et al. 2019; Wang et al. 2017a, b); generally, a smaller variance indicates a more stable model. The stability test results are shown in Table 6. For PM2.5, the Var values of the proposed model at the 5 downtown sites are obviously higher than those at the 5 suburban sites; the average Var value of the downtown sites is 0.142, which is 73.17% higher than that of the suburban sites. For O3, conversely, the average Var value of the downtown sites is 1.320, which is 4.90% lower than that of the suburban sites. In conclusion, the model is more stable when used to forecast hourly PM2.5 concentrations at suburban sites; the opposite holds for O3, which is forecast with higher stability at downtown sites.

$${S}_{Var}=\mathit{\operatorname{var}}\left(\left|\frac{y_o-{y}_p}{y_o}\right|\right)$$
(8)
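Eq. (8) is simply the variance of the absolute relative forecasting error and can be sketched in a few lines (the function name is illustrative):

```python
import numpy as np

def stability_var(y_obs, y_pred):
    """S_Var of Eq. (8): variance of the absolute relative forecasting error;
    smaller values indicate a more stable model at a given site."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.var(np.abs((y_obs - y_pred) / y_obs)))
```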
Table 6 Stability test results at other 10 sites

Conclusions

Many studies have sought to improve the accuracy of air quality forecasting solely through various optimization algorithms. Nevertheless, few have approached the problem from the perspective of the original datasets, i.e., by screening out highly related input variables, which has a significant impact on forecasting precision and on the training process of ANNs. In this study, a new hybrid method based on an improved ACA and the BPNN model was proposed, combined with the screening ability of the RF method, to forecast hourly concentrations of PM2.5 and O3. Two datasets from two different types of monitoring stations were used to compare the forecasting performance of RF-IACA-BPNN with those of other models and to verify the feasibility and effectiveness of the ACA optimization and the screening process. The following conclusions are drawn.

(1) Determining the relative importance of the input variables with the RF method showed that the factors most affecting O3 were similar at the downtown and suburban stations, as was the case for PM2.5. For O3, the five factors with the greatest influence on its concentration were NO2, TEMP, HM, PM2.5, and PM10, while the top five factors most highly correlated with PM2.5 were PM10, CO, TEMP, NO2, and O3.

(2) In case study 1, the MAE and R2 of the RF-IACA-BPNN model for O3 were 10.448 and 0.912, and for PM2.5 at the downtown station, three statistical criteria of RF-IACA-BPNN outperformed those of the other five models. In case study 2, the MAPE, RMSE, and R2 of the proposed method for O3 were 0.248, 15.448, and 0.887, respectively, while for PM2.5 at the suburban station the MAPE and R2 were 0.172 and 0.949, respectively. It is concluded that the proposed model is preferable to the other, plainer models.

(3) On the whole, for PM2.5, the prediction results of the models at A2 station are more acceptable than those at A1 station, on account of the lower variability at the suburban station, while for O3 the models, especially RF-IACA-BPNN, performed better at A1 station.

(4) The model is more stable when used to forecast hourly PM2.5 concentrations at suburban sites; the opposite holds for O3, for which stability is higher at downtown sites.