Introduction

Air pollution is one of the major environmental challenges affecting the health of people living in urban areas, driven by increased industrial activity and urbanization. About 91% of the world's population is believed to be exposed to polluted air, causing the premature death of almost 4.2 million people annually (WHO (World Health Organization) 2018). Particulate matter (PM2.5 and PM10), ozone (O3), nitrogen dioxide (NO2), carbon monoxide (CO) and sulphur dioxide (SO2) have been identified as the most hazardous ambient air pollutants (Uzoigwe et al. 2013). PM2.5 serves as the major indicator in air quality monitoring systems (Van Donkelaar et al. 2006). Because of their extremely small size, these toxic particles can be breathed into the lungs and distributed throughout the body by the circulating blood. In addition, increased PM2.5 and PM10 concentrations reduce visibility, with adverse impacts on the transportation industry (Sun and Li 2020). These problems can be effectively reduced by careful application of good urban air quality management (UAQM). The fundamental elements of UAQM are a clear description of objectives and standards, a well-designed monitoring system, reliable air quality modelling, an emission inventory, source apportionment, health exposure assessment, control strategies and public participation (Gulia et al. 2020). A reliable air quality model provides the information required for analysing and managing air quality parameters, which helps stakeholders make decisions on the UAQM budget and select the mitigation measures needed to reduce the pollution crisis and protect public health (Suleiman et al. 2019). The factors influencing the concentrations of air pollutants can be classified into traffic-related factors, background concentration, and meteorological and geographical factors (Cai et al. 2009).

Various mathematical models for the advection and reactions of air pollutants have been proposed for forecasting the time-varying concentration of air pollutants in urban areas, e.g., steady-state Gaussian plume models. However, the diversity and complexity of the physical and chemical processes involved in both the formation and the transport of air pollutants in urban areas make the application of these models very challenging, or impossible in some situations. This is because a large database and a good understanding of the formation processes are required to apply such empirical methods, and in some cases the data are unavailable or insufficient (Arhami et al. 2013).

Motivated by the efficiency of artificial intelligence (AI)-based models in predicting complex engineering processes, several AI-based models have been developed for the prediction of air quality parameters. For instance, Arhami et al. (2013) developed an ANN model for the prediction of hourly criteria pollutants (NOx, NO2, NO, O3 and PM10) in an urban environment using wind direction, wind speed, relative humidity and air temperature as input variables. Suleiman et al. (2016) applied both ANN and boosted regression trees (BRT) to predict the concentration of PM2.5, PM10 and particle number count (PNC) at Marylebone Road in London; the BRT model demonstrated higher efficiency than the ANN model. Azeez et al. (2019) integrated GIS into a hybrid model combining ANN and the correlation-based feature selection (CFS) algorithm for the prediction of vehicular CO emissions. For comparison, Mehdipour et al. (2018) applied three different AI methods, namely Bayesian networks (BN), decision trees (DT) and support vector machines (SVM), for the prediction of PM in Tehran. The model input parameters were temperature, precipitation, wind speed, nebulosity, relative humidity, sunshine, O3, PM10, SO2, NO2 and CO; the SVM demonstrated higher prediction capability than both BN and DT. Krishan et al. (2019) used meteorological data, transport emissions, traffic data and air quality parameters to model the hourly concentration of air quality indicators in Delhi, India, using the long short-term memory (LSTM) approach. AI models have demonstrated high accuracy in the prediction of air quality parameters (Cai et al. 2009) because they can handle multivariate inputs, nonlinearity and the uncertainty of complex processes without requiring prior assumptions about the relationships between the input parameters.

Although the AI models mentioned above (ANN, SVM, ANFIS, etc.) provide higher prediction capability than empirical and conventional multilinear regression (MLR) models, it is known that different models may lead to different outcomes for a particular problem depending on the conditions. Combining the outputs of different models through an ensemble approach therefore yields outputs with a smaller error variance than the single models (Nourani et al. 2019). The ensemble approach combines the unique features of the constituent models to capture the underlying pattern of the presented database more fully (Sharghi et al. 2018). The objective of this study is to present and apply a novel neuro-fuzzy ensemble (NF-E) technique for improved performance in the prediction of PM2.5 and PM10. The objective was achieved in three steps: first, selection of the dominant input parameters relevant to the prediction of PM2.5 and PM10; second, development of four single black-box models (ANN, ANFIS, SVR and MLR); and finally, development of the NF-E model and two linear ensemble models that combine the outputs of the four black-box models developed in the second step. To the best of the authors' knowledge, this study presents the first application of the novel NF-E technique for the prediction of PM2.5 and PM10. PM2.5 and PM10 were selected for the study because of their strong adverse effects on human health, as reported by Uzoigwe et al. (2013), and their major role in defining air quality (Sun and Li 2020).

Materials and methods

Data

Hourly data from the air quality monitoring site along Marylebone Road in central London were obtained for the period January 1, 2007 to December 31, 2007. Marylebone was selected because of its high average daily traffic of about 75,000 veh/day (Jones and Harrison 2005), since 64% of the PM was reported to come from vehicular traffic (European Environment Agency 2012). The monitoring station is located approximately 1.5 m from the road, on its southern side. Alongside the air pollutants (O3, NO, NO2, NOx, CO, SO2, PM10, PM2.5), traffic data (volumes of buses, cars and taxis, motorcycles, light commercial vehicles, pedal cycles and heavy goods vehicles, together with speed) and meteorological data (wind speed, wind direction and temperature) were recorded at the monitoring site (Jones and Harrison 2005). The data are available for download at the UK air quality data archive (https://uk-air.defra.gov.uk/data/maryleboneroad). The traffic data were collected using high-accuracy induction loop detectors for vehicle classification and counting, buried in each lane. Two tapered element oscillating microbalances (model 1400AB), each equipped with a different sampling head, were used to monitor the PM2.5 and PM10 concentrations at the sampling location (Jones and Harrison 2005). The descriptive statistics of the measured data are presented in Table 1.

Table 1 Descriptive statistics of the data

Data preparation and performance evaluation

To ensure that all input variables receive equal attention in black-box models, the data are usually normalized to a common range, typically between zero and unity. Normalization makes the data dimensionless during training and prevents parameters in a lower numeric range from being overshadowed by those in a higher numeric range. It also helps reduce the computational difficulties of the model. In this study, the data are normalized between 0 and 1 using (Nourani et al. 2012):

$$ {P}_i=\frac{P-{P}_{\mathrm{min}}}{P_{\mathrm{max}}-{P}_{\mathrm{min}}} $$
(1)

where, Pi is the normalized value, P is the measured value, and Pmax and Pmin are the maximum and minimum measured concentrations, respectively.
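As a minimal illustration (not part of the authors' Matlab workflow), Eq. 1 and its inverse transform can be sketched in Python as follows; the sample values are hypothetical:

```python
import numpy as np

def normalize(p, p_min, p_max):
    """Min-max normalization of Eq. 1: scales measured values to [0, 1]."""
    return (p - p_min) / (p_max - p_min)

def denormalize(p_norm, p_min, p_max):
    """Inverse of Eq. 1, used to map model outputs back to concentration units."""
    return p_norm * (p_max - p_min) + p_min

# Hypothetical hourly PM2.5 concentrations (ug/m3)
pm25 = np.array([12.0, 35.5, 78.2, 20.1])
pm25_scaled = normalize(pm25, pm25.min(), pm25.max())
pm25_back = denormalize(pm25_scaled, pm25.min(), pm25.max())
```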

Three statistical performance measures were used to evaluate the performance and efficiency of the models developed for predicting PM2.5 and PM10: the Nash-Sutcliffe efficiency (NSE), which measures the model's goodness of fit; the mean absolute error (MAE), which quantifies the average magnitude of the prediction errors; and the bias (BIAS), which reflects how much the computed values deviate from the observed values. Legates and McCabe Jr (1999) suggested that one absolute error measure and one goodness-of-fit measure are sufficient to evaluate the performance of prediction models. The performance criteria were computed using Eqs. 2–4, respectively (Nourani and Fard Sayyah 2012). The model's accuracy can be interpreted based on the NSE value as very good (0.75 < NSE ≤ 1), good (0.65 < NSE ≤ 0.75), satisfactory (0.50 ≤ NSE ≤ 0.65) or unsatisfactory (NSE < 0.50) (Moriasi et al. 2007). The closer the MAE and BIAS values are to 0, the better the model's prediction.

$$ \mathrm{NSE}=1-\frac{\sum_{i=1}^n{\left({P}_{obs_i}-{P}_{pre_i}\right)}^2}{\sum_{i=1}^n{\left({P}_{obs_i}-\overline{P_{obs}}\right)}^2},\kern0.5em -\infty <\mathrm{NSE}\le 1.0 $$
(2)
$$ \mathrm{MAE}=\frac{\sum_{i=1}^n\vert {P}_{obs_i}-{P}_{pre_i}\vert }{n} $$
(3)
$$ \mathrm{BIAS}=\frac{\sum_{i=1}^n\left({P}_{obs_i}-{P}_{pre_i}\right)}{\sum_{i=1}^n\left({P}_{pre_i}\right)} $$
(4)

where n represents the number of observations, \( \overline{P_{obs}} \) is the mean of the observed values, Pobs is the observed value and Ppre is the predicted value.
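For clarity, the three criteria of Eqs. 2–4 can be computed directly from the observed and predicted series; the short Python sketch below is an illustration rather than the code used in the study:

```python
import numpy as np

def nse(obs, pre):
    """Nash-Sutcliffe efficiency (Eq. 2)."""
    obs, pre = np.asarray(obs), np.asarray(pre)
    return 1.0 - np.sum((obs - pre) ** 2) / np.sum((obs - obs.mean()) ** 2)

def mae(obs, pre):
    """Mean absolute error (Eq. 3)."""
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(pre))))

def bias(obs, pre):
    """BIAS (Eq. 4): sum of residuals divided by the sum of predictions."""
    obs, pre = np.asarray(obs), np.asarray(pre)
    return float(np.sum(obs - pre) / np.sum(pre))
```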

Proposed methodology

The study was conducted in three major steps, as shown in Fig. 1. In the first step, the most relevant input parameters for developing the base models were selected using a single-input single-output neural network. In the second step, ANN, ANFIS, SVR and MLR models were developed for the estimation of the air quality parameters (PM2.5, PM10). Finally, the NF-E model and two linear ensemble models combining the outputs of the four base models (ANN, ANFIS, SVR and MLR) were developed for enhanced performance in the estimation of PM2.5 and PM10.

Fig. 1

Schematic of the proposed methodology for air quality parameters

The notion behind developing ensemble models is to achieve the following benefits: (i) It is sometimes difficult to select an appropriate model for a particular time series problem; the ensemble approach removes this difficulty in model selection, since nonlinear ensemble models are capable of providing a result that is even better than that of the best base model (Nourani et al. 2020a). (ii) For real-life processes that possess both linear and nonlinear characteristics, neither linear nor nonlinear models alone perform well, since errors in the linear pattern can be inherited and magnified by the nonlinear models and vice versa. By combining the outputs of the linear model (MLR) and the nonlinear models (ANN, SVR, ANFIS), both the linear and the nonlinear patterns in the data can be captured effectively (Nourani et al. 2019). (iii) No single model can perfectly represent a given process, as noted by Sharghi et al. (2018); because of the complex nature of real-world problems, a unique model may not be able to identify every distinct pattern of a particular process.

Selection of relevant input parameters

The performance of all black-box models depends on the selection of appropriate input variables. Imposing too many input parameters on the model increases its complexity, decreases its computational accuracy and increases the time required for training (Ahmed and Pradhan 2019). On the other hand, an insufficient number of input parameters also results in poor model performance. Therefore, an optimum number of input parameters is required to develop a model with high estimation accuracy. Traditionally, the Pearson correlation matrix is used to select the dominant input parameters, but this method has been criticized because correlations capture only linear relationships, whereas most real-life processes are complex and nonlinear in nature (Nourani et al. 2014). In view of that, a single-input single-output nonlinear sensitivity analysis trained with a feed-forward neural network, together with a mutual information (MI) measure that computes the statistical dependency between variables based on the entropy function, was used in addition to the Pearson correlation matrix to determine the relevant input parameters. The single-input single-output nonlinear sensitivity analysis was evaluated using the NSE of the ANN model in the verification stage.
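A minimal sketch of this three-criterion screening is given below for illustration only; the column names and the train/test split are assumptions, scikit-learn's k-nearest-neighbour MI estimator stands in for the entropy-based MI measure, and the thresholds are those reported in the Results section (MI > 0.2, NSE > 0.4, |PCC| > 0.5):

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def screen_inputs(df, candidates, target, mi_thr=0.2, nse_thr=0.4, pcc_thr=0.5):
    """Rank candidate predictors by PCC, MI and single-input single-output ANN NSE."""
    rows = []
    for col in candidates:
        x, y = df[[col]].values, df[target].values
        pcc, _ = pearsonr(x.ravel(), y)
        mi = mutual_info_regression(x, y, random_state=0)[0]
        # single-input single-output feed-forward network, scored on a hold-out set
        x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)
        ann = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000,
                           random_state=0).fit(x_tr, y_tr)
        y_hat = ann.predict(x_te)
        nse = 1 - np.sum((y_te - y_hat) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
        keep = (mi > mi_thr) or (nse > nse_thr) or (abs(pcc) > pcc_thr)
        rows.append((col, pcc, mi, nse, keep))
    return pd.DataFrame(rows, columns=["input", "PCC", "MI", "NSE", "selected"])
```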

Black box models

FFNN

The FFNN is among the most commonly used ANN models for capturing the nonlinearity and complex interaction between predictor and response parameters (Jahani and Mohammadi 2019). The FFNN gets its name from the manner in which information is transmitted: information flows only in the forward direction (Ghaffari et al. 2006). This type of ANN owes its acceptance to its simplicity in modelling and capturing nonlinear patterns in complex problems (Rumelhart et al. 1986). Its ability to learn from experience, without the need to explicitly identify the physical connection between the predictor and explained variables, makes it effective and vital for modelling complex processes in many engineering fields (Kumar et al. 2014). In the FFNN, an interactive link between neurons is used to process the information and establish a relationship, rather than building any complex mathematical model. The most widely used algorithm for training the FFNN is the backpropagation algorithm. To train the FFNN model, adjustable weights are initialized and multiplied by the inputs; the cumulative results are then passed through a transfer function, which handles the nonlinear pattern in the data, before the output values are produced (Ghaffari et al. 2006). The architecture of the FFNN, as shown in Fig. 2, consists of one input layer and one output layer connected by intermediary hidden layer(s). All the nodes in any layer are connected only to the nodes of the immediately succeeding layer (Kim and Singh 2014). The general expression for the output of a neuron in the ANN model is given by

Fig. 2

Structure of the three-layer FFNN (Wang et al. 2015)

$$ {y}_i=f\left({\sum}_{j=1}^n{w}_{ji}{x}_j+{b}_{i0}\right) $$
(5)
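where yi is the output of the ith neuron, f is the transfer (activation) function, wji are the connection weights, xj are the inputs and bi0 is the bias term. As an illustrative sketch only (the study used Matlab with Levenberg-Marquardt training, for which scikit-learn has no direct equivalent), a three-layer FFNN with the 12 hidden neurons reported for PM2.5 could be set up as:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Three-layer FFNN: selected inputs -> one hidden layer of sigmoid units -> PM output.
# 'lbfgs' is used here as a stand-in for the Levenberg-Marquardt algorithm.
ffnn = make_pipeline(
    MinMaxScaler(),                              # Eq. 1 normalization
    MLPRegressor(hidden_layer_sizes=(12,),       # 12 hidden neurons (PM2.5 case)
                 activation="logistic",          # sigmoid transfer function
                 solver="lbfgs", max_iter=5000, random_state=0),
)
# ffnn.fit(X_train, y_train); pm_pred = ffnn.predict(X_test)
```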

SVR

SVM learning was first proposed by Vapnik (1998) and has proved capable of providing reasonable and acceptable solutions to prediction, classification, pattern recognition and regression problems. It is one of the data-driven machine learning approaches. The two features of SVM models that differentiate them from other machine learning approaches such as the ANN are structural risk minimization and their basis in statistical learning theory. SVR, one of the SVM-based models, is employed for nonlinear regression problems; it takes the minimization of structural risk as its objective function, rather than the minimization of the error between predicted and measured values used in other data-driven models such as the ANN. In SVR, the data are mapped into a higher-dimensional feature space through a nonlinear kernel, in which a linear regression is fitted; this captures the nonlinear pattern in the original data. For more details on SVR modelling, readers are referred to Wang et al. (2015) and Nourani et al. (2020b). Figure 3 gives the general structure of the SVR model. The SVR equation can be expressed as (Wang et al. 2015):

$$ f\left(x,{\alpha}_i,{\alpha}_i^{\ast}\right)={\sum}_{i=1}^N\left({\alpha}_i-{\alpha}_i^{\ast}\right)K\left(x,{x}_i\right)+b $$
(6)

where x represents the input vector, αi and αi* are the Lagrange multipliers, K(x, xi) is the kernel function performing the nonlinear mapping into the feature space and b is the bias term. The Gaussian radial basis function (RBF) kernel is the most commonly used kernel in SVR and is given as

$$ k\left({x}_1,{x}_2\right)=\mathit{\exp}\left(-\gamma {\left\Vert {x}_1-{x}_2\right\Vert}^2\right) $$
(7)

where γ is the kernel parameter.
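The sketch below illustrates an RBF-kernel SVR consistent with Eqs. 6–7; the hyper-parameter grid is hypothetical, since the study does not report the calibrated C, γ and ε values:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# epsilon-SVR with the Gaussian RBF kernel of Eq. 7; C, gamma and epsilon are
# tuned by 10-fold cross-validated grid search over illustrative values.
svr = make_pipeline(MinMaxScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__C": [1, 10, 100],
    "svr__gamma": [0.01, 0.1, 1.0],
    "svr__epsilon": [0.01, 0.1],
}
svr_search = GridSearchCV(svr, param_grid, cv=10, scoring="neg_mean_absolute_error")
# svr_search.fit(X_train, y_train); pm_pred = svr_search.predict(X_test)
```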

Fig. 3

Conceptual architecture of SVM algorithm

ANFIS

ANFIS is a hybrid model introduced by Jang in 1993 to overcome the limitations of both the ANN and the FIS. It combines the power of fuzzy logic in dealing with uncertainty with the learning ability of the ANN. The ANFIS model is built on a fuzzy logic definition, and the system parameters are optimized automatically by the ANN, unlike in a fuzzy system where the parameters are tuned manually (Rai et al. 2015). ANFIS has proved to be a useful tool for approximation problems because of its adaptive capability, its flexibility in dealing with uncertainty and its ability to process large amounts of noisy data from complex and dynamic systems (Çaydaş et al. 2009). The architecture of the ANFIS model (Fig. 4) consists of five layers constructed like a multi-layer feed-forward neural network; the layers are named according to their operative function (Codur et al. 2017). ANFIS uses the backpropagation algorithm to learn the parameters of the membership functions and a conventional least-squares estimator to estimate the parameters of the first-order polynomial of the Takagi-Sugeno fuzzy model. The overall output of the ANFIS system can be expressed as a linear combination of the consequent parameters (Çaydaş et al. 2009).
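For reference, the first-order Takagi-Sugeno formulation underlying ANFIS (Jang 1993) can be written, for a generic two-input rule, as

$$ \mathrm{Rule}\ i:\kern0.5em \mathrm{IF}\ x\ \mathrm{is}\ {A}_i\ \mathrm{and}\ y\ \mathrm{is}\ {B}_i,\kern0.5em \mathrm{THEN}\kern0.5em {f}_i={p}_ix+{q}_iy+{r}_i $$
$$ \hat{f}=\frac{\sum_i{w}_i{f}_i}{\sum_i{w}_i}={\sum}_i{\overline{w}}_i{f}_i $$

where wi is the firing strength of rule i (the product of its membership grades), \( {\overline{w}}_i \) is the normalized firing strength and pi, qi and ri are the consequent parameters estimated by least squares; the two-input form is shown only as a generic illustration, not as the specific network used here.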

Fig. 4

First-order type Sugeno FIS and ANFIS model structure (Jang 1993)

MLR

MLR is one of the most commonly used methods for the prediction and analysis of engineering problems. It helps in understanding the linear dependency between the predictor and the dependent variables, describing the relationship between them in terms of the change in the dependent variable when one predictor is varied while the others are held fixed (Doǧan and Akgüngör 2013). The dependent variable y and n regressor variables can be related by (Elkiran et al. 2018):

$$ y={b}_0+{b}_1{x}_1+{b}_2{x}_2+{b}_3{x}_3+\dots +{b}_i{x}_i+\xi . $$
(8)

In Eq. 8, xi represents the value of the ith predictor, bi stands for the coefficient of the ith predictor, b0 is the constant of regression and ξ is the error term.
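A minimal ordinary least-squares sketch of Eq. 8 (illustrative only; the study used Matlab) is:

```python
from sklearn.linear_model import LinearRegression

# Ordinary least-squares fit of Eq. 8: y = b0 + b1*x1 + ... + bi*xi + error
mlr = LinearRegression()
# mlr.fit(X_train, y_train)
# b0, b = mlr.intercept_, mlr.coef_     # regression constant and coefficients
# pm_pred = mlr.predict(X_test)
```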

Ensemble approach

The ensemble approach is a machine learning approach used to merge the outputs of multiple predictors for enhanced performance of the prediction process (Sharghi et al. 2018). The ensemble approach can be either linear or nonlinear (Raj Kiran and Ravi 2008). In the linear approach, a simple average (SA), weighted average (WA) or weighted median (WM) is used to combine the results obtained by the individual predictor models, while in the nonlinear approach, nonlinear kernels such as ANFIS, ANN or SVR are used to obtain a nonlinear average of the results of the individual base models. The input layer of the ensemble technique is fed by the outputs of the considered models, each treated as one input variable (Nourani et al. 2018). The use of the ensemble approach for prediction, clustering and classification in several engineering fields has been shown to provide higher accuracy than the individual models (Shtein et al. 2019; Nourani et al. 2020a). For the nonlinear ensemble approach employed for PM10 and PM2.5 prediction in this study, an ANFIS model was trained using the gbell membership function and a hybrid algorithm for nonlinear averaging of the values predicted by the base models. The PM10 and PM2.5 predictions obtained from the four base models (ANFIS, ANN, SVR, MLR) were fed into the input layer of the ANFIS model, and the corresponding PM concentrations were obtained.
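The nonlinear ensemble is therefore a stacking scheme: the base-model predictions become the inputs of a second-stage nonlinear learner. Because no standard Python ANFIS implementation is assumed here, the sketch below uses an MLP meta-learner purely as a stand-in for the ANFIS (gbell, hybrid-trained) meta-model actually used in the study:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def nonlinear_ensemble(base_train, y_train, base_test):
    """Nonlinear averaging of base-model outputs (columns: ANN, ANFIS, SVR, MLR).
    The study used an ANFIS meta-model; an MLP is used here only as a stand-in."""
    meta = MLPRegressor(hidden_layer_sizes=(6,), max_iter=5000, random_state=0)
    meta.fit(np.column_stack(base_train), y_train)
    return meta.predict(np.column_stack(base_test))
```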

To compare the performance of the nonlinear ensemble technique, two linear ensembles, the SA-ensemble (SA-E) and WA-ensemble (WA-E), were also developed for the prediction of both PM10 and PM2.5. In the SA-E, the arithmetic means of the predicted PM10 and PM2.5 concentrations are computed using Eq. 9. In the WA-E, the predicted PM10 and PM2.5 concentrations are computed by giving distinct weights to the outputs of the base models according to their relative importance; the weight is assigned based on the relative significance (NSE value) of each output. The WA-E is expressed by Eq. 10:

$$ \overline{P}=\frac{1}{n_m}{\sum}_{i=1}^{n_m}{P}_i $$
(9)
$$ \overline{P}={\sum}_{i=1}^{n_m}{w}_i{P}_i $$
(10)

in which \( \overline{P} \) is the output of the ensemble technique, nm is the number of models used (nm = 4), Pi is the output of the ith model (i.e. ANN, ANFIS, SVR or MLR) and wi is the weight applied to the output of the ith model, determined by

$$ {w}_i=\frac{{NSE}_i}{\sum_{j=1}^{n_m}{NSE}_j}. $$
(11)

NSEi is the performance efficiency of the ith base model.
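The two linear ensembles of Eqs. 9–11 reduce to a few lines; the sketch below is illustrative only:

```python
import numpy as np

def sa_ensemble(preds):
    """SA-E (Eq. 9): arithmetic mean of the base-model predictions."""
    return np.mean(np.column_stack(preds), axis=1)

def wa_ensemble(preds, nse_scores):
    """WA-E (Eqs. 10-11): base-model outputs weighted by their relative NSE."""
    w = np.asarray(nse_scores, dtype=float)
    w = w / w.sum()                        # Eq. 11
    return np.column_stack(preds) @ w      # Eq. 10
```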

Results and discussions

Selection of relevant input parameters

Accuracy in selecting the relevant input parameters for developing black-box models is crucial, since the accuracy and complexity of a model depend heavily on its structure. In view of that, two nonlinear measures (single-input single-output sensitivity analysis evaluated by NSE, and the MI value between the parameters) were used to obtain the dominant input parameters. PCC values between the potential input parameters and the responses (PM2.5 and PM10) were also computed so that parameters with a strong linear relationship with the PM could be incorporated into the models. The relevance of a parameter increases as its MI, NSE or PCC value approaches 1. Parameters having an MI value > 0.2, an NSE value > 0.4 or a |PCC| value > 0.5 were considered relevant and hence included in the models. Based on these criteria, PM2.5-1, NOx, NO, NO2, CO, SO2, WS, S, Q, T and CLS2 were found to be relevant to the prediction of PM2.5. From Fig. 5, it is clear that the background level of PM2.5 (PM2.5-1) has the highest relevance in the prediction of PM2.5, with MI, NSE and PCC values of 0.51, 0.81 and 0.90, respectively. These findings are supported by several studies; for example, Suleiman et al. (2016) found background PM2.5 to be the most relevant factor for predicting PM2.5 at Marylebone, London, followed by NO. NO2 and NOx were also identified as the second most important factors in the prediction of PM2.5 after vehicle emissions (Suleiman et al. 2019). Yazdi et al. (2020) found the average city-wide PM2.5 and the average wind speed to be the most relevant parameters in the prediction of PM2.5, with contributions of 66.75% and 6.36%, respectively.

Fig. 5

MI, PCC and NSE coefficients indicating the relationship between the input parameters and PM2.5

Figure 6 shows that PM10-1, NOx, NO, NO2, CO, SO2, WS, S, Q, T and CLS2 were also the most important factors in the prediction of PM10 in the study area, with PM10-1 being the most significant, followed by the NOx, NO and NO2 concentrations. The background concentration was identified as the most relevant factor owing to the positive autocorrelation present in the PM10 time series (Paschalidou et al. 2011). Other air pollutants, such as NO, NO2, CO and SO2, were also reported by Whalley and Zandi (2016) to provide a good prediction of PM10 when combined with meteorological parameters such as T and WS.

Fig. 6

MI, PCC and NSE coefficients indicating the relationship between the input parameters and PM10

Base (single) models

In the second phase of the study, the dominant input parameters for the prediction of PM2.5 and PM10 obtained in the first stage were used to develop three AI-based models (ANN, ANFIS and SVR). The models' efficiencies were evaluated using NSE, MAE and BIAS; the best model is the one with the highest NSE, the lowest MAE and a BIAS closest to 0. Matlab 2019a was used to develop all the models, and the models were validated using a 10-fold cross-validation technique. According to Nourani et al. (2020b), obtaining an optimal structure is essential for any ANN-based model; accordingly, several ANN models trained with the Levenberg-Marquardt algorithm and a sigmoid transfer function were developed to predict PM2.5 and PM10 by varying the number of hidden neurons (8–23) with the 11 dominant input parameters. The range for the number of hidden neurons in the ANN was selected based on the range \( \left(2{n}^{1/2}+m\right) \) to (2n + 1) given by Fletcher and Goss (1993), where n is the number of input neurons and m is the number of neurons in the output layer. The optimum ANN model was obtained with 12 and 14 hidden neurons for PM2.5 and PM10, respectively. For the ANFIS model, a Matlab code was developed and several models using the hybrid optimization algorithm were trained with different membership functions; the best model was obtained with the "gbell" membership function. The SVR, on the other hand, was trained with a radial basis function (RBF) kernel. The RBF kernel was selected because it has fewer parameters to calibrate than the polynomial and sigmoid kernel functions; Sharghi et al. (2018) also noted that the RBF kernel usually performs better than the polynomial and sigmoid kernels. For comparison, a linear model (MLR) was also used for the PM prediction. The results of the best models are given in Table 2.
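The Fletcher and Goss (1993) range and the resulting structure search can be illustrated as follows; this Python sketch only reproduces the search logic (for n = 11 inputs and m = 1 output the range is 8–23) and stands in for the Matlab/Levenberg-Marquardt procedure actually used:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

def select_hidden_neurons(X, y, n_inputs=11, n_outputs=1):
    """Search 2*sqrt(n)+m ... 2n+1 hidden neurons; keep the best 10-fold CV score.
    scikit-learn's 'r2' scorer is 1 - SS_res/SS_tot, i.e. the NSE of Eq. 2."""
    lo = int(np.ceil(2 * np.sqrt(n_inputs) + n_outputs))   # 8 for n=11, m=1
    hi = 2 * n_inputs + 1                                   # 23 for n=11
    best_h, best_score = None, -np.inf
    for h in range(lo, hi + 1):
        ann = MLPRegressor(hidden_layer_sizes=(h,), activation="logistic",
                           max_iter=5000, random_state=0)
        score = cross_val_score(ann, X, y, cv=10, scoring="r2").mean()
        if score > best_score:
            best_h, best_score = h, score
    return best_h, best_score
```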

Table 2 Results of the base models for the PM2.5 and PM10

From Table 2, it can be seen that all the AI-based models give a very good performance in PM2.5 prediction based on the NSE values (>0.75) in both the training and testing stages. The results also demonstrate the higher prediction capability of the ANFIS model, with NSE, MAE and BIAS values of 91.03%, 2.26 μg/m3 and 0.09, respectively, in the testing stage. The ANN model ranked second in prediction efficiency and the SVR last, with NSE values of 85.86% and 80.41% and MAE values of 3.02 μg/m3 and 3.79 μg/m3, respectively. Scatter plots of the observed and computed values in the training (Fig. 7) and testing stages (Fig. 8) show that the data are more compactly clustered along the bisector line of the ANFIS plot, indicating the higher goodness of fit of the ANFIS model. The higher performance of the ANFIS model compared with the other models is due to the combined power of the ANN and fuzzy logic in prediction. The stability of the models, assessed by comparing their NSE values in the training and testing stages, showed the SVR model to be the most stable, with a 1.1% decrease in NSE, followed by ANFIS (2.5%). The high stability of the SVR model in prediction has also been reported by Fan et al. (2018). All the AI-based models outperformed the MLR, with performance improvements of 17%, 12% and 6.6% for ANFIS, ANN and SVR, respectively. The superiority of the ANFIS model over ANN and SVR in PM2.5 prediction was also reported by Yeganeh et al. (2017).

Fig. 7

Scatter plots between observed and computed PM2.5 in the training phase for a ANN, b SVR, c MLR and d ANFIS

Fig. 8

Scatter plots between observed and computed PM2.5 in the testing phase for a ANN, b SVR, c MLR and d ANFIS

All the models, including the MLR, showed very good accuracy in PM10 prediction, with NSE values >0.75 in the testing stage. The results indicate the higher performance of the ANFIS model (NSE = 95.40% and MAE = 3.03 μg/m3) in the testing stage, followed by the SVR model (NSE = 81.44% and MAE = 6.03 μg/m3) and finally the ANN. Figures 9 and 10 also indicate the better goodness of fit of the ANFIS model. The ANFIS model was found to be the most stable, with an NSE decrease of 1.6% between the training and testing stages. The high accuracy of the ANFIS model in predicting PM10 in this study is supported by the study conducted by Prasad et al. (2016). Comparing the performance of the ANFIS, ANN and SVR models with the MLR shows improved performance of 17%, 3.1% and 1.5%, respectively.

Fig. 9

Scatter plots between observed and computed PM10 in the training phase for a ANN, b SVR, c MLR and d ANFIS

Fig. 10

Scatter plots between observed and computed PM10 in the testing phase for a ANN, b SVR, c MLR and d ANFIS

The results obtained show that both PM10 and PM2.5 can be modelled with minimum error using the ANFIS model. The higher MAE values of the PM10 models compared with the PM2.5 models are due to the larger data range and standard deviation of the PM10 data. Except for the ANN model, the PM10 models have higher NSE and lower BIAS values than the PM2.5 models, indicating the higher accuracy of the PM10 models. Although ANFIS showed higher prediction accuracy in terms of NSE, its MAE is still high and needs to be minimized.

Ensemble techniques

The ensemble modelling technique was employed to combine the advantages of the individual models for improved prediction accuracy. The ANFIS model, being the most robust base model in this study, was used for nonlinear averaging of the predicted PM2.5 and PM10 for enhanced prediction. The NF-E model for both PM2.5 and PM10 was trained using the "gbell" function and a hybrid training algorithm. WA-E and SA-E models were also developed for comparison with the NF-E. Only the results of the best single models were used in the ensemble approach. The ensemble results are given in Table 3. It can be seen that the NF-E performed better than all the other ensemble models, giving NSE values of 0.9594 and 0.9865 in the testing stage for PM2.5 and PM10, respectively. The WA-E and SA-E gave NSE values lower than the best single model (ANFIS); this is because in any linear averaging the resulting value is always lower than the highest value (Nourani et al. 2020a). The accuracy of the ensemble models was compared using radar plots (Figs. 11 and 12), and the results demonstrate the higher accuracy of the NF-E, with the smallest NSE change between the training and testing stages. The results of all the models (single and ensemble) were further compared using Taylor diagrams (Figs. 13 and 14), a comprehensive tool for comparing model performance using three statistical measures (RMSE, R and standard deviation). In the Taylor diagram, the azimuthal position gives the correlation between the actual and computed values. The RMSE is proportional to the distance between the observed and predicted fields and has the same unit as the standard deviation; as the correlation increases, the RMSE decreases. The standard deviation of the pattern increases with increasing radial distance from the origin (Taylor 2001). A model is considered perfect with respect to the reference point when its correlation coefficient is 1 (Yaseen et al. 2018). If the standard deviation of the computed values is greater than that of the observed values, the model may overestimate, and vice versa; hence, a standard deviation close to that of the actual data is always preferred. From Figs. 13 and 14, it is clear that the NF-E outperformed all the other models in the prediction of PM2.5 and PM10, with the highest R values, the lowest RMSE values and standard deviations closest to those of the actual data. The improvement in PM2.5 and PM10 prediction obtained with ensemble techniques has been demonstrated by several studies, including that of Shtein et al. (2019), who used an ensemble based on a generalized additive model. The improved prediction of PM2.5 concentration using a BP-NN ensemble (Feng et al. 2019) and a feature extraction and stacking-driven ensemble (Sun and Li 2020) supports the findings of this study. Maciąg et al. (2019) also found that a clustering-based ensemble improved the prediction accuracy of PM10 concentration in London. The higher performance of the ensemble approach is due to its ability to combine the unique advantages of each base model.

Table 3 Results of the ensemble modelling
Fig. 11

Radar plots comparing the NSE values of the PM2.5 ensemble models in the training and testing stages

Fig. 12

Radar plots comparing the NSE values of the PM10 ensemble models in the training and testing stages

Fig. 13

Taylor diagram representing different statistical parameters of the PM2.5 models

Fig. 14

Taylor diagram representing different statistical parameters of the PM10 models

Conclusion

This study proposed a novel nonlinear ensemble approach for the prediction of PM2.5 and PM10 concentrations in Marylebone, London. The NF-E approach involves three main stages: relevant input selection via MI, PCC and sensitivity analysis; single modelling; and finally ensemble modelling. The sensitivity analysis revealed NOx, NO, NO2, CO and SO2 to be the most relevant air pollutants for the prediction of PM2.5 and PM10 concentrations after the background concentration, while the most relevant meteorological parameters were WS and T, and Q and CLS2 were the most important traffic-related parameters. The results of the ensemble models showed that the NF-E achieved higher prediction accuracy than all the other models (linear ensembles and single models) and, depending on the model, could enhance the performance of the base models by 4–22% and 3–20% for PM2.5 and PM10, respectively, in the testing stage. The higher prediction accuracy of the proposed methodology is due to the careful selection of the relevant input parameters in the single-modelling stage and the combination of the unique features of the four base models in the ensemble stage. Although the NF-E estimated both the PM2.5 and PM10 concentrations with high accuracy, the need for careful selection of the base models could be a major limitation of the methodology, since the efficiency of the ensemble models depends heavily on the results obtained with the base models; in other words, including the result of a poorly performing model could lower the prediction accuracy of the ensemble model. The efficiency of the proposed methodology could be compared with other advanced models, such as the emotional neural network and linear-nonlinear hybrid models, in future studies.