Introduction

According to a worldwide study by the World Health Organization, roughly three million people lose their lives annually due to severe air pollution (WHO 2003). Health scientists around the world have scrutinized the impacts of air pollutants on humans (Liu and Peng 2018; Pope et al. 2018; Li et al. 2018) and other living organisms and have found that particulate matter with a diameter below 2.5 μm (PM2.5) is one of the most detrimental pollutants (Davidson et al. 2005). Several studies have identified PM2.5 as one of the most hazardous pollutants for human health (Sfetsos and Vlachogiannis 2010; Xing et al. 2016; Schweitzer and Zhou 2010; Cao et al. 2013; Borja-Aburto et al. 1998); hence, more attention and dedicated research on PM2.5 are required. Atkinson et al. (2014) conducted a comprehensive systematic review and meta-analysis of 110 papers published in health databases and concluded that a 10 μg/m3 increase in particulate matter concentration in an industrial city can raise mortality from cardiovascular and respiratory diseases by up to 2%. A similar study across nine Californian counties analyzed the impacts of particulate matter (PM10 and PM2.5) on different segments of society with respect to sex, age, ethnicity, and other characteristics; the results showed that a 10 μg/m3 increment in PM2.5 concentration over only 2 days was associated with a 0.6% increase in mortality. These and other peer-reviewed studies on particulate matter and human health (Pascal et al. 2013; Marzouni et al. 2016; Fattore et al. 2011; Fann et al. 2012; Dunea et al. 2016; Leili et al. 2008) illustrate the need for further, more accurate studies of PM2.5. Thus, in this paper, we aim to predict the PM2.5 concentration in Tehran, Iran, using statistical modeling techniques.
We used the Bayesian network (BN) and decision tree (DT), two well-established methods, alongside the support vector machine (SVM), a machine learning approach, to predict the PM2.5 concentration, and compared the capabilities of the three methods with respect to statistical criteria. Exploiting “intelligent machines” for data mining and variable prediction is now common across scientific disciplines, and for environmental parameters these methods have given promising results (Martí et al. 2013; Sharifi et al. 2016; Mehdipour et al. 2017; Kim et al. 2015). Mehdipour (2017) compared four prominent methods, gene expression programming, support vector machine, artificial neural network, and wavelet, to forecast ground-level ozone (O3) in Tehran; the results indicated that SVM was the most accurate. Feng et al. (2015) studied PM2.5 prediction in the Beijing, Tianjin, and Hebei provinces of China over one year, used wavelet transformation and a geographic model to improve the accuracy of an artificial neural network (ANN), and recommended that their method be implemented at air pollution centers in other countries. Wang et al. (2015) developed a novel model for predicting daily PM10 and SO2 concentrations: they used a Taylor expansion forecasting model to improve the support vector machine and artificial neural network, and assessed their new model as very promising. Kisi et al. (2017) applied least squares support vector regression (LSSVR), multivariate adaptive regression splines (MARS), and the M5 model tree (M5-Tree) to forecast sulfur dioxide (SO2) in three regions of India; LSSVR gave the best results. Decision trees and Bayesian belief networks have been applied to several environmental topics (McCann et al. 2006; Marchant and Ramos 2012; Liu et al. 2012; Aguilera et al. 2011), and their abilities for PM2.5 prediction have been analyzed and compared with others. Kujaroentavon et al.
(2015) introduced a decision tree to classify air pollution in Thailand; they used the air quality index (AQI) together with the decision tree to classify air pollution levels with respect to human health, and the results were satisfactory. McMillan et al. (2007) aimed to find a way to validate air pollution monitoring data; their model was specified in a Bayesian framework and fitted by Markov chain Monte Carlo techniques. Vafa-arani et al. (2014) used dynamic modeling to analyze the most important factors affecting Tehran's air pollution; technological improvement in the fuel and automotive industries and in public transportation proved to be the most effective factors among manifold alternatives such as industry-related parameters, road construction, traffic control plans, and urban transportation. In another study in Tehran, Mehdipour and Memarianfard (2017) scrutinized the ground-level ozone (O3) concentration using the support vector machine and gene expression programming, two of the most potent machine learning methods, and the comparison of the predicted dataset with the testing one showed acceptable results. All the papers cited above demonstrate the pressing need for an accurate study of fine particulate matter, and they also establish the support vector machine, decision tree, and Bayesian network as potent methods for environmental problems.

Methodology

Decision tree

A decision tree (DT) is an expedient way to illustrate a concept and also a decision-making tool; models can be developed with it to predict a target value with respect to the input parameters and datasets (Rivest 1987). DT is a proper and prevalent method for data mining. We exploited a tree-like model, or graph, representing an algorithm that seeks the strategy most likely to reach the target (Utgoff 1989). In decision analysis, a decision tree, and specifically a decision diagram, serves as a visual tool for more understandable and analytical decision-making (Kamiński et al. 2017). This tool classifies the “test” datasets from the root through the branches to the leaves, and every leaf of the tree represents a particular class. A well-developed tree is capable of handling manifold parameters with numerous data points for each parameter (Quinlan 2006). Three kinds of nodes appear in a DT graph (Moret 1982): (a) decision nodes (squares), (b) chance nodes (circles), and (c) end nodes (triangles). Every inner node corresponds to an input variable, with edges to children for each of the probable values of that variable. A leaf depicts a value of the target variable given the values of the input variables represented by the path from the root to the leaf (James et al. 2000).

In this study, we developed a tree in which 12 predictors are the input values and PM2.5 plays the role of the target parameter. Wind speed, maximum ambient temperature, minimum ambient temperature, average nebulosity, sunshine, humidity, precipitation, carbon monoxide, ground-level ozone, nitrogen dioxide, sulfur dioxide, and particulate matter with a 10-μm diameter were used as predictors, and particulate matter of 2.5-μm size is the target variable. An over-expanded, wide tree may suffer from severe overfitting, while an overly limited one may fail to consider all the variables; pruning is the tool for keeping the tree size within an acceptable and optimum range. Overfitting occurs when the machine, instead of learning, memorizes the datasets and merely reproduces outcomes very close to its inputs.
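The setup described above, a regression tree over 12 predictors with its size kept in check by depth limiting and pruning, can be sketched as follows. This is an illustrative sketch only: the placeholder data, the `max_depth` value, and the `ccp_alpha` pruning parameter are assumptions, not the authors' actual configuration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# 12 predictors (WS, Tmax, Tmin, nebulosity, sunshine, RH, precipitation,
# CO, O3, NO2, SO2, PM10), filled here with random placeholder data
X = rng.random((1096, 12))
y = rng.random(1096)  # placeholder PM2.5 target

# Limiting the depth and applying cost-complexity pruning keeps the tree
# within the "acceptable and optimum range" mentioned above, guarding
# against both overfitting (too deep) and underfitting (too shallow)
tree = DecisionTreeRegressor(max_depth=5, ccp_alpha=1e-4, random_state=0)
tree.fit(X, y)
print(tree.get_depth())  # never exceeds the max_depth cap
```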

Support vector machine

Cortes and Vapnik (1995) first invented a machine that uses vectors to classify datasets in a two-dimensional space. Machines that use one part of the datasets for training and another part for testing commonly categorize the data in this way. According to Fig. 1, a vector machine can separate the datasets into groups in a two-dimensional space by any of myriad crossing lines, among which one particular separating hyperplane is chosen: the best separating line, or hyperplane, has the maximum distance from the border lines, and the width of this margin is \( \frac{\mathbf{2}}{\left\Vert \mathbf{w}\right\Vert } \) (Ivanciuc 2007). Equations 1 and 2 represent the border lines. In this study, 12 predictors and one predictand were available, which adds considerable complexity to the problem.

$$ \overrightarrow{\mathbf{w}}\cdotp \overrightarrow{\mathbf{x}}-\boldsymbol{b}=\mathbf{1} $$
(1)
$$ \overrightarrow{\mathbf{w}}\cdotp \overrightarrow{\mathbf{x}}-\boldsymbol{b}=-\mathbf{1} $$
(2)
Fig. 1
figure 1

Border lines (support vectors) and the separating hyperplane for data classification

In practical uses of SVM, the datasets commonly lie in an N-dimensional space. The support vector machine is a linear machine with one output y(x) that works in the high-dimensional feature space formed by the nonlinear mapping of the N-dimensional input vector x into a K-dimensional feature space (K > N) through a nonlinear function ∅(x). The number of hidden units, K, equals the number of so-called support vectors, i.e., the learning data points closest to the separating hyperplane. The learning task is transformed into minimizing an error function while simultaneously keeping the weights of the network as small as possible. The error function is defined through the so-called ε-insensitive loss function Lε(d, y(x)) (Cortes and Vapnik 1995).

$$ {L}_{\varepsilon}\left(d,y(x)\right)=\left\{\begin{array}{cc}d-y(x)-\varepsilon & \mathrm{for}\ \left(d-y(x)\right)\ge \varepsilon \\ {}0 & \mathrm{for}\ \left(d-y(x)\right)<\varepsilon \end{array}\right. $$
(3)

where ε is the assumed accuracy, d the desired output, x the input vector, and y(x) the actual output signal of the SVM, defined by:

$$ y(x)={\sum}_{j=1}^K{W}_j{\varnothing}_j(x)+b={W}^T\varnothing (x)+b $$
(4)

w = [w1, …, wK]T is the weight vector, b represents the bias, and ∅(x) = [∅1, …, ∅K]T is the vector of basis functions (Osowski and Garanty 2007). The optimization problem so defined is solved by introducing the Lagrange multipliers \( {\alpha}_i,{\alpha}_i^{\ast } \) (i = 1, 2, …, K), which are responsible for the functional constraints defined in Eq. (3). The minimization of the Lagrange function is transformed into the dual problem (Sapankevych and Sankar 2009):

$$ \varnothing \left(\alpha, {\alpha}^{\ast}\right)={\sum}_{i=1}^K{d}_i\left({\alpha}_i-{\alpha}_i^{\ast}\right)-\varepsilon {\sum}_{i=1}^K\left({\alpha}_i+{\alpha}_i^{\ast}\right)-\frac{1}{2}{\sum}_{i=1}^K{\sum}_{j=1}^K\left({\alpha}_i-{\alpha}_i^{\ast}\right)\left({\alpha}_j-{\alpha}_j^{\ast}\right)K\left({x}_i,{x}_j\right) $$
(5)

with the constraints:

$$ {\sum}_{i=1}^K\left({\alpha}_i-{\alpha}_i^{\ast}\right)=0 $$
$$ 0\le {\alpha}_i\le C\kern0.5em \mathrm{and}\kern0.5em 0\le {\alpha}_i^{\ast}\le C $$

where C is a regularization constant that determines the trade-off between the training risk and the model smoothness. Owing to the nature of quadratic programming, only the data corresponding to nonzero \( \left({\alpha}_i-{\alpha}_i^{\ast}\right) \) pairs are support vectors (Nsv). In Eq. 5, K(xi, xj) = ∅(xi)T∅(xj) is the inner-product kernel, which satisfies Mercer's condition (Schölkopf et al. 1999) required for the generation of kernel functions:

$$ K\left({x}_i,{x}_j\right)=\left\langle \varnothing \left({x}_i\right),\varnothing \left({x}_j\right)\right\rangle $$

Hence, the output y(x) associated with the input training data x can be defined through the support vectors by

\( y(x)={\sum}_{i=1}^{N_{sv}}\left({\alpha}_i-{\alpha}_i^{\ast}\right)K\left(x,{x}_i\right)+b \)

Meteorological parameters such as average nebulosity, wind speed, sunshine, maximum and minimum air temperature, relative humidity, and precipitation, in addition to chemical precursors such as CO, SO2, O3, NO2, and PM10, constitute the variables, and a simple linear classifier is not able to categorize the datasets. In such complex problems, nonlinear mapping is required (James et al. 2000): the kernel trick transforms the datasets into a higher-dimensional space where they can be classified (Aronszajn 2009). With respect to prior research on the suitability of kernel functions in a similar study (Mehdipour and Memarianfard 2017), the radial basis function (RBF) was harnessed in the present paper; that article also reported the optimum parameter values, σ2 = 0.2 and gamma = 1. However, other kernels such as the linear, polynomial (homogeneous and inhomogeneous), and hyperbolic tangent kernels also hold considerable potential (Genton 2001; Theodoridis 2008). To predict the PM2.5 concentration from the above-mentioned predictors, 66% of the collected datasets were used for training, 15% were allocated for validation, and the remainder were used for testing. In other words, of the three consecutive years of data, the first two years were allocated for training the machine.
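The configuration described above, an RBF-kernel support vector regression with a chronological 66/15/19% split, can be sketched as follows. The data here are random placeholders, and the `epsilon` and `C` values are illustrative assumptions; only the kernel choice, gamma = 1, and the split fractions come from the text.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.random((1096, 12))   # 12 normalized predictors (placeholder values)
y = rng.random(1096)         # normalized PM2.5 (placeholder values)

n_train = int(0.66 * len(X))           # first two years: training
n_val = int(0.15 * len(X))             # next 15%: validation
X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train + n_val:], y[n_train + n_val:]  # residual: testing

# RBF kernel with gamma = 1 as reported above; epsilon and C are assumed
model = SVR(kernel="rbf", gamma=1.0, epsilon=0.01, C=1.0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(pred.shape)  # one prediction per test day
```

Keeping the split chronological (rather than shuffling) mirrors the paper's use of the first two years for training and the last year for evaluation.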

Bayesian network

The Bayesian network builds on the theorem introduced by Bayes and Price (1763) and belongs to the family of graphical probability models. Graphical structures are employed to represent information about a topic under uncertainty: each node in a Bayesian graph shows a random variable, and the arcs, or branches, depict the probable relations between the variables, where these conditional relations are commonly assessed by statistical tools (Varis and Kuikka 1999). Bayesian networks combine graph theory, probability theory, computer science, and statistics, and are widely used in machine learning, data mining, speech recognition, signal analysis, bioinformatics, medical prognosis, and weather forecasting; in particular, there are numerous successful instances of Bayesian network applications in environmental engineering (Vicedo-Cabrera et al. 2013; Uusitalo 2007; Wade 2000; Elizondo and Orun 2017; Nickless et al. 2017). The GeNIe 2.0 software was employed in this study. Based on the collected datasets and their chemical and meteorological relations, the arcs and their directions were set as shown in Fig. 2: the effects of all predictors on the PM2.5 concentration were modeled as obligatory arcs, and the relations among the predictors as optional arcs. Numerous graphs and their suitability were analyzed, and the best possible graph is presented. Notably, some arcs and arrows are founded merely on statistics, yet remain physically plausible; as a tangible instance, wind speed (WS) has undeniable impacts on humidity (H), particulate matter, nebulosity, etc.

Fig. 2
figure 2

Bayesian network of the predictors and predictable

Evaluation and comparison criteria

Root mean square error (RMSE) and the correlation coefficient (CC) were exploited in this research to assess each method's ability to reproduce the test datasets. Equations 6 and 7 represent the correlation coefficient and root mean square error, respectively. From these equations it follows that a lower RMSE (bounded below by 0) and a CC closer to 1 indicate a more accurate model. Ym and Yp are the observed and predicted PM2.5, and \( \overline{\mathrm{y}}m \) and \( \overline{\mathrm{y}}\mathrm{p} \) are the average observed and simulated values of the target variable. N is the number of data points for each parameter, equal to the three consecutive years, or 1096 days. CC and RMSE are among the most reliable evaluation criteria (Chai and Draxler 2014; Roushangar and Homayounfar 2015) and were used to compare the three above-mentioned methods. In addition, Eq. 8 represents the normalized root mean square error (NRMSE) and Eq. 9 the Nash-Sutcliffe coefficient (E). NRMSE is the non-dimensional form of RMSE; the E coefficient can range from -∞ to 1, and E = 1 corresponds to a perfect match between the model and the observations (Ömer Faruk 2010; Kuo et al. 2015; Lelieveld et al. 2015). Xobs and Xmodel are the observed and modeled values, respectively.

$$ \mathrm{CC}=\frac{\sum_{\mathrm{i}=1}^{\mathrm{N}}\left(\mathrm{Ym}-\overline{\mathrm{y}}\mathrm{m}\right)\times \left(\mathrm{Yp}-\overline{\mathrm{y}}\mathrm{p}\right)}{\sqrt{\sum_{\mathrm{i}=1}^{\mathrm{N}}{\left(\mathrm{Ym}-\overline{\mathrm{y}}\mathrm{m}\right)}^2}\times \sqrt{\sum_{\mathrm{i}=1}^{\mathrm{N}}{\left(\mathrm{Yp}-\overline{\mathrm{y}}\mathrm{p}\right)}^2}} $$
(6)
$$ \mathrm{RMSE}=\sqrt{\sum_{\mathrm{i}=1}^{\mathrm{N}}\frac{{\left(\mathrm{Ym}-\mathrm{Yp}\right)}^2}{\mathrm{N}}} $$
(7)
$$ NRMSE=\frac{\mathrm{RMSE}}{X_{\mathrm{obs},\max }-{X}_{\mathrm{obs},\min }} $$
(8)
$$ E=1-\frac{\sum_{i=1}^n{\left({X}_{\mathrm{obs},i}-{X}_{\mathrm{model},i}\right)}^2}{\sum_{i=1}^n{\left({X}_{\mathrm{obs},i}-\overline{X_{\mathrm{obs}}}\right)}^2} $$
(9)
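The four criteria in Eqs. 6–9 translate directly into code. The sketch below is a minimal, model-independent implementation; the short sample array exists only to exercise the functions.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error, Eq. 7."""
    return np.sqrt(np.mean((obs - pred) ** 2))

def cc(obs, pred):
    """Pearson correlation coefficient, Eq. 6."""
    return np.corrcoef(obs, pred)[0, 1]

def nrmse(obs, pred):
    """Normalized RMSE, Eq. 8: RMSE divided by the observed range."""
    return rmse(obs, pred) / (obs.max() - obs.min())

def nse(obs, pred):
    """Nash-Sutcliffe coefficient E, Eq. 9 (ranges from -inf to 1)."""
    return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = np.array([0.2, 0.4, 0.6, 0.8])
print(nse(obs, obs))  # a perfect match gives E = 1
```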

Study area and datasets

Tehran, the twenty-ninth largest metropolis in the world, is an insecure nest for roughly 14 million residents at night and 20 million commuters and residents during the day. An important industrial center in the heart of the Middle East, it plays the biggest role in Iran's economy through its manifold factories. The placement of factories near residential areas, and their lack of facilities to reduce air pollution, is detrimental to the population. Heavily congested traffic in Tehran's streets, owing to weak public transportation, crowded metros, expensive taxis, and related causes, produces some of the most dangerous air contamination for Tehran's people (Seyedabrishami and Mamdoohi 2012). The study area covers 1274 km2 and 22 municipal districts, located at 51° E longitude and 35° N latitude, at altitudes between 900 and 1830 m above sea level (Bagha et al. 2014). Each district has an air pollution measurement center; hence, 22 measuring centers gauge the contaminant concentrations hourly. PM2.5, PM10, CO, NO2, SO2, and O3 are the measured parameters. The meteorological parameters of Tehran are determined in district 9, where Mehrabad airport is located; in this research, the parameters of that district were employed. Figure 3 illustrates the location of district 9.

Fig. 3
figure 3

The district 9 of Tehran county, Iran

Data collection and preparation

The datasets were collected from January 2013 to January 2016, three consecutive years, or 1096 days. The air pollution measuring station in district 9 gauges the air pollutant concentrations every 3 h, and in this paper the maximum value of every parameter was taken for each day. The meteorological variables were measured daily at Mehrabad airport. During the 3 years, Tehran experienced 35 days of clear and healthy air, 660 days of moderate air quality, 376 days unhealthy for sensitive individuals, 24 days of unhealthy air pollution, and one day of very unhealthy air quality; meanwhile, on 401 days, PM2.5 was in the worst condition compared with the other pollutants. The parameters and values were gathered from the archives of the Tehran Air Quality Control Company (http://airnow.tehran.ir) and the Meteorological Organization of Iran (http://www.irimo.ir), each of which is a reliable organization equipped with up-to-date apparatus. Table 1 presents the statistical description of the collected datasets; WS, RH, Prec, Tmax, Tmin, Sunsh, and Neb are, respectively, the abbreviations of wind velocity, relative humidity, precipitation, maximum temperature, minimum temperature, sunshine, and average nebulosity. Table 2 presents the correlation matrix of all collected parameters, showing which data have a positive or negative correlation with one another. During data collection for this modeling study there were undeniable obstacles and deficiencies; it is recommended that future studies also input other factors that may play a role in urban pollution, such as daily fuel consumption, the average number of commuters in the study area, and traffic-related datasets.

Table 1 Statistical descriptions of the input variables
Table 2 The correlation matrix of input datasets

Equation 10 transfers the datasets into the [0–1] range to make them comparable with each other. The monitored datasets have different units, e.g., wind speed is measured in kilometers per hour while relative humidity is measured as a percentage; thus, data preparation is an indispensable step in this research. Normalization puts all parameters on a similar scale and, more importantly, makes it possible to find a rational mathematical relation between the predictors and the predictand. Xmin and Xmax are the minimum and maximum of each variable, and Xi represents the daily value of the parameter.

$$ X=\frac{\left(X\mathrm{i}-X\min \right)}{\left(\mathrm{Xmax}-\mathrm{Xmin}\right)} $$
(10)
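Applied column-wise, Eq. 10 is a one-line operation; the sketch below uses two tiny placeholder series (wind speed in km/h, relative humidity in %) purely to show that variables with different units end up on the same [0–1] scale.

```python
import numpy as np

def min_max(x):
    """Min-max normalization, Eq. 10, applied per column."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

wind = np.array([0.0, 5.0, 10.0])    # km/h (placeholder values)
rh = np.array([20.0, 60.0, 100.0])   # percent (placeholder values)
data = np.column_stack([wind, rh])

scaled = min_max(data)
print(scaled.min(), scaled.max())    # both columns now span [0, 1]
```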

Results and discussions

In this paper, three modeling methods were exploited to predict PM2.5; the simulation ability of each approach is showcased in this section in order to finally introduce the ablest tool. The most powerful method is then harnessed for a sensitivity analysis that measures the predictors' impacts on the variation of the PM2.5 concentration.

Results of the decision tree

The designed tree provided acceptable results, generating a set of simulated data whose RMSE against the observed PM2.5 equals 0.0591. Figures 4 and 5, respectively, present the linear regression for the evolved model and how closely the simulated datasets follow the observed PM2.5 in 2015. The correlation coefficient between the modeled and observed data is 0.9204, which is well within an acceptable range. The explicit equations derived from the DT are given in Eqs. 11 and 12:

Fig. 4
figure 4

Observed and predicted data by decision tree

Fig. 5
figure 5

The linear regression between the observed and predicted data by the decision tree

If PM10 ≤ 0.291, then

$$ {\displaystyle \begin{array}{l}{\mathrm{PM}}_{2.5}=-{0.0294}^{\ast }\ \mathrm{WS}+{0.0359}^{\ast }\ {\mathrm{T}}_{\mathrm{min}}-{0.0012}^{\ast }\ {\mathrm{T}}_{\mathrm{max}}-{0.0218}^{\ast }\ \mathrm{nebulosity}+0.0909\ \\ {}{}^{\ast }\ \mathrm{RH}+{0.0336}^{\ast }\ \mathrm{CO}-{0.0986}^{\ast }\ {\mathrm{O}}_3\kern1em +{0.1798}^{\ast }\ {\mathrm{NO}}_2+{0.0613}^{\ast }\ {\mathrm{SO}}_2+{1.7382}^{\ast }\ {\mathrm{PM}}_{10}\\ {}-0.1366\end{array}} $$
(11)

But if PM10 > 0.291, then

$$ {\displaystyle \begin{array}{l}{\mathrm{PM}}_{2.5}=-{0.0021}^{\ast }\ \mathrm{WS}+{0.0569}^{\ast }\ {\mathrm{T}}_{\mathrm{min}}-{0.0785}^{\ast }\ {\mathrm{T}}_{\mathrm{max}}-{0.0648}^{\ast }\ \mathrm{nebulosity}-{0.0474}^{\ast }\ \\ {}\mathrm{sunshine}+{0.1601}^{\ast}\mathrm{RH}-{0.2033}^{\ast }\ \mathrm{precipitation}-{0.1118}^{\ast }\ {\mathrm{O}}_3+{0.17}^{\ast }\ {\mathrm{NO}}_2+0.0601\ \\ {}{}^{\ast }\ {\mathrm{SO}}_2+{1.2992}^{\ast }\ {\mathrm{PM}}_{10}+0.1146\end{array}} $$
(12)
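The piecewise model of Eqs. 11 and 12 can be written directly as a function of the normalized predictors. The coefficients below are copied verbatim from the equations above; the input values at the end are placeholders chosen only to exercise both branches.

```python
def pm25_tree(ws, t_min, t_max, neb, sunshine, rh, prec,
              co, o3, no2, so2, pm10):
    """Piecewise linear model from Eqs. 11 and 12 (normalized inputs)."""
    if pm10 <= 0.291:  # Eq. 11 branch (no sunshine/precipitation terms)
        return (-0.0294 * ws + 0.0359 * t_min - 0.0012 * t_max
                - 0.0218 * neb + 0.0909 * rh + 0.0336 * co
                - 0.0986 * o3 + 0.1798 * no2 + 0.0613 * so2
                + 1.7382 * pm10 - 0.1366)
    # Eq. 12 branch (no CO term)
    return (-0.0021 * ws + 0.0569 * t_min - 0.0785 * t_max
            - 0.0648 * neb - 0.0474 * sunshine + 0.1601 * rh
            - 0.2033 * prec - 0.1118 * o3 + 0.17 * no2
            + 0.0601 * so2 + 1.2992 * pm10 + 0.1146)

# Placeholder inputs purely to exercise both branches
low = pm25_tree(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.5, 0.5, 0.5, 0.5, 0.1)
high = pm25_tree(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.5, 0.5, 0.5, 0.5, 0.5)
print(high > low)  # larger PM10 implies a larger PM2.5 estimate here
```

Note that the dominant coefficient in both branches belongs to PM10, which anticipates the sensitivity-analysis finding later in the paper.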

Results of support vector machine

Figure 6 plots the predicted and observed values of PM2.5 in one graph to show that the outputs of the built model closely follow the observed datasets. Figure 7 presents the linear regression between the observed PM2.5 and that simulated by the support vector machine, a quite acceptable result, as the correlation coefficient equals 0.9414. Overfitting is a menace for soft computing methods that harms a model's accuracy; it occurs when the model fits the training datasets much better than unseen data. In this study, however, over-training is controlled: the CC and RMSE for the training datasets are 0.9426 and 0.0501, respectively, while the root mean square error for the testing datasets is 0.0519. Thus, the support vector machine is not over-trained.

Fig. 6
figure 6

Observed and predicted data by SVM

Fig. 7
figure 7

Linear regression between the observed and predicted by SVM

Results of Bayesian network

In this study, the effects of all predictors on the PM2.5 concentration were considered. Simultaneously, the predictors are related to one another, and in the present Bayesian network structure these relations were studied to obtain a more accurate structure (see Fig. 2). For the estimation of PM2.5, the Bayesian network gave a function of all parameters, shown in Eq. 13, where WS, Tmin, Tmax, N, S, RH, P, CO, O3, NO2, SO2, and PM10 represent the wind speed, daily minimum temperature, daily maximum temperature, nebulosity, sunshine, relative humidity, precipitation, carbon monoxide, ground-level ozone, nitrogen dioxide, sulfur dioxide, and particulate matter with a 10-μm diameter, respectively.

$$ {\displaystyle \begin{array}{l}{\mathrm{PM}}_{2.5}=-0.041\times \mathrm{WS}+0.055\times {\mathrm{T}}_{\mathrm{min}}-0.027\times {\mathrm{T}}_{\mathrm{max}}-0.032\times \mathrm{N}-0.011\times \mathrm{S}+0.093\times \mathrm{RH}-\\ {}0.028\times \mathrm{P}+0.021\times \mathrm{CO}-0.133\times {\mathrm{O}}_3+0.197\times {\mathrm{NO}}_2+0.101\times {\mathrm{SO}}_2+1.616\times {\mathrm{PM}}_{10}\end{array}} $$
(13)

Using MS Excel and Eq. 13, the modeled PM2.5 values were produced. Figure 8 shows how the simulated data follow the test data. The RMSE between the PM2.5 modeled by the BN and the observed values equals 0.1077, and, as shown in Fig. 9, the correlation coefficient is 0.8927.

Fig. 8
figure 8

Observed and predicted data by the Bayesian network

Fig. 9
figure 9

Linear regression between the observed and predicted data by the Bayesian Network

Comparing the methods

The modeled data were assessed against the testing datasets with four of the most prominent evaluation criteria: RMSE, NRMSE, CC, and E. All three methods gave acceptable results, and the evolved models are easily comparable via Table 3. Further, a single-factor analysis of variance (ANOVA) was run to compare the robustness of the methods (Sihag et al. 2018a, b). Table 4 shows that DT and SVM have F values less than F critical and P values greater than 0.05, while the F value for BN exceeds the critical value and its P value is less than 0.05. Therefore, DT and SVM are unbiased methods whose predicted values differ insignificantly from the observed data, whereas BN is biased and its estimates differ significantly from the actual values.

Table 3 Evaluation criteria values of the developed models
Table 4 The single factor ANOVA for methods
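The single-factor ANOVA used above can be sketched as follows. The two synthetic series stand in for the actual observed and model-predicted PM2.5 (which are not reproduced here); the interpretation rule, P > 0.05 meaning no significant difference between group means, follows the text.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)
observed = rng.normal(0.3, 0.05, 200)              # placeholder observations
predicted = observed + rng.normal(0.0, 0.01, 200)  # an unbiased model's output

# Single-factor ANOVA on the two groups (observed vs. predicted)
f_stat, p_value = f_oneway(observed, predicted)

# A P value above 0.05 means the group means are not significantly
# different, i.e., the model is judged unbiased by this criterion
print(p_value > 0.05)
```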

According to Tables 3 and 4, SVM yielded meaningful power in comparison with the other methods, both in this study and in the authors' other studies; hence, applying this modeling system and combining it with other possible methods is strongly suggested. Specifically, a hybrid of least squares and the support vector machine (LSSVM) is anticipated to produce potent models. DT and BN occupy the next places, respectively.

Sensitivity analysis of PM2.5 via SVM

The sensitivity of PM2.5 to all the predictors is depicted in Table 5. SVM, as the ablest method of this research, was selected to run the sensitivity analysis. According to the latest studies on the capability of different kernel functions, the radial basis function (RBF) was chosen as the kernel trick of the SVM (Mehdipour and Memarianfard 2018; Sihag et al. 2018a, b). In this analysis, the predictor parameters were added one by one and the model was run for each input combination; finally, the effect of each parameter on the PM2.5 variation can be detected by comparing the RMSE, NRMSE, CC, and E values. Model SVM12 yields the optimum results.

Table 5 PM2.5 sensitivity analyses’ results for different input combinations by SVM

Conclusion

Air pollution measuring instruments are expensive, bulky, and hard to maintain; thus, a reliable soft-computing method can be a proper substitute. To this end, the Bayesian network (BN), decision tree (DT), and support vector machine (SVM) were applied to model the PM2.5 concentration. Regarding the evaluation criteria, SVM was introduced as the ablest method, with DT and BN in the next places.

From the mathematical equations provided by BN and DT, and the sensitivity analysis of PM2.5 via SVM, the predictors' effects are comprehensible: highly effective parameters receive higher coefficients in the equations suggested by BN or DT, and vice versa; likewise, in the sensitivity analysis table, adding parameters with higher influence reduces the RMSE or NRMSE and raises the CC or E values more than others. PM10 has the greatest impact on the prediction of PM2.5, and the chemical precursors influence the PM2.5 variance more than the meteorological parameters. However, as particulate matter is prone to adhesion and settling in the presence of moisture, humidity influences PM2.5 significantly. Wind speed was also anticipated to have a high impact, since wind can carry particulate matter, but in this study variation of the wind velocity does not appreciably affect the PM2.5 value. The authors suggest further study of wind speed and the possible reasons for its low effect on particulate matter; it is postulated that the city's encirclement by tall buildings and the generally low wind speeds are the main reasons.