Introduction

Background

Air pollution can cause a remarkable number of cardiovascular and respiratory diseases (Pope et al., 2004). Suspended particulate matter (SPM), one of today's major environmental problems, can in high concentrations contribute to climate change (Haywood and Boucher 2000) and to growth stunting or mortality of plant species (Bench 2004). SPM also has negative effects on the housing market (Kim and Yoon 2019), tropopause height (Wu et al. 2013), surface temperature and energy budget (De Menezes Neto et al. 2017; Tzanis and Varotsos 2008), and childcare facilities (Oh et al. 2019). Because of its small size, SPM can penetrate the upper and lower parts of the respiratory system (Liu et al. 2019) and thus harm human health (Sahu et al. 2019; Wang et al. 2016; Yadav et al. 2019). Exposure to high levels of PM2.5 causes 3.15 million premature deaths worldwide every year, and outdoor air pollution overall causes 3.3 million deaths annually (Lelieveld et al. 2015). Although most cities in developing countries have similar air pollution problems, each city has its own sources of air pollution and its own geographical and climatic features.

Motivation

Tehran is located in a developing country where rapid urbanization and population growth have led to a continuously expanding residential area and to considerable changes in land cover and land use (Alizadeh-Choobari et al. 2016). In 2012, air pollution reportedly caused about 4500 premature deaths in Tehran (Ministry of Health and Medical Education, 2012). Empirical evidence indicates that Tehran is among the cities of the world with high mortality caused by long-term exposure to fine particulate matter (Lelieveld et al. 2015).

Literature

PM2.5 concentrations can be predicted using various forecasting models. Artificial intelligence (Ventura et al. 2019), chemical transport (Sun et al. 2013), linear regression (Vlachogianni et al. 2011), nonlinear regression (Baker and Foley 2011), and time series (Wang et al. 2012) models are among the commonly used types. By combining some of these models, researchers have obtained more accurate predictions (Ausati and Amanollahi 2016; Zhou et al. 2014). One such example is the Adaptive Neuro-Fuzzy Inference System (ANFIS), which has a hybrid algorithm and was proposed by Jang et al. (1997). Research evidence indicates that ANFIS is a powerful method for modeling dust storm occurrences (Kaboodvandpour et al. 2015), forecasting air quality (Ghasemi and Amanollahi 2019), predicting ambient CO concentration (Jian et al. 2010), and predicting PM2.5 from a GTWR model and remotely sensed data (Mirzaei and Amanollahi 2019). Ghasemi and Amanollahi (2019) showed that integrating the forward selection method with the ANFIS model increased the accuracy of air quality forecasting. Mirzaei and Amanollahi (2019) compared the artificial neural network (ANN), linear regression, general regression neural network (GRNN), and ANFIS models for improving the correlation coefficient between the output (PM2.5) of the GTWR model and ground-measured PM2.5 concentrations. They concluded that the ANFIS model performed better than the other models.

To predict the PM2.5 concentration 1 day ahead, Ausati and Amanollahi (2016) used a hybrid of ensemble empirical mode decomposition (EEMD) and GRNN. They compared the prediction accuracy of a principal component regression (PCR) model, an ANFIS model, and the hybrid EEMD-GRNN model with that of multiple linear regression (MLR), using the mean absolute error (MAE) and root mean square error (RMSE) obtained from each model. Their results indicated that the hybrid EEMD-GRNN model exhibited the highest accuracy in predicting PM2.5 in Sanandaj, Iran. Using the EEMD-GRNN model, Zhou et al. (2014) predicted the 1-day-ahead PM2.5 pollution in Xi'an, China. Zhu et al. (2018) proposed a hybrid forecasting model based on EEMD and the endpoint condition mirror method to predict the time series of the air quality index in Hefei. ANN and hybrid models, such as EEMD-GRNN and ANFIS, thus appear to be capable of predicting PM2.5 more accurately. Therefore, the objective of the current study was to compare the PM2.5 prediction accuracy of a linear model (MLR) with that of nonlinear models (EEMD-GRNN, ANFIS, and ANN) in Tehran.

Material and methods

Study area

The study area is the Tehran metropolitan area. The city is bounded by the high Alborz Mountain range to its north and east and by the Kavir Desert to its south (Fig. S-1). These topographical features strongly affect the wind directions in Tehran: during the day, prevailing southwesterly winds blow from the desert toward the mountains, while during the night, prevailing northwesterly-westerly winds blow from the mountains toward the plains and dominate especially the western half of Tehran. The Department of Environment in Tehran provided the air quality data, including PM2.5, PM10, SO2, NO2, CO, and O3. The Bureau of Meteorology of Tehran provided the following meteorological data for the year 2016: the average atmospheric pressure (AP), average maximum temperature (Max T), average minimum temperature (Min T), daily relative humidity (RH), daily total precipitation (TP), and daily wind speed (WS). The data were divided into two sets: 335 records for training (simulation) and 30 records for testing the models.

Multiple linear regression model

Statistical Package for Social Sciences (SPSS, version 16) was used for analyzing the data. Multiple linear regression was used for determining the significance of correlations between independent variables and a dependent variable. The MLR model is presented as follows (Eq. 1):

$$ Y={B}_0+{B}_1{X}_1+{B}_2{X}_2+\dots +{B}_k{X}_k+\varepsilon $$
(1)

In this equation, the dependent variable is signified by Y; the independent variables by X1, X2, …, Xk; the regression coefficients by B0, B1, …, Bk; and the error term by ε. An important assumption of multiple linear regression is that the independent variables are not strongly linearly related to one another, that is, there is no multicollinearity. In MLR, multicollinearity is tested with the variance inflation factor (VIF) (Table 1). VIF values greater than 10 (or greater than 5 under a stricter rule of thumb) indicate that this assumption is violated.

Table 1 VIF values for the independent variables

According to Table 1, some of the variables (AP, Min T, …) had VIF values greater than 5, indicating multicollinearity among these variables. Stepwise regression was used to overcome this problem by determining the most effective set of independent variables for predicting the dependent variable. Table 2 shows the results of the stepwise regression analysis.

Table 2 VIF values after stepwise regression

Based on the results of Table 2, after the stepwise method was run, none of the independent variables had a VIF value larger than 10.
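
The VIF check behind Tables 1 and 2 can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study; it relies on the standard identity that the VIF of predictor j equals the jth diagonal element of the inverse of the predictors' correlation matrix. The function names (`pearson`, `invert`, `vif`) and the toy data are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return one VIF per predictor column (list of lists of floats)."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(k)]


# Toy predictors (hypothetical values, not the study's data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
print(vif([x1, x2]))
```

With only two predictors the result reduces to 1 / (1 - r^2), where r is their correlation, which makes the sketch easy to check by hand.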

Adaptive Neuro-Fuzzy Inference System model

ANFIS uses a hybrid learning algorithm, introduced by Jang et al. (1997), that combines the least squares method with gradient descent. Built on a feed-forward network, ANFIS optimizes the parameters of a fuzzy system in order to achieve accurate results. The ANFIS model consists of two components, a premise part and a consequent (inference) part, which are connected by fuzzy rules in a network. The fuzzy inference system (FIS) of this structure, which is embedded in an adaptive network, is composed of directly connected nodes (Matlab 2018). The output of ANFIS depends on its input parameters, and the input data are normalized so that the learning algorithm can minimize the error rate. The FIS framework has three major parts: (i) a fuzzy rule base (if-then rules), (ii) a database (the membership functions used in the fuzzy rules), and (iii) a reasoning mechanism that applies the if-then rules (Matlab 2018). For example, if X and Y are the two inputs of an FIS framework and Z is its output, which follows a fuzzy if-then rule, then:

  1. Rule 1.

    if X is A1 and Y is B1 then f1 = p1x + q1y + r1.

  2. Rule 2.

    if X is A2 and Y is B2 then f2 = p2x + q2y + r2.

In these rules, f(x, y) is a first-order polynomial, and the resulting model is a Sugeno fuzzy model (Guneri et al. 2011). The ANFIS model has a five-layer network (Wei et al. 2007). Its first layer fuzzifies the inputs (Fig. S-2) according to Eq. 2:

$$ {O}_{\mathrm{i}}^1={\upmu}_{\mathrm{Ai}}\left(\mathrm{x}\right) $$
(2)

in which Ai is the linguistic label associated with node i, x is the input to the node, and \( {O}_{\mathrm{i}}^1 \) is the membership grade of x in Ai. The second layer implements the fuzzy “AND” (Fig. S-2). It consists of circle nodes that multiply the incoming signals, and its output is obtained by Eq. 3:

$$ {w}_{\mathrm{i}}={\upmu}_{\mathrm{Ai}}\left(\mathrm{x}\right)\times {\upmu}_{\mathrm{Bi}}\left(\mathrm{y}\right),\kern0.5em \mathrm{i}=1,2 $$
(3)

Normalization is the function of the third layer (Fig. S-2), in which the normalized firing strength of the ith rule is calculated for each node using Eq. 4:

$$ {\overline{w}}_{\mathrm{i}}=\frac{w_{\mathrm{i}}}{w_1+{w}_2},\kern0.5em \mathrm{i}=1,2 $$
(4)

In the fourth layer, the fuzzy rules are applied (Fig. S-2). Every node i in this layer is a square (adaptive) node with the node function given in Eq. 5:

$$ {O}_{\mathrm{i}}^4={\overline{w}}_{\mathrm{i}}{\mathrm{f}}_{\mathrm{i}}={\overline{w}}_{\mathrm{i}}\left({\mathrm{p}}_{\mathrm{i}}\mathrm{x}+{\mathrm{q}}_{\mathrm{i}}\mathrm{y}+{\mathrm{r}}_{\mathrm{i}}\right) $$
(5)

where \( {\overline{w}}_{\mathrm{i}} \) is the third layer’s output, while pi, qi, and ri are the consequent parameters. The fifth layer performs defuzzification, in which all incoming signals are summed in a single node to compute the total output (Fig. S-2). Equation 6 is used in this process to transform the outputs of the fuzzy rules into the defuzzified output (Guneri et al. 2011):

$$ {O}_{\mathrm{i}}^5=\sum \limits_{\mathrm{i}}{\overline{w}}_{\mathrm{i}}{\mathrm{f}}_{\mathrm{i}}=\frac{\sum_{\mathrm{i}}{w}_{\mathrm{i}}{\mathrm{f}}_{\mathrm{i}}}{\sum_{\mathrm{i}}{w}_{\mathrm{i}}} $$
(6)
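
The five layers in Eqs. 2 to 6 can be traced with a minimal forward-pass sketch for the two-rule example above. It is illustrative only and is not the MATLAB model used in the study: the Gaussian membership function, the function names, and all parameter values are assumptions, and no training (least squares or gradient descent) is shown.

```python
import math


def gauss_mf(value, center, width):
    """Assumed Gaussian membership function (the study's MF type may differ)."""
    return math.exp(-0.5 * ((value - center) / width) ** 2)


def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass of a first-order Sugeno ANFIS with one rule per MF pair.

    premise_a, premise_b: lists of (center, width) for A_i and B_i.
    consequent: list of (p_i, q_i, r_i).
    """
    # Layer 1 (Eq. 2): membership grades of the inputs.
    mu_a = [gauss_mf(x, c, s) for c, s in premise_a]
    mu_b = [gauss_mf(y, c, s) for c, s in premise_b]
    # Layer 2 (Eq. 3): firing strength of each rule (fuzzy AND as a product).
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3 (Eq. 4): normalized firing strengths.
    total = sum(w)  # may underflow to 0 if (x, y) is far from every MF
    w_bar = [wi / total for wi in w]
    # Layer 4 (Eq. 5): rule outputs f_i = p_i*x + q_i*y + r_i, weighted.
    f = [p * x + q * y + r for p, q, r in consequent]
    layer4 = [wb * fi for wb, fi in zip(w_bar, f)]
    # Layer 5 (Eq. 6): sum of all weighted rule outputs.
    return sum(layer4), w_bar, f


# Hypothetical parameters for the two rules.
output, w_bar, f = anfis_forward(
    x=1.0, y=2.0,
    premise_a=[(0.0, 1.0), (2.0, 1.0)],
    premise_b=[(1.0, 1.0), (3.0, 1.0)],
    consequent=[(1.0, 1.0, 0.0), (2.0, 1.0, 1.0)],
)
print(output, w_bar, f)
```

In this example the input lies midway between the two sets of membership functions, so both rules fire equally and the output is the average of the two rule outputs.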

Empirical mode decomposition and general regression neural network model

EEMD-GRNN is composed of two models, EEMD and GRNN. First, EEMD is employed to decompose the original time series into a set of intrinsic mode functions (IMFs). The residual rn is treated as an additional IMF. Next, the GRNN model is used to predict each of the IMFs obtained in the first step, so that the value of each IMF series is forecast for the next day. Finally, the outputs of the previous step are aggregated to obtain the final forecast.

Ensemble empirical mode decomposition

Empirical mode decomposition (EMD), an adaptive method for analyzing non-stationary and nonlinear signals, was initially proposed by Huang et al. (1998). EMD decomposes a signal into several IMFs. A component must meet two conditions to qualify as an IMF: (i) the mean of its lower and upper envelopes should be zero everywhere, and (ii) the number of extrema and the number of zero crossings should be equal or differ by at most one. A major drawback of EMD is “mode mixing,” that is, the presence of almost identical oscillations in different modes or of oscillations of very dissimilar amplitudes in a single mode (Huang et al. 1998). Ensemble empirical mode decomposition (EEMD) was offered as a possible solution by Wu and Huang (2009). EEMD is a noise-assisted version of EMD in which white noise is added to the signal, and the mode mixing problem can be solved with the support of this white noise (Wu and Huang 2009).

The EEMD of a signal x(t) is computed in the following steps. First, the amplitude of the added white noise and the ensemble number M are initialized. Second, in the mth trial, the noise-added data xm(t) are produced by adding random white noise wm(t) to x(t) (Eq. 7).

$$ {x}_m(t)=x(t)+{w}_m(t) $$
(7)

In the third step, all the local minima and maxima of xm(t) are identified, and cubic spline functions are used to obtain the lower and upper envelopes. The fourth step involves computing the mean m1(t) of the lower and upper envelopes and calculating the difference h1(t) between the signal and the mean (Eq. 8).

$$ {h}_1(t)={x}_m(t)-{m}_1(t) $$
(8)

In step five, provided that h1(t) meets the conditions of an IMF, it is taken as the first IMF component of the signal (h1(t) = c1(t)), and Eq. 9 is used to define the residue r1(t):

$$ {r}_1(t)={x}_m(t)-{c}_1(t) $$
(9)

Steps 3 to 5 should be repeated if these conditions are not met.

The residue r1(t) is then treated as a new signal, and steps 3 to 5 are repeated n times to sift out the other IMFs until the stopping criterion is satisfied. The stopping criterion is met when the residue rn(t) or the IMF component cn(t) becomes smaller than a predetermined value. After sifting, the signal can be written as the sum of all IMFs plus the residue (Eq. 10):

$$ {x}_m(t)={\sum}_{i=1}^n{c}_i(t)+{r}_n(t) $$
(10)

in which n stands for the number of IMFs, ci(t) for the ith IMF, and rn(t) for the final residue.

Next, if m < M, m is set to m + 1 and steps 2 to 5 above are repeated, each time with a different white noise series, until m = M. Finally, for every IMF, the ensemble mean \( \overline{c_i} \) of the M trials is calculated (Lu and Shao 2012).
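
The ensemble part of the procedure (Eq. 7 and the final averaging over M trials) can be sketched as follows. The sifting in steps 3 to 5 is deliberately not implemented: `toy_decompose` is a hypothetical stand-in that splits a series into a detail part and a moving-average trend, so that the wrapper can be run end to end. The defaults of 0.2 and 100 mirror the settings reported later for the PM2.5 signal, but treating 0.2 as an absolute standard deviation is an assumption of this sketch.

```python
import random


def toy_decompose(signal, window=3):
    """Hypothetical stand-in for EMD sifting (NOT real sifting).

    Returns two components, a detail series and a moving-average trend,
    that sum back to the input signal.
    """
    n = len(signal)
    half = window // 2
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        trend.append(sum(signal[lo:hi]) / (hi - lo))
    detail = [s - tr for s, tr in zip(signal, trend)]
    return [detail, trend]


def eemd_ensemble(signal, decompose, noise_std=0.2, trials=100, seed=0):
    """Add white noise, decompose, and average components over all trials."""
    rng = random.Random(seed)
    sums = None
    for _ in range(trials):
        # Eq. 7: x_m(t) = x(t) + w_m(t), with a fresh noise series per trial.
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        components = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(signal) for _ in components]
        for acc, comp in zip(sums, components):
            for t, value in enumerate(comp):
                acc[t] += value
    # Ensemble mean of each component over the M trials.
    return [[value / trials for value in acc] for acc in sums]


# Hypothetical short series, not the study's PM2.5 data.
series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0]
mean_components = eemd_ensemble(series, toy_decompose)
print(len(mean_components), len(mean_components[0]))
```

A real EMD routine can return a different number of IMFs in different trials, so a practical implementation also has to fix or align the number of components before averaging; the stand-in avoids this by always returning two.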

General regression neural network

By analyzing past input and output data, GRNN can estimate any function. Compared with the other models, GRNN is fast to train and well suited to modeling nonlinear functions. Another distinctive feature of GRNN is its smoothing factor, whose optimum value can be found over repeated runs by minimizing the mean square error (Leung et al. 2000). GRNN consists of four layers. The first layer is the input layer, in which the data are fed into the model. In a GRNN, the number of input neurons equals the number of variables in the input vector. The next layer is the pattern layer, whose neurons memorize the relationship between the input neurons and the proper response of the pattern layer. The number of neurons in this layer equals the number of training cases. The Gaussian function of the pattern Pi is given by Eq. 11:

$$ {\mathrm{P}}_{\mathrm{i}}=\exp \left(-\frac{{\left(\mathrm{X}-{\mathrm{X}}_{\mathrm{i}}\right)}^{\mathrm{T}}\left(\mathrm{X}-{\mathrm{X}}_{\mathrm{i}}\right)}{2{\upsigma}^2}\right) $$
(11)

In the equation above i = 1, 2,⋯, n, σ signifies the smoothing parameter (also known as a spread parameter); X shows the independent variable; and Xi signifies a training sample for the ith neuron of the second layer.

The third layer is known as the summation layer to which the output of the second layer is imported. In this layer, two total values, referred to as Ss and Sw, are computed. The summation of the pattern outputs is calculated with the help of Eq. 12 below:

$$ {S}_S=\sum \limits_{i=1}^n{P}_i $$
(12)

Equation 13 is used for determining the weighted sum of the pattern outputs:

$$ {S}_W=\sum \limits_{i=1}^n{W}_i{P}_i $$
(13)

In this equation, Wi stands for the weight of the ith neuron in the pattern layer which is linked to the third layer. The final layer is the output layer to which the results of the third layer are sent. Equation 14 is used for determining the output that is signified by y:

$$ y=\frac{S_W}{S_S} $$
(14)
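
Equations 11 to 14 amount to a kernel-weighted average of the training targets, which the following sketch makes explicit. It assumes the usual GRNN convention that the weight of the ith pattern neuron is the observed output of the ith training case and that the Gaussian kernel has a negative exponent; the function name and the toy data are hypothetical.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Predict the output for input vector x with a GRNN.

    train_x: list of training input vectors X_i.
    train_y: list of training targets, used as the weights W_i (assumption).
    sigma:   smoothing (spread) parameter.
    """
    # Pattern layer (Eq. 11): P_i = exp(-||X - X_i||^2 / (2 * sigma^2)).
    patterns = []
    for xi in train_x:
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        patterns.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    # Summation layer (Eqs. 12 and 13).
    s_s = sum(patterns)
    s_w = sum(w * p for w, p in zip(train_y, patterns))
    # Output layer (Eq. 14). s_s can underflow to 0 if x is far from all
    # training cases and sigma is very small.
    return s_w / s_s


# Toy one-dimensional training set (hypothetical values).
train_x = [[0.0], [1.0], [2.0]]
train_y = [10.0, 20.0, 30.0]
for sigma in (0.05, 0.5, 1000.0):
    print(sigma, grnn_predict([0.0], train_x, train_y, sigma))
```

The loop shows the role of the smoothing parameter: a very small σ reproduces the nearest training target, while a very large σ pulls every prediction toward the mean of all targets.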

Artificial neural network model

The artificial neural network (ANN) was initially proposed by McCulloch and Pitts (1943), who were inspired by the neural systems and brains of living organisms. ANN is known as a simulation method. It is commonly employed as a prediction method that can replace linear regression, multivariate regression, and trigonometric functions, among other statistical methods (Guneri et al. 2011). Detailed descriptions of ANN are available in the literature (for example, Nørgaard et al. 2000). The most commonly used type of ANN is the multi-layer perceptron (MLP). MLP is composed of three distinct layers: (i) an input layer, in which the data are distributed over the network; (ii) a hidden layer, in which the data are processed; and (iii) an output layer, where the results for given inputs are extracted (Fig. S-3). There can be more than one hidden layer, and the number of units in the hidden layers is a main parameter of the network. In this study, Matlab 2018 was used for running the ANFIS, EEMD-GRNN, and ANN models.
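
The three-layer structure described above can be illustrated with a minimal forward pass through an MLP with one hidden layer. The tanh hidden activation, the linear output unit, the function name, and all weights are assumptions made for illustration; the study's network settings in MATLAB are not given in this section, and no training is shown.

```python
import math


def mlp_forward(inputs, hidden_weights, hidden_biases,
                output_weights, output_bias):
    """Forward pass: input layer -> one tanh hidden layer -> linear output."""
    # Hidden layer: each unit takes a weighted sum of the inputs plus a bias.
    hidden = [
        math.tanh(sum(w * v for w, v in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Output layer: weighted sum of the hidden activations plus a bias.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network with two inputs and two hidden units.
prediction = mlp_forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.4, 0.1], [-0.3, 0.8]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.5, -0.7],
    output_bias=0.2,
)
print(prediction)
```

The nonlinearity in the hidden layer is what lets an MLP represent relationships that the linear model in Eq. 1 cannot.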

Evaluation of models

R2 is a statistical parameter commonly used to assess the validity of a model’s output. Its value, which ranges between 0 and 1, is used as an index of how well the regression line fits the data. The closer the value of R2 is to one, the better the agreement between the predicted and observed data. Nevertheless, according to Legates and McCabe (1999), R2 should be used cautiously because its value can be affected by outliers. Therefore, it should be employed alongside other parameters such as the mean absolute error (MAE) and root mean square error (RMSE) in order to determine the validity of the output (Noori et al. 2010). The RMSE and MAE are calculated using Eqs. 15 and 16, respectively (Alimissis et al. 2018; Ding et al. 2016; Tzanis et al. 2019; Willmott and Matsuura 2005).

$$ RMSE=\sqrt{\frac{1}{n}{\sum}_{\mathrm{i}=1}^n{\left({Y}_{dp}-{Y}_{da}\right)}^2} $$
(15)
$$ MAE=\frac{1}{n}{\sum}_{\mathrm{i}=1}^n\left|{Y}_{dp}-{Y}_{da}\right| $$
(16)

where Yda denotes the real value, Ydp signifies the predicted value, and n shows the sample size.
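
The three evaluation measures can be computed as in the sketch below. RMSE and MAE follow the definitions of Eqs. 15 and 16. The text gives no formula for R2, so the sketch assumes it is the squared Pearson correlation between predicted and observed values, which is consistent with the stated range of 0 to 1. Function names and sample values are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error (Eq. 15)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mae(predicted, observed):
    """Mean absolute error (Eq. 16)."""
    n = len(observed)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / n


def r_squared(predicted, observed):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(observed)
    mean_p, mean_o = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)


# Hypothetical predicted and observed values.
predicted = [2.0, 4.0, 6.0]
observed = [1.0, 5.0, 6.0]
print(rmse(predicted, observed), mae(predicted, observed),
      r_squared(predicted, observed))
```

Because RMSE squares the errors before averaging, it penalizes large individual errors more heavily than MAE does, which is one reason to report both.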

Results and discussion

Prediction of PM2.5 concentrations in Tehran via multiple linear regression model training and testing

In the current study, MLR was the first model used to predict the PM2.5 concentration. MLR identifies the variables that have statistically significant effects on the dependent variable, and it has been applied in many previous studies (Golchoubian et al. 2012; Pouretedal et al. 2018). The stepwise method is among the commonly used methods in MLR. The predictor variables that remained in the model in the current study were O3, PM2.5 on the previous day, PM10, RH, and WS, which means that the PM2.5 concentration in Tehran is associated with these variables. According to the MLR results (Eq. 17), the variables O3, PM2.5 on the previous day, PM10, RH, AP, and WS had positive associations with PM2.5 concentrations in Tehran. Figure 1 illustrates the training results of the MLR model for predicting PM2.5 concentrations in Tehran, and the fitted equation is given in Eq. 17:

$$ {PM}_{2.5F}=8.263+0.203\, RH+0.11\,{PM}_{10}+0.361\,{PM}_{2.5P}+0.127\,{O}_3+1.21\, WS+0.61\, AP $$
(17)
Fig. 1 MLR results for simulated and observed PM2.5 concentrations (μg/m3)

Fig. 2 MLR results for predicted and observed PM2.5 concentrations (μg/m3)

In this equation, PM2.5F is the predicted concentration of PM2.5, while PM2.5P is the PM2.5 concentration on the previous day. The training results of the MLR model were R2 = 0.38, RMSE = 11.8095, and MAE = 8.8234 (Fig. 1).
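
Equation 17 can be applied directly, as in the short sketch below. The coefficients are copied from Eq. 17; the function name and the sample inputs are hypothetical, and the inputs are assumed to be in the same units as the study's data, which this section does not state.

```python
def predict_pm25(rh, pm10, pm25_prev, o3, ws, ap):
    """Next-day PM2.5 forecast from the fitted MLR equation (Eq. 17)."""
    return (8.263 + 0.203 * rh + 0.11 * pm10 + 0.361 * pm25_prev
            + 0.127 * o3 + 1.21 * ws + 0.61 * ap)


# Effect of a 10-unit rise in the previous day's PM2.5, all else equal
# (sample input values are hypothetical).
base = predict_pm25(rh=30.0, pm10=80.0, pm25_prev=35.0, o3=20.0, ws=3.0, ap=1.0)
raised = predict_pm25(rh=30.0, pm10=80.0, pm25_prev=45.0, o3=20.0, ws=3.0, ap=1.0)
print(raised - base)
```

Because the model is linear, each coefficient is the change in the forecast per unit change in its predictor with the other predictors held fixed.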

The MLR testing phase results (R2 = 0.44, RMSE = 7.6402, and MAE = 5.9961) are presented in Fig. 2.

Numerous assumptions have to be met before MLR can be applied, and checking all of them makes running MLR a challenge. Therefore, this method seems to be less efficient and less practical than nonlinear models.

Prediction of PM2.5 concentrations in Tehran via artificial neural network (multi-layer perceptron) model training and testing

Based on the results, the MLP model estimated the PM2.5 concentrations in Tehran more accurately than MLR but less accurately than the hybrid models. As a nonlinear model, MLP has the advantage of a high tolerance for a small number of errors in the data. Several studies (for example, Mashaly and Alazba 2016; Messikh et al. 2017; Thorkashvand et al. 2017) provide evidence for the superiority of the MLP model over other models. The training results of the MLP model for forecasting PM2.5 concentrations in Tehran are shown in Fig. 3. In comparison with the MLR results, the simulation accuracy of MLP was higher, with R2, RMSE, and MAE between the simulated and observed data of 0.67, 7.8849, and 6.4209, respectively.

Fig. 3 MLP results for simulated and observed PM2.5 concentrations

The results of the MLP model, which was used to predict PM2.5 concentrations in Tehran, are illustrated in Fig. 4.

Fig. 4 MLP results for predicted and observed PM2.5 concentrations

In comparison with the MLR results, the agreement between the predicted and observed data was higher for the MLP model (Fig. 4). Moreover, RMSE and MAE were lower for the MLP model (RMSE = 6.2522 and MAE = 4.4781) than for MLR, while R2 for the MLP model (R2 = 0.51) was higher than that of MLR.

EEMD results—predicting PM2.5 suspended particle concentrations for Tehran using general regression neural network model training and testing

The first step in EEMD-GRNN is to decompose the original signal (PM2.5); the GRNN model is then used to predict each component. In the EEMD model, the white noise amplitude and the number of trials were set to 0.2 and 100, respectively, for analyzing the PM2.5 signal (Lu and Shao 2012). The PM2.5 signal was decomposed into seven intrinsic mode functions and a residual. The residual is regarded as an intrinsic mode function (Fig. 5).

Fig. 5 EEMD-GRNN structure (Ausati and Amanollahi 2016)

Fig. 6 EEMD-GRNN results for simulated and observed PM2.5 concentrations (μg/m3)

As with the ANFIS model, the inputs of the GRNN model are the variables retained by the MLR model. That is to say, the model is trained on the independent variables that influence the PM2.5 concentration, the dependent variable. Network training was conducted with varying smoothing factors to estimate the output. The factor value always ranges from 0 to 1 (Gheyas and Smith 2009). Gheyas and Smith (2009) showed that the GRNN model is superior to MLP in forecasting univariate time series and suggested that GRNN performance is not very sensitive to the smoothing factor. As the smoothing factor increases, the correlation coefficient gradually decreases for the training data and slowly rises for the test data. Figure 6 shows the training results of the EEMD-GRNN model used to predict PM2.5 concentrations in Tehran: R2 = 0.98, RMSE = 1.8622, and MAE = 1.1639.

Figure 7 shows the testing results of the EEMD-GRNN model for PM2.5 concentrations in Tehran: R2 = 0.76, RMSE = 4.2655, and MAE = 3.7427.

Predicting PM2.5 suspended particle concentrations for Tehran using Adaptive Neuro-Fuzzy Inference System model training and testing

The ANFIS model performed better than MLR and MLP. The model, which is based on the Takagi-Sugeno approach, consists of five input and one output parameters. ANFIS follows five phase rules. Each of these rules is influenced by the input parameters; that is, any change in an input value results in a change in the respective output value. One of the major limitations of models that do not follow fuzzy logic-based methods is that they are sensitive to errors in the data. The inputs of the ANFIS model are the variables retained by the MLR model. That is to say, ANFIS is trained on the predictor (independent) variables that affect the concentration of the predicted (dependent) variable (PM2.5). The training phase results of the ANFIS model are illustrated in Fig. 8: R2 = 0.99, RMSE = 0.4794, and MAE = 0.1305.

Fig. 7 EEMD-GRNN results for predicted and observed PM2.5 concentrations (μg/m3)

The testing phase results of the ANFIS model are presented in Fig. 9: R2 = 0.82, RMSE = 3.2979, and MAE = 2.1668.

Based on the results of the ANFIS model (Figs. 8 and 9), its accuracy in predicting PM2.5 concentrations in Tehran was higher than that of the MLR and MLP models. These results are in agreement with the findings reported by Shahbazi et al. (2013), Amirkhani et al. (2015), Kaboodvandpour et al. (2015), and Zendehboudi et al. (2017). The PM2.5 concentrations predicted by ANFIS were, however, close to those predicted by the EEMD-GRNN model. As Table 3 shows, the linear model could not account for the fluctuations in the time series of PM2.5 concentrations, since these fluctuations were large. Nonlinear models such as ANFIS and EEMD-GRNN therefore appear to be practical replacements for linear models, since they are able to capture the nonlinear associations between the inputs and outputs. In the training phase, the lowest R2 was obtained for MLR (0.38) and the highest for ANFIS (0.99). Likewise, in the testing phase, MLR had the lowest R2 value (0.44) and ANFIS the highest (0.82). In terms of accuracy, the ANFIS and EEMD-GRNN models gave the best results in predicting PM2.5 in Tehran, with the lowest RMSE and MAE values and the highest R2 values in both the training and testing phases.

Fig. 8 ANFIS results for simulated and observed PM2.5 concentrations (μg/m3)

Fig. 9 ANFIS results for predicted and observed PM2.5 concentrations (μg/m3)

Table 3 Models error

Conclusion

Public health can be affected by the prediction accuracy of PM2.5 concentrations. This study compared the accuracy of a linear model (MLR), a nonlinear model (MLP), and hybrid models (EEMD-GRNN and ANFIS) in predicting PM2.5 concentrations in Tehran. Overall, the hybrid models predicted PM2.5 concentrations more accurately than the linear and nonlinear models. The ANFIS model obtained the highest accuracy in both the training phase (R2 = 0.99, RMSE = 0.4794, and MAE = 0.1305) and the testing phase (R2 = 0.82, RMSE = 3.2979, and MAE = 2.1668), although the results of the two hybrid models were close to each other. To generate the ANFIS model, the grid partition FIS (pimf) was applied. The best model generated by ANFIS consisted of three input MFs and nine fuzzy rules. To conclude, the best model for predicting PM2.5 concentrations in Tehran was the ANFIS model with pimf-type input and three input MFs.