Introduction

Soft computing comprises a range of techniques for solving uncertain and complex problems (Corchado et al. 2011; Corchado and Herrero 2011; Vaidya et al. 2012; Kisi and Parmar 2016). It is used to investigate, simulate, and analyze complex issues and phenomena in an attempt to solve real-world problems, and it is particularly useful where precise scientific tools cannot deliver analytic, low-cost, and complete solutions. Air pollution is among the most important of these problems and has long been a concern; it affects developing and developed countries alike. Air pollutants consist of gaseous pollutants (SO2, NO2, CO, etc.), odors, and suspended particulate matter (SPM) such as fumes, dust, smoke, and mist. High concentrations of air pollutants in and near urban regions severely degrade the surroundings. Sulfur dioxide is a pungent, toxic atmospheric gas that harms society by causing acid rain, which damages the environment (Rizwan et al. 2013). Sulfur dioxide also reacts in the atmosphere to form aerosol particles, which can create outbreaks of haze and other climate problems. The main sources of SO2 are volcanic emissions and anthropogenic emissions from the burning of sulfur-contaminated fossil fuels and the refinement of sulfide ores (Seinfeld and Pandis 2006). According to a new analysis of data from NASA's Aura satellite, SO2 emissions from power plants in India increased by more than 60% between 2005 and 2012 (Krotkov et al. 2016). In 2010, India surpassed the USA to become the world's second largest emitter of SO2 after China (EPA 2015a, b). The capital of India, Delhi, is considered among the most polluted megacities of the world (Gurjar et al. 2010), and several studies have assessed its air quality (Aneja et al. 2001; Goyal 2003; Gurjar et al. 2004; Mohan and Kandya 2007; Soni et al. 2014). Recently, Krotkov et al. (2016) used the Ozone Monitoring Instrument (OMI) onboard NASA's Aura satellite to study ozone and the major atmospheric pollutant gases nitrogen dioxide (NO2) and sulfur dioxide (SO2). Examining changes over some of the world's most polluted industrialized regions during the first decade of OMI observations, they found that SO2 and NO2 levels from coal power plants and smelters in India are growing at a fast pace, increasing by more than 100 and 50%, respectively, from 2005 to 2015.

Advanced soft computing techniques such as artificial neural networks (ANNs), adaptive-network-based fuzzy inference systems (ANFIS), genetic algorithms (GA), fuzzy inference systems (FIS), decision trees, and support vector machines have been applied successfully to modeling over the last decade (Kisi 2009a, b; Guven and Kisi 2011; Voukantsis et al. 2011; Kisi and Cengiz 2013; Antanasijević et al. 2013; Kisi and Tombul 2013; Gennaro et al. 2013; Goyal et al. 2014; Parmar and Bhardwaj 2014; Wanga et al. 2015; Kisi et al. 2016). Etemad-Shahidi and Mahjoobi (2009) used the M5 algorithm for the prediction of wave height and compared the results of model trees with those of artificial neural networks. In some applications, generalized regression neural networks (GRNN), multilayer perceptron neural networks (MLP), and support vector machines (SVM) are used to calculate predicted values (Kim et al. 2012); in comparison to empirical and MLR models, the ANN models performed better. To check the accuracy of these models, Kisi (2015) applied multivariate adaptive regression splines (MARS), least square support vector regression (LSSVR), and the M5 model tree (M5Tree) to pan evaporation at the Antalya and Mersin sample stations in Turkey; the LSSVR model was more accurate when local input and output were used, whereas in the second case the MARS model was better. Kim et al. (2015) reported daily pan evaporation prediction using soft computing models. Recently, Shafaei and Kisi (2016) employed three hybrid methods, WANFIS (wavelet-ANFIS), WSVR (wavelet-SVR), and WARMA (wavelet-ARMA), for estimating monthly lake-level changes and found that the three hybrid models forecasted more accurately than the single models.

However, little of the scientific literature discusses robust forecasting methods based on soft computing techniques for air pollution modeling. The present paper covers the MARS, LSSVR, and M5 model tree (M5Tree) techniques. Each algorithm is discussed separately, its results are presented, and all methods are compared to emphasize their advantages and disadvantages. To the best of the authors' knowledge, this is the first time such an analysis of LSSVR, MARS, and M5Tree has been performed for the air pollutants of Delhi. This constitutes a real challenge, as urban pollution mixes with desert dust aerosols during the pre-monsoon and summer seasons over Delhi (Singh et al. 2005; Prasad et al. 2007; Soni et al. 2015; Parmar et al. 2016), whereas the winters are extremely polluted, with high concentrations of black carbon aerosols from vehicular and other anthropogenic pollution sources leading to foggy and hazy conditions over Delhi (Ganguly et al. 2006; Singh et al. 2010).

Delhi is the second most populous urban agglomeration in India and the third largest urban area in the world. NASA's report underscores the importance of investigating pollutant levels at different sites (residential and industrial) in Delhi. Air pollutants directly affect the health of residents; this research matters because 19 million people have to breathe this air, and air quality is directly related to health.

Data and methodology

Delhi is located in the northern region of India, about 715 ft above sea level (Fig. 1). The region has a semi-arid (steppe) climate, with extremely hot summers, heavy rainfall in the monsoon months, and cold winters; there are dust storms in summer and foggy mornings in winter. Temperatures rise to 46 °C in summer and fall to 4 °C in winter. In the winter months, temperature inversion and low wind speed are the main causes of the accumulation of airborne pollutants in Delhi. Industries, vehicular activity, power plants, and frequent dust storms contribute most to the high pollutant concentrations in Delhi. Central Pollution Control Board (CPCB) SO2 data over three sites, two residential (Janakpuri and Nizamuddin) and one industrial (Shahazada Bagh), are utilized in the present study. The long-term ambient air quality data used here cover the period 1993–2012 and were obtained from the CPCB. The monthly statistical parameters of the data set for the Janakpuri, Nizamuddin, and Shahzadabad stations are given in Table 1.

Fig. 1 The sample site area map

Least square support vector regression

Vladimir Vapnik and his co-workers developed support vector machines at AT&T Bell Laboratories; such models are applied to capture the nonlinear relationship between input and output variables with the least error (Cortes and Vapnik 1995; Suykens 2001; Smola 2004). LSSVR is derived from SVR (support vector regression), a powerful technique for solving real-life problems through a combination of regression, function estimation, and classification. SVR is built on the principle of structural risk minimization (SRM), which yields the least error in forecasting problems, and it is particularly suitable for signal processing, pattern recognition, and nonlinear regression estimation.

The LSSVR model was first proposed by Suykens and Vandewalle (1999) and applied to chaotic time series forecasting. The main difference between LSSVR and SVR lies in the training equations: LSSVR solves a set of linear equations, while SVR requires quadratic optimization. Other conventional models, such as back propagation neural networks (BPNN), partial least square regression (PLS), and multivariate linear regression (MLR), are computationally more expensive than LSSVR, so LSSVR is easier to apply in comparison.

Consider a given training set \( {\left\{{p}_k,{q}_k\right\}}_{k=1}^N \) with input data \( p_k \in \mathbb{R}^n \) and output labels \( q_k \in \{-1, +1\} \), and the linear classifier

$$ q(p)=\operatorname{sign}\left[{w}^T p+ b\right] $$
(1)

When the data of the two classes are separable, one can say

$$ \left\{\begin{array}{l}{w}^T{p}_k+ b\ge +1,\kern0.84em \mathrm{if}\kern0.36em {q}_k=+1\\ {}{w}^T{p}_k+ b\le -1,\kern0.84em \mathrm{if}\kern0.36em {q}_k=-1\;\end{array}\right\} $$
(2)

These two sets of inequalities can be combined into a single set as follows

$$ {q}_k\left[{w}^T{p}_k+ b\right]\ge 1,\kern0.48em k=1,2,3,...., N $$
(3)

SVR is formulated using convex optimization theory. The problem is first posed as a constrained optimization problem; the Lagrangian is then formed, the conditions for optimality are applied, and the problem is finally solved in the dual space of Lagrange multipliers, yielding the classifier

$$ q(p)=\operatorname{sign}\left[\sum_{k=1}^N{\alpha}_k{q}_k{p}_k^T p+ b\right] $$
(4)

Cortes and Vapnik (1995) extended this linear SVR classifier to the non-separable case by introducing slack variables into the problem formulation. With the slack variables \( \xi_k \), the set of inequalities becomes

$$ {q}_k\left[{w}^T{p}_k+ b\right]\ge 1-{\xi}_k,\kern0.48em k=1,2,3,...., N $$
(5)

Classic SVR uses inequality constraints, whereas LSSVR uses equality constraints. The equality constraints simplify the problem: the LSSVR solution is obtained directly by solving a set of linear equations instead of a convex quadratic program. The LSSVR classifier in the primal space is

$$ q(p)=\operatorname{sign}\left[{w}^T p+ b\right] $$
(6)

where b is a real constant. For nonlinear classification, the LSSVR classifier in the dual space takes the form

$$ q(p)=\operatorname{sign}\left[\sum_{k=1}^N{\alpha}_k{q}_k K\left( p,{p}_k\right)+ b\right] $$
(7)

In Eq. (7), the \( \alpha_k \) are positive real constants and b is a real constant. In general, \( K(p_k, p) = \langle \phi(p_k), \phi(p) \rangle \), where \( \langle \cdot, \cdot \rangle \) is the inner product and \( \phi(p) \) is the nonlinear map from the original space to a high-dimensional feature space. For function estimation, the LSSVR model takes the form

$$ q(p)=\sum_{k=1}^N{\alpha}_k K\left( p,{p}_k\right)+ b $$
(8)

When the radial basis function (RBF) kernel is used, two tuning parameters (γ, σ) are introduced, where γ is the regularization constant and σ is the width of the RBF kernel (the width is denoted σ here to avoid confusion with the Lagrange multipliers \( \alpha_k \)).

In the present work, LSSVR is used to model air pollutant levels in Delhi. The LSSVR model keeps the output prediction error low and, compared with conventional models, is robust to noise and reduces computational labor. Because of these benefits, conventional models can often be replaced by LSSVR, which makes it attractive for forecast modeling across many areas of research. A minimal sketch of the approach is given below.
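To make the training step concrete, the following sketch solves the LSSVR function-estimation problem of Eq. (8) with an RBF kernel. Because of the equality constraints, training reduces to the single linear system

$$ \left[\begin{array}{cc} 0 & \mathbf{1}^T \\ \mathbf{1} & K+{\gamma}^{-1} I \end{array}\right]\left[\begin{array}{c} b \\ \alpha \end{array}\right]=\left[\begin{array}{c} 0 \\ q \end{array}\right] $$

rather than a quadratic program. The NumPy implementation, its function names, default parameters, and toy data are illustrative assumptions, not the code used in this study.

```python
import numpy as np

def rbf_kernel(P1, P2, sigma):
    """RBF kernel matrix: K[i, j] = exp(-||p_i - p_j||^2 / (2 * sigma**2))."""
    d2 = ((P1[:, None, :] - P2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvr_fit(P, q, gamma=100.0, sigma=5.0):
    """Train LSSVR for function estimation (Eq. 8).

    The equality constraints make training one (N+1)x(N+1) linear solve
    instead of the quadratic program required by classic SVR.
    """
    N = len(q)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                                 # constraint row: sum of alphas = 0
    A[1:, 0] = 1.0                                 # bias column
    A[1:, 1:] = rbf_kernel(P, P, sigma) + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], q))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                         # alpha_k, b

def lssvr_predict(P_train, alpha, b, P_new, sigma=5.0):
    """Evaluate q(p) = sum_k alpha_k K(p, p_k) + b at new points."""
    return rbf_kernel(P_new, P_train, sigma) @ alpha + b

# Toy usage: recover a smooth 1-D function from noisy samples.
rng = np.random.default_rng(0)
P = rng.uniform(-3, 3, (60, 1))
q = np.sin(P[:, 0]) + rng.normal(0, 0.1, 60)
alpha, b = lssvr_fit(P, q, gamma=100.0, sigma=1.0)
print(lssvr_predict(P, alpha, b, np.array([[0.5]]), sigma=1.0))  # close to sin(0.5)
```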

Multivariate adaptive regression splines

The multivariate adaptive regression splines model is a non-parametric regression model applied to predict continuous numeric outcomes. MARS was developed by Friedman (1991) as a flexible procedure for organizing relationships that are nearly additive or involve interactions among variables. Its main strength is that it makes no assumption about the underlying functional relationship between the dependent and independent variables, allowing it to estimate general functions of high-dimensional arguments from sparse data (Friedman 1991), and it can describe complex nonlinear relations between predictor and response variables. Unlike many conventional models, it works through both forward and backward stepwise procedures: the forward procedure selects the appropriate input variables, while the backward procedure removes redundant variables from the previously selected set, which improves prediction accuracy (Andres et al. 2010).

MARS represents each input with pairs of basis functions that meet at an inflection point (knot) along the range of the input; a new variable Y is mapped from an input variable X as follows:

$$ Y= \max \left(0, X- c\right) $$
(9)
$$ Y= \max \left(0, c- X\right) $$
(10)

where c represents the threshold (knot) value. Two adjacent splines intersect at the knot, maintaining the continuity of the basis functions (Sephton 2001; Bera et al. 2006). The MARS model has many different applications in research, such as prediction modeling, financial management, and time series analysis. Here, the MARS model is applied to calculate the level of air pollution at different sites in Delhi, India; a brief illustrative sketch follows.
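To illustrate how the basis-function pair of Eqs. (9) and (10) supports the forward knot search, the following sketch fits a single-knot, one-predictor MARS-style model to synthetic data. The data, candidate-knot grid, and function names are assumptions for illustration only.

```python
import numpy as np

def hinge_pair(x, c):
    """The mirrored basis-function pair of Eqs. (9)-(10) at knot c."""
    return np.maximum(0.0, x - c), np.maximum(0.0, c - x)

# Synthetic one-predictor example (illustrative values, not station data).
rng = np.random.default_rng(1)
x = rng.uniform(3.0, 25.0, 200)
y = 5.0 + 0.8 * np.maximum(0.0, x - 12.0) + rng.normal(0.0, 0.5, 200)

# Forward pass over one variable: try candidate knots, keep the best fit.
best = None
for c in np.linspace(x.min(), x.max(), 50)[1:-1]:   # interior knots only
    h_plus, h_minus = hinge_pair(x, c)
    X = np.column_stack([np.ones_like(x), h_plus, h_minus])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = ((X @ coef - y) ** 2).sum()
    if best is None or sse < best[0]:
        best = (sse, c)
print("selected knot c = %.2f" % best[1])           # should land near 12
```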

M5 model tree

Quinlan (1992) developed a model for continuous-class learning known as the M5 model tree. Its core strength is a binary decision tree in which linear regression functions at the terminal (leaf) nodes capture the relationship between the independent and dependent variables (Mitchell 1997).

Conventional decision tree models are commonly limited to categorical data; the main advantage of the M5 model tree, and what sets it apart from other tree models, is its ability to handle quantitative (continuous) data. Building the tree has two steps. In the first step, the data are divided into subsets to generate a decision tree (Solomatine and Xue 2004). In the second step, the standard deviation of the class values reaching a node is used as the splitting criterion: the expected reduction in error from testing each attribute at the node is measured, and the standard deviation reduction (SDR) is computed (Pal and Deswal 2009) as

$$ \mathrm{SDR}=\mathrm{sd}(T)-\sum_i \frac{\left|{T}_i\right|}{\left| T\right|}\,\mathrm{sd}\left({T}_i\right) $$
(11)

where sd denotes the standard deviation, T represents the set of examples reaching the node, and \( T_i \) is the subset corresponding to the i-th outcome of the potential split. After splitting, the standard deviation of the child nodes is lower than that of the parent node. However, the large tree grown in this phase tends to overfit and generalizes poorly. Quinlan (1992) suggested a remedy: the overgrown tree is pruned, and the pruned subtrees are replaced by linear regression functions. This process considerably improves the accuracy and reliability of the model tree. The M5 model tree is applied in this research work for decision making about the air quality level in Delhi, India; a short illustration of SDR-based split selection follows.
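A short sketch can illustrate the splitting criterion of Eq. (11): for each candidate threshold on a predictor, the SDR of the induced binary split is computed, and the threshold with the largest reduction is chosen. The SO2 and lag values below are made up for illustration, and the helper names are our own.

```python
import numpy as np

def sdr(y, left_mask):
    """Standard deviation reduction (Eq. 11) for a candidate binary split."""
    parts = [y[left_mask], y[~left_mask]]
    return np.std(y) - sum(len(p) / len(y) * np.std(p) for p in parts if len(p))

# Made-up monthly SO2 values (targets) and one lagged predictor.
y = np.array([8.1, 7.4, 12.9, 14.2, 6.8, 13.5, 7.9, 12.2])
x_lag = np.array([6.5, 7.0, 12.0, 13.1, 6.9, 12.8, 7.2, 11.9])

# Evaluate every candidate threshold; the split maximizing SDR wins.
candidates = np.unique(x_lag)[:-1]                 # keep both children non-empty
best_t = max(candidates, key=lambda t: sdr(y, x_lag <= t))
print("best split: x_lag <= %.1f, SDR = %.3f" % (best_t, sdr(y, x_lag <= best_t)))
```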

Table 1 The monthly statistical parameters of data set for Janakpuri, Nizamuddin, and Shahzadabad stations

Application and results

Monthly SO2 concentrations of three different sites in Delhi, India, Janakpuri, Nizamuddin, and Shahzadabad, were modeled using three different heuristic methods: LSSVR, MARS, and M5Tree. Three previous lags were used as inputs to the models to forecast the 1-month-ahead SO2 value. The cross validation method was used for each model by dividing the data into four subsets. Table 2 reports the training and test data sets of each model; in this table, M1 indicates model 1, and so on. The evaluation criteria used in the applications are the root mean square error (RMSE), the mean absolute error (MAE), and the correlation coefficient (R). The RMSE and MAE statistics are given as

$$ \mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^N{\left(\mathrm{SO}{2}_{i, o}-\mathrm{SO}{2}_{i, e}\right)}^2} $$
(12)
$$ \mathrm{MAE}=\frac{1}{N}\sum_{i=1}^N\left|\mathrm{SO}{2}_{i, o}-\mathrm{SO}{2}_{i, e}\right| $$
(13)

where N is the number of data points, \( \mathrm{SO}{2}_{i, o} \) is the i-th observed SO2 value, and \( \mathrm{SO}{2}_{i, e} \) is the corresponding model estimate; both statistics, together with the lagged-input construction, are sketched in code below.
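For concreteness, the three-lag input construction and the evaluation statistics of Eqs. (12) and (13) can be written as follows. This is a sketch with our own function names, not the authors' code.

```python
import numpy as np

def make_lagged_inputs(series, n_lags=3):
    """Three previous months -> next month, as used for all models here."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - n_lags + i] for i in range(n_lags)])
    return X, s[n_lags:]

def rmse(obs, est):
    """Root mean square error, Eq. (12)."""
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(est)) ** 2)))

def mae(obs, est):
    """Mean absolute error, Eq. (13)."""
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(est))))

def corr(obs, est):
    """Correlation coefficient R between observed and estimated SO2."""
    return float(np.corrcoef(obs, est)[0, 1])
```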

Table 2 The parameters of the optimal LSSVR models for each combination of the Janakpuri, Nizamuddin, and Shahzadabad stations

For each LSSVR model and each data set, various parameter values were tried, and those giving the minimum RMSE in the test period were selected. The parameters of the optimal LSSVR models for each combination of the Janakpuri, Nizamuddin, and Shahzadabad stations are provided in Table 2. In this table, (100, 5) indicates the regularization constant and the RBF kernel width of the LSSVR model, respectively. Test results of the optimal LSSVR models for each station and each data set are given in Table 3. The table clearly shows that the LSSVR models give different accuracies for different data sets. In Janakpuri, the average accuracies reveal that the best results were generally obtained for the third input combination. It is also clear from Table 3 that the LSSVR model provides the worst results in forecasting SO2 at all three stations for the M4 data set (1987–1995). The basic reason might be that the data range of this test set differs markedly from those of M1, M2, and M3 (see Table 1): the maximum values of the M4 test data set (\( x_{\max} \) = 24.6, 36.6, and 42.7 for the Janakpuri, Nizamuddin, and Shahzadabad stations, respectively) are higher than those of the other test data sets, so training with the M1, M2, and M3 data sets forces the applied LSSVR models to extrapolate. The standard deviation of the M4 data set is also higher than those of the others. In Janakpuri, the LSSVR model provides the best accuracy for the M1 data set with the second and third input combinations, while the M3 and M2 data sets with the third input combination give the best results for the Nizamuddin and Shahzadabad stations, respectively. Figure 2 illustrates the observed and predicted SO2 values of the LSSVR model for each data set. The models appear more accurate in forecasting the SO2 of Janakpuri station than that of the other stations, which is confirmed by the comparison statistics reported in Table 3. This may be due to the difference in the SO2 data range of each station: Janakpuri has a narrower range (\( x_{\min} \) = 3 and \( x_{\max} \) = 24.6) than the Nizamuddin (\( x_{\min} \) = 2.1 and \( x_{\max} \) = 36.6) and Shahzadabad (\( x_{\min} \) = 3.2 and \( x_{\max} \) = 42.7) stations.

Table 3 Comparison of LSSVR models
Fig. 2 The observed and forecasted SO2 by the LSSVR model

Table 4 compares the accuracy of the MARS models on the different test data sets. Unlike the previous application, the models generally yield better results with the second input combination in Janakpuri. The best MARS model was obtained from M1 with the second inputs in Janakpuri, while M1 with the third inputs provided the best results at the Nizamuddin and Shahzadabad stations. The observed and forecasted SO2 of the MARS models are shown as scatter plots in Fig. 3. As with the LSSVR, less scattered forecasts were obtained for Janakpuri relative to the other two stations. The SO2 modeling accuracy of the optimal M5-Tree models is provided in Table 5. Unlike the LSSVR and MARS models, the M5-Tree model gives the best accuracy for M2 with the third inputs in Janakpuri, while M3 and M2 with the first input provide the best results at the Nizamuddin and Shahzadabad stations, respectively. The scatter plots in Fig. 4 clearly show that the M5-Tree model forecasts the SO2 of Janakpuri better than that of the other stations. Comparison with Figs. 2 and 3 indicates that the M5-Tree model gives more scattered forecasts than the LSSVR and MARS models; the reason may be that the linear structure of the M5-Tree model prevents it from accurately predicting the highly nonlinear SO2 series. Comparison of the average statistics in Tables 3, 4, and 5 shows that the LSSVR models are generally more successful than the MARS and M5-Tree models in forecasting SO2.

Table 4 Comparison of MARS models
Fig. 3 The observed and forecasted SO2 by the MARS model

Table 5 Comparison of M5-Tree models
Fig. 4 The observed and forecasted SO2 by the M5-Tree model

Sahin et al. (2005) modeled SO2 distribution in Istanbul using artificial neural networks (ANNs) and non-linear regression (NLR), and they found that the optimal ANNs and NLR provided RMSE = 23.13 μg/m3 and 22.35 μg/m3, MAE = 14.97 μg/m3 and 18.41 μg/m3, and R = 0.528 and 0.638, respectively. Akkoyunlu et al. (2010) used the ANN-based approach for the prediction of urban SO2 concentrations and found correlation coefficients of about 0.770, 0.744, and 0.751 for the winter, summer, and overall data, respectively. Sahin et al. (2011) used cellular neural network (CNN) and the statistical persistence method (PER) to model SO2 concentrations of Istanbul, and they found RMSE = 14.2 and 13.9, MAE = 9.9 and 7.8, and R = 0.85 and 0.83 for the CNN and PER models, respectively. It is clear from Tables 3, 4, and 5 that the applied LSSVR, MARS, and M5-Tree models in this study generally provide satisfactory results in modeling SO2 concentrations.

Conclusions

The ability of three different soft computing methods, LSSVR, MARS, and M5-Tree, to forecast SO2 concentrations was evaluated. Data from three stations located in Delhi, India, Janakpuri, Nizamuddin, and Shahzadabad, were used in the applications. Cross validation was employed to demonstrate the generality of the applied models. LSSVR performed better than the other models in forecasting monthly SO2 concentrations, with MARS the second best method. Because of its linear structure, M5-Tree provided worse results than the nonlinear LSSVR and MARS models. All models forecast the SO2 of Janakpuri station more accurately than that of the other stations because of its narrower data range. Comparison with previous studies showed that the soft computing models applied here generally provide satisfactory results in modeling SO2 concentrations.