Introduction

Environmental degradation is one of the main challenges faced by policymakers and decision-makers. The increasing rate of CO2 emission into the atmosphere is an important cause of environmental degradation, and increasing industrialization and urbanization are further causes of CO2 emissions in developing countries (Zhao et al. 2018). Rising CO2 levels drive global warming and climate change, and ocean acidification and desertification are additional consequences of CO2 emission into the atmosphere. CO2 emissions also increase pollution levels and related diseases and thus affect human health. Predicting CO2 emissions is therefore one of the most important issues for researchers. The agricultural sector is one of the most important sources of CO2 emissions: in 2018, 9.9% of greenhouse gas emissions were related to the agricultural sector. Cattle, agricultural soils, and rice production can increase CO2 emissions, and the agricultural sector causes 10–14% of global anthropogenic greenhouse gas emissions (Shabani et al. 2021). If decision-makers want to manage CO2 emissions, it is necessary to accurately estimate the CO2 emissions caused by different sectors, such as the agricultural sector. Predictive models use parameters that are relevant to CO2 emissions to estimate the amounts of CO2 emitted in different years. Shi et al. (2019a) stated that CO2 emission depends on different parameters, such as industrial activity and energy intensity. Shi et al. (2019b) stated that the household sector of China is responsible for 12.6% of CO2 emissions.

In recent years, machine learning algorithms (MLAs) have been widely used for predicting different variables, such as climate variables, pollutants, gas emissions, and hydrological variables (Banadkooki et al. 2020a). The advantages of MLAs include their ability to handle large amounts of input data, their fast processing speed, and their accurate predictions. In the context of CO2 emissions, researchers have used different MLAs to predict CO2 emissions and greenhouse gases (GGs). Table 1 reviews CO2 emission forecasting approaches. As observed in Table 1, MLAs have a high ability to predict CO2 emissions. The SVM is a member of the MLA family that is widely used for predicting target variables. SVM models perform well in high-dimensional spaces, and modelers can define different kernel functions depending on their requirements. SVM models provide good generalization capability and can reduce computational complexity. Saleh et al. (2016) investigated the ability of the SVM model to predict CO2 emissions. Electrical energy and coal-burning data were used as the inputs to the models, and trial and error was used to adjust the parameters of the SVM model. It was concluded that the SVM model was effective for predicting CO2 emissions. Sun and Liu (2016) exploited the least squares SVM (LSSVM) model to predict different kinds of CO2 emissions and concluded that classifying CO2 emissions enhanced forecast accuracy. Ahmadi et al. (2019a) explored the use of the LSSVM for predicting CO2 emissions. They coupled the genetic algorithm (GA) with particle swarm optimization (PSO) to create a new hybrid algorithm for training the SVM model and concluded that the LSSVM-PSO-GA had better efficiency in predicting CO2 emissions than the LSSVM-GA and LSSVM-PSO models. Wu and Meng (2020) coupled the LSSVM with the bat algorithm (BA) for predicting CO2 emissions and observed that the LSSVM-BA outperformed the extreme learning machine and the backpropagation neural network.

Table 1 A detailed review of the papers considered

Generally, MLAs have strong prediction abilities with respect to CO2 emissions, but there are challenges:

  1. The model parameters of MLAs need to be tuned based on powerful training algorithms, such as advanced optimization algorithms (Ehteram et al. 2020; Banadkooki et al. 2021).

  2. Some MLAs, such as the ANN, SVM, and ANFIS models, have different kinds of kernel functions and activation functions. Thus, the best function should be selected to predict the target variable based on the received input data (Darabi et al. 2021).

  3. The selection of the best input scenario for each MLA requires preprocessing methods.

  4. Previous studies compared various models and determined a superior model for predicting CO2 emissions. In fact, competing models are rejected or accepted based on their accuracy, but the main question is how the synergy among multiple models can be used.

To address the abovementioned challenges, the current study uses a new hybrid framework for predicting CO2 emissions. One of the most powerful models for predicting target variables such as CO2 emissions is the SVM. The kernel functions of the SVM have parameters that must be tuned, and the values of these parameters can be obtained based on robust optimization algorithms. In this study, four multiobjective algorithms (MOAs), namely, the MO seagull optimization algorithm (MOSOA), MO salp swarm algorithm (MOSSA), MO bat algorithm (MOBA), and MO particle swarm optimization (MOPSO), are used to improve the performance of the SVM model and address the challenges mentioned above:

  1. To obtain the values of the SVM parameters, an objective function, the root mean square error (RMSE), is used as the first objective function.

  2. To choose the best kernel function, the names of the kernel functions are inserted into the optimization algorithms as decision variables. A second objective function, the mean absolute error (MAE), is used to choose the best kernel function.

  3. The names of the input variables are inserted into the algorithms as decision variables. The Nash-Sutcliffe efficiency (NSE) is used as the third objective function for choosing the input variables.

  4. An inclusive multiple model is used to predict CO2 emissions based on the synergy among the SVM-MOSOA, SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models.

Regarding the abovementioned points, the main novelties of the current paper are as follows:

  1. The establishment of new hybrid SVM models that were not used in previous articles for predicting CO2 emissions.

  2. The creation of an inclusive multiple model that uses the contributions of different SVM models for predicting CO2 emissions.

  3. The application of the current study is not limited to CO2 emissions; modelers can use the proposed models for predicting other variables, such as hydrological variables.

  4. The presentation of an effective approach for finding the best values of the random parameters of the compared multiobjective algorithms.

To the best of the authors’ knowledge, no previous article has investigated the new hybrid SVM models proposed in the current study for predicting CO2 emissions. In this study, new hybrid SVM models are used to predict CO2 emissions in the agricultural sector of Iran based on data from 20 provinces (Fig. 1). Section 2 of the current study explains the structures of the compared models and the methods utilized. The case study is explained in Section 3. Section 4 presents a discussion and the experimental results. Section 5 explains the conclusions of the paper.

Fig. 1 Methodology flowchart

Materials and methods

Structure of the support vector machine

The first version of the SVM was introduced by Sain and Vapnik (1996). The SVM model uses a kernel function to find the relationships between the model inputs and outputs. The linear form of the SVM is as follows:

$$ f(x)={\eta}^{Tr}.x+\beta $$
(1)

where x is the input, Tr denotes the transpose operation, η denotes the weighting coefficients of the input variables, β is the bias, and f(x) is the variable predicted by the SVM.

The aim of the SVM is to minimize the difference between the predicted and observed values. Thus, an optimization problem is defined to minimize the error function, which is named the ε-insensitive loss function. The SVM acts based on the following equations:

$$ \mathit{\operatorname{Minimize}}\frac{1}{2}{\left\Vert \eta \right\Vert}^2+ PE\sum \limits_{i=1}^m\left({\psi}_i^{-}+{\psi}_i^{+}\right) $$
(2)
$$ {\displaystyle \begin{array}{c}\mathbf{subject}\ \mathbf{to}:\left({\eta}_i.{x}_i+\beta \right)-{z}_i\le \varepsilon +{\psi}_i^{+},\kern0.5em i=1,2,\dots, m\\ {}{z}_i-\left({\eta}_i.{x}_i+\beta \right)\le \varepsilon +{\psi}_i^{-}\end{array}} $$
(3)

where PE is the penalty coefficient, m is the number of observed data points, \( {\psi}_i^{-} \) and \( {\psi}_i^{+} \) represent the violations (slack variables) for the ith training data point, ε is the permitted error threshold, xi is the input variable, zi is the target variable, and ηi is the ith weight variable. The weight variable and bias are obtained based on Eqs. 2 and 3. Then, they are inserted into Eq. 1 to obtain f(x). The SVM uses several kernel functions to map the dataset to a linearly separable space.

$$ f(x)={\eta}^{Tr}.K\left(x,{x}_i\right)+\beta $$
(4)

Sigmoid function:

$$ K\left(x,{x}_i\right)=\mathit{\tanh}\left(\gamma \left(x.{x}_i\right)+r\right) $$
(5)

Radial basis function (RBF):

$$ K\left(x,{x}_i\right)=\mathit{\exp}\left(-\gamma {\left\Vert x-{x}_i\right\Vert}^2\right) $$
(6)

Polynomial function:

$$ K\left(x,{x}_i\right)={\left(\gamma \left(x,{x}_i\right)+r\right)}^d $$
(7)

where K(x, xi) is the kernel function; γ, r, and d are kernel parameters; and PE and ε are SVM parameters (the values of all of these parameters are obtained based on the corresponding MOAs).
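As a concrete illustration, the three kernel functions in Eqs. 5-7 can be written in a few lines of NumPy; the values of γ, r, and d below are arbitrary choices for demonstration, not the tuned values obtained by the MOAs:

```python
import numpy as np

# Minimal sketches of the kernel functions in Eqs. 5-7 (illustrative
# parameter values, not the MOA-tuned ones from the study).
def sigmoid_kernel(x, xi, gamma=0.5, r=1.0):
    # Eq. 5: tanh(gamma * (x . xi) + r)
    return np.tanh(gamma * np.dot(x, xi) + r)

def rbf_kernel(x, xi, gamma=0.5):
    # Eq. 6: exp(-gamma * ||x - xi||^2)
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(xi)) ** 2)

def polynomial_kernel(x, xi, gamma=0.5, r=1.0, d=2):
    # Eq. 7: (gamma * (x . xi) + r)^d
    return (gamma * np.dot(x, xi) + r) ** d

x = np.array([1.0, 2.0])
xi = np.array([1.5, 1.0])
print(rbf_kernel(x, xi))  # exp(-0.5 * ||x - xi||^2)
```

In the hybrid framework, the MOAs would supply both the kernel choice and the kernel parameter values to such functions.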

Seagull optimization algorithm

The SOA has been widely applied in different fields, with applications such as multiobjective optimization (Dhiman et al. 2020; Dhiman and Kumar 2019), feature selection (Jia et al. 2019), experimental fuel cell modeling (Cao et al. 2019), and classification (Jiang et al. 2020). Its few parameters, easy implementation, fast convergence, and use of swarm experience are the advantages of the SOA. Migrating birds are the prey of seagulls, which use a natural spiral-shaped movement to attack them. Seagulls live in groups and travel in the direction of the seagull that is fittest for survival. The mathematical model of the SOA is based on a migration phase and an attack phase:

(1) Migration phase

When the seagulls move in the search space, collisions between neighbors should be avoided. Thus, the new locations of the search agents (seagulls) are computed as follows:

$$ {\overrightarrow{P}}_s=\alpha \times {\overrightarrow{L}}_s(x) $$
(8)

where \( {\overrightarrow{P}}_s \) is the location of the search agent that prevents collisions with the other seagulls, \( {\overrightarrow{L}}_s(x) \) is the location of the seagull at iteration i, and α represents the motion behavior of the seagull. α is computed as follows:

$$ \alpha ={f}_c-\left(i\times \left(\frac{f_c}{\mathit{\operatorname{Max}}(i)}\right)\right) $$
(9)

where fc is the frequency control of α and i is the number of iterations. In the next phase, the seagulls move towards the direction of the best neighbor:

$$ {\overrightarrow{N}}_e=\zeta \times \left({\overrightarrow{P}}_b-{\overrightarrow{L}}_s\right) $$
(10)

where \( {\overrightarrow{N}}_e \) is the movement of seagull \( {\overrightarrow{L}}_s \) towards the best seagull \( {\overrightarrow{P}}_b \), and ζ is a random value that is obtained based on the following equation:

$$ \zeta =2\times {\alpha}^2\times RA $$
(11)

where RA is a random number. Finally, the seagulls update their locations as follows:

$$ {\overrightarrow{F}}_s=\left|{\overrightarrow{P}}_s+{\overrightarrow{N}}_e\right| $$
(12)

where \( {\overrightarrow{F}}_s \) is the distance between the search agent and best seagull.

(2) Attack phase

During the attack, seagulls maintain their altitude using their wings and weight. The speed and angle of the seagull attack are changed continuously. They use spiral movements to attack their prey, as follows:

$$ x{x}^{\prime }=r\times \mathit{\cos}\left(\tau \right) $$
(13)
$$ y{y}^{\prime }=r\times \mathit{\sin}\left(\tau \right) $$
(14)
$$ z{z}^{\prime }=r\times \tau $$
(15)
$$ r=u\times {e}^{\tau v} $$
(16)

where r is the radius of each spiral turn; xx′, yy′, and zz′ are the positions of the seagull in the x, y, and z planes, respectively; u and v are constants; and τ is a random number. Finally, the seagulls change their locations as follows:

$$ {\overrightarrow{L}}_s=\left({\overrightarrow{F}}_s\times x{x}^{\prime}\times y{y}^{\prime}\times z{z}^{\prime}\right)+{\overrightarrow{P}}_b $$
(17)

where \( {\overrightarrow{L}}_s \) stores the best solution. Figure 2 shows the flowchart of the SOA.

Fig. 2 The flowchart of the different optimization levels of the SOA
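The migration and attack equations (Eqs. 8-17) can be combined into a single position-update sketch; the constants fc, u, and v and the random draws below are illustrative choices, not values from the study:

```python
import numpy as np

# A one-iteration sketch of the SOA position update (Eqs. 8-17).
# fc, u, v, and the random draws are illustrative, not from the paper.
rng = np.random.default_rng(1)

def soa_update(L_s, P_b, it, max_it, fc=2.0, u=1.0, v=1.0):
    alpha = fc - it * (fc / max_it)                      # Eq. 9
    P_s = alpha * L_s                                    # Eq. 8: collision avoidance
    zeta = 2.0 * alpha ** 2 * rng.random()               # Eq. 11
    N_e = zeta * (P_b - L_s)                             # Eq. 10: move towards the best seagull
    F_s = np.abs(P_s + N_e)                              # Eq. 12
    tau = rng.uniform(0, 2 * np.pi)                      # random spiral angle
    r = u * np.exp(tau * v)                              # Eq. 16
    xx = r * np.cos(tau)                                 # Eq. 13
    yy = r * np.sin(tau)                                 # Eq. 14
    zz = r * tau                                         # Eq. 15
    return F_s * xx * yy * zz + P_b                      # Eq. 17

pos = soa_update(np.array([0.3, 0.7]), np.array([0.0, 1.0]), it=5, max_it=100)
print(pos.shape)
```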

Salp swarm algorithm

The SSA is widely applied for optimization problems in different fields, such as feature selection (Tubishat et al. 2021), engineering optimization (Salgotra et al. 2021), optimal power flow calculation (Abd el-sattar et al. 2021), and ANN training (Kandiri et al. 2020). The SSA has a good balance between exploration and exploitation, as well as fast convergence. The positions of salps in an optimization problem signify a candidate solution. The first salp at the front of the salp chain is called the leader, and the other salps are called followers. Figure 3 shows a salp chain. The position of the leader is changed as follows:

$$ {\boldsymbol{Salp}}_{1,j}=\left[\begin{array}{c}{\boldsymbol{Food}}_j+{\rho}_1\times \left(\left({up}_j-{low}_j\right)\times {\rho}_2+{low}_j\right)\leftarrow {\rho}_3\ge 0.50\\ {}{\boldsymbol{Food}}_j-{\rho}_1\times \left(\left({up}_j-{low}_j\right)\times {\rho}_2+{low}_j\right)\leftarrow {\rho}_3<0.50\end{array}\right] $$
(18)

where Salp1, j is the location of the leader in the jth dimension, Foodj is the food source in the jth dimension, upj and lowj are the upper and lower bounds of the jth dimension, ρ1 is a control parameter, and ρ2 and ρ3 are random parameters. ρ1 is updated as follows:

$$ {\rho}_1=2\times {e}^{-{\left(\frac{4\times l}{L}\right)}^2} $$
(19)
Fig. 3 The schematic structure of the salp chain (Zhang et al. 2021b)

where L is the maximum number of iterations and l is the current number of iterations. The locations of the followers are updated as follows:

$$ {fol}_{i,j}=\frac{1}{2}\left({fol}_{i,j}+{fol}_{i-1,j}\right) $$
(21)

where foli, j is the position of the ith follower in the jth dimension. Figure 4 shows the SSA flowchart.

Fig. 4 The flowchart of the SSA
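The leader and follower updates (Eqs. 18, 19, and 21) can be sketched as follows; the bounds and food-source position below are illustrative:

```python
import numpy as np

# Sketch of one salp-chain update (Eqs. 18, 19, 21); bounds and food
# position are illustrative values.
rng = np.random.default_rng(2)

def update_chain(salps, food, low, up, l, L):
    rho1 = 2.0 * np.exp(-(4.0 * l / L) ** 2)             # Eq. 19
    new = salps.copy()
    for j in range(salps.shape[1]):                      # leader update, Eq. 18
        step = rho1 * ((up[j] - low[j]) * rng.random() + low[j])
        new[0, j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
    for i in range(1, salps.shape[0]):                   # follower update, Eq. 21
        new[i] = 0.5 * (salps[i] + salps[i - 1])
    return new

salps = rng.uniform(0, 1, size=(5, 3))                   # 5 salps, 3 dimensions
out = update_chain(salps, food=np.zeros(3), low=np.zeros(3), up=np.ones(3), l=10, L=100)
print(out.shape)
```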

Bat algorithm

The BA was inspired by the behavior of bats when finding food. The BA has been applied in different fields, such as image segmentation (Yue and Zhang 2020), parameter extraction for photovoltaic models (Deotti et al. 2020), MLA training (Dong et al. 2020), optimal reactive power dispatching (Mugemanyi et al. 2020), continuous optimization (Chakri et al. 2017), and numerical optimization (Wang et al. 2019). The bats use echolocation behavior to distinguish food from obstacles. A bat adjusts its distance from food based on its wavelength, frequency, and pulsation rate. The bats update their frequencies, velocities, and locations as follows:

$$ {fre}_i={fre}_{\mathrm{min}}+{r}_1\left({fre}_{\mathrm{max}}-{fre}_{\mathrm{min}}\right) $$
(22)
$$ {ve}_{t+1}^i={ve}_t^i+{fre}_i\left({X}_t^i-{X}_t^{\boldsymbol{best}}\right) $$
(23)
$$ {X}_{t+1}^i={X}_t^i+{ve}_{t+1}^i $$
(24)

where frei is the frequency of the ith bat; fremin and fremax are the minimum and maximum frequencies, respectively; r1 is a random value; \( {ve}_{t+1}^i \) is the velocity of the ith bat at iteration t+1; \( {X}_t^i \) is the location of the ith bat at iteration t; and \( {X}_t^{\boldsymbol{best}} \) is the location of the best bat (the best solution). The bats use a random walk to perform the local search operation:

$$ {X}_i^{t+1}={X}_i^t+\varphi {A}_i^{t+1} $$
(25)

where φ is a random number and \( {A}_i^{t+1} \) is a loudness parameter. The pulsation rate and loudness are adjusted based on the following equations:

$$ {A}_i^{t+1}=\mu {A}_i^t $$
(26)
$$ {r}_i^{t+1}={r}_i(0)\left[1-\mathit{\exp}\left(-\gamma t\right)\right] $$
(27)

where μ and γ are constant values, ri(0) is the initial value of the pulsation rate, \( {A}_i^{t+1} \) is the value of loudness at iteration t+1, and \( {r}_i^{t+1} \) is the pulsation rate at iteration t+1. Figure 5 shows the BA flowchart.

Fig. 5 The flowchart of the BA
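The bat update rules (Eqs. 22-24, 26, and 27) can be sketched as follows; the frequency bounds and the constants μ and γ are illustrative values, not those of the study:

```python
import numpy as np

# Sketch of the bat update rules (Eqs. 22-24, 26, 27); the frequency
# bounds and the constants mu and gamma are illustrative.
rng = np.random.default_rng(3)

def bat_step(X, V, X_best, fre_min=0.0, fre_max=2.0):
    r1 = rng.random(X.shape[0])[:, None]
    fre = fre_min + r1 * (fre_max - fre_min)             # Eq. 22
    V_new = V + fre * (X - X_best)                       # Eq. 23
    return X + V_new, V_new                              # Eq. 24

def update_loudness_rate(A, r0, t, mu=0.9, gamma=0.9):
    # Eq. 26 (loudness decay) and Eq. 27 (pulsation-rate growth)
    return mu * A, r0 * (1.0 - np.exp(-gamma * t))

X = rng.uniform(-1, 1, size=(4, 2))                      # 4 bats, 2 dimensions
V = np.zeros((4, 2))
X_new, V_new = bat_step(X, V, X_best=X[0])
A_new, r_new = update_loudness_rate(1.0, 0.5, t=3)
print(X_new.shape, round(A_new, 2))
```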

Particle swarm optimization

PSO is a powerful optimization algorithm that is widely used in different fields, such as for multiobjective optimization problems (Zhang et al. 2020), feature selection problems (El-Kenawy and Eid 2020), environmental economic dispatch problems (Xin-gang et al. 2020), green coal production problems (Cui et al. 2020), and the training of ANN models (Darwish et al. 2020). The collaborative behavior of the swarm in the PSO algorithm is one of the advantages of PSO with regard to finding optimal solutions. In PSO, first, the random positions and velocities of the particles are initialized. Then, an objective function is computed for each particle so that the best particle position achieved by the population so far is identified. The velocities and positions of the particles are updated based on the following equations. The process continues until the stopping criterion is satisfied.

$$ {v}_{i,d}\left(t+1\right)=\kappa {v}_{i,d}(t)+{c}_1{r}_1\left({p}_{\boldsymbol{best},i,d}-{x}_{i,d}(t)\right)+{c}_2{r}_2\left({g}_{\boldsymbol{best},d}-{x}_{i,d}(t)\right) $$
(28)
$$ {x}_{i,d}\left(t+1\right)={x}_{i,d}(t)+{v}_{i,d}\left(t+1\right) $$
(29)

where vi, d(t + 1) is the velocity of the ith particle in the dth dimension during iteration t+1, κ is the inertia weight, c1 and c2 are acceleration coefficients, pbest, i, d is the personal best position, gbest, d is the global best position, xi, d(t + 1) is the position of the ith particle during iteration t+1, and r1 and r2 are random numbers.
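The velocity and position updates (Eqs. 28-29) can be sketched as follows; the inertia weight κ and the accelerations c1 and c2 are typical illustrative values:

```python
import numpy as np

# Sketch of the PSO velocity/position update (Eqs. 28-29); kappa, c1,
# and c2 are typical illustrative values, not tuned ones.
rng = np.random.default_rng(4)

def pso_step(x, v, p_best, g_best, kappa=0.7, c1=1.5, c2=1.5):
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v_new = kappa * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. 28
    return x + v_new, v_new                                              # Eq. 29

x = rng.uniform(-1, 1, size=(6, 2))                      # 6 particles, 2 dimensions
v = np.zeros((6, 2))
p_best = x.copy()
g_best = x[0]
x_new, v_new = pso_step(x, v, p_best, g_best)
print(x_new.shape)
```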

Multiobjective optimization problems

In a multiobjective optimization problem (MOOP), some objective functions can conflict with each other, so the solutions of an MOOP cannot be compared using simple relational operators. One of the most important concepts in MOOPs is Pareto dominance: solution A dominates solution B if A is at least as good as B on all objectives and strictly better on at least one objective. The circles in Fig. 6 are better than the squares because, for a minimization problem in which both objectives are minimized, the circles achieve lower objective function values. While the circles dominate the squares, they do not dominate each other. Each MOOP has a set of best nondominated solutions, namely, the Pareto optimal set. The projection of the Pareto optimal solutions in the objective space is known as the Pareto optimal front. In an MOOP, there is an external archive in which the nondominated solutions of the MOAs are stored. The next challenge is the selection of a target for each iteration of the algorithm. In fact, the targets are the best positions of the leader, bat, particle, and seagull for the MOSSA, MOBA, MOPSO algorithm, and MOSOA, respectively. To find the target for updating the positions of the other agents, the number of neighboring solutions (NSs) in the neighborhood of each solution is counted. During this phase, the target should be chosen from the set of nondominated solutions with the least crowded neighborhoods. To determine how populated a neighborhood is, the NSs within a certain maximum distance are counted.

$$ \overrightarrow{d}=\frac{\overrightarrow{\max }-\overrightarrow{\min }}{\boldsymbol{Archive}\ \boldsymbol{size}} $$
(30)
Fig. 6 Dominance conception

\( \overrightarrow{d} \) is the crowding distance, \( \overrightarrow{\max } \) denotes the maximum value of every objective, and \( \overrightarrow{\min } \) denotes the minimum value of every objective. Based on the computed distances, a rank is assigned to each solution, and a roulette wheel is used to choose the target. The solutions with more NSs have higher ranks; thus, the target is chosen among the solutions with the lowest ranks. Setting the size of the archive is another challenge. The archive can store only a limited number of nondominated solutions; thus, the solutions with the most crowded neighborhoods are chosen for removal from the archive. Following a similar process, the crowding distances are computed, a rank is assigned to each solution, and a roulette wheel is used to select the solutions with the highest ranks (the most crowded neighborhoods). To update the archive of nondominated solutions, the following rules should be considered:

  1. If one of the archive solutions dominates an external solution, the external solution should be discarded.

  2. If an external solution is not dominated by any solution in the archive, it should be added to the archive (and any archive solutions that it dominates should be removed).
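The dominance test and the two archive-update rules can be sketched as follows (for minimization of all objectives); this is a minimal illustration, not the authors' implementation:

```python
# Minimal sketch of Pareto dominance and the archive-update rules
# (minimization of all objectives); purely illustrative.
def dominates(a, b):
    """a dominates b: no worse on all objectives, strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    if any(dominates(s, candidate) for s in archive):
        return archive                       # rule 1: the candidate is discarded
    archive = [s for s in archive if not dominates(candidate, s)]
    archive.append(candidate)                # rule 2: non-dominated, so it is added
    return archive

arch = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
arch = update_archive(arch, (1.5, 1.5))      # dominates (2.0, 2.0), which is removed
print(sorted(arch))
```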

The MOSOA, MOBA, MOPSO algorithm, and MOSSA are performed based on the following procedure:

  1. The random parameters of the MOAs are defined for each algorithm.

  2. The random positions (and velocities) of the agents (particles, bats, seagulls, and salps) are defined.

  3. The objective function (OBF) values are calculated for each agent.

  4. The nondominated solutions are determined based on the values of the OBFs.

  5. The size of the archive is checked; if it is full, the solutions with the most NSs are removed from the archive.

  6. The archive is updated based on the rules mentioned above.

  7. The target is selected from the solutions with the least crowded neighborhoods.

  8. The process is repeated until an end condition is satisfied.
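The steps above can be combined into a simplified skeleton; the objective functions, agent-update rule, archive truncation, and target selection below are placeholders for the scheme described above (crowding-based pruning and roulette-wheel selection in the actual algorithms):

```python
import random

# Simplified skeleton of the generic MOA loop (steps 1-8 above);
# not the authors' implementation.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def run_moa(objectives, n_agents=10, dim=2, iters=20, cap=5, seed=0):
    rnd = random.Random(seed)                                           # step 1
    agents = [[rnd.random() for _ in range(dim)] for _ in range(n_agents)]  # step 2
    archive = []
    for _ in range(iters):
        for a in agents:
            f = tuple(obj(a) for obj in objectives)                     # step 3
            if not any(dominates(s[0], f) for s in archive):            # steps 4 and 6
                archive = [s for s in archive if not dominates(f, s[0])]
                archive.append((f, list(a)))
        archive = archive[:cap]             # step 5 (crowding-based in practice)
        target = rnd.choice(archive)[1]     # step 7 (roulette wheel in practice)
        agents = [[x + 0.1 * (t - x) for x, t in zip(a, target)]
                  for a in agents]          # placeholder agent update
    return archive                          # step 8: stop after a fixed iteration budget

front = run_moa([lambda a: a[0], lambda a: 1.0 - a[0] + a[1]])
print(len(front))
```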

Case study

According to the Paris Agreement, Iran has agreed to decrease its CO2 emissions by 4% by 2030. Iran has stated that with international support and without any risk of sanctions, a 12% reduction in CO2 emissions is possible. However, there are challenges with respect to decreasing CO2 emissions. During the 2000s, environmental issues had lower importance than economic and social issues. The development of industrial units around Tehran (the capital of Iran) is one of the main causes of increased CO2 emissions and air pollutants. Iran is known as the 7th largest CO2 emitter in the world. The agricultural sector of Iran is one of the important causes of increased CO2 emissions, and different parameters in the agricultural sector affect CO2 emissions. One of the most important parameters is the gross domestic product (GDP). Different studies have investigated the effect of the GDP on CO2 emissions. Cowan et al. (2014) stated that there was a Granger causality between the GDP and CO2 emissions in Brazil during the period from 1990 to 2000. Zubair et al. (2020) investigated the relationship between CO2 emissions and the GDP in Nigeria and stated that the CO2 emissions over the long term (1980–2018) in Nigeria decreased with increasing GDP. Additionally, the literature has used the environmental Kuznets curve (Kuznets 1955) to find the relationship between economic growth and environmental quality. Grossman and Krueger (1991) used the Kuznets curve to explain the relationship between environmental degradation and economic growth. Based on the Kuznets curve, economic growth first degrades the environment; after economic growth passes a turning point, further growth improves the environment. Additionally, the square of the per capita GDP is one of the most effective parameters for predicting CO2 emissions (Shabani et al. 2021; Hosseini et al. 2019). Another effective parameter for predicting CO2 emissions is the Gini coefficient (Cheng et al. 2021; Shabani et al. 2021). The Gini index estimates income inequality in the agricultural sector of Iran. Energy consumption is yet another effective parameter for estimating CO2 emissions; the amount of fossil fuel utilized in this sector is represented by the energy consumption variable. Ali et al. (2021) stated that increasing fossil fuel usage increases CO2 emissions. Koengkan et al. (2019) stated that renewable energies should be used as alternatives to fossil fuels because fossil fuels increase CO2 emissions.

Generally, based on the abovementioned discussions, the following inputs are used in the current study for predicting CO2 emissions:

  1. GDP

  2. Square of the GDP

  3. Energy use

  4. Gini index

To obtain the real GDP, the nominal values are divided by the producer price index, and the totals are divided by the population to obtain the per capita GDP. Different input combinations can be provided based on the inputs above; to find the best input combination, the SVM is coupled with the MOAs. Table 2 shows the statistical characteristics of the data. Annual data for 25 provinces of Iran from 1990 to 2018 are used to predict the CO2 emissions caused by the agricultural sector of Iran. The dataset was collected from the website of the Statistics Center of Iran.
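The input preparation described above can be sketched as follows; the numbers are purely illustrative, not actual Iranian statistics:

```python
# Sketch of the input preparation described above; all numbers are
# illustrative, not actual Iranian statistics.
def prepare_inputs(nominal_gdp, ppi, population, energy_use, gini):
    real_gdp = nominal_gdp / ppi            # deflate by the producer price index
    gdp_pc = real_gdp / population          # per capita real GDP
    return {"GDP": gdp_pc,
            "GDP2": gdp_pc ** 2,            # square of the per capita GDP
            "Energy": energy_use,
            "Gini": gini}

row = prepare_inputs(nominal_gdp=500.0, ppi=1.25, population=2.0e6,
                     energy_use=340.0, gini=0.38)
print(row["GDP"], row["GDP2"])
```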

Table 2 The statistical characteristics of the input and output data

Hybrid SVM and MOAs

In this section, the MOAs are used to improve the accuracy of the SVM based on the following procedure:

  1. The input data are prepared for the SVM models. Seventy percent of the data are used for training, and 30% are used for testing. These percentages are chosen because they provide the lowest error for the SVM model.

  2. The SVM model uses the training data to predict CO2 emissions. If the stopping criterion is satisfied, the model proceeds to the testing phase; otherwise, the MOAs are linked to the SVM model.

  3. The MOAs are defined based on the rules in Section 2.5. The names of the inputs, the names of the kernel functions, and the initial guesses of the kernel parameters are inserted into the MOAs as decision variables. Three objective functions are used to find the parameter values, the best kernel function, and the best input combination: the RMSE is used as the first objective function for finding the kernel parameters, the MAE is used to find the best kernel function, and the NSE is used to find the best input combination. In fact, the positions of the agents determine the values of the decision variables.

  4. The Pareto front is created, and the nondominated solutions are placed on the Pareto front. Each solution includes three kinds of information: the best input combination, the values of the parameters, and the best kernel function.

  5. A multicriteria decision process is used to choose the best solution from the Pareto front as the final solution.

  6. The SVM model runs again based on the obtained configuration, and the process is repeated.

Inclusive multiple model

Previous studies used competitive models for predicting different target variables, but there are several concerns:

  1. The final outputs of the previous studies were the selection of the best and worst models. The best model was suggested for subsequent studies, and the worst models were discarded.

  2. The intercomparisons between the predictive models were never exhaustive.

  3. There was no effort to provide more accurate results based on the synergy among all of the competing models used.

In this study, first, SVM-MOSOA, SVM-MOPSO, SVM-MOSSA, and SVM-MOBA are used to predict CO2 emissions. In the next stage, to increase the accuracy of the outputs, an inclusive multiple model is used to improve the accuracy of the final outputs based on the synergy among the SVM-MOSOA, SVM-MOPSO, SVM-MOSSA, and SVM-MOBA models as follows:

  1. First, the SVM-MOSOA, SVM-MOPSO, SVM-MOSSA, and SVM-MOBA models are used to predict CO2 emissions.

  2. In the next stage, the outputs of the SVM-MOSOA, SVM-MOPSO, SVM-MOSSA, and SVM-MOBA models are used as the inputs of an ANN model to predict CO2 emissions. In fact, the outputs of the first stage are considered lower-order modeling results. Utilizing the inclusive multiple model increases the accuracy of the results by exploiting the advantages of all of the competing models, and the modeler ensures that the capacities of all models are used to extract the most accurate results possible.

The ANN model used in the current study includes one input layer, one hidden layer, and one output layer, as observed in Fig. 7. The input layer receives the outputs of the competing models as its inputs. The hidden layer processes the received inputs based on the chosen activation function. In this study, the sigmoid function is used, as it was one of the most successful activation functions in previous studies (Banadkooki et al. 2020b; Ehteram et al. 2021). The ANN processes data based on the following equation:

$$ Y={b}_0+\sum \limits_{j=1}^{Nh}{\omega}_jf\left({b}_{oj}+\sum \limits_{i=1}^{Nin}{\omega}_{ij}{\boldsymbol{IN}}_{ij}\right) $$
(31)

where b0 and boj are the biases of the output layer and the jth hidden neuron, respectively; ωij is the weight of the ith input in the jth hidden neuron; INij is the network input; f is the activation function; ωj is the weight of the output from hidden neuron j; Nh is the number of hidden neurons; Y is the output; and Nin is the number of inputs.

$$ f(y)=\frac{1}{1+{e}^{-y}} $$
(32)
Fig. 7 (a) The measured CO2 for 700 data points and (b) the structure of the IMM model

where f(y) is the output of the activation function for the input y \( \left(y={b}_{oj}+\sum \limits_{i=1}^{Nin}{\omega}_{ij}{\boldsymbol{IN}}_{ij}\right) \). In this study, the backpropagation algorithm is used to train the ANN model.
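The forward pass of the inclusive multiple model (Eqs. 31-32) can be sketched as follows; the weights below are random for illustration, whereas in the study they are obtained by backpropagation:

```python
import numpy as np

# Sketch of the IMM forward pass: a one-hidden-layer network (Eqs. 31-32)
# combining the outputs of the four lower-order SVM models. The weights
# here are random placeholders, not trained values.
def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))                      # Eq. 32

def imm_forward(inputs, W_in, b_hidden, w_out, b_out):
    hidden = sigmoid(inputs @ W_in + b_hidden)           # hidden-layer activations
    return b_out + hidden @ w_out                        # Eq. 31

rng = np.random.default_rng(5)
# Hypothetical predictions from SVM-MOSOA, SVM-MOSSA, SVM-MOBA, SVM-MOPSO:
preds_from_4_models = rng.uniform(0, 1, size=(10, 4))
Nh = 3                                                   # hidden neurons
y_hat = imm_forward(preds_from_4_models,
                    rng.normal(size=(4, Nh)), rng.normal(size=Nh),
                    rng.normal(size=Nh), 0.1)
print(y_hat.shape)
```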

Multicriteria decision model

A Pareto front includes multiple solutions. To choose the best solution, a multicriteria decision model is used. One of the powerful multicriteria decision models is the weighted aggregate sum product assessment (WASPAS) technique, which is widely used in different fields, such as solving solar wind power problems (Nie et al. 2017), fuel technology selection (Rani and Mishra 2020), and the development of smart cities (Khan et al. 2020). To use the WASPAS technique to choose the best solution from the Pareto front, the following steps are considered:

  1. A decision matrix is constructed so that the value of each criterion (NSE, RMSE, and MAE) is inserted into the matrix for each solution.

  2. The decision matrix is normalized based on the following equations:

For the NSE criterion:

$$ {z}_{ij}^{\ast }=\frac{z_{ij}}{{\mathit{\max}}_i{z}_{ij}} $$
(33)

For the RMSE and MAE criteria:

$$ {z}_{ij}^{\ast }=\frac{{\mathit{\min}}_i\ {z}_{ij}}{z_{ij}} $$
(34)

\( {z}_{ij}^{\ast } \) is a normalized value, and zij is the performance of the ith alternative with respect to the jth criterion.

  3. In the next stage, the weighted sum model and the weighted product model are computed as follows:

$$ {K}_i=\sum \limits_{j=1}^n{w}_j{z}_{ij}^{\ast } $$
(35)
$$ {L}_i=\prod \limits_{j=1}^n{\left({z}_{ij}^{\ast}\right)}^{w_j} $$
(36)

where Ki is the weighted sum model score, Li is the weighted product model score, wj denotes the weight of the jth criterion, and n is the number of criteria.

  4. The aggregated measure is computed as follows:

$$ {S}_i=\sigma {K}_i+\left(1-\sigma \right){L}_i $$
(37)

where σ is a constant parameter (σ = 0.5). The solution with the highest value of Si is chosen as the best solution.
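The WASPAS steps (Eqs. 33-37) can be sketched as follows for three hypothetical Pareto solutions evaluated on the NSE (a benefit criterion) and the RMSE and MAE (cost criteria); the values and equal weights are illustrative:

```python
import numpy as np

# Sketch of the WASPAS ranking (Eqs. 33-37) for three hypothetical Pareto
# solutions; the criterion values and equal weights are illustrative.
def waspas(matrix, benefit, weights, sigma=0.5):
    M = np.asarray(matrix, dtype=float)
    # Eq. 33 for benefit criteria, Eq. 34 for cost criteria:
    Z = np.where(benefit, M / M.max(axis=0), M.min(axis=0) / M)
    K = (weights * Z).sum(axis=1)            # Eq. 35: weighted sum model
    L = (Z ** weights).prod(axis=1)          # Eq. 36: weighted product model
    return sigma * K + (1 - sigma) * L       # Eq. 37: aggregated measure

# Rows: candidate solutions; columns: NSE (benefit), RMSE, MAE (cost).
scores = waspas([[0.90, 1.2, 0.9],
                 [0.85, 1.0, 0.8],
                 [0.80, 1.5, 1.1]],
                benefit=np.array([True, False, False]),
                weights=np.array([1 / 3, 1 / 3, 1 / 3]))
print(int(scores.argmax()))
```

The solution with the highest aggregated score is the one selected from the Pareto front.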

In this study, to evaluate the performances of the tested models, the following indexes are used:

  • Root mean square error (ideal values are close to zero)

$$ \boldsymbol{RMSE}=\sqrt{\frac{1}{m}\sum \limits_{i=1}^m{\left({CO}_{2 es}-{CO}_{2 ob}\right)}^2} $$
(38)

• Scatter index (SI < 0.10: excellent performance; 0.10 < SI < 0.20: good performance; 0.20 < SI < 0.30: fair performance; SI > 0.30: poor performance) (Li et al. 2013):

$$ SI=\frac{\boldsymbol{RMSE}}{C{\overline{O}}_{2 ob}} $$
(39)

• Mean absolute error (MAE):

$$ MAE=\frac{1}{m}\sum \limits_{i=1}^m\left|{CO}_{2 es}-{CO}_{2 ob}\right| $$
(40)

• Nash Sutcliffe efficiency (NSE) (values close to 1 are ideal)

$$ NSE=1-\frac{\sum \limits_{i=1}^m{\left({CO}_{2 es}-{CO}_{2 ob}\right)}^2}{\sum \limits_{i=1}^m{\left({CO}_{2 ob}-C{\overline{O}}_{2 ob}\right)}^2} $$
(41)

• Uncertainty with a 95% confidence level (lowest values are ideal)

$$ {U}_{95}=1.96\sqrt{\left({SD}^2+{\boldsymbol{RMSE}}^2\right)} $$
(42)

• Percentage bias

$$ \boldsymbol{PBIAS}=100\ast \frac{\sum \limits_{i=1}^m\left({CO}_{2 es}-{CO}_{2 ob}\right)}{\sum \limits_{i=1}^m{CO}_{2 ob}} $$
(43)

where SD is the standard deviation of the residuals, m is the number of data points, CO2ob denotes the observed CO2 values, CO2es denotes the estimated CO2 values, and \( C{\overline{O}}_{2 ob} \) is the average of the observed CO2 values.
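The indices of Eqs. (38)-(43) are straightforward to implement. The sketch below uses illustrative toy data rather than the study's CO2 series:

```python
import math

def evaluate(co2_es, co2_ob):
    """Compute the error indices of Eqs. (38)-(43) for one model."""
    m = len(co2_ob)
    res = [e - o for e, o in zip(co2_es, co2_ob)]          # residuals
    rmse = math.sqrt(sum(r ** 2 for r in res) / m)         # Eq. (38)
    ob_mean = sum(co2_ob) / m
    si = rmse / ob_mean                                    # Eq. (39)
    mae = sum(abs(r) for r in res) / m                     # Eq. (40)
    nse = 1 - (sum(r ** 2 for r in res) /
               sum((o - ob_mean) ** 2 for o in co2_ob))    # Eq. (41)
    res_mean = sum(res) / m
    sd = math.sqrt(sum((r - res_mean) ** 2 for r in res) / m)  # SD of residuals
    u95 = 1.96 * math.sqrt(sd ** 2 + rmse ** 2)            # Eq. (42)
    pbias = 100 * sum(res) / sum(co2_ob)                   # Eq. (43)
    return {"RMSE": rmse, "SI": si, "MAE": mae,
            "NSE": nse, "U95": u95, "PBIAS": pbias}

# Toy data: a near-perfect, unbiased prediction
idx = evaluate([1.1, 1.9, 3.2, 3.8], [1.0, 2.0, 3.0, 4.0])
```

For these toy data, the SI falls below 0.10, which the classification above labels excellent, and the symmetric residuals make the PBIAS zero.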

Additionally, to evaluate the quality of the Pareto fronts obtained by the different MOAs, the following indices are used:

  1. Spacing index (SP): this shows the spread of the computed Pareto front.

$$ SP=\sqrt{\frac{1}{np-1}\sum \limits_{i=1}^{np}{\left(\overline{d}-{d}_i\right)}^2} $$
(44)

where np is the number of Pareto solutions, di is the Euclidean distance between two consecutive solutions in the Pareto front, and \( \overline{d} \) is the average of these distances. The lowest SP value corresponds to the best algorithm.

  2. Maximum spread (MS): the MS shows the distance between the boundary solutions of the front. The highest MS value corresponds to the best algorithm:

$$ MS=\sqrt{\sum \limits_{i=1}^{nf}\mathit{\max}\left(d\left({r}_i,{t}_i\right)\right)} $$
(45)

where d is a function used to compute the Euclidean distance, ri is the maximum value in the i-th objective function, ti is the minimum value in the ith objective function, and nf is the number of objective functions.
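The two front-quality indices can be sketched as follows. Interpreting d(r_i, t_i) as the distance between the extreme values of the i-th objective is an assumption based on the definitions above; the example front is hypothetical.

```python
import math

def spacing(front):
    """SP (Eq. 44): spread of the Euclidean distances between
    consecutive solutions on the front; lower is better."""
    d = [math.dist(a, b) for a, b in zip(front, front[1:])]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((d_bar - di) ** 2 for di in d) / (len(front) - 1))

def max_spread(front):
    """MS (Eq. 45): extent of the front, aggregating the distance
    between the extreme values of each objective; higher is better."""
    nf = len(front[0])
    return math.sqrt(sum((max(p[i] for p in front) -
                          min(p[i] for p in front)) ** 2 for i in range(nf)))

# A hypothetical, perfectly even two-objective front
front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
```

For this evenly spaced front, the consecutive distances are identical, so SP is zero, while MS equals the diagonal extent of the front.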

Discussion and results

Determination of random algorithmic parameters

The MOAs used have random (user-defined) parameters, and their values should be determined accurately because the accuracy of an MOA depends on the chosen parameter values. One of the most robust methods for designing parameters and experiments is the Taguchi model. The Taguchi model is widely used in different fields, such as tuning model parameters (Dutta and Kumar Reddy Narala 2021), optimizing thermal conductivity (Aswad et al. 2020), optimizing air distributor channels (Feng et al. 2020), analyzing soil erosion (Zhang et al. 2021a), and optimizing runoff water quality (Liu et al. 2021). To determine the values of the random parameters, the following steps are considered:

  1. The number of parameters and their levels should be determined. For example, the population size and the maximum number of iterations should be determined for the MOSOA; thus, there are two parameters. As observed in Table 3a, four values are assigned to the population size and to the maximum number of iterations to find the best values of these random parameters.

  2. Regarding the number of parameters and their levels, an orthogonal array is chosen from the Taguchi table. The Taguchi model uses the orthogonal array to decrease the required number of experiments. For the two parameters with four levels each in the MOSOA, the L16 array covers the required experiments. As observed in Table 3b, there are 16 experiments for the two parameters with four levels.

  3. In the next stage, the signal-to-noise (S/N) ratio is computed for each experiment:

Table 3 (a) The level and factors of MOSOA, (b) the S/N value of 16 experiments, (c) the mean S/N value for different parameters, and (d) the optimal value of other parameters of other MOAs
Table 4 The obtained information by the best solution of Pareto fronts for (a) the selection of kernel function, (b) the values of SVM parameters, and (c) the best input combination
$$ \frac{S}{N}\ \boldsymbol{ratio}=-10{\mathit{\log}}_{10}{\left(\boldsymbol{objective}\ \boldsymbol{function}\right)}^2 $$
(46)

High values of the S/N are ideal.

  4. The mean of the S/N values is computed for each factor at each level as follows:

$$ {\left(\boldsymbol{Mean}\right)}_{\boldsymbol{factor}=l}^{\boldsymbol{level}=i}=\frac{1}{n_{il}}\sum \limits_{j=1}^{n_{il}}{\left[{\left(\frac{S}{N}\right)}_j\right]}_{\boldsymbol{factor}=l}^{\boldsymbol{level}=i} $$
(47)

where \( {\left(\boldsymbol{Mean}\right)}_{\boldsymbol{factor}=l}^{\boldsymbol{level}=i} \) denotes the mean S/N ratio of factor l at level i, and nil is the number of experiments in which factor l is set to level i.

Table 3b shows the S/N values for the 16 experiments. The average S/N for each level of each parameter is computed in Table 3c. The highest mean S/N values correspond to the best parameter values. As observed in Table 3c, the population size and the maximum number of iterations at level 1 have the highest mean S/N values (14.26 and 15.33, respectively); thus, the level-1 values of these two parameters are chosen. Obtained via a similar process, the optimal values of the parameters of the other compared MOAs are reported in Table 3d.
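The Taguchi selection described above reduces to computing an S/N ratio per experiment and then averaging it per factor level. The sketch below uses a hypothetical single factor with four levels and made-up objective values; the real design and values are those of the L16 array in Table 3.

```python
import math

def sn_ratio(obj):
    """Smaller-is-better S/N ratio for a minimized objective value."""
    return -10 * math.log10(obj ** 2)

# Hypothetical design: one factor at four levels, two runs per level;
# obj[k] is the objective value (e.g., validation error) of run k.
levels = [0, 0, 1, 1, 2, 2, 3, 3]
obj = [0.30, 0.28, 0.35, 0.33, 0.40, 0.42, 0.50, 0.47]

sn = [sn_ratio(v) for v in obj]
# Mean S/N per level: average the S/N of the runs at that level
mean_sn = [sum(s for s, l in zip(sn, levels) if l == i) / levels.count(i)
           for i in range(4)]
best_level = max(range(4), key=mean_sn.__getitem__)  # highest mean S/N wins
```

Because the smallest objective values occur at level 0, that level attains the highest mean S/N and is selected, mirroring the level-1 choice reported in Table 3c.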

The Pareto fronts of different models

Figure 8 shows the Pareto fronts obtained for the different models. As observed in Fig. 8, the RMSE, MAE, and NSE of SVM-MOSOA vary from 0.25 to 0.75 (kg), from 0.12 to 0.62 (kg), and from 0.92 to 0.98, respectively. The RMSE, MAE, and NSE of SVM-MOSSA vary from 0.35 to 10.05 (kg), from 0.24 to 0.72 (kg), and from 0.91 to 0.97, respectively. The RMSE and MAE of SVM-MOSOA for the non-dominated solutions are lower than those of the other models. The red circle shows the best solution obtained by the multicriteria decision process. As observed in Table 4, the best solution determines the best input scenario, the values of the SVM parameters, and the best kernel function. Table 4a shows the best kernel functions for the different SVM models: the best kernel function for SVM-MOSOA is the RBF, and it is the sigmoid function for SVM-MOSSA, SVM-MOBA, and SVM-MOPSO. The values of the SVM parameters are shown in Table 4b, and the best input combinations are shown in Table 4c. Figure 9 compares the quality of the Pareto fronts obtained by the different models. As observed in Fig. 9, MOSOA provides the lowest SP and the highest MS among the models, whereas MOPSO provides the lowest MS and the highest SP. Thus, MOSOA and MOPSO provide the best and worst Pareto fronts among the tested models, respectively.

Fig. 8

Obtained Pareto fronts for different models

Fig. 9

The SP and MS values for comparing different models

Comparison of model accuracies with respect to predicting CO2 emissions

Table 5 compares the performances of the different models. The RMSE of SVM-MOSOA during the training stage is 16, 55, 57, and 60% lower than those of the SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models. The MAE obtained by SVM-MOSOA is 0.29, while it is 0.32, 0.69, and 0.91 for the SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models, respectively. SVM-MOSOA obtains the highest NSE and the lowest PBIAS during the training stage among the SVM-MOA models. Comparing the SVM-MOA models with and without the IMM model shows that the IMM model increases the accuracy of each SVM-MOA model during the training stage. The IMM model decreases the MAEs of the SVM-MOSOA, SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models by 24, 31, 68, and 76%, respectively. The model performance comparison indicates that the IMM and SVM-MOBA have the highest and lowest NSEs, respectively, among the tested models during the training phase. The model performances indicate that SVM-MOSOA has the best accuracy among the SVM-MOA models; its NSE is 0.93, while it is 0.90, 0.89, and 0.84 for the SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models, respectively. The IMM model outperforms all SVM-MOA models in the testing phase. The model accuracy comparison indicates that the IMM and SVM-MOPSO have the lowest and highest PBIAS values, respectively, among the models. As observed in Table 5, the IMM model decreases the RMSEs of the SVM-MOSOA, SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models by 20, 36, 47, and 50%, respectively. Figure 10a compares the performances of the models based on the SI index. As observed in this figure, the SI values of the IMM model are 0.04 and 0.06 during the training and testing stages, respectively; thus, the performance of the IMM model is excellent. The performances of the SVM-MOSOA, SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models are good during the training and testing phases.
Figure 10b compares the performances of the models based on the U95 measure. As observed in Fig. 10b, the IMM and SVM-MOSOA provide the lowest uncertainty levels among the models. Figure 11 shows the scatterplots obtained using the testing data; additionally, the R2 values obtained on the training data are given for each model. The R2 of the IMM is 0.995, while it is 0.9899, 0.9830, 0.9798, and 0.9752 for the SVM-MOSOA, SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models, respectively. Thus, the IMM has the highest R2 values. Taylor diagrams are effective tools for comparing models. A Taylor diagram uses three evaluation criteria, namely, the standard deviation, the root mean square error, and the correlation coefficient, to evaluate the accuracies of the compared models. A model performs better than the others if its point lies closest to the reference point (the observed data). As observed in Fig. 12, the IMM and SVM-MOSOA perform better than the other models. Figure 13 shows the boxplots of the models. As observed in this figure, the IMM and SVM-MOSOA are in good agreement with the measured data. The relative error of each data point for the different models is shown in the heat map in Fig. 14. As observed in Fig. 14, the relative error of the IMM model varies from 10 to 20%, the lowest among the tested models. The relative error of SVM-MOBA varies from 40 to 60%, and that of SVM-MOPSO varies from 50 to 60%.

Table 5 The comparison of the accuracy of different models
Fig. 10

Comparison of the accuracy of different models based on a: SI index and b: U95%

Fig. 11

The scatterplots for the (a) IMM (R2 training: 0.997), (b) SVM-MOSOA (R2 training: 0.9912), (c) SVM-MOSSA (R2 training: 0.9876), (d) SVM-MOBA (R2 training: 0.9812), and (e) SVM-MOPSO (R2 training: 0.9798)

Fig. 12

Taylor diagram for comparing different models

Fig. 13

The box plot of models for predicting CO2 emission

Fig. 14

The heat map for comparing models based on relative error

Concluding discussion

Regarding the obtained results, the following points should be considered:

  1. One of the main advantages of the current study is that it finds the best input combination without preprocessing the data, such as through the use of the gamma test or principal component analysis. Each multiobjective algorithm can automatically find the best input combination with respect to a defined objective function.

  2. The findings of the current research confirm the results of a previous article by Ahmadi et al. (2019b), who showed that optimization algorithms could improve the accuracy of SVM and LSSVM models for predicting CO2 emissions. They stated that optimization algorithms such as PSO and GA have a high ability to improve the accuracy of soft computing models such as the SVM and LSSVM models. Sun and Liu (2016) used the LSSVM, a member of the SVM family, to predict CO2 emissions. They reported that the LSSVM model outperformed the backpropagation neural network and grey models. Shabani et al. (2021) confirmed that GDP, the square of the GDP, energy use, and the GINI index were effective inputs for predicting CO2 emissions. A comparison of the results obtained by the current study and those of Shabani et al. (2021) revealed that the IMM model produced accurate predictions with the smallest errors.

  3. The findings of the current study confirm the results of previous studies by Ehteram et al. (2021) and Seifi et al. (2020), who showed the high potential of multiobjective algorithms for improving the performances of MLAs.

  4. The results of the current study are similar to those of Shabani et al. (2021) and Khatibi et al. (2017), who showed that IMM-based models provide high accuracy for the prediction of target variables.

  5. Future studies can investigate the effects of uncertain inputs and model parameters on the accuracies of the resulting models.

  6. All of the evaluation criteria show that the performances of the IMM and SVM-MOSOA are the best, but future studies can use a multicriteria model to assign a rank to each model and then select the one with the best performance.

  7. The applications of the models used in the current study are not limited to theoretical aspects; policy-makers and decision-makers can utilize these tools to identify the effective parameters of CO2 emissions and then find real relationships between the inputs and outputs. Thus, the models are suitable for environmental management and provide good insights for predicting CO2 emissions.

  8. However, it should be noted that there are challenges in improving MLA models with MOAs, such as converting a single-objective optimization algorithm into a multiobjective one, defining the various objective functions, and selecting the best solution on the resulting Pareto front.

  9. Depending on the kind of MLA used in a given study, different objective functions can be defined. For example, the number of hidden layers, the ANN parameters (weights and biases), the selection of the activation function, and the selection of the best input combination can be defined as the decision variables for an ANN model. Thus, four objective functions can be defined for the model under study.

  10. Increasing the number of inputs and model parameters may increase the computational complexity and run time of the models.

  11. Another ability of the introduced models is that they can provide spatial maps of CO2 emissions so that regions with high and low CO2 emissions are classified accurately.

  12. An ensemble of different kinds of kernel functions can also be a strategy for increasing the accuracy of the SVM model because the ensemble can exploit the advantages of the different kernel functions.

Conclusion

Predicting CO2 emissions is a real challenge for policy-makers and decision-makers with respect to managing the environment. CO2 emissions depend on different parameters, and thus, finding accurate relationships between model inputs and outputs is an important issue. The agricultural sector is one of the important sources of CO2 production. Thus, the current study used SVM models for predicting CO2 emissions based on data from the agricultural sector. In this study, several MOAs were used to improve the accuracies of the SVM models by finding the best input combinations, model parameters, and kernel functions. Then, an IMM model used the outputs of these models as its inputs to further increase their accuracy. The best input combination for all models was the use of the Gini index, GDP, GDP2, and EU. The SVM-MOA models selected different kernel functions for mapping the complex relationships between the inputs and outputs. The results indicated that the MOSOA provided the best Pareto front among the tested MOAs. It was observed that the IMM model decreased the MAEs of the SVM-MOSOA, SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models by 24, 31, 75, and 76%, respectively. Additionally, SVM-MOSOA achieved the highest accuracy among the SVM-MOA models. The MAE obtained by SVM-MOSOA was 0.29, while it was 0.32, 0.69, and 0.91 for the SVM-MOSSA, SVM-MOBA, and SVM-MOPSO models, respectively. The general results indicated that the IMM model could significantly increase the accuracies of the SVM models. The results of the current study could be useful for decreasing pollutants in the environments of different countries. Future studies can address the limitations of the current work by testing the models for predicting CO2 emissions in different regions of the world. They can also add other socioeconomic variables to the inputs to examine their effects on the accuracy of the models.