Introduction

Environmental degradation and humanity’s carbon footprint raise serious concerns about sustainable economic development. However, because long-run and reliable data are scarce, few studies have tested the impact of economic development on environmental degradation with data spanning centuries. Following the seminal papers of Grossman and Krueger (1991) and Shafik and Bandyopadhyay (1992), the environmental Kuznets curve (EKC) has become highly popular and a growing body of research has emerged. According to the EKC hypothesis, the relation between environmental degradation and economic development is rather stable and follows an inverted U-shape, i.e., environmental degradation increases at a decreasing rate up to a certain threshold level of economic development, after which the positive relation is reversed through various channels. Other seminal papers that spurred the investigation of the EKC include Selden and Song (1994), Stern et al. (1996), and Cole et al. (1997).

One point that cannot be overlooked is that, without a dataset spanning centuries, the possible existence of an inverted U-shaped relation between environmental degradation and economic development, as put forth by the EKC hypothesis, is highly questionable. To this end, the paper contributes by focusing on one of the largest datasets, starting from the nineteenth century and covering the 1871–2016 period for the USA and the UK. The main advantage of this approach is its consistency with the historically long-run character of the EKC hypothesis. The EKC hypothesis also requires a dataset that covers various stages of economic development in Rostow’s terms, in addition to evaluating countries that underwent those stages. Another advantage is that, if the complex relation between the analyzed variables can be modeled with an approach that captures the regime shifts and the varying nonlinear causal relations due to many historical events, the historically long data can be harnessed to provide significant information regarding the relations between emissions, economic prosperity, and petrol prices. Nevertheless, such long spans of historical data also have a disadvantage, since they are subject to trajectory changes that result in deviations from a stable relation between the analyzed variables. These trajectory changes include regime changes due to many historical events, such as policy changes, economic crises, wars, disputes, and changes in trade relations, not to mention the industrial revolutions. Due to these trajectory changes, the relation between economic development and emissions becomes more complex than in samples covering one or two decades.
Overall, the historically long data has significant advantages in terms of meeting the long-run characteristic of the EKC curve in addition to providing important insights by allowing the researcher to evaluate the relationship within a broad perspective by incorporating the impacts of the trajectory changes in the relationship. However, it is also a necessity to augment the econometric tools to control regime shifts, nonlinearity, and asymmetry in the nonlinear causal relations among the analyzed variables. Otherwise, the regime shifts and nonlinearity could result in biased estimates if regression analysis is utilized without controlling such effects.

As shown in the literature section, the debate regarding the EKC is dominated by econometric models that are linear or nonlinear in variables and that assume stable relations between the variables. Nonlinear-in-variables models are based on polynomial regressions, i.e., squared and/or cubic GDP per capita or other national income variables, to evaluate the inverted U-shaped EKC and to determine the turning point or points. More recently, a small number of papers have focused on nonlinear-in-parameters models, such as the threshold autoregressive (TAR) and Markov-switching autoregressive (MS) models in the spirit of Tong (1990) and Hamilton (1990). One of the earliest papers, to our knowledge, is Esteve and Tamarit (2012), who employ the TAR approach to evaluate the carbon dioxide emission-economic growth relation, since the TAR approach provides statistical determination of optimum thresholds or turning points in the spirit of the EKC. Other attempts include the introduction of control variables, such as the carbon option pricing and energy consumption (Arouri et al. 2012a, b; Aslanidis and Iranzo 2009; Fezzi and Bunn 2009; Fosten et al. 2012; Kim et al. 2010; Yavuz and Yilanci 2013) and energy consumption and production to address the indeterminacy issue of the EKC (Richmond and Kaufmann 2006; Tucker 1995; Unruh and Moomaw 1998; Ghosh 2010; Ang 2007; Jalil and Mahmud 2009; Zhang and Cheng 2009; Sanjari and Delangizan 2010; Fodha and Zaghdoud 2010; Menyah and Wolde-Rufael 2010; Arouri et al. 2012a, b; Wang et al. 2011; Pao and Tsai 2010; Alam et al. 2012; Bildirici 2013; Bildirici and Gökmenoğlu 2017).
In sum, the majority of the literature could be summarized as following linear econometric methodologies, including Granger causality, VAR, Johansen cointegration, Engle and Granger cointegration, structural VAR, linear panel regressions, and panel cointegration regressions, which fail to capture the nonlinearity in the emissions-economic development relationship, not to mention the trajectory changes that alter the stability of the EKC relation. More recent papers rely on linear autoregressive distributed lag (ARDL) methodology (Pesaran et al. 2001) with squared and cubic terms; however, the inclusion of such terms to cointegration vectors is rather problematic. The evaluation suggests the usage of more advanced empirical methods to evaluate the emissions and economic development relation since complexity is inherent in the relation. As a result, one linear relation or one stable inverted U effect of economic development on emissions fail to exist if a long period of historical data that is subject to continuously changing production technologies, economic policies, and technological advancements is taken into analysis.
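
The turning point targeted by the polynomial regressions discussed above can be illustrated with a minimal sketch. It assumes a quadratic specification in logarithms, ln(E) = b0 + b1 ln(Y) + b2 ln(Y)^2, which is only one of the forms used in the literature; the function name and the coefficient values are hypothetical and are not estimates from this study.

```python
import math

def ekc_turning_point(beta1, beta2):
    """Turning point of a quadratic EKC in logs (illustrative assumption).

    For ln(E) = b0 + beta1*ln(Y) + beta2*ln(Y)**2, an inverted U requires
    beta1 > 0 and beta2 < 0; the peak is where the derivative is zero.
    """
    if beta1 <= 0 or beta2 >= 0:
        raise ValueError("no inverted U-shape: need beta1 > 0 and beta2 < 0")
    log_income = -beta1 / (2.0 * beta2)      # solves beta1 + 2*beta2*ln(Y) = 0
    return log_income, math.exp(log_income)  # in logs and in levels

# Hypothetical coefficients chosen only to make the arithmetic transparent
log_tp, level_tp = ekc_turning_point(beta1=4.0, beta2=-0.2)
```

A single turning point of this kind presupposes one stable relation over the whole sample, which is the assumption questioned above for historically long data.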

To provide a contribution and to suggest a solution to the problem of indeterminacy of EKC, the study proposes the Markov-switching vector autoregressive multilayer perceptron (MS-VAR-MLP) model to capture the complex nonlinear relations between the emissions, economic development, and the petrol prices. Petroleum is a fossil fuel-based non-renewable commodity, and its consumption has strong environmental implications on the emissions. Further, petrol prices are shown to have significant nonlinear impacts on economic growth rates and on the business cycles in addition to being influential on the historical economic crises (Hamilton 1990, 2011; Krolzig and Clements 2002). Pan et al. (2017) shows that petrol prices have time-varying nonlinear effects on the U.S.’s GDP and the predictability of the GDP fluctuations. Rehman (2018) shows the strong association of petrol price fluctuations and global economic policy changes, including the developed and developing nations. Winchester and Ledvina (2017) provide a global economy wide model and by simulating oil price fluctuations, they show that high (low) petrol prices lead to lower (higher) GHG emissions.

One additional contribution of the paper is the extension of the sensitivity analysis used for neural networks to a regime-dependent sensitivity analysis that enables the investigation of nonlinear causal relations. Further, the paper utilizes one of the largest datasets, covering the 1871–2016 period for the UK and the USA, for two reasons: (i) both countries have passed the necessary stages of development in the spirit of Rostow (1960), and (ii) both are at the fifth stage of economic development, the stage of high mass consumption. To summarize, the study has three aims. The first is to investigate the relationship between economic growth, CO2 emissions, and petrol prices for one of the largest datasets available, in the spirit of the EKC hypothesis, with the newly proposed MS-VAR-MLP model and the MS-VAR-MLP-based regime-dependent sensitivity methodology. The second aim is to investigate the complex relationship between petrol prices, environmental pollution, and economic growth, which cannot be captured adequately, especially by polynomial regressions. The MS-VAR approach assumes regime switching, governed by Markov chains, between two sub-sample spaces that follow VAR processes. In contrast, the MS-VAR-MLP allows MLP-type nonlinear neural network processes within each sub-sample space. Further, due to the trajectory changes resulting from oil shocks, economic crises, policy changes, and wars in such historical data, the MS-VAR model may capture only the abrupt shifts in the time series and could result in non-rejection of certain regimes that include large decreases or increases in the magnitudes of variables, such as GDP per capita. The third aim of the paper is to propose an MS-VAR-MLP-based regime-dependent sensitivity analysis that can be used to provide regime-dependent causal interpretations of the relations between the analyzed variables.
As to be shown, the MS-VAR-MLP and its regime-dependent sensitivity analysis provide nonlinear causal interpretations of the analyzed variables within each distinct regime, and the causal impact is nonlinear even in each regime allowing the researcher to investigate complexity in the nonlinear causality. The usage of the large span of data is also fruitful in such a complex nonlinear model, since the sample covers many important factors, including world wars and economic crises especially those caused by the first and second oil shocks in 1970s. This type of regime and magnitude dependent sensitivity analysis allows the researcher to easily analyze the causal relations between the variables at different stages of business cycles in addition to overcoming the difficulties of the MS-VAR models.

In the MS-VAR literature, the choice of filter, i.e., the Kalman or the Hamilton filter, has been discussed in many studies. An additional difference between the MS-VAR and the MS-VAR-MLP is that, because neural network learning algorithms are used in the estimation, the MS-VAR-MLP model is capable of modeling such a large span of data without the application of any filters, since the hidden Markov chains are inherent in the MLP models. As a result, the dataset can be used directly in its raw form, since the NN learning algorithms are efficient in overcoming this difficulty.

The literature review is given in “Literature review.” The econometric methodologies utilized in this study are provided in “Empirical methodology.” The data, empirical results and discussions are given in “Dataset and the empirical results.” The section “Conclusion” concludes.

Literature review

Similar to our study in the sense of using a large set of data, Lindmark (2002), Stern and Enflo (2013), Fosten et al. (2012), Ersin (2016), and Bildirici and Ersin (2018) analyze a long range of dataset starting from the early nineteenth century. Among these studies, Fosten et al. (2012) is one of the early studies utilizing nonlinear in parameters time series models, namely TAR and momentum-TAR models with data starting from 1830 for the UK. With the use of the TAR models, Fosten et al. (2012) conclude that the positive relation between economic growth and emissions in the UK is reversed after a certain threshold and any temporary disequilibrium is corrected. Further, by evaluating the candidate variables that optimize this threshold effect, one interesting finding of Fosten et al. (2012) is, the reversal is not a function of or a threshold of per capita GDP, but a function of per capita CO2 emissions and the results are justified as the emissions are reduced by legislation rather than being a function of economic growth. Mensah (2014) uses the threshold analysis to evaluate the CO2, energy consumption and economic growth relationship and displays that structural changes could result in alterations in the economic activity-environment relation in addition to nonlinearity. Zi et al. (2016) employ TAR models to inspect the effects of urbanization on CO2 emissions in China by taking the level of urbanization, residential income, urban population density, and industrial production as threshold variables. Ersin (2016) studies the nonlinear effects of GDP on CO2 emissions for a long span of data during 1870–2011 for 13 developed countries with heterogenous panel smooth transition autoregressive (Panel-STAR) framework, and his empirical findings suggest that the effects of economic growth on emissions growth had been positive in both regimes, a finding being the opposite of the EKC or a weak form of EKC. 
Bildirici and Ersin (2018) suggested the STAR distributed lag (STARDL) model that augments the STAR models (Terasvirta 1994) with nonlinear ARDL cointegration framework to investigate long-run and short-run relations between CO2 emissions and economic growth in the USA.

A literature search on the most recent empirical analyses of the EKC-type environment-income relationship can be summarized as evolving under three strands, in addition to the recent STAR approaches to the EKC given above. The first strand utilizes polynomial regressions and constitutes the majority of the literature. Within the first strand, Atasoy (2017) investigates the EKC for 50 states of the USA with panel regressions by using CO2, GDP per capita and its square, energy consumption, and population variables for the 1960–2010 period. Atasoy (2017) concludes that only a weak form of the EKC exists, since the inverted U-shape is obtained in only 10 out of 50 states. Another recent study that investigates the possibility of an EKC in the states of the USA for the 1960–2010 period is Apergis (2016), whose findings are also in favor of no EKC for 38 out of 48 states. Özokçu and Özdemir (2017) estimate panel models with squared and cubic variables of CO2 emissions and economic growth, and their findings suggest N-shaped and inverted N-shaped relations for 26 OECD and 52 emerging countries, i.e., findings against an EKC-type relation. Özokçu and Özdemir (2017) suggest that their results show that environmental pollution cannot be reversed by economic growth. Xu (2018) examines the EKC with sulfur dioxide, GDP, and squared GDP variables for China at the aggregate level and at the state level. Xu (2018) concludes that, although the EKC is found at the aggregate (country) level, the disaggregate (state-level) data do not support the EKC; therefore, the regression models suffer from biased parameter estimates due to aggregation. Zambrano-Monserrate et al. (2018) test the EKC hypothesis with ARDL and, additionally, with causality tests and conclude that an inverted U-shape cannot be observed in Peru for the 1980–2011 period. Rasli et al. (2018) examine the EKC relation with “new toxics” (various forms of pollutants including CO2) for a panel of 36 developed and developing countries for the 1995–2013 period, and they conclude that the EKC does not hold for the models with CO2 and nitrous oxide (N2O), that the EKC holds for traffic volume and CO2 emissions, and that, for the majority of the evaluated models, trade openness and energy consumption worsen the environment. Sinha and Shahbaz (2018) examine the EKC in India by the use of CO2 emissions, renewable energy and its quadratic form, and they suggest non-rejection of the EKC resulting from the inclusion of renewables, which underlines the importance of renewable energy.

The second strand focuses on MS and MS-VAR models to augment the EKC with regime dependency and business cycles. In the second strand, Halkos and Tsionas (2001) evaluate the EKC with the Bayesian regime switching along with cross-sectional PPP adjusted per capita GDP, population density, infant mortality rates, urban population distribution of GDP in manufacturing, deforestation, and CO2 data for 61 countries. Halkos and Tsionas (2001) also point at the complexity of the emission-income relation, since they conclude that a monotonic relation between emissions and income, such as the EKC cannot be accepted. Halkos and Tsionas (2001) also point at the inability of the demographic and income variables in identifying the differences between high and low polluters for the analyzed 61 countries. Within this respect, Plassmann and Khanna (2006) criticize the nonmonotonic EKC result obtained, and they note problems associated with the use of aggregated and pooled data under three sources of bias: (i) GDP data is country level, while emissions data is for certain cases area-specific; (ii) usage of panels is highly problematic, since countries are not homogenous; (iii) changes in the state of technology and consumer awareness cause shifts in the income-pollution relation, and panels and regressions could not capture this effect. Though Plassmann and Khanna (2006) suggest micro/household level data to overcome difficulties, they underline the impact of assumptions of the model selected in the literature, and they suggest the utilization of Markov chain Monte Carlo (MCMC) and Gibbs sampler to overcome the problems encountered. 
Martinez-Zarzoso and Maruotti (2013) investigate the EKC curve with hidden Markov regression models for a sample of 28 OECD countries for the 1968–2006 period, and they conclude that the emissions-income relation varies depending on the analyzed countries, the counties could be assumed as falling into five distinct groups and distributional mixture models provide flexibility instead of assuming a single distribution. Roach (2015) shows results in favor of regime dependency with MS models for the real GDP, retail prices, and motor-gasoline CO2. Park and Hong (2013) investigate the EKC relation with two regime MS and MS-RW (MS-random walk) models for South Korea. Chevallier (2011a) obtains MS-VAR type nonlinearity in the EKC type relation between macroeconomic variables and CO2. Chevallier (2011b) extends the analysis to evaluate the nonlinear and regime-dependent relations between the carbon prices and macroeconomy for EU-27 member countries with MS-VAR models. The findings of Chevallier (2011a, b) suggest that the MS-VAR models capture the relation more effectively, the regime dependency characteristics could not be rejected, the two states of boom-bust cycle exist for the analyzed countries, and the industrial production has a positive impact on the carbon prices during expansions, while the relation becomes reverted for the recessions. Bildirici and Gökmenoğlu (2017) investigate environment-income relation in the context of CO2 and energy consumption from hydropower for the G7 countries with MS-VAR models and their findings are in favor of decelerating effect of hydropower energy on environmental degradation. Charfeddine (2017) applies the MS-VEC (MS-vector error correction) model to examine the economic development-CO2 relation, and the results are in favor of EKC hypothesis. 
Charfeddine (2017) also notes that if the regime-switching is not controlled, the structural breaks result in deviations from true relations; hence, the utilization of nonlinear model is a necessity.

Our approach differs from Halkos and Tsionas (2001) and Plassmann and Khanna (2006) in that, instead of using cross-sectional data, we utilize a large set of historical time series data, and, to overcome the difficulties resulting from technology and awareness changes, we apply the MS-VAR-MLP model. The model achieves regime dependency by benefiting from the MS-VAR structure and flexibility through the use of MLP-type NNs. As discussed in the methodology section, the MS-VAR-MLP is highly flexible in terms of producing within-regime nonlinear causal relations, since it assumes MLP processes in each regime without a prior assumption regarding the distribution, whereas the MS-VAR model assumes linear VAR relations in each regime. Similar to Charfeddine (2017), regime switching is utilized to capture structural changes; however, as to be shown in the last section, the forecast performance of the proposed MS-VAR-MLP improves significantly over its MS-VAR counterpart. In contrast to Chevallier (2011a, b) and Bildirici and Gökmenoğlu (2017), who rely on the MS-VAR model, the relationships between the analyzed variables in the MS-VAR-MLP model remain nonlinear even within each regime, following MLP processes. As a result, the model captures the nonlinearity within the expansionary and recessionary regimes as a nonlinear function of the size and magnitude of each variable under investigation.

A third strand in the literature follows more complex methodologies, such as the neural networks and machine learning approaches instead of the previous models, including polynomial regressions, panel regressions, and nonlinear econometric models, such as the MS or TAR. However, the number of the papers in this strand is rather limited. Sun and Liu (2016) investigate the EKC in three major industries in addition to residential consumption in China for the 1978–2012 period based on the support vector machine (SVM) and back-propagation neural networks (BP-NN) in addition to applying cointegration and Granger causality tests. The findings of Sun and Liu (2016) reveal that SVM model performs better in forecast accuracy compared to BP-NN. Liu et al. (2018) conducts a BP-NN -based modeling to augment the forecast capabilities, in addition to applying ridge regression and the evaluation of systematic dynamics to determine key factors of regional carbon emissions in Beijing. Following their empirical results, Liu et al. (2018) suggest various important policies, including urbanization and green transportation strategies. One drawback of these studies is that their sample size is very limited and the estimation of neural networks and machine learning models require large number of time series observations. Compared to our paper, though nonlinear, their models do not utilize regime-switching, and they implicitly ignore the impact of business cycles, in addition to assuming the proposed models to be capturing the complex relations by ignoring the trajectory changes even within the limited time span of their data. Similar to these papers, our model also necessitates the use of large datasets, and one of the largest datasets is introduced in the study. However, as to be discussed in the last section, the requirement of large datasets is also a limit for the MS-VAR-MLP and for the future applications.

Considering the abovementioned strands, our paper could be taken as following the first and second strands in terms of the usage of historically long span of data and by taking the effects of business cycles and regime-switching in the spirit of MS-VAR approach. By providing a hybrid methodology, the MS-VAR-MLP model and its generalization to MS-VAR-MLP-based regime-dependent sensitivity analysis our methodology has important advantages. The model assumes Markov-switching-based splitting of the regression space into two or possibly more sub-regression spaces, where the regime switching is governed by unobserved Markov chains. In contrast to the MS-VAR models, the relations between the analyzed variables are highly nonlinear, since each regime follows VAR-MLP processes in the MS-VAR-MLP model. As noted in the previous section, this type of modeling strategy becomes relevant, especially due to the complexity of the emissions-income-petrol relation in such historically long period of data corresponding to the 1871–2016 period. Further, the MS-VAR-MLP and the sensitivity approach benefits not only from the hybridization of regime dependency and neural networks but also from the neural network learning algorithms. The overlook is that our paper follows the historically long data approach to EKC in the spirit of Lindmark (2002), Stern and Enflo (2013), Fosten et al. (2012), and Ersin (2016); the paper provides a bridge between the literature suggesting the need to take nonlinearity and MS-type regime switching into account (Roach 2015; Park and Hong 2013; Chevallier (2011a, b); Bildirici and Gökmenoğlu 2017), and our paper provides regime-switching VAR-MLP neural networks modeling of emissions and economic growth relation in relation to its single regime variants of BP-NN models as in Sun and Liu (2016) and Liu et al. (2018). 
Further, the proposed MS-VAR-MLP model follows the statistical and stochastic neural network modeling approach of Cheng and Titterington (1994) and Ersin (2009) in addition to extending the single regime VAR-NN model proposed in Wutsqa et al. (2006).

Empirical methodology

At the first stage, a general outlook to MS-VAR models will be provided. At the second stage, the MLP neural networks and neural networks sensitivity analysis will be extended to the MS-VAR-MLP model and MS-VAR-MLP-based regime-dependent sensitivity analysis.

MS-VAR model

This perspective is also in the spirit of Hamilton (1990), which influenced many studies. Hamilton (1990) showed, based on Markov-switching (MS) models, that petrol prices and sudden changes in petrol prices were significant in dating expansions and recessions, in peak-trough dating, and in determining the durations of these cycles. Krolzig (1998, 2000), Krolzig and Toro (2005), and Krolzig and Clements (2002) extended the analysis of national income cycles to MS-VAR models to provide visual representations with impulse response functions.

The MSI(r)-VAR(l) model is defined as follows:

$$ {y}_t={\mu}^{\left({s}_t\right)}+\sum \limits_{i=0}^l{A}_i^{\left({s}_t\right)}{x}_t+{u}_t^{\left({s}_t\right)} $$
(1)

where \( {u}_t\mid {s}_t\sim N\left(0,{\delta}^2\left({s}_t\right)\right) \) and \( {A}_i^{\left({s}_t\right)} \) denotes the coefficients of the lagged values of the variables in different regimes. \( {\delta}^2\left({s}_t\right) \) is the variance of the residuals in each regime. \( {\mu}^{\left({s}_t\right)} \) defines the dependence of the mean μ of the K-dimensional time series vector on the regime variable \( {s}_t \). The input variables are defined in matrix form as \( {x}_t={\left[{x}_t^{\prime}\right]}^{\prime }={\left({y}_{t-1},...,{y}_{t-p},{x}_{t-1},...,{x}_{t-p}\right)}^{\prime } \) for t = 1,2,...,n observations. r is the number of regimes and l is the optimum lag length selected with the Akaike or Schwarz information criteria (AIC and SIC). The regime variable \( {s}_t \) is governed by a Markov chain as follows:

$$ \Pr \left[{s}_t|{\left\{{s}_{t-i}\right\}}_{i=1}^{\infty },{\left\{{y}_{t-i}\right\}}_{i=1}^{\infty}\right]=\Pr \left\{{s}_t|{s}_{t-1};\rho \right\}, $$
(2)

where ρ includes the probability parameters. The conditional probabilities are stated as \( P\left({y}_t|{Y}_{t-1},{s}_{t-1}\right)=\Pr \left({y}_t|{Y}_{t-1}\right) \).

The Markov chain is ergodic and irreducible, and an absorbing state does not exist, i.e., \( {\overline{\xi}}_m\in \left(0,1\right) \) for all m = 1,…,M, where \( {\overline{\xi}}_m \) is the ergodic or unconditional probability of regime m. For two regimes with transition probabilities \( {p}_{ij} \), the unconditional distribution is given by the following:

$$ \Pr \left({s}_t=1\right)=\frac{1-{p}_{22}}{2-{p}_{11}-{p}_{22}},\Pr \left({s}_t=2\right)=\frac{1-{p}_{11}}{2-{p}_{11}-{p}_{22}} $$
(3)
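
As a numerical check of Eq. (3), the following minimal sketch compares the closed-form unconditional probabilities with the long-run distribution obtained by repeatedly propagating a starting distribution through the transition matrix. The values of p11 and p22 are illustrative assumptions rather than estimates, and the row-stochastic layout of the matrix is a convention adopted only for this sketch.

```python
def ergodic_probs_two_state(p11, p22):
    """Closed-form unconditional regime probabilities of Eq. (3)."""
    denom = 2.0 - p11 - p22
    return [(1.0 - p22) / denom, (1.0 - p11) / denom]

def iterate_chain(P, start, steps=500):
    """Propagate a regime distribution through a row-stochastic matrix P,
    where P[i][j] is the probability of moving from regime i to regime j."""
    dist = list(start)
    m = len(dist)
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]
    return dist

p11, p22 = 0.9, 0.8  # assumed staying probabilities, for illustration only
P = [[p11, 1.0 - p11],
     [1.0 - p22, p22]]

closed_form = ergodic_probs_two_state(p11, p22)
limit = iterate_chain(P, [1.0, 0.0])  # start with all mass in regime 1
```

Because the chain is ergodic, the limit does not depend on the starting distribution, so both computations should agree up to numerical precision.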

For the estimation of the Markov-switching models, maximum likelihood estimation (MLE) and the expectation maximization (EM) algorithm are possible approaches. To make an inference, an iterative procedure for t = 1,2,...,T is followed, while the previous value of the probability \( {\xi}_{i,t-1}=\Pr \left[{s}_{t-1}=i|{\Omega}_{t-1};\theta \right] \) is taken as an input. \( {\varepsilon}_{t\mid t} \) denotes the vector of forecast probabilities. Optimal forecast probabilities are obtained using the following:

$$ {\varepsilon}_{t\mid t}=\frac{\varepsilon_{t\mid t-1}\odot {\varphi}_t}{1^{\prime}\left({\varepsilon}_{t\mid t-1}\odot {\varphi}_t\right)} $$
(4)

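where \( {\varepsilon}_{t+1\mid t}=P{\varepsilon}_{t\mid t} \), \( {\varphi}_t \) is the vector of conditional densities, 1 is a unit column vector, and the product of \( {\varepsilon}_{t\mid t-1} \) and \( {\varphi}_t \) is taken element by element. The one-step-ahead conditional expectation is computed with the following: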

$$ {E}_t\left({y}_{t+1}\right)=\sum \limits_{j=1}^s\sum \limits_{i=1}^s{\Pr}_t\left({S}_t=j\right){P}_{ij}\;\left({w}_0^{(j)}+\sum \limits_{l=1}^{p(j)}{\beta}_l^{(j)}{y}_{t-l+1}\right) $$
(5)
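
The filtering recursion in Eq. (4) and a probability-weighted forecast in the spirit of Eq. (5) can be sketched as follows. This is a minimal illustration under stated assumptions: the transition matrix is taken as row-stochastic with P[i][j] = Pr(s_{t+1} = j | s_t = i), the regime-specific conditional means are passed in as precomputed numbers, and all function names and numerical values are hypothetical.

```python
def filter_step(pred_probs, densities):
    """Eq. (4): update predicted regime probabilities with the
    conditional densities of the current observation."""
    joint = [p * d for p, d in zip(pred_probs, densities)]  # element-by-element
    total = sum(joint)                                      # 1'(...) sums the entries
    return [j / total for j in joint]

def predict_step(P, filt_probs):
    """One-step-ahead regime probabilities under the assumed convention
    P[i][j] = Pr(s_{t+1} = j | s_t = i)."""
    m = len(filt_probs)
    return [sum(filt_probs[i] * P[i][j] for i in range(m)) for j in range(m)]

def forecast(P, filt_probs, regime_means):
    """Weight regime-specific conditional means by predicted probabilities."""
    pred = predict_step(P, filt_probs)
    return sum(p * mu for p, mu in zip(pred, regime_means))

P = [[0.9, 0.1],
     [0.2, 0.8]]                                   # assumed transition matrix
filtered = filter_step([0.5, 0.5], [0.2, 0.6])     # assumed densities
predicted = predict_step(P, filtered)
y_hat = forecast(P, filtered, regime_means=[1.0, -1.0])
```

Iterating filter_step and predict_step over t = 1,...,T reproduces the recursive structure of the inference described above; the likelihood contribution of each observation is the normalizing constant computed inside filter_step.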

MS-VAR-MLP model

The Hamilton (1990) model is a nonlinear mixture of autoregressive functions, such as the MLP and is evaluated under the hybrid of the MLP and the hidden Markov chain (HMC) model (Olteanu et al. 2004). However, the estimation of the MS-VAR model is generally based on the EM and MLE approaches without benefiting from the neural networks learning algorithms. In fact, the hybrid MLP accepts that the input layer is linked to the output nodes with weighted connections to form a linear model that is parallel to the nonlinear multilayer perceptrons as shown by Bildirici and Ersin (2013). Moreover, Wutsqa et al. (2006) extended the Sims’s (1980) VAR model to NN by proposing a hybrid approach, the VAR-NN model. The model architecture of the VAR-NN model of Wutsqa et al. (2006) is defined with linear activations to mimic the VAR model, and the model is estimated with the BP, one of the most common algorithms in the NN literature.

The MLP is a nonlinear function N(.) that links the variables in the input layer \( {\tilde{x}}_t \) to the output layer yt of the form as follows:

$$ {y}_t=N\left({\tilde{x}}_t;\varphi, f\right)+{\varepsilon}_t $$
(6)

which could be defined as a single hidden layer MLP model as follows:

$$ {y}_t=\varphi \left({\gamma}_0+\sum \limits_{i=1}^h{\lambda}_i\psi \left({\tilde{\omega}}^{\prime }{x}_t\right)\right)+{\varepsilon}_t $$
(7)

Though a set of different activation functions \( \psi \left({\tilde{\omega}}^{\prime }{x}_t\right) \) is possible (Bishop 1995), the study uses the sigmoid function as follows:

$$ \psi \left({\tilde{\omega}}^{\prime }{\tilde{x}}_t\right)={\left(1+{e}^{-\left({\tilde{\omega}}^{\prime }{x}_t-{\omega}_0\right)}\right)}^{-1} $$
(8)

Hence, Eq. (8) is continuous and twice differentiable and is bounded in the range [0,1]. Other possible activation functions include the hyperbolic tangent, sine, Heaviside, and identity functions (Bishop 1995). For the universal approximation property of MLPs with sigmoidal functions, readers are referred to Cybenko (1989). Following Kuan and White (1994), Swanson and White (1997), White (1992), and Granger and Terasvirta (1993), the random shocks \( {\varepsilon}_t \) are incorporated as an \( {\varepsilon}_t\sim \mathrm{iid}\;N\left(0,{\delta}^2\right) \) white noise process. It should be noted that Eq. (7) defines a single regime VAR-MLP model as long as the output variable vector is defined as \( {y}_t=\left({y}_t,{x}_t\right) \) and the input variable matrix is \( {x}_t={\left[{x}_t^{\prime}\right]}^{\prime }={\left({y}_{t-1},...,{y}_{t-l},{x}_{t-1},...,{x}_{t-l}\right)}^{\prime } \), in addition to the connection parameters given as \( {\tilde{\omega}}^{\prime }={\left[{\omega}_{0,1},...,{\omega}_{0,h},{\alpha}_1,...,{\alpha}_p,{\beta}_1,...,{\beta}_q\right]}^{\prime } \).
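
The single hidden layer MLP of Eqs. (7) and (8) can be sketched as a forward pass. The sketch assumes an identity output activation and adds the hidden-unit bias to the weighted input, which is a reparameterization of the threshold form written in Eq. (8); the function names, weights, and inputs are hypothetical and chosen only to keep the arithmetic transparent.

```python
import math

def logistic(a):
    """Logistic sigmoid of Eq. (8), mapping the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, hidden_weights, hidden_biases, lam, gamma0):
    """Single hidden layer MLP with identity output activation (assumed):
    gamma0 + sum_i lam[i] * logistic(w_i' x + b_i)."""
    hidden = [
        logistic(sum(w * xi for w, xi in zip(w_row, x)) + b)
        for w_row, b in zip(hidden_weights, hidden_biases)
    ]
    return gamma0 + sum(l * h for l, h in zip(lam, hidden))

# Input vector of lagged values (y_{t-1}, x_{t-1}); zeros give logistic(0) = 0.5
x_lags = [0.0, 0.0]
y_hat = mlp_forward(
    x_lags,
    hidden_weights=[[0.3, -0.7], [1.2, 0.4]],  # h = 2 hidden units
    hidden_biases=[0.0, 0.0],
    lam=[1.0, 1.0],
    gamma0=0.5,
)
```

With zero inputs and zero biases each hidden unit returns 0.5, so the output is the bias plus half the sum of the hidden-to-output weights.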

By extending the model to regime switching, following the methodology given for the MS-VAR model, the MS-VAR-MLP model is defined as follows:

$$ {y}_t={\varphi}^{\left({s}_t\right)}\left({\gamma}_0^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_i^{\left({s}_t\right)}\psi \left({\tilde{\omega}}^{\prime \left({s}_t\right)}{\tilde{x}}_t\right)\right)+{\varepsilon}_t^{\left({s}_t\right)} $$
(9)

where

$$ \psi \left({\tilde{\omega}}^{\prime \left({s}_t\right)}{\tilde{x}}_t\right)={\left(1+\exp \left(-\left\{{\tilde{\omega}}^{\prime \left({s}_t\right)}{\tilde{x}}_t\right\}\right)\right)}^{-1} $$
(10)

Similar to the VAR-NN model, which could be considered as a single regime MS-VAR-NN model, the output activation function in the hidden layer in each regime is defined as a linear activation function for the sake of simplicity. As shown in the literature, this type of specification decreases the number of parameters to be estimated without a loss in the generalization capability of the NN model (Bishop 1995). The input vector is defined as \( {\tilde{x}}_t={\left[1,{x}_t^{\prime}\right]}^{\prime } \), and a vector of unity is added to obtain the bias coefficient; the input variables are defined as \( {x}_t={\left[{x}_t^{\prime}\right]}^{\prime }={\left({y}_{t-1},...,{y}_{t-l},{x}_{t-1},...,{x}_{t-l}\right)}^{\prime } \) and t = 1,2, ..., n, with n representing the number of observations. The MS-VAR-MLP model in Eq. (9) extends the model to m number of regimes with transition probabilities being governed by Markov chains and the connection parameters are defined as \( {\tilde{\boldsymbol{\upomega}}}^{\prime }={\left[{\omega}_{0,v,1}^{\left({s}_t\right)},...,{\omega}_{0,v,h}^{\left({s}_t\right)},{\alpha}_{1,v}^{\left({s}_t\right)},...,{\alpha}_{l,v}^{\left({s}_t\right)},{\beta}_{1,v}^{\left({s}_t\right)},...,{\beta}_{l,v}^{\left({s}_t\right)}\right]}^{\prime } \), v defining the vector of coefficients as in the VAR and MS-VAR models. l is the optimum lag length determined by the Schwarz information criterion (SIC) for parsimony. The hidden unit is linked to the output layer with the hidden unit parameters \( {\lambda}_{i,v}^{\left({s}_t\right)}={\lambda}_{1,v}^{\left({s}_t\right)},...,{\lambda}_{h,v}^{\left({s}_t\right)} \), where h is the number of hidden units and \( {\gamma}_0^{\left({s}_t\right)} \) is the bias parameter. 
Similarly, the residuals of the model are assumed to follow \( {\varepsilon}_{t,v}^{\left({s}_t\right)}\sim \mathrm{iid}\;N\left(0,{\delta}_v^{2,\left({s}_t\right)}\right) \) white noise processes with zero conditional mean and constant variance in each regime, while the conditional variances are allowed to be regime dependent.
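
To make the notation of Eqs. (9) and (10) concrete, the following minimal Python sketch evaluates the conditional mean of one equation in a given regime, assuming an identity output activation φ. The function names and all numerical weights are hypothetical and serve only as an illustration; they are not estimates from the study.

```python
import math

def logistic(a):
    """Logistic activation psi(a) = 1 / (1 + exp(-a)), as in Eq. (10)."""
    return 1.0 / (1.0 + math.exp(-a))

def mlp_mean(x, omega, lam, gamma0):
    """Conditional mean of one equation in one regime (Eq. (9), identity output).

    x      : lagged inputs, without the constant
    omega  : h weight vectors, each of length len(x) + 1 (bias weight first)
    lam    : h hidden-to-output weights
    gamma0 : output bias
    """
    x_tilde = [1.0] + list(x)  # prepend unity so the first weight acts as a bias
    total = gamma0
    for w_i, lam_i in zip(omega, lam):
        activation = sum(w * v for w, v in zip(w_i, x_tilde))
        total += lam_i * logistic(activation)
    return total

# Hypothetical weights for two regimes, two hidden units, and two lagged inputs
params = {
    1: {"omega": [[0.1, 0.5, -0.3], [-0.2, 0.4, 0.6]], "lam": [0.8, -0.5], "gamma0": 0.01},
    2: {"omega": [[0.0, -0.7, 0.2], [0.3, 0.1, -0.4]], "lam": [0.3, 0.9], "gamma0": -0.02},
}
x_lags = [0.02, -0.01]
regime_means = {s: mlp_mean(x_lags, **p) for s, p in params.items()}
```

The same inputs produce a different conditional mean in each regime because every weight carries the regime superscript.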

Another representation of the MS-VAR-MLP model in Eq. (9) and Eq. (10) is stated as follows:

$$ {y}_{t,v}^{\left({s}_t\right)}={\varphi}^{\left({s}_t\right)}\left({\gamma}_0^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_i^{\left({s}_t\right)}\times {\left(1+\exp \left\{-\left({\alpha}_{1,v}^{\left({s}_t\right)}{y}_{t-1}+{\alpha}_{2,v}^{\left({s}_t\right)}{y}_{t-2}+,...,+{\alpha}_{l,v}^{\left({s}_t\right)}{y}_{t-l}+{\beta}_{1,v}^{\left({s}_t\right)}{x}_{t-1}+{\beta}_{2,v}^{\left({s}_t\right)}{x}_{t-2}+,...,+{\beta}_{l,v}^{\left({s}_t\right)}{x}_{t-l}+{\omega}_{0,v}^{\left({s}_t\right)}\right)\right\}\right)}^{-1}\right)+{\varepsilon}_{t,v}^{\left({s}_t\right)} $$
(11)

The regimes are governed by an unobservable Markov process as follows:

$$ \sum \limits_{i=1}^m{y}_{t(i)}P\left({S}_t=i|{z}_t\right),i=1,\dots m. $$
(12)

Within each regime, the logistic-type sigmoid function is defined as in Eq. (10). The parameter vector is subject to uniform transformation as follows:

$$ \left(\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{$2$}\right.\right){\lambda}_h^{\left({s}_t\right)}\sim uniform\left[-1,+1\right] $$
(13)

and the filtered probability function P(St = i| zt) has the following representation:

$$ \left(P\left({s}_t=i|{z}_t\right)\alpha f\left(P\left({\sigma}_{t-1}|{z}_t,{s}_{t-1}=1\right)\right)\right) $$
(14)

The transition probability nj,i is assumed to be P(St = i| St − 1 = j). Following Bildirici and Ersin (2009), the input variables are normalized as follows:

$$ {z}_t=\left[{z}_t-E\left({z}_t\right)\right]/\sqrt{E\left({z_t}^2\right)} $$
(15)
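
As an illustration of Eq. (15), the sketch below centers a series by its mean and scales it by the square root of its mean square. Replacing the expectations with sample averages is an assumption made here, and the series is hypothetical.

```python
import math

def normalize(z):
    """Center by the sample mean and scale by sqrt(E[z^2]), following Eq. (15).

    The expectations are replaced by sample averages (an assumption).
    A series of all zeros would give a zero denominator and is not handled.
    """
    n = len(z)
    mean = sum(z) / n
    root_mean_square = math.sqrt(sum(v * v for v in z) / n)
    return [(v - mean) / root_mean_square for v in z]

series = [0.03, -0.01, 0.02, 0.05, -0.04]  # hypothetical growth rates
scaled = normalize(series)
```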

The s → max {p, q} recursive procedure is started by constructing P(zs = i| zs − 1), where ψ(Ztλh) takes the form 1/(1 + exp(−x)), which is continuous, twice differentiable, and bounded in [0, 1]. The weight vector is defined as ξ = ω, and as before, ψ is the logistic activation function. The input variables are defined as Ztλh = Xi with λh defined in Eq. (13). If the transition probability nj,i = P(zt = i| zt − 1 = j) is assumed, the following representations are obtained:

$$ f\left({y}_t|{x}_t,{z}_t=i\right)=\frac{1}{\sqrt{2\pi {h}_{t(i)}}}\exp \left\{-{\left({y}_t-{x}_t^{\prime}\varphi -\sum \limits_{j=1}^H{\theta}_jp\left({x}_t^{\prime }{\gamma}_j\right)\right)}^2/2{h}_{t(j)}\right\} $$
(16)
$$ L=\prod \limits_{t=1}^Tf\left({y}_t|{s}_t=i,{Y}_{t-1}\right)\;\Pr \left[{s}_t=i|{Y}_{t-1}\right] $$
(17)

The probability Pr[st = i| Yt − 1] is calculated through iteration as follows:

$$ {\displaystyle \begin{array}{l}{\pi}_{\mathrm{jt}}=\Pr \left[{s}_t=j|{Y}_{t-1}\right]\\ {}={\sum}_{i=0}^1\Pr \left[{s}_t=j|{s}_{t-1}=i\right]\;\Pr \left[{s}_{t-1}=i|{Y}_{t-1}\right]\\ {}={\sum}_{i=0}^1{\eta}_{\mathrm{ji}}{\pi}_{\mathrm{it}-1}^{\ast}\end{array}} $$
(18)
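
The recursion in Eq. (18) is the prediction step of the standard Hamilton filter. The minimal Python sketch below pairs it with the usual update step and accumulates the log-likelihood by summing the regime-weighted densities at each date. The Gaussian regime densities, the fixed regime means, and all numerical values are assumptions chosen for illustration; in the MS-VAR-MLP the means would come from the regime-specific networks.

```python
import math

def normal_pdf(y, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hamilton_filter(y, means, variances, eta, pi0):
    """Filtered regime probabilities and log-likelihood.

    y         : observations
    means     : means[t][j], conditional mean in regime j at time t
    variances : variances[j], regime-dependent variance
    eta       : eta[j][i] = Pr[s_t = j | s_{t-1} = i], columns sum to one
    pi0       : initial filtered probabilities
    """
    m = len(pi0)
    filtered, loglik, prev = [], 0.0, list(pi0)
    for t, y_t in enumerate(y):
        # Prediction step, Eq. (18): pi_jt = sum_i eta_ji * pi*_{i,t-1}
        pred = [sum(eta[j][i] * prev[i] for i in range(m)) for j in range(m)]
        # Regime density times predicted probability
        joint = [normal_pdf(y_t, means[t][j], variances[j]) * pred[j] for j in range(m)]
        f_t = sum(joint)                  # density of y_t given the past
        loglik += math.log(f_t)
        prev = [v / f_t for v in joint]   # update step: filtered probabilities
        filtered.append(prev)
    return filtered, loglik

# Hypothetical two-regime example
y = [0.5, -1.2, 0.3, 2.0]
means = [[0.0, 1.0]] * len(y)
eta = [[0.9, 0.2], [0.1, 0.8]]
filtered, loglik = hamilton_filter(y, means, [1.0, 2.0], eta, [0.5, 0.5])
```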

For the variables under investigation, namely per capita CO2 emissions, per capita GDP, and petrol prices, let their growth rates, obtained by taking logarithms and then first differences, be denoted as dlco2t, dlyt, and dlpt. For the purpose of the study, the MS-VAR-MLP model given in Eqs. (9) and (10) could be written as a three-vector MS-VAR model in which each vector follows an MLP process as follows:

$$ {\displaystyle \begin{array}{l} dl co{2}_t={\varphi}^{\left({s}_t\right)}\left({\gamma}_0^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_{i,v}^{\left({s}_t\right)}\psi \left({{\overset{\sim }{\omega}}^{\hbox{'}}}^{\left({s}_t\right)}{\overset{\sim }{x}}_t\right)\right)+{\varepsilon}_{t,v}^{\left({s}_t\right)}\\ {} dl{y}_t={\varphi}^{\left({s}_t\right)}\left({\gamma}_0^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_{i,v}^{\left({s}_t\right)}\psi \left({{\overset{\sim }{\omega}}^{\prime}}^{\left({s}_t\right)}{\overset{\sim }{x}}_t\right)\right)+{\varepsilon}_{t,v}^{\left({s}_t\right)}\\ {} dl{p}_t={\varphi}^{\left({s}_t\right)}\left({\gamma}_0^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_{i,v}^{\left({s}_t\right)}\psi \left({{\overset{\sim }{\omega}}^{\prime}}^{\left({s}_t\right)}{\overset{\sim }{x}}_t\right)\right)+{\varepsilon}_{t,v}^{\left({s}_t\right)}\end{array}} $$
(19)

where the sigmoid activation function is defined as follows:

$$ \psi \left({{\overset{\sim }{\omega}}^{\prime}}^{\left({s}_t\right)}{\overset{\sim }{x}}_t\right)={\left(1+\exp \left(-\left\{{{\overset{\sim }{\omega}}^{\prime}}^{\left({s}_t\right)}{\overset{\sim }{x}}_t\right\}\right)\right)}^{-1} $$
(20)

Hence, for Eqs. (19) and (20), the input vector is defined as \( {\overset{\sim }{x}}_t={\left[1,{x}_t^{\prime}\right]}^{\prime } \) where \( {x}_t={\left[{x}_t^{\prime}\right]}^{\prime }={\left( dlco{2}_{t-1},..., dl co{2}_{t-l}, dl{y}_{t-1},..., dl{y}_{t-l}, dl{p}_{t-1},..., dl{p}_{t-l}\right)}^{\prime } \). The MS-VAR-MLP model is the theoretical representation for r regimes, where the transition probabilities are governed by Markov chains. The model requires the estimation of a large number of connection parameters, \( {\overset{\sim }{\omega}}^{\prime }={\left[{\omega}_{0,v,1}^{\left({s}_t\right)},...,{\omega}_{0,v,h}^{\left({s}_t\right)},{\alpha}_{1,v}^{\left({s}_t\right)},...,{\alpha}_{l,v}^{\left({s}_t\right)},{\beta}_{1,v}^{\left({s}_t\right)},...,{\beta}_{l,v}^{\left({s}_t\right)},{\theta}_{1,v}^{\left({s}_t\right)},...,{\theta}_{l,v}^{\left({s}_t\right)}\right]}^{\prime } \), and of the hidden unit parameters \( {\lambda}_{i,v}^{\left({s}_t\right)}={\lambda}_{1,v}^{\left({s}_t\right)},...,{\lambda}_{h,v}^{\left({s}_t\right)} \), where h is the number of hidden units and v indexes each vector for each state st. For this reason, a simplification is made in the study by restricting the number of regimes to r = 1, 2, with the two states representing the recessionary and expansionary regimes. The residuals are assumed to follow \( {\varepsilon}_{t,v}^{\left({s}_t\right)}\sim \mathrm{iid}N\left(0,{\delta}^{2,\left({s}_t\right)}\right) \) white noise processes with regime-dependent conditional variances in the spirit of the MSIAH-VAR model. The model given in Eqs. (19) and (20) could also be written as follows:

$$ {\displaystyle \begin{array}{l} dl co{2}_t^{\left({s}_t\right)}={\varphi}_v^{\left({s}_t\right)}\left({\gamma}_{0,v}^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_{i,v}^{\left({s}_t\right)}\times {\left(1+\exp \left\{-\left({\alpha}_{1,v}^{\left({s}_t\right)} dl co{2}_{t-1}+{\alpha}_{2,v}^{\left({s}_t\right)} dl co{2}_{t-2}+,...,+{\alpha}_{l,v}^{\left({s}_t\right)} dl co{2}_{t-l}+{\beta}_{1,v}^{\left({s}_t\right)} dl{y}_{t-1}+{\beta}_{2,v}^{\left({s}_t\right)} dl{y}_{t-2}+,...,+{\beta}_{l,v}^{\left({s}_t\right)} dl{y}_{t-l}+{\theta}_{1,v}^{\left({s}_t\right)} dl{p}_{t-1}+{\theta}_{2,v}^{\left({s}_t\right)} dl{p}_{t-2}+,...,+{\theta}_{l,v}^{\left({s}_t\right)} dl{p}_{t-l}+{\omega}_{0,v}^{\left({s}_t\right)}\right)\right\}\right)}^{-1}\right)+{\varepsilon}_{t,v}^{\left({s}_t\right)}\\ {} dl{y}_t^{\left({s}_t\right)}={\varphi}_v^{\left({s}_t\right)}\left({\gamma}_{0,v}^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_{i,v}^{\left({s}_t\right)}\times {\left(1+\exp \left\{-\left({\alpha}_{1,v}^{\left({s}_t\right)} dl co{2}_{t-1}+{\alpha}_{2,v}^{\left({s}_t\right)} dl co{2}_{t-2}+,...,+{\alpha}_{l,v}^{\left({s}_t\right)} dl co{2}_{t-l}+{\beta}_{1,v}^{\left({s}_t\right)} dl{y}_{t-1}+{\beta}_{2,v}^{\left({s}_t\right)} dl{y}_{t-2}+,...,+{\beta}_{l,v}^{\left({s}_t\right)} dl{y}_{t-l}+{\theta}_{1,v}^{\left({s}_t\right)} dl{p}_{t-1}+{\theta}_{2,v}^{\left({s}_t\right)} dl{p}_{t-2}+,...,+{\theta}_{l,v}^{\left({s}_t\right)} dl{p}_{t-l}+{\omega}_{0,v}^{\left({s}_t\right)}\right)\right\}\right)}^{-1}\right)+{\varepsilon}_{t,v}^{\left({s}_t\right)}\\ {} dl{p}_t^{s_t}={\varphi}_v^{\left({s}_t\right)}\left({\gamma}_{0,v}^{\left({s}_t\right)}+\sum \limits_{i=1}^h{\lambda}_{i,v}^{\left({s}_t\right)}\times {\left(1+\exp \left\{-\left({\alpha}_{1,v}^{\left({s}_t\right)} dl co{2}_{t-1}+{\alpha}_{2,v}^{\left({s}_t\right)} dl co{2}_{t-2}+,...,+{\alpha}_{l,v}^{\left({s}_t\right)} dl co{2}_{t-l}+{\beta}_{1,v}^{\left({s}_t\right)} dl{y}_{t-1}+{\beta}_{2,v}^{\left({s}_t\right)} 
dl{y}_{t-2}+,...,+{\beta}_{l,v}^{\left({s}_t\right)} dl{y}_{t-l}+{\theta}_{1,v}^{\left({s}_t\right)} dl{p}_{t-1}+{\theta}_{2,v}^{\left({s}_t\right)} dl{p}_{t-2}+,...,+{\theta}_{l,v}^{\left({s}_t\right)} dl{p}_{t-l}+{\omega}_{0,v}^{\left({s}_t\right)}\right)\right\}\right)}^{-1}\right)+{\varepsilon}_{t,v}^{\left({s}_t\right)}\end{array}} $$
(21)

The model is estimated with a learning algorithm that combines back-propagation and conjugate gradient descent, together with early stopping and weight elimination, to obtain the parameter estimates and to prune the network. For details, readers are referred to Bildirici and Ersin (2009, 2013, 2014) and Bishop (1995).

The modeling of the MS-VAR-MLP with more than two regimes is possible; however, the model requires the estimation of rv(h + 3p + l + 3 + 3 + 3) parameters, where r, v, and h represent the number of regimes, the number of vectors, and the number of hidden neuron weights, and p represents the optimum lag length selected with the SIC. Due to the loss in the degrees of freedom, large datasets are a necessity. The EKC-type environment-economic growth relation requires the investigation of a large span of data covering distinct stages of economic development; however, because the historical data are available only on a yearly basis, the model is evaluated for two regimes only. The next section covers the extension of the sensitivity analysis to the MS-VAR-MLP model, which provides important insights regarding the complex nonlinear relations under two distinct regimes of the business cycles.

Sensitivity analysis under the MS-VAR-MLP model

Another approach designed to evaluate the importance of the input variables in ANN literature is sensitivity analysis. Engelbrecht et al. (1999) discussed sensitivity analysis as a way of determining the significance of variables in an ANN model. Dimopoulos et al. (1995) and Dimopoulos et al. (1999) proposed sensitivity criteria to choose between significant variables in ANN models. In addition, Gevrey et al. (2003) investigated various types of analyses, including sensitivity analysis, and provided a thorough investigation of methods focusing on the contributions of the variables.

Sensitivity analysis is based on partial derivatives of the dependent variable with respect to the input variables. By evaluating the gradient vector, the NN model could be utilized to obtain not only nonlinear causal relationships, but also the magnitude of the effects of the input variables. Guo et al. (2011) showed that through sensitivity analysis, information could be extracted using partial derivatives for a small neighborhood of data to derive a local visual pattern among variables.

In the MS-VAR-MLP model, the sensitivity analysis is regime dependent: within each regime, local patterns for a small neighborhood of data could be visually expressed for different regimes. As a result, the regime-switching sensitivity analysis provides different asymmetric and regime-dependent nonlinear relations.

In line with the PaD and profile approaches (Lek et al. 1996; Gevrey et al. 2003), the study proposes the MS-VAR-MLP-based regime-dependent sensitivity analysis, i.e., regime-dependent causality, which could also provide visual representations of the nonlinear relations among the analyzed variables at different phases of the business cycles (for the sensitivity analysis, readers are referred to Engelbrecht et al. 1995, 1999; Molas and Yamazaki 1995; Gevrey et al. 2003; Hadzima-Nyarko et al. 2011). The partial derivatives of the output yt are taken with respect to the qth input variable, where each VAR-MLP contains h neurons with v.r.l input variables. For the logistic function of the form \( \psi (x)=\frac{1}{1+{e}^{-x}} \), the derivative with respect to x is given by ψ′(x) = ψ(x)(1 − ψ(x)). Hence, the application of the chain rule to the single-regime VAR-MLP yields the following:

$$ {d}_{t,q}={\vartheta}_t\sum \limits_{h=1}^h{\lambda}_h{\psi}_{\mathrm{ht}}\left(1-{\psi}_{\mathrm{ht}}\right){\omega}_{\mathrm{qh}} $$
(22)

where dt,q = ∂yt/∂xt,q is the partial derivative of the output with respect to the qth input, ϑt is the partial derivative of the output neuron with respect to the input it receives from the hidden layer, ψht is the response of the hth neuron, λh is the connection parameter of the hidden unit, and ωqh is the connection parameter of the qth variable in the hth hidden unit. Analogous to Eq. (9), for the MS-VAR-MLP model with more than one regime, the regime-dependent sensitivities are calculated as follows:

$$ {d}_{t,q}^{\left({s}_t\right)}={\vartheta}_t^{\left({s}_t\right)}\sum \limits_{h=1}^h{\lambda}_h^{\left({s}_t\right)}{\psi}_{\mathrm{ht}}\left(1-{\psi}_{\mathrm{ht}}\right){\omega}_{\mathrm{qh}}^{\left({s}_t\right)} $$
(23)

which denotes the regime-dependent response of the dependent variable in each vector to the independent variables in each vector. Further, the regime-dependent sensitivity in Eq. (23) represents the regime-dependent nonlinear causality among the analyzed variables.
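
As a hedged illustration of Eqs. (22) and (23), the sketch below computes the partial derivatives of the network output with respect to each input for one regime. The helper names are hypothetical, the weights are invented for illustration, and the derivative of the output activation ϑ is set to one because an identity output is assumed.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def sensitivities(x, omega, lam, theta=1.0):
    """d_q = theta * sum_h lam_h * psi_h * (1 - psi_h) * omega_qh for every input q.

    x     : input vector, without the constant
    omega : omega[h] = [bias, w_1h, ..., w_Qh]
    lam   : hidden-to-output weights
    theta : derivative of the output activation (1.0 for an identity output)
    """
    x_tilde = [1.0] + list(x)
    psi = [logistic(sum(w * v for w, v in zip(w_h, x_tilde))) for w_h in omega]
    return [
        theta * sum(l * p * (1.0 - p) * w_h[q + 1] for l, p, w_h in zip(lam, psi, omega))
        for q in range(len(x))
    ]

# Hypothetical regime-specific weights; each regime yields its own sensitivities
weights = {
    1: {"omega": [[0.1, 0.5, -0.3], [-0.2, 0.4, 0.6]], "lam": [0.8, -0.5]},
    2: {"omega": [[0.0, -0.7, 0.2], [0.3, 0.1, -0.4]], "lam": [0.3, 0.9]},
}
x = [0.2, -0.1]
d_by_regime = {s: sensitivities(x, w["omega"], w["lam"]) for s, w in weights.items()}
```

Evaluating the same input point under both sets of weights gives one sensitivity vector per regime, which is the sense in which the measure is regime dependent.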

The proposed approach requires the following steps. In the first step, the regime probabilities are obtained through the Markov chain procedure and the data set is divided into sub-regression spaces within the MS-VAR-MLP framework. In the second step, the connection weights and biases (constant terms) for each regime are estimated. In the third step, the partial derivatives are calculated for each input variable for each regime, while the other variables are held at their mean values. In the fourth step, since the model is a time series model with lagged terms, the partial derivatives of each lag are summed to obtain the overall effect. In the fifth step, a decile range with ten equal points for the input variables is calculated based on the min-max of each variable. The range corresponds to a one to ten scale that extends between the minimum and the maximum levels for each of the inputs for each regime. In the sixth step, the response of the selected output variable in each regime is plotted against the decile range, while holding other variables at their mean values, to gather a visual representation of the effect of each variable on the others. This process is repeated for each regime to obtain regime-dependent nonlinear causality between the input and output variables within the MS-VAR-MLP framework.
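
Steps three to six can be sketched for a single regime as follows. The sketch assumes that the regime's observations and weights are already available from steps one and two, and it moves all lags of the selected variable along the grid together, which is one possible reading of the procedure rather than a documented detail. All names and numbers are hypothetical.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def partials(x, omega, lam):
    """Partial derivative of an identity-output MLP with respect to each input."""
    x_tilde = [1.0] + list(x)
    psi = [logistic(sum(w * v for w, v in zip(w_h, x_tilde))) for w_h in omega]
    return [sum(l * p * (1.0 - p) * w_h[q + 1] for l, p, w_h in zip(lam, psi, omega))
            for q in range(len(x))]

def decile_grid(values, points=10):
    """Ten equally spaced points from the minimum to the maximum (step five)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (points - 1)
    return [lo + k * step for k in range(points)]

def sensitivity_profile(data, lag_columns, omega, lam, points=10):
    """Summed-over-lags sensitivity of the output along the grid of one variable.

    data        : rows of inputs observed in one regime
    lag_columns : column indices holding the lags of the variable of interest
    """
    n_inputs = len(data[0])
    means = [sum(row[c] for row in data) / len(data) for c in range(n_inputs)]
    pooled = [row[c] for row in data for c in lag_columns]
    profile = []
    for g in decile_grid(pooled, points):
        x = list(means)                  # other inputs held at their means (step three)
        for c in lag_columns:
            x[c] = g                     # assumption: all lags move together
        d = partials(x, omega, lam)
        profile.append(sum(d[c] for c in lag_columns))  # sum over lags (step four)
    return profile

# Hypothetical regime-1 data: columns are (y lag 1, y lag 2, x lag 1, x lag 2)
regime1_data = [
    [0.02, 0.01, -0.03, 0.04],
    [0.01, 0.02, 0.05, -0.03],
    [-0.02, 0.01, 0.10, 0.05],
    [0.03, -0.02, -0.06, 0.10],
]
omega = [[0.1, 0.5, -0.3, 1.2, 0.7], [-0.2, 0.4, 0.6, -0.9, 0.3]]
lam = [0.8, -0.5]
profile = sensitivity_profile(regime1_data, lag_columns=[2, 3], omega=omega, lam=lam)
```

Plotting `profile` against the points one to ten, once per regime, would give the kind of visual representation described in step six.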

Dataset and the empirical results

Data

To obtain a large span of data, this study utilizes annual data for the UK and the USA. The CO2 data for both countries are taken from the Carbon Dioxide Information Analysis Center (CDIAC) and Gapminder databases. The CO2 data used in the study represent the total per capita CO2 emissions from fossil fuels in 1000 metric tons. The CDIAC per capita CO2 dataset covers a large span corresponding to the 1751–2016 period for the UK and the 1800–2016 period for the USA. However, according to our own inspection, although the data start from the 1750s, they are subject to rolling for certain years due to missing data. Year-to-year variation in the data is a necessity for econometric techniques. As a result, the CO2 data are restricted to the 1831–2016 period. The crude petrol price data also restrict the analysis, since the earliest data that could be obtained are for 1871. The petrol prices are obtained from the BP Statistical Review of World Energy, which reports the crude oil prices in US dollars per barrel and covers the 1871–2016 period. The GDP per capita data for the USA are taken from the FRED database of the Federal Reserve Bank of St. Louis and are given in nominal dollars. The GDP per capita data for the UK are obtained from the Bank of England and are given in nominal pounds. Therefore, due to the availability of the emissions and petrol data, the dataset analyzed in the study is restricted to the 1871–2016 period for the UK and the USA. In the study, all variables are subject to logarithmic transformation. The logarithmic GDP per capita, crude petrol prices, and CO2 emissions per capita are denoted as lyt, lpt, and lco2t, respectively. Further, the first differences of the variables are denoted as dlyt, dlpt, and dlco2t, which also show their respective yearly growth rates.
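
The log-first-difference transformation described above can be sketched as follows; the level series is hypothetical and the function name is chosen only for illustration.

```python
import math

def log_diff(levels):
    """First difference of natural logarithms, an approximate yearly growth rate."""
    logs = [math.log(v) for v in levels]
    return [b - a for a, b in zip(logs, logs[1:])]

gdp_per_capita = [100.0, 103.0, 101.0, 108.0]  # hypothetical levels
dly = log_diff(gdp_per_capita)                 # one observation shorter than the levels
```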

The selection of the UK and the USA is not solely based on the availability of data. It is well known that both countries have passed through Rostow’s stages of growth and reached the final stage of high mass consumption (Rostow 1960). Additionally, though China has reached the top polluter status in the last two decades in terms of metric tons of CO2, it is followed closely by the USA. Further, as recent studies show, in per capita terms, the USA is still the top emitter of CO2 in the world (Olivier et al. 2015 and EDGAR 4.3 database). Between 1990 and 2012, the per capita CO2 emissions in China increased by 333%. The case of the USA, however, is promising, since its per capita increase was only 7% over the same period. This argument is also in favor of a Rostow-type portrait and hints at the possibility of an inverted U-type EKC relationship. The argument is even more revealing for the UK, where, in contrast, the per capita emissions decreased by 28% in this period (Olivier et al. 2015). In 2012, per capita CO2 emissions in China, the USA, and the UK were 7.11, 16.21, and 7.50 metric tons, respectively, showing that on a per capita basis the USA is followed by the UK, which is closely followed by China.

The CO2 per capita emissions in the UK and the USA are given for the 1751–2012 period in Fig. 1. The figure covers a large time span that includes the first and second industrialization periods in the eighteenth and nineteenth centuries, World War I (1914–1918), the Great Depression in 1929, World War II (1939–1945), the oil shocks in the 1970s, and the 2008 global crisis.

Fig. 1

Historical CO2 per capita emissions (in tons) in the UK, the USA, and China, 1751–2012. Source: CDIAC Database, Carbon Dioxide Information Analysis Center. http://cdiac.ornl.gov (accessed 11 Nov. 2016)

During the first and the second industrialization periods, the UK had been the top per capita polluter. After the 1860s, the USA experienced an increase in its per capita emissions that led it to catch up with the UK in 1900. Following the 1929 Great Depression, both the UK and the USA experienced a sharp drop in CO2 emissions per capita. Following the beginning of WWII, while the USA experienced a drastic increase in per capita CO2 emissions, the UK experienced a drastic decline. Before the first and second oil shocks, during which the petrol price per barrel increased sharply, the CO2 emissions per capita reached their historical peak in the USA at 22.17 tons, which hints at the positive impact of GDP growth before the crisis; this peak was followed by a decreasing pattern after 1973. After the 2008 Global Crisis, the CO2 per capita was recorded as 19.3 tons in the USA, while the level in the UK was almost half of that (CDIAC 2016).

One could assume that the USA, being a significant petrol producer in addition to being a petrol consumer, is expected to have a strong influence on petrol prices, while the UK is not. As a result, the inclusion of petrol prices in the CO2–GDP model of the UK might be thought unnecessary. However, this is not the case. The UK is not only a petroleum consumer, but also a significant producer given its petroleum extraction operations in the North Sea over the last four decades, in addition to its role in petroleum production starting from the late nineteenth century. In addition, UK-originated multinational corporations, such as Exxon Mobil and British Petroleum (BP), are the world’s largest and the sixth largest petroleum companies (Bergin 2008). BP has been operational since 1908, the year it established its petroleum extraction operations when petroleum was found in Persia (BP 2018). According to the U.S. Energy Information Administration (EIA), except for the OPEC oil shocks in the 1970s, the UK was a net exporter of petroleum until 2013, showing that the UK has historically been capable of meeting its domestic petroleum demand (EIA 2018). The UK also has a large amount of onshore reserves; however, 98% of the petroleum comes from 113 offshore petroleum installations in the North Sea Central Graben, where two rich clusters (the Forties and the Brent oilfields) are located, in addition to its single onshore oilfield in Dorset (EIA 2018). According to the BP Statistical World Energy Report, the USA accounts for 7.8% of the total crude petrol production in the world and is the major oil producer in the American mainland, whereas in Europe, Norway is the largest producer (2.4% of the world) and the UK is the second largest (1.8%) (EIA 2018). The former International Petroleum Exchange, now called ICE Futures, is a leading petrol market located in London, UK.
Regarding the multinational corporations of the UK, BP has operations in 72 countries worldwide; one third of the global operations of BP is located in the USA, where its crude oil production equals 335,000 barrels per day (Bloomberg Businessweek 2018). Other international operations include Egypt, where BP produces 15% of the country’s total crude petrol; Angola, with a total of 9 oil exploration blocks; and Iraq, where BP operates in the Rumaila field, which produced over 1 million barrels per day in 2010 and which UK Trade and Investment estimated would increase to 2.5 million barrels per day between 2010 and 2015 (UKTI Report 2012).

The percentage yearly changes in excess demand (ED) of petroleum, calculated as domestic consumption minus domestic production, are given in the figures below. Regarding the argument above, the close positive association of petroleum production and consumption with crude petrol prices can be easily observed for both countries. Note that the figures for the USA and the UK include the barrel prices determined in the financial markets located in each country, namely Brent for the UK and WTI for the USA. Our calculations show that Brent and WTI prices in raw US dollars have a positive correlation of 0.968, whereas their yearly percentage growth rates have a correlation of 0.979 (Fig. 2a). A comparison of Fig. 2b, c shows that petroleum production and consumption follow a strong co-movement with crude petroleum prices. One interesting finding, contrary to the argument directed towards the UK, is that the correlations between ED and crude petroleum prices are 0.256 and 0.205 for the UK and the USA, respectively, showing that the association between the UK and petroleum prices not only exists, but is also comparatively stronger.
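
The calculation behind these correlations can be sketched as follows. All series are hypothetical, and dividing by the absolute value of the previous level in the percentage change is a choice made here so that the sign stays meaningful when excess demand is negative; the study does not state how this case is handled.

```python
import math

def pct_change(series):
    """Yearly percentage change; fails if a previous level equals zero."""
    return [100.0 * (b - a) / abs(a) for a, b in zip(series, series[1:])]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical yearly data (million tonnes and US dollars per barrel)
consumption = [80.0, 78.0, 75.0, 74.0, 76.0]
production = [77.0, 72.0, 68.0, 63.0, 52.0]
price = [72.0, 97.0, 62.0, 80.0, 111.0]

excess_demand = [c - p for c, p in zip(consumption, production)]  # ED = consumption - production
rho = pearson(pct_change(excess_demand), pct_change(price))
```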

Fig. 2

Brent and WTI Crude Petrol Prices, Excess Petroleum Demand (ED) in the UK and the USA, Yearly % Growth Rates, 2007–201. Source: BP Statistical Review of World Energy, June 2017. Notes: excess demand of domestic petrol is calculated as ED = consumption − production, in million tonnes. WTI and Brent represent the yearly % change in West Texas Intermediate and Brent crude petrol prices per barrel ($US)

These findings clearly reveal that petrol prices could be taken not only as an important control variable but also as an important explanatory variable in the GDP and CO2 emissions-based EKC for the UK and the USA. The influence of the UK on petrol prices is based on three pillars: the size of its petroleum extraction operations, mainly located offshore; the multinational operations of UK-originated petroleum firms; and its important energy market, where the internationally accepted Brent is traded.

Empirical results

The study focuses on the following steps in the empirical section.

  1. The MLP-augmented MS-VAR model (MS-VAR-MLP) is estimated in parametric form; as a result, the nonlinear impacts of the variables could be interpreted directly from the model without the sensitivity analysis. Once the MS-VAR-MLP model is estimated, the direction of causality is evaluated.

  2. The MS-VAR-MLP model is augmented with sensitivity analysis to get hold of the shape of the environmental Kuznets curve. The regime-dependent sensitivity analysis eases the evaluation of the causal impact of each variable by providing visual interpretations. The third step should be coupled with the second in terms of providing causal interpretations.

  3. MS-VAR-MLP models are compared with the MS-VAR in terms of forecasting and modeling capabilities. This stage is important to analyze the efficiency of the models evaluated.

MS-VAR-MLP neural networks estimations

In this section, the neural network-based models discussed in the previous section are utilized to evaluate the possible relationships between the growth rates of GDP per capita, CO2 per capita, and petrol prices, where the variables are denoted as dlyt, dlco2t, and dlpt after log-first difference transformation. The MS-VAR-MLP methodology proceeds in the following steps:

  i. Input selection: The optimum number of input variables in the input layer is selected with the SIC to minimize the loss of degrees of freedom and to maintain parsimony.

  ii. EM algorithm: The model’s regime-switching structure is estimated using the EM algorithm and the Markov chain regime probabilities are obtained.

  iii. Estimation and evaluation: The model is estimated using a combination of the back-propagation, conjugate gradient descent, and Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms, in addition to early stopping and weight decay. The model architecture is selected through the estimation of 200 MS-VAR-MLP models with varying architectures, and the best model is selected in terms of generalization accuracy based on the MSE and rho statistics.
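
The architecture selection in the last step can be illustrated with the minimal sketch below, which ranks candidate models by their test-sample MSE only. How the MSE and rho statistics are combined in the study is not stated, so ranking on the MSE alone is an assumption; the candidate labels and predictions are hypothetical.

```python
def mse(targets, predictions):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def select_best(candidates, test_targets):
    """Label of the candidate with the lowest test-sample MSE.

    candidates : dict mapping an architecture label to its test-sample predictions
    """
    return min(candidates, key=lambda name: mse(test_targets, candidates[name]))

# Hypothetical test targets and predictions from three candidate architectures
test_targets = [0.02, -0.01, 0.03, 0.00]
candidates = {
    "MLP(12-2-6)": [0.018, -0.008, 0.027, 0.002],
    "MLP(12-3-6)": [0.030, 0.010, 0.010, -0.020],
    "MLP(12-4-6)": [0.020, -0.030, 0.050, 0.010],
}
best = select_best(candidates, test_targets)
```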

MS-VAR-MLP and sensitivity results: the UK

The MS-VAR-MLP estimation results obtained for the UK are reported in Table 1. The model for the UK is an MS-VAR-MLP (12-2-1) in which the input layer consists of three variables, dlyt, dlco2t, and dlpt; each variable has four lagged values, resulting in a total of 12 inputs in each regime. The input layer is connected to two sigmoid activation functions. The hidden layer is connected to the output layer with a linear identity function to predict the dependent variable of each VAR vector in each regime.

Table 1 MS(2)-VAR(4)-MLP(12-2-6) Model, UK

Training and test performances are reported as the Pearson correlation (rho) coefficient between the predicted and the target variables, and training and test errors are reported under the MSE criterion. Learning is conducted by the conjugate gradient descent algorithm with the BFGS algorithm, for which the epoch of convergence is reported. The MSE errors of the first vector, dlyt, were calculated as 0.007750 and 0.019446 for the training and test samples, respectively, in the first regime and as 0.012399 and 0.012399 for the second regime. The correlations were calculated as 0.845727 and 0.529508 for the training and test samples for the first regime and as 0.544468 and 0.294674 for the second regime of the GDP growth rate vector. The highest training performance was obtained for the dlpt vector, i.e., the petrol price growth rate vector, with MSE statistics of 0.001038 and 0.002241 and rho coefficients of 0.947366 and 0.505228 for regimes 1 and 2, respectively. It should be noted that, in the test sample of the second regime of the CO2 growth rate vector, the rho statistic is 0.126, which is the lowest rho overall. However, the rho statistic reported for the training sample is 0.71, showing strong results in terms of the goodness of fit of the model. If we evaluate the models with all regimes and vectors, the results show good generalization capabilities and promising results in terms of goodness of fit.

The calculated parameter estimates of the MS-VAR-MLP model for the UK are reported in the second part of Table 1. The architecture selected for the model is in the form of MS(2)-VAR(4)-MLP(12-2-6), a model with two regimes and four lags for each variable as in the VAR models. Each input variable has 4 lags, totaling 12 input variables in the input layer. In the neural network structure of each regime, the input layer with 12 input variables is linked to the hidden layer, which consists of 2 nonlinear sigmoid functions. The hidden layer is linked to the output layer consisting of identity functions, which produce three target variables for each regime, totaling six output variables in the output layer. The signs of the weights that link the input variables to the hidden layer play a vital role regarding the impact of the variables in the input layer. Therefore, the important factors that define causality between input and output are based on the directional impact among the variables and on the signs and sizes of the neuronal links to the output layer, while the learning algorithm applied with early stopping also allows weight decay and pruning during estimation.

Instead of analyzing each estimated parameter, one common way to provide interpretable results within the neural network framework is through local sensitivity analysis. This process involves substituting the average of each variable that is to be held constant, while calculating how sensitive a neural network vector is for a given domain of input variables, thus yielding interpretable causal relationships. Accordingly, for each input, the partial derivatives are measured at ten evenly spaced locations starting from the minimum level and ending at the maximum level of the input variable, while giving the mean values to the other variables. As a result, the partial derivative procedure provides an interpretation of the directional causality for various levels of GDP, CO2, and petrol prices. The pointwise sensitivity results are given in Fig. 3a, b for regime 1 and regime 2 (the recessionary and expansionary regimes, respectively).

Fig. 3

The causal effects of the input variables on GDP growth rates in the UK

It should be noted that the direction of causality between variables is nonlinear and is governed by the direction (sign) and magnitude of the contributions, i.e., negative and positive contributions and their size; as a result, the overall effect of each input variable on the dependent variable is nonlinear within each regime. Firstly, the causal impact of petrol price growth rates on GDP growth rates will be analyzed for different magnitudes in the recessionary and expansionary periods. In the recessionary regime depicted in Fig. 3a, it is evident that the size of the impact of petrol price growth rates on GDP growth rates varies from its minimum towards the maximum values (i.e., along the decile range, 1–10, on the horizontal axis). The impact is comparatively high and positive for the low values of GDP growth, though it is continuously decreasing. The impact becomes negative after the sixth decile. The results show that increases in petrol prices have a positive but decreasing impact on GDP for very low and negative GDP growth rates and it becomes negative after the seventh decile. In the second regime, given in Fig. 3b, the impact of petrol prices on GDP growth rates is negative for the majority of cases; it increases towards zero and becomes positive at the highest decile of expansions. The results should be evaluated with care, since the recessionary regime also includes the oil price shocks that resulted in deep recessions by causing large and negative supply side shocks. However, the results are as expected, since the overall causal effect of petrol prices on GDP growth rates is negative.

Secondly, the analysis of the impact of CO2 on GDP growth rates indicates that the overall causal effect on GDP is positive, although the positive impact attains its peak at the third decile and follows a decreasing path towards the tenth decile. In the recessionary periods, during which negative growth rates were recorded, there exists a positive relation between CO2 and GDP growth rates. If evaluated with the GDP’s impact on itself (dashed-dotted line), the impact of GDP on CO2 follows a similar pattern. This type of co-movement is expected; however, it cannot be interpreted as a directional impact of CO2 on GDP growth rates, since CO2 emissions should be taken as a product of industrial production, though the impact of CO2 on GDP could be taken as a sign of policies directed towards the pollutants. The co-movement is evaluated as a sign of the strong association of emissions with the level of GDP. The analysis now turns to evaluating the impact of GDP on CO2 growth rates. The pointwise sensitivity results are shown in Fig. 4a, b.

Fig. 4

The causal effects of the input variables on CO2 emissions in the UK

The causal impacts of GDP growth rates are given for decile ranges starting from the minimum towards the maximum (one through ten) of the UK’s GDP growth rates in the recessionary and expansionary regimes. During recessions (Fig. 4a), the impact of GDP growth rates on CO2 emissions (dashed-dotted line) is above zero and thus positive. As GDP growth rates increase from the first decile towards the tenth, the impact on CO2 emissions follows a decreasing path up to the tenth decile of the recessions, where it reaches zero. The overall evaluation shows that even at very low negative economic growth rates, the effect of GDP on emissions is positive for 90% of the deciles. Further, the impact of GDP growth rates on CO2 emissions growth rates (Fig. 4b) follows a positive and increasing pattern, mimicking an S-shaped relation in the expansionary regime, and the positive effect accelerates for moderate to high economic growth rates. The overall evaluation of Fig. 4a, b suggests that the impact of GDP growth rates follows an asymmetric path of altering magnitudes, and the impact is positive for the majority of the recessionary regime and for the whole of the expansionary regime. As noted in the data section, evaluation of the impact of petrol prices on CO2 could also reveal interesting findings. According to the results (dashed line), increases in petrol prices had a negative impact on emissions, which could be justified with the historical impact of petrol prices on economic growth rates, since petrol price increases are associated with declines in production (GDP), which could lead to decreased emissions. As a result, for the UK, the effect of petrol prices on emissions cannot be disregarded, and the overall evaluation confirms the business cycle effects of petrol prices, in addition to the impact of petrol on emissions. High petrol prices affect production negatively in addition to discouraging petroleum consumption, which leads to lower emissions.

One point that cannot be overlooked is that the determination of the shape of the EKC bears significance in the study. Compared to the literature, which focuses on the calculation of a single turning point as for the inverted U-shaped EKC, the results reveal significant differences. In Fig. 4a, b, CO2 emissions are continuously related to GDP and petrol prices, and the lowering of emissions has not been possible except for economic growth rates almost equal to zero. Further, even for the negative growth rates, a positive impact of production on emissions cannot be rejected. The results suggest strong positive effects of economic growth on emissions. Further, after eliminating the effects of the crises by isolating the recession and crisis periods, economic growth has a continuous positive impact on CO2 emissions. As discussed in “Introduction” and “Literature review,” without isolating the effects of crises and without differentiating between the business cycle regimes, the generalized results could be misleading in terms of deriving a conclusion on the shape of the EKC. If the GDP → CO2 line is evaluated, the causal impact of GDP on CO2 follows a decreasing path that never falls below the zero line. If taken as the shape of the EKC, instead of an inverted U, the overall look of the shape reveals a U-shape that accelerates at highly positive and moderately negative economic growth rates. The results for the petrol prices are also in the expected direction in the spirit of the MS and MS-VAR literature (Krolzig 1998; Hamilton 1990) underlining the influence of petrol prices on economic growth cycles. In our results, increases in petrol prices are expected to have a negative impact on production on the supply side in addition to a negative impact on fossil fuel consumption on the demand side. Accordingly, the overall causal impact of increases in petrol prices on CO2 emissions is negative. 
Though not directly analyzed in the study, the usage of hydroelectric energy and various renewable energy sources after the 1960s, in addition to the decline in the dependence on petrol after the 1970s, are important factors for emissions.

Thirdly, the causal impacts of the variables on petrol prices are given in Fig. 5a, b for the UK. The causal impact of GDP growth on petrol prices is positive up to the sixth decile and negative afterwards; however, the dashed-dotted line follows a path very close to zero, suggesting very little impact of the UK’s GDP on petrol prices. One cannot conclude that economic growth rates in the UK have no impact on petrol prices, but this impact is limited. As a result, compared to the previous graphs (Fig. 4a, b), the figures show that the major direction of causality flows from petrol to GDP and to CO2. This relation is also maintained in the expansions. In addition, petrol prices are most strongly influenced by their own past values. This type of path dependence is evident, since the largest fluctuations occur in the dotted line compared to the solid- and dashed-dotted lines. In the recessions, the path followed by the petrol is positive up to the fifth decile, followed by a sharp decline into negative values towards the tenth, and a negative effect of petrol prices on itself is maintained up to the third decile during expansions, after which the impact becomes positive. The combined analysis of both regimes shows a U-shaped path followed by the petrol, reaching a trough at zero economic growth and following a negative but increasing pattern afterwards in the expansions.

Fig. 5

The causal effects of the input variables on petrol prices in the UK

The overall results confirm the impact of petrol prices on business cycles, which in turn have strong effects on CO2 emissions. By evaluating Figs. 3, 4, and 5, the results indicate that GDP growth rates, CO2 emissions, and petrol prices show different dynamics in terms of magnitude and direction in the UK, and these complex relations cannot be summarized by an EKC-type inverted-U finding. Based on the MS-VAR-MLP approach, there is clear evidence of causal effects among GDP and petrol prices, which affect CO2 emissions nonlinearly at various levels. In particular, by focusing on Fig. 4, the positive causal effect of GDP growth rates on CO2 emissions occurs not only during expansions, but also during recessions: the impact approaches zero only when the economy approaches the last periods of recession. The emissions follow a steady, S-shaped, positive path during expansion and the positive impact settles at a horizontal path during moderate and high growth in expansions. However, as economic growth approaches its peak around the eighth to tenth decile, CO2 emissions start to follow another accelerating path. The results of the paper clearly show that the impact on CO2 emissions is highly nonlinear, and they differ significantly from the findings obtained in the literature, which utilized polynomial regressions and various nonlinear approaches, including the TAR, MS, and MS-VAR models. Further, compared to the MS-VAR model, the MS-VAR-MLP approach reveals more complex nonlinear relations, since the relations are allowed to follow NN processes within distinct regimes. The analysis determines that the relationship between the variables depends not only on which regime the economy is in, but also on the magnitude (or size) of GDP growth rates, CO2 emissions, and petrol prices. 
Compared to the literature given in “Literature review,” these findings deviate significantly, especially from those obtained with the linear polynomial regression models. Linear models assume the parameters to be constant over the sample period, which implicitly assumes that the relationship between GDP and CO2 emissions is rather stable. In addition to the regime dependency and magnitude dependency observed in our models, many important factors could result in trajectory changes in the environment-economic growth relationship. As shown in Fig. 1, the important historical factors leading to trajectory changes include the first, second, and third industrial revolutions, trade liberalization and financial liberalization policies, various economic crises, including the Great Depression of 1929, the oil shocks of the 1970s, the ERM crisis in the early 1990s, the Asian Crisis in the late 1990s, and the Great Recession of 2008, and World Wars I and II. The results suggest that the shape of the EKC is altered historically depending on the phase of the business cycle, in addition to the magnitude dependency of the responses of environment and economic growth variables within a nonlinear setting. As a result, econometric analyses with samples covering a long historical period should consider the incorporation of more advanced methodologies, including machine learning and neural networks.

MS-VAR-MLP and sensitivity results: the USA

The results for the USA are given in Table 2. The model selected for the USA is a two-regime MS(2)-VAR(3)-MLP(9-3-6) model with nine input variables in the input layer and three neurons in the hidden layer, which are linked to an output layer that produces three predicted values for each regime, i.e., six predictions (output vectors) in total.
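To make the 9-3-6 notation concrete, the following minimal sketch shows how three series with three lags each give nine inputs, and how six outputs can be read as three predictions per regime. The placeholder data, the random weights, the tanh activation, and the ordering of the outputs by regime are assumptions for illustration; they are not the estimated model reported in Table 2.

```python
import numpy as np

def build_var_inputs(Y, p=3):
    """Stack p lags of every series into one input row per time step.
    Y has shape (T, k); returns inputs of shape (T - p, k * p)
    and one-step-ahead targets of shape (T - p, k)."""
    T, k = Y.shape
    inputs = np.hstack([Y[p - lag:T - lag] for lag in range(1, p + 1)])
    targets = Y[p:]
    return inputs, targets

rng = np.random.default_rng(1)
# Hypothetical placeholder for three growth-rate series (GDP, CO2, petrol price).
Y = rng.normal(scale=0.03, size=(146, 3))
X, target = build_var_inputs(Y, p=3)          # X: (143, 9)

# 9 inputs -> 3 hidden neurons -> 6 outputs (random weights, illustration only).
W1, b1 = rng.normal(size=(9, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 6)), np.zeros(6)
out = np.tanh(X @ W1 + b1) @ W2 + b2          # (143, 6)

# Assumed layout: outputs 0-2 belong to regime 1, outputs 3-5 to regime 2.
per_regime = out.reshape(-1, 2, 3)            # [time, regime, variable]
```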

Table 2 MS(2)-VAR(3)-MLP(9-3-6) Model, USA

The MSE error criteria and rho coefficients point to satisfactory training and test results. The MSE errors of the first vector, dlyt, were calculated as 0.0049 and 0.0005 for the training and test samples of the first regime and 0.000376 and 0.000165 for the training and test samples of the second regime. The correlations between the target and output variables are significantly large. The highest training performance was obtained for the first regime of the dlyt vector, and the lowest was obtained for the second regime of the dlpt vector. The results suggest good generalization and goodness of fit. Similar to the UK’s model, the interpretations will be conducted through the MS-VAR-MLP based sensitivity analysis though the parametric interpretation is also possible.

Similar to the analysis conducted for the UK, the horizontal axis represents the decile range for the GDP growth rates in the recessionary and in the expansionary regimes. In Fig. 6a, the impact of petrol prices on GDP growth rates is positive and largest for the most negative values of GDP growth rates, and the positive impact decays towards the fifth decile, staying positive and horizontal afterwards during recessions. In the expansionary regime (Fig. 6b), the impact is negative and is maintained over the whole decile range. As for the UK, the causal impact of petrol prices varies for different magnitudes of GDP growth rates. In terms of the impact of the petrol prices in the UK, the majority of decile ranges suggest a negative impact of petrol prices on GDP with various degrees in both regimes. The economic interpretation is similar to that of the results for the UK. Further, the results should be evaluated with caution, considering the petrol price shocks that also coincide with the recessionary regimes. These negative and large supply-side shocks are coupled with drastic declines in the GDP growth rates (Hamilton 1990); increases in petrol prices are coupled with negative economic growth rates, especially in the deep recessions.

Fig. 6

The causal effects of the input variables on GDP growth in the USA

The impact of CO2 emissions on GDP growth rates is represented by the solid-dotted line. The line follows a path close to the horizontal line, suggesting almost no effect of emission growth increases on GDP growth during the recessions. In contrast, in the expansionary regime, the results suggest a negative impact of emissions on GDP growth rates up to the sixth decile, after which the impact becomes positive. The result is in favor of a simultaneous occurrence of GDP and CO2 emissions. Similar to the results obtained for the UK, the co-movement of CO2 and GDP also occurs for the USA. However, the direction of the effect should be evaluated after the investigation of the impact of GDP on CO2 for the USA in Fig. 7 coupled with Fig. 6, which deserves special importance in the analysis of the EKC in terms of the motivation of the study.

Fig. 7

The causal effects of the input variables on CO2 emissions in the USA

The impact of GDP and petrol prices on CO2 emission growth rates is depicted in Fig. 7a, b for the USA. In Fig. 7a, the impact of GDP growth rates on CO2 emissions (solid-dashed) is positive for the lowest GDP growth rate deciles and follows a decreasing path that reaches zero at the sixth decile; the path becomes increasing after reaching its lowest point at the eighth decile. The impact of GDP growth rates on emissions accelerates and crosses zero at the tenth decile. In Fig. 7b, the impact becomes positive after the second decile and follows an increasing path, suggesting an accelerating effect of economic growth on emissions in the expansionary regime. The overall evaluation suggests a U-shaped relation rather than an inverted-U relation between GDP and CO2 growth rates. The results also reveal that different dynamics exist for the recessions compared to the expansions, that the impact of GDP growth rates on CO2 emissions is highly asymmetric at different magnitudes of economic growth, and that, taking the two regimes together, the effect is positive for the majority of the economic growth decile ranges in the USA.

If the impact of petrol price growth rates on CO2 growth rates is evaluated, an overall look suggests a U-shaped relation, which hints at the negative impact of petrol prices on GDP. In the recessions, the causal impact of petrol prices on CO2 emissions follows a positive path only for the lowest deciles. After the fourth decile, the impact becomes negative. In the expansionary regime, the impact of petrol prices is highest at the first decile; however, as the higher deciles are reached, the negative impact is maintained. The results are in the expected direction: the increases in petrol prices are expected to have a negative impact on both production on the supply side and fossil fuel consumption on the demand side in the 1871–2016 period.

Similar to the results for the UK, the dynamics modeled by the MS-VAR-MLP for the USA deviate from the EKC examinations in the existing literature. Our results reveal a U-shaped curve, where the response of the CO2 emissions varies nonlinearly depending on the magnitude of the GDP growth in distinct regimes. Accordingly, the environmental deterioration increases as per capita income growth changes, and the response of environmental deterioration is asymmetric depending on the phase of the business cycle. The results also show that at higher levels of economic growth, the elasticity of demand for environmental quality becomes highly positive. The finding is in line with the findings of Atasoy (2017) and Apergis (2016), who suggest that environmental quality is sacrificed to achieve higher economic growth in the majority of states of the USA.

Lastly, the causal impact of the CO2 emission growth rate on itself is indicated by a solid line, which follows a U-shape in Fig. 8a, b. The pattern mimics that found for the GDP growth rates (dashed-dotted line). The shape is in favor of path dependence of CO2 emissions, i.e., dependence on its past values, similar to the results for the UK. The resemblance of the GDP growth and CO2 emissions curves also confirms the co-occurrence of CO2 emission and GDP growth rates for the USA.

Fig. 8

The causal effects of the input variables on petrol prices in the USA

The causal impact of the variables on petrol prices is shown in Fig. 8a, b. In the recession, the causal impact of GDP growth rates on petrol prices (dashed-dotted) is positive for the entire decile range and decays towards zero only at the tenth decile, i.e., for zero economic growth rates. During the expansions, the effect of GDP on petrol prices is negative at first; however, the effect decelerates and reverses after the sixth decile, where it achieves a positive and increasing path. Accordingly, GDP growth of the USA has a positive impact on petrol prices in the recessions and, except for low economic growth rates, in the expansions. The path followed by the impact of CO2 emissions (solid line) is negative and fluctuating up to the last deciles of the recession. After the ninth decile, the effect of the CO2 emissions becomes positive and increasing. Over the expansions as a whole, the effect is positive, but decays towards zero at the eighth decile. The results for the USA reveal differentiated and complex causal dynamics among the analyzed variables compared to the UK. However, there is a clear positive and accelerating effect of GDP growth rates on CO2 emissions during the expansions, in addition to positive effects of GDP during deep recessions such as crises, whereas the lowering of emissions could only occur during slight recessions in the neighborhood of zero economic growth.

Comparison of generalization performances of MS-VAR and MS-VAR-MLP models

The overall results suggest significant dynamics in terms of direction, size, and magnitude, all of which may be evaluated using the recommended MS-VAR-MLP methodology. The results also indicate that although MS-VAR models provide important improvements over the linear VAR model, especially in terms of evaluating asymmetric and nonlinear causal relationships between different regimes, the MS-VAR-MLP model provides a richer analysis within each regime, since the processes in each regime are modeled with VAR-MLP processes (Footnote 1). The MS-VAR model accounts for asymmetry and nonlinearity in the causal relationships among regimes, and the MS-VAR-MLP further improves the causality analysis by allowing nonlinearity within each distinct regime.

To investigate the discussion above, the models are compared in terms of the MSE, MAE, and RMSE error criteria for their forecast accuracy. The results are provided in Table 3. For the MS-VAR models, a single MSE, MAE, and RMSE statistic is reported. For the MS-VAR-MLP models, these statistics are reported separately for each regime. The table is constructed to provide one-step-ahead forecast evaluations; nevertheless, out-of-sample forecast practices are also possible. Due to the aim of the study, the models are investigated solely in terms of in-sample generalization. As noted in the empirical section, the training of the MS-VAR-MLP models was conducted with early stopping to avoid over-fitting (Footnote 2). For the MS-VAR model, a general statistic that represents the whole regression space is reported. For the MS-VAR-MLP model, each regime is simulated to produce predictions to be compared to the results of the MS-VAR to ease the interpretation. The model with the lowest error criterion statistic is taken as having better one-step-ahead forecast accuracy.
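The three error criteria can be computed as in the following minimal sketch, both for the whole sample and separately by regime. The numbers are made-up illustrative values, and assigning each observation to a single regime (for instance, the regime with the highest estimated probability) is an assumption about how regime-specific statistics could be formed, not a description of the procedure behind Table 3.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """MSE, MAE, and RMSE of one-step-ahead forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = np.mean(err**2)
    return {"MSE": mse, "MAE": np.mean(np.abs(err)), "RMSE": np.sqrt(mse)}

def errors_by_regime(actual, predicted, regime):
    """Same criteria, computed separately for the observations of each regime."""
    actual, predicted, regime = map(np.asarray, (actual, predicted, regime))
    return {int(r): forecast_errors(actual[regime == r], predicted[regime == r])
            for r in np.unique(regime)}

# Hypothetical growth rates, forecasts, and regime labels (0 or 1).
actual = [0.02, -0.01, 0.03, 0.00]
predicted = [0.01, -0.02, 0.05, 0.00]
regime = [1, 0, 1, 0]

overall = forecast_errors(actual, predicted)
by_regime = errors_by_regime(actual, predicted, regime)
```

Under this setup, the model with the lower value of a given criterion for a given vector would be read as the more accurate one for that vector.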

Table 3 Comparisons of MS-VAR and MS-VAR-MLP Models

The MS-VAR-MLP results for the UK suggest that, in terms of MSE, the MS-VAR-MLP model performs better than the MS-VAR model for the dly, dlco2, and dlp vectors in both regimes. For the MS-VAR model, the MSE statistics for the dly, dlco2, and dlp vectors are calculated as 0.0078, 0.2291, and 0.2549, which are drastically larger than those obtained for the MS-VAR-MLP. The same finding also holds for the MAE and RMSE statistics. The overall results suggest a significant improvement of the MS-VAR-MLP over the MS-VAR for the UK. The results obtained for the USA suggest that in two out of three vectors, the dly and dlp vectors, the MS-VAR-MLP model provides better forecast accuracy in terms of the MSE, MAE, and RMSE criteria. However, the MS-VAR model performed better for the dlco2 vector in both regimes, though the second regime of the MS-VAR-MLP could be taken to perform close in terms of the MAE statistic (0.0381) compared to the MS-VAR (0.0339). This finding also holds for the MSE calculated for the second regime of the MS-VAR-MLP model, which has a very close performance with MSE = 0.0033. Compared to the significant improvement achieved by the MS-VAR-MLP model over the MS-VAR for the UK, this finding holds for two out of three vectors of the MS-VAR-MLP for the USA: in the GDP growth rate and petrol price growth rate vectors, the MSE, MAE, and RMSE error criteria are significantly lower than the error criteria calculated for the MS-VAR model. The overall evaluation of the statistics suggests that the MS-VAR-MLP performs better in forecast accuracy in both countries; however, due to the focus of the study on CO2 emissions, the two models estimated for the USA are assumed to provide almost equal forecast accuracy. It should be noted that the result for forecasting accuracy holds only for the data set in consideration. Consequently, the improvement of the results for the neural network models in terms of forecasting is highly sensitive to the number of observations. 
Hence, though the data set covers the period from 1871 to 2016, the length of the data is restrictive insofar as the data were collected annually. As a result, the selection of the optimum number of neurons is highly sensitive when trying to maintain parsimony and to limit the loss of degrees of freedom. Thus, the results obtained for forecasting cannot be generalized and should be evaluated for different and larger datasets. The overall findings of the study suggest that MLP-augmented MS-VAR-MLP models provide significant improvement over the MS-VAR models in terms of the interpretability of causality and capturing the complex nonlinear relations between the analyzed variables. The MS-VAR-MLP model makes important contributions by providing differentiated dynamics that are asymmetric and nonlinear in relation to size, magnitude, and direction for the investigated variables. For future studies, the investigation of the EKC relation with the MS-VAR-MLP model with monthly data could provide a comparatively larger sample size. This analysis could reveal important insights regarding the emissions, income, and petrol relation. However, due to the long-run characteristic of the EKC, which requires a sample that covers various stages of economic development historically, the analysis is conducted with the use of annual data and the results suggest strong nonlinear effects of the variables under investigation.

Discussion and policy recommendations

The MS-VAR-MLP model allows asymmetric and size-dependent causality; therefore, there are additional differences when compared to the MS-VAR models. The proposed MS-VAR-MLP model also aims at the derivation of regime-specific sensitivity analysis, which allows the researcher to investigate regime-dependent asymmetric and nonlinear causality between the analyzed variables. Additionally, the MS-VAR-MLP approach extended with the proposed regime-dependent sensitivity analysis is capable of providing visual interpretations, i.e., graphical mappings of causal relations, which show the magnitude- and regime-dependent causality.

The MS-VAR-MLP model overcomes the necessity of filter selection for the MS-VAR models. The MS-VAR models are generally estimated with either the EM or ML algorithms, and the selection of the filter has been debated. As discussed, the Hamilton and the Kalman filters have been suggested, and the majority of empirical studies could be considered as utilizing the Hamilton filter. In addition, trajectory changes resulting from oil shocks, economic crises, policy changes, and wars in such historical data could result in MS-VAR regimes that capture only these abrupt shifts in the time series. This could result in non-rejection of certain regimes, which capture a small number of observations including the outliers. Since the hidden Markov chains are inherent in the MLP models and the NN learning algorithms, no filtering is necessary in the estimation of the MS-VAR-MLP model. With the use of the learning algorithms and the inherent nature of the hidden Markov chains in the MLP models, the MS-VAR-MLP model is capable of modeling such a large span of data without the application of any filters. The MS-VAR-MLP models proposed are fully parametric and therefore allow ease in interpretability, though the functional representations are highly complex compared to the linear regression models and nonlinear MS-VAR models. Through the sensitivity analysis, the causal interpretation of the neural network is possible considering the neurons, the weights of each variable (and its lagged variants as in the time series approach), and the relevant connections between the neurons, the input layer, and the output layer. The sensitivity approach allows evaluation of asymmetry with visual representations for different episodes, i.e., recessions and expansions.

As discussed in the literature section, a relatively small body of research has focused on the MS-VAR models to investigate the environmental degradation and pollution relationship compared to the large number of studies using polynomial regressions. Further, various papers also underlined the importance of more advanced methods, including the BP-NN and SVM. The MS-VAR-MLP further contributes by dividing the regression space into two or more regimes to investigate regime-dependent behavior. The MS-VAR-MLP model is also capable of overcoming the difficulties encountered with the MS-VAR model in terms of modeling the magnitude-dependent, in addition to the regime-dependent, characteristics of the causal relations among the analyzed variables. The neural networks in each regime provide a more detailed analysis of the relationships between the variables: instead of the ceteris paribus approach, as in the linear econometric models, the variables in the analysis are nonlinear functions of themselves and are not held constant. Therefore, from an economist’s perspective, the model assumes mutatis mutandis, i.e., variables cannot be assumed constant, and the changes in the relations between the variables are not only specific to the regime the economy is in; the stages within each regime also result in differentiated causal relations depending on the magnitude. In the MS-VAR-MLP specification, complex nonlinear relations could be effectively captured by the use of neural networks.

The main advantages of the MS-VAR-MLP over its MS-VAR counterpart could be summarized as follows: (i) it is flexible in terms of modeling regime-dependent asymmetry, while allowing neural network specifications within each regime, and therefore imposes no a priori functional form between the analyzed series; (ii) the model benefits from the improved generalization capabilities of NN learning algorithms and hence augments the estimation of the MS-VAR models with the neural network algorithms, overcoming problems such as convergence to local minima or mistakenly defining outliers as distinct regimes; (iii) the model relaxes the ceteris paribus assumption, since the variables are not assumed to be constant at different sizes and magnitudes; (iv) the model could be extended to the regime-dependent sensitivity analysis to investigate the impact of each variable on the other variables analyzed; (v) in terms of the investigated EKC relation, the model provides important insights: the relations between emissions, economic growth, and petrol prices are more complex than assumed, and the model is capable of capturing such relations.

One should also note that the MS-VAR-MLP model has disadvantages compared to the MS-VAR model: (i) the estimation of the MS-VAR-MLP model requires comparatively larger sample sizes given the number of parameters to be estimated; the loss in degrees of freedom is larger and this could be controlled only to a certain degree with the SIC information criterion; (ii) if the researcher aims at estimating the model with more than two regimes, the number of parameters to be estimated is multiplied with each additional regime, and the requirement of a large sample becomes more serious; (iii) though the MS-VAR-MLP model proposed in the study is parametric, the model does not assume ceteris paribus, so the parametric interpretation is harder than for its MS-VAR variant. However, one could overcome the last disadvantage with the suggested regime-dependent sensitivity analysis. In summary, the overall evaluation suggests a trade-off between complexity and ease of interpretation. Depending on the complexity of the analyzed phenomenon, the researcher could choose between the MS-VAR and its MLP variant. As in the case of the EKC, one hint of such complexity is the derivation of significantly differentiated findings in the literature depending on the econometric methodology and the selection of variables. As a result, if one selects polynomial regressions over more complex analysis techniques, the overgeneralization of the phenomenon could result in wrong policy implications, and linearity in parameters could be misleading in most cases.
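The sample-size concern in points (i) and (ii) can be illustrated with a rough weight count. The sketch below assumes that each regime has its own single-hidden-layer network with bias terms and that transition probabilities and variance parameters are left out of the count; these are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def mlp_param_count(n_inputs, n_hidden, n_outputs, bias=True):
    """Number of weights in a single-hidden-layer MLP (optionally with biases)."""
    extra = 1 if bias else 0
    return (n_inputs + extra) * n_hidden + (n_hidden + extra) * n_outputs

def ms_var_mlp_param_count(n_vars, n_lags, n_hidden, n_regimes):
    """Weights of regime-specific VAR-MLP networks (assumed: one network per regime)."""
    per_regime = mlp_param_count(n_vars * n_lags, n_hidden, n_vars)
    return n_regimes * per_regime

# Three variables, three lags, three hidden neurons.
two_regimes = ms_var_mlp_param_count(3, 3, 3, n_regimes=2)    # 2 * 42 = 84
three_regimes = ms_var_mlp_param_count(3, 3, 3, n_regimes=3)  # 3 * 42 = 126
```

With annual data covering roughly a century and a half, going from two to three regimes under these assumptions would add 42 further weights to be estimated from the same observations.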

In addition to abovementioned advantages and disadvantages, similar to the discussion given in the literature section, another limitation of our paper is the selection of CO2 as the variable of pollution. Our selection is based on two pillars, first one is the availability of CO2 data starting from the eighteenth century and the second is to reach comparable results with the majority of literature that follow early studies in the investigation of the EKC. By doing so, the paper aimed at underlining the capability of the MS-VAR-MLP model in capturing the complex relations between emissions, income and petrol prices, and providing comparable results to papers utilizing polynomial regressions and their panel variants. Recent studies propose the differentiation of the data representing types of ecological footprint, including land and water resources. Al-Mulali et al. (2016) and Bello et al. 2018) show that the impact of the explanatory variables could be significantly altered on the type of ecological footprint selected including, water, land, and air. Gil-Alana and Solarin (2018) investigate the effectiveness of the environmental policies of the USA by global and per capita NOx and VOC emissions for data covering 50 years, and show that the environmental policies implemented in 1965, 1967, 1970, 1977, and 1990 are effective in reducing emissions. They also note different estimates for fractionality parameters with wide confidence intervals after the policy implementations. Another important paper is Solarin and Al-Mulali (2018), which investigate the impact of the foreign direct investment (FDI) and economic factors, including GDP in 20 countries by differentiating between types of environmental degradation as CO2, carbon footprint and ecological footprint. Their results reveal the necessity of investigation of the EKC with ecological footprint indicators, in addition to pollution indicators generally accepted by the scholars. 
Owing to data availability and in order to obtain results comparable to the mainstream papers that follow a long-run analysis of the EKC, the paper is limited to the investigation of the CO2 series. The application of the MS-VAR-MLP model to other ecological footprint indicators is left for future studies if larger datasets become available.

Our findings based on the suggested MS-VAR-MLP model and the proposed regime-dependent sensitivity analyses lead to important policy suggestions. When the impact of economic growth on CO2 emissions growth is evaluated for the UK and the USA, the negative effect of economic growth on emissions is restricted to the cases where economic growth rates are very close to 0%. Even for the majority of the first decile ranges, which represent strongly negative growth rates, the impact is positive. This result confirms the previously obtained results, which suggest that economic prosperity is chosen over the environment in the majority of cases. Further, petrol prices have historically had a strong impact on the business cycles in both countries. Policies towards the environment should take the complex relations between the analyzed variables into consideration; even for the analyzed developed countries, environmental degradation cannot be reversed unless various measures are taken. One important point regarding the Rostow-type analysis is that, at the last stages of development, society is transformed into a state of high mass consumption, in which the share of industry is lowered and the economy is dominated by the service sectors. Even so, in the stage of high mass consumption, the domestic demand for various manufactured goods is satisfied by goods originating from industries in the developing world, including China. As a result, the EKC results are restricted to the domestic economy. The role of high mass consumption in the developed world in the pollution created through production in the East should not be overlooked. Environmental degradation has the nature of a public good, with negative externalities for the world. Though such effects are not the focus of our study, CDIAC data show that CO2 emissions in China increased threefold in the last two decades. Further, as shown by Liu et al. (2018), more than 70% of the energy demand is satisfied with non-renewable energy sources, such as coal, with only a slowly decreasing trend. Solarin and Al-Mulali (2018) point to the finding that the lack of environmental policies in developing countries, including China, creates incentives for attracting FDI to these countries, in contrast to the strict environmental policies of the developed countries. Shen et al. (2017) investigate the relations among environmental policies, FDI, and manufacturing, and their findings favor the validity of the pollution haven hypothesis for China as a location for various polluting industries.

China is neither the focus of the study nor the only country responsible for global emissions and global warming. As noted in the introduction section, the shift of production from the more developed to the less developed countries is not only a transfer of emissions to the developing world but also a result of the increased GDP and consumption levels of the more developed countries. Even if an inverted U-shaped EKC were observed for the developed countries, their high mass consumption of goods produced in the developing countries translates into a derived demand for emissions in the world. In addition to carbon taxes at both the consumption and production levels in the developed countries, the developed countries should take a role in investing in non-polluting energy production in the developing countries. These policies could include indirect policies, such as carbon tariffs; however, direct policies, such as investing in renewable energies, including solar and wind, in developing nations should also be put on the agenda. Nevertheless, emissions and global warming are global public goods, and the negative externality created by the developed countries should be internalized by the developed nations themselves, in addition to the developing countries, to achieve global environmental sustainability.

Conclusion

This study aimed to analyze the dynamics between CO2 emissions, GDP growth rates, and petrol prices for the UK and the USA during the period 1871–2016. For this purpose, the study contributed to the literature by providing a neural network approach to nonlinear causality analysis based on the newly introduced regime-switching MS-VAR-MLP models, which allow causality to be nonlinear and asymmetric in terms of direction, size, and magnitude among different regimes and within each regime under analysis. The MS-VAR-MLP model provided a visual sensitivity analysis that eases the interpretation of complex relations between the analyzed variables. The proposed MS-VAR-MLP model introduced regime switching governed by Markov chains to the MLP (and VAR-MLP) processes. With the MS-VAR-MLP-based regime-dependent sensitivity analysis, the general dependency of the causal effects of each variable on another could be easily modeled and interpreted. As a result, the proposed methodologies allow the researcher to investigate the dependency of the causality between the analyzed variables on the magnitude (size) of each variable within each distinct regime of the business cycle.

The empirical results suggested that the MS-VAR-MLP model and the regime-dependent sensitivities confirmed the causal relations between GDP growth rates, petrol price growth rates, and CO2 emissions during both regimes, expansions and recessions, with varying magnitudes within each distinct regime. An overview of the results showed that the causality from GDP to CO2 growth rates could not be rejected and that the positive impact of economic growth on emissions failed to be reversed at various magnitudes of economic growth. The findings should be taken as a rejection of an inverted U-shaped EKC relation for the analyzed countries. The results are listed in detail as follows. The determination of the direction of causality deserved special attention in the study. The empirical results obtained for the MS-VAR-MLP model and the regime-dependent sensitivity analyses confirmed that the direction of the relationship ran not only from GDP to CO2, but also in the opposite direction. By including the impact of petrol prices in the environment-economic development relation, important information was gathered not only in terms of their impact on the business cycles, but also in terms of the two-directional relation between emissions and petrol prices. Beyond the general expectation that economic development and CO2 emissions were closely linked, the GDP and CO2 variables also had significant effects on petrol prices in both the UK and the USA. As discussed in the “Literature review” section, in addition to the USA’s influence in the petrol market, the UK’s production and emissions had a strong influence on petrol prices through its production and refinery operations, in addition to its influential role on a global scale.

The results indicated that the relationship between the economic and environmental variables was far from an inverted U-shape. Instead, the relation showed a J- or an S-shaped curve depending on the regime the economy was in. If the two regimes are analyzed together to cover the whole range of deciles of economic growth, the shape of the relation is close to a U-shape. The J and S curves in the two regimes suggested that as economic growth accelerated, CO2 emissions accelerated by following a sigmoidal S curve, and the acceleration rates differed drastically at various magnitudes of economic growth. Further, the dependence on the magnitudes of the variables within each regime revealed significant deviations. Such findings could not be achieved with polynomial regressions, which assume a stable relation for the whole period, or with MS-VAR models, which assume constant parameters and a stable relation within the determined regimes.

Petrol price growth rates had significant nonlinear impacts at various stages of economic growth. In recessions, the causal impact of petrol prices on CO2 emissions followed a positive but decreasing path, with the impact turning negative as growth approached 0%. In the expansionary regimes, the positive impact of petrol prices was the largest in the lowest deciles. As the economy moved towards the deciles that represented higher petrol price growth rates, the impact was comparatively lower. Similar complex nonlinear relations were captured for both countries, with some differences. Moreover, the causal impacts of CO2 emissions and GDP showed similarities in terms of the path followed. This co-movement suggested a significant lead-lag relation between GDP growth rates and CO2 emissions, pointing to a co-occurrence of CO2 emissions and GDP growth rates for the overall results.

The results obtained for the MS-VAR-MLP model further underlined the nonlinear structure of the causality between the variables. Using the MS-VAR-MLP model, this paper aimed at nonlinear causality analysis in light of the provided pointwise sensitivity methodology. According to the sensitivity results, the causal links between petrol price growth rates, GDP growth rates, and CO2 emissions had different impacts depending on the size and magnitude of GDP growth rates, even within the expansionary and recessionary regimes. According to the results, the positive causal effects of GDP growth rates on CO2 emissions could not be rejected during either recessions or expansions. One interesting finding was that the impact decayed towards zero only in the last episodes of recessions. Further, during the last decile ranges in the expansions, where the economic growth rate was moderate and high, emissions followed a positive and accelerating path. Hence, the positive path followed by emissions showed that the impact of economic growth on CO2 emission growth rates marked an important departure from the literature. Therefore, the proposed hybrid approach, which benefited from the MS-VAR, neural networks, and sensitivity analysis, offered flexibility in analyzing the complex nonlinear causal relations among variables without assuming linear and stable relations.
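The decile-based reading of the sensitivities summarized above can be sketched in a few lines. The Python example below is only an illustration under stated assumptions: it uses a toy single-hidden-layer network with random, unestimated weights in place of a fitted regime-specific MLP, approximates pointwise sensitivities by central finite differences, and averages them within deciles of the chosen input. The names toy_regime_mlp and decile_sensitivity, the simulated data, and the weights are hypothetical and do not reproduce the paper's estimates.

```python
import numpy as np


def toy_regime_mlp(x, w_in, b_in, w_out):
    """Hypothetical one-hidden-layer map from (GDP, petrol price) growth to CO2 growth."""
    return np.tanh(x @ w_in + b_in) @ w_out


def decile_sensitivity(f, x, j, eps=1e-4):
    """Average pointwise sensitivity of f to input j within each decile of input j."""
    step = np.zeros(x.shape[1])
    step[j] = eps
    # Central finite difference approximates the partial derivative at every observation.
    grad = (f(x + step) - f(x - step)) / (2 * eps)
    # Decile edges of the chosen input; each observation is assigned to a bin 0..9.
    edges = np.quantile(x[:, j], np.linspace(0, 1, 11))
    bins = np.clip(np.searchsorted(edges, x[:, j], side="right") - 1, 0, 9)
    return np.array([grad[bins == d].mean() for d in range(10)])


rng = np.random.default_rng(0)
x = rng.normal(0.02, 0.03, size=(200, 2))  # simulated growth rates, not real data
w_in = rng.normal(size=(2, 4))             # random weights standing in for estimates
b_in = rng.normal(size=4)
w_out = rng.normal(size=4)

sens = decile_sensitivity(lambda z: toy_regime_mlp(z, w_in, b_in, w_out), x, j=0)
print(np.round(sens, 3))  # one average sensitivity per decile of the first input
```

In a regime-switching setting, one would presumably repeat this calculation separately for the observations and weights associated with each regime, so that the decile profiles can be compared between expansions and recessions.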

Owing to the number of parameters to be estimated for the MS-VAR-MLP model, as with other NN models, a large sample size is a necessity and is recommended for researchers. In addition to the required large samples, the findings of the study are also limited to the investigation of CO2 data instead of evaluating different ecological footprint indicators. As discussed in the discussion section, the choice of CO2 is justified by the aim of providing results comparable to the literature, in order to highlight the complexity of the EKC relation compared to the findings in the literature that follow long-run data approaches. In addition, the availability of CO2 data starting from the eighteenth century also limited the study to the investigation of CO2 emissions. To our knowledge, data that differentiate between various forms of footprint, such as land and water, are not available for such a long series. However, if the frequency of the data could be increased, for example to monthly data, the application of the MS-VAR-MLP model to other ecological footprint indicators could provide important insights; this research is left for future studies.