1 Introduction

In recent years, quantile regression (QR) has been widely applied to panel data across many research areas. Koenker [15] suggested a general approach for applying QR to panel data models, based on conditional quantile functions in which the quantiles of the conditional distribution of the dependent variable are expressed as functions of observed covariates [16]. QR is used to estimate the conditional median or any other quantile of the dependent variable. In the median case it is sometimes called the least absolute value (LAV) model, the minimum absolute deviation (MAD) model, or the L1-norm model, because it seeks the regression model that minimizes the sum of the absolute residuals rather than the sum of the squared residuals as in the ordinary least squares (OLS) model. Gilchrist [8] defined the quantile as the value that corresponds to a specified proportion of an ordered sample. For example, the 0.5 quantile of ordered data is the median M, the value below which half of the observations fall. QR accounts for the effects of unobserved heterogeneity in the variables included in the estimated model, while the panel data model controls for the fixed effects of some unobserved independent variables. Moreover, if the distribution of the dependent variable changes with the independent variables, OLS regression gives misleading results, whereas QR shows how changes in the independent variables affect the shape of the distribution of the dependent variable [1].

Most of the empirical research in the environmental area suffers from two common shortcomings. One is the use of OLS mean regression models, which estimate only the conditional mean of the dependent variable in response to the independent variables [11]. The other is that experimental data often have a heterogeneous distribution, so OLS may not provide efficient estimates [20]. QR is an important and well-established tool for planning and resource management that can provide a meaningful explanation of environmental relationships. To the best of our knowledge, this is the first environmental study to apply QR at multiple percentiles (i.e., 0.05, 0.25, 0.50, 0.75, and 0.95) against the OLS model to detect the relationship between CO2, energy consumption (EC), and gross domestic product (GDP) in two different economic groups of countries, developed and developing. QR enables us to evaluate the levels of CO2 emissions at different points of the distribution of the dependent variable.

The remaining sections of this paper are organized as follows: Section 2 reviews the literature related to QR. Section 3 explains the QR approach. Section 4 provides results and discussion. Section 5 concludes the paper.

2 Literature Review on Quantile Regression

Many scientists have focused on estimating the rates of change in the mean of the response variable distribution, whereas few studies have applied QR near the upper and lower boundaries of the conditional distribution of the response variable. QR can also measure the effects of independent variables on the location, scale, and shape of the distribution of the dependent variable. Kaiser et al. [13] pioneered the application of QR in science, and since then more studies have applied QR in other fields. Among them, Dunham et al. [7] analyzed the abundance of Lahontan cutthroat trout in relation to the ratio of stream width to depth (the predictor variable). They assessed the value of the additional information provided by QR at different percentiles (0.95, 0.75, 0.50, 0.25, and 0.05). QR revealed a negative nonlinear relationship in the upper 30% of cutthroat densities across 13 streams and 7 years, while OLS indicated no significant relationship between mean densities and the width-to-depth ratio. Cade and Guo [3] examined the reduction in densities of mature plants with increasing germination densities of seedlings of annual plants in the Chihuahuan desert of the southwestern USA. They estimated QR at the 0.99 and 0.90 quantiles to measure changes in the survival of Chihuahuan desert plants by modeling changes in mature plant density (y) as a function of the germination density of seedlings (x). The conventional OLS regression was inaccurate for estimating the relationship. In contrast, QR indicated that the effects of seed density are best revealed at the higher plant densities associated with the upper quantiles, where there is a strong decline in the density of mature plants at higher germination densities. Vaz et al. [19] applied QR for five quantile intervals from the 75th to the 95th to 16 of the most abundant marine fish and cephalopods in the eastern English Channel.
The purpose of applying QR was to estimate the upper quantile model, which could define the limiting factors and design the potential habitat given the environmental data available for model construction. The results of the study indicated that QR yields significant differences (p value less than 0.05) between the estimated coefficients for the different quantile values. Taheripour et al. [18] applied three different methods, i.e., OLS, QR, and Tobit regression, to detect the relationship between leasing and debt in farm capital structure in Illinois, including other factors in the model such as the age of the farm’s operator, soil quality, and the net worth of the farm. The results showed that all estimated parameters are highly significant. Also, QR gave a clearer picture than OLS regression of the different effects of farm characteristics on the distribution of the leased-to-assets ratio.

Hennings and Katchova [10] applied QR at different values (the 10th, 20th, up to the 90th percentiles) to examine the relationship between the business strategies employed by Illinois farms and equity growth. The Breusch-Pagan test for heterogeneity showed that the data are heterogeneous, meaning that the conditional variance of the equity growth distribution is not constant across different levels of equity growth ratios; hence QR should be applied. The main results showed that the estimated coefficients at the 10th and 90th percentiles are significantly different from the OLS coefficients. In other words, the effect of different business strategies on equity growth rates differs across quantiles, whereas OLS regression showed significant positive effects of these variables on equity growth. Cade et al. [5] applied QR to estimate the effects of physical habitat resources on a bivalve in a spatially structured landscape on a sand flat in New Zealand. The results for the 75th percentile were less biased than the mean parameters estimated by OLS. However, the variation of estimated parameters for modeling the spatial trend surface reduced the quantiles associated with heterogeneous effects of the habitat variable. Gorg et al. [9] employed QR to analyze the determinants of firm start-up size. They showed that QR can provide more precise information on the determinants of start-up size than the OLS regression model. Jayachandram et al. [12] investigated how several factors (income, education, and age) affect dietary intake. The results showed that quantile regression is effective in estimating the conditional function and provides more information on that relationship than OLS regression.
The results of QR for different percentiles (10th, 25th, 50th, 75th, and 90th) suggested that income, education, and age have larger effects at intake levels where the risk of excess is greater compared to the intake levels where risk of excess is lower. In particular, people with higher income and education level may have benefitted more from nutrition information than people with a lower level of income and education level.

3 Quantile Regression in the Panel Data Approach

In this study, two statistical methods are applied: the OLS and the QR models. QR allows the researcher to account for unobserved heterogeneity and heterogeneous effects of the independent variables, while the availability of panel data allows the researcher to include fixed effects to control for some unobserved covariates. QR was introduced by Koenker and Bassett [14] as a generalization of the sample quantiles for the estimation of conditional quantile functions, expressed as linear functions of the independent variables. QR extends OLS regression by allowing the specification of conditional functions at any quantile. The QR approach detects the effect of the independent variables on the dependent variable more accurately than the OLS approach, particularly if the data are heterogeneous. OLS is based on the average relationship between a set of independent variables and the dependent variable through the conditional mean function E(y|x), which provides only a partial view of the relationship. In contrast, QR can describe that relationship at different points of the conditional distribution of the dependent variable through the conditional quantile function Qq(y|x), where q is the quantile (the median being the 50th percentile); the dependent variable should be continuous, with no zero values and not too many repeated observations [6]. QR is therefore especially meaningful in environmental applications where extreme values or outliers are important to study, for example where the upper quantiles of pollution levels are critical from a public health perspective. Median regression is more robust to outliers than OLS regression, and it is a semi-parametric method because it avoids assumptions about the parametric distribution of the error process.

In conditional quantile models, the parameters of interest are assumed to vary based on a non-separable disturbance term [14]. However, when additional variables are added, the interpretation of these parameters changes. QR is computed by linear programming methods, in contrast to OLS and the maximum likelihood approach. OLS and median regression are based on the sum of squared errors \( \sum \limits_i{e}_i^2 \) and the sum of absolute errors \( \sum \limits_i\mid {e}_i\mid \), respectively; both losses are symmetric, so the sign of the prediction error is not relevant. However, if the quantile q differs from the median (q = 0.5), the penalty is asymmetric, with increasing asymmetry as q approaches 0 or 1.
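The asymmetry of the penalty can be made concrete with a minimal sketch of the check (pinball) loss. The function name `check_loss` and the example errors are illustrative assumptions, not part of the original study.

```python
def check_loss(e: float, q: float) -> float:
    """Check (pinball) loss for a prediction error e = y - y_hat at quantile q.

    Under-prediction (e >= 0) is weighted by q, over-prediction (e < 0) by 1 - q.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return q * abs(e) if e >= 0 else (1.0 - q) * abs(e)


# Compare an under-prediction and an over-prediction of the same size.
for q in (0.5, 0.9):
    print(q, check_loss(2.0, q), check_loss(-2.0, q))
```

At q = 0.5 both errors receive the same penalty, whereas at q = 0.9 the under-prediction is penalized far more heavily than the over-prediction.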

3.1 Advantages of Quantile Regression

One of the advantages of QR over OLS regression is that it provides different estimates for each quantile, which allows the analysis of the various effects of the independent variables on the dependent variable. Consequently, it provides a clearer way to compare its estimated coefficients and standard errors with those of OLS. Another advantage of QR is that it is less sensitive to the tail behavior of the underlying random variables; thus it is less sensitive to outliers and has a higher breakdown point than OLS regression [17].

Furthermore, QR is equivariant to monotonic transformations, that is, transformations by a strictly increasing function h(.) such as log(.): the qth quantile of the transformed variable h(y) is h(Qq(y)), so the results can be transformed back to the original scale of y by applying the inverse transformation. This property does not hold for the OLS mean regression, as E[h(y)] ≠ h[E(y)] in general [2].
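The following sketch checks this property numerically for h = log on a small sample. The data values and the helper `sample_quantile` (an order-statistic quantile without interpolation) are assumptions made for illustration only.

```python
import math
import statistics


def sample_quantile(values, q):
    """Order-statistic quantile: the ceil(q * n)-th smallest value (no interpolation)."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[k - 1]


y = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]  # illustrative values only
q = 0.75

quantile_of_log = sample_quantile([math.log(v) for v in y], q)  # Q_q(log y)
log_of_quantile = math.log(sample_quantile(y, q))               # log Q_q(y)

mean_of_log = statistics.fmean(math.log(v) for v in y)          # E[log y]
log_of_mean = math.log(statistics.fmean(y))                     # log E[y]

print(quantile_of_log == log_of_quantile)  # the two quantile routes agree
print(mean_of_log, log_of_mean)            # the two mean routes do not
```

Because a strictly increasing function preserves the ordering of the observations, the two quantile computations pick the same observation, while the mean of the logs and the log of the mean differ.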

In addition, if the data have a homogeneous distribution, then the slopes estimated by QR at each point of the dependent variable will be identical to each other and to the slopes estimated by OLS. In other words, QR will reproduce the OLS slope estimates at any point across the distribution of the dependent variable, the only differences being in the intercepts.

Further, if the data have heterogeneous conditional distributions (the variance of the error terms is not constant across the distribution or across the levels of the independent variables) and the distribution of errors is non-normal, then QR provides efficient results, while OLS will be inefficient if the errors are highly non-normal: exact OLS inference relies on the assumption of normally distributed errors, whereas QR makes no such assumption. As a result, the slopes estimated by the conditional quantile functions will differ from each other and from the OLS slopes. Thus, estimating the conditional quantiles at different points of the dependent variable provides different marginal responses of the dependent variable to changes in the independent variables at these points [4].
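The contrast between the homogeneous and heterogeneous cases can be illustrated with a worked example under an assumed linear location-scale model. The model, the scale parameter γ, and the error distribution function F_u are illustrative assumptions and are not taken from the cited studies. Suppose the errors u_i are i.i.d. with distribution function F_u, independent of x_i, and that 1 + γx_i > 0:

$$ {y}_i=\alpha +{x}_i\beta +\left(1+\gamma {x}_i\right){u}_i, $$

$$ {Q}_q\left({y}_i\mid {x}_i\right)=\left[\alpha +{F}_u^{-1}(q)\right]+{x}_i\left[\beta +\gamma {F}_u^{-1}(q)\right]. $$

With γ = 0 (homogeneous errors), the slope equals β at every quantile and only the intercept α + F_u^{-1}(q) changes with q. With γ ≠ 0 (heterogeneous errors), the slope β + γF_u^{-1}(q) itself varies across quantiles.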

3.2 The Quantile Regression Model

QR is one of the most suitable methods to apply when its estimated coefficients are significantly different from zero and also from the OLS coefficients, thereby showing different effects across the distribution of the dependent variable.

QR minimizes a sum that gives asymmetric penalties: (1 − q)∣ei∣ for over-prediction and q∣ei∣ for under-prediction. If \( \widehat{y} \) is the predicted value and \( e=y-\widehat{y} \) is the prediction error, then \( L(e)=L\left(y-\widehat{y}\right) \) denotes the loss associated with the prediction error. If \( L(e)={e}^2 \), then OLS yields the optimal predictor. QR has the following form:

$$ {y}_{it}=\alpha +{X}_{it}^{\prime }{\beta}_q+{e}_i, $$

where βq is a vector of parameters associated with the qth quantile, 0 < q < 1. Assuming \( E\left({e}_i^2\right)<\infty \), so that the distribution of ei is not too spread out, the median regression minimizes the sum of absolute deviations (LAD) \( \sum \limits_i\mid {e}_i\mid \), and QR minimizes the expression

$$ \sum \limits_{i:{e}_i\ge 0}q\mid {e}_i\mid +\sum \limits_{i:{e}_i<0}\left(1-q\right)\mid {e}_i\mid . $$

The qth quantile regression estimator βq minimizes the objective function

$$ Q\left({\beta}_q\right)=\sum \limits_{i:{y}_i\ge {x}_i^{\prime }{\beta}_q}^Nq\mid {y}_i-{x}_i^{\prime }{\beta}_q\mid +\sum \limits_{i:{y}_i<{x}_i^{\prime }{\beta}_q}^N\left(1-q\right)\mid {y}_i-{x}_i^{\prime }{\beta}_q\mid . $$
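As a minimal sketch of how this objective can be minimized, the code below fits a model with an intercept and a single regressor. It is not the estimation routine used in this study: it relies on the linear programming property that an optimal solution passes through at least two observations, and therefore simply enumerates all lines through pairs of data points, which is practical only for very small samples. The function names and the data are hypothetical.

```python
from itertools import combinations


def qr_objective(alpha, beta, xs, ys, q):
    """Q(beta_q): weight q on residuals >= 0 and (1 - q) on residuals < 0."""
    total = 0.0
    for x, y in zip(xs, ys):
        e = y - (alpha + beta * x)
        total += q * e if e >= 0 else (q - 1.0) * e
    return total


def fit_simple_qr(xs, ys, q):
    """Brute-force fit of y = alpha + beta * x at quantile q.

    Evaluates the objective for every line through two observations
    and returns (objective value, intercept, slope) of the best one.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid regression line
        beta = (ys[j] - ys[i]) / (xs[j] - xs[i])
        alpha = ys[i] - beta * xs[i]
        loss = qr_objective(alpha, beta, xs, ys, q)
        if best is None or loss < best[0]:
            best = (loss, alpha, beta)
    return best


# Hypothetical data whose spread grows with x
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 2.5, 2.0, 5.0, 3.5, 8.0, 4.0, 11.0]

for q in (0.25, 0.50, 0.75):
    loss, alpha, beta = fit_simple_qr(xs, ys, q)
    print(f"q={q:.2f}  intercept={alpha:.3f}  slope={beta:.3f}  objective={loss:.3f}")
```

In practice, a dedicated linear programming or quantile regression routine would replace the enumeration; the sketch only shows what is being minimized at each quantile.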

4 Preliminary Analysis

The main purpose of this study is to apply the OLS estimator and QR at different percentiles (5th, 25th, 50th, 75th, and 95th) to detect the significant effects of GDP and EC on CO2 emissions at different levels for developed and developing countries. CO2 emissions are measured in metric kilogram per capita, while GDP and EC are measured in USD per capita and kiloton of oil equivalent per capita, respectively. The panel data include 29 countries over the period 1960 to 2008. The countries are categorized into developing and developed countries according to the World Bank classification. The list of countries is shown in the appendix.

Figures 1 and 2 illustrate the distribution of CO2 in developed and developing countries, respectively; both distributions are clearly increasing. Note that the rate of increase in developed countries was slow from the 10th to the 90th percentile, but beyond that point the level of CO2 increased dramatically. In contrast, the increase of CO2 in developing countries started from the 10th percentile and was quite gradual across the whole distribution.

Fig. 1 CO2 distribution for developed countries

Fig. 2 CO2 distribution for developing countries

Before performing the QR analysis, the modified Wald test and the Breusch-Pagan test were performed to test for heterogeneity in the data. The results in Table 1 show that the data do not have constant variance, which supports the use of QR: it can provide more information and more accurate results than the OLS method in detecting the relationship between the variables, as there may be a strong relationship in some parts of the CO2 emissions distribution but no significant relationship in other parts.
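To illustrate the idea behind such a test, the sketch below computes a studentized Breusch-Pagan statistic (the number of observations times the R² of a regression of the squared OLS residuals on the regressor) for a single regressor. This is an assumed variant on made-up data; the exact test specification and software used in this study are not stated, and the function names are hypothetical.

```python
def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line of ys on xs."""
    intercept, slope = ols_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 0.0 if total == 0 else 1.0 - resid / total


def bp_statistic(xs, ys):
    """Studentized Breusch-Pagan statistic: n * R^2 of squared residuals on x."""
    intercept, slope = ols_line(xs, ys)
    squared_resid = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return len(xs) * r_squared(xs, squared_resid)


CHI2_CRIT_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom

xs = list(range(1, 21))
ys_constant = [2 + 3 * x + (-1) ** x for x in xs]     # error size does not depend on x
ys_growing = [2 + 3 * x + (-1) ** x * x for x in xs]  # error size grows with x

for label, ys in (("constant spread", ys_constant), ("growing spread", ys_growing)):
    stat = bp_statistic(xs, ys)
    print(label, round(stat, 3), "reject" if stat > CHI2_CRIT_1DF_5PCT else "do not reject")
```

A statistic above the critical value leads to rejecting constant variance, which is the situation in which estimating several conditional quantiles is most informative.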

Table 1 Heterogeneity test

4.1 Model Coefficient Interpretation on QR and OLS Regression

Results of the estimation using QR and OLS regression in the contaminated dataset are summarized in Table 2. The majority of the estimated coefficients under the QR and OLS methods have significant effects on CO2 emissions. In addition, among the models estimated for developed countries, the 50th percentile model has the lowest RMSE and MAD values (4939.8 and 2812.4, respectively), while among the models for developing countries the 25th percentile model has the lowest RMSE and MAD values (2522.5 and 1423.3, respectively).
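For reference, the two accuracy measures can be computed as in the sketch below. MAD is assumed here to denote the mean absolute deviation between observed and fitted values, since the text does not define it; the numbers are made up and do not come from Table 2.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and fitted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_abs_deviation(actual, predicted):
    """Mean absolute deviation between observed and fitted values (assumed meaning of MAD)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


observed = [1.0, 2.0, 3.0]  # hypothetical observed values
fitted = [1.0, 2.0, 6.0]    # hypothetical fitted values from one model

print(rmse(observed, fitted), mean_abs_deviation(observed, fitted))
```

RMSE penalizes large errors more heavily than MAD, so a model can rank differently under the two criteria.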

Table 2 CO2 models by OLS and QR at different percentiles

We can summarize the main results of our analysis as follows. First, in the developed countries group, the GDP coefficient based on the OLS estimation is −0.11, which indicates that GDP has a negative relationship with CO2 emissions: a 1 USD increase in GDP in developed countries leads to a decline of 0.11 metric kilogram per capita in CO2 emissions. The QR results show that GDP has a significantly stronger negative effect on CO2 emissions at the 25th and 50th percentiles than at the higher percentiles (75th and 95th), and this effect is also larger than the one estimated by OLS. In other words, the relationship between CO2 and GDP in the developed countries panel is stronger in the lower and middle parts of the CO2 emissions distribution than in its upper part. Further, the estimated coefficient of EC by OLS in the developed countries model is 2.01, which indicates that EC has a positive relationship with CO2 emissions, i.e., a one-unit increase in EC leads to an increase in CO2 emissions of 2.01 metric kilogram per capita. Moreover, the QR results indicate that EC has similar effects on CO2 emissions across the percentiles, except at the highest percentile (95th), where EC has a positive effect about two times stronger than at the lower percentiles and in the OLS estimate. Developed countries with low CO2 emissions (at the lowest percentile considered, the 5th) have a 2.03-unit increase in CO2 emissions for a one-unit increase in EC, whereas developed countries with higher CO2 emissions (at the 95th percentile) have a significant 4.07-unit increase in CO2 emissions for each unit increase in EC. In other words, the effect of EC increases for countries with higher CO2 emissions (higher percentiles).

Second, in the developing countries group, the estimated coefficient of GDP by OLS is −0.25, which shows that GDP has a negative relationship with CO2: a 1 USD increase in GDP in developing countries leads to a decline of 0.25 metric kilogram per capita in CO2 emissions. The results of the QR models show that GDP has a significantly stronger negative effect on CO2 emissions at the 25th and 50th percentiles than at the higher percentiles (75th and 95th), and this effect is also larger than the OLS estimate. These results are therefore in line with those obtained for the developed countries group. Moreover, the estimated coefficient of EC in the OLS model for the developing countries is 2.64, which indicates that EC has a positive relationship with CO2 emissions, i.e., a one-unit increase in EC leads to an increase in CO2 emissions of 2.64 metric kilogram per capita. However, the QR results reveal that the effect of EC on CO2 emissions at the 50th percentile is about two times stronger than at the other percentiles and than the OLS estimate. Developing countries with low CO2 emissions (at the lowest percentile) release 1.76 units of CO2 emissions for a one-unit increase in EC, whereas countries with a middle level of CO2 emissions (i.e., at the 50th percentile) release 4.13 units and countries at the highest percentile release 2.3 units. In other words, the effect of EC on CO2 emissions is highest for countries with a middle level of CO2 emissions (50th percentile).

In addition, the QR models for developed and developing countries are best compared at the 50th percentile, as the most significant differences between the QR and OLS regression coefficients (for both GDP and EC) are also found at this level. The estimated coefficient of GDP in developing countries is −0.65, which is almost four times larger in magnitude than the corresponding negative effect in developed countries (−0.17; 0.65/0.17 ≈ 3.8). This indicates that a one-unit increase in GDP reduces CO2 emissions in developing countries almost four times more than in developed countries at the 50th percentile level. However, the estimated coefficient of EC in developing countries is 4.13, which is approximately two times larger than the one in developed countries (2.42). This means that a one-unit increase in EC raises CO2 emissions (i.e., countries release more CO2) about two times more in developing countries than in developed countries. In conclusion, the explanatory variables GDP and EC show different effects on CO2 at different percentile levels for developed and developing countries.

Figures 3 and 4 show the effects of GDP and EC for developed and developing countries with respect to the percentiles. The estimated coefficients at various percentile levels clearly differ from the OLS coefficients and their confidence intervals. The OLS coefficients are plotted as a horizontal dashed line, with two horizontal dotted lines for the confidence intervals; they do not vary along the distribution of the data (percentiles). The QR estimated coefficients are plotted as curves varying along the percentiles, together with their confidence intervals (indicated by the shaded area around them). When the percentile coefficients lie outside the OLS confidence interval, they can be considered significantly different from those of the OLS models (significant differences are indicated with a + sign in Table 2).

Fig. 3 Quantile regression coefficients for developed countries

Fig. 4 Quantile regression coefficients for developing countries

The estimated QR coefficients for developed countries shown in Fig. 3, for both GDP and EC as predictors of CO2 emissions, lie almost entirely within the confidence interval of the OLS coefficients up to about the 90th percentile. This means that there is no significant difference between the coefficients estimated by the QR model and those estimated by the OLS model up to about the 90th percentile; above it, the QR coefficients become significantly different from those of the OLS model.

On the other hand, Fig. 4 shows the estimated QR coefficients for the developing countries model. The effect of GDP on CO2 decreases within the OLS confidence interval until around the 25th percentile; beyond this threshold, it falls outside the OLS interval and becomes significantly different from the OLS estimate until around the 75th percentile, where it starts to increase for countries with higher CO2 emissions (higher percentiles). The increasing effect of EC on CO2 becomes significantly different from the OLS coefficient estimate after the 25th percentile until around the 75th percentile, where it returns to within the OLS interval.

5 Conclusion

The relationship between CO2 emissions, EC, and economic growth has received much attention from researchers, but conflicting results are often obtained because different approaches of differing efficiency are used. In environmental experimental studies, the collected data often suffer from heterogeneity, which can lead to inaccurate results when OLS regression is used: OLS may indicate a weak relationship or no relationship between the variables, while a stronger and useful relationship may exist in some parts of the distribution of the dependent variable. Therefore, in this paper, quantile regression was applied to estimate the coefficients of this relationship while also addressing the heterogeneity in the data. The main objective was to detect the effects of economic growth and energy consumption on CO2 emissions at different emission levels for developed and developing countries, at different percentile levels (0.05, 0.25, 0.50, 0.75, and 0.95), and to compare these effects with those found in the OLS model. The panel data consisted of 29 countries from two different economic development groups, 17 developed and 12 developing countries, over the period 1960–2008.

Results of QR for developed and developing countries showed similar patterns. In both groups, GDP has a significantly stronger negative effect on CO2 emissions at the 25th and 50th percentiles than at the higher percentiles (75th and 95th), and these effects are also larger than those detected in the OLS regression. However, EC in developed countries had similar positive effects on CO2 emissions across the percentiles, except at the highest percentile (95th). At the 50th percentile, the results also revealed that a one-unit increase in GDP is associated with a reduction in CO2 emissions almost four times larger in developing countries than in developed countries. A one-unit increase in EC had a positive effect on CO2 emissions (countries release more CO2) within the developing country group, with a coefficient about twice as large as that within the developed country group, based on the 50th percentile model. In conclusion, results differ significantly across the two groups of countries, with EC contributing to higher environmental degradation through CO2 emissions. Thus, EC monitoring is a key factor for environmentally balanced and sustainable development, especially within the developing countries.