1 Introduction

The relationship between economic growth and the environment remains a debated issue in the field of environmental economics. Recent scientific evidence of global warming, resource depletion, and air and water pollution has been linked to the harmful effects of human activities on the environment (i.e. the so-called anthropogenic effects). This has raised concerns among researchers and policy makers about how best to make economic growth (hence, income) compatible with the environment, that is, how to ensure the sustainability of economic growth. Since the link between the economy and the environment is complex and highly controversial, much of the analysis focuses on the trade-off between income and environmental degradation (pollution) within the framework of the so-called “Environmental Kuznets Curve” (EKC) hypothesis.

According to the EKC hypothesis, the income-pollution relationship exhibits an inverted U-shaped pattern similar to the inequality-growth proposition of Simon Kuznets. Intuitively, it posits that economic growth associated with higher income levels results in environmental pollution during the early stage of development, with the drive towards industrialization. However, pollution declines once income reaches a certain threshold, coupled with the adoption of cleaner and environmentally friendly technology in the production process. Thus, the EKC hypothesis emphasizes economic growth as a pre-condition for reducing environmental pollution. This viewpoint is aptly summarized by Beckerman (1992): “although economic growth usually leads to environmental deterioration in the early stages of the process, in the end, the best and probably the only way to attain a decent environment in most countries is to grow rich”. Such a proposition makes identifying the shape of the relationship important for the design of appropriate economic and environmental policy (Azomahou et al. 2006). This is because the impact of economic growth on the environment could be positive, negative or a combination of both. For instance, a positive relationship would imply higher pollution levels as income rises, so that the trajectory can be reversed only when income stagnates. Alternatively, if the relationship exhibits an inverted U-shaped curve, then environmental pollution can be reversed at higher income levels, thereby making economic growth compatible with environmental quality.

Since the seminal work of Grossman and Krueger (1991), an extensive literature has emerged to test the EKC hypothesis of an inverted U-shaped income-pollution relationship. To date, empirical studies have produced mixed and inconclusive evidence depending on the choice of countries, measures of environmental pollutants and econometric techniques used. Dinda (2004), Galeotti (2007), and more recently, Kaika and Zervas (2013a, b) provide excellent surveys of the literature. Even in the context of African countries, the evidence on the EKC hypothesis is far from a consensus. For instance, Orubu and Omotor (2011) find evidence of an inverted U-shaped relationship for particulate matter (\(\hbox {PM}_{10}\)) emissions. However, in the case of organic water pollutants, their evidence suggests a positive relationship. Osabuohien et al. (2014) find evidence of the EKC hypothesis for both \(\hbox {CO}_2\) and \(\hbox {PM}_{10}\) emissions. Ogundipe et al. (2014) control for income heterogeneity in African countries, and find evidence for the EKC hypothesis only for lower middle-income countries, but not for Africa as a whole (all countries combined), low-income countries or upper middle-income countries. Yaduma et al. (2015) find a monotonically increasing relationship between income and \(\hbox {CO}_2\) emissions for Africa based on quantile regression.

In the literature, the shape of the EKC relationship as determined by the underlying forces is often captured using reduced form models. In this context, studies commonly use parametric model specifications with either quadratic or cubic polynomial functions to capture non-linearities and to gauge the threshold levels. These models assume specific functional forms ex ante in validating the EKC hypothesis. Such an ad hoc approach may not completely account for the complexity of the EKC relationship. Moreover, when the model assumptions are at variance with the true data generating process, the resulting functional form misspecification will lead to wrong policy prescriptions. On the other hand, studies using longitudinal data with standard panel data techniques often neglect the heterogeneity of countries or regions due to economic, social, political, structural and biophysical differences, which may have varying effects on environmental quality (Dinda 2004). These techniques assume parameter homogeneity, which suggests that the income-pollution trajectory will be the same for all countries. This assumption has been rejected as inadequate, with calls for a more flexible approach that is robust to functional form specification and parameter heterogeneity (Vollebergh et al. 2005).

Following the need for more flexible techniques, nonparametric and semiparametric regression models have become popular among researchers for detecting the true shape of the income-pollution relationship (Taskim and Zaim 2000; Azomahou et al. 2006; Bertinelli and Strobl 2005; Nguyen and Azomahou 2007; Luzzati and Orsini 2009; Kim 2013; Chen and Chen 2015; Nigatu 2015; Wang et al. 2016). For instance, Taskim and Zaim (2000) find a U-shaped relationship between an environmental efficiency index and income only for countries with sufficiently high GDP per capita, using a nonparametric methodology for cross-sectional data on \(\hbox {CO}_2\) emissions. Bertinelli and Strobl (2005) use a partially linear model with fixed-effects estimators for a panel of countries and find a positive relationship at low incomes which flattens out before increasing again at high incomes. Azomahou et al. (2006) find evidence of an upward sloping, monotonic relationship between income and \(\hbox {CO}_2\) emissions with structural stability for a panel of 100 countries. Chen and Chen (2015) examine the EKC hypothesis for industrial \(\hbox {CO}_2\) emissions for 31 Chinese provinces and find an inverted U-shaped curve. Nigatu (2015) finds that as income rises the level of particulate matter (\(\hbox {PM}_{10}\)) pollution rises and then falls for low-income and middle-income countries. Wang et al. (2016), using the semi-parametric panel fixed effects estimator of Baltagi and Li (2002), find evidence supporting an inverted U-shaped curve for the relationship between economic growth and sulfur dioxide (\(\hbox {SO}_2\)) emissions for China. These methods have the advantage of not requiring a correct functional form specification, especially when the exact nature of the relationship is unknown. Instead, they allow the data generating process to determine the true shape of the relationship by finding a smooth representation of the data dynamics. Hence, these methods are robust to functional form misspecification, non-linearities and parameter heterogeneity.

Against this background, and given the lack of consensus on the income-pollution relationship in Africa based on parametric estimation techniques, this paper revisits the EKC hypothesis in line with the recent re-orientation of the literature towards non- and semi-parametric methods. Specifically, it aims to determine the definite shape of the income-pollution relationship for Africa. To achieve this objective, the paper uses data from a sample of 49 African countries for the period 1990–2010, and focuses on two atmospheric air pollutants, namely carbon dioxide (\(\hbox {CO}_2\)) and ambient particulate matter (\(\hbox {PM}_{10}\)) emissions. More importantly, the Stochastic Impacts by Regression on Population, Affluence and Technology (STIRPAT) model is used as the reference analytical framework for evaluating the anthropogenic forces behind environmental change, and the semiparametric panel fixed effects regression technique of Baltagi and Li (2002), which assumes no specific functional form ex ante, is used to gauge the true shape of the income-pollution relationship.

The remainder of the paper is organized as follows: Sect. 2 describes the STIRPAT framework and methodology. Section 3 describes the dataset. Section 4 presents the empirical results of the estimations; and lastly, Sect. 5 gives the concluding remarks.

2 Theoretical framework and methodology

As mentioned earlier, the paper uses the STIRPAT framework, which builds on the IPAT model, to investigate the income-pollution relationship. Ehrlich and Holdren (1971) proposed the IPAT model (\(I=PAT\)) to describe the changes in environmental impacts induced by human activities (i.e. the so-called anthropogenic effects). In other words, it evaluates the impact of population, affluence, and technology on the environment. The intuition is that environmental impacts (I) are a multiplicative function of population size (P), affluence measured as economic activity per capita (A), and the level of technology per unit of consumption and production (T):

$$\begin{aligned} I = P \cdot A \cdot T \end{aligned}$$
(1)

The model simply describes the anthropogenic driving forces behind environmental damages as a mathematical identity. However, this makes the IPAT model rigid, since it imposes proportionality restrictions between the variables. To address this shortcoming, Dietz and Rosa (1997) developed a stochastic version of the IPAT, designated as STIRPAT, which provides a flexible quantitative framework to investigate environmental impacts. The model specification is

$$\begin{aligned} I_i = aP^b_iA^c_iT^d_i\varepsilon _i \end{aligned}$$
(2)

where I, P, A, and T remain as described above; a, b, c and d are parameters of the model; \(\varepsilon\) represents the idiosyncratic error term, and the subscript i denotes observational units (e.g. countries) in cross-sectional data. Taking the natural logarithm of Eq. (2) provides a convenient linear specification as follows:

$$\begin{aligned} \ln I_i = \ln a + b\;\ln P_i + c\;\ln A_i + d\;\ln T_i + \ln \varepsilon _i \end{aligned}$$
(3)
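To make the log-linear specification concrete, the following is a minimal sketch of how the elasticities b, c and d in Eq. (3) could be recovered by ordinary least squares on logged data. The function name `fit_log_stirpat`, the synthetic data and the chosen parameter values are illustrative assumptions and are not taken from the paper; the fitted intercept corresponds to the log of the scale constant in Eq. (2).

```python
import numpy as np

def fit_log_stirpat(impact, population, affluence, technology):
    """OLS on logs: ln I = intercept + b ln P + c ln A + d ln T + error."""
    y = np.log(impact)
    X = np.column_stack([
        np.ones_like(y),        # intercept (log of the scale constant)
        np.log(population),
        np.log(affluence),
        np.log(technology),
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "b", "c", "d"], coef))

# Synthetic cross-section; every number below is made up for illustration.
rng = np.random.default_rng(0)
n = 500
population = rng.lognormal(mean=2.0, sigma=0.5, size=n)
affluence = rng.lognormal(mean=7.0, sigma=0.8, size=n)
technology = rng.lognormal(mean=1.0, sigma=0.3, size=n)
noise = rng.lognormal(mean=0.0, sigma=0.1, size=n)

# Multiplicative STIRPAT form of Eq. (2) with assumed b=1.0, c=0.8, d=0.6.
impact = 0.5 * population**1.0 * affluence**0.8 * technology**0.6 * noise

estimates = fit_log_stirpat(impact, population, affluence, technology)
print(estimates)
```

Because the model is linear in logs, each slope can be read as an elasticity: the percentage change in impact associated with a one percent change in the corresponding driver.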

As a refinement to the STIRPAT model, York et al. (2003) argue that quadratic terms of the components P, A, and T, along with additional environmental impact factors, can be incorporated into the model provided this is consistent with the multiplicative specification. Thus, Eq. (3) can be extended with a quadratic term for the affluence (A) variable in line with the EKC hypothesis to capture the possible existence of an inverted U-shaped relationship. This inverted U-shaped relationship can be explained by three basic mechanisms, namely, the scale, composition, and technique effects. The scale effect captures the idea that environmental quality deteriorates with the expansion of economic activities and, more generally, with economic growth. In the early stage of development, as the economy shifts from primary to industrial production, more inputs of natural resources are exploited to increase the scale of production and output. This generates wastes and emissions as by-products, which contribute to environmental pollution. However, economic growth generates structural change and technological progress, which in turn create the composition and technique effects. The composition effect is linked with the production shift from pollution-intensive industries to services-based ones, which are less polluting. On the other hand, the technique effect is associated with the adoption of cleaner and environmentally-friendly production technology that phases out dirtier techniques and reduces pollution per unit of output. Closely linked with this production perspective is the consumption viewpoint, which suggests that higher income levels intensify consumers' demand for a cleaner and greener environment as well as for the institution of stricter environmental regulations. Overall, the EKC suggests that the negative scale effect will be offset by the combined positive composition and technique effects, which should reduce pollution over time (see Dinda 2004; Kaika and Zervas 2013a).

Consequently, the extended version of the STIRPAT model with all variables transformed to their natural logarithmic form and estimated coefficients interpreted as elasticities is specified as follows:

$$\begin{aligned} E_{it} = \beta _1 gdpc_{it} + \beta _2 gdpc_{it}^2 + \beta _3 pop_{it} + \beta _4 enit_{it} + \alpha _i + \tau _t + \varepsilon _{it} \end{aligned}$$
(4)

where E is a measure of environmental quality of country i at time t; pop denotes the population size; gdpc is GDP per capita; and enit denotes technology, which is proxied by energy intensity to capture the damaging effect of technology on the environment. \(\alpha _i\) represents a country-specific effect that is constant over time, and \(\tau _t\) is a time-specific effect that accounts for time-varying omitted variables and stochastic shocks that are common to all countries. Depending on the sign and statistical significance of the slope parameters of the income (gdpc) variable, the form of the income-pollution relationship can be discerned: (1) if \(\beta _1 > 0\) (\(\beta _1 < 0\), respectively) and \(\beta _2 = 0\), then the relationship is monotonically increasing (decreasing); and (2) if \(\beta _1 > 0\) and \(\beta _2 < 0\), then an inverted U-shaped curve is observed, with the turning point in log income given by \(gdpc^* = \frac{- \beta _1}{2\beta _2}\), that is, an income level of \(\exp (gdpc^*)\).
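The sign conditions above can be illustrated with a short sketch. Since income enters Eq. (4) in logs, the ratio \(-\beta _1/(2\beta _2)\) is a value of log income, and exponentiating it gives the corresponding income level. The function names and the coefficient values are hypothetical, chosen only for illustration; in applied work the condition \(\beta _2 = 0\) would be judged by a significance test rather than by an exact comparison.

```python
import math

def classify_income_pollution(beta1, beta2, tol=1e-12):
    """Classify the shape implied by the income terms of the quadratic model."""
    if abs(beta2) <= tol:
        if beta1 > 0:
            return "monotonically increasing"
        if beta1 < 0:
            return "monotonically decreasing"
        return "flat"
    if beta1 > 0 and beta2 < 0:
        return "inverted U"
    if beta1 < 0 and beta2 > 0:
        return "U-shaped"
    return "other"

def turning_point(beta1, beta2):
    """Return (log income, income level) where the fitted quadratic turns."""
    if beta2 == 0:
        raise ValueError("no turning point when beta2 is zero")
    log_income = -beta1 / (2.0 * beta2)
    return log_income, math.exp(log_income)

# Hypothetical coefficients, not estimates from the paper.
b1, b2 = 1.2, -0.1
shape = classify_income_pollution(b1, b2)
log_tp, level_tp = turning_point(b1, b2)
print(shape, round(log_tp, 3), round(level_tp, 2))
```

With these assumed values the curve peaks at a log income of about 6, which corresponds to an income level of roughly 403 in the units of the income variable.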

Within this framework, standard panel data techniques can be used to estimate Eq. (4). However, a major drawback of the above parametric model analysis is that it assumes a specific functional form ex ante and does not account for parameter heterogeneity across countries in the sample. Moreover, higher-order polynomial regressions, and more generally parametric regression models, possess undesirable “nonlocal effects” (Magee 1998). As Yatchew (1998) points out, economic theory mostly does not provide sufficient information about the specific functional form linking a dependent variable and its covariates in a regression. Thus, to avoid possible functional form misspecification in the above parametric framework, we take an alternative approach using a semi-parametric regression framework, which relaxes the functional form assumptions and allows the data generating process to determine the true shape of the income-pollution relationship. Given that the true relationship is unknown ex ante, we specify a semi-parametric partially linear panel model with fixed effects as follows:

$$\begin{aligned} E_{it} = m(gdpc_{it})+ \beta _3 pop_{it} + \beta _4 enit_{it} + \alpha _i + \tau _t + \varepsilon _{it} \end{aligned}$$
(5)

where \(m(\cdot )\) is an unknown smooth function, with only income, gdpc, entering the regression nonparametrically while the other control variables are specified parametrically. This model accommodates the inclusion of more control variables without running into the curse of dimensionality problem associated with fully nonparametric models. The unobserved heterogeneity \(\alpha _i\) can be removed through first-differencing:

$$\begin{aligned} E_{it} - E_{it-1}= & {} [m(gdpc_{it}) - m(gdpc_{it-1})] + \beta _3(pop_{it} - pop_{it-1}) \nonumber \\&+\,\beta _4(enit_{it} - enit_{it-1}) + \varepsilon _{it} - \varepsilon _{it-1} \end{aligned}$$
(6)
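The role of first-differencing can be seen in a small sketch: subtracting each unit's previous observation cancels any term that is constant within the unit, whatever its value. The helper `first_difference` and the toy panel are assumptions made for illustration; the rows are assumed to be sorted by unit and then by time, with no gaps.

```python
import numpy as np

def first_difference(values, unit_ids):
    """Within-unit first differences; rows must be sorted by unit, then time."""
    values = np.asarray(values, dtype=float)
    unit_ids = np.asarray(unit_ids)
    # Keep only differences between consecutive rows of the same unit.
    same_unit = unit_ids[1:] == unit_ids[:-1]
    return (values[1:] - values[:-1])[same_unit]

# Toy panel: 3 units observed over 4 periods (all values made up).
units = np.repeat([0, 1, 2], 4)
alpha = np.array([5.0, -2.0, 10.0])[units]   # unit-specific effects
x = np.tile([1.0, 2.0, 4.0, 7.0], 3)
y = 0.5 * x + alpha                          # no noise, to show exact cancellation

dy = first_difference(y, units)
dx = first_difference(x, units)
print(np.allclose(dy, 0.5 * dx))
```

Each unit loses its first observation, so a panel with N units and T periods yields N(T-1) differenced rows.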

To consistently estimate Eq. (6), Baltagi and Li (2002) propose approximating \([m(gdpc_{it}) - m(gdpc_{it-1})]\) by the series differences \(p^k(gdpc_{it}, gdpc_{it-1}) = [p^k(gdpc_{it}) - p^k(gdpc_{it-1})]\), where \(p^k(gdpc)\) denotes the first k terms of a sequence of functions \((p_1(gdpc), p_2(gdpc), \ldots )\). In practice, a typical example of a \(p^k\) series is a spline, that is, a piecewise polynomial whose pieces are defined by a sequence of knots and joined smoothly at those knots. With this approximation, Eq. (6) reduces to

$$\begin{aligned} E_{it} - E_{it-1}= & {} \left[ p^k(gdpc_{it}) - p^k(gdpc_{it-1})\right] \vartheta + \beta _3(pop_{it} - pop_{it-1})\nonumber \\&+\,\beta _4(enit_{it} - enit_{it-1}) + \varepsilon _{it} - \varepsilon _{it-1} \end{aligned}$$
(7)

which can be consistently estimated by ordinary least squares. Once parameters \(\hat{\beta }\)’s and \(\hat{\vartheta }\) have been estimated, the values of the unit-specific intercepts \(\hat{\alpha _i}\) can be calculated in order to recover the error component residual

$$\begin{aligned} \hat{u}_{it} = E_{it} - \hat{\beta _3} pop_{it} - \hat{\beta _4} enit_{it} - \hat{\alpha _i} = m(gdpc_{it}) + \varepsilon _{it} \end{aligned}$$
(8)

The curve \(m(\cdot )\) can be easily estimated by regressing \(\hat{u}_{it}\) on \(gdpc_{it}\) using flexible estimation methods such as kernel or spline regression. Here, we use the B-spline regression model of order \(k = 4\).
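The estimation steps described in this section can be sketched on simulated data as follows. This is only an illustration under stated assumptions, not the implementation used in the paper: a cubic truncated-power spline basis stands in for the B-spline basis (with the same knots it spans the same cubic spline space), time effects are omitted as in Eq. (6), the panel is balanced and sorted by unit and time, and all function names, knot choices and simulated parameter values are hypothetical. The curve is identified only up to an additive constant, which is absorbed by the unit intercepts.

```python
import numpy as np

def spline_basis(g, knots):
    """Cubic truncated-power basis without a constant column."""
    g = np.asarray(g, dtype=float)
    cols = [g, g**2, g**3]
    cols += [np.clip(g - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def first_difference(a, unit_ids):
    """Within-unit first differences; rows sorted by unit, then time."""
    same_unit = unit_ids[1:] == unit_ids[:-1]
    return (a[1:] - a[:-1])[same_unit]

def semipar_fd(E, g, Z, unit_ids, knots):
    """Series first-difference estimator in the spirit of Eqs. (6)-(8)."""
    center = g.mean()  # centring improves conditioning, same spline space
    B = spline_basis(g - center, np.asarray(knots) - center)
    K = B.shape[1]
    # Step 1: OLS on the differenced basis and differenced controls.
    dX = np.column_stack([first_difference(B, unit_ids),
                          first_difference(Z, unit_ids)])
    dE = first_difference(E, unit_ids)
    coef, *_ = np.linalg.lstsq(dX, dE, rcond=None)
    theta, beta = coef[:K], coef[K:]
    # Step 2: unit intercepts as within-unit means of what is left over.
    leftover = E - B @ theta - Z @ beta
    _, inverse = np.unique(unit_ids, return_inverse=True)
    alpha = np.bincount(inverse, weights=leftover) / np.bincount(inverse)
    # Step 3: partial residual, then a spline fit of it on income.
    u = E - Z @ beta - alpha[inverse]
    B1 = np.column_stack([np.ones_like(g), B])
    gamma, *_ = np.linalg.lstsq(B1, u, rcond=None)
    return beta, u, B1 @ gamma

# Simulated panel; every value below is made up for illustration.
rng = np.random.default_rng(1)
N, T = 40, 20
unit_ids = np.repeat(np.arange(N), T)
alpha_true = rng.normal(0.0, 1.0, N)[unit_ids]
g = rng.uniform(5.0, 9.0, N * T)                  # log income
Z = rng.normal(0.0, 1.0, (N * T, 2))              # two parametric controls
m_true = 0.8 * (g - 5.0) - 0.1 * (g - 5.0) ** 2   # rising, flattening curve
E = m_true + Z @ np.array([0.6, -0.3]) + alpha_true + rng.normal(0.0, 0.1, N * T)

knots = np.quantile(g, [0.25, 0.5, 0.75])
beta_hat, u_hat, m_hat = semipar_fd(E, g, Z, unit_ids, knots)
print(beta_hat)
```

Plotting `u_hat` against `g` together with the fitted values `m_hat` would give a partial-fit figure of the kind discussed in the results section.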

3 Data

We investigate the definite shape of the income-pollution relationship for a sample of 49 African countries over the period 1990–2010 (see Table 4 in “Appendix” for country listing). Population is measured as total population, while affluence, which captures economic prosperity, is measured as real GDP per capita (constant 2005 US dollars). Technology is measured using energy intensity. Energy intensity is often expressed as total energy use per dollar of GDP. Here, it is expressed as total primary energy consumption per dollar of GDP (Btu per year 2005 PPP US dollars). Environmental degradation is captured using two atmospheric air pollutants, namely, \(\hbox {CO}_2\) emissions and ambient particulate matter (\(\hbox {PM}_{10}\)). \(\hbox {CO}_2\) emissions (metric tons per capita) include those from the burning of fossil fuels and cement manufacturing, but exclude emissions from land use such as deforestation. \(\hbox {PM}_{10}\) captures fine suspended particles less than \(10\,\upmu \hbox {m}\) in diameter, which are capable of penetrating deeply into the respiratory tract, causing significant health damage to humans and animals. These consist of chemically stable substances such as dust, soot, ash, smoke, and liquid droplets from fuel consumption, industrial and construction activities. The data on per capita carbon emissions, ambient particulate matter, population size, and GDP per capita are sourced from the World Bank’s World Development Indicators online database, while energy intensity is obtained from the International Energy Statistics of the U.S. Energy Information Administration (EIA). Table 1 presents the summary statistics with all variables transformed to their natural logarithm form.

Table 1 Summary statistics
Table 2 Parameter estimates of income-\(\hbox {CO}_2\) emissions nexus

4 Results

In revisiting the EKC analysis for Africa, Eq. (4) is estimated using two standard panel data techniques, namely the OLS and fixed effects (FE) models, while Eq. (5) is estimated with the semi-parametric panel fixed effects model (SEMI-PAR) of Baltagi and Li (2002), as the exact income-pollution relationship is not known and can differ across countries or regions. Table 2 presents the empirical estimates, with one column for each estimation technique. The population variable is statistically significant and has positive coefficient estimates (i.e. 0.074 and 0.599) for the OLS and SEMI-PAR estimations, respectively. This implies that higher population exacerbates pressure on environmental quality. However, the significance is lost when the model is estimated with the FE technique. For energy intensity, the estimated coefficients are positive and statistically significant only for the OLS and FE estimations. This implies that higher consumption of fossil fuels in the production process will increase carbon emissions, which in turn will put further pressure on environmental quality. However, the coefficient estimate for energy intensity in the SEMI-PAR estimation is negative and not significant. Across all three estimations, the impact and importance of population and energy intensity differ, which reiterates the issue of robustness in the literature, as different estimation techniques return different outcomes.

Fig. 1 Partial fit of income and \(\hbox {CO}_2\) emissions relationship: points in graph are estimated partial residuals for \(\hbox {CO}_2\) emissions; maroon curve represents fitted values for adjusted effects of other explanatory variables, and bounded by the 95% confidence bands

Considering the income variable and its quadratic term, the OLS estimation result indicates that income is positive and statistically significant whereas its quadratic term, although negative, is not significant. On this basis, the result indicates that the income-\(\hbox {CO}_2\) emissions nexus in Africa follows a positive relationship. In other words, higher income with economic growth will increase carbon emissions, which in turn worsens environmental quality. On the other hand, neither income nor its quadratic term is significant in the FE estimation; as such, there is no evidence to support the EKC hypothesis in Africa. Unlike these parametric models, which yield a single coefficient estimate, non- and semi-parametric models provide a partial regression plot that describes the shape of the relationship between a dependent variable and the regressor of interest while holding other regressors at a fixed point such as their means. Figure 1 presents the partial fit for the income-\(\hbox {CO}_2\) emissions relationship. The fitted curve shows a relatively flat but positive relationship, which supports the OLS estimation in column (1). Further, this indicates that there is no evidence supporting the validity of the EKC hypothesis for African countries. Thus, for African countries that are still at the intermediate stage of development, with a dominant agricultural sector and a less sophisticated industrial sector, economic growth will typically have a scale effect on the environment. This can be anticipated, as Africa’s contribution to greenhouse gas emissions has increased recently, although it remains low compared with emissions from industrialized countries.

Table 3 Parameter estimates of income-\(\hbox {PM}_{10}\) emissions nexus

Turning to the alternative measure of environmental pollution, Table 3 presents the empirical results for all three estimation techniques in the case of the income-\(\hbox {PM}_{10}\) emissions nexus. The population variable is statistically significant with a positive coefficient estimate for the OLS estimation (i.e. 0.071) in column (1). The FE and SEMI-PAR estimations report negative and statistically significant coefficients. Energy intensity has negative coefficient estimates in both parametric models, whereas it is positive in the semi-parametric model. However, statistical significance is only obtained in the OLS estimation. This is understandable, as domestic fuel burning for cooking and heating, rather than industrial-related sources, represents the major source of \(\hbox {PM}_{10}\) emissions in Africa (Karagulian et al. 2015). In terms of the income variable and its quadratic term, there is no evidence of the EKC hypothesis under FE estimation; such evidence is obtained only with the OLS estimation in column (1). The OLS result means that as income rises in African countries due to economic growth, \(\hbox {PM}_{10}\) emissions will rise and, after reaching a turning point of approximately 609 U.S. dollars, will decline, and environmental pollution with them, as people switch from pollution-intensive activities such as cooking with biomass fuel to gas while environmentally-friendly and cleaner technologies replace dirtier production techniques. In order to assess the robustness of this outcome, Fig. 2 presents the partial fit of the income-\(\hbox {PM}_{10}\) emissions relationship. The fitted curve shows that the relationship is non-monotonically decreasing as income rises.

Fig. 2 Partial fit of income and \(\hbox {PM}_{10}\) emissions relationship: points in graph are estimated partial residuals for \(\hbox {PM}_{10}\) emissions; maroon curve represents fitted values for adjusted effects of other explanatory variables, and bounded by the 95% confidence bands

From the foregoing, the empirical results show that the nature and validity of the income-pollution relationship based on the EKC hypothesis depend on the estimation approach used and its associated assumptions on functional form specification. In this analysis, the standard fixed effects panel model does not offer insight into whether the relationship is an inverted U-shaped EKC curve or a monotonically increasing or decreasing one. However, its OLS counterpart shows evidence of a monotonically increasing relationship for \(\hbox {CO}_2\) emissions as well as an inverted U-shaped curve for the income-\(\hbox {PM}_{10}\) emission relationship. This inconsistency reiterates the econometric caveats in the literature surrounding ex ante restrictions on the functional form specification and robustness issues. By contrast, the semi-parametric analysis provides a more definite shape of the income-pollution relationship with flexibility in functional form specification, as a non-monotonically increasing and a non-monotonically decreasing relationship are observed for \(\hbox {CO}_2\) and \(\hbox {PM}_{10}\) emissions, respectively.

In addition, both the OLS and semi-parametric results show that differences in the income-pollution relationship depend on the indicator used for environmental pollutants. For atmospheric air pollutants, evidence suggests that the EKC relationship is associated with environmental pollutants with short-term and local impacts, rather than with those having global, indirect and long-term impacts on human health and overall environmental quality (Arrow et al. 1995; Dinda 2004). Local pollutants such as ambient particulate matter have recognizable negative effects at the local level with a low abatement cost, whereas global pollutants such as \(\hbox {CO}_2\) emissions have a high abatement cost with long-term effects. Thus, most empirical studies involving \(\hbox {CO}_2\) emissions typically indicate a positive relationship rather than the inverted U-shaped curve, since economic growth is associated with increased energy use (Dinda 2004; Kaika and Zervas 2013a, b). Following from the semi-parametric analysis, the evidence shows that higher income levels with economic growth in Africa will lead to increased energy demand and, in turn, increased \(\hbox {CO}_2\) emissions, as African countries are supposedly in their intermediate stage of development. In other words, African countries are still on the upward trajectory of the EKC relationship for \(\hbox {CO}_2\) emissions, which is characterized by the scale effect of economic activities on the environment. Meanwhile, economic growth with increased income levels is compatible with a reduction in \(\hbox {PM}_{10}\) emissions and an improvement in environmental quality. As shown in Karagulian et al. (2015), domestic fuel burning constitutes the dominant source of ambient particulate matter in Africa. This includes, for example, wood (biomass) and coal for domestic cooking and heating. Other sources, such as industrial emissions from oil combustion and coal burning in power plants, represent a smaller fraction than traffic emissions from automobiles. Therefore, as income rises following economic growth, ambient particulate matter emissions from these sources are expected to decline with the use of environmentally cleaner alternatives.

5 Conclusion

This paper revisits the environmental Kuznets curve (EKC) hypothesis with the aim of determining a definite shape of the income-pollution relationship for a sample of 49 African countries for the period 1990–2010. The recent orientation of the literature has led to the use of non- and semi-parametric methods, which are robust to functional form misspecification and potential parameter heterogeneity as they allow the data dynamics to determine the true shape of the relationship, in contrast to widely used parametric methods that assume functional forms specified ex ante. Using the STIRPAT model as its analytical framework and the semi-parametric panel fixed effects estimator of Baltagi and Li (2002), which guards against functional form misspecification, the paper investigates the true relationship between income and two atmospheric air pollutants, namely carbon dioxide (\(\hbox {CO}_2\)) and suspended particulate matter (\(\hbox {PM}_{10}\)) emissions.

The empirical evidence is summarized as follows. First, the parametric OLS estimation suggests a monotonically increasing relationship between income and \(\hbox {CO}_2\) emissions, whereas an inverted U-shaped relationship is obtained for the income-\(\hbox {PM}_{10}\) emission relationship. Meanwhile, no form of relationship is observed with panel fixed effects estimation. Thus, different parametric specifications could lead to different empirical conclusions and ultimately to a wrong policy prescription. Second, the semi-parametric counterpart clearly shows that the income-\(\hbox {CO}_2\) emissions relationship is non-monotonically increasing, while a non-monotonically decreasing relationship is observed for the income-\(\hbox {PM}_{10}\) relationship. Thus, while economic growth is beneficial for the reduction of suspended particulate matter, it leads to an increase in \(\hbox {CO}_2\) emissions in the region. Consequently, economic growth might not be a sufficient condition for improving environmental quality, especially in the case of \(\hbox {CO}_2\) emissions. Hence, there is a need for an integrated policy design with instruments that make promoting economic progress compatible with a green environment, such as emphasizing the use of cleaner energy sources.