1 Introduction

Grossman and Krueger (1995) introduced the Environmental Kuznets Curve (EKC) concept as a way to describe correlations in aggregate data that might subsequently be explained through more detailed analysis involving transmission pathways and micro foundations.Footnote 1 This has generated an expansive literature that treats the EKC model itself as providing a causal theory. However, EKC models typically omit variables such as energy prices and policy instruments (i.e financial variables, etc) that are known to causally influence pollutants like \(\hbox {CO}_{2}\) and \(\hbox {SO}_{2}\) emissions (Polemis and Stengos 2017).

The majority of the EKC studies estimate reduced-form equations that enter the model in a parametric form with controversial results (Hsueh 2015; Figueroa and Pasten 2013; Halkos and Tzeremes 2009). Parametric estimates assume a unique response coefficient for the covariates in environmental pollution regressions. However, this assumption is not warranted since in the EKC literature, theory and empirical evidence emphasize the existence of multiple regimes (i.e threshold effects) followed by strong-non linear effects (Halkos and Polemis 2016; Millimet et al. 2003). Following this spirit we apply a non-parametric technique to extend this line of research. In this way, we are able to trace possible non-linear effects in more detail and accuracy and shed some light to the competition-pollution nexus totally ignored by the existing literature.

Despite the profound interest by policy makers and government officials on the possible spillovers between market concentration and environmental degradation the existing literature is still in its infancy, with controversial results. These can be justified by the fact that many researchers acknowledge that competition may have positive as well as negative effects on environmental pollution (Simon and Prince 2016; Branco and Villas-Boas 2015; Shleifer 2004).

We contribute to this strand of literature by applying a flexible Semi-Parametric Fixed Effects Model (SPFEM) to analyze the impact of industrial output on environmental degradation when we account for the level of market concentration developed by Baltagi and Li (2002) with significant applications (see among others Libois and Verardi 2013; Desbordes and Verardi 2012). The main advantage of this technique is that it allows us to investigate any possible spillover effect between market structure and industrial pollution without knowing either the form of this relationship (i.e curvature) or the exact interactions among the main variables of interest. In other words, one of the possible gains from using non-parametric techniques is that they do not rely on strong assumptions of the functional form, usually without an initial analysis on the properties of the data. It is worth mentioning that such an ad hoc analysis using parametric methods can lead to biased estimation of economic relationships (Polemis and Stengos 2015; Delis et al. 2014; Tran and Tsionas 2010).

Our findings reveal the existence of an inverted “U-shaped” relationship between industrial output and environmental degradation already hidden when we impose parametric techniques. The rest of the paper is organized as follows. Section 2 describes the data. Section 3 develops the econometric methodology and the SPFEM, while Sect. 4 focuses on the empirical findings. Lastly, Sect. 5 concludes the paper.

2 Data

Toxic chemical releases were drawn from the Toxic Release Inventory of Environmental Protection Agency covering the period 1987–2015. The panels used in this study consist of 2461 industrial facilities broken down by 473 6-digit NAICS codes over 5 year intervals; namely 1987, 1992, 1997, 2002, 2007 and 2012.Footnote 2

The sample period is selected in order to reduce the measurement error bias generated by the use of higher-frequency data (Griliches and Hausman 1986). The structural variables such as market concentration and value added that correspond to each 6-digit code were drawn from the National Bureau of Economic Research (NBER) and especially from Manufacturing Industry Database (CES).

This database contains annual industry-level data from 1958 to 2011 on output, employment, payroll and other input costs, investment, capital stocks, and various industry-specific price indexes. Because of the change from SIC to NAICS industry definitions in 1997, the database is provided in two versions (one with 459 four-digit 1987 SIC industries and the other with 473 six-digit 1997 NAICS industries). Especially for the year 2012, and due to data restrictions concerning the level of market concentration as measured by certain indicators (i.e CR4, CR8, CR20, CR50 and HHI), we used data directly from the US Census of Manufacturers. The latter is only conducted every 5 years limiting our time span to 6 years (1987, 1992, 1997, 2002, 2007 and 2012). Similarly to Polemis and Stengos (2017) and in order to check for the robustness of our findings, we take five measures of market concentration: HHI is the Herfindahl-Hirschman index for the 50 largest firms in the industryFootnote 3, CR4 is the four-firm concentration ratio, CR8 is the eight-firm concentration ratio, CR20, is the twenty-firm concentration ratio and finally CR50, is the fifty-firm concentration ratio.Footnote 4 It is worth mentioning that our measures of market structure reveal the existence or the absence of effective competition in the industry since concentration is simply the inverse of competition (Simon and Prince 2016).

3 Empirical framework

We estimate the SPFEM following the methodology described in Baltagi and Li (2002). Let the model be given by the following equation:

$$\begin{aligned} y_{it} =a_i +x_{it}^T \beta +w_{it} \gamma +f(z_{it} )+\varepsilon _{it} \end{aligned}$$
(1)

where \(f(z_{it} )\) is an unknown function of \(z_{it},\) entering the model in a non linear way. \(\hbox {Y}_{it}\) is the dependent variable. \(\hbox {X}_{it}\) is the vector of exogenous linear regressors, while the w-vector includes the year dummy variables. Lastly, \(\varepsilon _{it}\) are zero mean i.i.d. innovations.

Following Baltagi and Li (2002), we approximate \(f(z_{it} )\) by series differences \(p^{K}(z_{it} )\) where the latter are the first k terms of a sequence of functions [\(\hbox {p}_{1}\)(z), \(\hbox {p}_{2}\)(z), ...]. By taking first differences in order to remove fixed effects, we end up with the following equation:

$$\begin{aligned} \Delta (y_{it} )=\Delta (x_{it}^T )\beta +\Delta (w_{it} )\gamma +\Delta \{p^{k}(z_{\iota t} )\}\delta +\Delta (\varepsilon _{\iota t} ) \end{aligned}$$
(2)

Equation 2 can be estimated by using OLS. In the next step, we use the fitted fixed effects \(\hat{{a}}_i \) in order to estimate the error component residual of Eq. 1. Thus we have:

$$\begin{aligned} \hat{{u}}_{it} =y_{it} -x_{it}^T \hat{{\beta }}-w_{it} \hat{{\gamma }}-\hat{{a}}_i =f(z_{it} )+\varepsilon _{it} \end{aligned}$$
(3)

As it is evident with the above transformation we eliminate the fitted fixed effects \(\hat{{a}}_i \) (Baltagi and Li 2002). Therefore, in the next step, we could estimate f\((z_{it})\) using a nonparametric estimator based on a kernel local polynomial fit or spline interpolation. We use the latter approach (B-spline of order K = 2) since it better approximates complex shapes and does not suffer from Runge’s phenomenon (Newson 2012).

4 Results and discussion

We carry out the first part of the analysis with the existence of cross-section independence since it is common for panel data to violate this assumption which will result in low power and size distortions of tests. We use the cross-section dependence test proposed by Pesaran (2004). The test strongly rejects the null hypothesis of cross-section independence for all the variables, providing evidence of cross-sectional dependence in the data given the statistical significance of the cross-section dependence (CD) statistic (Table 1).

Table 1 Pesaran CD test

In light of this evidence we proceed to test for unit roots using tests that are robust to cross-section dependence (“second generation” tests) proposed by Breitung and Das (2005) and Pesaran (2007). Both tests suggest that all the sample variables are stationary (Table 2).

Table 2 Unit root tests
Table 3 Estimation results

In the next step we estimate separately (see Eqs. 4, 5) the following (parametric) panel data models by employing various techniques such as pooled OLS, Fixed Effects (P-FE) and Instrumental Variables fixed effects (IV P-FE) with the latter controlling for possible endogeneity and reverse causality generated by the inclusion of industry output (SHIP) in the relevant regressions. For this reason we followed a similar approach with the study of Halkos and Polemis (2017) in which a simple model is developed that incorporates the relationship of financial development and economic growth to environmental degradation together with the validation of the EKC hypothesis. Our parametric equations thus take the following form:

$$\begin{aligned} \ln (\textit{REL}_{jit} )= & {} \alpha _i +\beta _t +b_0 +b_1 \ln (\textit{SHIP}_{it} )+b_2 \ln (\textit{SHIP}_{it} )^{2}\nonumber \\&+\, b_3 \ln (\textit{HHI}_{it} )+u_{jit} \end{aligned}$$
(4)
$$\begin{aligned} \ln (\textit{REL}_{jit} )= & {} \alpha _i +\beta _t +b_0 +b_1 \ln (\textit{SHIP}_{it} )+b_2 \ln (\textit{SHIP}_{it} )^{2}\nonumber \\&+\, b_3 \ln (CR4_{it})+u_{jit} \end{aligned}$$
(5)

Where ln(\(\hbox {REL}_{ijt})\), denotes the logged total chemical releases emitted by facility j in industry i across the year t. Ln(\(\hbox {SHIP}_{it})\) is the logged value of shipments as a proxy for industry output i during year t. Ln(\(\hbox {HHI}_{it})\) is the logged Herfindahl-Hirschman index for the 50 largest firms in the industry i, while ln(CR4\(_{it})\) is the four-firm concentration ratio in the industry i expressed in natural logarithm. Moreover, \(\alpha _{i}\)is the unit-specific residual that differs between sectors but remains constant for any particular facility (unobserved facility level effect) while \(\beta _{t}\)captures the time effect and therefore differs across years but is constant for all sectors in a particular year. Finally, \(\mathrm{u}_{jit}\) denotes the idiosyncratic error term which is i.i.d. From Table 3, it is shown that all the coefficients of the SHIP terms are statistically significant alternating their signs starting from negative to positive, while the two market concentration indices are positively correlated with the chemical releases.Footnote 5 In other words, industry output decreases up to a certain “turning” point and then increases gradually. This suggests the existence of a ‘U’ shaped relationship between industrial output and pollution when market concentration is taken into consideration. However, the Wooldridge test (W-T) for serial correlation in the error term reveals that there is a lot of serial dependence in the residuals and the standard errors since the null hypothesis (homoskedasticity) is strongly rejected in all of the six alternative specifications with a p value equal to 0.000.Footnote 6

In the next step, we proceed to estimate the SPFEM given by the following equation:

$$\begin{aligned} \ln (\textit{REL}_{jit} )=\alpha _i +\beta _t +f[\ln (\textit{SHIP}_{it} )+\ln (\textit{SHIP}_{it} )^{2}]+b_1 \ln (\textit{HHI}_{it} )+\mathrm{u}_{jit} \end{aligned}$$
(6)

where f\((\cdot )\) is a function which is non-parametrically estimated. The graphical presentation of the semi-parametric estimation of f\((\cdot )\) is displayed in Fig. 1.Footnote 7 As it is evident, the SPFEM redefines the curvature as the relationship between industrial output and pollution when market concentration is taken into account appears to be inversely “U-shaped” and is now precisely estimated. In other words, competition intensity tends to decrease the level of toxic chemical releases and thus industrial pollution up to a certain point reversing its trend henceforth.

These results highlight the superiority of semi-parametric models in exploring non-linear relationships compared to classical parametric approaches. This could be attributed to the fact that parametric regression models are highly nonlocal as the conditional expectation in one part of the data can be heavily influenced by observations in another part (Desbordes and Verardi 2012).

Fig. 1
figure 1

Inverted U-shape curve between industrial output and pollution. The horizontal axis measures the logged industrial output ln(SHIP), while in the vertical axis the logged level of industrial toxic releases ln(REL) is measured. The points in the graph are partial residuals for industrial toxic releases which are centered around the mean. Blue shaded area corresponds to 95% confidence bands

5 Conclusions

In this paper we compare and contrast standard parametric techniques with a non-parametric one in order to study the impact of market concentration on environmental pollution within an EKC framework. In this way we draw sharp differences between the usual parametric methodologies (i.e Pooled OLS, FE-OLS and IV) and the semiparametric fixed effects estimator along the lines of Baltagi and Li (2002).

The empirical findings justify for the first time in the empirical literature the existence of an inverted ”U-shaped” curve linking toxic chemical pollution with industrial output when market concentration is taken into account. This relationship provides new insights into the environmental policy since the regulators must take into account if they are on the upward or the downward slopping part of the curve in order to pursue the effective environmental policies. Lastly, we argue that using a flexible semi-parametric fixed effects estimator we do not misleadingly fail to find an inverted U-shape curve linking competition with environmental degradation compared with the parametric techniques.

The existence of an (inverted) “U” shaped curve provides new insights into the environmental policy debate toward emissions releases abatement. This means that the increasing non parametric regression line up to a certain concentration level approximately indicates a negative effect on facilities’ emissions levels whereas a decreasing line indicates a positive effect. Lastly our semiparametric econometric model concurs that the results remained robust under different specifications of market concentration.