1 Introduction

As the largest carbon emitter worldwide (Guan et al., 2018; Shao et al., 2019), China accounted for 28.8% of global CO2 emissions in 2019 according to British Petroleum (2020). To reduce the country’s CO2 emissions, the Chinese government has planned to achieve 17% and 18% reductions in carbon intensity in the 12th and 13th Five-Year Plans, respectively. Moreover, China has proposed reaching a peak in CO2 emissions by 2030 according to China’s Intended Nationally Determined Contributions and achieving carbon neutrality by 2060 according to a statement at the 75th General Assembly of the United Nations.

Considering the significance of carbon reduction in China, the literature has contributed substantially to accounting for CO2 emissions and to exploring their driving factors. In CO2 emissions accounting, there are two main methods: consumption-based accounting (Mi et al., 2016; Zhang et al., 2014) and production-based accounting (Liu et al., 2020; Shan et al., 2018a, b; Wang et al., 2012a, b). In recent years, some scholars have also applied night-time light data to estimate CO2 emissions (Chen et al., 2020a, b; Meng et al., 2014). Regarding the driving factors of CO2 emissions in China, economic growth (Zhang & Da, 2015) and energy consumption (Wang et al., 2014) are generally regarded as the dominant drivers. Moreover, a growing number of studies focus on other factors, such as energy efficiency (Yao et al., 2015), technology improvement (Chen et al., 2020a, b), and industrial structure (Dong et al., 2018).

In addition to carbon reduction, urbanization is a major concern in China. Since the economic reform began in 1978, urbanization has been rapidly developing in China. The urbanization rate in China rose from 17.9% in 1978 to 59.6% in 2018, and in the same period, the urban population size grew from 172.5 million people to 831.4 million people according to the Chinese Statistical Yearbook. On the one hand, population agglomeration is regarded as the engine of economic growth because the rapid expansion of the urban population size has facilitated the high-speed development of Chinese cities (Chan, 1992; Chen et al., 2014). On the other hand, a growing urban population size usually results in greater energy consumption, which can lead to serious environmental issues, especially increasing CO2 emissions (Cole & Neumayer, 2004; Li, Zhou, et al., 2018a, b; Zhang & Lin, 2012). Against this complex backdrop of urbanization and carbon reduction, exploring the impact of the urban population size on CO2 emissions holds great significance, both theoretically and practically.

A range of studies have already examined the impact of the population size on CO2 emissions. However, these studies have mostly focused on countries (Cole & Neumayer, 2004; Parikh & Shukla, 1995; Poumanyvong & Kaneko, 2010), regions (Zhang & Lin, 2012) and megacities (Li et al., 2018a, b; Tan et al., 2016; Wang et al., 2017; Wang et al., 2012a, b). Little attention has been paid to cities at the prefecture level, especially in China, owing to a lack of comprehensive and reliable CO2 emissions data. Considering the key role of cities in urbanization and their considerable policy autonomy, city-level research can inform more precise and differentiated carbon reduction strategies. Therefore, detailed and continuous studies on cities in China are urgently required.

Existing studies have proposed several models of the drivers of CO2 emissions for countries, regions and megacities. To investigate the impact of the population size on CO2 emissions, some scholars have applied the STIRPAT model (Cui et al., 2019; Tan et al., 2016; Wang et al., 2017), which evolved from the IPAT model (Dietz & Rosa, 1997). This model focuses on total emissions rather than per capita emissions, which are often used to measure efficiency and justice (Hayward, 2007; Mussini & Grossi, 2015). In addition, index decomposition analysis (IDA) is widely used to investigate the drivers of CO2 emissions at the socioeconomic sector level (Li et al., 2019; Liang et al., 2017; Wang & Feng, 2018; Zhang et al., 2016; Zhu et al., 2017) and at the megacity level (Gu et al., 2019). The IDA method is flexible enough to decompose CO2 emissions into factors that include the urban population size. Nevertheless, little attention has been paid to the city level because of limited data availability. To fill this gap, we adopt the IDA method to investigate, for the first time, the impact of the urban population size on CO2 emissions using data from 175 cities at the prefecture level and above in China.

Based on available data on CO2 emissions and decomposition models, our work contributes to urban CO2 emissions reduction in China both theoretically and empirically. Our main contributions are as follows: (1) This study is the first to investigate the impact of the urban population size on the CO2 emissions of 175 Chinese cities. (2) We explore both the direct and indirect effects of the urban population size theoretically and empirically. (3) This paper investigates the influence of the urban population size on both total and per capita CO2 emissions.

2 Theoretical model

2.1 Total CO2 emissions model

Total CO2 emissions can be decomposed based on Eq. (1):

$$TCE = \mathop \sum \limits_{i = 1}^{N} E_{i} a_{i} = E\mathop \sum \limits_{i = 1}^{N} e_{i} a_{i}$$
(1)

where \(TCE\) represents the total CO2 emissions of a city, \(E_{i}\) denotes the consumption of fuel \(i\) (tce), \(a_{i}\) is the CO2 emissions factor of fuel \(i\) (tonne CO2/tce), \(E\) represents the total energy consumption (tce), and \(e_{i}\) is the proportion of \(E_{i}\) in \(E\), indicating the energy consumption structure.
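
As a minimal numerical sketch of Eq. (1), the snippet below computes TCE both as the sum of \(E_{i} a_{i}\) and as \(E\) times the share-weighted emissions factor. The fuel names, consumption figures and emissions factors are hypothetical placeholders and are not taken from the dataset used in this paper.

```python
# Hypothetical fuel consumption E_i (tce) and emissions factors a_i (tonne CO2/tce).
fuel_use = {"coal": 600.0, "oil": 300.0, "gas": 100.0}
emission_factor = {"coal": 2.7, "oil": 2.1, "gas": 1.6}


def total_emissions(fuel_use, emission_factor):
    """TCE = sum_i E_i * a_i (left-hand form of Eq. (1))."""
    return sum(fuel_use[f] * emission_factor[f] for f in fuel_use)


def total_emissions_from_shares(fuel_use, emission_factor):
    """TCE = E * sum_i e_i * a_i with e_i = E_i / E (right-hand form of Eq. (1))."""
    total_energy = sum(fuel_use.values())
    shares = {f: use / total_energy for f, use in fuel_use.items()}
    return total_energy * sum(shares[f] * emission_factor[f] for f in fuel_use)


tce_direct = total_emissions(fuel_use, emission_factor)
tce_shares = total_emissions_from_shares(fuel_use, emission_factor)
print(tce_direct, tce_shares)  # the two forms agree up to floating-point rounding
```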

We take the logarithm of Eq. (1) and obtain Eq. (2):

$$\ln TCE = \ln E + \ln \mathop \sum \limits_{i = 1}^{N} e_{i} a_{i}$$
(2)

Then, we differentiate both sides of Eq. (2):

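$$\frac{{\dot{TCE}}}{TCE} = \frac{{\dot{E}}}{E} + \frac{{\mathop \sum \nolimits_{i = 1}^{N} \frac{{\dot{e}_{i} }}{{e_{i} }}e_{i} a_{i} }}{{\mathop \sum \nolimits_{i = 1}^{N} e_{i} a_{i} }} + \frac{{\mathop \sum \nolimits_{i = 1}^{N} \frac{{\dot{a}_{i} }}{{a_{i} }}e_{i} a_{i} }}{{\mathop \sum \nolimits_{i = 1}^{N} e_{i} a_{i} }}$$
(3)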

Here,

$$\begin{gathered} \frac{{\mathop \sum \nolimits_{i = 1}^{N} \frac{{\dot{e}_{i} }}{{e_{i} }}e_{i} a_{i} }}{{\mathop \sum \nolimits_{i = 1}^{N} e_{i} a_{i} }} = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \frac{{\dot{e}_{i} }}{{e_{i} }}e_{i} a_{i} E}}{{\mathop \sum \nolimits_{i = 1}^{N} e_{i} a_{i} E}} = \frac{{E_{1} a_{1} \widetilde{{e_{1} }}}}{{\mathop \sum \nolimits_{i = 1}^{N} e_{i} a_{i} E }} + \cdots + \frac{{E_{N} a_{N} \widetilde{{e_{N} }}}}{{\mathop \sum \nolimits_{i = 1}^{N} e_{i} a_{i} E}} \hfill \\ \quad = g_{1} \widetilde{{e_{1} }} + \cdots + g_{N} \widetilde{{e_{N} }} = \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }} \hfill \\ \end{gathered}$$
(4)

In the same way,

$$\frac{{\mathop \sum \nolimits_{i = 1}^{N} \frac{{\dot{a}_{i} }}{{a_{i} }}e_{i} a_{i} }}{{\mathop \sum \nolimits_{i = 1}^{N} e_{i} a_{i} }} = \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{a_{i} }}$$
(5)

Therefore, we obtain the following:

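$$\widetilde{TCE} = \tilde{E} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{a_{i} }}$$
(6)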

In Eq. (6), " ~ " represents the ratio of the change in a variable to the original value. \(g_{i}\) indicates the proportion of CO2 emissions from fuel \(i\) in TCE. \(\tilde{E}\) denotes the scale effect, which measures the increase in total energy consumption. \(\mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }}\) represents the structure effect, which denotes the change in the energy consumption structure. \(\mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{a_{i} }}\) is the technique effect, which represents the changes in emission intensity.

Assuming no technology improvement, the CO2 emissions factor \(a_{i}\) is constant, so \(\widetilde{{a_{i} }}\) equals zero. Equation (6) can then be expressed as follows:

$$\widetilde{TCE} = \tilde{E} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }}$$
(7)

Equation (7) indicates that a change in \(TCE\) is driven by changes in \(E\) and \(e_{i}\).
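
The split of the TCE growth rate into a scale effect \(\tilde{E}\) and a structure effect \(\sum g_{i} \widetilde{{e_{i} }}\) can be checked numerically. The sketch below uses hypothetical fuel data and constant emissions factors. For discrete changes the additive form holds only to first order, whereas the product \((1 + \tilde{E})(1 + \sum g_{i} \widetilde{{e_{i} }})\) reproduces the TCE change exactly.

```python
# Hypothetical emissions factors a_i (tonne CO2/tce), held constant (no technique effect).
emission_factor = {"coal": 2.7, "oil": 2.1, "gas": 1.6}
# Hypothetical fuel consumption E_i (tce) in two periods.
use_before = {"coal": 600.0, "oil": 300.0, "gas": 100.0}
use_after = {"coal": 606.0, "oil": 309.0, "gas": 105.0}


def decompose(use_before, use_after, a):
    """Return (scale effect, structure effect, actual TCE growth rate)."""
    e_total0 = sum(use_before.values())
    e_total1 = sum(use_after.values())
    tce0 = sum(use_before[f] * a[f] for f in a)
    tce1 = sum(use_after[f] * a[f] for f in a)

    scale = (e_total1 - e_total0) / e_total0           # growth rate of E
    structure = 0.0
    for f in a:
        share0 = use_before[f] / e_total0              # e_i before
        share1 = use_after[f] / e_total1               # e_i after
        g = use_before[f] * a[f] / tce0                # g_i: share of fuel i in TCE
        structure += g * (share1 - share0) / share0    # g_i times growth rate of e_i
    actual = (tce1 - tce0) / tce0
    return scale, structure, actual


scale, structure, actual = decompose(use_before, use_after, emission_factor)
print(scale, structure, scale + structure, actual)
```

In this example total energy use grows while the fuel mix shifts slightly away from coal, so the scale effect is positive and the structure effect is negative.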

Urban total energy consumption, \(E\), can be decomposed as follows:

$$E = Pop \times \frac{1}{Den} \times persquGDP \times perGDPEng$$
(8)

In Eq. (8), \(Pop\) denotes the urban population size. \(\frac{1}{Den}\) refers to the reciprocal of population density, which is used to measure the degree of population agglomeration. \(persquGDP\) is the GDP per unit area, which denotes economic agglomeration. \(perGDPEng\) refers to energy intensity, which denotes urban energy efficiency.

Now, we can take the logarithm of both sides of Eq. (8) and obtain Eq. (9):

$$\ln E = \ln Pop + \ln \frac{1}{Den} + \ln persquGDP + \ln perGDPEng$$
(9)

We differentiate both sides of Eq. (9) and obtain the following equation:

$$\tilde{E} = \widetilde{Pop} + \widetilde{{\left( \frac{1}{Den} \right)}} + \widetilde{persquGDP} + \widetilde{perGDPEng}$$
(10)
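
The identity in Eq. (8) and its additive growth-rate form in Eq. (10) can be verified with a short sketch. All city figures below are hypothetical, and log differences are used as the discrete counterpart of the growth rates.

```python
import math


def energy_factors(pop, area, gdp, energy):
    """Return the four factors on the right-hand side of Eq. (8)."""
    den = pop / area                      # population density
    return {
        "Pop": pop,
        "1/Den": 1.0 / den,               # reciprocal of population density
        "persquGDP": gdp / area,          # GDP per unit area
        "perGDPEng": energy / gdp,        # energy intensity
    }


def log_growth(before, after):
    """Log difference, the discrete analogue of a growth rate."""
    return math.log(after / before)


# Hypothetical city observed in two years (area unchanged).
year0 = dict(pop=2.0e6, area=1.5e3, gdp=9.0e10, energy=4.0e6)
year1 = dict(pop=2.1e6, area=1.5e3, gdp=9.9e10, energy=4.2e6)

f0, f1 = energy_factors(**year0), energy_factors(**year1)

product0 = math.prod(f0.values())                                  # reproduces E
factor_growth = {name: log_growth(f0[name], f1[name]) for name in f0}
energy_growth = log_growth(year0["energy"], year1["energy"])
print(product0, sum(factor_growth.values()), energy_growth)
```

With a fixed area, population growth raises density, so the `1/Den` term contributes negatively and offsets part of the other terms.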

Combining both Eqs. (7) and (10), we obtain the total CO2 emission decomposition formula, which is expressed as follows:

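$$\widetilde{TCE} = \widetilde{Pop} + \widetilde{{\left( \frac{1}{Den} \right)}} + \widetilde{persquGDP} + \widetilde{perGDPEng} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }}$$
(11)
$$\widetilde{TCE} = \widetilde{Pop} - \widetilde{Den} + \widetilde{persquGDP} + \widetilde{perGDPEng} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }}$$
(12)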

Based on Eq. (12), the following conclusions can be obtained:

\(\widetilde{Pop}\) reflects the direct impact of the urban population size on TCE: a larger urban population raises TCE. \(\widetilde{{\left( \frac{1}{Den} \right)}}\), which equals \(- \widetilde{Den}\), captures the negative impact of population density on TCE. \(\widetilde{persquGDP}\) reflects the effect of economic agglomeration on TCE; in general, economic agglomeration is expected to promote CO2 emissions (Antweiler et al., 2001). \(\widetilde{perGDPEng}\) captures the impact of energy intensity on TCE, which implies that improving energy efficiency helps reduce both \(E\) and TCE. The last term reflects the impact of the urban energy consumption structure: promoting clean energy is important for environmental protection and CO2 emission reduction.

In addition to the direct impact, the urban population size can affect CO2 emissions indirectly. To clarify this mechanism, we individually analyse the impact of the urban population size on the factors on the right side of Eq. (12).

  1. Urban population size and population density

    We know that

    $$Den = \frac{Pop}{{area}}$$
    (13)

    Now, we can take the logarithm of the above equation and obtain the following equation:

    $$\ln Den = \ln Pop - \ln area$$
    (14)

    Then, we take the derivatives of both sides of Eq. (14) with respect to \(Pop\) and obtain Eq. (15):

    $$\frac{1}{Den}\frac{\partial Den}{{\partial Pop}} = \frac{1}{Pop} - \frac{1}{area}\frac{\partial area}{{\partial Pop}}$$
    (15)

    We assume that \(\partial area/\partial Pop = 0\) because the urban area does not change with the population in the short term. Thus, we obtain the following inequality:

    $$\frac{\partial Den}{{\partial Pop}} = \frac{Den}{{Pop}} > 0$$
    (16)
  2. Urban population size and economic agglomeration

    We know that

    $$persquGDP = \frac{GDP}{{area}}$$
    (17)

    Now, we take the logarithm of Eq. (17) and obtain the following equation:

    $$\ln persquGDP = \ln GDP - \ln area$$
    (18)

    Then, we take the derivatives of both sides of the above equation with respect to \(Pop\) and obtain the following formula:

    $$\frac{1}{persquGDP}\frac{\partial persquGDP}{{\partial Pop}} = \frac{1}{GDP}\frac{\partial GDP}{{\partial Pop}} - \frac{1}{area}\frac{\partial area}{{\partial Pop}}$$
    (19)

    As assumed above, \(\partial area/\partial Pop = 0\). Furthermore, an increase in population promotes economic growth by providing labour and stimulating consumption. Thus, we can obtain the following result:

    $$\frac{\partial persquGDP}{{\partial Pop}} = \frac{persquGDP}{{GDP}}\frac{\partial GDP}{{\partial Pop}} > 0$$
    (20)
  3. Urban population size and energy intensity

    Based on the definition of energy intensity, we can obtain the following:

    $$perGDPEng = \frac{E}{GDP}$$
    (21)

    Then, we take the derivative with respect to \(Pop\) and obtain the following:

    $$\frac{\partial perGDPEng}{{\partial Pop}} = \frac{1}{GDP}\frac{\partial E}{{\partial Pop}} - \frac{E}{{GDP^{2} }}\frac{\partial GDP}{{\partial Pop}}$$
    (22)

    The impact of \(Pop\) on \(perGDPEng\) depends on the sign of \(\frac{1}{GDP}\frac{\partial E}{{\partial Pop}} - \frac{E}{{GDP^{2} }}\frac{\partial GDP}{{\partial Pop}}\). Existing studies generally hold that an increasing urban population size improves energy efficiency, mainly because of economies of scale and economic agglomeration. First, infrastructure, such as energy supply, sewage treatment and public transportation, is shared more efficiently. Second, the unit costs of production and consumption decline due to economies of scale. Finally, larger cities are more attractive to researchers and research institutions, which can improve energy efficiency through advanced technologies and the corresponding spillover effects. Therefore, a negative relationship between the urban population size and energy intensity is assumed:

    $$\partial perGDPEng/\partial Pop <0$$
    (23)

    To summarize, we have the following equations:

    $$Den = Den\left( {Pop} \right)$$
    (24)
    $$persquGDP = persquGDP\left( {Pop} \right)$$
    (25)
    $$perGDPEng = perGDPEng\left( {Pop} \right)$$
    (26)

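    Based on Eqs. (24)–(26), and writing \(\varepsilon_{X\_Pop}\) for the elasticity of a variable \(X\) with respect to \(Pop\), we can obtain the following results: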

    $$\widetilde{Den} = \varepsilon_{Den\_Pop} \widetilde{Pop}$$
    (27)
    $$\widetilde{persquGDP} = \varepsilon_{persquGDP\_Pop} \widetilde{Pop}$$
    (28)
    $$\widetilde{perGDPEng} = \varepsilon_{perGDPEng\_Pop} \widetilde{Pop}$$
    (29)

    Combining Eqs. (27)–(29) with Eq. (12), we obtain the following equation:

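    $$\widetilde{TCE} = \left( {1 - \varepsilon_{Den\_Pop} + \varepsilon_{persquGDP\_Pop} + \varepsilon_{perGDPEng\_Pop} } \right)\widetilde{Pop} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }}$$
    (30)

    Equation (30) shows the relationship between TCE and the urban population size.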

Based on the equations above, we know that \(\varepsilon_{Den\_Pop} > 0\), \(\varepsilon_{persquGDP\_Pop} > 0\) and \(\varepsilon_{perGDPEng\_Pop} < 0\). Since these elasticities affect TCE in opposite directions and their magnitudes cannot be determined theoretically, the net impact of the urban population size must be verified with actual data.
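
To illustrate why the net effect is an empirical question, the sketch below evaluates the coefficient on \(\widetilde{Pop}\) that results from substituting Eqs. (27)–(29) into Eq. (12), namely \(1 - \varepsilon_{Den\_Pop} + \varepsilon_{persquGDP\_Pop} + \varepsilon_{perGDPEng\_Pop}\). The density elasticity is computed numerically under the fixed-area assumption; the area and the other two elasticities are hypothetical values chosen only for illustration.

```python
import math

AREA = 1500.0  # hypothetical fixed urban area, so that d(area)/d(Pop) = 0


def density(pop):
    """Den = Pop / area, as in Eq. (13)."""
    return pop / AREA


def elasticity(func, x, step=1e-4):
    """Numerical elasticity d ln f / d ln x, using a central difference in logs."""
    up, down = x * (1.0 + step), x * (1.0 - step)
    return (math.log(func(up)) - math.log(func(down))) / (math.log(up) - math.log(down))


def net_population_elasticity(eps_den, eps_squgdp, eps_gdpeng):
    """Coefficient on the population growth rate: 1 - eps_Den + eps_persquGDP + eps_perGDPEng."""
    return 1.0 - eps_den + eps_squgdp + eps_gdpeng


eps_den = elasticity(density, 2.0e6)      # equals 1 up to rounding when the area is fixed
eps_squgdp, eps_gdpeng = 0.9, -0.2        # hypothetical elasticities, not estimates
net = net_population_elasticity(eps_den, eps_squgdp, eps_gdpeng)
print(eps_den, net)
```

With a different hypothetical pair such as `eps_squgdp = 0.5` and `eps_gdpeng = -0.8`, the same function returns a negative value, so the sign of the net effect depends entirely on the relative magnitudes of the channels.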

2.2 Econometric model of per capita CO2 emissions

In addition to total CO2 emissions, another issue of concern is the impact of the urban population size on per capita CO2 emissions, which is still controversial. For example, Fragkias et al. (2013) indicated that small cities are more efficient than large cities. However, Puga (2010) found that the emissions efficiency of large cities is higher than that of small cities. Examining this issue is important: if an urban population size that is optimal for total CO2 emissions cannot be determined, the second-best choice in theory is the urban population size that yields the highest emission efficiency.

We decompose per capita CO2 emissions as in Eq. (31):

$$PCE = \frac{TCE}{Pop}$$
(31)

where \(PCE\) denotes per capita CO2 emissions, so that \(\widetilde{PCE} = \widetilde{TCE} - \widetilde{Pop}\).

Based on Eqs. (12), (30) and (31), we obtain the following:

$$\widetilde{PCE} = - \widetilde{Den} + \widetilde{persquGDP} + \widetilde{perGDPEng} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }}$$
(32)
$$\widetilde{PCE} = \left( { - \varepsilon_{Den\_Pop} + \varepsilon_{persquGDP\_Pop} + \varepsilon_{perGDPEng\_Pop} } \right)\widetilde{Pop} + \mathop \sum \limits_{i = 1}^{N} g_{i} \widetilde{{e_{i} }}$$
(33)
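
Because per capita emissions are total emissions divided by population, the growth rate of PCE equals the growth rate of TCE minus the growth rate of Pop, which is why the leading 1 drops out of the bracket in Eq. (33). The sketch below checks this with hypothetical figures in which total emissions grow more slowly than the population.

```python
import math


def log_growth(before, after):
    """Log difference, the discrete analogue of a growth rate."""
    return math.log(after / before)


# Hypothetical city: population grows by 5%, total emissions by 3%.
pop0, pop1 = 2.0e6, 2.1e6          # persons
tce0, tce1 = 1.00e7, 1.03e7        # tonnes of CO2

pce0, pce1 = tce0 / pop0, tce1 / pop1   # per capita emissions in each year

growth_tce = log_growth(tce0, tce1)
growth_pop = log_growth(pop0, pop1)
growth_pce = log_growth(pce0, pce1)
print(growth_tce, growth_pop, growth_pce)
```

In this example total emissions rise while per capita emissions fall, the pattern that arises whenever the elasticity of TCE with respect to Pop lies between zero and one.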

3 Empirical model and estimation

3.1 Empirical model

Based on the theoretical models above, the empirical models in this paper are built as follows:

$$\ln TCE_{k} = \alpha \ln Pop_{k} + \beta \ln M_{k} + \gamma \ln X_{k} + \varepsilon_{k}$$
(34)
$${\text{ln}}PCE_{k} = \alpha \ln Pop_{k} + \beta \ln M_{k} + \gamma \ln X_{k} + \varepsilon_{k}$$
(35)

where \({\text{ln}}TCE_{k}\) is the logarithm of \(TCE_{k}\) and \({\text{ln}}PCE_{k}\) is the logarithm of \(PCE_{k} + 1\); \(Pop_{k}\) is the population of city \(k\); \(M_{k}\) is the set of mechanism variables, including den \((Den)\), squgdp \((persquGDP)\) and gdpeng \((perGDPEng)\); \(X_{k}\) is the set of control variables for city \(k\); and \(\varepsilon_{k}\) is the random disturbance term.

\(X_{k}\) includes the following variables:

  1. FA (fixed assets): FA refers to investment in fixed assets and serves as a proxy for urban infrastructure construction. As an important driving force of the Chinese economic miracle, infrastructure construction contributes not only to economic growth but also to energy consumption and CO2 emissions.

  2. Green (green area): Green denotes the per capita public green area, reflecting urban environmental awareness. As the urbanization rate increases, urban environmental awareness is growing remarkably and is expected to curb CO2 emissions.

  3. CS (energy consumption structure): CS indicates the proportion of coal consumption in total energy consumption. China is rich in coal and poor in oil and natural gas. However, compared with oil and natural gas, coal is a low-combustion-efficiency fuel (Guo et al., 2011; Liu et al., 2015), so a higher coal share is expected to increase CO2 emissions.

  4. IS (industrial structure): IS denotes the proportion of the output value of the secondary industry in GDP. The secondary industry, which includes manufacturing and construction, is widely regarded as having a high energy intensity and substantial waste gas emissions, including CO2 emissions (Dong et al., 2014; Xia et al., 2011).

  5. FDI (foreign direct investment): FDI refers to the ratio of FDI to GDP. There are two opposing views on the impact of FDI on pollution. On the one hand, the pollution haven hypothesis states that FDI leads to more pollution in host countries: the environmental regulations of developing countries are generally weaker than those of developed countries, so foreign pollution-intensive firms prefer to locate there, increasing the pollution burden of these countries. On the other hand, the pollution halo hypothesis holds that FDI helps reduce pollution in host countries. In summary, the effect of FDI is ambiguous.

  6. TI (technology innovation): TI represents the ratio of government expenditure on science and technology to GDP and serves as a proxy for the technology improvement needed to curb pollution. Technology innovation is vital for improving productivity but can also lead to greater emissions (Acemoglu, 2002; Shao et al., 2016). Therefore, the effect of TI is also ambiguous.
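
To make the log-linear specification in Eq. (34) concrete, the following sketch estimates a reduced version by ordinary least squares on synthetic data. It is only an illustration under stated assumptions: the data are simulated, the coefficient values in `TRUE_COEFS` are hypothetical, an intercept is added, and only ln Pop and ln Den are included as regressors rather than the full sets \(M_{k}\) and \(X_{k}\).

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


random.seed(0)
TRUE_COEFS = [0.5, 1.0, -0.4]   # hypothetical: intercept, ln Pop, ln Den
rows, ln_tce = [], []
for _ in range(175):            # same number of cities as the sample, simulated here
    ln_pop = random.uniform(3.0, 7.0)
    ln_den = random.uniform(4.0, 8.0)
    x = [1.0, ln_pop, ln_den]
    noise = random.gauss(0.0, 0.1)
    rows.append(x)
    ln_tce.append(sum(c * v for c, v in zip(TRUE_COEFS, x)) + noise)

estimates = ols(rows, ln_tce)
print(estimates)                # close to TRUE_COEFS, up to sampling noise
```

In applied work a statistics package would normally be used so that standard errors and significance tests are available; the hand-written solver here only keeps the example free of external dependencies.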

3.2 Data

Owing to data availability, we use cross-sectional data for 175 Chinese cities at the prefecture level and above in 2010. The CO2 emissions data are from the China Emission Accounts and Datasets (CEADS), to which we are one of the main contributors. This database has been employed in many studies (Shan et al., 2018a, b, 2019). Specifically, energy consumption is calculated based on data on industrial enterprises above a designated size. The data on the other variables are from the China City Statistical Yearbook. Summary statistics are shown in Table 1.

Table 1 Summary statistics

3.3 Results and discussion

3.3.1 Baseline estimation results and discussion

3.3.1.1 Total CO2 emissions baseline estimation results and discussion

The ordinary least squares (OLS) regression results of total CO2 emissions are shown in Table 2.

Table 2 Regression results of Eq. (34)

To examine the robustness of the core explanatory variable, we add variables one at a time, as in existing studies (He et al., 2018; James & Aadland, 2011). As shown in Table 2, the coefficients of lnpop in all models are positive and significant, indicating that urban population expansion increases total CO2 emissions. Additionally, the coefficients of the mechanism variables and control variables are essentially consistent across models, implying that the estimation results of Eq. (34) are robust. The coefficient of lnpop indicates that a 1% increase in the urban population size will lead to a nearly 1% increase in total CO2 emissions. Next, we focus on the results of model (10), which contains all of the variables.

Regarding the mechanism variables, the coefficient of den is significantly negative and consistent with the results of Eq. (30), implying that a higher population density will contribute to carbon reduction. The coefficients of squgdp and gdpeng are both significantly positive and meet our expectations, indicating that increases in economic agglomeration and energy intensity promote CO2 emissions.

Moreover, regarding the control variables, the coefficient of FA is positive and significant, indicating that infrastructure construction greatly contributes to CO2 emissions. Meanwhile, the coefficient of Green is significantly negative. An increase in the public green area per capita indicates an improvement in urban environmental awareness and is conducive to reducing carbon emissions. Additionally, the coefficients of CS and IS are both positive. As we analysed above, China is highly dependent on the consumption of coal and the secondary industry. Due to the high emission characteristics of coal and the secondary industry, the massive consumption of coal and the dependence on the secondary industry promote total CO2 emissions. These results confirm our expectations. The coefficients of FDI and TI are both significantly negative. On the one hand, FDI has a pollution halo effect on total CO2 emissions, which means that FDI is conducive to reducing carbon emissions. On the other hand, technology innovation is a key factor in low carbon development and carbon mitigation.

3.3.1.2 Per capita CO2 emissions baseline estimation results and discussion

Table 3 shows the OLS regression results of per capita CO2 emissions.

Table 3 Regression results of Eq. (35)

As shown in Table 3, the coefficients of lnpop in all models are significantly negative, indicating that a larger urban population size reduces per capita CO2 emissions. The coefficient of lnpop indicates that a 1% increase in the urban population size will lead to a nearly 0.3% decrease in per capita CO2 emissions. Additionally, the coefficients of the mechanism variables and control variables are essentially consistent across models, implying that the estimation results of Eq. (35) are robust.
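
The percentage interpretation of the lnpop coefficients follows from the log-log form of the model. As a minimal sketch, the function below converts an elasticity into the implied percentage change in the outcome; the values 1.0 and -0.3 are the rounded magnitudes reported in the text, not the exact table entries, and the per capita case is only approximate because the dependent variable in Eq. (35) is the logarithm of \(PCE_{k} + 1\).

```python
def pct_change_in_outcome(elasticity, pct_change_in_x):
    """Percentage change in y implied by ln y = elasticity * ln x + const."""
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0


# Rounded elasticities taken from the discussion of Tables 2 and 3.
total_effect = pct_change_in_outcome(1.0, 1.0)        # total CO2 emissions
per_capita_effect = pct_change_in_outcome(-0.3, 1.0)  # per capita CO2 emissions

# For large changes the exact effect differs from "elasticity times percentage".
large_change_effect = pct_change_in_outcome(-0.3, 50.0)
print(total_effect, per_capita_effect, large_change_effect)
```

For a 1% change the linear reading is nearly exact, while for a 50% population increase the implied per capita reduction is smaller than the 15% that a naive multiplication would suggest.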

Furthermore, regarding the control variables of model (10), although most of the coefficients in Table 3 are consistent with those in Table 2, some of them behave quite differently. In particular, the coefficients of Green and FDI are significantly negative in Table 2 but not significant in Table 3, indicating that Green and FDI play important roles in curbing total CO2 emissions but not in curbing per capita CO2 emissions.

In the next section, we conduct additional mechanism analysis to assess the indirect impacts of the urban population size on CO2 emissions through den, squgdp and gdpeng.

3.3.2 Mechanism estimation results and discussion

As shown in Table 4, model (1) and model (2) are the mechanism estimations of the effect of the urban population size on population density, model (3) and model (4) are the mechanism estimations of the effect of the urban population size on economic agglomeration, and model (5) and model (6) are the mechanism estimations of the effect of the urban population size on energy intensity.

Table 4 Mechanism estimation results

First, the coefficients of lnpop in models (1) and (2) are both significantly positive and meet our expectations, indicating that an increase in the urban population size will result in a higher population density. Moreover, the coefficients of lnpop in models (3) and (4) are both significant and positive, which means that increasing the population can lead to higher economic agglomeration. Finally, the coefficients of lnpop in models (5) and (6) are negative and significant, implying that an increase in population can lead to a lower energy intensity.

Overall, the coefficients of lnpop in the models correspond with the results of Sect. 2.

3.3.3 Robustness and heterogeneity

3.3.3.1 Robustness

Based on the available data, we substitute two variables to test the robustness of the models. The results are summarized in Table 5.

Table 5 Robustness estimation results

First, we substitute CS-coal with CS-oil, which is the proportion of oil consumption in total energy consumption. The results of model (1) and model (2) show that the sign and significance of the variable of interest and the mechanism variables remain consistent. Second, we substitute IS-second with IS-third, which denotes the proportion of the output value of the tertiary industry in GDP. The results of model (3) and model (4) also show that the sign and significance of the variable of interest and the mechanism variables remain consistent.

Additionally, we address potential endogeneity by applying two-stage least squares (2SLS) regression, using the urban population in 2000 as the instrumental variable. The results of model (5) and model (6) show that the coefficients of lnpop remain consistent with those in models (1)–(4) after accounting for endogeneity.
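
The logic of the 2SLS step can be sketched with a single endogenous regressor and a single instrument. The example below is a simplified illustration on simulated data, not the specification estimated in Table 5: it omits the mechanism and control variables, and all parameter values, including the assumed true elasticity of 1.0, are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Intercept and slope from regressing y on a single regressor x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    a1, b1 = simple_ols(z, x)                 # stage 1: project x on the instrument
    x_hat = [a1 + b1 * zi for zi in z]
    return simple_ols(x_hat, y)               # stage 2: regress y on fitted x


random.seed(1)
n = 175
ln_pop_2000 = [random.uniform(3.0, 7.0) for _ in range(n)]   # instrument
shock = [random.gauss(0.0, 1.0) for _ in range(n)]           # unobserved confounder
ln_pop_2010 = [0.2 + 0.95 * z + u + random.gauss(0.0, 0.1)
               for z, u in zip(ln_pop_2000, shock)]
ln_tce = [1.0 + 1.0 * x + 0.8 * u + random.gauss(0.0, 0.1)
          for x, u in zip(ln_pop_2010, shock)]

_, beta_ols = simple_ols(ln_pop_2010, ln_tce)                # biased by the confounder
_, beta_iv = two_stage_least_squares(ln_tce, ln_pop_2010, ln_pop_2000)
print(beta_ols, beta_iv)
```

Because the simulated confounder raises both population and emissions, the OLS slope is biased upwards, whereas the 2SLS slope stays close to the assumed true value. In this just-identified case the 2SLS slope equals the ratio of the covariance of the instrument with the outcome to its covariance with the regressor.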

3.3.3.2 Heterogeneity

Cities differ greatly in population and other aspects. To propose more specific suggestions, we classify cities by population size into three groups. Table 6 shows the heterogeneity estimation results.

Table 6 Heterogeneity estimation results

The results of models (1)–(3) indicate that the urban population size significantly promotes total CO2 emissions in all three city groups. The coefficients of lnpop in large, medium and small cities are 1.3896, 0.8768 and 0.7706, respectively, indicating that an increase in the urban population size drives a larger increase in CO2 emissions in large cities than in the other two groups. The results of models (4)–(6) indicate that the urban population size has a negative impact on per capita CO2 emissions only in small cities.

Additionally, we explore the impact of other variables on CO2 emissions and attempt to propose targeted suggestions for the three city groups. First, gdpeng, FA and TI are all significant and consistent in the three groups, indicating that energy intensity and infrastructure construction contribute to the growth in CO2 emissions, while technology innovation depresses carbon emissions for all three groups. Second, the coefficient of FDI is significantly negative only in model (1), which means that FDI has a pollution halo effect on total CO2 emissions in large cities. Considering that the large city group accounts for approximately half of the total CO2 emissions, FDI in this group will greatly contribute to reducing carbon emissions. Finally, Green has a significantly negative impact on the total CO2 emissions of small cities, as shown in model (3). Cities with a smaller population are more capable of reducing their CO2 emissions by increasing their public green areas.

4 Conclusions and policy implications

4.1 Conclusions

Based on the results of the theoretical and empirical models, we draw the following conclusions:

First, in terms of the direct impact, expansion of the urban population size stimulates total CO2 emissions but curbs per capita CO2 emissions. A 1% increase in the urban population size will lead to a nearly 1% increase in total CO2 emissions and a nearly 0.3% decrease in per capita CO2 emissions. Moreover, the urban population size indirectly affects CO2 emissions through population density, economic agglomeration and energy intensity.

Second, in addition to the urban population and mechanism variables, there are other vital factors that influence CO2 emissions. Rapidly increasing infrastructure construction and the dependence on the consumption of coal and the secondary industry drive the growth in CO2 emissions. Fortunately, the expansion of public green areas, FDI and technology innovation are conducive to carbon mitigation.

Finally, there is significant heterogeneity between the three city groups. Specifically, the coefficients of lnpop for large, medium and small cities are 1.3896, 0.8768 and 0.7706, respectively, indicating that compared to other cities, an increase in the urban population size in large cities drives a larger increase in CO2 emissions. Regarding the control variables, an increase in FDI and in public green areas is more efficient in curbing CO2 emissions for the large city group and small city group, respectively. Moreover, technology innovation can suppress CO2 emissions in all three city groups.

4.2 Policy implications

Based on the conclusions above, we propose the following suggestions.

  1. At present, China is undergoing rapid urbanization in line with the “Two Centennial Goals”. As urbanization proceeds, CO2 emissions in China will continue to increase because the expansion of the urban population size leads to an increase in CO2 emissions. This fact should be taken as a basic premise of China’s environmental and energy policies.

  2. China is the largest emitter of CO2 worldwide and faces great pressure in international CO2 emission negotiations. China has proposed reaching peak carbon emissions by 2030 and carbon neutrality by 2060. Our results point to several measures for CO2 emissions mitigation in China. In addition to increasing population density and improving energy efficiency, public green areas, FDI and technology innovation greatly contribute to carbon reduction. Notably, the government should pay more attention to large cities, which contribute more CO2 emissions.

  3. It is inappropriate to prevent an increase in overall CO2 emissions by restricting the process of urbanization and the expansion of the urban population size. On the one hand, the urban population size is still growing with urbanization, and urbanization is an important way to narrow the urban–rural development gap and to correct the imbalance in regional development in China. On the other hand, the urban population size affects CO2 emissions in multiple ways. Therefore, rather than directly controlling the urban population size, acting on the indirect channels may be more effective.