1 Introduction

The ecological footprint (abbreviated as EF), proposed by William Rees in the early 1990s and further developed by Wackernagel in 1996, is one of the most commonly used quantitative indicators for evaluating the sustainable development status of a region (Rees and Wackernagel 1996). The ecological footprint model tracks the natural resources consumed, and the EF values quantitatively reflect the influence of human activities on local natural resources, converted into a common bio-productive area unit named “global hectares.” Thus, a region’s sustainability status can be judged by comparing its EF values with the local ecological capacity.

The EF model has been applied at all scales, including the globe (Rice 2007), nations (Viglizzo et al. 2011; Tian et al. 2019), provinces (Bao et al. 2011; Jia et al. 2010), cities (Li et al. 2010; Lu and Chen 2017), communities (Li et al. 2008) and individuals (Yang et al. 2018; Honti et al. 2019), and has been used in climate change research (Klein-Banai and Theis 2011; Beaussier et al. 2019), ecological risk assessment (Herva et al. 2011) and regional policy studies (Jin et al. 2009). In addition, the footprint model has been extended into the energy footprint, carbon footprint, and water footprint (Carballo Penela and Sebastian Villasante 2008; Chen et al. 2007; Chen and Lin 2008).

How to predict the development of EF values precisely has been a hot topic, and various methodologies have been applied to simulate future EF values (Hanley et al. 1999; van Vuuren and Smeets 2000; Erb 2004; Medved 2006). Polynomial regression analysis was introduced to describe the development trends of the EF values (Yue et al. 2006). The radial basis function neural network (abbreviated as RBFNN) was selected to forecast the total EF values of Wuhan, China (Li et al. 2010). The autoregressive integrated moving average (abbreviated as ARIMA) model, the most commonly used model for predicting footprint values, was applied to predict the future EF values of Slovenia (Medved 2006) and to simulate the aquatic footprints of Guangzhou and Henan (Jia et al. 2010; Yao 2012). However, no single definitive and effective methodology or procedure for forecasting EF values has yet been established, and further work on simulation and prediction is needed for any given region.

Suzhou is one of the developed cities in the Yangtze Delta, China (Yao 2012). In 2018, the total population of Suzhou was 10.74 million, and the gross domestic product (abbreviated as GDP) was ¥1,859,747 million. The rapid growth of the population and economy in recent years might exert a strongly negative influence on the natural resources of the city, and it is necessary to evaluate the influence of this socioeconomic growth on the local natural resources (Yao et al. 2019).

Forecasting the EF is therefore important for revealing the future level of regional sustainable development if the economy continues to develop at high speed. The question is which model is appropriate for simulating the total EF trend of Suzhou, China. Thus, the aims of the study were: (1) to calculate the ecological footprint values of Suzhou, China, from 1999 to 2018, a long time series; (2) to simulate the EF values of the city with the ARIMA model and the GM(1,1) model (a model not found in previous studies on EF simulation); (3) to analyze the simulation performance of the two models; and (4) to predict the EF values of the city, analyze the development of the EF and provide reasonable and practical policy suggestions for decision makers on regional sustainability.

2 Materials and methods

2.1 The ecological footprint model

The ecological footprint is defined as the area of biologically productive land consumed by the activities of the residents of a region. Its calculation consists of two parts: the footprint of biological resource consumption and the footprint of energy consumption.

The calculation follows the two equations below (Wackernagel and Rees 1996). The definitions and units of all variables in the ecological footprint calculation are listed in Table 1. All the information used in the calculation relates to the resources consumed by regional residents and those consumed in industrial production. The energy consumption data and population data came from the local statistical yearbooks (Suzhou 2000–2019). The residents’ resource consumption was obtained from a survey of nearly 300 households carried out by the Suzhou statistical bureau. The coefficients in the EF model were taken from the FAO Yearbook published by the Food and Agriculture Organization of the United Nations (Nations 2000–2019):

$$\begin{aligned} {\text{EF}} & = \sum\limits_{j = 1}^{6} {\left( {f_{j} \sum\limits_{i = 1}^{26} {\left( {\frac{{{\text{UR}}_{i} }}{{{\text{UP}} \cdot {\text{GP}}_{ij} }} + \frac{{{\text{RR}}_{i} }}{{{\text{RP}} \cdot {\text{GP}}_{ij} }}} \right)} } \right)} \\ {\text{Total}}\;{\text{EF}} & = {\text{EF}} \times \left( {{\text{UP}} + {\text{RP}}} \right) \\ \end{aligned}$$
Table 1 The summary of the variables in the ecological footprint calculation
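The two equations above can be illustrated with a short Python sketch. Because Table 1 is not reproduced here, the meanings attached to the symbols are assumptions: UR and RR are read as the urban and rural consumption of item i, UP and RP as the urban and rural populations, GP as the global average productivity of item i on its land type j, and f as the equivalence factor of land type j. The function names and the toy numbers are hypothetical and are not data from the study.

```python
def per_capita_ef(f, items, UP, RP):
    """Per capita EF as in the first equation above.

    f     : list of equivalence factors f_j, one per land type (assumed meaning)
    items : iterable of (j, UR_i, RR_i, GP_ij) tuples, where j is the index of
            the land type supplying item i (assumption: one land type per item)
    UP, RP: urban and rural population (assumed meaning)
    """
    area_by_land = [0.0] * len(f)
    for j, ur, rr, gp in items:
        # Urban term plus rural term for item i, accumulated on land type j.
        area_by_land[j] += ur / (UP * gp) + rr / (RP * gp)
    # Weight each land type by its factor f_j and sum over land types.
    return sum(fj * area for fj, area in zip(f, area_by_land))


def total_ef(ef, UP, RP):
    """Total EF = per capita EF times total population (second equation)."""
    return ef * (UP + RP)


# Toy input: two land types, three consumption items (illustrative only).
f = [2.0, 1.0]
items = [(0, 100.0, 50.0, 10.0), (0, 200.0, 50.0, 20.0), (1, 300.0, 50.0, 5.0)]
ef = per_capita_ef(f, items, UP=10.0, RP=5.0)
print(ef, total_ef(ef, UP=10.0, RP=5.0))
```

Grouping the items by land type before applying the factors mirrors the order of the two sums in the equation: the inner sum runs over items and the outer sum over land types.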

2.2 The ARIMA model

The autoregressive integrated moving average (abbreviated as ARIMA) model, developed by Box and Jenkins in the 1970s (Dent 1977), is one of the most frequently used models for simulating and predicting time series data. The ARIMA model originated from the autoregressive (abbreviated as AR) model, the moving average (abbreviated as MA) model and their combination, the ARMA model, which were introduced in 1926, 1937, and 1938, respectively (Ediger et al. 2006). Compared with the earlier AR, MA, and ARMA models, the ARIMA model is more flexible in application.

The ARIMA model is particularly useful when little knowledge is available or when there is no satisfactory explanatory model that directly relates the predicted variable to other explanatory variables. The basic idea of the ARIMA model is that the value of the predicted variable is a linear combination of its past values and past errors, so that future values of the time series can be simulated and predicted from past and present values alone, without considering any other factors.

The basic form of the model is described by the equation:

$$\begin{aligned} X_{t} & = C + a_{1} X_{t - 1} + a_{2} X_{t - 2} + \cdots + a_{p} X_{t - p} \\ & \quad + \varepsilon_{t} - b_{1} \varepsilon_{t - 1} - b_{2} \varepsilon_{t - 2} - \cdots - b_{q} \varepsilon_{t - q} \\ \end{aligned}$$

where \(X_{t}\) is the actual value of the variable and \(\varepsilon_{t}\) is the corresponding random error at time t, \(a_{i}\) and \(b_{j}\) are the model coefficients, and p and q denote the autoregressive order and the moving average order, respectively.
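As a minimal illustration of this equation, the sketch below computes a one-step-ahead value from past observations and past errors, taking the expected value of the current error as zero. The function name and the numbers are hypothetical; estimating the coefficients themselves is a separate task that is not shown here.

```python
def arma_one_step(x_past, eps_past, c, a, b):
    """One-step-ahead value of X_t from the equation above.

    x_past   : past values, with x_past[-1] = X_{t-1}
    eps_past : past errors, with eps_past[-1] = eps_{t-1}
    c        : constant C
    a, b     : coefficient lists [a_1..a_p] and [b_1..b_q]
    The current error eps_t is replaced by its expected value, zero.
    """
    ar_part = sum(a[i] * x_past[-1 - i] for i in range(len(a)))
    ma_part = sum(b[j] * eps_past[-1 - j] for j in range(len(b)))
    return c + ar_part - ma_part


# Illustrative numbers only: p = 2, q = 1.
print(arma_one_step([1.0, 2.0], [0.5], c=1.0, a=[0.5, 0.2], b=[0.4]))
```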

The model interprets the trend of the variable from its past values. It presupposes that the simulated data are stationary, that is, that the statistical characteristics of the series, such as the mean and the autocorrelation structure, are constant over time. If the series is not stationary, it can be transformed into a stationary one by taking logarithms or differences.

2.3 The GM(1,1) model

Grey systems theory was first proposed by Professor Deng in 1982 (Deng 1982). It focuses mainly on problems in which part of the information is unknown. With the rapid development of science and technology, more and more problems show the uncertainty characteristic of grey systems. A group of grey models has attracted the attention of researchers and has been applied in a variety of fields such as natural science, social science and engineering. The GM(1,1) model is one of the most important of these grey models. It is an approximate model and is effective if the data series follows an approximately exponential trend. Applying the GM(1,1) model to EF simulation and prediction included the following steps:

  1. Step 1: Definition of the GM(1,1) model

    In the GM(1,1) model, the raw data series is described as the sequence \(X^{(0)} = (x^{(0)} (1),x^{(0)} (2), \ldots ,x^{(0)} (n))\), where \(X^{(0)}\) is a nonnegative sequence of raw data. The first-order accumulated generating operation on \(X^{(0)}\) gives \(X^{(1)} = (x^{(1)} (1),x^{(1)} (2), \ldots ,x^{(1)} (n))\), where \(x^{(1)} (k) = \sum\nolimits_{i = 1}^{k} {x^{(0)} (i)}\), k = 1,2,…,n.

    From the sequence \(X^{(1)}\), the sequence of mean values of consecutive neighbors is derived: \(Z^{(1)} = (z^{(1)} (2),z^{(1)} (3), \ldots ,z^{(1)} (n))\), where \(z^{(1)} (k) = \frac{1}{2}(x^{(1)} (k) + x^{(1)} (k - 1))\), k = 2,3,…,n.

    Then, the grey differential equation of the GM(1,1) model and its whitened (continuous) form are given by the following two equations, respectively:

    $$\begin{aligned} & x^{(0)} (k) + az^{(1)} (k) = b \\ & \frac{{{\text{d}}x^{(1)} }}{{{\text{d}}t}} + ax^{(1)} = b \\ \end{aligned}$$

    where \(a\) and \(b\) are the parameters of the model.

  2. Step 2: Calculation of the parameters

    According to the least squares method, the parameter estimates of the GM(1,1) model are calculated by the equation:

    $$\hat{a} = [a,b]^{\text{T}} = (B^{\text{T}} B)^{ - 1} B^{\text{T}} Y$$

    where \(B = \left[ {\begin{array}{*{20}c} { - z^{(1)} (2)} & 1 \\ { - z^{(1)} (3)} & 1 \\ \ldots & \ldots \\ { - z^{(1)} (n)} & 1 \\ \end{array} } \right]\), \(Y = \left[ {\begin{array}{*{20}c} {x^{(0)} (2)} \\ {x^{(0)} (3)} \\ \ldots \\ {x^{(0)} (n)} \\ \end{array} } \right]\).

  3. Step 3: Prediction of the first-order accumulative generation series

    The time response function of the whitened equation is:

    $$x^{(1)} (t) = (x^{(1)} (1) - b /a)\exp ( - a(t - 1)) + b /a$$

    Since \(x^{(1)} (1) = x^{(0)} (1)\), the time response equation of the GM(1,1) model is:

    $$\hat{x}^{(1)} (k + 1) = (x^{(0)} (1) - b /a)\exp ( - ak) + b /a,\quad k = 1,2, \ldots ,n$$

    Thus, the first-order accumulative generation series can be predicted from these two equations.

  4. Step 4: The restored values of simulation

    The restored values of raw data were calculated by the equation:

    $$\hat{x}^{(0)} (k + 1) = \hat{x}^{(1)} (k + 1) - \hat{x}^{(1)} (k) = (1 - e^{a} )(x^{(0)} (1) - b /a)\exp ( - ak),\quad k = 1,2, \ldots ,n$$

  5. Step 5: Model precision test

    To check whether the model is appropriate for the application, the precision of its fit can be tested in three ways: the relative error test, the correlation test and the posterior variance test.

    The relative error (\(\varepsilon\)) is calculated by the equation:

    $$\varepsilon (k) = \frac{ep(k)}{{x^{(0)} (k)}},\quad {\text{where}}\quad ep(k) = x^{(0)} (k) - \hat{x}^{(0)} (k)$$

The correlation test is based on the correlation coefficient, which is calculated by the equation:

$$r = \frac{1}{n}\sum\limits_{k = 1}^{n} {\frac{{\hbox{min} ep + 0.5\,\hbox{max} ep}}{{\left| {ep(k)} \right| + 0.5\,\hbox{max} ep}}} ,$$

where \(\hbox{min} ep = \hbox{min} (\left| {ep(k)} \right|)\), k = 1,2,…,n and \(\hbox{max} ep = \hbox{max} (\left| {ep(k)} \right|)\), k = 1,2,…,n.

The posterior variance test uses the variance ratio C and the small error probability P, which are calculated, respectively, by

$$C = \frac{{S_{2} }}{{S_{1} }}\quad {\text{and}}\quad P = p\left( {\left| {ep(k) - \sum\limits_{i = 1}^{n} {\left| {ep(i)} \right| /n} } \right| < 0.675\,S_{1} } \right)$$

where \(S_{1} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {x^{(0)} (i) - \sum\nolimits_{j = 1}^{n} {x^{(0)} (j)} /n} \right)^{2} } }}{n}}\) and \(S_{2} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {\left| {ep(i)} \right| - \sum\nolimits_{j = 1}^{n} {\left| {ep(j)} \right| /n} } \right)^{2} } }}{n}}\).

A good fit of the GM(1,1) model requires that the correlation coefficient (r) be greater than 0.6 and that the relative error (\(\varepsilon\)) and the variance ratio (C) be small (Dent 1977). In the application, the simulation performance was assessed comprehensively: the quality of the fit was characterized by the small error probability P and the variance ratio C and was graded by the criteria listed in Table 2.
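Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in the study: the function names `gm11_fit` and `gm11_predict` are hypothetical, and the least squares estimate is written out through the 2×2 normal equations, which is algebraically the same as the matrix form of Step 2.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a raw series x0 (Steps 1-2)."""
    n = len(x0)
    # Step 1: first-order accumulated series x1 and neighbour means z1(k), k = 2..n.
    x1, running = [], 0.0
    for value in x0:
        running += value
        x1.append(running)
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Step 2: least squares for x0(k) = -a * z1(k) + b (2x2 normal equations).
    m = n - 1
    sz, sy = sum(z1), sum(y)
    szz = sum(z * z for z in z1)
    szy = sum(z * v for z, v in zip(z1, y))
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b


def gm11_predict(x0, a, b, steps=0):
    """Restored values for k = 1..n plus `steps` forecasts (Steps 3-4)."""
    n = len(x0)
    c = x0[0] - b / a
    # Step 3: time response of the accumulated series, x1_hat(k+1), k = 0, 1, ...
    x1_hat = [c * math.exp(-a * k) + b / a for k in range(n + steps)]
    # Step 4: restore by first differences; the first value equals x0(1).
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n + steps)]


# Illustrative series only (not data from the study).
series = [2.0, 6.0, 18.0, 54.0]
a, b = gm11_fit(series)
print(a, b)
print(gm11_predict(series, a, b, steps=2))
```

For this illustrative series the estimates are a = −1 and b = 1. The restored values do not reproduce the series exactly, because the whitened equation is only a continuous approximation of the discrete grey differential equation.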

Table 2 Simulation performance judgment of the GM(1,1) model
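The three tests of Step 5 can be sketched as follows. The function name `gm11_precision` and the sample numbers are hypothetical, the formulas follow the equations of Step 5 as printed, and the guard for a perfect fit (all residuals zero) is an added assumption that the text does not discuss.

```python
import math


def gm11_precision(x0, x0_hat):
    """Return (relative errors, r, C, P) for raw values x0 and fitted values x0_hat."""
    n = len(x0)
    ep = [x - xh for x, xh in zip(x0, x0_hat)]       # residuals ep(k)
    rel = [e / x for e, x in zip(ep, x0)]            # relative errors
    abs_ep = [abs(e) for e in ep]
    lo, hi = min(abs_ep), max(abs_ep)
    # Correlation coefficient r; a perfect fit is treated as r = 1 (assumption).
    if hi == 0.0:
        r = 1.0
    else:
        r = sum((lo + 0.5 * hi) / (e + 0.5 * hi) for e in abs_ep) / n
    # Posterior variance test: S1 from the raw data, S2 from the absolute residuals.
    mean_x = sum(x0) / n
    mean_abs = sum(abs_ep) / n
    s1 = math.sqrt(sum((x - mean_x) ** 2 for x in x0) / n)
    s2 = math.sqrt(sum((e - mean_abs) ** 2 for e in abs_ep) / n)
    c_ratio = s2 / s1
    # Small error probability: share of points inside the 0.675 * S1 band.
    p = sum(1 for e in ep if abs(e - mean_abs) < 0.675 * s1) / n
    return rel, r, c_ratio, p


# Illustrative numbers only (not data from the study).
rel, r, c_ratio, p = gm11_precision([2.0, 4.0, 6.0, 8.0], [2.0, 3.0, 7.0, 8.0])
print(rel, r, c_ratio, p)
```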

3 Results and discussion

3.1 EF values from 1999 to 2018

The EF values from 1999 to 2018 were calculated and are shown in Fig. 1. The per capita EF in Suzhou (left ordinate) increased rapidly from 2.2 gha in 1999 to 7.9 gha in 2018. The total ecological footprint (right ordinate) increased continuously over these 20 years. According to the consumption category, EF values were divided into six types of land area: arable land, pasture, forest, water, fossil energy land and built-up land. In the figure, EFa per cap denotes the per capita footprint of the arable area, EFw that of the water area, EFb that of the built-up land area, EFp that of the pasture area consumed, EFfor that of the forest area, and EFfos that of energy consumption. The trends of the EF values of the six types of bio-productive area, and their contributions to the EF values in Suzhou from 1999 to 2018, are also plotted against the left ordinate of the figure.

Fig. 1 The development trend of the EF values in Suzhou from 1999 to 2018

Among the six types of bio-productive land, the footprints of the arable area, the water area, the pasture area and the forest area represented people’s living consumption, and the time series of these four land areas changed little over the 20 years (Fig. 1). The footprints of the energy land area and the built-up land area represented the resources consumed in the economic development of the region. Both indicators grew strongly from 1999 to 2018, especially the fossil fuel land area.

For the six land categories, neither the trends of the per capita EF values nor their contributions to the total EF values were balanced from 1999 to 2018. The per capita EF values of the arable land area and the pasture area remained approximately constant over the 20 years. The residents’ demands for forest area and water area increased slightly over the whole period. The consumption of fossil energy land and built-up area increased markedly, especially in the 2010s, which indicated the growth of industrial production and of the urban area. The trends in the six land areas showed that the resources consumed by the residents remained relatively stable over the past 20 years and that the increase in EF values was driven mainly by energy consumption and the expansion of the built-up area.

3.2 Trial of the ARIMA model

ARIMA is a popular model for simulating time series data, and its fitting performance is constrained by its linearity assumption. If a time series is generated by an ARIMA process, it should have two properties: it is stationary, and it is not white noise. Therefore, stationarity and white noise were tested first.

A stationary time series has the property that its statistical characteristics, such as the mean and the autocorrelation structure, are constant over time. The EF data series of Suzhou (abbreviated as {EF}) increased over time and was not stationary (Fig. 2, right ordinate). Logarithms and differences of the data series were therefore calculated in search of a derived stationary series. The logarithmic series of {EF} was denoted as {LEF}, and {LEF1} and {LEF2} were the first-order and second-order difference series of {LEF}, respectively. As Fig. 2 shows, the series {LEF} and {LEF1} still had a slightly increasing trend, whereas {LEF2} had almost no increasing or decreasing trend (left ordinate in Fig. 2).
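The construction of the logarithmic and difference series can be sketched as below. The function names are hypothetical and the input is an illustrative exponential series rather than the Suzhou data; for such a series the first-order log difference is constant and the second-order log difference is zero.

```python
import math


def difference(series, order=1):
    """Apply first differences `order` times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(order):
        out = [out[k] - out[k - 1] for k in range(1, len(out))]
    return out


def log_difference(series, order):
    """Difference of the natural logarithm of a positive series."""
    return difference([math.log(v) for v in series], order)


# Illustrative exponential series (not data from the study).
ef = [2.0, 6.0, 18.0, 54.0]
lef = log_difference(ef, 0)    # plays the role of {LEF}
lef1 = log_difference(ef, 1)   # plays the role of {LEF1}
lef2 = log_difference(ef, 2)   # plays the role of {LEF2}
print(lef1, lef2)
```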

Fig. 2 Trend plots of the data series {TEF}, {LTEF}, {LTEF1} and {LTEF2}

The non-stationarity of a data series may originate from several factors, the most important of which may be the existence of unit roots. The augmented Dickey–Fuller (abbreviated as ADF) test and the Phillips–Perron (abbreviated as PP) test are the most common unit root tests (Erdogdu 2007), and both were applied in the case study. The test results are listed in Table 3. The ADF and PP values of the series {LEF2} were less than the critical values at the 1%, 5% and 10% levels. Thus, the time series {LEF2} could be considered stationary, whereas the other three series were not. The test result indicated that the ARIMA model was not suitable for simulating the data series {EF}, {LEF} and {LEF1}.

Table 3 ADF and PP test of {EF}, {LEF}, {LEF1} and {LEF2}

White noise can be tested with the autocorrelation (abbreviated as AC) and partial autocorrelation (abbreviated as PAC) coefficients. According to the AC and PAC graph of the data series {LEF2} in Fig. 3, the AC series of {LEF2} died off smoothly at a geometric rate. However, the PAC series was tailing and the P values were very large. The test result indicated that the data series {LEF2} might be white noise, so the ARIMA model might not be suitable for simulating the series, and appropriate fitting parameters with small P values could not be obtained.
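A sample autocorrelation function of the kind inspected here can be computed as in the sketch below. The function names are hypothetical, the input is a toy series, and the ±1.96/√n band is the usual large-sample approximation for judging whether a coefficient differs from zero, which is an added assumption and not a value reported in the study.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def white_noise_band(n):
    """Approximate 95% band for the AC coefficients of white noise (assumption)."""
    return 1.96 / math.sqrt(n)


# Toy series (not data from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acf(x, 2), white_noise_band(len(x)))
```

If all coefficients of a series stay inside the band, the series cannot be distinguished from white noise by this check, which is the situation described above for {LEF2}.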

Fig. 3 The AC and PAC test of the data series {LEF2}

Thus, it was deduced that the data series of the total EF values of Suzhou from 1999 to 2018 was not autocorrelated and could not be simulated by the ARIMA model.

3.3 EF simulation and prediction by the GM(1,1) model

3.3.1 Simulation and the precision test

As shown in Fig. 1, the EF values of Suzhou increased rapidly throughout the period from 1999 to 2018, close to an exponential trend, so the series might be simulated by the GM(1,1) model.

The GM(1,1) model was applied to the data series of the EF values. Running the program yielded the fitted EF values and the model precision parameters. The simulation performance of the GM(1,1) model is plotted in Fig. 4. The relative errors (\(\varepsilon\)) were between −0.1 and 0.4 (left ordinate of Fig. 4). The correlation coefficient (r) was 0.6388. The small error probability (P) was 1, and the variance ratio C was 0.1316. According to the criteria of Step 5 in Sect. 2.3, the fit of the model was graded as “accurate fitting.”

Fig. 4 The fitting performance of the GM(1,1) model

Grey system modeling can forecast nonlinear data series and can be applied to find the characteristics of complicated data and build accurate simulation models. The ecological footprint of a region may be affected by numerous socioeconomic factors, among which dynamic, complex and nonlinear relationships exist. According to the fitting performance reported above, the grey model captured the nonlinear relationship within the sequence efficiently, and the simulation performance of the GM(1,1) model was accurate (Fig. 4).

Meanwhile, an approximate model inevitably produces some errors in the simulation (relative errors in Fig. 4). Before 2005, the EF values were relatively low and the model underestimated the actual EF; the relative error peaked in 2000 and then decreased from 2000 to 2005. Between 2006 and 2010, the model slightly overestimated the EF values and the relative errors were very small. After 2011, the model underestimated the indicator again. On the whole, the relative errors fluctuated, and the simulated trend of the EF values in Suzhou conformed closely to the actual one. The simulation correlation was 0.6388, greater than 0.6, which was one indication that it was practical to simulate the EF values of Suzhou with the GM(1,1) model. The small error probability (P) was 1, and the variance ratio C was 0.1316, which indicated that the fit was accurate and the precision was acceptable.

From the analysis above, the GM(1,1) model was found to be suitable for simulating the development trend of the EF values of Suzhou and for predicting the EF of the city in the short term. It should be noted that the EF of a region is a very complex indicator, and the following questions remain to be verified: whether the model can be used in other cities or at other spatial scales, and whether it can be applied to long-term prediction.

3.3.2 Prediction of the EF in 2019–2024

The development trend of the EF values in Suzhou was assumed to continue along the same exponential path as in 1999–2018. With the GM(1,1) model introduced above, Suzhou’s EF values from 2019 to 2024 were estimated (Table 4).

Table 4 The estimated EF values from 2019 to 2024 (unit: \(10^{6}\) gha)

The prediction provides an estimate of future human consumption of natural resources. According to the prediction, the city will bear a total ecological footprint of 71.7 million gha in 2024, which would be 2.07 times that of 2010 and nearly 7 times that of 1999. More and more resources will have to be imported from outside the city to support the region’s rapid development, which poses a great challenge to Suzhou’s sustainability. To keep the city’s development sustainable, the local governments should make relevant regional strategic plans that focus on enhancing energy use efficiency, developing high-tech industry, gradually adjusting the industrial structure, reducing the share of industrial products and raising the share of tertiary industries in GDP, so as to consume fewer natural resources and return the city to a sustainable development track.

4 Conclusions

The EF values of Suzhou from 1999 to 2018 were calculated. Both the ARIMA model and the GM(1,1) model, statistical models frequently applied to data series in many different fields, were tried for simulating the trend of the regional ecological footprint.

The EF data series was not stationary, and neither were its logarithmic series and first-order difference series. The second-order difference series was diagnosed as stationary; however, in the autocorrelation and partial autocorrelation test, the PAC of the series was found to be tailing. Thus, the series was inferred to be white noise, and the EF values were not autocorrelated and could not be fitted by the ARIMA model.

The GM(1,1) model was introduced to simulate and predict the EF values of Suzhou. The model precision test showed that the GM(1,1) model was appropriate for simulating the data series and for predicting the EF values of the city in the short term. The prediction results provide an estimate of future human consumption of natural capital. According to the prediction, more and more resources would have to be imported from outside the city to support the region’s rapid development, which might pose a great challenge to the city’s sustainability.