1 Introduction

One of the main global concerns is global warming and its impact on our lives. Temperature is increasing worldwide [13], and the main goal of the Paris Agreement (2015) is to take efforts to maintain the rise of the global temperature in our century below 2 \(^\circ\)C above pre-industrial levels and limit the temperature increase further to \(1.5{\,}^{\circ }\)C.

There are many lines of evidence about global warming. In March 2020, there were 1.89 million published papers including the words global warming and temperature increase in the Google Scholar website. Some of them discuss the impact of global warming, and that evidence of the increase in temperature is undeniable. The global annual temperature has increased at an average rate of \(0.18{\,}^{\circ }\)C per decade since 1981 [15].

This paper aims to present and discuss time series models to estimate the increase in temperature. In special, the main goal of this analysis consists of estimating the mean temperature increase in the last 50 years based on the monthly mean temperature from 1961 to 2017 in the city of São Paulo, Brazil.

São Paulo is a vast city with an estimated population of 12 million people in 2016 [10]. Another motivation for this analysis is that in the last ten years the number of vehicles increased by 39% in São Paulo, passing from 6,369,581 in 2008 to 8,861,208 in 2018 (Detran), whereas the population increased by 8% in the last two decades.

Linear regression models are convenient to estimate the linear trend and seasonality, but significant residual autocorrelations show that the usual assumption of error independence is violated when analyzing time series. To estimate the mean temperature increase, we propose a linear regression model including linear trend and sine and cosine functions, but it is necessary to include also autoregressive errors, named AR error model. The alternative model is a linear regression model in which we include the lagged temperature (called ARX model) and consider the serial correlation between the temperatures in consecutive months. In this last model, the trend effect is accumulated in the long term and depends not only on the parameter of the linear trend, as some practitioners may think, but also on the parameters associated with the lagged temperatures.

In this paper, we propose, fit, and discuss both models. First, we check if all model assumptions are valid through a residual analysis. Then, the mean temperature increase is estimated for 50 years with the corresponding prediction interval. The method to estimate the mean annual increase is explained, and more complex calculations are required to estimate the long-term trend effect for the model with lagged temperatures.

Based on the observed monthly mean temperature from 1961 to 2017 in the city of São Paulo, it was observed a mean increase of \(0.4{\,}^{\circ }C\) in each decade, and a mean increase of approximately \(2{\,}^{\circ }\)C in the last 50 years. We also fitted a dynamic linear model only to confirm that the trend effect is not varying over time.

This paper unfolds as follows. Section 2 presents the method used. Section 3 presents all the estimation results. Section 4 discusses the advantages and disadvantages of each model. Finally, some conclusions are given in Sect. 5, along with some explanations on why the model with autoregressive errors is easier to estimate covariate effects, such as the trend effect.

2 Models, estimation methods, and residual analysis

The monthly mean temperatures in São Paulo, measured at the Mirante de Santana station from 1961 to 2017, were obtained from the Instituto Nacional de Pesquisas Espaciais (INPE) [11]. This time series consists of 684 monthly observations, corresponding to 57 years. This time series presents 20 missing values, mainly in the beginning of the 80’s. No imputation method was applied and the models proposed in this paper may be fitted with few missing values.

The basic model includes a linear trend and harmonic components to take into account the seasonality as in [1]. Then, the regression model for the mean temperature in the tth month, \(y_t\), is given by

$$\begin{aligned} y_t&= \beta _0 + \beta _1 t/12 + \sum _{j=1}^6 \left[ \gamma _{cj} \cos \left( \frac{2 \pi j t}{12}\right) \right] \nonumber \\&+ \sum _{j=1}^5 \left[ \gamma _{si} \sin \left( \frac{2 \pi j t}{12}\right) \right] + e_t, \end{aligned}$$
(1)

for \(\; t=1,\ldots , 684\).

Note that all 11 harmonics, \(\cos (2 \pi j t /12)\) and \(\sin (2 \pi j t /12)\), were included, except the sine function for \(j=6\) since \(\sin (2 \pi 6 t /12)= \sin (\pi t)=0\). Beyond the orthogonality of these trigonometric functions, another important feature is that for any integer k we have

$$\begin{aligned} \sum _{t=k+1}^{k+12} \cos \left( \frac{2 \pi j t}{12}\right)= {} 0,&\quad j= 1,\ldots , 6 \;\;\;\;\;\;\;\;\;\\ \sum _{t=k+1}^{k+12} \sin \left( \frac{2 \pi j t}{12}\right)= {} 0,& \quad j= 1,\ldots , 5. \end{aligned}$$

Then, the seasonal component oscillates around zero and its inclusion in (1) implies that the mean of \(y_t\) oscillates around the linear trend \(\beta _0 + \beta _1 t/12\).

The main concern is that the error process is probably autocorrelated. Then, it was proposed that the error may be modeled as an autoregressive (AR error) process, as suggested by the residual autocorrelation and partial autocorrelation functions.

After removing the nonsignificant harmonic terms, based on the likelihood ratio test to verify the null hypothesis that 7 coefficients are all simultaneously null (p=0.5521), the model with AR(1) errors is

$$\begin{aligned} y_t & = \beta _0 + \beta _1 (t/12) + \sum _{j=1,2,6} \gamma _{cj} \cos \left( \frac{2 \pi j t}{12}\right) \nonumber \\&+ \gamma _{s1} \sin \left( \frac{2 \pi t}{12}\right) + e_t, \nonumber \\ e_t &= \phi e_{t-1} + a_t, \end{aligned}$$
(2)

\(t=1,\ldots , 684\), where \(a_t\) are zero-mean, independent, Gaussian, and homoscedastic random errors. Also, after a sequence of replacements, the error can be written as \(e_t=\sum _i \phi ^i a_{t-i}\) and the process \(e_t\) is stationary assuming that \(|\phi |<1\).

This model was estimated using the maximum conditional likelihood method. The likelihood is the product of the Gaussian conditional density functions of \(y_t\) given all the available information up to \(t-1\) [19]. The asymptotic distribution of these maximum likelihood estimators is Gaussian and their variance are obtained from the inverse of Fisher information matrix. More details are, for example, in [3, 19].

In this analysis, the model was estimated using the Arima command of the forecast library in [8] using the R programming language.

As an alternative, we also proposed a model that includes the last observation \(y_{t-1}\) in the regression model to take into account the autocorrelation. This proposal is simpler to practitioners since it is a regression model where the lagged observation is another covariate. We call this model ARX(1) model because it may be understood as an AR(1) model with covariates (X = trend and seasonal components), as the ARMAX model in [19]. The ARX(1) model is defined for \(t=1,\ldots , 684\) as

$$\begin{aligned} y_t&= \beta _0 + \beta _1 (t/12) + \sum _{j=1,2,6} \gamma _{cj} \cos \left( \frac{2 \pi j t}{12}\right) \nonumber \\&+ \gamma _{s1} \sin \left( \frac{2 \pi t}{12}\right) + \phi y_{t-1} + e_t, \end{aligned}$$
(3)

where \(e_t\) are independent Gaussian errors with standard deviation \(\sigma _e\) and \(|\phi |<1\).

The conditional mean of the temperature is

$$\begin{aligned} E(y_t | y_{t-1}) & = \beta _0 + \beta _1 (t/12) + \sum _{j=1,2,6} \gamma _{cj} \cos \left( \frac{2 \pi j t}{12}\right) \\&+ \gamma _{s1} \sin \left( \frac{2 \pi t}{12}\right) + \phi y_{t-1}, \end{aligned}$$

but the annual increase in the mean temperature does not directly correspond to \(\beta _1\) since \(y_{t-1}\) also depends on t. By increasing one month, the covariate \(x_t=t\) goes to \(t+1\), and in the first month \(y_{t+1}\) increases on average \(\beta _1\), in the second month \(y_{t+2}\) increases \(\beta _1+ \beta _1 \phi\) and after h months \(y_{t+h}\) increases \(\beta _1 (1+\phi +\cdots +\phi ^h) = \beta _1 \frac{1-\phi ^{h+1}}{1-\phi }\) [7, 9]. As the process is a stationary autoregressive process, \(|\phi |<1\). Then, in the long term (as \(n \rightarrow\) \(\infty\)), the mean annual increase corresponds to

$$\begin{aligned} \beta _1^*= \frac{\beta _1}{1-\phi }. \end{aligned}$$
(4)

Using the Delta method [18] and assuming that \(|\phi | < 1\), the asymptotic variance of the maximum likelihood estimator \(\widehat{\beta ^*_1}=\widehat{\beta _1}/(1-{\widehat{\phi }})\) is

$$\begin{aligned} Var({{\widehat{\beta }}}_1^*) &= \frac{1}{(1-\phi )^2}\nonumber \\&\left[ Var({{\widehat{\beta }}}_1) - 2\frac{{{\widehat{\beta }}}_1 Cov({{\widehat{\beta }}}_1, {{\widehat{\phi }}})}{1-\phi } + \frac{{{\widehat{\beta }}}_1^2 Var({{\widehat{\phi }}})}{(1-\phi )^2} \right] . \end{aligned}$$
(5)

The estimated variance is obtained by replacing the parameters \(\beta _1\) and \(\phi\) by their respective estimates.

The residual analysis of both models included residual time series plots, residual autocorrelations and residual quantile–quantile (QQ) plots. After concluding that all model assumptions are valid, the predicted temperatures were calculated for both models. The predicted temperatures for the AR(1) error model in (2) are

$$\begin{aligned} \widehat{y_t} &= {{\widehat{\beta }}}_0 +{{\widehat{\beta }}}_1 t/12 + \sum _{j=1,2,6} {{\widehat{\gamma }}}_{cj} \cos \left( \frac{2 \pi j t}{12}\right) \nonumber \\&+ {{\widehat{\gamma }}}_{s1} \sin \left( \frac{2 \pi t}{12}\right) , \end{aligned}$$
(6)

and, for the ARX(1) model in (3), the prediction includes the lagged term

$$\begin{aligned} \widehat{y_t}&= {{\widehat{\beta }}}_0 +{{\widehat{\beta }}}_1 t/12 + \sum _{j=1,2,6} {{\widehat{\gamma }}}_{cj} \cos \left( \frac{2 \pi j t}{12}\right) \nonumber \\&+ {{\widehat{\gamma }}}_{s1} \sin \left( \frac{2 \pi t}{12}\right) + {{\widehat{\phi }}} y_{t-1}. \end{aligned}$$
(7)

The temperature forecasts for the next 50 years are calculated with corresponding 95% prediction intervals using the AR(1) error model through the forecast library [8] in the R software [17]. The variance of forecasts using ARMA models are obtained, writing the model as an infinite order-MA model as explained in [3]. As we are considering an AR(1) error model, the coefficients of the MA model are \(\theta _i=\phi ^i\). Including the covariates at time h (\(\mathbf {x}_h\)), the variance of a forecast h steps ahead is

$$\begin{aligned} Var({\widehat{y}}_{t+h} | \text {observed until }y_t)&= \sigma ^2 \left( 1+ \sum _{i=1}^{h-1} \theta _i^2 \right) \nonumber \\&+ \mathbf {x}_h' \mathbf {Var}({\hat{\varvec{\beta }}} ) \mathbf {x}_h. \end{aligned}$$
(8)

A state space model [16] with a time-varying trend and also seasonality was also proposed only to verify if the trend parameter is constant over time. The non-observed state variables to measure the time-varying trend \(\beta _t\) evolves as a random walk in

$$\begin{aligned} y_t&= \beta _0+ \beta _t t + s_t + v_t, \\ s_t &= -s_{t-1} - \cdots -s_{t-11} + w^S_t, \\ \beta _t &= \beta _{t-1} + w^{\beta }_t, \end{aligned}$$

where \(\beta _0\) is the intercept, \(\beta _t\) is the time-varying trend parameter, each \(s_t\) is the monthly effect and the errors \(v_t \sim N(0,V)\), \(w^S_t \sim N(0,W^S)\) and \(w^\beta _t \sim N(0,W^\beta )\) are independent. In state space models [16], it is usual to assume that the seasonal effects \(s_t\) are random variables such that \(s_t +s_{t-1} + \cdots +s_{t-11}= w^S_t\) have zero-mean and the time-varying effect is a random walk.

This state space model was fitted using the Kalman Filter and the maximum likelihood method through the dlm library [16] in the R software [17]. The predicted time-varying effect \(\hat{\beta }_t\) is obtained with the smoothed equations of the Kalman filter and is plotted to verify if it may be considered a constant or time-varying effect.

The computing code used and the database is available upon request.

3 Results

In this section, we shall present the results of the present temperature data analysis.

Figure 1 shows the monthly mean temperature in São Paulo. We note a linear increase from 1960, and the mean temperature increase is around \(0.4{\,}^{\circ }\)C per decade as presented in Table 1. This corresponds to a mean increase of almost \(2{\,}^{\circ }\)C in the last 50 years (\(0.39 \times 5=1.95{\,}^{\circ }\)C).

Fig. 1
figure 1

Monthly mean temperature in São Paulo City from 1960 to 2017

Table 1 Descriptive statistics of monthly mean temperature by decade in São Paulo

All the estimates and corresponding standard errors for the proposed models are presented in Table 2. For both models, the estimated annual mean increase in temperature is \(0.038{\,}^{\circ }\)C, which corresponds to an increase of \(1.9{\,}^{\circ }\)C in 50 years with 95% confidence intervals equal to \([1.91 {\mp } 0.29]=[1.62; 2.20]\) for the AR(1) error model and \([1.91 {\mp } 0.43]=[1.48; 2.34]\) for the ARX(1) model.

Both models, the AR(1) error model and the ARX(1) model, presented nonsignificant residual autocorrelations (Fig. 2) and also there is no significant residual autocorrelation until lag 10 according to the Ljung–Box test (\(p\;\text {value}=0.5161\) for the AR(1) error model and \(p\;\text {value}=0.2424\) for the ARX(1) model) [2]. Also, Fig. 2 shows that the residuals are homoscedastic and normally distributed. Then, all model assumptions are valid for the AR(1) error and ARX(1) models and the prediction intervals and forecast may be used for inference.

Table 2 Estimates and corresponding standard errors, first residual autocorrelation, and estimated residual standard deviation
Fig. 2
figure 2

Residual analysis

Figure 3 presents the predicted values only from 2010 for better visualization. The predictions for the AR(1) error model in (6) and ARX(1) model in (7) are similar and close to the observed temperature values.

Notice that the weather will be warmer in São Paulo if the current trend remains the same in the coming years, as observed in Fig. 4. These forecasts are calculated assuming that the trend will remain the same for a long time and it is shown only to visualize the future temperatures.

Fig. 3
figure 3

Observed and predicted temperatures from 2010 to 2017 in São Paulo City

Fig. 4
figure 4

Observed temperatures in São Paulo City from 1960 to 2017 and forecasts with 95% prediction intervals for the next 50 years using the AR(1) error model

Based on the dynamic model with time-varying trend effect, the predicted time-varying effect \({{\hat{\beta }}}_t\) oscillates close to 0.0383 as shown in Fig. 5. The variance of the time-varying error is too small (\(< 0.0001\)), indicating that the linear trend effect does not change over time.

Fig. 5
figure 5

Predicted time-varying trend effect for the dynamic linear model from 1960 to 2018

4 Discussion

Both the regression model with AR(1) errors in (2) and the ARX(1) model in (3) provide similar estimates and predicted values.

For several practitioners, it is easier to include the lagged observation in the regression model, as in the ARX(1) model (3), since it can be estimated by the least squares method using any spreadsheet program. If the main goal is to calculate predictions or forecasts for the temperature, this model may be chosen or even a SARIMA model. However, considering the trend coefficient as the mean increase in temperature is not adequate since this would underestimate the trend. In Table 2, the trend parameter estimate is 0.032, but the mean increase estimator must be calculated using (4), providing an annual increase equals to 0.038 with corresponding asymptotic variance in (5).

On the other hand, to estimate and obtain confidence intervals or tests involving the model parameters, for example, for the secular trend, it is easier to consider the model with AR(1) errors. This occurs because its trend parameter directly measures the trend effect, and it is easier to obtain standard errors and confidence intervals.

It is worth noting that usually it is expected that the range of the forecast interval grows for larger horizons. However, the forecast interval are not getting wider in Fig. 4 and this may occur due to the very small variance of the trend estimator.

In [6], a regression model with trend and autoregressive error is fitted, but there is no seasonal component. This may have increased the estimated autoregressive coefficient, turning the error process similar to a random walk (unit root process). After considering differences, they also found significant temperature increases in Alaska, but the estimated parameters do not correspond to the mean annual increases presented here, which is easier to interpret.

A dynamic linear model to estimate a time-varying trend effect was also fitted indicating that this effect does not vary over time and the temperature is always increasing with the same rate, as also concluded by [5] for the global temperature data until 1980.

Recent papers concluded that global warming was overestimated. For example, [4] indicated that, in the period 1993–2012, the mean global temperature increased \(0.14{\,}^{\circ }\)C(\({\mp } 0.06^\circ\)C) per decade. This increase is lower than the previously estimated \(0.3{\,}^{\circ }\)C using simulations of complex models. They concluded that maybe the models did not reproduce the observed global warming over the past 20 years or the slowdown in global warming over the past fifteen years.

Also, the mean increase in the global temperature is \(0.18^\circ\) per decade since 1981 in the Global climate report [15] and, for example, it is \(0.2^\circ\) in Ghana [12]. All these increases are smaller than the estimated increase around \(0.4{\,}^{\circ }\)C per decade in São Paulo City, as observed in Table 1 and estimated by the proposed models (\(0.38{\,}^{\circ }\)C in Table 2). Our data from São Paulo showed a constant increase in temperature, with no slowdown.

5 Conclusions

Whenever there is evidence that the temperature increase is not changing over time, it is recommended to fit a regression model including trend and seasonal components and autoregressive errors to estimate the mean temperature increase with the corresponding confidence interval. If a model with lagged temperatures is chosen, it is necessary to calculate the long-term mean increase as in (4) and its variance as in (5), because the estimator of the trend coefficient is a biased estimator of the mean annual increase in the temperature.

Focusing on the climate change issue, there is sufficient evidence that temperature is increasing and there is an increase of \(1.9{\,}^{\circ }\)C in the last 50 years in the city of São Paulo, with a 95% confidence interval of [1.6; 2.2] for the AR(1) error model, including the upper limit, stated in the Paris Agreement [14] of \(2{\,}^{\circ }\)C-increase. Maintaining this increase, it is expected another increase of around \(1.9{\,}^{\circ }\)C in the next 50 years.