Introduction

It is widely accepted that extreme climatic conditions can pose a major public health hazard (Guest et al. 1999; Diaz et al. 2002; Davis et al. 2003; Fouillet et al. 2006). Some studies show that daily mortality has a V- or U-shaped relationship with temperature (Carson et al. 2006). The health effects of temperature combined with other environmental factors (e.g. air pollution) have also been investigated (Simpson et al. 2005; Ren et al. 2006).

Time-series generalised linear models (GLMs) with parametric splines (e.g. natural cubic splines) and generalised additive models (GAMs) with non-parametric splines (e.g. smoothing splines or loess smoothers) are the methods most widely used to assess the short-term health effects of climate variables and air pollution (Schwartz 1999; Braga et al. 2002; Dominici et al. 2005; Baccini et al. 2006). Mortality, climate and air pollutants often involve high-order interactions and collinearities, which can be difficult to handle with these models (Eckel and Louis 2007). Time-series classification and regression trees (CARTs) provide an alternative or complementary non-parametric approach that may accommodate these complex interactions, because they assume neither linear relationships among the variables nor homoscedasticity. Through successive binary splits, CART analysis segments the data into homogeneous subgroups, which makes it well suited to both exploring and modelling such data (Breiman et al. 1984). Temporal features can also be included in time-series CARTs. Despite these advantages, CARTs have seldom been used in public health and epidemiological research to date.

This study investigated the relationship between temperature and mortality during summers in Sydney, Australia, using a time-series GAM, and quantified the effects of temperature and air pollution on mortality. A time-series CART model was then developed to explore the interaction effects of temperature and air pollution on mortality. We focussed on summer for two reasons: first, any association between high temperature and mortality is likely to be most pronounced in this period; and second, if such an association holds in general, mortality during this period is likely to increase around the world as global warming continues (Intergovernmental Panel on Climate Change 2001).

Materials and methods

Data collection

Daily mortality data from 1 January 1994 to 31 December 2004 were obtained from the Australian Bureau of Statistics for the Sydney metropolitan area. We constructed a time series of daily death counts for all causes.

The Environment Protection Authority of New South Wales provided daily air pollution data from 13 monitoring sites in the Sydney area for ozone (O3), nitrogen dioxide (NO2), particulates (PM10), carbon monoxide (CO) and sulphur dioxide (SO2). We estimated population exposure to each air pollutant by averaging the daily measurements from all available sites to obtain city-wide means. Meteorological data monitored at Sydney airport were provided by the Bureau of Meteorology of New South Wales. The meteorological variables used in this study were population-weighted averages of daily maximum temperature, minimum temperature and relative humidity. Population-weighted averages were calculated as Σ(collector's district value × collector's district population) / Σ(collector's district population). Population-weighted exposure data are conceptually appealing because they more closely estimate the weather experienced by the majority of the population (Hanigan et al. 2006).
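A minimal sketch of the two exposure summaries described above (the city-wide mean across monitoring sites and the population-weighted average across collector's districts), assuming each day's site readings and each district's value and population are available as plain lists. The function names, the use of None for a missing site reading and the example numbers are hypothetical.

```python
from typing import Optional, Sequence


def city_wide_mean(site_readings: Sequence[Optional[float]]) -> Optional[float]:
    """Average one day's readings over all monitoring sites that reported.

    Missing readings are represented as None (an assumption of this sketch);
    returns None if no site reported on that day.
    """
    available = [r for r in site_readings if r is not None]
    if not available:
        return None
    return sum(available) / len(available)


def population_weighted_mean(values: Sequence[float],
                             populations: Sequence[float]) -> float:
    """sum(district value x district population) / sum(district population)."""
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(v * p for v, p in zip(values, populations))
    return weighted_sum / total_population


# Hypothetical example: three collector's districts on one summer day.
district_max_temp = [30.0, 34.0, 28.0]  # degrees C
district_population = [1000, 3000, 1000]
print(population_weighted_mean(district_max_temp, district_population))  # 32.0

# Hypothetical SO2 readings (pphm) from three sites, one of which is missing.
print(city_wide_mean([0.2, None, 0.4]))
```

The heavily populated district pulls the weighted value above the simple district mean (30.7°C), which is the point of weighting by population.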

Statistical analysis

Spearman correlations and time-series cross-correlation function

Spearman correlation analyses were conducted to assess the bivariate associations between daily deaths, weather variables and air pollutants in summers over the study period. Cross-correlations between maximum temperature, air pollutants and mortality were computed over a range of time lags, where a time lag is the interval between the observation of maximum temperature or an air pollutant and the observation of mortality (Chatfield 1975). Any significant lags were carried forward into the following two models for further testing.
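The two correlation steps can be sketched in plain Python as follows. This is an illustrative reconstruction, not the study's S-Plus code: Spearman's rho is computed as the Pearson correlation of tie-averaged ranks, and the lagged cross-correlation uses the standard time-series estimator (full-series means, cross-products divided by n). The ±1.96/√n bound is the usual large-sample approximation and assumes at least one series is roughly white noise. Function names and example data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cross_correlation(exposure: Sequence[float], deaths: Sequence[float],
                      max_lag: int) -> Dict[int, float]:
    """r(k) between exposure on day t-k and deaths on day t, k = 0..max_lag."""
    n = len(exposure)
    mx, my = sum(exposure) / n, sum(deaths) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in exposure) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in deaths) / n)
    result = {}
    for k in range(max_lag + 1):
        cov_k = sum((exposure[t - k] - mx) * (deaths[t] - my)
                    for t in range(k, n)) / n
        result[k] = cov_k / (sx * sy)
    return result


def approx_95_bound(n: int) -> float:
    """Approximate 95% bound for a sample cross-correlation under no association."""
    return 1.96 / math.sqrt(n)


# Hypothetical ten-day series of maximum temperature (deg C) and deaths.
max_temp = [24.1, 27.5, 31.2, 35.0, 29.3, 26.8, 33.4, 38.1, 30.0, 25.5]
deaths = [60, 61, 64, 66, 70, 63, 62, 68, 72, 64]
print(spearman(max_temp, deaths))
print(cross_correlation(max_temp, deaths, 2), approx_95_bound(len(deaths)))
```

Lags whose |r(k)| exceeds the bound would be the "significant lags" carried into the models; with real data the series are often prewhitened first, which this sketch does not do.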

Construction of time-series GAM model

The analysis included the construction of a time-series GAM to examine the relationship between temperature and daily deaths after adjustment for confounding factors. We fitted the following model:

$$Y_t \sim \mathrm{Poisson}(\mu_t)$$

$$\log \mu_t = \alpha + S_1(\mathrm{MaxT}_t, 4) + \log(\mathrm{MR}_{\mathrm{ref}}) + \mathrm{Season}_t + S_2(\mathrm{MinT}_t, 4) + S_3(\mathrm{Humidity}_t, 4) + S_4(\mathrm{CO}_t, 3) + S_5(\mathrm{NO}_{2,t}, 3) + S_6(\mathrm{PM}_{10,t}, 3) + S_7(\mathrm{O}_{3,t}, 3) + S_8(\mathrm{SO}_{2,t}, 3) + \mathrm{factor}(\mathrm{weekday}_t) + \mathrm{AR}_1 + \mathrm{AR}_2$$

where Y_t is the number of deaths on day t, μ_t is the expected number of deaths on day t, S_i denotes a smooth function (smoothing spline) of a climate or pollution variable with the stated degrees of freedom, and AR_1 and AR_2 are autoregressive terms at lags of 1 and 2 days. On the log scale, each smooth term describes the log relative rate of mortality associated with changes in the corresponding weather or pollution variable. Smoothing splines were chosen for the weather factors and air pollutants in order to remove, semiparametrically, long-term trends, seasonality and other natural patterns that may be related to mortality. The amount of smoothing was determined from the residual deviance as part of the model diagnostics. Although we focused only on the summer period, residual seasonal patterns could still occur within each summer, so a quadratic function of the day was used to account for them (Vigotti et al. 2006; Fouillet et al. 2007, 2008). Counts on neighbouring days may be more similar to one another than counts on more distant days (i.e. counts may be autocorrelated); the autoregressive terms were included to correct the variance of the effect estimates for this autocorrelation. Because the variance of daily mortality counts may be greater than that assumed under the Poisson model, quasi-likelihood estimation was used to obtain standard errors adjusted for over-dispersion (S-Plus Insightful Corporation 2003). Analyses were performed using more restrictive convergence parameters than usual, as suggested by Dominici et al. (2002), to avoid bias arising from convergence problems in the iterative estimation. To control for long-term trends, summer mortality was adjusted by the mortality rate observed over a four-month reference period (MRref) comprising October and November of year N−1 and April and May of year N (Fouillet et al. 2008). To deal with concurvity in the non-parametric smoothers, the gam.exact function in S-Plus was used to produce exact standard errors for each linear term (Ramsay et al. 2003; Dominici et al. 2004).
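The quasi-likelihood step can be illustrated with a simplified sketch. It fits a log-linear Poisson GLM by iteratively reweighted least squares (IRLS) and scales the standard errors by the Pearson dispersion estimate φ, which is the usual quasi-Poisson adjustment. It is not the study's model: linear terms stand in for the smoothing splines S1 to S8, and the autoregressive terms, MRref adjustment, weekday factor and gam.exact correction are omitted. Function names, the design-matrix layout (column 0 as intercept) and the synthetic data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_crossprod(X: Matrix, w: Sequence[float]) -> Matrix:
    """X' W X for a diagonal weight matrix W = diag(w)."""
    p = len(X[0])
    return [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
            for j in range(p)]


def fit_quasi_poisson(X: Matrix, y: Sequence[float], max_iter: int = 100,
                      tol: float = 1e-10) -> Tuple[List[float], List[float], float]:
    """Log-link Poisson regression by IRLS; returns (beta, quasi-Poisson SEs, phi)."""
    n, p = len(y), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)  # column 0 = intercept
    for _ in range(max_iter):
        eta = [sum(xj * bj for xj, bj in zip(xi, beta)) for xi in X]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwx = weighted_crossprod(X, mu)  # for Poisson with log link, weights = mu
        xtwz = [sum(m * xi[j] * zi for m, xi, zi in zip(mu, X, z)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        change = max(abs(nb - b) for nb, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    mu = [math.exp(sum(xj * bj for xj, bj in zip(xi, beta))) for xi in X]
    # Pearson over-dispersion estimate: phi > 1 indicates extra-Poisson variation.
    phi = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (n - p)
    xtwx = weighted_crossprod(X, mu)
    se = []
    for j in range(p):
        unit = [0.0] * p
        unit[j] = 1.0
        # j-th diagonal element of (X'WX)^-1, inflated by the dispersion phi.
        se.append(math.sqrt(phi * solve(xtwx, unit)[j]))
    return beta, se, phi


if __name__ == "__main__":
    # Hypothetical 90-day summer; columns: intercept, MaxT - 25, day, day^2.
    temps = [25 + 8 * math.sin(0.7 * i) for i in range(90)]
    days = [i / 89 for i in range(90)]
    X = [[1.0, t - 25.0, d, d * d] for t, d in zip(temps, days)]
    y = [round(math.exp(4.0 + 0.009 * (t - 25.0))) for t in temps]
    beta, se, phi = fit_quasi_poisson(X, y)
    print(f"MaxT coefficient {beta[1]:.4f} (SE {se[1]:.4f}), phi {phi:.3f}")
```

The day and day-squared columns play the role of the quadratic seasonality term; in practice a dedicated GLM/GAM library would be used rather than hand-written IRLS.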

Construction of time-series CART model

A time-series CART was developed to explore possible interaction effects of temperature and air pollution on mortality, adjusting for confounders (e.g. seasonality, weekday and autocorrelation at lags of 1 and 2 days). A CART is built through a process known as binary recursive partitioning. Given a dataset comprising a response variable and a set of explanatory variables, the algorithm examines every possible binary split on every explanatory variable and chooses the split that optimises a pre-defined criterion, for example minimising the sum of squared deviations from the mean in the two resulting subgroups for continuous data. This partitioning is then applied recursively to each of the new subgroups (Breiman et al. 1984). The time-series CART model was specified as: Mortality = maximum temperature + minimum temperature + humidity + CO + NO2 + PM10 + O3 + SO2 + Season + Log(MRref) + factor(weekday) + autoregressive term at a lag of 1 day + autoregressive term at a lag of 2 days. A minimum node deviance of 30% of the total deviance was used to prune the trees. Mortality was fitted as a continuous variable. The CART analysis consisted of four basic steps. First, a preliminary tree was grown by recursive partitioning of the data. Second, a sequence of nested trees was formed by reducing the number of nodes (pruning). Third, the optimal tree was selected on the basis of its predictive ability. Finally, the goodness-of-fit of the model was assessed using both time-series tools (autocorrelation functions of the residuals) and classical tools (normality of the residuals). All statistical analyses were conducted using the S-Plus software package (S-Plus Insightful Corporation 2003).
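The growing step of binary recursive partitioning can be illustrated with a small regression-tree sketch. It uses the sum of squared deviations as the node deviance, tries every split on every explanatory variable at midpoints between observed values, and recurses into each subgroup. The stopping rule `min_dev_frac` is modelled on the `mindev` control of the S/R `tree` function (a node is split only if its deviance is at least that fraction of the root-node deviance); reading the paper's 30% criterion this way is our assumption. Cost-complexity pruning, tree selection by predictive ability and the time-series covariates are not implemented, and all names, the minimum node size and the data are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def deviance(y: Sequence[float]) -> float:
    """Sum of squared deviations from the node mean (regression-tree deviance)."""
    if not y:
        return 0.0
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def best_split(rows: Sequence[Sequence[float]], y: Sequence[float],
               min_size: int) -> Optional[Tuple[float, int, float]]:
    """Return (child deviance, feature index, threshold) of the split that
    minimises the summed deviance of the two subgroups, or None."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between observed values
            left = [yi for r, yi in zip(rows, y) if r[f] <= threshold]
            right = [yi for r, yi in zip(rows, y) if r[f] > threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            dev = deviance(left) + deviance(right)
            if best is None or dev < best[0]:
                best = (dev, f, threshold)
    return best


def grow(rows, y, min_dev_frac=0.3, min_size=5, root_dev=None) -> Dict:
    """Binary recursive partitioning with a mindev-style stopping rule."""
    node_dev = deviance(y)
    if root_dev is None:
        root_dev = node_dev
    node = {"n": len(y), "mean": sum(y) / len(y)}
    if node_dev == 0.0 or node_dev < min_dev_frac * root_dev:
        return node  # homogeneous enough: terminal node
    split = best_split(rows, y, min_size)
    if split is None:
        return node
    _, f, threshold = split
    left_idx = [i for i, r in enumerate(rows) if r[f] <= threshold]
    right_idx = [i for i, r in enumerate(rows) if r[f] > threshold]
    node.update(feature=f, threshold=threshold)
    node["left"] = grow([rows[i] for i in left_idx], [y[i] for i in left_idx],
                        min_dev_frac, min_size, root_dev)
    node["right"] = grow([rows[i] for i in right_idx], [y[i] for i in right_idx],
                         min_dev_frac, min_size, root_dev)
    return node


def predict(node: Dict, row: Sequence[float]) -> float:
    """Follow the splits down to a terminal node and return its mean."""
    while "feature" in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["mean"]


# Hypothetical data: [max temperature (deg C), SO2 (pphm)] and daily deaths.
rows = [[float(t), 0.1 * (t % 3)] for t in range(20, 41)]
deaths = [60.0 if r[0] <= 32 else 68.0 for r in rows]
tree = grow(rows, deaths)
print(tree["feature"], tree["threshold"])  # 0 32.5
print(predict(tree, [35.0, 0.2]))          # 68.0
```

Because the same variable may be chosen again further down a branch, repeated splits on one predictor let the tree approximate a non-linear relationship, and a split on a second predictor within a branch expresses an interaction.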

Results

Summary statistics for daily deaths, meteorological variables and air pollutants in the summer months (December–February) of 1994–2004 (Table 1) indicate substantial variation in these variables. Mortality ranged from 40 to 88 deaths per day, with a mean of 63.36 (standard deviation (SD): 8.58). Maximum temperature ranged from 16.94°C to 43.02°C (SD: 4.24°C), and humidity ranged from 25.79% to 96.67% (SD: 10.8%). The air pollution variables summarised in Table 1 are CO, NO2, PM10, O3 and SO2.

Table 1 Summary statistics for daily deaths and meteorological and air pollution indicators in summers, Sydney, Australia between 1 January 1994 and 31 December 2004. MaxT Maximum temperature, MinT minimum temperature, PM particulates, ppm parts per million, pphm parts per hundred million, SD standard deviation

Table 2 shows Spearman correlations between daily deaths, weather variables and air pollutants in summers over the study period. All weather and pollution variables were statistically significantly associated with mortality, especially maximum temperature (r = 0.231), NO2 (r = 0.161), PM10 (r = 0.139) and SO2 (r = 0.134). All associations were positive except for humidity. There were also statistically significant positive correlations between each pair of the independent variables O3, NO2, PM10 and maximum temperature; the strongest were between maximum temperature and O3 (r = 0.622) and between O3 and PM10 (r = 0.648). A pairwise scatter-plot matrix with smoothing splines depicts the relationships between all the variables (Fig. 1) and shows nonlinear relationships among some of the explanatory variables themselves (e.g. between PM10 and each of maximum temperature, humidity and O3).

Fig. 1

Scatter-plot matrix of the dependent and independent variables (smoothing-spline lines show curvature)

Table 2 Spearman correlation coefficients between mortality, air pollution and weather variables

The cross-correlations also show that mortality was significantly associated with maximum temperature at lags of 0–1 days and with SO2 at lag 0 (Figs. 2, 3).

Fig. 2

Cross-correlation function between total mortality and maximum temperature [solid lines denote 95% confidence intervals (CI)]

Fig. 3

Cross-correlation function between total mortality and sulphur dioxide (SO2) (solid lines denote 95% CI)

The results of the GAM (Table 3) indicate that total daily mortality increased on average by 0.9% (95% CI: 0.6–1.3%) for a 1°C increase in daily maximum temperature and by 22% (95% CI: 6.4–40.5%) for a 1 pphm increase in the daily average SO2 concentration. No significant relationships were found between mortality and the other covariates. Figure 4 shows a smoothed plot of all-cause mortality against maximum temperature, and Figure 5 shows the relationship between all-cause mortality and SO2. The log relative risk of mortality increased consistently with increasing maximum temperature and SO2.
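Assuming the percentages in Table 3 were obtained in the usual way from coefficients β on the log scale of the Poisson model, the conversion between a coefficient per unit increase and a percentage change in mortality is:

$$\%\Delta = 100\left(e^{\beta} - 1\right), \qquad \beta = \ln\!\left(1 + \frac{\%\Delta}{100}\right)$$

Under this assumption, 0.9% per 1°C corresponds to β = ln(1.009) ≈ 0.0090, and 22% per 1 pphm of SO2 corresponds to β = ln(1.22) ≈ 0.199. Confidence limits transform the same way, as $100(e^{\beta \pm 1.96\,\mathrm{SE}} - 1)$, which is why the interval for SO2 (6.4% to 40.5%) is not symmetric about 22%.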

Fig. 4

Smoothed plot of the log relative risk of total mortality versus maximum temperature (dashed lines denote ±2 standard error bands)

Fig. 5

Smoothed plot of the log relative risk of total mortality versus sulphur dioxide (SO2) (dashed lines denote ±2 standard error bands)

Table 3 Changes (%) in relative risks, with 95% confidence intervals (CI), for all-cause mortality per unit increase in weather and air pollution variables on the current day

Figure 6 shows the final CART model, which indicates that daily mortality was best explained by an interaction between maximum temperature and SO2. When the maximum temperature exceeded 32°C (116 days), expected mortality increased by 7.3%, and no further split significantly improved the homogeneity of mortality within this subgroup. When the mean daily SO2 concentration exceeded 0.315 pphm and the maximum temperature was between 29°C and 32°C (17 days), expected mortality rose by 12.1%. Residual analysis showed no significant autocorrelation between residuals at different lags, and the residuals fluctuated randomly around zero with no obvious trend in variance as the predicted values increased.

Fig. 6

Regression tree for the relationship between maximum temperature, SO2 and mortality. Expected daily increase rate = (expected mortality−mean of mortality)/mean of mortality. SD Standard deviation
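Written as a formula, with $\hat{m}_{\text{node}}$ the expected daily mortality in a terminal node and $\bar{m}$ the overall mean daily mortality, the quantity reported in Fig. 6 is:

$$\text{Expected daily increase rate} = \frac{\hat{m}_{\text{node}} - \bar{m}}{\bar{m}}$$

As an illustrative back-calculation, assuming $\bar{m}$ is the summer mean of 63.36 deaths per day reported in Table 1, the 7.3% increase for days above 32°C corresponds to about 63.36 × 1.073 ≈ 68.0 expected deaths per day, and the 12.1% increase for days with SO2 above 0.315 pphm and a maximum temperature of 29°C to 32°C corresponds to about 63.36 × 1.121 ≈ 71.0. These node means are derived from the percentages, not taken from Fig. 6.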

Discussion

The results of this study show that maximum temperature and SO2 on the current day had significant interaction effects on total mortality. There was a 7.8% increase in mortality when the maximum temperature exceeded 32°C, and a 12.1% increase when the mean daily SO2 concentration exceeded 0.315 pphm on days with a maximum temperature between 29°C and 32°C.

Global climate change is likely to increase the frequency and intensity of heatwaves (McMichael et al. 1996). The impact of extreme summer heat on human health may be exacerbated by other factors such as air pollution (Gaffen et al. 2000). Daily numbers of deaths have been reported to increase during very hot weather in temperate regions (Kunst et al. 1993). For example, a heatwave in Chicago in 1995 caused 514 heat-related deaths (Whitman et al. 1997), and a heatwave in London in 1995 caused an increase in all-cause mortality of about 15% (Rooney et al. 1998). During a heatwave in August 2003, increases in mortality ranging from 4% to 14.2% were observed across 13 French cities (Vandentorren et al. 2004).

The reported quantitative relationship between SO2 and mortality is less conclusive than that of other air pollutants (Ballester et al. 2002). The APHEA (Air Pollution and Health: a European Approach) project found a relative risk for total deaths of 1.004 (95% CI 1.003 to 1.005) per 10 μg/m3 increase in the daily concentration of SO2 (Zmirou et al. 1998). Using data from three United States counties, Moolgavkar found a robust association between SO2 concentrations and mortality (Moolgavkar 2000). However, a study in Mexico found that SO2 had no significant health effects (Borja-Aburto et al. 1997), and a study in Philadelphia found that the association between air pollution and daily deaths there was due to fine combustion particles rather than SO2 (Kelsall et al. 2000).

Our results indicate that daily deaths were markedly higher on days when the maximum temperature exceeded 32°C, but that on somewhat cooler days (maximum temperature between 29°C and 32°C) mortality also increased if the SO2 concentration exceeded 0.315 pphm. Morgan et al. (1998) showed that excess deaths in Sydney were significantly associated with NO2 and PM10 after adjustment for weather, in an analysis covering all seasons. In our study, however, the other pollutants (O3, PM10, CO and NO2) were not significantly associated with total mortality, suggesting that they may have played a smaller role than maximum temperature and SO2 in total mortality during Sydney summers.

To our knowledge, this is the first epidemiological study to systematically examine the interaction effects of weather and air pollution on daily mortality using a time-series CART model. A major advantage of this technique is its ability to reveal interactions, i.e. hierarchical and non-linear relationships between one dependent variable and a defined set of independent variables. CART requires no distributional assumptions or data transformation and handles outliers and interactions easily (Hu et al. 2006). In exploring interaction effects, CART performs binary splits until the terminal nodes are sufficiently homogeneous according to some criterion (e.g. a distance measure for a continuous variable) (Breiman et al. 1984). For non-linear relationships, CART can split the same variable repeatedly, dividing its range into more than two groups.

The limitations of this study must be acknowledged. The study shares the usual problems of ecological designs, particularly the inability to model at the individual level. The most critical potential source of bias is measurement error in exposure. By using air pollutant concentrations averaged across Sydney, we assume that ambient pollutant concentrations represent each individual's actual exposure. This assumption does not account for time-activity patterns that may affect exposure, such as place or type of work and time spent outdoors. Nor does the use of city-wide average exposure (e.g. for PM10) account for spatial variation in pollutant concentrations across Sydney.

In conclusion, increased daily maximum temperature (>32°C) and high SO2 concentration (>0.315 pphm) appear to contribute to excess mortality in summers in Sydney, Australia. As climate change continues, the health implications of the interactions between hot weather and air pollutants should be evaluated, and adaptive strategies should be developed to lessen their effects. These important issues need to be put at the centre of the public health research agenda.