Introduction

Both warm and cold extremes of temperature have adverse effects on health. A non-monotonic ‘V-shaped’ relationship is often observed between temperature and mortality—annually (Huynen et al. 2001) and for the separate warm and cold seasons (Ballester et al. 1997). Hajat et al. (2006) have shown that, although linear relationships exist between temperature and mortality, during extreme heat events mortality exceeds that expected from a linear association and is better represented non-linearly. Kalkstein and Davis (1989) describe a ‘threshold temperature’ beyond which mortality increases above the baseline level. Different thresholds have been identified for a variety of causes of death (Páldy et al. 2005; Huynen et al. 2001). Thresholds may be confounded by other meteorological variables, e.g. Saez et al. (2000) illustrated a 2°C higher threshold (23°C) on very humid days when the relative humidity was above 85% in Barcelona, Spain, but there is also evidence that humidity may have insignificant effects on mortality (Dessai 2002, 2003; Ballester et al. 1997; Braga et al. 2001). Thresholds have also been found to vary temporally for a single location (Davis et al. 2003; Ballester et al. 1997), and according to age, with elderly populations being most susceptible to changes in temperature (Conti et al. 2005; Donaldson et al. 2003; Huynen et al. 2001). Interestingly, there is evidence that cardiovascular fitness may be more important than age in determining individual vulnerability to heat (Havenith 1997). Havenith et al. (1995) examined the response to heat stress across a heterogeneous sample of 56 individuals aged 20–73 years in a warm humid climate of 80% relative humidity and 35°C air temperature. The effect of age was negligible compared with effects related to fitness, which was measured by maximum oxygen uptake. Comparative studies have shown the occurrence of geographical variation in thresholds. 
Heat-related/cold-related mortality thresholds occur at higher/lower temperatures in locations with a relatively warmer/colder climate, and the gradient (or steepness) of the temperature–mortality relationship for increasing/decreasing temperature is often found to be lower in warmer/colder locations than colder/warmer ones (Donaldson et al. 2003; Pattenden et al. 2003; Keatinge et al. 2000; Eurowinter 1997). For example, Curriero et al. (2002) illustrated across 11 US cities that threshold temperatures were higher in warmer southern cities, where the temperature–mortality association was less sensitive, than in cooler northern cities. The variation of thresholds and temperature–mortality gradients has led to inference on how populations may acclimatise to changing climatic conditions (Donaldson et al. 2003; Curriero et al. 2002; Braga et al. 2001; Saez et al. 2000).

The direct effects of extreme temperature on health are not always immediate: a lag is often observed between the temperature event and resultant mortality, whereby separate previous days’ temperatures or lagged moving averages are associated with the current day’s mortality. Lags of less than 3 days are most commonly associated with heat-related mortality (Hajat et al. 2002; Michelozzi et al. 2005; Conti et al. 2005) but different lags may be associated with disease-specific mortalities (Gemmell et al. 2000; McGregor 1999; Páldy et al. 2005; Ballester et al. 1997). Some studies present a negative relationship between hot temperatures and mortality for lags above 3 days, which compensates for some of the deaths caused by heat during the initial days of the heat event (Hajat et al. 2002, 2005; Pattenden et al. 2003; Braga et al. 2001). This is known as ‘mortality displacement’, whereby the heat principally affects individuals whose health is already compromised and who would have died shortly anyway, regardless of the weather. Estimates of mortality displacement vary considerably. Sartor et al. (1995) estimated that 15% of total deaths during the 1994 heat waves in Belgium were due to displacement. Gouveia and Fletcher (2000) estimate mortality displacement as about 50% during the 1994 heat waves in the Czech Republic. Estimates varied between 1% and 30% in France during the summer heat wave of 2003 (Le Tertre et al. 2006) and evidence from the United States estimates the value as between 25% and 50% (Kalkstein 1993).

A number of studies point to increases in heat-related mortality under climate change scenarios. Donaldson et al. (2001) estimate a 253% increase in annual heat-related mortality by the 2050s for the United Kingdom, and Dessai (2003) estimated the heat-related mortality rate to increase from between 5.4 and 6.0 (per 100,000) for 1980–1998 to between 5.8 and 15.1 for the 2020s, and 7.3 to 35.6 for the 2050s for Lisbon. The range in values was due to the combined uncertainties inherent in climate change projections, potential acclimatisation, and methodologies.

Assessments of the impacts of climate change on heat-related mortality need to be location specific because it has been shown that the relationship is not evenly distributed in space (Davis et al. 2004; Kalkstein and Davis 1989). Attention also needs to be paid to the inherent uncertainties in impact assessments, especially those arising from climate projections, so that a range of possible impacts is illustrated. This paper summarises the first part of a study aimed at producing projections of the effect of future climate change on heat-related mortality. The research is published in two parts (Fig. 1). In this paper (Part 1) separate empirical–statistical non-linear regression models based on the aggregate dose–response relationship between daily maximum temperature (T max) and heat-related deaths (the difference between observed and expected deaths) are developed for six cities in order to model the current relationship between weather and heat-related mortality. In Part 2, climate change and population change scenarios are applied to the models developed here to estimate the heat-related mortality burden attributable to climate change for each city. This includes an exploratory uncertainty analysis to examine the uncertainties in the projections due to climate modelling, which is considered a major source of uncertainty in climate–health modelling (Dessai 2003). Uncertainties concerning acclimatisation and those inherent in the temperature–mortality models are also included. Additional uncertainties such as population ageing and use of air-conditioning/heating units exist, but will not be examined due to the added complexities in modelling them.

Fig. 1 The adopted methodology for this research (adapted from Dessai 2002)

Materials and methods

Selection of cities

The cities selected for this study were Boston, Budapest, Dallas, Lisbon, London and Sydney (Table 1). The aim was to include cities in different climates such as Continental Cool Summer, Temperate, Humid Subtropical and Mediterranean (McKnight and Hess 2000) so that any regional differences in exposure–response could be examined. Another important consideration was that data for at least 10 years was available to provide a reliable representation of the cities’ climates, and that it was available at reasonable cost.

Table 1 Sources of data used for each city

Mortality data

Daily total deaths from all causes were obtained for each city to include both heat stroke and any possible comorbid factors (Davis et al. 2003; Kilbourne 1997; Kunst et al. 1993). The maximum available data record for each city was examined because this gives a more reliable representation of climate and any associations with mortality, and gives more precise regression coefficients than if shorter periods were used (Davis et al. 2003; Horst 1966). Therefore this study assumes that exposure–response relationships remain constant over time. Davis et al. (2004) state that this stationary nature is often assumed in assessments such as this, but it is noted that some evidence points to the possibility of non-stationarity in several US cities (Davis et al. 2003).

Mortality data was missing for Dallas 1990, which was excluded from the analysis. Any other missing data was replaced by linear interpolation. An anomaly resulting from 137 extra deaths caused by an airliner accident at Dallas Fort-Worth International Airport on 2 August 1985 (Wikipedia 2006) was excluded from the analysis. As the focus of the research is heat-related mortality, only the summer months [June, July, August (December, January, February for Sydney); hereafter referred to as ‘summer’] were used for analysis. For inter-city analysis and estimation of future mortality burdens under climate- and population-change scenarios, mid-year population estimates were obtained or calculated by linear interpolation between census years as denominators for the computation of crude mortality rates (per 100,000); unless otherwise stated, all results are presented in these units.
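The rate calculation described above can be sketched as follows. This is a minimal illustration, assuming straight-line interpolation between the two census counts that bracket a given year; the function names and the census figures are hypothetical and are not taken from the study.

```python
def interpolate_population(census, year):
    """Linearly interpolate a mid-year population between census years.

    `census` maps census year -> population count (hypothetical values).
    """
    years = sorted(census)
    for y0, y1 in zip(years, years[1:]):
        if y0 <= year <= y1:
            frac = (year - y0) / (y1 - y0)
            return census[y0] + frac * (census[y1] - census[y0])
    raise ValueError("year lies outside the range of census years")


def crude_rate_per_100k(deaths, population):
    """Crude mortality rate expressed per 100,000 population."""
    return deaths / population * 100_000


# Hypothetical census counts, for illustration only.
census = {1980: 1_000_000, 1990: 1_200_000}
population_1985 = interpolate_population(census, 1985)
rate = crude_rate_per_100k(11, population_1985)
```

Interpolating the denominator year by year keeps the rates comparable across a long record in which the population changes appreciably.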

Strong seasonal cycles in mortality rates could bias an analysis of heat-related mortality. A common method to remove the inherent seasonality is to convert daily mortality counts into daily mortality anomalies or excess mortality by subtracting from daily mortality counts a stable mortality baseline for each day (i.e. an ‘observed−expected’ method; Guest et al. 1999; Dessai 2002). Excess mortality was calculated using a 31-day moving average (Rooney et al. 1998; Dessai 2002), thus standardising the dependent variable used in the analysis (excess mortality) by removing any long-term trends in death rates (Davis et al. 2003). Excess mortality approximates heat-related deaths for temperatures above, or equal to, the threshold temperatures identified below (Davis et al. 2003; Dessai 2002; Páldy et al. 2005). Although the calculation of excess mortality facilitates the comparison of mortality between cities (Davis et al. 2004), strictly speaking the results of this study should be considered as city-specific because the data was not age-standardised to account for differences in demographics between cities. The data required for this was not available at an affordable cost. Nevertheless, non-adjusted rates have previously been used in climate change–health impact assessments (Dessai 2002, 2003).
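The ‘observed−expected’ calculation can be illustrated with a short sketch. It assumes a centred 31-day window that is truncated at the ends of the series; the text does not state whether the moving average was centred or how the edges were handled, so both choices, along with the function names and the synthetic data, are assumptions made for illustration.

```python
def moving_average_baseline(deaths, window=31):
    """Centred moving average; the window is truncated at the series ends."""
    half = window // 2
    baseline = []
    for i in range(len(deaths)):
        lo = max(0, i - half)
        hi = min(len(deaths), i + half + 1)
        segment = deaths[lo:hi]
        baseline.append(sum(segment) / len(segment))
    return baseline


def excess_mortality(deaths, window=31):
    """Observed minus expected (baseline) deaths for each day."""
    baseline = moving_average_baseline(deaths, window)
    return [obs - exp for obs, exp in zip(deaths, baseline)]


# Synthetic series: a flat 10 deaths per day with one 3-day spike.
daily_deaths = [10] * 60
daily_deaths[30:33] = [25, 30, 20]
excess = excess_mortality(daily_deaths)
```

Note that a spike raises its own baseline slightly, so excess deaths computed this way are somewhat conservative during the event itself.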

Meteorological data

Temperature has been shown to be the dominant climate predictor of mortality (WISE 1999) so daily T max was obtained for the weather stations in Table 1. Also, after consideration of the relatively lower reliability of projections of other meteorological variables, such as humidity, from climate models (Covey et al. 2001; Sun et al. 2003)—with which the relationships would ultimately be used—it was deemed appropriate to use only T max. Missing data was replaced by linear interpolation. Data for Dallas and Boston were converted from degrees Fahrenheit to degrees Celsius. It is acknowledged that outdoor temperature measurements are not a direct representation of the climate conditions within buildings, where the majority of deaths occur (Kilbourne 1997) but widespread data on this spatial scale is non-existent. Table 2 presents the descriptive statistics for each city’s temperature distribution.

Table 2 Descriptive statistics for summer maximum temperature (T max) and ‘threshold temperatures’ for each city over the periods of record. SD Standard deviation

Model construction and validation

Model construction and identification of threshold temperatures followed the methodology of Dessai (2002), to which the reader is referred for a detailed description. The aggregate dose–response relationship between temperature and mortality was examined by grouping excess mortality into 2°C T max class intervals and calculating the number of excess deaths per day for each interval; 2°C class intervals were used (after Guest et al. 1999) instead of 1°C intervals (Dessai 2002) because this smoothes the high variability in daily mortality sometimes evident at higher temperatures. The observed threshold temperatures are illustrated in Table 2. It is acknowledged that using 2°C class intervals reduced the resolution at which thresholds were identified but this was not a major concern because the purpose was to allow a comparison of thresholds across cities in a relative sense. Non-linear regression analyses were performed on the data above the city-specific thresholds. All temperature intervals were used in the regression for Budapest because there were only four intervals above the threshold. Bootstrapped estimates of the 95% confidence intervals were also calculated. No more than four additive constants were included as parameters in any of the regression models because they can be meaningless in terms of physical interpretability and increase the risk of over-fitting (Glick 1978).
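The grouping step can be sketched as below. This is an illustrative reading only: it assumes class intervals aligned on even degrees (24–26°C, 26–28°C, and so on) and uses made-up values; the alignment actually used, and the threshold-identification and regression steps of Dessai (2002), are not reproduced here.

```python
import math
from collections import defaultdict


def excess_by_interval(tmax, excess, width=2.0):
    """Mean excess deaths per day within each T max class interval.

    Returns {interval lower bound: mean excess deaths per day}.
    """
    grouped = defaultdict(list)
    for t, e in zip(tmax, excess):
        lower = math.floor(t / width) * width  # lower bound of the interval
        grouped[lower].append(e)
    return {lo: sum(v) / len(v) for lo, v in sorted(grouped.items())}


# Made-up daily values, for illustration only.
tmax = [24.5, 25.9, 26.1, 27.8, 28.0, 29.5, 30.2, 31.9]
excess = [-1.0, 0.0, 0.5, 1.5, 2.0, 4.0, 6.0, 10.0]
profile = excess_by_interval(tmax, excess)
```

The resulting interval means form the aggregate dose–response profile; wider intervals give smoother means at the cost of coarser threshold resolution, which is the trade-off noted above.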

Split-sample validation (Snee 1977) was undertaken by splitting each city’s time-series in half to create respective ‘calibration’ and ‘validation’ samples (Camstra and Boomsma 1992; MacCallum et al. 1994). The entire periods available that were initially used for model calibration are hereafter referred to as the ‘whole periods’ and the calibration/validation periods are preceded by the years used for calibration/validation (e.g. Boston 1975–1986 calibration). The calibration periods used can be seen in Tables 3 and 4. Non-linear regression was performed on each city’s calibration dataset and the resultant models validated through correlation and residual analysis—a procedure also performed on the whole periods. A detailed description of this method is provided by Dessai (2002). The calibration and validation samples were then swapped and the procedure repeated.
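The split-and-swap procedure can be sketched as follows. To keep the example short, a straight-line least-squares fit stands in for the city-specific non-linear forms of Table 3, and validation is reduced to a Pearson correlation between observed and predicted values; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def split_sample_validate(x, y):
    """Calibrate on one half, validate on the other, then swap the halves."""
    mid = len(x) // 2
    first, second = (x[:mid], y[:mid]), (x[mid:], y[mid:])
    scores = []
    for (cal_x, cal_y), (val_x, val_y) in ((first, second), (second, first)):
        slope, intercept = fit_line(cal_x, cal_y)
        predicted = [slope * t + intercept for t in val_x]
        scores.append(pearson(val_y, predicted))
    return scores


# Made-up series in time order: T max and excess deaths per day.
tmax = [28, 30, 32, 34, 29, 31, 33, 35]
excess = [1.0, 2.1, 2.9, 4.2, 1.4, 2.6, 3.4, 4.6]
scores = split_sample_validate(tmax, excess)
```

Splitting by time rather than at random means each model is tested on years it has never seen, which also exposes any drift in the exposure–response relationship between the two halves.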

Table 3 Regression forms and coefficients, R2, adjusted R2 (\(R^{2}_{a}\)), F-statistics, and statistical significance of city-specific non-linear regressions
Table 4 The heat waves selected to examine mortality displacement in each city, total excess, deficit, net excess, and short-term mortality displacement for the heat wave and following 12 days

Lag effects and mortality displacement

To examine the possible influence of lag effects for each city, excess deaths per day were calculated as before, but for different lags up to the 12th previous day’s T max interval. Further lags were not examined because there is a risk of finding non-causal relationships due to chance or seasonal autocorrelation (Ballester et al. 1997). The lag period was limited to 12 because this is central in the broad range of lags examined in the literature, e.g. up to a maximum of 3-day (Hajat et al. 2002), 14-day (Ballester et al. 1997), 20-day (Braga et al. 2001) and 25-day (Pattenden et al. 2003) lags have been examined. A local linear regression smoothing surface with a bandwidth multiplier of 1.00 and normal kernel (Simonoff 1996) was fitted to the lag data to illustrate the basic relationship.

Six ‘heat waves’ were selected, one from each city, to examine mortality displacement. No formal international definition of a heat wave exists so these were defined as periods lasting three or more consecutive days when the daily T max was equal to, or greater than, the 95th percentile of summer T max over the whole period of record. Heat waves lasting more than 2 weeks according to this definition were excluded from the mortality displacement analysis because it has been suggested that short-term mortality displacement can be obscured by such lengthy heat episodes (Rooney et al. 1998). The 95th percentile was chosen instead of the 99th because this allows the inclusion of warmer days surrounding the main extremes that might still be associated with increases in mortality. Defining heat waves in this manner yielded several heat waves for each city although only one was observed for Sydney. For the remaining cities, the heat wave that included the highest daily maximum temperature for all heat waves identified for that city was selected for the mortality displacement analysis.
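The heat-wave definition above translates into a simple run-detection routine, sketched here. The percentile is computed with the nearest-rank method, which is an assumption (the text does not say which percentile estimator was used), and the function names and temperature series are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def find_heat_waves(tmax, threshold, min_days=3, max_days=14):
    """Return inclusive (start, end) index pairs of qualifying hot runs.

    A run qualifies if T max >= threshold on at least `min_days` and at
    most `max_days` consecutive days.
    """
    waves, start = [], None
    # A sentinel below any threshold closes a run that reaches the series end.
    for i, t in enumerate(list(tmax) + [float("-inf")]):
        if t >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if min_days <= i - start <= max_days:
                waves.append((start, i - 1))
            start = None
    return waves


# Made-up daily T max values; a threshold of 35 is supplied directly here.
tmax = [30, 31, 36, 37, 38, 30, 36, 36, 29, 35, 36, 37, 39, 30]
waves = find_heat_waves(tmax, threshold=35)

# Select the heat wave containing the highest daily T max.
selected = max(waves, key=lambda w: max(tmax[w[0]:w[1] + 1]))
```

In practice the threshold would come from `percentile_nearest_rank(summer_tmax, 95)` applied to the whole period of record, where `summer_tmax` is the city's summer T max series.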

Results

The non-linear regression analysis for each city yielded the associations presented in Table 3 and Fig. 2. The associations are all statistically significant (P < 0.01). The very high R2 values may partly reflect over-fitting and the omission of other confounders from the models. However, adjusted R2 (\(R^{2}_{a}\)) values were also calculated and remained high, suggesting that over-fitting was not a major problem.

Fig. 2 Modelled temperature–mortality associations for each city with extrapolations of 2°C beyond the model calibration temperature with 95% confidence intervals (CI) to give an indication of model uncertainty

The threshold temperature at which heat-related deaths become apparent tends to be lower in the cities with a lower mean summer T max than cities with a high mean summer T max. Plots of threshold temperature against summer mean T max and variance for each city (Fig. 3) illustrate that mean summer temperature is strongly associated (P < 0.01) with threshold temperature. A weak inverse but statistically insignificant relationship between threshold and summer T max variance is also evident.

Fig. 3 Associations between summer mean and standard deviation (SD) of T max and threshold temperature, and the difference in modelled excess deaths for the two highest T max intervals for each city (an estimate of the sensitivity of the temperature–mortality relationship)

Above the city-specific threshold temperatures, the temperature–mortality relationships are not homogeneous across cities. Boston is a city with one of the lowest mean summer T max values but the highest summer T max variance (Table 2) and presents one of the steepest curves. In contrast, Dallas and Sydney, which present the lowest summer T max variances, exhibit the least steep curves. The curve for Lisbon is surprising because, in contrast to Boston, the city presents one of the highest mean summer T max values and lowest summer T max variance. Lisbon’s temperature distribution was positively skewed (Table 2), such that the higher temperatures in the distribution were less common, as was confirmed by histograms (not shown). When the higher temperatures actually occurred, it is possible heat-related mortality was high because the population was less acclimatised to the occurrence of these infrequent temperatures. For example, the two highest T max intervals occurred only five times over the whole period in Lisbon, but in Boston they occurred 33 times. The mean number of heat-related deaths per day for Lisbon during these intervals was approximately four times greater than for Boston.

For each city, summer T max mean and variance were plotted against the difference between the modelled heat-related deaths for the two highest observed T max intervals, which was considered representative of the sensitivity of the temperature–mortality relationship (Fig. 3). Neither relationship was statistically significant.

There is evidence that the magnitude and variability of temperature will increase with climate change (McGregor et al. 2005; Beniston 2004; Meehl and Tebaldi 2004; Schär et al. 2004), so the curves were extrapolated by 2°C to examine the relationships beyond the temperature ranges for which the respective models were calibrated (Fig. 2). The Lisbon model appeared extremely sensitive to an increase in temperature, whereby a 2°C increase effectively increased the number of heat-related deaths fourfold, relative to heat-related mortality at 40°C. The models for the other cities were less sensitive, but it would be unreasonable to extrapolate beyond 2°C for use with climate change scenarios, and caution would have to be exercised when applying the Lisbon model, perhaps not extrapolating beyond 1°C.

Figure 4 illustrates that the temperature–mortality relationships become steeper with decreasing lag for each city, which justifies using unlagged data for model construction (following Donaldson et al. 2001, 2003). Interestingly, an inverse relationship between temperature and mortality at lags beyond 3 days for the higher temperatures of the distribution was apparent, most evidently for Dallas. This is most likely mortality displacement. One ‘heat wave’ was selected for each city to further examine this phenomenon. The heat waves are presented in Table 4 and were confirmed in the literature for Boston (CLIMB 2004), Lisbon (Dessai 2002), and London (Johnson et al. 2005). Time-series plots indicate mortality deficits (displacement) in the days after the heat waves (Fig. 5). Twelve days after the heat wave, daily excess mortality generally stabilises around the baseline. Therefore, for each heat wave and the following 12 days, the total number of excess deaths, deficit deaths, the net effect, and short-term mortality displacement contribution (deficit over excess) were calculated, following Le Tertre et al. (2006). The results are presented in Table 4. With the exception of Budapest, short-term mortality displacement was highest for Dallas and Sydney and lowest for Boston and Lisbon, the cities with some of the weakest and strongest temperature–mortality relationships, respectively (Fig. 2).
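The displacement quantities can be illustrated with a short sketch. It assumes that ‘excess’ is the sum of positive daily excess mortality and ‘deficit’ the absolute sum of negative daily values over the heat wave plus the following 12 days; this is one plausible reading of the ‘deficit over excess’ description rather than a confirmed reproduction of Le Tertre et al. (2006), and the function name and data are hypothetical.

```python
def displacement_summary(excess_series, wave_start, wave_end, follow_up_days=12):
    """Excess, deficit, net excess and displacement ratio for one heat wave.

    `wave_start` and `wave_end` are inclusive indices into `excess_series`.
    """
    window = excess_series[wave_start:wave_end + 1 + follow_up_days]
    total_excess = sum(v for v in window if v > 0)
    total_deficit = -sum(v for v in window if v < 0)
    ratio = total_deficit / total_excess if total_excess > 0 else float("nan")
    return {
        "excess": total_excess,
        "deficit": total_deficit,
        "net_excess": total_excess - total_deficit,
        "displacement": ratio,
    }


# Made-up daily excess deaths; the heat wave spans indices 1 to 3.
series = [0, 5, 8, 7, -2, -3, -1, 0, -1, 0, 0, 0, -1, 0, 0, 0, 0]
summary = displacement_summary(series, wave_start=1, wave_end=3)
```

In this made-up case 8 of the 20 excess deaths are offset by later deficits, giving a short-term displacement of 40%.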

Fig. 4 Excess deaths per day (per 100,000) for each T max interval over lag periods up to 12 days before (summer only). Note the vertical scales are different

Fig. 5 Daily excess mortality (primary y-axis) and T max (secondary y-axis) for the heat waves selected for each city. The dates of the last day of each heat wave and the 12th day after this are annotated

Regarding validation, all the regression models were statistically significant at the 0.05 or 0.01 level (Table 3). R2 values remained elevated but \(R^{2}_{a}\) values were approximately 30% less than R2 for the two Dallas models and the Sydney 1996–2003 model, suggesting that they may have been over-fit. Plots of the residuals by T max above the thresholds for each city’s two respective calibration periods are shown in Fig. 6. All the plots indicated a good residual spread with no strong relationships between T max and residual deaths, implying that the models were satisfactory at predicting heat-related deaths at all temperatures above the city-specific thresholds.

Fig. 6 Residual excess deaths for each city for the models calibrated on the calibration/validation samples. The residual mean and SD were computed for all points on the plot (i.e. both validation periods combined)

However, a number of anomalous residuals were identified due to the models’ inability to account for short (1–2 days) lag effects and/or infrequent and sudden increases then decreases in temperature during extreme temperature events (annotated in Fig. 6). With the exception of London, plots of residuals by T max for the models calibrated on the whole datasets (not shown) resulted in residual anomalies with similar locations and the same dates, implying the modelling method was unable to resolve those days.

Observed and predicted values were aggregated into 2°C T max intervals and the correlation between samples calculated; correlations were also calculated for the non-aggregated daily predictions (Table 5). At the aggregate level, almost all correlations were greater than 0.90 and statistically significant at the 0.01 level, indicating that the models were very good at predicting heat-related deaths. The exception was the Dallas 1975–1986 calibrated model, suggesting an inability of the model to be applied outside of the calibration range, and providing more evidence that this model was over-fit. Caution should be applied when interpreting the high correlations because they are due partly to not including other potentially confounding variables in the models. The correlations were weak-to-moderate when the daily predictions were used, but they were all still statistically significant (P < 0.01), again except for the Dallas 1975–1986 calibrated model.

Table 5 Pearson correlation coefficients and significance between observed (Obs) and predicted (Mod) mortality for mortality aggregated into 2°C T max intervals (Agg) and non-aggregated values (Daily), respectively; and observed and predicted heat-related deaths for each city (summer only; death rates are calculated using the mean population during the respective validation periods) with 95% confidence intervals shown in parentheses

Further validation involved examining how well the models predicted the total number of heat-related deaths above the threshold temperatures identified (Table 5). All the models performed well, generally predicting total heat-related death rates to within 10–15% of their observed values, and sometimes almost exactly.

Discussion

Common epidemiological methods for modelling temperature–mortality associations relate the log-expected death count to explanatory variables and confounders known to have some association with mortality, including air-conditioning, air pollution, access to healthcare, and day of the week for example (Páldy et al. 2005; O’Neill et al. 2003; Pattenden et al. 2003; Curriero et al. 2002; Hajat et al. 2002). Such models are useful in that they adequately address confounding factors but if future climate scenarios are to be applied, reliable projections of confounding variables in line with the climate scenarios used (e.g. SRES scenarios; Nakićenović and Swart 2000) are required and may be unavailable. Obtaining data on confounding variables is difficult and/or expensive, especially for several cities at once. There is also evidence that potential confounders have little association with temperature-related mortality, e.g. socio-economic status (Gemmell et al. 2000; Guest et al. 1999) and air pollution (Pattenden et al. 2003; Páldy et al. 2005; Hajat et al. 2002). For this reason, several climate change–health impact studies have developed simple temperature–mortality models with temperature and mortality as the independent and dependent variables, respectively (Davis et al. 2004; Dessai 2002; Guest et al. 1999)—a method that was also adopted here.

Application of this method meant the models were constructed from the aggregate dose–response relationship between temperature and mortality, such that they were based on a small sample of independent–dependent variables or observations (less than 10) but in some cases used up to four constants to fit the model. It has been suggested that 10–15 observations should exist for each constant or there is a risk of over-fitting the models to the data (Green 1991; Peduzzi et al. 1996). The implications are that over-fitted models may perform extremely well on the calibration data but not on non-calibration data. However, split-sample validation illustrated that the modelling techniques generally performed well outside of the calibration range, which is evidence that over-fitting had not occurred. One exception was the Dallas 1975–1986 model.

The over-optimism associated with over-fit models can also be better understood by using shrinkage techniques such as the adjusted R2 (\(R^{2}_{a}\)), which is dependent on the number of estimated parameters (p) as well as the proportion of total sum of squares explained (Dunn et al. 2003). R2 remains constant or increases as p increases but \(R^{2}_{a}\) can decrease and so provides a better estimate of any over-optimism. For example, note the respective differences between the \(R^{2}_{a}\) and R2 for the Dallas 1975–1986 and Sydney 1996–2003 models in Table 3, which suggest some over-fitting. The R2 values presented by Dessai (2002) are therefore an over-optimistic representation of model quality and hide the possibility that the models were over-fit. Bootstrapping (Efron and Tibshirani 2003) is another shrinkage technique and was applied here. It is acknowledged that over-fitting is a potential problem in studies such as the present study and that of Dessai (2002) but validation and shrinkage techniques can help to understand the degree to which this is occurring and so should always be applied. The results suggested that the Dallas 1975–1986 model was the only over-fit model because the correlations were not statistically significant and the \(R^{2}_{a}\) values were relatively low compared to R2.
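For reference, the adjusted R2 is commonly defined as shown below. This is the standard textbook form, given here as an assumption; the exact variant used to produce Table 3 is not stated. Here n denotes the number of observations (temperature intervals) and p the number of estimated parameters excluding the intercept.

```latex
\[
R^{2}_{a} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1}
\]
```

As a purely illustrative calculation (values not taken from Table 3), n = 8, p = 4 and R2 = 0.95 give \(R^{2}_{a} = 1 - 0.05 \times 7/3 \approx 0.88\), showing how strongly the penalty bites when observations are few relative to parameters.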

The threshold temperatures identified are in line with the findings of others. Kalkstein and Davis (1989) estimated a threshold for Dallas at 40°C; 38°C was evaluated in this study. Dessai (2002) obtained a threshold of 29°C for Lisbon; 28°C was obtained in this study. Guest et al. (1999) observed that aggregated excess deaths became positive beyond 24°C for Sydney over the period 1979–1990, which is 2°C lower than in this study (which examined the period 1988–2003), perhaps due to temporal acclimatisation or increased air-conditioning use since 1990. There is evidence to suggest reduced mortality risk with increased air-conditioning use; for example, a reduction in heat-related deaths of approximately 21% may have occurred in New York City from 1964 to 1988 because of increased access to air conditioning (Kalkstein 1993).

Daily minimum temperature (T min) was obtained to allow the estimation of temperature thresholds in terms of mean temperature (T mean) to allow comparison with the results of others not using T max. The T mean threshold for London was 19°C—the same value obtained by Hajat et al. (2002); Pattenden et al. (2003) estimated it as 18°C. Donaldson et al. (2001) estimated the threshold as 18.6°C for England and Wales. Curriero et al. (2002) have shown Boston to have a T mean threshold of 21°C, compared to 22°C found in this study. For Budapest, Páldy et al. (2005) observed that the temperature–mortality association was much steeper above 22°C; the T mean threshold obtained here was 23°C. Slight differences in the cities’ threshold temperatures can be expected across studies because of the use of different study periods, the lower resolution associated with using 2°C T max intervals, and the different methods of estimating excess deaths. A standard method for calculating these parameters would therefore aid inter-study comparisons in the future. Nevertheless this study is advantageous because it applied a common methodology across all six cities and the results largely corroborate the findings of others that have used various methodologies.

Figure 3 illustrates that threshold temperatures were higher in cities with a higher mean summer T max. It could also be inferred that temperature–mortality relationships were steeper in cooler cities but this would be based on a very weak relationship. Nevertheless, there is evidence to support these findings. Curriero et al. (2002) compared 11 US cities, illustrating that threshold temperatures were higher in warmer southern cities where the temperature–mortality association was less sensitive than in cooler northern cities. Donaldson et al. (2003) observed steeper temperature–mortality relationships coupled with lower thresholds in cooler climates by examining southern Finland, south-east England, and North Carolina (USA).

The role of temperature variability was shown to be insignificant. This is surprising given that Braga et al. (2001) observed that, at 30°C, the highest risk of mortality across 12 US cities was in cities with a higher variance of summertime temperature, and Chesnut et al. (1998) illustrated that summer temperature variability explained most of the variation in summer heat-related mortality across 44 US cities. These results suggest that acclimatisation occurs in areas that are consistently hot, although it is likely that a higher summertime T max variance puts the population at higher risk of heat-related mortality, perhaps due to an inability to acclimatise to the fluctuating temperature, which manifests itself as a lower threshold temperature and a steeper temperature–mortality relationship. There is no physiological evidence to support this, although the conjecture is often used to explain the relationship between temperature variability and heat-related mortality (Chesnut et al. 1998; Kalkstein 2000). It should also be noted that acclimatisation to fluctuating temperatures may be less of an issue where high air-conditioning use is common.

Linear fits were applied in Fig. 3 because previous studies (Braga et al. 2001; Chesnut et al. 1998) have observed the relationships to be linear. However, these studies included 12 and 44 cities, respectively, in their analyses. Stronger relationships may have been observed here if more cities had been included in the analysis. It is also possible that the relationships reflect differences that have not been controlled for, such as housing conditions (age, structure, air-conditioning use) or the prevalence of diseases associated with heat-related mortality. The roles of summer mean temperature and temperature variability in heat-related mortality receive little attention in the literature and further studies would be beneficial, especially as future increases in temperature variability are projected from climate models (Meehl and Tebaldi 2004; Schär et al. 2004).

The models constructed focus on maximum temperature and have not included diurnal temperature range or minimum temperature. There is evidence that elevated night temperatures (minimum temperatures) do not allow the body to recover from the high temperatures experienced during the day, increasing mortality risk (Besancenot 2002; Hajat et al. 2002). Minimum temperatures have increased more strongly than maximum temperatures over the last 30 years (WHO 2003a); a pattern expected to continue with climate change (Beniston and Diaz 2004). This suggests the models may not be capturing the complete relationship between temperature and mortality and this should be considered when interpreting the results from Part 2 of this study. However, the importance of diurnal temperature range is not definitive—Curriero et al. (2002) demonstrated that a daily diurnal temperature-range variable was unable to significantly predict mortality across 11 US cities.

The models have been constructed from time series lasting at least 16 years, over which time improvements in healthcare and infrastructure have occurred. These factors have not been accounted for in the study, which means it has been assumed that exposure–response relationships have remained constant over time. Davis et al. (2004) state that such stationarity is often assumed in assessments such as this. However, Davis et al. (2003) have demonstrated that the US population has become less sensitive to hot and humid conditions over the past 35 years, despite increasingly ‘stressful’ weather. This has been due to improved medical care, increased air-conditioning use and better public-awareness programs concerning heat stress. Factors such as these might explain why, with the exception of Sydney, the models calibrated on the data of the second calibration period often show a flatter increase in mortality with temperature or a higher threshold for the mortality increase (Table 3), suggesting that the population is less sensitive to high temperatures. Considering this point, it is possible the models constructed are not an exact representation of present-day temperature–mortality associations because the model calibration periods include historical data from when the population may have been more sensitive to heat.

Furthermore, some climate change may have already occurred over the calibration periods, especially when more recent data was used for model calibration, as in the case of Sydney. Also, increased urbanisation will have increased the intensity of the urban heat island, the effect of which would be more apparent for city centre meteorological stations (e.g. Budapest) than those situated at airports (e.g. London). It was decided not to account for any increasing urbanisation or climate change that may have resulted in a climatic bias over time because the temperature observations for each city should represent the actual conditions experienced by their respective residents (Davis et al. 2004). However, because the models were calibrated from aggregated data, any acclimatisation that may have occurred due to these increased temperatures will be masked. An important issue concerning the scale of the temperature measurements is that the models were calibrated from point-source measurements but the climate model data applied to them in Part 2 is gridded, i.e. it has a different spatial resolution and is not influenced by urban heat islands. In Part 2, gridded observational data is compared with the point-source measurements used for model calibration here, and the implications of the comparisons are discussed.

Mortality displacement is often observed during heat waves but actual estimates of its contribution are not as common in the literature as perhaps they should be. With the exception of Budapest, short-term mortality displacement was highest in Dallas and Sydney, the cities with the weakest temperature–mortality relationships, suggesting that in these cities any heat effects were predominantly mortality displacement. It is conjectured this may be because the remaining ‘healthy’ population is well acclimatised to the heat. Supporting this conjecture, the lowest short-term mortality displacements were for Boston and Lisbon, the cities with the strongest temperature–mortality relationships, where the heat significantly affects the ‘healthy’ population too because it is less acclimatised to the heat. However, there is no empirical evidence to support this and the result for Budapest would be contradictory. An explanation for the high displacement presented by Budapest may be that the heat wave analysed was preceded by seven other hot days when the temperature was above the 90th percentile of daily T max, meaning the displacement may be due not only to the 3-day heat wave identified. Indeed, Hajat et al. (2002) showed that mortality displacement for the London August 1995 heat wave may have been influenced by another heat wave 2 weeks earlier. Analysis of further heat waves with cooler preceding days for Budapest would help to illustrate whether high mortality displacement is a common feature of heat waves in Budapest or whether it was due only to the antecedent temperatures.

Analysis of the other heat waves identified for each city would improve the reliability of the mortality displacement estimates. An extensive analysis of mortality displacement is not the focus of this study, however. Nevertheless, previous studies have estimated mortality displacement for one city based on only one heat wave (Gouveia and Fletcher 2000; Le Tertre et al. 2006; Sartor et al. 1995). It is acknowledged that it is hazardous to make general conclusions relating mortality displacement to the sensitivity of temperature–mortality associations based solely on one heat wave for each city because the heat wave chosen for analysis may not be typical of all heat waves experienced by that population. Also, factors other than temperature may influence the degree of mortality displacement and have not been accounted for in this study. For example, Hajat et al. (2005) demonstrated that mortality displacement was lowest in Delhi, higher in São Paulo and highest in London. London included the oldest and richest population; Delhi was the poorest and included the highest proportion of deaths among children. São Paulo’s demographic and epidemiologic profile was between that of London and Delhi.

The large range in short-term mortality displacement observed in this study, between 38% for Boston and 76% for Dallas, is not unusual—estimates varied between 1% and 30% for nine French cities during the summer heat wave of 2003 (Le Tertre et al. 2006), and Kalkstein (1993) estimated the value as between 25% and 50% for the United States. Around 15% of total deaths during the Belgium 1994 heat waves were estimated to be due to mortality displacement (Sartor et al. 1995), and Gouveia and Fletcher (2000) estimated mortality displacement to account for about 50% of total deaths during the 1994 Czech Republic heat waves. Further research is required to understand what factors influence the differences in the estimates. To some extent, estimates will vary according to the mortality baseline used to calculate excess deaths. The results presented are likely to be overestimates because a 31-day moving average of mortality was used to calculate the excess, meaning it is influenced by extremes within the span. Because displacement estimates are often based on past events they are not representative of the contribution that may occur in a present heat wave. Furthermore, estimates can be expected to change temporally because mortality displacement theoretically depends on the epidemiologic profile of the population at risk (Hajat et al. 2006).
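The sensitivity of displacement estimates to the baseline can be illustrated with a minimal sketch. It assumes, since the definition is not restated here, that short-term displacement is the deficit in deaths in the days following a heat wave expressed as a percentage of the excess during it. The function names, the 7-day follow-up period and the synthetic daily counts are hypothetical and are not taken from the study.

```python
from statistics import mean


def moving_average_baseline(deaths, window=31):
    """Centred moving average of daily deaths; the window shrinks at the series edges."""
    half = window // 2
    return [mean(deaths[max(0, i - half): i + half + 1]) for i in range(len(deaths))]


def displacement_percent(deaths, baseline, wave, follow_up_days=7):
    """Post-event deficit as a percentage of the excess during the heat wave.

    `wave` is an inclusive (start, end) pair of day indices. This definition
    is an assumption made for illustration only.
    """
    start, end = wave
    excess = sum(deaths[i] - baseline[i] for i in range(start, end + 1))
    last = min(end + follow_up_days, len(deaths) - 1)
    # Only days that fall below the baseline count towards the deficit.
    deficit = sum(max(baseline[i] - deaths[i], 0) for i in range(end + 1, last + 1))
    return 100.0 * deficit / excess


# Synthetic series: 100 deaths/day, a 3-day heat wave, then a 3-day dip.
deaths = [100] * 20 + [130, 140, 130] + [90, 90, 95] + [100] * 20
wave = (20, 22)

flat = displacement_percent(deaths, [100] * len(deaths), wave)
smoothed = displacement_percent(deaths, moving_average_baseline(deaths), wave)
print(f"pre-event baseline: {flat:.1f}%  31-day moving average: {smoothed:.1f}%")
```

In this contrived series the moving-average baseline is pulled upwards by the heat-wave days inside its span, which shrinks the excess and enlarges the apparent deficit, so the displacement percentage is larger than with a flat pre-event baseline.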

The models were validated by split-sample validation (Snee 1977). The statistical significance of the calibration/validation sample models, which were calibrated on half the available data and validated on the remaining half and vice versa, indicated that the regression techniques were valid outside of the calibration range for each city. The respective mean and SD of residual deaths were generally within the values of −1.6 to 0.6 and 9.8 to 13.2 obtained by Dessai (2002). A more complex distributed lag model (Hajat et al. 2005; Schwartz 2000) might have resolved the anomalies in Fig. 6, as might calibrating the models only on extreme temperatures such as those above the 99th centile. However, the inclusion of moderate temperatures is considered advantageous because many possible heat-related deaths may be ignored if these are not included. For example, Hajat et al. (2006) have shown that the burden of heat-related deaths is larger during high-frequency periods when temperatures are moderate than during heat waves. Other than the anomalies highlighted, the residuals had a good spread, indicating the models performed similarly well at all temperatures.
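The split-sample procedure can be sketched as follows. This is an illustration under stated assumptions rather than the regression used in the study: a straight-line fit of deaths on T max stands in for the city models, and the function names and the synthetic data are hypothetical.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def split_sample_validate(tmax, deaths):
    """Calibrate on one half, validate on the other, then swap the halves."""
    half = len(tmax) // 2
    halves = [(slice(0, half), slice(half, None)),
              (slice(half, None), slice(0, half))]
    results = []
    for cal, val in halves:
        slope, intercept = fit_line(tmax[cal], deaths[cal])
        predicted = [slope * t + intercept for t in tmax[val]]
        residuals = [obs - pred for obs, pred in zip(deaths[val], predicted)]
        results.append({
            "slope": slope,
            "r": pearson(predicted, deaths[val]),
            "mean_residual": mean(residuals),
            "sd_residual": stdev(residuals),
        })
    return results


# Synthetic data (not from the study): deaths rise roughly 2 per degree C.
tmax = [24, 26, 28, 30, 32, 34, 25, 27, 29, 31, 33, 35]
deaths = [3, 8, 11, 16, 19, 24, 6, 9, 14, 17, 22, 25]
results = split_sample_validate(tmax, deaths)
for fold in results:
    print({key: round(value, 3) for key, value in fold.items()})
```

Each fold reports the correlation between modelled and observed values on the held-out half, together with the mean and SD of the residuals, which are the quantities compared with Dessai (2002) above.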

The models were able to estimate the moderate and extreme effects of heat on mortality for the whole period, as well as the calibration/validation periods, with moderate-to-high accuracy. Accuracy is defined as the correlation between modelled and observed values (Stewart 2000). Similar to Dessai (2002), correlations on a daily basis were lower than at the aggregate level, but were still statistically significant to at least the 0.05 level. The models were not intended to address the mechanisms affecting daily heat-related mortality—a limitation of many empirical statistical models such as these (McMichael et al. 2001). Instead, the aim was to produce models that reproduced the effects that climate has on heat-related mortality as a whole, for which aggregation was necessary (e.g. Davis et al. 2003; Guest et al. 1999). The lower correlations on a daily basis are a result of not accounting for these mechanisms, but the effects climate has on mortality are evident at an aggregated level, for which the correlations were higher. Although the correlations are very high, it is important to understand that they are due partly to not including other potentially confounding variables in the models.
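The difference between daily and aggregated correlations can be shown with a small sketch. It assumes that aggregation means averaging within 2°C T max intervals, which is suggested by the interval width mentioned earlier but is not confirmed here; the data are synthetic and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def interval_means(tmax, values, width=2):
    """Average `values` within consecutive T max intervals of `width` degrees."""
    groups = defaultdict(list)
    for t, v in zip(tmax, values):
        groups[int(t // width)].append(v)
    return [mean(groups[k]) for k in sorted(groups)]


tmax = list(range(20, 36))
modelled = [2 * t - 30 for t in tmax]
# Hypothetical observations: the modelled value plus alternating day-to-day noise.
observed = [m + (5 if i % 2 == 0 else -5) for i, m in enumerate(modelled)]

daily_r = pearson(modelled, observed)
aggregated_r = pearson(interval_means(tmax, modelled),
                       interval_means(tmax, observed))
print(f"daily r = {daily_r:.3f}, aggregated r = {aggregated_r:.3f}")
```

Because the day-to-day noise averages out within each interval, the aggregated correlation is higher than the daily one, even though the underlying model is the same.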

The models were also examined for how well they predicted the total number of heat-related deaths above the identified threshold temperatures (Table 5). These totals and the aggregated correlations are perhaps more useful measures of model reliability than the correlations based on daily predictions because they consider the total effect that the climate has on heat-related mortality rather than the effect of temperature on individual days, for which an ‘episode analysis’ method of assessment would be more appropriate (Páldy et al. 2005; Rooney et al. 1998). The modelled numbers of heat-related deaths were usually within 10–15% of the observed values for the periods modelled, similar to the findings of Dessai (2002). The modelled summer heat-related death rates per year were similar to the observed values, and are supported by previous studies. Kalkstein and Greene (1997) attributed 3.5 and 1.2 summer deaths per year (per 100,000) to heat for Boston and Dallas, respectively. Dessai (2002) attributed between 4.9 and 6.2 for Lisbon. Keatinge et al. (2000) estimated 40 (95% CI, 13, 68) heat-related deaths per million per year for London, comparable to the 28 (95% CI, 6, 68) estimated here.

Comparisons of modelled estimates with those published for documented heat waves further validate the models. Rooney et al. (1998) attributed 184 deaths to heat during the 1995 Greater London heat wave (30 July–3 August), representing a 23% excess. The London model predicted 170 (95% CI 76, 352) heat-related deaths for the same period, representing a 21% excess. Dessai (2002) showed that mortality increased by 29% during the Lisbon 1991 heat wave (9–18 July); the Lisbon regression model predicted a 38% (95% CI 22, 66) increase over the same period. For Budapest, Páldy et al. (2005) estimated an excess mortality of 15% (95% CI 7, 23) for the August 1998 heat wave (3–5 August), 12% (95% CI 5, 19) for the August 1994 heat wave (30 July–8 August), and 15% (95% CI 7, 22) for the August 2000 heat wave (20–22 August). The Budapest regression model predicted the increases as 14% (95% CI 4, 25), 13% (95% CI 4, 24), and 17% (95% CI 5, 30), respectively, over the same periods.

The baseline selected to calculate excess deaths varies considerably across studies, so mortality estimates are sensitive to the methods used (WHO 2003b). For example, using the moving average of mortality as the baseline (Dessai 2002; Rooney et al. 1998) can result in underestimates of excess mortality because extreme events are included in the span. Conversely, using mortality from the previous year for the baseline (e.g. Conti et al. 2005) may overestimate excess if the previous year was mild. Such differences make comparisons between studies less reliable than if a standard methodology were employed. Johnson and Griffiths (2003) compared three methods of calculating seasonal excess mortality for England and Wales and observed a difference of over 10,000 excess deaths per year depending upon which baselines were chosen. Comparisons are also complicated by the length of period used for analysis. The results of Rooney et al. (1998) and Páldy et al. (2005) are episode analyses based on individual extreme temperature events, which are often criticised for overestimating mortality due to short-term mortality displacement. Although the comparisons made with other studies here indicate the developed models produce similar mortality estimates, it should be noted that the different methodologies used in different studies may have influenced the results.

The mortality estimates used for comparisons with the other studies cited above were calculated from the models that used the whole periods available for calibration. However, evaluating a model over the same data range that was used to calibrate it may produce an unfair indicator of model quality because it has been fine-tuned to the calibration data. For this reason, less weight should be given to the correlations and mortality estimates obtained from the whole models than to those from the calibration/validation models presented in Table 5. However, the fact that the correlations remained high and the mortality estimates were similar to the observations for the calibration/validation models did suggest the methodology could produce models that were valid outside of their calibration range, implying the whole models could be applied under climate change scenarios with confidence. Extreme temperature events will increase in frequency and magnitude with climate change, so, in common with Dessai (2002), the whole models will be used in Part 2 because they include all available extremes in the calibration, which gives a better representation of how mortality varies under such conditions. This was considered more important than the inherent limitations associated with validating such a model. Ideally, the models used in Part 2 would have been calibrated and validated on separate datasets, but until multiple long-term temperature–mortality datasets exist for a single location, or several nearby locations, such compromises are necessary.

Conclusions

Significant relationships between maximum temperature and heat-related mortality during the summer months have been observed and modelled for six cities. Threshold temperatures were significantly associated with mean summer T max but the role of summer temperature variability was not clear. Estimates of mortality displacement for each city have been provided, which are greater in the cities with the least sensitive temperature–mortality relationships. The research is advantageous because, unlike other assessments of the effects of climate change on heat-related mortality (Donaldson et al. 2001; Guest et al. 1999), the models have been validated. Split-sample validation procedures and comparisons with other published papers indicated that the models reproduced the effects of temperature on heat-related mortality well. Furthermore, this highlights the high degree of insight into temperature–mortality relationships that can be gained by adopting the relatively simple methodology applied here, which excludes possible confounding factors and concentrates purely on temperature and mortality. The models developed can be used reliably to examine the potential effects of climate change on summertime heat-related mortality in the six cities studied here.