Introduction

A large number of studies have investigated the effects of temperature on health (Gasparrini and Armstrong 2010), many focusing on the effects of high temperatures, which are often associated with climate change (Basu 2009). The health outcome assessed most often to date is mortality (Curriero et al. 2002; Analitis et al. 2008; Baccini et al. 2008), but hospital admissions (Michelozzi et al. 2007) and reproductive outcomes (Lajinian et al. 1997; Tam et al. 2008) have also been studied. The general pattern emerging from these studies is that the association of temperature with mortality is U- or J-shaped, i.e., increased mortality is associated with very high or very low temperatures (Curriero et al. 2002). Interestingly, the change point of the curve, i.e., the temperature corresponding to the minimum daily number of deaths, is lower in locations with colder climates, indicating that populations adapt to the prevailing meteorological conditions (Curriero et al. 2002; Baccini et al. 2008). Additionally, the effects of heat are more immediate than those of cold: cold effects are identified at longer lags (sometimes beyond 15 days), while heat effects are usually evident on the same day and persist for up to 3 days (Armstrong 2006; Analitis et al. 2008; Baccini et al. 2008).

Studies have indicated that humidity levels are linked to increased discomfort; it is therefore worth exploring whether the health effects of temperature differ according to humidity levels (Budd 2008; Vaneckova et al. 2011). Most health studies have either included a humidity variable in their models in addition to a temperature measure (Armstrong 2006; Basu et al. 2005; Braga et al. 2001; Curriero et al. 2002) or used an index combining one or more temperature and humidity variables (Analitis et al. 2008; Baccini et al. 2008; Michelozzi et al. 2007). Steadman (1979a, b) first introduced an index of “apparent temperature,” which included wind speed, temperature, and humidity. Several other indices, mostly based on temperature and humidity, were subsequently developed and described (Anderson et al. 2013). Although all of them have been introduced or used in environmental research, few have been applied in health studies. Most health studies employing an index have used apparent temperature as introduced by Kalkstein and Valimont (1986) (Anderson et al. 2013).

A limited number of studies have attempted to evaluate the performance of different ways of modeling temperature and humidity in estimating health effects. Kim et al. (2011) compared temperature and two indices in estimating heat-related effects in two Korean cities; Vaneckova et al. (2011) compared temperature and five indices, also in estimating heat-related effects, in Brisbane, Australia; Hajat et al. (2010) investigated how well the different methods used by heat watch warning systems worldwide agree in predicting days with excessive heat-related mortality; and Lin et al. (2013) investigated cold-related mortality models in several regions of Taiwan. The most extensive study to date is that of Barnett et al. (2010), covering 107 U.S. cities using annual data and comparing models with average, minimum, and maximum temperature and two composite indices. These studies reach no clear conclusion and report inconsistent results from models using various expressions of temperature and humidity, which may depend on the local conditions and the health outcomes analyzed. One important environmental variable and potential confounder is air pollution, as it is associated both with temperature and with many relevant health outcomes (Cheng and Kan 2011; Mackenbach et al. 1993; O’Neill et al. 2005; Ren et al. 2008; Stafoggia et al. 2008; Thurston and Ito 2001). As Europe is characterized by diverse and variable environmental conditions, a study investigating the performance of the various model specifications for temperature and humidity may provide useful insights.

We report here the first multi-city study in Europe investigating the performance of different methods of assessing temperature and humidity effects under different lag structures, for both cold and warm periods. The study includes data from Athens (a south-eastern city with a warm and dry climate), London (a north-western city with relatively cold and humid climatic conditions), and Rome (a southern city with a warm climate and higher relative humidity).

Materials and methods

Data

We used data from Athens (Greece) and London (U.K.) for 2000–2005 and from Rome (Italy) for 1999–2004. For each city, daily counts of all-cause mortality for all ages, excluding deaths from external causes (International Classification of Diseases, 9th Revision [ICD-9] codes < 800; 10th Revision [ICD-10] codes A00–R99), were collected. We also collected daily data on the mean, minimum, and maximum ambient temperature (°C) and the mean relative humidity (%). Hourly data on ambient and dew point temperature (°C) were also collected to allow the calculation of apparent temperature using the following formula:

$$ \mathrm{A}\mathrm{T}=-2.653+0.994\cdot T+0.0153\cdot {\left(\mathrm{D}\mathrm{T}\right)}^2, $$
(1)

where AT stands for apparent temperature in °C, T for ambient temperature in °C, and DT for dew point temperature in °C. We first calculated the hourly values of apparent temperature and then their daily mean, maximum, and minimum. We chose this composite index of ambient and dew point temperature, introduced by Kalkstein and Valimont (1986), because it has been the most widely used in epidemiological studies, as shown in the comprehensive review of Anderson et al. (2013), and because it correlates very well (correlation coefficient > 0.94) with most other indices that have been described in the literature but used less in health studies. Additionally, it is coherent with Steadman’s apparent temperature index under the usual weather conditions in the U.S. cities included in Anderson et al. (2013). The temperature and humidity levels in the three cities studied in the present paper overlap with those referred to as “usual” in the Anderson et al. paper.
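The order of operations described above (hourly apparent temperature first, daily summaries second) can be sketched in Python as follows. This is a minimal illustration, assuming two equal-length lists of hourly ambient and dew point temperatures for one day; the function names are hypothetical.

```python
from statistics import mean


def apparent_temperature(t, dt):
    """Apparent temperature (degC) from ambient t and dew point dt (degC), Eq. (1)."""
    return -2.653 + 0.994 * t + 0.0153 * dt ** 2


def daily_at_summary(hourly_t, hourly_dt):
    """Compute AT for every hour first, then the daily mean, maximum and minimum."""
    if not hourly_t or len(hourly_t) != len(hourly_dt):
        raise ValueError("expected two non-empty hourly series of equal length")
    hourly_at = [apparent_temperature(t, d) for t, d in zip(hourly_t, hourly_dt)]
    return {"mean": mean(hourly_at), "max": max(hourly_at), "min": min(hourly_at)}
```

Because DT enters Eq. (1) squared, applying the formula to daily means of T and DT would give a daily mean AT lower than or equal to the mean of the hourly values (the difference is 0.0153 times the within-day variance of DT), so the hourly-first order is more than a detail.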

Finally, to adjust for the potential confounding effects of air pollution, we collected time-series data on the main regulated pollutants, specifically the daily concentrations of nitrogen dioxide (NO2, μg/m3, 24-h mean), particulate matter with an aerodynamic diameter < 10 μm (PM10, μg/m3, 24-h mean), and ozone (O3, μg/m3, maximum 8-h moving average) from the fixed monitoring sites operating in each city. NO2 is an indicator of traffic-related pollution (Analitis et al. 2008; Chiusolo et al. 2011; Katsouyanni et al. 2001), thus adequately reflecting the major source of pollution in the cities involved, while O3 is a secondary pollutant whose formation is related to high temperature (Bell et al. 2005; Gryparis et al. 2004; Ren et al. 2008). The choice of NO2 and O3 was supported by the large number of monitors providing complete time series data: less than 1 % of days had missing data in all cities. PM10 was available from a smaller number of monitors and, for Athens, only for the period 2001–2005; however, since particulate pollution is very important for the study of health effects (WHO 2013), we chose to adjust for PM10 as well.

Data on the day of the week and on dates of bank holidays were also collected. These variables are known to be associated with mortality (Katsouyanni et al. 1996), since they affect health service provision and people’s response to feeling unwell (e.g., people may not see a doctor at the weekend but wait a day or so longer); controlling for them therefore accounts for an important source of variation in the temporal pattern of mortality. Moreover, the day of the week and bank holidays may modify population exposure to outdoor temperature, because time-activity patterns differ between weekdays, weekends, and holidays.

Methods

We applied Poisson regression models allowing for over-dispersion. The various temperature–humidity model specifications were compared on the basis of goodness-of-fit and partial autocorrelation criteria (Samoli et al. 2013; Touloumi et al. 2006). For each of the three cities, we compared models including the mean, minimum, or maximum ambient temperature together with mean relative humidity against models including the mean, minimum, or maximum apparent temperature.

The analysis was conducted separately for the cold period, defined as October through March, and the warm period, defined as April through September. The general form of the Poisson generalized linear models that were used was the following:

$$ \begin{array}{l} \log \left[E\left({Y}_t^c\right)\right]={\beta}_0^{cj}+NS\left({\mathrm{time}}_t^c,3\cdot 6\right)+{\displaystyle \sum_i{g}_i\left({x}_{it}^{cj}\right)}+{\displaystyle \sum_i{h}_i\left({\mathrm{RH}}_{it}^{cj}\right)}+\left(\mathrm{lag}01.PO{L}_t^c\right)\hfill \\ {}\kern5em +{\mathrm{HOL}}_t^c+{\mathrm{DOW}}_t^c,\hfill \end{array} $$
(2)

where \( E[Y_t^c] \) is the expected value of the Poisson-distributed variable \( Y_t^c \), the daily mortality count on day t in city c, with variance allowing for overdispersion (ϕ), \( \mathrm{Var}(Y_t^c) = \phi E[Y_t^c] \); \( NS(\mathrm{time}_t^c, 3 \cdot 6) \) represents the seasonality control, a natural cubic spline of time as a continuous variable (\( \mathrm{time}_t = 1, 2, \ldots, 2192 \)) with 3 degrees of freedom (df) per period (cold or warm) per year, i.e., 18 df per period for the 6 years under study; \( x_{it}^{cj} \) is temperature index j on day t in city c, and \( \mathrm{RH}_{it}^{cj} \) is the mean relative humidity. If index j was the mean, minimum, or maximum apparent temperature, the relative humidity term was excluded from the model. The summation index i refers to the lag structure of each specific model; when relative humidity is included, it has the same lag structure as temperature. The functions g and h describe the relation of mortality with the corresponding index and with mean relative humidity, respectively. The confounding effect of air pollution was taken into account by including, one pollutant at a time (NO2, PM10, or O3), the average concentration on the same day (lag 0) and the previous day (lag 1) as a linear term, \( \mathrm{lag01.POL}_t^c \), where POL is the corresponding pollutant. We also included a dummy variable for official holidays (\( \mathrm{HOL}_t^c \): 1 for official holidays other than Sundays, 0 for all other days) and dummy variables for the day of the week (\( \mathrm{DOW}_t^c \): six dummy variables with Sunday as the reference category).
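To make the non-meteorological terms of model (2) concrete, the following minimal Python sketch builds them for a single day. It assumes a list of daily pollutant concentrations indexed by day (with `None` for missing values) and a user-supplied set of holiday dates; all helper names are hypothetical, and the natural spline itself is left to standard statistical software.

```python
from datetime import date


def period(d):
    """'cold' for October-March, 'warm' for April-September."""
    return "cold" if d.month >= 10 or d.month <= 3 else "warm"


def dow_dummies(d):
    """Six day-of-week indicators (Monday..Saturday); Sunday is the reference (all zeros)."""
    # date.weekday() returns 0 for Monday ... 6 for Sunday
    return [1 if d.weekday() == k else 0 for k in range(6)]


def holiday_indicator(d, holidays):
    """1 for official holidays other than Sundays, 0 for all other days."""
    return 1 if d in holidays and d.weekday() != 6 else 0


def lag01(conc, t):
    """Mean concentration on day t (lag 0) and day t-1 (lag 1); None if unavailable."""
    if t < 1 or conc[t] is None or conc[t - 1] is None:
        return None
    return (conc[t] + conc[t - 1]) / 2


def seasonal_df(n_years, df_per_year=3):
    """Spline df for one period: 3 per year, e.g. 3 * 6 = 18 for six years."""
    return df_per_year * n_years


def n_days(start, end):
    """Length of the daily time index, inclusive of both end dates."""
    return (end - start).days + 1
```

For both study windows (2000–2005 and 1999–2004), `n_days` gives 2192, matching the range of the time variable in the text.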

It should be noted that in regression models, very highly correlated variables cannot be included simultaneously due to multicollinearity problems (Allen 1997); thus, we checked all correlations before building the final model.
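A minimal sketch of such a correlation check, assuming each candidate variable is a list of daily values aligned by date. The cutoff of 0.8 is a hypothetical choice for illustration only; the text does not state which threshold, if any, was applied.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def collinear_pairs(variables, cutoff=0.8):
    """List variable pairs whose |r| reaches the (hypothetical) cutoff."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(variables.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= cutoff:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged
```

Flagged pairs are candidates that should not enter the same regression model together.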

We used lag structures previously identified in the literature after verifying them with Distributed Lag Nonlinear Models (DLNMs) separately for the cold and warm periods, while the shape of the mortality–temperature/humidity associations was verified using generalized additive models (GAMs). Once these were determined, we used models of form (2) with specific summation indices i and functions g and h to compare the temperature/humidity model specifications. All models were fitted in R version 2.15.1 (R Development Core Team 2012). The modeling procedure for each step was as follows:

Choice of lag structures

Based on previous results (Analitis et al. 2008; Baccini et al. 2008; Goodman et al. 2004), we hypothesized that plausible lag structures for the cold period were as follows: a) the average exposure for the same and 13 previous days (lags 0–13) and, alternatively, the average exposure for the 13 previous days (lags 1–13) with exposure on the same day (lag 0) as a separate variable, b) the average exposure for the same and two previous days (lags 0–2), c) the average exposure for the 4-day period from the third to the sixth prior day (lags 3–6), d) the average exposure for the same and six previous days (lags 0–6), and e) the average exposure for the 7-day period from the seventh to the 13th prior day (lags 7–13); and for the warm period: a) the same-day exposure (lag 0), b) the previous-day exposure (lag 1), c) the exposure 2 days before (lag 2), d) the exposure 3 days before (lag 3), and e) the average exposure for the same and three previous days (lags 0–3) (Analitis et al. 2008; Baccini et al. 2008). To verify the temporal distribution of the effects on mortality, we applied DLNMs as implemented by Gasparrini et al. (2010) in R. Fitting these models requires specifying two bases: one for the dependency of mortality on temperature and one for the lag dimension of the temperature effect (Gasparrini 2011). Based on previous findings, we fitted a linear term for the association of each index with mortality during the cold period and a threshold term during the warm period (Analitis et al. 2008; Baccini et al. 2008). The threshold values for the DLNMs were chosen a priori by inspecting graphs of the association between each index and the daily number of deaths in each city (graphs not shown).
The lagged effect of temperature on mortality was modeled by a fifth-degree polynomial in the lag dimension, where the effect on a specific day was considered the result of at most 15 consecutive preceding days of exposure (the maximum lag reported in the literature to date) in both the cold and warm periods. When the index under study was the mean, minimum, or maximum temperature, we additionally included a term for the effect of relative humidity at lag 0, based on the results of previous studies (Barnett et al. 2010; Vaneckova et al. 2011).
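The candidate lag structures listed above are moving averages of daily exposure. A minimal Python sketch, assuming a daily exposure list `x` indexed by day with `None` for missing values (all names are hypothetical):

```python
def lag_average(x, t, first, last):
    """Mean of x over lags first..last for day t, i.e. days t-last..t-first.

    Returns None when the window starts before the series or contains a gap.
    """
    if t - last < 0:
        return None
    window = [x[t - lag] for lag in range(first, last + 1)]
    if any(v is None for v in window):
        return None
    return sum(window) / len(window)


# Candidate lag structures from the text, as (first lag, last lag).
# "1-13" is meant to be used together with "0" as a separate term.
COLD_LAGS = {"0-13": (0, 13), "1-13": (1, 13), "0-2": (0, 2),
             "3-6": (3, 6), "0-6": (0, 6), "7-13": (7, 13), "0": (0, 0)}
WARM_LAGS = {"0": (0, 0), "1": (1, 1), "2": (2, 2), "3": (3, 3), "0-3": (0, 3)}


def lagged_exposures(x, t, structures):
    """All candidate lagged exposures for day t."""
    return {name: lag_average(x, t, a, b) for name, (a, b) in structures.items()}
```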

The general form of the DLNMs was as follows:

$$ \log \left[E\left({Y}_t^c\right)\right]={\beta}_0^{cj}+NS\left({\mathrm{time}}_t^c,3\cdot 6\right)+p\left({\displaystyle \sum_{i=0}^{15}g\left({x}_{it}^{cj}\right)},5\right)+{\mathrm{RH}}_t^{cj}+{\mathrm{HOL}}_t^c+{\mathrm{DOW}}_t^c $$
(3)

as a specific case of (2) where the mean relative humidity was included as a linear term without time lag, while the g function was linear for the cold period and followed a threshold association for the warm period. The term \( p\left({\displaystyle \sum_{i=0}^{15}g\left({x}_{it}^{cj}\right)},5\right) \) accounted for the temporal distribution of temperature effect on mortality, where p was the 5th degree polynomial with a maximum time lag of 15 days. There was no control for the confounding effect of air pollution in these models.
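The polynomial lag term in (3) can be illustrated with a small cross-basis sketch in the spirit of a polynomial (Almon-type) distributed lag. Assumptions: raw, unscaled powers of the lag are used, g is either the identity (cold period) or a hinge at a threshold τ (warm period), and all names are hypothetical. The dlnm package may scale or center its polynomial basis differently, so coefficients from this sketch are not directly comparable to dlnm output.

```python
def hinge(v, tau):
    """Threshold transform for the warm period: 0 below tau, v - tau above."""
    return max(v - tau, 0.0)


def cross_basis_row(x, t, max_lag=15, degree=5, g=lambda v: v):
    """Polynomial distributed-lag columns for day t.

    Column k is z_k(t) = sum_{l=0}^{max_lag} g(x[t - l]) * l**k, k = 0..degree.
    With coefficients theta_k on these columns, the implied effect at lag l is
    beta_l = sum_k theta_k * l**k.
    """
    if t - max_lag < 0:
        return None
    return [sum(g(x[t - l]) * l ** k for l in range(max_lag + 1))
            for k in range(degree + 1)]
```

Fitting a log-linear model on these six columns yields coefficients θ_0, …, θ_5, so the 16 lag-specific effects (lags 0 to 15) are described by only 6 parameters.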

Verification of the shape of the association

GAMs, as implemented by Wood (2000; Wood and Augustin 2002) in R, were used to verify that the association between mortality and each temperature/humidity model specification is adequately expressed as linear in the cold period and J-shaped in the warm period. In these models, we used penalized regression splines to describe the above associations, for all temperature/humidity indices and the relevant lag structures resulting from the DLNMs. Each temperature index was included as a penalized regression spline with up to 10 df, using cubic splines as basis functions. When the mean, minimum, or maximum temperature was the index applied, we also included a penalized regression spline for mean relative humidity, with the same lag structure as for temperature.

The general form of the GAMs was as follows:

$$ \begin{array}{l} \log \left[E\left({Y}_t^c\right)\right]={\beta}_0^{cj}+\mathrm{N}\mathrm{S}\left({\mathrm{time}}_t^c,3\cdot 6\right)+{\displaystyle \sum_i{s}_i\left({x}_{it}^{cj},{k}_i\right)}+{\displaystyle \sum_i{s}_i\left({\mathrm{RH}}_{it}^{cj},{k}_i\right)}\hfill \\ {}\kern5em +{\mathrm{HOL}}_t^c+{\mathrm{DOW}}_t^c\hfill \end{array} $$
(4)

as a specific case of (2) where the functions representing the effect of each temperature index and of relative humidity on mortality were penalized regression splines \( s_i \) with \( k_i \) degrees of freedom, while the index i referred to the lag structure. There was no term for the confounding effect of air pollution in these models.

Choice of best model specification (index and lag)

Having determined the plausible expressions for the lags of each index’s effect on mortality and the shape of the association with mortality per period, we applied models of the form (2) for every index and lag structure separately.

For the final models with threshold terms (i.e., for the warm period), we used the iterative algorithm of Muggeo (2003, 2008) to estimate the threshold values. Given an initial value for each change point, this algorithm estimates broken-line models (slope parameters and break points). We used three different starting points for each index and lag structure per city. The initial values were chosen from graphs (not shown) of the association between each index and the number of deaths during the warm period, and they varied between cities. The algorithm estimated three potential threshold values (one per initial value), from which we chose the minimum per index and lag structure.
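The iterative step can be sketched as follows. At the current break point ψ, the model is refitted with the extra regressors (x − ψ)+ and −I(x > ψ), and ψ is updated by the ratio of their coefficients, which is the linearization idea of Muggeo (2003). For brevity, this sketch fits an ordinary least-squares broken line rather than the Poisson models used in the paper (a Poisson fit would apply the same update within iteratively reweighted least squares), and all names are hypothetical.

```python
def _solve(a, b):
    """Solve a small linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][c] * z[c] for c in range(i + 1, n))) / m[i][i]
    return z


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def muggeo_breakpoint(x, y, psi, tol=1e-8, max_iter=50):
    """Estimate psi in y = a + b*x + g*(x - psi)_+ by iterative linearization."""
    for _ in range(max_iter):
        rows = [[1.0, xi, max(xi - psi, 0.0), -1.0 if xi > psi else 0.0] for xi in x]
        _, _, g, d = ols(rows, y)
        if g == 0:
            raise ValueError("no slope change detected at the current break point")
        step = d / g          # Muggeo's update: psi_new = psi + d / g
        psi += step
        if abs(step) < tol:
            break
    return psi


def threshold_from_starts(x, y, starts):
    """Run the algorithm from several starting values and keep the minimum estimate."""
    return min(muggeo_breakpoint(x, y, s) for s in starts)
```

For example, with x = 15, 16, …, 35 and y = 1 + 2(x − 25)+, starting values 20, 22, and 28 all converge to 25, so the minimum-estimate rule returns 25.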

To identify the best temperature/humidity model specification, we compared the models expressed by (2) using the following criteria: 1) the Akaike information criterion (AIC) for over-dispersed data (Bolker 2009; Peng et al. 2006), 2) the generalized cross-validation (GCV) score as computed by the R gam function (mgcv package; Wood 2001, 2004), and 3) the partial autocorrelation function (PACF) criterion, calculated as the absolute value of the sum of the partial autocorrelations of the residuals from lags 1 to 30 (Samoli et al. 2013; Touloumi et al. 2006). Among these criteria, the AIC and GCV assess goodness of fit, while the PACF assesses the remaining autocorrelation. Our selection procedure consisted of sequential steps. We first selected the best lag structure for each index based on AIC and then compared the models across indices with their selected lag structures. The GCV and PACF criteria were computed for all models with the lag structure indicated by the best AIC. For each city and period (warm and cold), we defined as “best” the model specification with the minimum AIC, GCV, and PACF values when the three criteria agreed. We expected AIC and GCV to agree, since both assess model fit; in case of discordance between the two, we selected the model for which the PACF and either the AIC or the GCV were minimal. If the PACF and the model fit criteria (AIC and GCV) disagreed, no model choice was made.
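The PACF criterion and the sequential decision rule can be sketched in Python as follows. The partial autocorrelations are obtained from the sample autocorrelations with the Durbin-Levinson recursion; `criteria` is a hypothetical mapping from model name to its (AIC, GCV, PACF) values for the lag structures already chosen by AIC, and ties are broken arbitrarily.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (lag-k covariance over lag-0 covariance)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [1.0] + [sum(d[t] * d[t + k] for t in range(n - k)) / c0
                    for k in range(1, max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations phi_kk for k = 1..max_lag (Durbin-Levinson recursion)."""
    r = acf(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j} for j = 1..k-1.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def pacf_criterion(residuals, max_lag=30):
    """Absolute value of the sum of residual partial autocorrelations, lags 1..max_lag."""
    return abs(sum(pacf(residuals, max_lag)))


def select_best(criteria):
    """criteria maps model name -> (AIC, GCV, PACF). Returns a model name or None."""
    best_aic = min(criteria, key=lambda m: criteria[m][0])
    best_gcv = min(criteria, key=lambda m: criteria[m][1])
    best_pacf = min(criteria, key=lambda m: criteria[m][2])
    if best_aic == best_gcv == best_pacf:
        return best_aic                      # all three criteria agree
    if best_aic != best_gcv and best_pacf in (best_aic, best_gcv):
        return best_pacf                     # PACF sides with one fit criterion
    return None                              # PACF disagrees with model fit: no choice
```

When each criterion points to a different model, as reported below for Rome in the cold period, `select_best` returns `None`.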

Results

Descriptive statistics

City characteristics by season are presented in Table 1. The combined population of the three cities was almost 13 million. The mean daily total number of deaths ranged from 54 in Rome to 137 in London in the warm period and from 62 in Rome to 158 in London in the cold period, and was slightly higher in all cities in the cold period. During the cold period, the lowest mean values of ambient and apparent temperature (8.6 and 6.5 °C, respectively) were observed in London, while in the warm period, the highest mean levels were observed in Athens (24.5 and 25.1 °C, respectively). The highest mean levels of relative humidity were observed in Rome in both periods and, although mean relative humidity declined during the warm period in all cities, the mean value in Rome remained above 70 %. The highest mean NO2 concentration was observed in Rome during the cold period (73.2 μg/m3). Mean NO2 concentrations were lower in London and Rome during the warm period, while they did not differ by season in Athens. Mean PM10 levels varied from 25.1 μg/m3 in London during the warm period to 46.2 μg/m3 in Rome during the cold period. PM10 seasonality was opposite in Athens and Rome: in Athens, PM10 was highest in the warm period, and in Rome, in the cold period, while no seasonality was observed in London. Mean O3 concentrations were consistently higher in the warm period than in the cold period in all cities, with the highest mean value observed in Rome (108.0 μg/m3). These differences are due to local sources and their distribution, as well as local climatic conditions and topography (Harrison et al. 2006; Karanasiou et al. 2014).

Table 1 Mean value (standard deviation) for the city characteristics by season (cold period: October to March, warm period: April to September)

Table 2 shows the correlations between the temperature and apparent temperature variables, mean relative humidity, and mean dew point temperature, by season, separately for the three cities. The correlation coefficients between the temperature metrics are very high in all cities. Mean relative humidity is moderately and inversely correlated with most of the temperature metrics, with the strongest correlations observed for Rome in the warm period. The correlation coefficients of mean dew point temperature with the other temperature metrics are rather high, exceeding 0.80 in most cases. Thus, more than one temperature metric, or dew point and ambient temperature together, cannot be included in the same regression model due to multicollinearity.

Table 2 Pearson correlation coefficients between the temperature and apparent temperature variables, mean relative humidity and mean dew point temperature in Athens, London, and Rome by cold and warm period

Final models

The lag structures chosen a priori based on the literature were verified by the DLNMs. The shape of the association of mortality with temperature and humidity, used separately or as apparent temperature, was assessed by inspecting the penalized regression splines fitted within the GAMs for all the above lag structures. The association between mortality and mean relative humidity was linear irrespective of the period, while the association between mortality and temperature was linear and inverse during the cold months and J-shaped during the warm period, i.e., no association below a threshold and a positive linear association above it. This shape characterized all indices in all three cities, with minor deviations under different lag structures.

From the investigation of the temporal and functional form of the association between mortality and temperature/humidity, the resulting models were model (5a) for the cold period and model (5b) for the warm period:

$$ \begin{array}{l} \log \left[E\left({Y}_t^c\right)\right]={\beta}_0^{cj}+\mathrm{N}\mathrm{S}\left({\mathrm{time}}_t^c,3\cdot 6\right)+{\displaystyle \sum_i{x}_{it}^{cj}}+{\displaystyle \sum_i{\mathrm{RH}}_{it}^{cj}}+\left(\mathrm{lag}01.{\mathrm{POL}}_t^c\right)\hfill \\ {}\kern6em +{\mathrm{HOL}}_t^c+{\mathrm{DOW}}_t^c\hfill \end{array} $$
(5a)
$$ \begin{array}{l} \log \left[E\left({Y}_t^c\right)\right]={\beta}_0^{cj}+NS\left({\mathrm{time}}_t^c,3\cdot 6\right)+{\displaystyle \sum_i\left[{\left({x}_{it}^{cj}-{x}_i^{cj}\right)}_{+}+{\left({x}_{it}^{cj}-{x}_i^{cj}\right)}_{-}\right]}+{\displaystyle \sum_i{\mathrm{RH}}_{it}^{cj}}+\left(\mathrm{lag}01.{\mathrm{POL}}_t^c\right)\hfill \\ {}\kern6em +{\mathrm{HOL}}_t^c+{\mathrm{DOW}}_t^c\hfill \end{array} $$
(5b)

where \( u_{+} = u \) if \( u > 0 \) and 0 otherwise, \( u_{-} = u \) if \( u < 0 \) and 0 otherwise, and \( x_i^{cj} \) is the threshold value for temperature index j and city c. If index j was the mean, minimum, or maximum apparent temperature, the term \( {\displaystyle \sum_i{\mathrm{RH}}_{it}^{cj}} \) was set to zero. The summation index i depended on the lag structure, which was the same for temperature and relative humidity.
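The broken-line terms of (5b) follow directly from the definitions of \( u_{+} \) and \( u_{-} \). A minimal sketch with hypothetical names, applied to a lag-averaged exposure value; the relative humidity term is appended only for ambient temperature models, mirroring the rule that it is set to zero for apparent temperature.

```python
def hinge_terms(x, tau):
    """Return (u_plus, u_minus) for u = x - tau, as defined for model (5b)."""
    u = x - tau
    u_plus = u if u > 0 else 0.0
    u_minus = u if u < 0 else 0.0
    return u_plus, u_minus


def warm_design_row(lagged_x, tau, lagged_rh=None):
    """Regressors for one day: hinge terms of the lag-averaged index and,
    for ambient temperature models only, the lag-averaged relative humidity."""
    row = list(hinge_terms(lagged_x, tau))
    if lagged_rh is not None:
        row.append(lagged_rh)
    return row
```

With illustrative values, an exposure of 28 °C and a threshold of 26 °C give (2, 0), so only the slope above the threshold contributes; a J-shape corresponds to a coefficient on \( u_{-} \) close to zero.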

For illustration, Fig. 1 presents the DLNM results for the selected models by city and period. The temporal distribution of the effect of temperature on mortality is shown over a period of 15 days. The effect is expressed as the relative risk of death for a unit change in the temperature variable, i.e., a 1 °C decrease for the cold period and a 1 °C increase above the threshold value for the warm months. During the cold period, the effects of temperature exposure became evident 1 or 2 days later and remained high for the next 5 to 10 days, generally decreasing in magnitude; in Athens and Rome, a secondary peak can be seen around day 12. The alternative lag structure, with lags 1–13 and lag 0 as a separate variable, indicated the same temporal pattern, with lag 0 not statistically significant in Rome and Athens. For the warm period, the effect of ambient or apparent temperature was acute and lasted fewer days. Specifically, the most severe effects occurred on the day of exposure in all cities regardless of the model specification used, and decreased steadily over the next 3 to 6 days. There was no evidence of significant or consistent mortality displacement in the data of the cities under study.
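Lag-specific relative risks like those shown in Fig. 1 can be derived from the fitted lag-polynomial coefficients. A sketch, assuming hypothetical coefficients θ_k from a polynomial in raw powers of the lag with a maximum lag of 15, as in model (3); the cumulative relative risk over all lags is included as an additional summary and is not a quantity reported in the text.

```python
from math import exp


def lag_effects(theta, max_lag=15):
    """beta_l = sum_k theta_k * l**k for l = 0..max_lag (log-RR per unit of g(x))."""
    return [sum(th * l ** k for k, th in enumerate(theta)) for l in range(max_lag + 1)]


def relative_risks(theta, max_lag=15, unit=1.0):
    """Lag-specific and cumulative RR for a change of `unit` in g(x).

    Use unit=-1.0 for a 1 degC decrease (cold period, linear g) and unit=1.0
    for a 1 degC increase above the threshold (warm period, hinge g).
    """
    betas = lag_effects(theta, max_lag)
    lag_rr = [exp(unit * b) for b in betas]
    cumulative_rr = exp(unit * sum(betas))
    return lag_rr, cumulative_rr
```

In the cold period, a negative slope per degree becomes a relative risk above 1 once it is expressed per 1 °C decrease.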

Fig. 1

Temperature effect on mortality over the 15 days following exposure, by city and period. Results of distributed lag nonlinear models for (a) Athens, using mean temperature for both periods (threshold value 26 °C), (b) London, using minimum temperature for the cold period and maximum apparent temperature for the warm period (threshold value 20 °C), and (c) Rome, using minimum temperature for the cold period and mean temperature for the warm period (threshold value 23 °C)

Comparison of different model specifications for temperature and humidity

Table 3 presents the values of the statistical criteria per model specification for the selected lag structure in each period and city. In most cases, within each city, there was good agreement between the model fit criteria (AIC and GCV), as expected, but differences were observed when using the PACF. Cases where a “best” model could be selected (i.e., agreement of all three criteria) are indicated in bold. A “best” model could be defined only for London in the cold period and Rome in the warm period. The best model in terms of goodness of fit and reduction of autocorrelation included minimum daily temperature with delayed effects up to 13 days (average of lags 0–13) for London in the cold period, and mean daily temperature with delayed effects up to 3 days (average of lags 0–3) for Rome in the warm period.

Table 3 Model fit statistical criteria per index and lag structure for each city and period analyzed

For all other cases, no choice could be made. Specifically, for Athens in the cold period, AIC and GCV indicated maximum daily temperature lagged up to 13 days (average of lags 0–13) as the best choice, whereas the PACF criterion indicated minimum daily temperature with the same lag structure. For Athens in the warm period, the goodness-of-fit criteria indicated mean daily temperature with delayed effects up to 3 days (average of lags 0–3), while the PACF indicated the same lag structure but maximum daily temperature as the metric best describing the daily number of deaths.

For London in the warm period, the goodness-of-fit criteria indicated maximum daily apparent temperature lagged up to 3 days (average of lags 0–3) as the best choice, while the PACF criterion indicated mean daily temperature with delayed effects up to 3 days (average of lags 0–3). For Rome in the cold period, each of the three criteria indicated a different model: AIC indicated mean daily temperature lagged up to 13 days (average of lags 0–13), GCV indicated minimum daily temperature with delayed effects of 3 to 6 days (average of lags 3–6), and PACF indicated maximum daily apparent temperature with the same lag structure as GCV. In summary, even though the models chosen were inconsistent, in the majority of cases a model incorporating separate variables for temperature and humidity performed best.

Finally, the inclusion of NO2, PM10, or O3, introduced one at a time, did not affect the city-specific model fit or model choice.

Discussion

We have reported results from the first multi-city study in Europe comparing the performance of models using different “exposure” variables for investigating the health effects of temperature and humidity. We used data from three cities with different climatic conditions, focusing on separate temperature (minimum, maximum, and 24-h mean) and relative humidity variables and on “apparent temperature,” the composite index of ambient and dew point temperature that, despite some limitations, has been used most extensively in health studies to date (Anderson et al. 2013). Our results indicated that the best variable(s) for modeling the effects under investigation differed between seasons and between cities. Thus, the best model specification may depend on the characteristics of each geographic location.

Although our findings illustrate that no single model specification is appropriate across cities with different meteorological conditions, there was good agreement between cities on the lag structure of the effect in each period, i.e., cold effects are more prolonged than heat effects, consistent with previous results (Analitis et al. 2008; Baccini et al. 2008). Therefore, although the most appropriate index describing the association between mortality and temperature is not the same across cities and is determined by the meteorological and, possibly, other environmental or lifestyle characteristics of each city, within a city the indices may actually perform similarly well, with no clear distinction between them. Barnett et al. (2010) reached the same conclusion after comparing mean, minimum, and maximum temperature with or without adjustment for humidity, apparent temperature, and humidex (another index combining temperature and humidity) using data from 107 U.S. cities during 1987–2000. Kim et al. (2011) considered ambient, perceived, and apparent temperature as heat stress indices, estimated the change in death risk for a 1 °C increase in each index for Seoul and Daegu in South Korea, and concluded that all temperature indices examined for the warm period gave comparable results. Vaneckova et al. (2011) compared different models assessing heat-related mortality in Australia and concluded that heat indices could be used interchangeably with average temperature. A study conducted in Taiwan (Lin et al. 2013), comparing eight different low-temperature indices on their ability to predict all-cause mortality and cardiopulmonary mortality and morbidity during the cold season (November to April), concluded that mean air temperature was the best index for evaluating mortality from all causes and from circulatory diseases, while the low-temperature indices were inconsistent in assessing the risk of outpatient visits.
In the same setting and over the same years, Lin et al. (2012) suggested that, in the warm period, apparent temperature was the best index for evaluating effects on all-cause mortality, while maximum temperature was more adequately associated with outpatient visits. An analysis conducted in New York (Metzger et al. 2010) compared a maximum heat index with alternative temperature metrics in models predicting daily fluctuations in all-cause mortality during the warm period from 1997 to 2006. It concluded that a model including cubic functions of the heat index on the same day and the previous three days gave a better fit than models using maximum, minimum, or average (mean of minimum and maximum) temperature, or models using the spatial synoptic classification (SSC) of weather type. Hajat et al. (2010) compared the performance of the criteria used by various Heat-Health Warning Systems, which are often based on models like the ones assessed in this study, in predicting dangerously hot days and found little agreement between methods in identifying the days with the most excess mortality.

Most studies have used at least one model fit criterion to evaluate the performance of the various model specifications. For example, Kim et al. (2011), Lin et al. (2012), and Lin et al. (2013) used the AIC, while Barnett et al. (2010) used ten-fold cross-validation. Vaneckova et al. (2011) compared their models using jackknife resampling, while Metzger et al. (2010) evaluated model fit using the percent deviance explained, first-order residual autocorrelation, and the correlation of observed and predicted values on days with a heat index above 90 °F. In our analysis, we assessed the remaining autocorrelation in the model residuals using the criterion of minimizing the partial autocorrelation function (PACF). Unlike the AIC and GCV, which are widely applicable to all types of models, this criterion is specific to time series analysis. In time series analysis, the partial autocorrelation plot is a tool for identifying the autoregressive order, whereas in environmental epidemiology statistically significant partial serial correlation in the model residuals indicates that the effects of an omitted time-dependent covariate are still present. Minimizing this correlation therefore seems a natural goal, and it also avoids the need to fit autoregressive terms in such models. That the model fit criteria did not agree with the PACF criterion on the best model is to be expected, since the former are prediction-based criteria and evaluate a different aspect of model fit. We suggest that the PACF criterion may be more appropriate for time series designs, as prediction criteria may induce negative autocorrelation in the model (Perrakis et al. 2014). Inspection of the selected models revealed that an index of daily temperature (minimum, maximum, or mean) entered as a separate variable together with relative humidity was chosen more often than apparent temperature.
Entering separate variables allows the model to account for any interrelation between temperature and humidity that may characterize a specific location, whereas introducing a formula with a predefined relationship may in fact restrict the adaptability of a health effects model. In some cases, the criteria for selecting the best model within a city and season did not fully agree, indicating that more than one model should be applied as a sensitivity analysis; for London in the cold season and Rome in the warm season, however, a more definitive selection can be made. The model with minimum temperature at lags 1–13 was uniformly chosen for London (cold period), and the model with mean temperature at lags 0–3 was uniformly chosen for Rome (warm period).
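As an illustration of the PACF criterion discussed above, the following Python sketch computes the partial autocorrelations of a model's residuals with the Durbin-Levinson recursion and scores each candidate model by the sum of their absolute values over the first K lags, preferring the smallest score. The exact form of the criterion (which lags, absolute versus squared values) is not restated in this section, so the summation form and the default K = 30 are assumptions; the function names and the simulated residuals for the hypothetical `model_A` and `model_B` are purely illustrative.

```python
import random


def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags of the series x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / denom
            for k in range(nlags + 1)]


def pacf(x, nlags):
    """Partial autocorrelations at lags 1..nlags (Durbin-Levinson recursion)."""
    r = acf(x, nlags)
    result, phi_prev = [], []  # phi_prev[j-1] holds phi_{k-1, j}
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1]
                    for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result


def pacf_criterion(residuals, nlags=30):
    """Assumed criterion: sum of absolute residual PACF values (smaller is better)."""
    return sum(abs(p) for p in pacf(residuals, nlags))


def select_by_pacf(residuals_by_model, nlags=30):
    """Return the name of the model whose residuals have the smallest criterion."""
    return min(residuals_by_model,
               key=lambda name: pacf_criterion(residuals_by_model[name], nlags))


# Hypothetical usage: white-noise residuals versus residuals with leftover AR(1) structure.
rng = random.Random(1)
white = [rng.gauss(0, 1) for _ in range(1000)]
ar1, e = [], 0.0
for _ in range(1000):
    e = 0.6 * e + rng.gauss(0, 1)
    ar1.append(e)
best = select_by_pacf({"model_A": white, "model_B": ar1}, nlags=10)
print(best)
```

In practice the residuals would come from the fitted models for each index and lag structure, and a library routine such as `statsmodels.tsa.stattools.pacf` could replace the hand-written recursion.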

Since the potential confounding effect of air pollution on the association between temperature and health is well established (Cheng and Kan 2011; Mackenbach et al. 1993; O’Neill et al. 2005; Ren et al. 2008; Stafoggia et al. 2008; Thurston and Ito 2001), we initially included NO2, O3, and PM10, one at a time, in all models. In the DLNMs and GAMs, both the temporal distribution of the temperature effect and the shape of the temperature-mortality association remained stable (results not shown) regardless of which pollutant was included. Hence, we adjusted for air pollution only in the final models used to evaluate the best meteorological index.

A possible source of bias in our methodology is the use of Muggeo’s algorithm (2003, 2008) to estimate the threshold values of the temperature-mortality association in the warm period. This algorithm can be unstable in small samples and may converge to a local rather than the global maximum (Armstrong 2006; Baccini et al. 2008), and the threshold estimate depends entirely on the initial values supplied to the algorithm (Baccini et al. 2008). To account for these drawbacks, we used six years of data and three different starting values for each temperature index, lag structure, and city. The starting values were chosen from graphs showing the shape of the association between each index and the number of deaths, and could therefore vary by city. The algorithm produced three candidate threshold values (one per starting value), from which we chose the minimum for each index and lag structure.
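The threshold-estimation strategy described above (several starting values, keep the minimum estimate) can be sketched in Python. The sketch implements the iterative update of Muggeo (2003): given a current breakpoint ψ, the model is refitted with the extra terms (x - ψ)+ and -I(x > ψ), and the breakpoint is updated as ψ + γ/β, where β and γ are the coefficients of those two terms. For brevity it uses ordinary least squares on a synthetic series instead of the Poisson regression with confounder adjustment that daily death counts would require, and the clamping of ψ to the central 90% of the data is an added safeguard, not part of the original method; the data, starting values, and function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    """Ordinary least squares coefficients via the normal equations."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def muggeo_threshold(x, y, psi0, max_iter=50, tol=1e-6):
    """Iterate psi <- psi + gamma / beta (Muggeo 2003) from the starting value psi0."""
    xs = sorted(x)
    lo, hi = xs[len(xs) // 20], xs[-(len(xs) // 20) - 1]  # assumed safeguard: ~5% of data per side
    psi = min(max(psi0, lo), hi)
    for _ in range(max_iter):
        # Columns: intercept, x, U = (x - psi)+, V = -I(x > psi)
        X = [[1.0, xi, max(xi - psi, 0.0), -1.0 if xi > psi else 0.0] for xi in x]
        _, _, beta, gamma = ols(X, y)
        if abs(beta) < 1e-12:
            break  # no detectable change in slope above psi
        new_psi = min(max(psi + gamma / beta, lo), hi)
        if abs(new_psi - psi) < tol:
            return new_psi
        psi = new_psi
    return psi


# Hypothetical synthetic data: flat response up to 27 degrees, linear increase above it.
temps = [15 + 0.1 * i for i in range(201)]
deaths = [100 + 3 * max(t - 27, 0) + 0.5 * math.sin(7 * i) for i, t in enumerate(temps)]
estimates = [muggeo_threshold(temps, deaths, s) for s in (22.0, 27.0, 30.0)]
threshold = min(estimates)  # keep the minimum of the three candidate thresholds
print(estimates, threshold)
```

Running the iteration from several starting values and comparing the results is a cheap way to detect convergence to different local solutions, which is the instability the text describes.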

In summary, our findings suggest that the optimal model depends on city-specific characteristics. Within a city, more than one combination of temperature-humidity index and lag structure may perform similarly, with no clear distinction among them. However, ambient temperature entered as a separate variable with adjustment for relative humidity performed slightly better than apparent temperature in most cases, in both the warm and the cold period of the year. We confirm the prolonged effects of temperature in the cold period regardless of the index used, and we suggest examining the residual autocorrelation structure in addition to model fit in time series models. These results add information for future epidemiological model building on the health effects of meteorological variables and may inform policy makers about the optimal choice of index when considering health impact predictions and prevention measures.