1 Introduction

Climate change has been communicated most popularly as a global mean temperature change. Although this is a simple presentation of the issue and on the surface easily digested, it lacks nuance and can lead to confusion. For example, the relatively small range of projected temperature change (1.5 to 4 °C; Trenberth et al. 2007) is experienced daily, weekly, and seasonally by most people throughout the world, and many areas of the world experience far greater shorter-term variability. Thus, the implications of such apparently small globally averaged changes need further elucidation and links to lived experience. The examination of extreme events and their frequency of occurrence is one such avenue. This type of analysis has increasingly become the focus of both climate change detection and climate change projection (Easterling et al. 2000a, b; Bonsal et al. 2001; Frich et al. 2002; Vincent et al. 2005; Vincent and Mekis 2006; Mohsin and Gough 2013). Temperature extremes are typically measured using threshold exceedances above or below a locally relevant value or using percentiles. Climate change signals are then detected using time series analysis.

In this work, we use another approach to detecting trends in extreme temperature events: examining the variation in record temperature extremes over a given period of time (Hoyt 1981; Bassett 1992; Benestad 2004; Redner and Petersen 2006; Newman et al. 2010; Wergen and Krug 2010; Rowe and Derry 2012). Much of the literature in this area focuses on the statistical impact of a net trend on the distributions of records, although some work specifically uses the trends in records as a change metric (Meehl et al. 2009; Rowe and Derry 2012). The work to date examines trends with the assumption that the observations are ongoing, and thus the record length is increasing. In these circumstances, the number of records broken per year is expected to decrease, assuming no net trend in the mean temperature or in its variation about the mean (Coumou and Rahmstorf 2012; Rowe and Derry 2012). For example, if the record is 30 years in length, the expectation is 365/30, or roughly 12, records per year. If the data record is extended to 50 years, the expectation drops to about seven per year, and to roughly four per year if the record is 100 years in length. Thus, the expected rate of record-breaking events is 365/n per year, where n is the number of years in the climate record. If the numbers are higher than this over a sustained period, this can be seen as evidence of an underlying change in the climate (Bassett 1992). Wergen and Krug (2010), for example, determined that 5 of the 17 record-breaking events in 2005, using European climate data from 1976–2005, could be attributed to a warming climate. Such changes are not limited to the more ubiquitous change resulting from increased greenhouse gases but can also result from regional and local phenomena such as land-use change, including urbanization.
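The 365/n expectation quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text does, a stationary climate in which each of the n years is equally likely to hold the record for any given calendar day; the function name is illustrative only.

```python
def expected_records_per_year(n_years, days_per_year=365):
    """Expected number of calendar-day records held by one year.

    Assumes a stationary climate: for each calendar day, every one of
    the n_years is equally likely to hold the record, so the chance is
    1/n_years per day.
    """
    if n_years <= 0:
        raise ValueError("n_years must be positive")
    return days_per_year / n_years


# Record lengths used as examples in the text.
for n in (30, 50, 100):
    print(n, expected_records_per_year(n))
```

A sustained count well above this value in recent years, or well below it for cold records, is the kind of departure the text treats as evidence of change.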

The objectives of this research are, first, to explore the utility of using record extremes as a metric for detecting a changing climate and, second, to assess the impact of urbanization on the record extremes.

2 Study area

In this work, we examine extreme temperature records for the Greater Toronto Area (GTA). Toronto (43.7 N, 79.4 W) is located in the midlatitudes, on the northwest shore of Lake Ontario of Southern Ontario (Gough et al. 2002; Gough 2008; Mohsin and Gough 2010; Tam and Gough 2012). Southern Ontario is a transition zone between polar and sub-tropical air masses, resulting in mid-latitude cyclones and considerable weather fluctuations. These cyclones are common in the midlatitudes and have a significant impact on day-to-day climate variability (Gough 2008; Tam and Gough 2012). Toronto’s climate is classified as a humid continental climate characterized by warm to hot summers, cold winters with snow, no dry seasons, and a wide range in annual temperatures, modified by the presence of Lake Ontario which generates a lake effect (Gough and Rosanov 2001). The lake effect is a local scale circulation produced by the land/lake contrast that acts to produce mitigation of day-time warming in the summer season. Toronto has a well-defined urban heat island, producing warmer temperatures than the surrounding regions, particularly at night and in winter (Gough and Rosanov 2001; Mohsin and Gough 2012). Toronto has an evolving climate, the result of both urbanization and warming regional climate as documented by Mohsin and Gough (2010). Mohsin and Gough (2013) examined changes in extreme thermal indices in Toronto from 1971 to 2000. As a complement to this latter study, we explore an alternative extreme temperature measure in this work for the same time period and location. 
The use of a fixed period, in addition to allowing ready comparison with previous work, allows us to separate out the 1/n decrease in expectation that is inherent in the examination of an ongoing climate record. In doing so, we distance ourselves somewhat from the way in which the breaking of temperature records is perceived by the public in the climate change discourse, in which the 1/n factor must be carefully accounted for. The use of five stations with varying degrees and histories of urbanization will also enable us to shed light on the impact of urbanization on climate records. It would of course be desirable to use a longer climate record; however, for the five stations used, this is the only coincident time period.

Five weather observing stations are used in this work (Fig. 1). These stations are Toronto Lester B. Pearson International A (43.68 N, 79.63 W), Toronto Island A (43.63 N, 79.4 W), Toronto (43.67 N, 79.4 W), Richmond Hill (43.87 N, 79.45 W), and Oshawa Water Pollution Control Plant (WPCP) (43.87 N, 78.8 W). The Toronto station is located in the downtown core of the GTA. Toronto Lester B. Pearson International A is located on the northwestern fringe of the city, an area that has seen rapid urbanization during the study period. Richmond Hill is located in a suburban region north of the GTA. It has been identified as having its own urban heat island (Lelasseux 2005; Mohsin and Gough 2012). Toronto Island A is located south of the city of Toronto on an island in Toronto harbor. Oshawa WPCP is located in Oshawa, Ontario, a city of over 150,000, 50 km to the east of Toronto.

Fig. 1 Map of the study sites

3 Data

The data were obtained from Environment Canada’s National Climate Data and Information Archive website. Daily maximum and minimum temperature data were obtained for all five stations for the years 1971 to 2000, inclusive. The data sets include 365 days of daily data for each year of the 30-year period. As February 29 only occurs once every 4 years, these values were removed from all stations. Stations were selected based on data availability.

Unfortunately, perfect data sets rarely exist. Although the best stations available were chosen, there are still some gaps in the data for four of the five stations. An analysis of extremes is much more sensitive to missing values than studies of changes in mean temperatures. The first station examined is Toronto Lester B. Pearson International A. Daily data are missing for this station during December 1992 and sporadically throughout 1993. There is no discernible pattern to the missing days. The Toronto Island A station has missing data in 1994, when the station was changed to an automated station. This change occurred in December 1994, and many daily values are missing during this time. It is common in the transition from manned to automated operation that not all values are ingested properly. Many missing values are also seen at the beginning of 1995 as a result of the same changeover. Toronto has a complete data set: not a single daily maximum or minimum value is missing during the 1971 to 2000 time period. This station was, however, moved twice in the 1970s, and these changes of location may have affected the data produced by the station, although no such effects were detected by Vincent et al. (2012) in the creation of a homogenized temperature data set for Toronto. Richmond Hill has a fairly complete data set for the 1971–2000 time period. In total, this station is missing 61 maximum and 61 minimum values out of a total of 10,950 daily observations over the 30 years. The missing data occur during two month-long periods, April 1971 (30 days) and May 1992 (31 days). As this is a manned station located in the observer’s backyard, it is likely that the observer was on vacation. During these 2 months, all data for the station are missing, not just the temperature data. Oshawa WPCP is also missing data for three month-long periods: November 1976, December 1978, and May 1979. 
Again, the likely culprit is vacation with the lack of backup to continue observations during these time periods. This missing data totals 93 maximum values and 92 minimum values. The years in which data were missing were examined for the four stations that had missing data (Toronto Island, Toronto Pearson, Richmond Hill, Oshawa) and are reported below.

4 Analysis

A time series was generated for each day of the year for daily maximum temperature and daily minimum temperature. For daily maximum temperature, the year for the highest maximum temperature was recorded for each day of the year. For the daily minimum temperature, the year for the lowest minimum temperature was recorded for each day of the year. This was done for each of the five weather observing stations. Subsequently, the data was binned for each station by year and time series were generated for the number of records per year for all stations for the two extremes.
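The counting procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (a dict mapping each year to a list of daily values, with `None` for missing days) is hypothetical, and splitting a tied record equally among the tied years is an assumption, suggested only by the fractional counts (for example 8.5) reported later in the text.

```python
def records_per_year(temps, kind="max"):
    """Count, for each year, the calendar-day records it holds.

    temps: dict mapping year -> list of daily values (one per calendar
           day, same length for every year, None where data are missing).
    kind:  "max" for highest daily maximum, "min" for lowest daily minimum.
    Ties are shared equally among the tied years (an assumption).
    """
    if kind not in ("max", "min"):
        raise ValueError("kind must be 'max' or 'min'")
    pick = max if kind == "max" else min
    years = sorted(temps)
    n_days = len(temps[years[0]])
    counts = {year: 0.0 for year in years}
    for day in range(n_days):
        # Keep only the years that actually reported a value on this day.
        observed = {y: temps[y][day] for y in years if temps[y][day] is not None}
        if not observed:
            continue
        extreme = pick(observed.values())
        holders = [y for y, value in observed.items() if value == extreme]
        for y in holders:
            counts[y] += 1.0 / len(holders)
    return counts


# Toy three-day, three-year example (values are made up).
toy = {
    1971: [10.0, 5.0, None],
    1972: [12.0, 5.0, 3.0],
    1973: [11.0, 4.0, 3.0],
}
print(records_per_year(toy, "max"))
print(records_per_year(toy, "min"))
```

Applied to 365 calendar days over 30 years, the returned counts sum to 365 (less any days with no data at all), which is what gives the expectation of about 12 per year.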

Both the Spearman rank correlation coefficient and the Kendall tau correlation coefficient are measures of the relationship between one dependent and one or more independent variables (Bolboaca and Jantschi 2006). Spearman’s rank is a non-parametric measure of the correlation between two variables (Bolboaca and Jantschi 2006). An advantage of using any non-parametric correlation coefficient analysis is that the data do not have to be normally distributed. In the case of Spearman’s rank, the relationship between the variables does not have to be linear in nature. It is assumed that the variables have been measured on an ordinal, or rank order, scale, which allows the observations to be ranked into two series. Kendall’s tau coefficients are also non-parametric measurements and can be used to analyze non-interval, ordinal data, as is the case with Spearman’s rank (Bolboaca and Jantschi 2006). There is a difference between these two statistical measurements. Spearman’s rank is calculated from, as the name implies, ranks, while Kendall’s tau represents a probability. When time is the independent variable, as it is here, Kendall’s tau takes on a special case called the Mann-Kendall test. The value given for this test also varies between −1 and +1. When there is agreement between the two rankings, implying they are the same, the coefficient has a value of +1. Conversely, if there is disagreement, meaning one ranking is the reverse of the other, the coefficient has a value of −1. Values range between the two, with an increasing value implying increasing agreement (Bolboaca and Jantschi 2006). A value of 0 implies that the rankings are completely independent of each other. The main limitation of using Spearman’s rank or Mann-Kendall (MK) is that these tests are not sufficient for determining the strength of the relationship between the variables, only that a relationship exists (Bolboaca and Jantschi 2006). 
To resolve this issue, the Theil-Sen approach (TSA) is used to determine the magnitude of the trend through the estimation of the slope. The TSA provides a more robust slope estimate than the least-squares method because outliers or extreme values in the time series affect it less (Sen 1968). The algorithm for the TSA was derived by Hirsch et al. (1982) and consists of the median of all possible pair-wise slopes in the data set. In addition, the p values from the MK test are used to identify the significance of the trends during 1971–2000. The null hypothesis for the MK test states that all observations are independent; the alternative hypothesis assumes that a monotonic trend, positive or negative, exists in the time series (Helsel and Hirsch 1992). In this analysis, the MK test is applied to detect whether a trend in the time series is statistically significant at the 0.05 (95 %) and 0.01 (99 %) significance levels (confidence levels) for a two-sided probability.
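The two trend tools described above can be illustrated with a short standard-library sketch. It is not the authors' implementation: the Mann-Kendall p value uses the usual normal approximation with a tie correction in the variance, tau is computed as S divided by the number of pairs (ignoring ties in the denominator), and the function names are illustrative only.

```python
import math
from collections import Counter
from statistics import median


def theil_sen_slope(t, y):
    """Median of all pair-wise slopes (y[j] - y[i]) / (t[j] - t[i])."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i in range(len(t))
        for j in range(i + 1, len(t))
        if t[j] != t[i]
    ]
    return median(slopes)


def mann_kendall(y):
    """Return (S, tau, z, two-sided p) for a series ordered in time."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    # S counts later-minus-earlier increases minus decreases.
    s = sum(
        (y[j] > y[i]) - (y[j] < y[i])
        for i in range(n)
        for j in range(i + 1, n)
    )
    # Variance of S under the null, corrected for tied values.
    tie_term = sum(c * (c - 1) * (2 * c + 5) for c in Counter(y).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s == 0 or var_s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    else:
        z = (s + 1) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    tau = s / (n * (n - 1) / 2)
    return s, tau, z, p


# Made-up series with one outlier: the median slope is barely affected.
years = list(range(1971, 1976))
counts = [0, 1, 2, 3, 100]
print(theil_sen_slope(years, counts))
print(mann_kendall(counts))
```

Because yearly record counts contain many repeated values, the tie correction matters more here than it would for continuous temperature data.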

Further statistical significance analysis using t tests was done on the three 10-year intervals for all five stations. The t test is a method for determining whether the means of two groups or variables differ. In this study, the 1971–1980 period was compared to 1981–1990 and 1991–2000. Similarly, the 1981–1990 period was compared to the 1991–2000 period. In total, four t tests were computed for each station.
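A decade-to-decade comparison of this kind can be sketched as below. The text does not say which form of the t test was used, so the unequal-variance (Welch) form is an assumption, and the yearly counts are made up for illustration. Only the test statistic and degrees of freedom are computed; a p value would come from a t distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def decade(counts_by_year, start):
    """Extract ten consecutive yearly counts beginning at `start`."""
    return [counts_by_year[y] for y in range(start, start + 10)]


# Hypothetical yearly record counts with a steady decline (not real data).
counts = {1971 + i: 20 - 0.5 * i for i in range(30)}
seventies = decade(counts, 1971)
nineties = decade(counts, 1991)
print(welch_t(seventies, nineties))
```

With only ten values per decade the test has limited power, so a non-significant result does not rule out a modest change.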

In addition to the above trend analysis, we also examine each year to determine whether the number of records in that year deviates significantly from expectation. As noted above, since there are 30 years of data, each year has 365/30 records on average, or approximately 12 per year. We use Fisher’s exact test to determine whether the number of records in a particular year is significantly different from expectation. For this test, if the frequency of records drops below 5 or exceeds 22, the p value drops below 0.05 and the departure is considered statistically significant.

Finally, we assess the impact of missing data on the record extremes. In particular, we examine the results of Fisher’s exact test. For each period of missing data at a station, records that occurred at three or more of the remaining stations were identified. We assumed for the purpose of a sensitivity analysis that each of these was also a record for the station with missing data. We then assessed the impact on statistical classification of this potential change in record extremes.

5 Results and discussion

Figure 2 depicts the extreme temperature records as a function of time for the five stations, for both Tmin and Tmax. Tables 1 and 2 report the results of the Spearman rank and Mann-Kendall analysis.

Fig. 2 (panel title: Toronto Pearson Tmin)

Table 1 Correlation analysis for extreme maximum temperature counts
Table 2 Correlation analysis for extreme minimum temperature counts

There is a statistically significant decrease in extreme cold temperature records for the Toronto Pearson station (Fig. 2a). In the last 6 years of the time series, only 1 year exceeded three records per year, well below the expected value of 12 per year. In contrast, 1972 and 1978 each had over 30 records per year. For extreme warm temperatures (Fig. 2b), there is a trend toward more records in recent years, but this trend is not statistically significant. A similar story is seen at Toronto Island, with extreme cold records (Fig. 2c) decreasing significantly with time and extreme warm records (Fig. 2d) increasing, but not significantly. Toronto Island shares with Toronto a peak in cold extremes in 1976 but, unlike Toronto Pearson, has no peak in 1972. The mitigating effect of Lake Ontario may account for the difference. Toronto (located in downtown Toronto) has the same trends as the other two, but neither is significant (Fig. 2e, f). The contrast between Toronto and Toronto Pearson suggests that urbanization is the key to the change in record cold temperatures. The downtown urbanization is long standing, whereas Toronto Pearson experienced substantial urbanization during the study period (Mohsin and Gough 2012). This urbanization has led to the spreading of the urban heat island, which is most clearly detected in the daily minimum temperature. Once again, we note that 1976 is conspicuous for the largest number of cold temperature records, consistent with Toronto Island. Richmond Hill (Fig. 2g, h), to the north of Toronto, shows similar results to Toronto Pearson, and for similar reasons: Lelasseux (2005) detected a distinct urban heat island for this location. The two peak years for cold temperature records are 1972 and 1976, similar to Toronto Pearson but in contrast to Toronto and Toronto Island. The final station is Oshawa (Fig. 2i, j), an urbanized area to the east of Toronto. 
Similar to Toronto Pearson and Richmond Hill, Oshawa has a statistically significant decrease in cold temperature records with time and an increase in warm temperature records, although the latter is not statistically significant. The two peak years for cold temperature records are once again 1972 and 1976, as was the case for Richmond Hill and Toronto Pearson. The 1972 record stratifies the stations, grouping Toronto and Toronto Island together and Toronto Pearson, Richmond Hill, and Oshawa together. Consistently for all five stations, 1988 had the greatest number of extreme warm temperature records, with over 25 records each (Bassett 1992). In contrast, 1992 was the year with the fewest Tmax records for all five stations. The year 1992 was particularly cool both locally and globally as a result of the Mt. Pinatubo eruption in 1991 (Gu and Adler 2011).

The results from the five stations show that the changes currently being experienced in the Greater Toronto Area (GTA) and surrounding regions are largely manifested as changes in the extreme cold temperatures. In our analysis, the p values from the MK test (Tables 1 and 2) show that the trends for the extreme maximum temperature counts are not statistically significant. However, for the extreme minimum temperature counts, the trends are statistically significant for Toronto Pearson, Richmond Hill, and Oshawa, a result consistent with the observations for extreme cold indices by Mohsin and Gough (2013). They looked at the changes in both cold and warm extreme indices in the GTA and identified statistically significant decreasing trends for cold days (percentage of days when Tmax is below the 10th percentile) at Toronto Pearson and Richmond Hill and for cold nights (percentage of days when Tmin is below the 10th percentile) at Toronto Pearson, Richmond Hill, and Oshawa. More generally, Mohsin and Gough (2013) found that decreasing trends in cold extreme metrics were ubiquitous for the GTA and appeared not to be influenced by urbanization, unlike the extreme warm measures. The present work, however, reveals that for the most extreme, and in some respects most visible, extreme metric, urbanization does play a mitigating role.

The results from the TSA analysis, which estimates the magnitude of the observed trends for both maximum and minimum extreme temperature counts, are shown in Table 3. It is evident from the magnitudes that the counts for maximum temperature are increasing and counts for minimum temperature are decreasing for the study period, 1971–2000. The magnitudes for both are highest for Toronto Pearson and Richmond Hill, the two stations that experienced substantial urbanization during the study period (Mohsin and Gough 2010). This is also consistent with the analysis of coldest nights reported in Mohsin and Gough (2013). For this metric, there is no temporal trend for Toronto and Toronto Island during the 1971 to 2000 period whereas there is a positive trend (increase in coldest night temperature) for the other stations. The general observation of the effect of urbanization on temperature is for maximum temperature the trend tends to increase and for minimum temperature the trend tends to decrease, which is also consistent with the TSA results for all stations.

Table 3 Results from Theil-Sen approach (TSA)

We now turn to the other two statistical tests, the decadal t tests and Fisher’s exact test. The results for the former are reported in Tables 4 and 5 and the latter in Tables 6 and 7. For all stations, there are no statistically significant changes between decades for Tmax. For Tmin, consistent with the results of the Mann-Kendall test, Toronto Pearson shows significant differences between each of the decades. Toronto and Toronto Island show no significant differences. For Richmond Hill, the comparison between the 1970s and the 1990s indicates a significant reduction in cold temperature records. This was also true for Oshawa. In addition, at Oshawa, there was a significant difference between the 1970s and 1980s. Once again, the three stations on the fringe of the GTA show significant differences whereas the two stations located near the center of urbanization do not.

Table 4 Statistical significance analysis for extreme maximum temperature counts
Table 5 Statistical significance analysis for extreme minimum temperature counts
Table 6 Tmin Fisher’s exact test
Table 7 Tmax Fisher’s exact test

Fisher’s exact test identifies the years in which the reported number of records departs significantly from the expected value of 12. Across the five stations (Table 6), 3 years show significant departures from expectation for Tmin: 1972, 1976, and 1992. The former two occur in the early years of the time frame, as expected from the other results. The 1976 departure was statistically significant for all five stations, indicating a regional phenomenon of sufficient strength to be unaffected by local features such as level of urbanization and proximity to the lake. Similarly, 1992 had higher counts than expectation for all stations and significantly so for Toronto. As noted above, 1992 was cooler than normal due to the lingering effects of the Mt. Pinatubo eruption in 1991. There also appears to be a Mt. Pinatubo legacy effect for 1993 and 1994, when record counts were at or above expectation, although there may be other explanations. Values significantly lower than expectation are not recorded until 1985, when the first two appear for Toronto and Toronto Island. From 1990 onwards, the five stations had at least 2 years and as many as 6 years that were significantly lower than expectation. This is particularly true for 1995 and onwards where, of the 30 values (6 years for five stations), three were statistically significantly below expectation, and all but two were below expectation. The two above expectation were only marginally so (13, 14.5).

For Tmax (Table 7), as with the other statistical tests, the story is more nuanced. Two ubiquitous features are the statistically significant higher counts for 1988 at all five stations and the opposite for all stations in 1992. One interesting aspect of this is that the 1988 Tmax records were not mirrored by similar records the same year for Tmin. In fact, Tmin records for 1988 were close to expectation. The cooling of 1992, although more evident for Tmax, was experienced for both Tmax and Tmin. The year 1996 was also notable for a lack of Tmax records. The only other low years occurred in the early part of the study period, particularly 1972, which also had significant cold records for Tmax. Interestingly, 1976 for Tmax is close to expectation (11 to 16), although it was universally and significantly above expectation for Tmin.

Finally, we examine the impact of the missing data on the analysis. We successively examined the four stations with missing data and flagged events in which three or more of the remaining stations experienced a record extreme during the missing data periods. On two occasions, extremes for maximum temperature (at three or more of the remaining stations) occurred during missing data periods. The first occurred in May 1979 during a missing data period for Oshawa. To examine the impact, we consider the most sensitive measure, the results from Fisher’s exact test. For 1979, the Oshawa extreme maximum record count was 8.5, which, while below the mean value of 12, was well above the statistical significance threshold of 5. Adding one more count to this record would move it toward the mean value and further away from a significant departure. The second event occurred in 2000 during a time when Toronto Island was not reporting. The extreme count was 9.5, and thus, an increase of one count would not affect the statistical significance. The impact on minimum extremes was more involved, and we examine it station by station. For Pearson, the month of December 1992 is missing; however, none of the other stations had record extremes for that month. Throughout 1993, there were some missing data. On four of these days, September 30, October 11, November 25, and December 27, there were record minimum extremes at three or more of the remaining stations. For Pearson, in 1993, there were 11 record temperature minima, the smallest number among the stations. If this is increased by the four missing days identified above, the count increases to 15 records, which is consistent with the other stations. This potential change in count does not affect the classification status (it remains within the range of 5 to 22). For Richmond Hill, there are three days on which extreme minimum records were recorded at three or more of the other stations during missing days. 
One occurs in April 1971 and the other two in May 1992. Adding one to the number of records for 1971 for Richmond Hill increases the total to 7. This leads to no change in classification. This is also true for 1992, with the number of records potentially increasing to 18 with no change of classification. There are 4 days in November 1976 on which minimum records occurred at three or more of the other four stations but which were missing days for Oshawa. For Oshawa, 1976 had a statistically significant number of records. Adding these four potential days would strengthen that relationship and therefore cause no change in classification. In addition, there was a record-breaking day in May 1979 during an Oshawa missing data period. Increasing 1979 by one for Oshawa will not change its classification. The missing days at Toronto Island did not correspond with widespread (three or more other stations) minimum record extremes. The Toronto station had no missing data. Thus, although there were potentially as many as four missed records in a given year among the stations, the missing data did not contribute to a potential mislabeling of the statistical significance of record extremes.
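The sensitivity check described above amounts to asking whether adding the potentially missed records moves a yearly count across the significance thresholds. A minimal sketch follows, using the thresholds quoted in the analysis section (below 5 or above 22) and counts taken or back-calculated from the text; the function names and the labels are illustrative only.

```python
def classify(count, low=5, high=22):
    """Classify a yearly record count against the quoted thresholds."""
    if count < low:
        return "significantly low"
    if count > high:
        return "significantly high"
    return "not significant"


def classification_changes(count, potential_missed):
    """True if adding the potentially missed records changes the class."""
    return classify(count) != classify(count + potential_missed)


# (label, reported count, potentially missed records).
# The Richmond Hill counts are back-calculated from the totals in the
# text (7 after adding one, 18 after adding two), which is an assumption.
cases = [
    ("Oshawa Tmax 1979", 8.5, 1),
    ("Toronto Island Tmax 2000", 9.5, 1),
    ("Toronto Pearson Tmin 1993", 11, 4),
    ("Richmond Hill Tmin 1971", 6, 1),
    ("Richmond Hill Tmin 1992", 16, 2),
]
for label, count, missed in cases:
    print(label, classify(count), classification_changes(count, missed))
```

None of these cases crosses a threshold, which matches the conclusion that the missing data did not alter any significance label.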

6 Conclusions

Extreme temperature records often provide fodder for the public discourse on climate change. Extremes, which in the context of this research are defined as those events that lie outside the range of previous lived experience, provide a unique connection to the broader issue of climate change, which is often characterized by an increase in the mean global temperature. This work provides a link between the two for the Greater Toronto Area, providing analyses that are consistent with other extreme temperature research that focused on threshold exceedances rather than absolute extremes. We posed two research objectives. The first was to explore the utility of using record extremes as a metric for detecting a changing climate and the second to assess the impact of urbanization on the record extremes.

To this end, we examined extreme temperature records in the Greater Toronto Area (GTA) for the time period of 1971 to 2000. This period matches that of other work that used different measures of temperature extremes (Mohsin and Gough 2010, 2013).

In a randomly varying climate with a fixed mean annual temperature, the expectation is there would be approximately 12 records for a given temperature variable per year for a 30-year climate record. Significant deviations from this value are indicators that the climate is changing. Using three different statistical tests, a coherent story of a change in minimum temperature was detected consistent with other extreme temperature measures (Mohsin and Gough 2013). The impact was more evident for weather observing stations in the fringe of the GTA suggesting that urbanization is a significant driver. Mohsin and Gough (2013) though reported that the trends in cold extreme metrics in general were largely uniform across the GTA and surrounding area, being less sensitive to urbanization than the trends in extreme warm indices. However, in this work, the trends in record cold extremes were a function of urbanization. A more nuanced picture emerged for Tmax with no significant trends over the time period, although individual years did deviate significantly from expectation. The year 1992 is clearly linked to a global cooling that resulted from the 1991 eruption of Mt. Pinatubo and demonstrated both a lack of extreme warm records and a surplus of cold records. In general, the signal of urbanization was more evident in the last part of the study period (1990–2000) consistent with Mohsin and Gough (2010).

The use of extreme temperature records adds another robust diagnostic tool in characterizing how a changing climate is locally manifested in a way that links closely to how climate is experienced and represented in public discourse.