
1 Introduction

Australian rainfall is extremely variable, with episodes of drought that often end in extreme flooding. During the austral summer of 2010–2011, flooding impacted Queensland, with the capital city Brisbane inundated and 85% of Queensland coal mines either closed entirely or operating with restricted production [1, 2]. By May 2011, Queensland’s coal mining sector had recovered to only 75% of its pre-flood output, with a loss of A$5.7 billion, equivalent to 2.2% of Queensland’s gross state product for the financial year ending June 2011. A report prepared for Australia’s National Climate Change Adaptation Research Facility following that extreme event concluded that available seasonal rainfall forecasts from the Australian Bureau of Meteorology (BOM) are not useful because they lack the localised information and other micro-details needed to enable focused proactive planning and risk management [1].

Intra-seasonal, inter-annual and decadal variability in Queensland rainfall has been linked to complex physical phenomena remote to the Australian land mass [3]. These phenomena, manifesting as recurrent patterns in sea surface temperature (SST) and air pressure, are described numerically by climate indices. The dominant phenomenon is El Niño-Southern Oscillation (ENSO), which can be described quantitatively by measuring the departure from the long-term average of sea surface temperature over a specified Niño region of the Pacific Ocean. The terms La Niña and El Niño come from Spanish and represent opposite phases, with El Niño events mostly associated with below-average rainfall in eastern Australia and warmer waters in the eastern Pacific. The extreme rainfall in the austral summer of 2010–11 was linked to an extraordinarily strong La Niña event [4].

There are five designated Niño regions spanning the tropical Pacific Ocean from which anomalies in SST are calculated. For widespread global climate variability, Niño 3.4 is commonly preferred, because the sea surface temperature variability in this region is thought to have the strongest effect on shifting rainfall in the western Pacific, which in turn modifies the location of the heating that drives other major global atmospheric circulation patterns. Relationships between weather patterns, particularly rainfall, and ENSO phenomena have been explored in many parts of the world, including Australia [5–7], North America [8, 9], the North Atlantic European region [10], China [11], and India and West Africa [12].

Over the last three decades, a whole suite of different models with varying degrees of complexity has been developed for ENSO prediction. They are generally categorised as (i) purely statistical models that depend on finding patterns in historical data, (ii) physical models that rely on simulating ocean-atmospheric interactions, and (iii) hybrids of the statistical and simulation models [13–15]. Most of the research effort has gone into the fully physical coupled ocean-atmospheric models that run on supercomputers; however, their skill at forecasting remains comparable to that of simple statistical models [15–17].

Data on ENSO is available back to the mid-1800s, with the predictability of El Niño and La Niña varying across decades. For example, the predictability of these events, as measured by anomalous correlations and root mean square error (RMSE), is considered low for the period 1936–1955, while the highest scores are achieved for the periods 1876–1895 and 1976–1995 [18]. Predictability also varies on a seasonal basis: the austral autumn/boreal spring is a period with a relatively small signal-to-noise ratio, now known as the spring predictability barrier (SPB) [17–19]. The SPB has been extensively studied in the context of general circulation models (GCMs) [20–22].

Yan and Yu [23] found that although taking the mean of an ensemble of 10 GCMs reduced the SPB, it still remained a feature. Duan and Wei [24] explain that the model errors behind the SPB may come from many different sources, such as model parameter errors, uncertainties in some physical processes, errors in external forcing, and uncertainties in the computation scheme, but they do not determine which type of model error plays the dominant role in producing prediction uncertainties.

This paper details an investigation into the use of artificial neural network (ANN) software to forecast Niño values for the period 1987 to 2013, and then the incorporation of these values into an ANN model to forecast monthly rainfall for Nebo, a locality in the Bowen Basin that was severely impacted by the strong La Niña event during the austral summer of 2010–11. This represents an extension of previous investigations into the application of ANNs for monthly rainfall forecasting in Queensland [25–28], in particular through the incorporation of both lagged and forecast values for the full complement of Niño regions. The forecast Niño 3.4 values are benchmarked against output from other statistical models and also general circulation models.

2 Materials and Method

ANNs are massive, parallel-distributed, information-processing systems with characteristics resembling the biological neural networks of the human brain. The mathematical fundamentals of neural networks and specific applications in hydrology, including rainfall, have been reviewed in the two-part series by the ASCE Task Committee on Application of Artificial Neural Networks in Hydrology [29]. The ANN models used in this study for forecasting Niño values, and also rainfall, were developed using NeuroSolutions 6 for Excel software (NeuroDimensions, Florida, USA). This software provides great versatility in the architecture of neural networks that can be deployed. For the purposes of this study, a limited number of architectures were tested in a preliminary investigation, without extensively attempting to optimise the ANN configuration.

For forecasts of the Niño indices, principal component analysis (PCA) was first deployed using an unsupervised neural network, followed by a supervised neural network comprising a multi-layer perceptron, with one hidden layer. The PCA component of the network consisted of a Sanger synapse which linearly projected the input onto a smaller dimensional space, while preserving maximum intensity of the original signal. This reduced dimension means fewer weights for the supervised network to follow, improving generalisation. Optimisation occurred over 6000 epochs, equally split between the unsupervised and supervised components.
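The unsupervised PCA stage described above can be illustrated with Sanger's rule (the generalised Hebbian algorithm), which is the learning rule usually associated with a Sanger synapse. The following is a minimal sketch only, not the NeuroSolutions implementation: the function name `sanger_pca`, the learning rate, the number of epochs and the weight initialisation are all assumptions chosen for illustration.

```python
import numpy as np

def sanger_pca(X, n_components, epochs=20, lr=0.005, seed=0):
    """Estimate leading principal directions with Sanger's rule.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    X has shape (n_samples, n_features); returns W of shape
    (n_components, n_features), whose rows approximate principal directions.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                       # centre the inputs
    W = rng.normal(scale=0.1, size=(n_components, X.shape[1]))
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:     # one pass in random order
            y = W @ x                            # linear projection of the input
            # Hebbian term minus a lower-triangular term that decorrelates
            # each component from the ones ranked before it
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```

The projected values `X @ W.T` would then be passed to the supervised multi-layer perceptron, which therefore has fewer input weights to train than if it received the raw inputs.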

For forecasts of rainfall, some preliminary exploratory testing was undertaken using neural networks with multilayer perceptron (MLP) architectures with up to three hidden layers. However, it was found that superior results were obtained with Jordan and Elman networks, as shown by lower RMSE in the training and test data sets. Jordan and Elman networks extend the MLP by incorporating context units, processing elements that remember past activity. Context units enable the network to extract and utilise temporal information contained in the data. In the Elman network, the activity of the first hidden processing element is copied to the context unit, whereas for the Jordan network the output of the network is copied.
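The difference between the two recurrent architectures can be sketched as a forward pass in which only the source of the context differs. This is a simplified illustration with assumed names (`init_params`, `run_recurrent`), a single hidden layer, a plain copy into the context unit and random untrained weights; it does not reproduce the NeuroSolutions networks used in the study.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, kind, seed=0):
    """Random weights for illustration only (no training is performed)."""
    rng = np.random.default_rng(seed)
    # The context has hidden-layer size for Elman, output size for Jordan
    n_ctx = n_hidden if kind == "elman" else n_out
    return (rng.normal(scale=0.5, size=(n_hidden, n_in)),
            rng.normal(scale=0.5, size=(n_hidden, n_ctx)),
            rng.normal(scale=0.5, size=(n_out, n_hidden)))

def run_recurrent(x_seq, params, kind="jordan"):
    """Forward pass over a sequence with one context unit.

    kind="elman":  context holds the previous hidden activations.
    kind="jordan": context holds the previous network output.
    """
    W_in, W_ctx, W_out = params
    context = np.zeros(W_ctx.shape[1])           # no memory before the first step
    outputs = []
    for x in x_seq:
        hidden = np.tanh(W_in @ x + W_ctx @ context)
        y = W_out @ hidden
        context = hidden if kind == "elman" else y
        outputs.append(y)
    return np.array(outputs)
```

Because the context carries information forward, feeding the same input vector twice in succession generally gives two different outputs, which is how these networks can use temporal structure in monthly data.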

For forecasts of Nebo monthly rainfall reported in this study, a Jordan neural network was selected. For each input data set, the artificial neural network was optimised over 3000 epochs using a genetic optimisation algorithm for 10 or 20 generations. For both Niño forecasts and rainfall forecasts, for every run the total data was divided into training (70%), evaluation (20%) and test sets (10%). The test set was not used in network training, but was important in the choice of the final model. Pearson correlation coefficients (r), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were used to compare the skill of the rainfall forecast from the best model for each ANN run against observed values for the test period.
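As a rough sketch of the data partition and the three skill measures named above, the helpers below split row indices 70/20/10 and compute r, MAE and RMSE. The function names are hypothetical, and the assumption that the split is chronological (with the test set at the end of the record) is ours; the text only states the proportions.

```python
import numpy as np

def chronological_split(n_rows, fractions=(0.7, 0.2, 0.1)):
    """Return index arrays for training, evaluation and test sets.

    Assumes rows are in time order and the test set is the final block.
    """
    n_train = int(round(fractions[0] * n_rows))
    n_eval = int(round(fractions[1] * n_rows))
    idx = np.arange(n_rows)
    return idx[:n_train], idx[n_train:n_train + n_eval], idx[n_train + n_eval:]

def skill_scores(observed, forecast):
    """Pearson r, mean absolute error and root mean square error."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = forecast - observed
    return {
        "r": float(np.corrcoef(observed, forecast)[0, 1]),
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }
```

In use, `skill_scores` would be applied only to the rows returned as the third index array, so that the reported skill refers to data not used in training.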

The site of Nebo was chosen because of its proximity to a coal mine in the Bowen Basin and because it is a site with over 120 years of historical rainfall data available from the BOM (station 033054). Surface air temperature data for Nebo is not available, and so maxima and minima data from the Te Kowai Experimental Station in Mackay (station 033047), approximately 100 km to the northeast of Nebo, were used, with data available from 1908.

The first part of the study focused on forecasting ENSO by forecasting SST anomalies from four regions designated Niño 1 + 2 (0–10°S; 80–90°W), Niño 3 (5°S–5°N; 150°W–90°W), Niño 3.4 (5°S–5°N; 170°W–120°W) and Niño 4 (5°S–5°N; 160°E–150°W).

The other climate indices used for the rainfall forecasts were the Southern Oscillation Index (SOI), which is another measure of ENSO calculated using the pressure difference between Tahiti and Darwin [3]; the Inter-decadal Pacific Oscillation (IPO), which is thought to modulate the influence of ENSO on rainfall along the Australian east coast [30, 31]; and also the Dipole Mode Index (DMI), which is a coupled ocean and atmospheric phenomenon in the equatorial Indian Ocean [32].

Values for the Niño indices, SOI and DMI were sourced from the Royal Netherlands Meteorological Institute Climate Explorer, which is a web application that is part of the World Meteorological Organisation and European Climate Assessment and Dataset project. Values for the IPO were provided by Chris Folland from the United Kingdom’s Met Office Hadley Centre.

In order to forecast the lead Niño values, four unary data sets corresponding to Niño 4, Niño 3.4, Niño 3 and Niño 1 + 2 were initially constructed, each comprising the current monthly value plus twelve lagged values for the previous twelve months. Niño data for the period January 1871 to 1987 was used as the training period, and forecasts were then produced for the period August 1987 to August 2013.

The four Niño unary sets were used as input to forecast each of the four Niño SST values in turn, commencing with forecasts for lead month 0. An iterative process was then applied, so that forecast values were added to each unary set to provide the input data for a forecast corresponding to the following month. In this way, monthly forecasts for each of the four Niño values were produced from one month to twelve months in advance.
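The iterative procedure can be sketched as a loop in which each one-month forecast is appended to the input window before the next forecast is made. The sketch below is a simplification under stated assumptions: it uses a single index rather than all four unary sets, and a least-squares linear model (`fit_linear`) stands in for the trained ANN, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np

N_LAGS = 12  # current value plus twelve lagged values gives 13 inputs

def make_windows(series, n_lags=N_LAGS):
    """Pair each 13-month window with the value of the following month."""
    X = np.array([series[i:i + n_lags + 1]
                  for i in range(len(series) - n_lags - 1)])
    y = series[n_lags + 1:]
    return X, y

def fit_linear(X, y):
    """Stand-in one-step model (assumption): linear least squares, not an ANN."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda window: float(np.append(window, 1.0) @ coef)

def iterate_forecast(history, one_step_model, n_steps=12, n_lags=N_LAGS):
    """Produce forecasts 1..n_steps months ahead by feeding forecasts back in."""
    window = list(history[-(n_lags + 1):])
    forecasts = []
    for _ in range(n_steps):
        nxt = one_step_model(np.array(window))
        forecasts.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest month, append the forecast
    return np.array(forecasts)
```

One consequence of this design is that errors in early lead months propagate into later ones, since forecasts at longer leads are built partly on forecast rather than observed inputs.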

A new unary dataset designated as “Ninos” was constructed comprising both lead and lagged values for each Niño index. Seven monthly values of each Niño index contributed to Ninos, comprising lag 3, lag 2, lag 1, current, lead 0, lead 1 and lead 2. Thus Ninos comprised 28 columns of data. The full complement of available lagged Ninos was not tested because input of very large datasets tends to degrade performance of ANN models.
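The layout of the Ninos set (four indices, each with seven monthly values, for 28 columns) can be sketched as follows. The column names, the function `build_ninos` and the array layout of the forecasts are assumptions for illustration; lead 0 is taken to mean one month ahead of the current month, in line with the paper's statement elsewhere that lead month 2 is three months ahead.

```python
import numpy as np

INDICES = ["Nino1+2", "Nino3", "Nino3.4", "Nino4"]

def build_ninos(observed, lead_forecasts, n_lags=3, n_leads=3):
    """Assemble lagged, current and lead values into one input matrix.

    observed[name]:       1-D array of monthly observed values.
    lead_forecasts[name]: array of shape (n_months, n_leads); column k holds
                          the forecast issued in month t for lead k.
    Rows without enough history are filled with NaN in the lag columns.
    """
    columns, blocks = [], []
    for name in observed:
        obs = np.asarray(observed[name], dtype=float)
        for lag in range(n_lags, 0, -1):
            shifted = np.full(obs.shape, np.nan)
            shifted[lag:] = obs[:-lag]          # value observed `lag` months earlier
            blocks.append(shifted)
            columns.append(f"{name}_lag{lag}")
        blocks.append(obs)
        columns.append(f"{name}_current")
        fc = np.asarray(lead_forecasts[name], dtype=float)
        for k in range(n_leads):
            blocks.append(fc[:, k])
            columns.append(f"{name}_lead{k}")
    return columns, np.column_stack(blocks)
```

With the four indices in `INDICES` and the default three lags and three leads, the returned matrix has 4 × 7 = 28 columns.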

In order to forecast rainfall for Nebo, the desired output, the observed rainfall, was assigned as the monthly rainfall with a lead-time of three months ahead of the current month (lead month 2). The test period, which is also the forecast period, was initially set for 137 months from August 2000 to December 2011, which included the exceptionally wet austral summer of 2010–2011. Five unary input data sets were constructed corresponding to monthly values of the DMI, SOI, IPO, maximum atmospheric temperature (MaxT), and minimum atmospheric temperature (MinT). Each unary data set comprised the current monthly value plus twelve lagged values for the previous twelve months. The unary set Ninos has already been described.

Unary, binary and ternary combinations of these unary sets were used as inputs to forecast rainfall. A total of 15 combinations were tested.

3 Results and Discussion

3.1 Forecasting ENSO with an ANN

The skill of a model, whether a GCM or an ANN, at predicting climate indices or rainfall for a single locality or region can be measured by the RMSE. This is the square root of the mean of the squared differences between the observed and forecast values [33, 34]. The lower the RMSE, the smaller the difference, and therefore the more accurate the forecast. There is an extensive literature evaluating RMSE relative to other statistical techniques [35, 36]. Acknowledging that this statistical measure gives proportionately higher weight to large errors, we consider it suitable for our type of data given the highly variable nature of Queensland rainfall and ENSO events and the importance of being able to accurately forecast extreme events.
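The heavier weighting of large errors can be seen in a small numerical example. The error values below are invented purely for illustration and are not results from this study: both sets have the same mean absolute error, but the set containing one extreme miss has twice the RMSE.

```python
import numpy as np

def mae(errors):
    """Mean absolute error."""
    return float(np.mean(np.abs(errors)))

def rmse(errors):
    """Root mean square error."""
    return float(np.sqrt(np.mean(np.square(errors))))

# Illustrative forecast errors (assumed values, not data from the paper)
even_errors = np.array([2.0, 2.0, 2.0, 2.0])    # four moderate misses
one_big_miss = np.array([0.0, 0.0, 0.0, 8.0])   # same total error, one extreme miss

print(mae(even_errors), rmse(even_errors))      # 2.0 2.0
print(mae(one_big_miss), rmse(one_big_miss))    # 2.0 4.0
```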

Barnston et al. [17] used the RMSE between the forecast and observed succession of running three-month mean SST anomalies for Niño 3.4 as a measure of the skill of eight statistical and 12 dynamic general circulation models to forecast ENSO. In each case the forecast period commenced immediately after the latest available observed data value. The area between the lines designated upper and lower limit defines the boundaries of the individual output from these 20 models, as illustrated in Fig. 1. We have benchmarked output from our ANN against the envelope of values shown in Fig. 1. The ANN Niño 3.4 forecast was for the shorter interval of one month, making it a more ambitious forecast. Nevertheless, the ANN consistently produced a more skilful forecast than all of the models reviewed by Barnston et al. [17], except at a lead of 2 months, where our forecast was equivalent to the best forecasts of those models (Fig. 1).

Fig. 1. A comparison of the RMSE for the ANN model versus the 20 models reviewed by Barnston et al. 2012 for Niño 3.4 forecasts

The specific years over which an ENSO forecast is made are considered important when measuring skill, with the period 2002–10 and the early to middle 1990s considered the more challenging years of the last thirty [18]. Using the ANN, we forecast for the 26 years from 1987 to 2013, which incorporates the more difficult periods.

Our ANN model is most skilful at forecasting Niño 4, and least skilful at forecasting Niño 1 + 2, as shown in Fig. 2. The RMSE for our ANN model increases as the lead time is increased from 0 to seven months, after which it is surprisingly constant, more so than for many other models [17, 19, 37].

Fig. 2. RMSE as a function of lead-time for all seasons combined for 12 months. All Niño values generated from the ANN model.

3.2 The Spring Predictability Barrier

Studies using some GCMs and hybrid models show low skill at forecasting ENSO events during the boreal spring, with high RMSE for the months of March to June. This is referred to as the Spring Predictability Barrier (SPB), and it is an often-discussed characteristic of ENSO forecasts [38–40]. The SPB phenomenon is illustrated in Fig. 3 using results from the ensemble of forecasts reviewed in Zheng and Zhu [19]. The SPB corresponds to the months of March to May, when the RMSE is in the vicinity of 1.0, compared to much lower values during other times of the year, for example July to September, when the RMSE falls to values of about 0.3.

Fig. 3. RMSE of ensemble forecasts for Niño 3.4 based on Zheng and Zhu 2010 versus RMSE of ANN forecast, both with 9-month lead-time.

Output from our ANN model shows no such drop in skill for the March to May period; that is, the SPB is not present (Figs. 3 and 4). The values in Fig. 4 were normalised by dividing the RMSE values by the absolute error for the corresponding month. This produced some evidence for a barrier, but one associated with the boreal summer months of June, July and August rather than the boreal spring months of March, April and May.
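A per-month skill profile of the kind plotted in Figs. 3 and 4 can be computed by grouping forecast errors by the calendar month of the target. The sketch below is one possible reading of the normalisation described above: it assumes that "absolute error" means the mean absolute forecast error for that calendar month, which the text does not state explicitly. The function name is hypothetical.

```python
import numpy as np

def rmse_by_calendar_month(months, observed, forecast):
    """Return (rmse, mae) arrays of length 12, indexed January..December.

    months: integers 1..12 giving the calendar month of each target value.
    Months with no data are left as NaN.
    """
    months = np.asarray(months)
    err = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    rmse = np.full(12, np.nan)
    mae = np.full(12, np.nan)
    for m in range(1, 13):
        e = err[months == m]
        if e.size:
            rmse[m - 1] = np.sqrt(np.mean(e ** 2))
            mae[m - 1] = np.mean(np.abs(e))
    return rmse, mae

# Normalised profile under the stated assumption (RMSE divided by the
# mean absolute error of the same calendar month):
# rmse, mae = rmse_by_calendar_month(months, observed, forecast)
# normalised = rmse / mae
```

A spring barrier would show up in such a profile as a peak in the March to May entries relative to the rest of the year.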

Fig. 4. Normalised RMSE of ANN forecast with 9-month lead-time

The SPB is discussed most often in the literature with reference to values of Niño 3.4. The skill of the forecasts for Niño 4 from our ANN is highest (lowest RMSE) from April through to June, while the skill of the forecast for Niño 3 is highest (lowest RMSE) in February, as shown in Fig. 5.

Fig. 5. Plot of RMSE against month for forecasts of Niño 3.4, 3 and 4 generated by the ANN

In a review of ENSO predictability, Barnston et al. [17] have presented a set of contour plots of RMSE for Niño 3.4 showing variation with lead time and target season for 12 GCMs and 8 statistical models. Inspection of these figures suggests that forecast skill for each model does indeed vary with the time of year. However, the concept of a forecast barrier associated universally with the spring (March to May) is not as ubiquitous as suggested. Focussing on forecasts with 6 to 9 months lead-time, some models including the Scripps Hybrid Coupled Model do indeed exhibit higher RMSE in the boreal spring period. However, other GCMs including the University of Maryland Intermediate Coupled Model and the Australian POAMA GCM exhibit lower RMSE during the first part of the year compared to the latter part. Generally, the statistical models show a large variation in temporal designation of periods representing lower predictive skill for Niño 3.4 within the year. For example, the NOAA CPC-CA statistical model clearly shows higher predictive skill during the first half of the year.

Although the SPB has been extensively researched for many years, particularly in the context of GCMs, a satisfactory explanation remains elusive [24]. Investigations have suggested that the SPB is probably a result of errors in the models themselves, with both initialisation errors and parameter errors being implicated [41]. Others have attempted to rationalise the SPB in terms of physical phenomena [20]. Results from the present study, together with an objective assessment of the results from a diversity of GCMs and statistical models, suggest that different models have different profiles of skill over the annual cycle, and that the emphasis on the SPB is a consequence of the prominence given to output from certain types of GCMs, rather than a real physical phenomenon.

3.3 Forecasting Rainfall for Nebo, Queensland

The unary dataset designated Ninos was used alone and in binary and ternary combinations to forecast rainfall for Nebo with a lead-time of three months ahead of the current month. Consistent with our previous studies using ANN to forecast rainfall in Queensland [25], combinations incorporating local maximum and minimum temperature gave superior rainfall forecasts, with the highest r value and lowest RMSE, as illustrated in Table 1.

Table 1. Combinations of input variables used in the ANN model to forecast Nebo monthly rainfall with a 3-month lead

Figure 6 compares observed rainfall (mm) with forecast rainfall (mm) for Nebo from September 1996 to January 2012. The forecast was output from an ANN model after inputting current monthly values and 12 lagged values for minimum and maximum temperatures, together with the Ninos unary set of lagged and lead values.

Fig. 6. Comparing observed rainfall (mm) with forecast rainfall (mm) for Nebo from September 1996 to January 2012

A visual comparison of the observed versus forecast rainfall for the ternary combination, including the lagged and lead values for the Ninos with the current and lagged temperature inputs, indicates that the ANN consistently forecast too much rain for the drier months, as shown in Fig. 6. The ANN was able to give some indication that the austral summers of 1996–97, 2007–2008 and 2010–2011 were going to be wetter than average, but it did not forecast the exceptionally wet August of 1998 or December 2003, nor did it adequately indicate the magnitude of the very wet months of February 1997 and December 2010.

Sensitivity analysis of the output indicated that the most influential inputs determining rainfall are Niño 3 (Lag 1), Niño 3.4 (Lag 1) and Niño 4 (Lead 1).

4 Conclusions

Extreme rainfall during the austral summer of 2010–11 is linked with an extraordinarily strong La Niña [4]. This ENSO event was inadequately forecast by the BOM, and resulted in significant flooding across eastern Australia including major disruptions to coal mining operations in the Bowen Basin [1].

There has been a significant investment in GCMs over the last three decades and a major global research effort focused on improved ENSO forecasts. Barnston et al. [17] argue that GCMs can now outperform statistical models in their skill at forecasting ENSO, while Halide and Ridd [16] suggest that despite their complexity and the superior computational power of the supercomputers used to run them, GCMs are no better at forecasting ENSO than very simple models. Barnston et al. [17] and Halide and Ridd [16] both acknowledge problems associated with forecasting through the boreal SPB, with Barnston et al. [17] suggesting this will eventually be overcome as “science and engineering continues to advance”, while Halide and Ridd [16] suggest the solution lies in better understanding the physical phenomena that result in anomalous warming and cooling of particular regions of the equatorial Pacific.

These, and other review papers [8, 19–21], fail to adequately consider the potential advantage of using the most advanced statistical models currently available, which are ANNs that have been developed largely independently of the mainstream climate science community. ANNs require a different skill set for implementation than GCMs, but since at least 2006 they have shown potential for forecasting ENSO with more skill than GCMs and simple statistical models [42]. Consistent with Wu et al. [42], the results from our study indicate that the lower RMSE for ANNs is at least in part a consequence of ANNs being able to forecast through the SPB.

Translating the improved ENSO forecast into an improved rainfall forecast adds a further level of complexity. In previous studies we have shown that ANNs can produce a superior monthly rainfall forecast for localities in Queensland relative to output from the official GCMs [25–28]. The Bureau was unable to provide forecasts to enable benchmarking for the locality of Nebo in this study. We recognise that our best forecast, measured in terms of lowest RMSE for the period from September 1996 to January 2012, has limitations, but it does demonstrate improved skill through the inclusion of lead as well as lag values for Niño regions.