Application of gene expression programming for seasonal rainfall forecasting in Western Australia using potential climate indices

Islam, Farhana; Imteaz, Monzur Alam

doi:10.1007/s00382-023-06764-0

Open access
Published: 09 April 2023

Volume 62, pages 2779–2806, (2024)
Climate Dynamics

Abstract

This study presents the development of rainfall forecast models using potential climate indices for the Kimberley region of Western Australia, based on 100 years of rainfall and climate index data from four rainfall stations. Three modeling techniques were applied to develop prediction models: multiple linear regression (MLR), autoregressive integrated moving average with exogenous input (ARIMAX), and gene expression programming (GEP). Preliminary analysis suggests that the Western Tropical Indian Ocean (WTIO) index and the Southern Oscillation Index (SOI) have significant impacts on summer rainfall generation in the region. Model performance was evaluated using the Pearson correlation coefficient ($r$), root mean square error ($RMSE$), mean absolute error ($MAE$), Nash–Sutcliffe efficiency ($NSE$), and the refined Willmott index of agreement ($d_r$). The GEP model consistently outperformed the other two alternatives. In the calibration period, the GEP model yielded $r$ values ranging from 0.76 to 0.85, higher overall than those achieved by the MLR (0.32 to 0.44) and ARIMAX (0.53 to 0.83) models. In the validation period, $r$ ranged from 0.74 to 0.87 for GEP, 0.35 to 0.51 for MLR, and 0.59 to 0.77 for ARIMAX. Considering the other error statistics as well, it can be concluded that the GEP model is the most representative seasonal rainfall forecasting model for the region.
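The five performance measures named above can be computed from paired observed and forecast series. The following is a minimal sketch, assuming the commonly used textbook definitions of each metric (the refined Willmott index is taken with scaling constant c = 2); the function name `forecast_metrics` and the sample rainfall values are hypothetical and are not taken from the study.

```python
import numpy as np

def forecast_metrics(obs, sim):
    """Return r, RMSE, MAE, NSE and refined Willmott d_r for two 1-D series.

    Assumes standard definitions; not necessarily the exact formulas
    used by the authors.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs                # forecast errors
    dev = obs - obs.mean()         # observed deviations from the mean

    r = np.corrcoef(obs, sim)[0, 1]
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    nse = 1.0 - np.sum(err ** 2) / np.sum(dev ** 2)

    # Refined Willmott index of agreement with c = 2 (assumed form)
    a = np.sum(np.abs(err))
    b = 2.0 * np.sum(np.abs(dev))
    d_r = 1.0 - a / b if a <= b else b / a - 1.0

    return {"r": r, "RMSE": rmse, "MAE": mae, "NSE": nse, "d_r": d_r}

# Made-up seasonal totals (mm): every forecast is 1 mm too high
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 3.0, 4.0, 5.0]
metrics = forecast_metrics(observed, forecast)
```

In this toy case the forecast is perfectly correlated with the observations ($r = 1$) yet biased, so $NSE$ and $d_r$ fall below 1. This is why correlation alone is not a sufficient basis for comparing models.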

1 Introduction

Reliable forecasting of rainfall variability has always been of special interest in meteorology, engineering hydrology, and agricultural economics. Rainfall forecasts can play an important role in investment decisions, management decisions, and risk management policies in many sectors, including agriculture, water management infrastructure, and coastal and disaster management, along with their preparedness plans. Such forecasting has always been challenging because many factors are involved in the generation of rainfall, so the prediction outcome cannot be expected to be fully accurate. Nevertheless, a forecast issued several months in advance gives stakeholders some flexibility to make timely decisions and to mitigate the associated risk of damage.

At present, two approaches, dynamic and statistical, are widely used to predict future rainfall (Goddard et al. 2001). Dynamic models encode the physics of the ocean and of the interaction between the land and the atmosphere. They require up-to-date data on the current state and supercomputer resources to run computationally intensive ensemble simulations. These requirements make the dynamic approach comparatively complex, expensive, and operationally time-consuming (Evans et al. 2020; Schepen et al. 2012). Statistical models, on the other hand, require long-term uninterrupted data to evaluate the relationship between the response variable and the significantly contributing predictor variables. They are therefore relatively simple and need less development time and computing power. Because of this simplicity and ease of use, statistical models are widely preferred over dynamic models. Furthermore, dynamic models have not shown significantly better prediction performance than simple statistical models despite their use of high-end resources (Abbot and Marohasy 2014; Mekanik et al. 2016).

To date, several statistical techniques have been widely used to develop rainfall prediction models. These techniques include both linear and non-linear approaches. Because seasonal rainfall is a complex phenomenon, its prediction requires analyzing both linear and non-linear relationships. Among the linear techniques, multiple linear regression (MLR) is the most popular approach and has been used by many researchers, hydrologists, and climatologists (Hossain et al. 2018b; Islam and Imteaz 2019; Mekanik et al. 2013; Rasel et al. 2016). Among the non-linear techniques, artificial intelligence (AI) based models such as artificial neural networks (ANN), the adaptive neuro-fuzzy inference system (ANFIS), support vector machines (SVM), genetic programming (GP), and gene expression programming (GEP) have drawn considerable attention and have been successfully applied in rainfall, streamflow, and rainfall-runoff forecasting. Non-linear techniques are better able to explain the underlying non-linear relationships among the variables that linear regression leaves unexplained.

ANN is by far the most widely used non-linear statistical approach for revealing non-linear relationships (either visible or hidden) among variables. ANN has been used to model and simulate complicated time series, weather forecasting, rainfall-runoff processes, and other hydrological and meteorological prediction problems (Akhtar et al. 2009; Chiang and Chang 2009; Esha and Imteaz 2020; Hossain et al. 2020; Thirumalaiah and Deo 2000; Yilmaz et al. 2011). Despite the successful application of ANN in capturing non-linear mechanisms, researchers are often reluctant to use it on a broad scale because of its fundamental disadvantages. ANN is labeled a black-box model because it cannot provide the structure of the fitted function or an explicit equation for calculating the output. Moreover, ANN models are regarded as complex, and their outcomes are not easily interpretable (Gandomi and Alavi 2013; Hashmi et al. 2011). In parallel, GP has emerged as the most popular alternative technique for overcoming the drawbacks of ANN (Koza 1994). The main advantages of GP over ANN are that it captures knowledge from experimental data without prior assumptions and that it ultimately provides a prediction equation (Alavi and Gandomi 2011). The structure of the equation is simple, which allows it to be used for hand calculation in daily design practice (Gandomi and Alavi 2013). An extended version of GP, known as gene expression programming (GEP) (Ferreira 2001), has also received global attention in structural engineering, water resources, and hydrology.
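To illustrate why a GEP model yields an explicit equation rather than a black box, the sketch below decodes a gene written in the Karva notation introduced by Ferreira (2001), in which a linear string of symbols is read breadth-first into an expression tree. It is a toy illustration only: the function names, the small function set, and the terminals `a` to `d` (which could stand for lagged climate indices) are assumptions, and the gene shown is not one of the fitted models from this study.

```python
import math
from collections import deque

# Assumed function set: symbol -> number of arguments ("Q" is square root)
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}

def decode_kexpression(gene):
    """Build an expression tree from a valid K-expression, level by level."""
    nodes = [[symbol, []] for symbol in gene]
    queue = deque([nodes[0]])
    next_index = 1
    while queue:
        node = queue.popleft()
        # Functions consume the next unread symbols as children;
        # terminals have arity 0 and consume nothing.
        for _ in range(ARITY.get(node[0], 0)):
            child = nodes[next_index]
            next_index += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0]  # unread trailing symbols are simply ignored

def to_infix(node):
    """Render the tree as a readable equation string."""
    symbol, children = node
    if symbol not in ARITY:
        return symbol
    if symbol == "Q":
        return f"sqrt({to_infix(children[0])})"
    return f"({to_infix(children[0])} {symbol} {to_infix(children[1])})"

def evaluate(node, values):
    """Evaluate the tree for a dict of terminal values."""
    symbol, children = node
    if symbol not in ARITY:
        return values[symbol]
    args = [evaluate(child, values) for child in children]
    if symbol == "Q":
        return math.sqrt(args[0])
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    return args[0] / args[1]

tree = decode_kexpression("Q*-+abcd")
equation = to_infix(tree)   # "sqrt(((a - b) * (c + d)))"
value = evaluate(tree, {"a": 5.0, "b": 1.0, "c": 2.0, "d": 2.0})  # 4.0
```

The decoded string is an ordinary algebraic formula, which is what makes GP and GEP models usable for hand calculation once the evolutionary search has finished.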

GP and GEP have been applied in many hydrological and meteorological analyses around the world. Applications range from scour prediction for hydraulic structures (Azamathulla 2012; Azamathulla and Ghani 2010; Azamathulla et al. 2010; Guven et al. 2008), water demand forecasting (Shabani et al. 2018), and evapotranspiration estimation (Shiri et al. 2013, 2014a, 2014b) to rainfall-runoff modeling (Drecourt 1999; Fernando et al. 2012a, 2012b; Khu et al. 2001; Savic et al. 1999) and spatial interpolation of data (Adhikary et al. 2016a, 2016b). A recent study showed that the GEP model was more efficient than the Regional Flood Estimation (RFE) method in predicting events of specific return periods for Auckland, New Zealand (Zorn and Shamseldin 2015). In that study, the authors reported relative errors in flood estimation of 29% and 18% for the GEP model for the 10-year and 100-year return periods respectively, whereas the RFE method resulted in errors of 48% and 44%. Another study used the GEP technique to model a stage-discharge relationship and recommended the GEP model because it outperformed traditional methods such as regression analysis and the stage-discharge rating curve (Guven and Aytek 2009). Prior to that, genetic programming was applied to forecast the Niño 3.4 time series, demonstrating predictions up to 12 months in advance (De Falco et al. 2005).

Several previous studies on Australian rainfall variability revealed strong teleconnections between climate drivers and rainfall in different regions of Australia (Cai et al. 2011; Chowdhury and Beecham 2013; Feng et al. 2010; Fierro and Leslie 2013; Ghamariadyan and Imteaz 2020, 2021a, b; Hossain et al. 2018a; Islam and Imteaz 2019, 2020; Kirono et al. 2010; Marshall and Hendon 2014; McBride and Nicholls 1983; Mekanik et al. 2013; Risbey et al. 2009; Taschetto and England 2009; Tularam 2010). However, these teleconnections often depend on the geographical location of the site and vary with the season (Risbey et al. 2009). Sound knowledge of the climate drivers and their influence on localized rainfall events can therefore help in predicting the trend of seasonal rainfall. For Australia, Pacific Ocean SST anomalies have shown a strong influence on rainfall generation in the tropical and eastern regions, whereas Indian Ocean SST anomalies play a key role in rainfall generation in the southern and western regions. More precisely, the Indian Ocean Dipole (IOD) and the Southern Annular Mode (SAM) have been found to be influential drivers of rainfall in the south-eastern and western parts, blocking highs in the southern parts, and ENSO Modoki and the Madden-Julian Oscillation (MJO) in the north-western and northern parts (Ashok et al. 2003a, 2003b; Marshall and Hendon 2014; Rasel et al. 2016; Risbey et al. 2009; Schepen et al. 2012; Taschetto and England 2009; Tibaldi et al. 1994; Ummenhofer et al. 2008). Among all these drivers, the ENSO group of indices was in general found to be the major contributor to rainfall generation across Australia (Montazerolghaem et al. 2016).

To evaluate the teleconnection between climate drivers and Australian rainfall variability, some studies considered seasonal rainfall over the whole of Australia (Cai et al. 2011; Drosdowsky and Chambers 2001; Forootan et al. 2016; Kirono et al. 2010; McBride and Nicholls 1983; Risbey et al. 2009; Schepen et al. 2012), whereas the rest restricted their analysis to a single zone such as Queensland (Abbot and Marohasy 2012, 2014; Tularam 2010), South Australia (Chowdhury and Beecham 2013; Kamruzzaman et al. 2017; Nicholls 2010; Rasel et al. 2016; Tozer 2014), South West Western Australia (England et al. 2006; Evans et al. 2020; Feng et al. 2010; Islam and Imteaz 2020; Ummenhofer et al. 2008), and South East and East Australia (Mekanik et al. 2013; Murphy and Timbal 2008; Verdon et al. 2004). Rainfall at different locations can be generated through the interaction of different climate drivers within the region. A localized prediction is therefore preferred: it accounts for the locally dominant factors, which leads to a more reliable model with better prediction performance.

Current literature suggests that most attempts at seasonal rainfall forecasting in Western Australia (WA) were region-based, and the majority were developed for South West Western Australia (Cai and Cowan 2006; England et al. 2006; Feng et al. 2015; Smith et al. 2000; Ummenhofer et al. 2008). Apart from these, a limited number of investigations addressed rainfall variability in Central West Western Australia (CWWA) and North West Australia (NWA) (Feng et al. 2013; Fierro and Leslie 2013; Lin and Li 2012; Rotstayn et al. 2012). Among the studies performed in NWA, Rotstayn et al. (2012) evaluated and confirmed the influence of aerosols and greenhouse gases on an increase in summer rainfall. This is consistent with Shi et al. (2008), who investigated the dynamics of the observed trend towards increased rainfall and compared it with a model forced with increasing aerosol. Their study also reported an increase in NWA rainfall due to high and low sea level pressure (SLP) anomalies. In addition, an increase in NWA summer rainfall (December to February) was found to be related to tropical Atlantic atmospheric vertical motion and southern Indian Ocean climate indices (Feng et al. 2013; Lin and Li 2012). To date, however, none of the studies has considered both SST-based and SLP-based ENSO indices together with the Western Tropical Indian Ocean (WTIO) index as contributors to NWA rainfall events. This study aimed to fill that gap and investigated the influence of lagged relationships among the climate indices on seasonal summer rainfall (December-January-February) variability in the Kimberley region of North West Western Australia (NWWA) using three different techniques: MLR, ARIMAX, and GEP. It should be emphasized that this is the first time the GEP technique has been used to forecast long-term seasonal rainfall in Australia. The GEP tool used here provides equations for forecasting summer rainfall in the region several months in advance. Stakeholders can apply these equations without expert knowledge to support agro-economic decision-making and to formulate policies for mitigating damage due to flooding or drought.
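The idea of a lagged relationship between climate indices and summer rainfall can be sketched as a small regression exercise. The example below is a hedged illustration using synthetic data only: the helper names `lagged_predictor` and `fit_mlr`, the chosen lags, and the coefficients are all hypothetical, and the lag is counted in months before the December that opens each summer season, which is an assumed convention rather than the authors' stated one.

```python
import numpy as np

def lagged_predictor(monthly, n_years, lag):
    """Pick, for each year, the index value `lag` months before December.

    `monthly` is a flat Jan..Dec series covering `n_years` years.
    """
    monthly = np.asarray(monthly, dtype=float)
    if not 0 <= lag <= 11:
        raise ValueError("lag must be between 0 and 11 months")
    december = 12 * np.arange(n_years) + 11   # zero-based December positions
    return monthly[december - lag]

def fit_mlr(predictors, rainfall):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    rainfall = np.asarray(rainfall, dtype=float)
    design = np.column_stack([np.ones(len(rainfall)), *predictors])
    coef, *_ = np.linalg.lstsq(design, rainfall, rcond=None)
    return coef

# Synthetic stand-ins for 100 years of monthly WTIO and SOI values
rng = np.random.default_rng(0)
n_years = 100
wtio = rng.normal(size=12 * n_years)
soi = rng.normal(size=12 * n_years)

x_wtio = lagged_predictor(wtio, n_years, lag=3)   # September values
x_soi = lagged_predictor(soi, n_years, lag=4)     # August values

# Noise-free synthetic "rainfall" so the fit can be checked exactly
rain = 500.0 + 80.0 * x_wtio - 30.0 * x_soi
coef = fit_mlr([x_wtio, x_soi], rain)
```

Because the synthetic rainfall is an exact linear function of the two lagged predictors, the fitted coefficients reproduce the values used to generate it. With real station data the fit would be far weaker.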

2 Data and study area

The Kimberley region of Western Australia was selected for this study because of its tropical location in the north of the state and its diversified contribution to agricultural production, fishing, mining, construction, tourism, and retail trade for both WA and the country. The main agricultural area in the Kimberley covers around 14,000 hectares around the Ord River Irrigation Area (ORIA), which contributes 87 million Australian dollars (AUD) annually to the Australian economy. In addition, the region is known for pastoral leases that create employment in remote areas, mostly for the Aboriginal community.

At present, the Kimberley region holds approximately 80% of Western Australia's freshwater resources, and most of its towns are supplied with water from bore fields. Because of the pastoral nature of the inland and its dependence on limited freshwater resources, the region is vulnerable to saltwater intrusion and to flooding caused by sea-level rise associated with extreme weather events such as tropical cyclones during the summer season (December-January-February). This has created a pressing need for reliable rainfall prediction models for the region, so that the associated adverse effects can be managed to save lives with minimal social and economic loss. This study considered the main rainfall season of the Kimberley region of NWWA (summer rainfall events) to develop prediction models. Figure 1 and Table 1 illustrate the study area and the geographical location of the selected rainfall stations. Four rainfall stations in the Kimberley region were selected on the basis of long, continuous records with few missing values.

Table 1 Overview of the selected rainfall stations in the Kimberley region
