Introduction

Time-series analysis studies real-world phenomena through their dynamic dependencies. Most researchers focus on linear time-series models because of their long history of successful applications, straightforward calculation, and good approximation. Linear processes and models are often adequate for making inferences about time-series phenomena, and they have dominated research over the past decades. The linear autoregressive (Linear AR) model and its extensions are probably the most extensively used time-series models; they predict future values from a linear combination of past values. The simple idea of a stochastic difference equation in the autoregressive model can deliver accurate forecasts together with a random error in a series. However, bias arises when forecasting high-frequency or long-term time series and financial data, e.g. hourly temperature data or four-minute stock price data (Olson and Wu 2020). In contrast, nonlinear models can give a better approximation and make a significant contribution in numerous cases, especially in this era of nonparametric and computer-intensive modelling. The literature also shows that nonlinear time-series models are increasingly applied to financial prediction because they can characterize the asymmetric dynamics of data. Any deviation from the causal linear time-series model produces nonlinearity. Because such deviations can take many forms, different approaches have been developed to tackle the different classes of the large nonlinear world (Tsay and Chen 2018). As a consequence, forecast accuracy remains contested, and case studies reach different conclusions. For example, Teräsvirta (2006) concludes that linear models give better forecasts than nonlinear models. Alternatively, Montgomery et al. (1998) showed that nonlinear models give better estimates for regime-switching forecasts, whereas Dacco and Satchell (1999) reported a less favourable outlook. Meanwhile, White (2006) mentions that nonlinear models suffer from complicated computation, overfitting, and difficult interpretation in practice for multi-step-ahead forecasts. So, there is no firm conclusion about whether linear or nonlinear models perform better. Interest in nonlinear threshold time-series models has nevertheless increased steadily since they were introduced, and over recent decades they have received more attention because of their application to economic time series, mainly owing to their state-dependent or regime-switching behaviour (van Dijk et al. 2002). Modern time-series analysis offers many models for nonlinear time-series data, which can be broadly divided into parametric and nonparametric models. Both approaches can be used to forecast univariate nonlinear time-series data. This study considers selected univariate parametric regime-switching threshold autoregressive models for comparison and multi-step prediction.

The regime-switching threshold autoregressive (TAR) model is a nonlinear time-series model whose notable features include limit cycles and jump phenomena. The modern TAR modelling approach allows two or more regimes governed by the values of a threshold variable. The models available in the literature may contain two or more regimes, with movements between regimes governed by an observed variable. TAR models became popular after the publication of Tong and Lim (1980), although Tong had first proposed the model earlier (Tong 1978, 1990). Several variants of regime-switching threshold autoregressive models exist; among them, the self-exciting threshold autoregressive (SETAR) and smooth transition autoregressive (STAR) models deserve mention. The STAR model in turn has two popular variants, the Logistic STAR (LSTAR) and the Exponential STAR (ESTAR).

The SETAR model can be regarded as an extension of piecewise linear regression with structural changes in threshold space (Tong 1990; Tong and Yeung 1991). It is possibly the most widely used TAR model and is often called segmented linear regression. The SETAR model has been applied extensively, for example to asset market prices, exchange rate forecasts, water usage for rice irrigation, currency incomes and currencies, GDP, and other related cases (Ismail and Isa 2006; Tong and Yeung 1991; Kräger and Kugler 1993; Tiao and Tsay 1994; Potter 1995, 1997; Chan and Tsay 1998; Clements and Smith 1999, 2001; Feng and Liu 2003; Umer et al. 2018). Several researchers have also compared it with other models. Among them, the Consumer Price Index of Lithuania, the Export Volume Index and Domestic Producer Price Index series of Turkey, and the Industrial Production Index (IPI) of four major European countries have been studied with the SETAR model in recent years (Bratčikovienė 2012; Aydin and Güneri 2015; Boero and Lampis 2017). The smooth transition autoregressive (STAR) model, an extension of the popular SETAR model, was introduced in the early nineties to deal with nonlinearity (Teräsvirta 1994, 1996; van Dijk et al. 2002). The main difference between the STAR and SETAR models lies in the mechanism that governs the transition between regimes. The STAR model uses a transition function to capture the speed of transition between regimes, and it can determine the threshold level both endogenously and exogenously. The two major types of transition function are the logistic and the exponential, which give the Logistic STAR (LSTAR) and Exponential STAR (ESTAR) models. Building a STAR model consists of three stages, namely specification, estimation, and evaluation, which makes the model convenient to use, but its constant coefficients may create a problem in measuring volatility (Teräsvirta 1998).

The basic idea of the STAR model is discussed in several articles (Teräsvirta et al. 1994; Teräsvirta 1994, 1996; Potter 1999; van Dijk et al. 2002). The STAR model has many applications in different fields of study. For instance, Istanbul market efficiency, the disinflations of Australia, Canada, and New Zealand, the Swedish business cycle, the real exchange rates of the G-10 countries, industrial production, and 47 macroeconomic variables of the G7 economies have been investigated with the STAR model (Sarantis 1999; Skalin and Teräsvirta 1999; Leybourne and Mizen 1999; Bradley and Jansen 2004; Teräsvirta et al. 2005; Antwi et al. 2019). Many studies compare linear models with threshold models, including the SETAR, LSTAR, and ESTAR models. Boero and Marrocu (2002) showed the superiority of the STAR model over linear models, and Chu (2008) shows that the STAR model outperforms linear models. Variants of the STAR model are also used to deal with nonlinear patterns; in particular, most case studies that compare linear and nonlinear models include LSTAR and SETAR in their analysis. Furthermore, many studies consider only a specific version of the STAR model; for example, studies of the Bucharest Stock Exchange (BET) index and of the asymmetric behaviour of the quarterly unemployment rate focus on the LSTAR model (Rothman 1998; Acatrinei and Caraiani 2011).

Alternatively, artificial neural networks (ANNs) and deep learning models have become very popular in recent years because of their many successful applications in different sectors. Their ability to approximate any nonlinear function arbitrarily closely and to detect truly nonlinear dynamic relationships without the complexity of dealing with parameters has increased the popularity of ANNs. Because their parameters are not interpreted, ANNs are often regarded as ‘black box’ models for pattern recognition and forecasting (Franses et al. 2000). Many neural network models are available for time-series data. Among them, this study considers a simple feed-forward neural network with a single hidden layer, denoted ANNs.

The above discussion indicates that no firm conclusion can be drawn from applications of the linear autoregressive (Linear AR) model and the regime-switching threshold AR (TAR) models, as comparative studies recommend different models as suitable for time-series data. Moreover, an ample amount of research deals with the nonlinearity of economic and financial data. In reality, however, nonlinearity is found not only in economic and financial data but also in other dynamic time-series data such as meteorological data. Furthermore, a number of studies apply Markov regime-switching and other regime-switching models to forecast wind speed and other time-series dynamics (Haldrup and Nielsen 2006; Janczura and Weron 2010; Reikard 2010; Lerch and Thorarinsdottir 2013; Chen and Bunn 2014; Song et al. 2014; Allen et al. 2020; la Torre-Torres et al. 2020; Oscar et al. 2020; Ouyang et al. 2020). The authors of this article did not find any research article that applies any of the threshold regime-switching autoregressive models SETAR, LSTAR, and ESTAR to meteorological data. Therefore, this study deals with the nonlinearity of selected meteorological variables of Dhaka, the capital of Bangladesh, through a comparison of selected linear and nonlinear models. The focus of this study is to introduce nonlinear regime-switching threshold autoregressive time-series models to the meteorological field through a comparison of the Linear AR, SETAR, LSTAR, and ANNs models.

Methodology

Data sources

This study considers three variables, namely the daily average, minimum, and maximum temperatures of Dhaka, Bangladesh. The dataset was collected from the Bangladesh Meteorological Department and covers the period January 1971 to May 2019. As the data contain missing values, each missing value was replaced by the average of the previous and subsequent 10 days.
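The imputation rule can be illustrated with a short Python sketch. It assumes that a missing day is stored as `None` and that the replacement is the mean of all observed values within 10 days on either side; the function name `impute_window_mean` and the toy data are hypothetical, and the exact rule used in the study may differ in detail.

```python
from statistics import mean

def impute_window_mean(values, window=10):
    """Replace None entries by the mean of the observed values that lie
    within `window` days before and after the missing day (assumed rule)."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            lo = max(0, i - window)
            hi = min(len(values), i + window + 1)
            neighbours = [x for x in values[lo:hi] if x is not None]
            if neighbours:  # leave the gap if the whole window is missing
                filled[i] = mean(neighbours)
    return filled

# Toy example with a one-day window so the arithmetic is easy to follow.
temps = [20.0, 21.0, None, 23.0, 24.0]
filled = impute_window_mean(temps, window=1)
```

Only originally observed values enter the average, so an imputed value never feeds into another imputation.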

Methods

This study involves testing for nonlinearity, identifying the model parameters, and comparing the selected models, along with a 20-day forecast of the selected variables. Both parametric and nonparametric nonlinearity tests are included. Following Tsay and Chen (2018), the BDS test and the McLeod-Li and Engle tests are included among the nonparametric tests. Similarly, the F test and the Keenan and RESET tests are considered as parametric tests of nonlinearity. The linear autoregressive (Linear AR), SETAR, LSTAR, and ANNs models are selected to compare linear and nonlinear models. The main focus of this study is to introduce nonlinear regime-switching parametric time-series models to meteorological variables; thus, the SETAR and LSTAR models are selected from among the nonlinear regime-switching threshold autoregressive time-series models. The study also considers the most basic model of the linear world, the linear autoregressive model, and an artificial-neural-network-based algorithm from the popular nonparametric nonlinear world. Since the linear autoregressive model is the basic starting point of any linear time-series model, it is assumed that its extensions, such as the autoregressive integrated moving average (ARIMA) model, would behave correspondingly. Similarly, the single-hidden-layer feed-forward neural network is the most basic idea in the nonparametric artificial intelligence segment. More advanced algorithms may be applied in further study if there is evidence that this algorithm fits better. The nonlinear world has expanded massively over the last few decades. Many univariate nonparametric models exist, including kernel smoothing, splines, wavelet smoothing, and many more, and the literature contains evidence of their application. However, no substantial literature or application was found for parametric nonlinear models of weather variables. Hence, this study introduces these models to weather forecasting. Further study could include extensions of the linear AR model such as ARIMA, nonparametric univariate models, and other advanced neural-network-based time-series algorithms alongside the regime-switching threshold autoregressive models. The following mathematical exposition covers the basic ideas of the tests, models, and algorithms. For further details, readers are referred to the cited articles and books. Since the linear autoregressive (Linear AR) model is the most widely used model for time-series analysis, its explanation and mathematical formulation are not discussed in the following part.

Nonlinearity test

The BDS test

The BDS test considers the null hypothesis that a time series consists of independent and identically distributed (iid) random variables, using the correlation integral (Broock et al. 1996). Roughly, the correlation integral is a popular idea from chaotic time-series analysis that measures the frequency with which temporal patterns repeat. The correlation integral of embedding dimension \( m \) is defined as,

$$ c(m,\varepsilon ) = \lim\limits_{T_{m} \to \infty } \frac{2}{T_{m} (T_{m} - 1)}\sum {\sum\limits_{m \le s < t \le T} {I\left( {x_{t}^{m} ,x_{s}^{m} \mid \varepsilon } \right)} } $$
(1)

where \( T \) denotes the sample size of the time series \( \left\{ {x_{t} \mid t = 1, \ldots ,T} \right\} \) and \( m \) is a positive integer. The \( m \)-history is defined as \( x_{t}^{m} = \left( {x_{t} ,x_{t - 1} , \ldots ,x_{t - m + 1} } \right) \), which gives \( T_{m} = T - m + 1 \) constructed \( m \)-histories. In Eq. (1), \( \varepsilon \) is a given positive real number and \( I(u,v \mid \varepsilon ) \) is an indicator variable that equals one if the distance between \( u \) and \( v \) (in the sup-norm) is less than \( \varepsilon \), and zero otherwise. To test for nonlinearity, the correlation integral \( c(m,\varepsilon ) \) is compared with that of the 1-history, \( c(1,\varepsilon ) \), with the intuition that if \( \left\{ {x_{t} } \right\} \) is iid then the data contain no pattern and, under independence, \( c(m,\varepsilon ) \) equals the \( m \)th power of the corresponding probability for the 1-history. The BDS test statistic is defined as,

$$ D(m,\varepsilon ) = \frac{{\sqrt T \left( {\widehat{c}(m,\varepsilon ) - \left\{ {\widehat{c}(1,\varepsilon )} \right\}^{m} } \right)}}{s(m,\varepsilon )} $$
(2)

where \( \widehat{c}(k,\varepsilon ) = \frac{2}{{T_{k} (T_{k} - 1)}}\sum {\sum\limits_{k \le s < t \le T} {I\left( {x_{t}^{k} ,x_{s}^{k} \mid \varepsilon } \right)} } \) is the sample correlation integral and the standard error \( s(m,\varepsilon ) \) can be estimated consistently from the data under the null hypothesis. For more details, readers are referred to the reference papers (Brock 1987; Broock et al. 1996; Tsay 2010).
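The sample correlation integral and the numerator of Eq. (2) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: closeness is measured in the sup-norm with a strict inequality, the function names are hypothetical, and the standard error \( s(m,\varepsilon ) \) is deliberately left out, so the sketch does not produce the full BDS statistic.

```python
import math

def m_histories(x, m):
    """All m-histories (x_t, x_{t-1}, ..., x_{t-m+1}) for t = m, ..., T."""
    return [tuple(x[t - j] for j in range(m)) for t in range(m - 1, len(x))]

def correlation_integral(x, m, eps):
    """Share of pairs of m-histories that are closer than eps (sup-norm)."""
    hist = m_histories(x, m)
    n = len(hist)  # T_m = T - m + 1
    close = 0
    for s in range(n):
        for t in range(s + 1, n):
            if max(abs(a - b) for a, b in zip(hist[s], hist[t])) < eps:
                close += 1
    return 2.0 * close / (n * (n - 1))

def bds_numerator(x, m, eps):
    """sqrt(T) * (c(m, eps) - c(1, eps)^m), i.e. Eq. (2) without s(m, eps)."""
    c_m = correlation_integral(x, m, eps)
    c_1 = correlation_integral(x, 1, eps)
    return math.sqrt(len(x)) * (c_m - c_1 ** m)

# A perfectly alternating toy series is strongly dependent, so
# c(2, eps) exceeds c(1, eps)^2.
x = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
c1 = correlation_integral(x, 1, 0.5)
c2 = correlation_integral(x, 2, 0.5)
```

The double loop is quadratic in the number of histories, so a long daily series would need a vectorised or library implementation in practice.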

McLeod-Li and Engle tests

McLeod and Li proposed applying the general portmanteau test to the residuals \( \hat{a}_{t} \) of a fitted time-series model, under the assumption that the \( x_{t}^{2} \) process is weakly stationary; this is known as the McLeod-Li test of nonlinearity (McLeod and Li 1983). The lag-\( l \) autocorrelation of the squared residuals with sample size \( T \) is defined as,

$$ \hat{\rho }_{aa} (l) = \frac{{\sum\nolimits_{t = l + 1}^{T} {\left( {\hat{a}_{t}^{2} - \hat{\sigma }^{2} } \right)\left( {\hat{a}_{t - l}^{2} - \hat{\sigma }^{2} } \right)} }}{{\sum\nolimits_{t = 1}^{T} {\left( {\hat{a}_{t}^{2} - \hat{\sigma }^{2} } \right)^{2} } }} $$
(3)

where \( \hat{\sigma }^{2} = \sum\nolimits_{t = 1}^{T} {\hat{a}_{t}^{2} } /T \). For a fixed positive integer \( m \), the joint distribution of \( \sqrt T \left[ {\hat{\rho }_{aa} \left( 1 \right),\hat{\rho }_{aa} \left( 2 \right), \ldots ,\hat{\rho }_{aa} \left( m \right)} \right]^{\prime } \) is asymptotically multivariate normal with mean zero and identity covariance matrix. To test for nonlinearity, McLeod and Li proposed the following portmanteau statistic, assuming an adequately fitted linear model and fourth-order stationarity,

$$ Q^{*} (m) = T(T + 2)\sum\limits_{l = 1}^{m} {\frac{{\hat{\rho }_{aa}^{2} \left( l \right)}}{T - l}} $$
(4)

Note that \( Q^{*} (m) \) is asymptotically distributed as \( \chi_{m}^{2} \) and is essentially a Ljung-Box test applied to the \( x_{t}^{2} \) process. For the autoregressive conditional heteroscedastic (ARCH) model, \( Q^{*} (m) \) is often equivalent to the Lagrange multiplier test of Engle, in which an \( AR(m) \) model with error term \( \varepsilon_{t} \) is fitted to the squared residuals (Engle 1982). The \( AR(m) \) model is defined as,

$$ \hat{a}_{t}^{2} = \beta_{0} + \beta_{1} \hat{a}_{t - 1}^{2} + \cdots + \beta_{m} \hat{a}_{t - m}^{2} + \varepsilon_{t} $$
(5)

The ARCH effect can then be tested with an \( F \)-statistic for \( H_{0} :\beta_{1} = \beta_{2} = \cdots = \beta_{m} = 0{\text{ vs }}H_{a} :\beta_{i} \ne 0{\text{ for some }}i \in \left\{ {1,\ldots,m} \right\}. \) Moreover, one can use \( mF \) as the test statistic, which has an asymptotic \( \chi_{m}^{2} \) distribution.
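Equations (3) and (4) translate directly into code. The following Python sketch computes the lag-\( l \) autocorrelation of the squared residuals and the portmanteau statistic \( Q^{*}(m) \); the function names and the toy residuals are hypothetical, and the comparison with the \( \chi_{m}^{2} \) distribution is left to a statistics library.

```python
def squared_residual_acf(resid, lag):
    """Lag-l autocorrelation of the squared residuals, Eq. (3)."""
    sq = [a * a for a in resid]
    T = len(sq)
    sigma2 = sum(sq) / T
    num = sum((sq[t] - sigma2) * (sq[t - lag] - sigma2) for t in range(lag, T))
    den = sum((v - sigma2) ** 2 for v in sq)
    return num / den

def mcleod_li_q(resid, m):
    """Portmanteau statistic Q*(m) of Eq. (4)."""
    T = len(resid)
    return T * (T + 2) * sum(
        squared_residual_acf(resid, l) ** 2 / (T - l) for l in range(1, m + 1)
    )

# Toy residuals whose squares alternate in blocks of two,
# so the squared series is clearly autocorrelated.
resid = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0, 2.0, -2.0]
rho1 = squared_residual_acf(resid, 1)
rho2 = squared_residual_acf(resid, 2)
q2 = mcleod_li_q(resid, 2)
```

For these toy residuals the lag-2 autocorrelation of the squares is strongly negative, which dominates the value of \( Q^{*}(2) \).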

Keenan and Ramsey RESET tests

Most parametric nonlinearity tests are based on the Volterra series expansion of a zero-mean stationary time series and regard the series as nonlinear if any of the higher-order coefficients is nonzero. In 1969, Ramsey proposed the RESET specification test, which can be applied to the linear \( AR\left( p \right) \) model (Ramsey 1969). With \( x_{t - 1} = (1,x_{t - 1} ,\ldots,x_{t - p} )^{\prime} \) and \( \phi = (\phi_{0} ,\phi_{1} ,\ldots,\phi_{p} )^{\prime} \), this model is written as,

$$ x_{t} = x^{\prime}_{t - 1} \phi + a_{t} . $$
(6)

The testing procedure consists of three steps. In the first step, the least squares estimate \( \hat{\phi } \), the fitted values \( \hat{x}_{t} \), the residuals \( \hat{a}_{t} \), and the sum of squared residuals \( {\text{SSR}}_{0} = \sum\nolimits_{t = p + 1}^{T} {\hat{a}^{2}_{t} } \) are obtained. In the second step, the least squares residuals \( \hat{v}_{t} \) and the sum of squared residuals \( {\text{SSR}}_{1} = \sum\nolimits_{t = p + 1}^{T} {\hat{v}^{2}_{t} } \) are computed for the linear regression model,

$$ \hat{a}_{t} = x^{\prime}_{t - 1} \alpha_{1} + M^{\prime}_{t - 1} \alpha_{2} + v_{t} . $$
(7)

In the third step, the linearity of the \( AR\left( p \right) \) model in Eq. (6) is tested with an \( F \) test on the coefficients \( \alpha_{1} {\text{ and }}\alpha_{2} \) of Eq. (7), where \( M_{t - 1} = (\hat{x}_{t}^{2} ,\ldots,\hat{x}_{t}^{s + 1} )^{\prime} \) for some \( s \ge 1 \). Failing to reject the null hypothesis of zero coefficients \( (\alpha = 0) \) indicates that the \( AR\left( p \right) \) model is linear. The \( F \)-statistic is defined as,

$$ F = \frac{{\left( {{\text{SSR}}_{0} - {\text{SSR}}_{1} } \right)/g}}{{{\text{SSR}}_{1} /\left( {T - p - g} \right)}};\quad g = s + p + 1 $$
(8)

which, under the linearity and normality assumptions, follows an \( F \) distribution with \( g \) and \( T - p - g \) degrees of freedom. To avoid multicollinearity between \( \hat{x}^{2}_{t} \) and \( x_{t - 1} \) of Eq. (6), Keenan introduced a nonlinearity test in which a modified \( \hat{x}^{2}_{t} \) is used in the second step of the RESET test (Keenan 1985). In Keenan's modification, \( \hat{x}^{2}_{t} \) is regressed on \( x_{t - 1} \) to remove its linear dependence on \( x_{t - 1} \), which yields the estimated residuals \( \hat{\mu }_{t} \). To test the zero coefficient \( (\alpha = 0) \), the sum of squared residuals \( {\text{SSR}}_{1} = \sum\limits_{t = p + 1}^{T} {(\hat{a}_{t} - \hat{\mu }_{t} \hat{\alpha })^{2} } = \sum\limits_{t = p + 1}^{T} {\hat{v}^{2}_{t} } \) is obtained from the linear regression model,

$$ \hat{a}_{t} = \hat{\mu }_{t} \hat{\alpha } + v_{t} . $$
(9)

The F test

Tsay proposed a different choice of regressor, \( M_{t - 1} = vech(x_{t - 1} x^{\prime}_{t - 1} ) \), to improve the RESET and Keenan tests, where \( vech( \cdot ) \) denotes the half-stacking vector of the elements on and below the diagonal of a matrix (Tsay 1986). The Tsay nonlinearity test uses the partial \( F \) statistic of the least squares regression in Eq. (10), with error term \( e_{t} \), to test the coefficient \( \alpha = 0 \); the \( F \)-statistic follows an \( F \) distribution with \( g \) and \( T - p - g - 1 \) degrees of freedom. Luukkonen and others proposed alternatives and extensions in which they suggest the terms \( x_{t - i}^{3} ; \, i = \, 1,\ldots,p \) for \( M_{t - 1} \) (Luukkonen et al. 1988).

$$ x_{t} = x^{\prime}_{t - 1} \phi + M^{\prime}_{t - 1} \alpha + e_{t} . $$
(10)

Nonlinear time-series models

SETAR model

A time series \( x_{t} \) follows a two-regime TAR model of order \( p \) with threshold variable \( x_{t - d} \) if it satisfies the following equation, where \( \varepsilon_{t} \) is a sequence of iid random variables with mean zero and unit variance.

$$ x_{t} = \left\{ \begin{array}{ll} \phi_{0} + \sum\limits_{i = 1}^{p} {\phi_{i} x_{t - i} } + \sigma_{1} \varepsilon_{t} , & {\text{if }}x_{t - d} \le r, \\ \theta_{0} + \sum\limits_{i = 1}^{p} {\theta_{i} x_{t - i} } + \sigma_{2} \varepsilon_{t} , & {\text{if }}x_{t - d} > r, \end{array} \right. $$
(11)

where \( \theta_{i} \) and \( \phi_{i} \) are real-valued parameters such that \( \theta_{i} \ne \phi_{i} \) for some \( i \), \( d \) is the delay, and \( r \) is the threshold. The most straightforward class of nonlinear models uses piecewise linear regression for estimation. As a consequence, the SETAR model is the simplest special case of the TAR model. Recent studies have made several improvements in the estimation and prediction of the TAR model parameters. Among the several types of SETAR model, only the two-regime SETAR model with two linear sub-models is presented below (Tong 1990). The TAR model defined above can be written in a slightly different, higher-order form with an integer delay \( d \) as,

$$ x_{t} = \left\{ \begin{array}{ll} \phi_{1,0} + \sum\limits_{i = 1}^{p} {\phi_{1,i} x_{t - i} } + \sigma_{1} \varepsilon_{t} , & {\text{if }}x_{t - d} \le r, \\ \phi_{2,0} + \sum\limits_{j = 1}^{p} {\phi_{2,j} x_{t - j} } + \sigma_{2} \varepsilon_{t} , & {\text{if }}x_{t - d} > r, \end{array} \right. $$
(12)

where \( p_{1} \) and \( p_{2} \) are the autoregressive orders of the two regimes; the simplified model is obtained from the \( TAR\left( {2,p_{1} ,p_{2} } \right) \) model with delay \( d \) under the assumption \( p_{1} = p_{2} = p ; { }1 \le d \le p \). Finally, the simplified first-order SETAR model with autoregressive parameters \( \phi \) and noise standard deviations \( \sigma \) is defined as,

$$ x_{t} = \left\{ \begin{array}{ll} \phi_{1,0} + \phi_{1,1} x_{t - 1} + \sigma_{1} \varepsilon_{t} , & {\text{if }}x_{t - d} \le r, \\ \phi_{2,0} + \phi_{2,1} x_{t - 1} + \sigma_{2} \varepsilon_{t} , & {\text{if }}x_{t - d} > r. \end{array} \right. $$
(13)

For more details on the properties, estimation, and multi-step forecasting of the SETAR model, readers are referred to the cited articles and book chapters (Tong 1978, 1990; Teräsvirta 1996; Rothman 1998; Tsay and Chen 2018).
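The first-order SETAR model of Eq. (13) can be read as a simple recursion in which the delayed value \( x_{t-d} \) selects the regime. The following Python sketch illustrates this under the assumption of standard Gaussian noise; the function names and all parameter values are hypothetical and are not the estimates of this study.

```python
import random

def setar_step(history, d, r, low, high, noise=0.0):
    """One step of the first-order, two-regime SETAR model in Eq. (13).

    low and high are (intercept, ar_coefficient, sigma) tuples for the
    regimes x_{t-d} <= r and x_{t-d} > r; noise is a draw of epsilon_t.
    With noise = 0 the result is the one-step conditional mean.
    """
    phi0, phi1, sigma = low if history[-d] <= r else high
    return phi0 + phi1 * history[-1] + sigma * noise

def simulate_setar(n, d, r, low, high, start, seed=0):
    """Append n simulated values to the starting values (len(start) >= d)."""
    rng = random.Random(seed)
    x = list(start)
    for _ in range(n):
        x.append(setar_step(x, d, r, low, high, rng.gauss(0.0, 1.0)))
    return x

# Hypothetical parameters, chosen only for illustration.
low, high = (5.0, 0.8, 1.0), (10.0, 0.6, 1.0)
forecast_low = setar_step([24.0, 26.0], d=2, r=25.0, low=low, high=high)   # x_{t-2} = 24 <= 25
forecast_high = setar_step([24.0, 26.0], d=1, r=25.0, low=low, high=high)  # x_{t-1} = 26 > 25
path = simulate_setar(50, d=2, r=25.0, low=low, high=high, start=[24.0, 26.0])
```

The same history gives two different one-step means, depending only on which lagged value acts as the threshold variable.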

LSTAR model

As discussed above, the smooth transition autoregressive (STAR) model is similar to the TAR model but uses a different mechanism for the transition between regimes, namely a transition function with a bounded range. To define the STAR model, the TAR model of order \( p \) defined in Eq. (11) can be rewritten with the indicator variable \( I( \cdot ) \) as,

$$ \begin{aligned} x_{t} & = (\phi_{0,1} + \phi_{1,1} x_{t - 1} + \cdots + \phi_{p,1} x_{t - p} + \sigma_{1} \varepsilon_{t} )[1 - I(x_{t - d} > r)] \\ & \quad + (\phi_{0,2} + \phi_{1,2} x_{t - 1} + \cdots + \phi_{p,2} x_{t - p} + \sigma_{2} \varepsilon_{t} )I(x_{t - d} > r), \\ \end{aligned} $$
(14)

where \( d \) is the delay parameter, \( x_{t - d} \) is the threshold variable, and the step function \( I(x_{t - d} > r) \) governs the transition from one regime to the other.

A time series \( x_{t} \) follows the two-regime STAR model with transition function \( G(s_{t} \mid \gamma ,c) \), where \( s_{t} = x_{t - d} \), if it satisfies the following equation.

$$ \begin{aligned} x_{t} & = (\phi_{0,1} + \phi_{1,1} x_{t - 1} + \cdots + \phi_{p,1} x_{t - p} )[1 - G(s_{t} \mid \gamma ,c)] \\ & \quad + (\phi_{0,2} + \phi_{1,2} x_{t - 1} + \cdots + \phi_{p,2} x_{t - p} )G(s_{t} \mid \gamma ,c) + a_{t} , \\ \end{aligned} $$
(15)

where \( \gamma \) and \( c \) are the scale and location parameters of the transition function, which satisfies \( 0 \le G(s_{t} \mid \gamma ,c) \le 1 \), and \( a_{t} \) is an iid sequence of random noise with mean zero and variance \( \delta_{a}^{2} > 0. \) The exponential, standard Gaussian, and logistic transition functions result in different types of STAR model. For instance, the logistic transition function is defined as,

$$ G(s_{t} \left| {\gamma ,c)} \right. = \frac{1}{{1 + \exp [ - \gamma (s_{t} - c)]}}. $$
(16)

where \( G(s_{t} \mid \gamma ,c) \to 1{\text{ as }}\gamma (s_{t} - c) \to \infty \) and \( G(s_{t} \mid \gamma ,c) \to 0{\text{ as }}\gamma (s_{t} - c) \to - \infty \); the resulting model is called the logistic STAR or LSTAR model. Details of the iterative model building, identification, and smoothing can be found in the suggested references (Teräsvirta et al. 1994; Tsay and Chen 2018).
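To make the role of the logistic transition function in Eq. (16) concrete, the following Python sketch evaluates \( G \) and the conditional mean of a first-order, two-regime LSTAR model. The function names and parameter values are hypothetical and serve only as an illustration; the logistic function is written in a numerically stable form to avoid overflow for large \( \gamma \).

```python
import math

def logistic_transition(s, gamma, c):
    """Logistic transition function G(s | gamma, c) of Eq. (16)."""
    z = gamma * (s - c)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # stable branch for large negative z
    return ez / (1.0 + ez)

def lstar_mean(history, d, gamma, c, low, high):
    """Conditional mean of a first-order, two-regime LSTAR model.

    low and high are (intercept, ar_coefficient) tuples; the two linear
    sub-models are mixed with weights 1 - G and G, where s_t = x_{t-d}.
    """
    g = logistic_transition(history[-d], gamma, c)
    x_prev = history[-1]
    low_part = low[0] + low[1] * x_prev
    high_part = high[0] + high[1] * x_prev
    return low_part * (1.0 - g) + high_part * g

# Hypothetical parameters, chosen only for illustration.
g_mid = logistic_transition(25.0, 2.0, 25.0)        # s_t = c gives equal weights
g_sharp_high = logistic_transition(26.0, 1000.0, 25.0)
g_sharp_low = logistic_transition(24.0, 1000.0, 25.0)
mix = lstar_mean([25.0, 26.0], d=2, gamma=2.0, c=25.0,
                 low=(5.0, 0.8), high=(10.0, 0.6))
```

As \( \gamma \) grows, \( G \) approaches a step function at \( c \), so the LSTAR mean approaches the abrupt regime switch of the SETAR model.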

The number of regimes has to be determined for regime-based nonlinear threshold models, as both two-regime and multiple-regime models are available in the literature. Graphical methods of determining the regimes have been found effective in the literature. This study determines the regimes graphically with the help of a smoothing function: a local smoothing method applied to the scatter plot gives an initial idea about the regimes. To determine the autoregressive lag order of the linear AR model, the popular partial autocorrelation function (PACF) and maximum likelihood estimation are used. Similarly, identifying the hyperparameters of the SETAR and LSTAR models includes identifying the delay, the maximum lag order for each regime, and others. Gonzalo and Pitarakis (2002) describe a procedure for selecting these parameters through a grid search involving the pooled AIC and p-values, which can be carried out with the R package tsDyn (Gonzalo and Pitarakis 2002; Narzo et al. 2020).

Artificial neural network (ANNs)

Neural networks can be used for prediction; they are a nonparametric statistical technique that has benefited from advances in computing power and algorithms. Among the several algorithms, the vanilla feed-forward network is the most widely used in modern time-series analysis. This study considers a single-hidden-layer feed-forward neural network (ANNs) to deal with the nonlinearity of the selected weather variables of Bangladesh. Franses et al. (2000) compare ANNs with different TAR-based models and show how they are related. However, ANNs aim to model the nonlinear relationship without focusing on the interpretation of regimes, as switching is determined through particular linear combinations of the \( p \) lagged variables in the vector \( x_{t} \). The neural network model with linear output, \( D \) hidden units, and activation function \( g \) is summarized as follows,

$$ x_{t + s} = \beta_{0} + \sum\limits_{j = 1}^{D} {\beta_{j} g} \left( {\gamma_{0j} + \sum\limits_{i = 1}^{m} {\gamma_{ij} } x_{t - (i - 1)d} } \right). $$
(17)

Readers can consult Franses et al. (2000) and Ripley (1996) for a summary of the derivation, properties, and estimation as well as the network nomenclature (Venables and Ripley 2002). The hidden layer size is also obtained with the tsDyn R package through an iterative process (Gonzalo and Pitarakis 2002; Narzo et al. 2020).
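The forward pass of Eq. (17) can be sketched in a few lines of Python. The sketch assumes a logistic activation function \( g \), which Eq. (17) itself does not fix, and the function name, argument layout, and weights are hypothetical; estimating the weights is outside its scope.

```python
import math

def ann_forecast(lags, beta0, beta, gamma0, gamma):
    """Output of Eq. (17): a linear output over D logistic hidden units.

    lags      : the m lagged inputs x_t, x_{t-d}, ..., x_{t-(m-1)d}
    beta0     : output intercept
    beta[j]   : output weight of hidden unit j
    gamma0[j] : intercept of hidden unit j
    gamma[j]  : input weights of hidden unit j (one per lag)
    """
    out = beta0
    for j in range(len(beta)):
        z = gamma0[j] + sum(w * x for w, x in zip(gamma[j], lags))
        out += beta[j] / (1.0 + math.exp(-z))  # assumed logistic activation
    return out

# With all hidden weights at zero every hidden unit outputs 0.5,
# so the forecast is beta0 + 0.5 * (sum of the output weights).
flat = ann_forecast([25.0, 26.0], beta0=1.0, beta=[2.0, 4.0],
                    gamma0=[0.0, 0.0], gamma=[[0.0, 0.0], [0.0, 0.0]])
```

With \( D \) hidden units and \( m \) lags the network has \( D(m + 2) + 1 \) weights, which is why the hidden layer size is treated as a hyperparameter to be selected.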

Results and analysis

The analysis consists of two main parts. The first part involves prior exploratory analysis, nonlinearity tests, and determination of the model hyperparameters; the second part presents the comparison of the applied models and the multi-step prediction.

A common first step with autoregressive models is to determine the autoregressive lag order, for which several approaches are available in the literature. Here the widely used PACF and maximum likelihood estimation (MLE) are applied, and 12 is found to be an adequate autoregressive order for the average, maximum, and minimum temperatures. The order of 12 is selected by MLE with estimated sigma-squared values of 2.0766, 2.086, and 2.0606, respectively (Fig. 1).

Fig. 1 PACF of the selected variables

As the focus of this study is the nonlinearity of weather variables, testing for nonlinearity is vital before exploring the appropriate model. The BDS test, the McLeod-Li and Engle tests, the Ramsey RESET and Keenan tests, and the F test were performed, and almost all of them confirm the existence of nonlinearity (Table 1). As mentioned in the methodology section, the BDS test, applied here with three embedding dimensions, rejects the null hypothesis that the time series is iid when nonlinearity is present. The significant p values (less than 0.05) confirm the rejection of the null hypothesis for the BDS test. In the McLeod-Li and Engle tests, the portmanteau test statistics with significant p values (less than 0.05) confirm the nonlinearity (Table 1). Furthermore, there is evidence of an ARCH effect, as the McLeod-Li portmanteau statistic is similar to the popular Ljung-Box test statistic and the Engle Lagrange multiplier test. Likewise, the Keenan and RESET tests and the modified F test also indicate nonlinearity, as their test statistics are significant (Table 1).

Table 1 Values from nonlinearity tests

The next task is to get an idea about the regimes, as threshold autoregressive models are regime-switching models. There may be two regimes or multiple regimes, depending on the nature of the data. An initial idea about the regimes can be obtained through a kernel smoothing function; this study uses the local smoothing function demonstrated in the early papers of Tsay (Tsay 1986). Although the SETAR and LSTAR models have useful algorithms for determining nonlinearity and regimes, a graphical presentation may shed more light on the confirmation of regimes. The scatter plot with the estimated smoothing function for the selected data indicates that two-regime threshold models are appropriate (Fig. 2).

Fig. 2 Regime confirmation through local smoothing function

After the regimes are confirmed, the hyperparameters of the applied models need to be determined to find a better-fitting model. Gonzalo and Pitarakis (2002) discuss hyperparameter estimation for threshold autoregressive models; readers can consult their article for details. The threshold models require the delay to be specified. The authors of this paper followed the commands and the hyperparameter selection procedure documented for the R package tsDyn, in which hyperparameters such as the delay and threshold variable, the maximum and minimum lag values, and the coefficients for the lagged time series are selected automatically through a grid search (Narzo et al. 2020). The possible combinations of the specified hyperparameter values were tested for the applied threshold models and selected variables. The hyperparameters are selected by the grid search algorithm using well-known model selection criteria and p values.

The hyperparameter selection focuses on determining the theoretical delay through the lowest AIC value. For average temperature, along with the other parameters, the theoretical delay was searched over the positive integers from 1 to 11, and 8 was found to be the appropriate delay, with the lowest AIC value of 55,868.12 for both the SETAR and LSTAR models. The delay was selected with a maximum lag order of 1 for both the low and high regimes. Furthermore, the hidden layer size of the ANNs was selected with a grid search procedure for average temperature.

Similarly, for the maximum and minimum temperatures, the threshold delay was selected as 8, with a maximum lag order of 1 for both the low and high regimes, for both the SETAR and LSTAR models. Likewise, the hidden layer size of the ANNs was selected as 17 for both maximum and minimum temperatures. Moreover, iterative graphs were produced for the SETAR and LSTAR models for all the selected variables. Only the grid search plots of the SETAR model for average and maximum temperature, showing the 10 best-fitted hyperparameter combinations, are displayed here (Fig. 3).

Fig. 3 Hyperparameter selection through the grid search for average (left) and maximum (right) temperature for SETAR model
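The grid search over the delay can be illustrated with a small self-contained sketch. It is not the tsDyn routine: it assumes a two-regime SETAR with one lag per regime, fits each regime by ordinary least squares, and scores each (delay, threshold) pair with a Gaussian AIC of the form n·ln(RSS/n) + 2k. The simulated series, the parameter count k, and all function names are hypothetical.

```python
import math
import random


def ols_rss(pairs):
    """Residual sum of squares from regressing y on (1, x) by least squares."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in pairs)


def setar_aic(series, delay, threshold, start):
    """AIC of a two-regime SETAR with one lag per regime (assumed Gaussian form)."""
    low, high = [], []
    for t in range(start, len(series)):
        pair = (series[t - 1], series[t])
        (low if series[t - delay] <= threshold else high).append(pair)
    if len(low) < 3 or len(high) < 3:
        return math.inf  # too few points to fit one of the regimes
    rss = ols_rss(low) + ols_rss(high)
    n = len(low) + len(high)
    k = 5  # assumed count: two intercepts, two slopes, one threshold
    return n * math.log(rss / n) + 2 * k


def grid_search(series, max_delay, trim=0.15):
    """Return (aic, delay, threshold) with the lowest AIC over the grid."""
    ordered = sorted(series)
    lo, hi = int(trim * len(ordered)), int((1 - trim) * len(ordered))
    best = (math.inf, None, None)
    for delay in range(1, max_delay + 1):
        for threshold in ordered[lo:hi]:
            # start=max_delay keeps the same sample for every delay
            aic = setar_aic(series, delay, threshold, start=max_delay)
            if aic < best[0]:
                best = (aic, delay, threshold)
    return best


# Hypothetical simulated series whose regime depends on the value two steps back.
rng = random.Random(42)
x = [20.0, 20.0]
for _ in range(300):
    mean = 5.0 + 0.85 * x[-1] if x[-2] <= 25.0 else 30.0 - 0.3 * x[-1]
    x.append(mean + rng.gauss(0, 1))

best_aic, best_delay, best_threshold = grid_search(x, max_delay=4)
print(best_delay, round(best_threshold, 2), round(best_aic, 2))
```

Fixing the first usable observation at `max_delay` for every candidate keeps the AIC values comparable across delays, since each candidate is then scored on the same observations.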

After obtaining the hyperparameters, the next task is to find the best-fitted model among the applied models for each selected temperature variable. The models were therefore fitted with the selected hyperparameters and compared through the popular Akaike Information Criterion (AIC) and Mean Absolute Percentage Error (MAPE). Although the MAPE values of SETAR, Linear AR, and LSTAR look similar to three decimal places, the digits beyond the third decimal place still matter, as they identify the minimum in every case, along with the lowest AIC values. Among the models considered in this study, the LSTAR model was selected as the best-fitted model for the average temperature, with AIC and MAPE values of 6397.77 and 0.03635184, respectively. Correspondingly, for both the maximum and minimum temperatures, the LSTAR model was selected as the most appropriate model, as it has the lowest AIC and MAPE values (Table 2).

Table 2 Model selection criteria
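For reference, MAPE can be computed as in the sketch below. The function name and the sample numbers are hypothetical, and the result is returned as a fraction rather than a percentage; this section does not state which of the two scales the reported values use.

```python
def mape(observed, fitted):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    # Each term is the absolute error relative to the observed value,
    # so the measure is undefined when an observed value is zero.
    return sum(abs((o - f) / o) for o, f in zip(observed, fitted)) / len(observed)


# Hypothetical values: errors of 10 on 100 and 20 on 200 are both 10% errors.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

Because every error is scaled by the observed value, models that differ only slightly in fit can produce MAPE values that agree to several decimal places.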

Now, for the average temperature, the fitted LSTAR model for the low and high regimes, with residual variance \( \delta^{2}_{\varepsilon } = 1.443 \), can be written as

$$ x_{t} = (1.53 + 0.94x_{t - 1})[1 - G(x_{t - 8} \mid 0.5601, 27.46)] + (9.49 - 0.31x_{t - 1})[1 - G(x_{t - 8} \mid 0.5601, 27.46)] + \varepsilon_{t}, $$

where \( G(x_{t - 8} \mid 0.5601, 27.46) = (1 + \exp[0.5601(x_{t - 8} - 27.46)])^{-1} \) is the logistic function. The LSTAR model seems to fit two regimes well, with significant lag-1 terms in the lower and higher regimes, significant constants, smoothing parameter \( \gamma = 0.5601 \), and threshold value 27.46. The threshold value indicates that the average temperature has become higher relative to the last ten years’ average temperature of Dhaka. Similarly, for the maximum temperature, the LSTAR model with residual variance \( \delta^{2}_{\varepsilon } = 2.768 \) can be written as

$$ x_{t} = (1.32 + 0.72x_{t - 1})[1 - G(x_{t - 8} \mid 1.52, 30.73)] + (4.23 - 0.12x_{t - 1})[1 - G(x_{t - 8} \mid 1.52, 30.73)] + \varepsilon_{t}, $$

where \( G(x_{t - 8} \mid 1.52, 30.73) = (1 + \exp[1.52(x_{t - 8} - 30.73)])^{-1} \) is the logistic function, with significant lag orders for the lower and higher regimes and a significant constant, gamma \( (\gamma ) \), and threshold value. The threshold value of 30.73 for the maximum temperature, with a smoothing parameter of 1.52, indicates that the maximum temperature may have an upward tendency compared to the previous year.

For minimum temperature, the significant \( \gamma = 0.637 \) and threshold value 20.08, with the logistic function \( G(x_{t - 8} \mid 0.6374, 20.08) = (1 + \exp[0.6374(x_{t - 8} - 20.08)])^{-1} \), give the following fitted model for the significant lower and higher regimes:

$$ x_{t} = (0.88 + 0.82x_{t - 1})[1 - G(x_{t - 8} \mid 0.6374, 20.08)] + (0.70 - 0.35x_{t - 1})[1 - G(x_{t - 8} \mid 0.6374, 20.08)] + \varepsilon_{t}. $$
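The logistic weights in the three fitted models can be evaluated directly. The sketch below codes the function exactly as it is printed in this section, using the smoothing parameters and thresholds quoted in the fitted equations; the function and dictionary names are hypothetical. Many references write the exponent with a negative sign, which would mirror the curve around the threshold, so the direction of the weights here is an assumption tied to the printed form.

```python
import math


def transition(z, gamma, c):
    """Logistic function as printed in the text: (1 + exp[gamma * (z - c)])^-1."""
    return 1.0 / (1.0 + math.exp(gamma * (z - c)))


# (gamma, threshold) pairs quoted in the fitted equations of this section.
PARAMS = {
    "average": (0.5601, 27.46),
    "maximum": (1.52, 30.73),
    "minimum": (0.6374, 20.08),
}

for name, (gamma, c) in PARAMS.items():
    # Weights five degrees below, at, and five degrees above the threshold.
    weights = [round(transition(c + d, gamma, c), 3) for d in (-5, 0, 5)]
    print(name, weights)
```

At the threshold the weight is exactly one half, and a larger smoothing parameter makes the switch between regimes sharper, which is why the maximum-temperature model, with the largest smoothing parameter of the three, moves between regimes fastest.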

As mentioned before, the SETAR and LSTAR models use the F test during model fitting to test for nonlinearity. Moreover, the nonlinearity of the full-order LSTAR model was validated against the Linear AR model by the F test, and the results are presented in Table 3. The p value for every variable considered in this study is less than 0.001. Therefore, these results may confirm the presence of nonlinearity in the LSTAR model.

Table 3 Nonlinearity test value of full-order LSTAR model against full-order AR model

Finally, 20-step-ahead forecasts were plotted for every selected temperature variable with the selected model and compared with all the applied models, along with the observed values, to gain more insight into the model fitting. The downward intensity compared to the observed values for average temperature confirms that the average temperature of Dhaka will increase eventually. It is noticeable that all the applied models, except ANNs, seem to give similar forecasts. Likewise, the maximum temperature forecast from the LSTAR model shows a similar upward increase compared to the observed values of the maximum temperature. It is also worth mentioning that the LSTAR forecasts are slightly higher, or seem to have an upward intensity, compared to the Linear AR and SETAR models for both average and minimum temperature, which may indicate that the temperatures will intensify sooner or later. Moreover, in the case of minimum temperature, the downward trend indicates a decline of the minimum temperature, with a tendency towards an upward trend in the long run (Fig. 4).

Fig. 4 Forecast comparison of average, maximum, and minimum temperature of Dhaka
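Multi-step forecasts of this kind are commonly produced by iterating a one-step rule, feeding each forecast back in as the newest observation. The sketch below shows that recursion for a generic one-step function. Whether the study used this scheme or another one is not stated in this section, and the AR(1) rule with its coefficients is a hypothetical placeholder rather than one of the fitted models.

```python
def iterate_forecast(step, history, horizon):
    """Produce `horizon` forecasts by feeding each one-step forecast back in."""
    path = list(history)
    for _ in range(horizon):
        path.append(step(path))
    return path[len(history):]


def ar1_step(path, const=10.0, phi=0.5):
    """Hypothetical AR(1) one-step rule; const and phi are placeholder values."""
    return const + phi * path[-1]


# Starting below the long-run mean const / (1 - phi) = 20, the forecasts rise towards it.
forecasts = iterate_forecast(ar1_step, history=[0.0], horizon=20)
print([round(v, 3) for v in forecasts[:3]])
```

Any one-step rule, linear or regime-switching, can be passed as `step`. For nonlinear rules the iterated path ignores future noise, so it is only an approximation to the conditional mean at longer horizons.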

Another noticeable point is that, although the nonlinear SETAR and Linear AR models have greater AIC and MAPE values, they still give forecasts almost similar to those of LSTAR. The authors also investigated this issue by changing all the associated parameters of the applied models, considering several combinations of theoretical delay, lag values, and others, but almost every case shows a similar result. Hence, the authors retain the LSTAR models as the best fitted, since statistical modelling has a particular rule of selecting the model through the model selection criteria, and believe that this issue could be reconsidered in a further study to obtain better weather forecasts in the nonlinear world. However, another supposition can be made: for temperature data, a logistic smooth transition autoregressive (LSTAR) model gives a better-fitted model, but ANNs give a more realistic forecast.

Summary and conclusion

Most researchers deal with linear models because of their simplicity and reduced computational complexity. However, nonlinear data are often subjected to many transformations to ensure linearity, and the main question is how unevenly transformed information can give an accurate prediction of the real scenario. Moreover, nonlinear models have become popular, perhaps because of the computational advances of the last decades, although they have a long history of development and application. The main objective of this study was to introduce a regime-switching nonlinear threshold time-series approach for weather variables. To illustrate the application of nonlinear time series, the presence of nonlinearity was tested with the BDS, McLeod-Li and Engle, Keenan and RESET, and F tests, all of which lead to the same conclusion, i.e. the minimum, maximum, and average temperatures all exhibit nonlinearity. In addition, the test of the full-order LSTAR model against the full-order linear AR model, carried out with the F test, also validates the existence of nonlinearity.

The conventional AR model was considered among the linear models, as AR is the mathematical basis of many linear models. The univariate parametric SETAR and LSTAR models were considered among the many regime-switching nonlinear threshold models to compare and forecast the temperatures of Dhaka, Bangladesh. A basic feed-forward neural network model with a single hidden layer was considered among the nonparametric nonlinear time-series algorithms, on the assumption that if there is any evidence of better forecasting, further study can be carried out with more advanced models. After estimating the parameters of the parametric models, including the theoretical delay, the maximum and minimum lag orders, and the threshold value, the parametric models were applied for contrast and comparison.

The models were compared with the help of the model selection criteria AIC and MAPE, which identify the LSTAR model as the best-fitted model, as it attains the minimum values of AIC and MAPE for all selected variables. The model can be written in the following mathematical form for average, maximum, and minimum temperatures, respectively:

$$ x_{t} = (1.53 + 0.94x_{t - 1})[1 - G(x_{t - 8} \mid 0.5601, 27.46)] + (9.49 - 0.31x_{t - 1})[1 - G(x_{t - 8} \mid 0.5601, 27.46)] + \varepsilon_{t}, $$

$$ x_{t} = (1.32 + 0.72x_{t - 1})[1 - G(x_{t - 8} \mid 1.52, 30.73)] + (4.23 - 0.12x_{t - 1})[1 - G(x_{t - 8} \mid 1.52, 30.73)] + \varepsilon_{t}, $$
and
$$ x_{t} = (0.88 + 0.82x_{t - 1})[1 - G(x_{t - 8} \mid 0.6374, 20.08)] + (0.70 - 0.35x_{t - 1})[1 - G(x_{t - 8} \mid 0.6374, 20.08)] + \varepsilon_{t}, $$

with the logistic functions \( G(x_{t - 8} \mid 0.5601, 27.46) = (1 + \exp[0.5601(x_{t - 8} - 27.46)])^{-1} \); \( G(x_{t - 8} \mid 1.52, 30.73) = (1 + \exp[1.52(x_{t - 8} - 30.73)])^{-1} \); and \( G(x_{t - 8} \mid 0.6374, 20.08) = (1 + \exp[0.6374(x_{t - 8} - 20.08)])^{-1} \), respectively.

Finally, 20-step-ahead forecasts of the temperatures were produced with the LSTAR model and compared with the observed values and with the SETAR and Linear AR forecasts to check the forecasting accuracy.

Although the values of the model selection criteria favour LSTAR as the appropriate model, the forecasting comparison shows that the ANNs forecasting plot is more realistic compared to the observed values. This study also concludes that the parametric nonlinear LSTAR model gives a better-fitted model, but ANNs give a more realistic forecast for the temperatures of Bangladesh. However, the forecast from the fitted model shows that the average and maximum temperatures will increase steadily, and the minimum temperature will decrease eventually. There is scope for further study including advanced ANN algorithms, extended versions of linear AR models, and other parametric as well as nonparametric nonlinear time-series models.