1 Introduction

Weather forecasting is the application of science and technology to predict the state of the atmosphere for a given location, and it plays a vital role in daily life. The more accurate the forecast, the lower the risks people face. One of the most important parts of weather forecasting is weather nowcasting [22]. Weather nowcasting combines a description of the current state of the atmosphere with a short-term forecast of how the atmosphere will evolve during the next several hours [7]. According to [9], small features such as rainfall, clouds and individual storms can be forecast with reasonable accuracy in this time range. The latest radar, satellite and observational data are used to analyze the small-scale features present in a small area such as a city and to make an accurate forecast for the following few hours. For regions outside radar coverage, however, satellite observations are the appropriate choice [8, 16].

Several methods have been widely used to forecast weather from satellite image observations, namely [6, 10, 17] and [18]. Specifically, [6] used multi-channel correlation-relaxation labeling to analyze cloud motion. Melgani [10] performed contextual reconstruction of cloud-contaminated multi-temporal and multi-spectral images. Shukla & Pal [17] proposed an approach to study the evolution of convective cells. Shukla et al. [18] proposed a method for predicting satellite image sequences that combines the spatiotemporal regression (STAR) model with fuzzy clustering (Fuzzy C-Means, FCM) to increase the forecast accuracy. Although this technique may give better prediction accuracy than those of [6, 10] and [17], the forecasting output was not good enough because of the limitations of fuzzy sets in handling hesitation and vagueness. Park and Lee [14] presented an approach using fuzzy reasoning and ensemble methods to forecast red tides. In this approach, fuzzy reasoning is a prediction method that derives an approximate proposition from vague information and knowledge based on a fuzzy model. The ensemble method was then employed to help improve the accuracy of the resulting classifier and predictor. Nadig et al. [12] compared individual and combined artificial neural network (ANN) models for the prediction of air and dew point temperature. The model was developed with the Ward-style network architecture [23], a three-layered neural network with input, hidden and output layers. Although prediction based on ANN could result in better accuracy, it is hindered by the number of parameters that need to be determined, such as the activation functions, the number of nodes in the hidden layer, and the distribution of nodes between the slabs of the Ward-style model.

Recently, a generalized fuzzy set called the picture fuzzy set (PFS) has been proposed in [4]. It generalizes the fuzzy set (FS) of [24] and the intuitionistic fuzzy set (IFS) of [2] by introducing the positive, the negative, the neutral and the refusal degrees, which express the various possibilities of an element belonging to a given set. PFS has a variety of applications in real contexts such as confidence voting and personnel selection. Deploying fuzzy rule-based systems and soft computing methods on PFS would result in better accuracy [19]. Preliminary research on fuzzy clustering methods on PFS, or picture fuzzy clustering (PFC), in [19, 21] has clearly demonstrated the usefulness of PFS in modeling and its performance improvement over traditional fuzzy tools. Thus, our objective in this research is to design hybrid PFC methods for the weather nowcasting problem in order to achieve better accuracy.

In this paper, we propose two novel hybrid forecast methods based on picture fuzzy clustering for weather nowcasting. The contributions of this paper are:

  a) The first method, named PFC-STAR, combines picture fuzzy clustering with spatiotemporal regression. It consists of three steps. Firstly, PFC is used to partition the training sample into clusters. Secondly, all elements of these clusters are labeled and a Discrete Fourier Transform (DFT)-based filter is applied to remove nonpredictable scales, which extends the time range of predictability. Finally, the STAR technique is used to predict the output image in two steps: determining the weights and then training them. The training process corrects the weights so that they adapt better to the output. STAR is thus used twice in this method to produce more accurate results.

  b) The second method, named PFC-PFR, integrates picture fuzzy clustering with picture fuzzy rules. The picture fuzzy rule technique (PFR), proposed by [20], is a method for short-term prediction from a sequence of previous data. Combining this technique with PFC may result in better prediction accuracy. In PFC-PFR, fuzzy rules are used for the prediction step. Moreover, in order to forecast the output images more accurately, the parameters of the defuzzification function in this method are trained with the particle swarm optimization algorithm [5]. The picture fuzzy rules are then defuzzified with the trained parameters of the defuzzification function to obtain better predicted images.

  c) An experimental evaluation on satellite image sequences of Southeast Asia is performed to validate the accuracy of the methods. The experimental results indicate that the proposed methods are better than the relevant ones in weather nowcasting, particularly in the accuracy of rain-rate retrieval.

The rest of the paper is organized as follows. Section 2 presents the state-of-the-art method for weather nowcasting, namely FCM-STAR of [18]. Section 3 describes the proposed approaches, and Section 4 validates those methods on satellite image sequences of Southeast Asia. Finally, conclusions and future work are covered in Section 5.

2 FCM-STAR

In this section, we introduce the FCM-STAR method of [18] including two standalone algorithms: Fuzzy C-Means (FCM) and Spatiotemporal Regression (STAR) that will be presented in sub-sections accordingly.

2.1 Fuzzy C-Means

Fuzzy C-Means (FCM), proposed by [3], is based on the iteration process to optimize the membership matrix and the cluster centers.

$$ J=\sum\limits_{k=1}^{N} \sum\limits_{j=1}^{C} u_{kj}^{m} \left\| {X_{k} -V_{j}} \right\|^{2}\to \min , $$
(1)
$$ \left\{\begin{array}{c} {u_{kj} \in [0,1]} \\ {\sum\limits_{j=1}^{C} {u_{kj}} =1} \\ \\ {k=1,...N;j=1,..,C} \end{array}\right.. $$
(2)

The objective function of FCM is defined in (1) and (2), where

  • m is the fuzzifier;

  • C is the number of clusters;

  • N is the number of data elements;

  • r is the dimensionality of the data;

  • \(u_{kj}\) is the membership degree of data element \(X_{k}\) to cluster j;

  • \(X_{k} \in R^{r}\) is the k-th element of \(X=\left \{ {X_{1} ,X_{2} ,...,X_{N}} \right \}\);

  • \(V_{j}\) is the center of cluster j.

Using the Lagrange multiplier method, the cluster centers and the membership matrix are determined in (3) and (4), respectively.

$$ V_{j}=\frac{\sum\limits_{k=1}^{N}u_{kj}^{m}X_{k}}{\sum\limits_{k=1}^{N}u_{kj}^{m}}, $$
(3)
$$ u_{kj}=\frac{1}{\sum\limits_{i=1}^{C}\left( \frac{\|X_{k}-V_{j}\|}{\|X_{k}-V_{i}\|}\right)^{\frac{2}{m-1}}}. $$
(4)

Descriptions of the FCM method are shown in Table 1.

Table 1 Fuzzy C-Means algorithm
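The iteration summarized in Table 1 can be illustrated with the following minimal sketch. It alternates the standard FCM center update (cf. (3)) and membership update (cf. (4), written here with Euclidean distances and the exponent 2/(m-1)). The function name `fcm`, the random initialization, the stopping tolerance and the small distance floor are assumptions made for illustration and are not taken from [3] or [18].

```python
import numpy as np

def fcm(X, C, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy C-Means on data X of shape (N, r); returns (U, V)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random membership matrix whose rows sum to 1 (constraint (2)).
    U = rng.random((N, C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the data.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances ||X_k - V_j||, floored to avoid division by zero.
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.maximum(D, 1e-12)
        # u_kj = 1 / sum_i (D_kj / D_ki)^(2/(m-1)).
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

For image data, each pixel (or pixel feature vector) would be one row of `X`, and `U.argmax(axis=1)` gives a hard cluster label per pixel.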

2.2 STAR

Pfeifer and Deutsch [15] proposed a linear spatio-temporal autoregressive model, a 3D version of the regular autoregressive (AR) model, which enables users to forecast a spatio-temporal series from the information of its own past in space and time. The intensity of every pixel \(P\left ({x,y,t} \right )\) in an image domain can be modeled as a function \(\psi\) of the intensities of neighboring pixels in space and time, under the assumption of causality.

$$ P(x,y,t)=\psi(P(x+{\Delta} x_{i},y+{\Delta} y_{i},t+{\Delta} t_{i})), $$
(5)

where \(\left ({\Delta x_{i} ,{\Delta } y_{i} ,{\Delta } t_{i}} \right )\) is the neighborhood coverage in space and time with \(\Delta t_{i} <0\). Equation (5) is modified for clustering-based Spatiotemporal Regression (STAR) by [18] as follows.

$$ P(x,y,t)=\sum\limits_{k=t-T}^{t-1}\sum\limits_{j=y-J}^{y+J}\sum\limits_{i=x-I}^{x+I}W_{i,j,k}^{G}P(i,j,k), $$
(6)

where 2I+1, 2J+1 and T are the total numbers of rows, columns and frames, respectively, included in the predictor set (neighborhood), \(W_{i,j,k}^{G} \) is the corresponding weight, whose superscript G indicates that the weight belongs to the G-th cluster, \(x=\overline {0,H} \) and \(y=\overline {0,K} \). The weights are calculated by minimizing the function

$$ \left\|P(x,y,t)-\sum\limits_{k=t-T}^{t-1}\sum\limits_{j=y-J}^{y+J}\sum\limits_{i=x-I}^{x+I}W_{i,j,k}^{G}P(i,j,k)\right\|\to \min, $$
(7)

where \(\left \| . \right \|\) denotes the Euclidean norm. Equation (7) is solved by the least squares method using QR factorization [13]. After the weights are found, all pixels in the predicted image are calculated by (6).
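As an illustration of how the weights in (6) and (7) could be estimated, the sketch below stacks one row of neighborhood intensities per interior pixel and solves one least squares problem per cluster. It is a simplified sketch under stated assumptions: the helper names `star_design` and `fit_star_weights` are hypothetical, border pixels are skipped, and `numpy.linalg.lstsq` (which is SVD-based) is used in place of the explicit QR factorization mentioned in the text.

```python
import numpy as np

def star_design(frames, I=1, J=1):
    """Build predictor rows and targets from frames of shape (T+1, H, K).

    The first T frames are the predictors and the last frame is the target.
    Only interior pixels, whose full neighbourhood exists, are used.
    """
    past, target = frames[:-1], frames[-1]
    H, K = target.shape
    rows, targets, positions = [], [], []
    for x in range(I, H - I):
        for y in range(J, K - J):
            patch = past[:, x - I:x + I + 1, y - J:y + J + 1]
            rows.append(patch.ravel())
            targets.append(target[x, y])
            positions.append((x, y))
    return np.array(rows), np.array(targets), positions

def fit_star_weights(frames, labels, I=1, J=1):
    """Least-squares weights W^G per cluster G; labels has shape (H, K)."""
    A, b, positions = star_design(frames, I, J)
    pixel_labels = np.array([labels[p] for p in positions])
    weights = {}
    for g in np.unique(pixel_labels):
        mask = pixel_labels == g
        # Minimise ||b - A w|| over the pixels of cluster g.
        w, *_ = np.linalg.lstsq(A[mask], b[mask], rcond=None)
        weights[int(g)] = w
    return weights
```

Each weight vector has T(2I+1)(2J+1) entries, so every cluster needs at least that many pixels for the problem to be well determined.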

3 The proposed hybrid forecast methods

In this section, we first recall the picture fuzzy clustering algorithm (PFC) [19, 21] and then present two novel hybrid forecast methods for weather nowcasting, namely PFC-STAR and PFC-PFR.

3.1 Picture fuzzy clustering

Suppose that we have a dataset X consisting of N data points in d dimensions. The algorithm divides the dataset into C groups by minimizing the objective function below.

$$\begin{array}{@{}rcl@{}} J&=&\sum\limits_{k=1}^{N} \sum\limits_{j=1}^{C} {\left( {u_{kj} \left( {2-\xi_{kj}} \right)} \right)^{m}\left\| {X_{k} -V_{j}} \right\|^{2}}\\&& +\sum\limits_{k=1}^{N} {\sum\limits_{j=1}^{C} {\eta_{kj} \left( {\log \eta_{kj} +\xi_{kj}} \right)}} \to \min , \end{array} $$
(8)

Some constraints are defined as follows.

$$ u_{kj} ,\eta_{kj} ,\xi_{kj} \in \left[ {0,1} \right], $$
(9)
$$ u_{kj} +\eta_{kj} +\xi_{kj} \le 1, $$
(10)
$$ \sum\limits_{j=1}^{C} {\left( {u_{kj} \left( {2-\xi_{kj}} \right)} \right)} =1, $$
(11)
$$ \sum\limits_{j=1}^{C} {\left( {\eta_{kj} +\frac{\xi_{kj}} {C}} \right)} =1, \,\, k=1,...N;j=1,..,C. $$
(12)

Using the Lagrange multiplier method, optimal solutions of the systems are:

$$ V_{j} =\frac{\sum\limits_{k=1}^{N} {\left( {u_{kj} \left( {2-\xi_{kj}} \right)} \right)^{m}} X_{k}} {\sum\limits_{k=1}^{N} {\left( {u_{kj} \left( {2-\xi_{kj}} \right)} \right)^{m}}} , \,\, j=1,..,C, $$
(13)
$$ u_{kj} \,=\,\frac{1}{\sum\limits_{i=1}^{C} {\left( {2\,-\,\xi_{kj}} \right)\left( {\frac{\left\| {X_{k} \,-\,V_{j}} \right\|}{\left\| {X_{k} \,-\,V_{i}} \right\|}} \right)^{\frac{2}{m-1}}}} , \,\, k\,=\,1,...N;j\,=\,1,..,C, $$
(14)
$$ \eta_{kj} \,=\,\frac{e^{-\xi_{kj}} }{\sum\limits_{i=1}^{C} {e^{-\xi_{ki}} }} \left( {1\,-\,\frac{1}{C}\sum\limits_{i=1}^{C} {\xi_{ki}} } \right), \,\, k=1,...N;j=1,..,C, $$
(15)
$$\begin{array}{@{}rcl@{}} \xi_{kj} &=&1-\left( {u_{kj} +\eta_{kj}} \right)-\left( {1-\left( {u_{kj} +\eta_{kj}} \right)^{\alpha} } \right)^{\frac{1}{\alpha} }, \,\,\\ k&=&1,...N;j=1,..,C. \end{array} $$
(16)

Descriptions of the PFC method are shown in Table 2.

Table 2 Picture Fuzzy Clustering
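One pass of the update formulas (13)-(16) can be sketched as follows. This is an illustrative sketch only: the function name `pfc_step`, the order in which the quantities are updated, the default value of the exponent `alpha`, and the clipping of \(u_{kj}+\eta_{kj}\) to [0, 1] before applying (16) are assumptions, not details taken from [19, 21].

```python
import numpy as np

def pfc_step(X, u, xi, m=2.0, alpha=0.6):
    """One update of centers V and degrees (u, eta, xi) following (13)-(16).

    X: (N, d) data; u, xi: (N, C) positive and refusal degrees.
    """
    C = u.shape[1]
    # (13): centers weighted by (u (2 - xi))^m.
    w = (u * (2.0 - xi)) ** m
    V = (w.T @ X) / w.sum(axis=0)[:, None]
    # Distances ||X_k - V_j||, floored to avoid division by zero.
    D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
    D = np.maximum(D, 1e-12)
    # (14): (2 - xi_kj) does not depend on i, so it factors out of the sum.
    ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
    u_new = 1.0 / ((2.0 - xi) * ratio.sum(axis=2))
    # (15): neutral degree from the current refusal degrees.
    e = np.exp(-xi)
    eta_new = e / e.sum(axis=1, keepdims=True) * (
        1.0 - xi.sum(axis=1, keepdims=True) / C)
    # (16): refusal degree; the clipping is a safeguard added in this sketch.
    s = np.clip(u_new + eta_new, 0.0, 1.0)
    xi_new = 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)
    return V, u_new, eta_new, xi_new
```

With the refusal degrees held at their current values, the updated `u_new` satisfies constraint (11) and `eta_new` satisfies constraint (12), which is a convenient sanity check when experimenting with the formulas.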

3.2 PFC-STAR

Fig. 1 illustrates the activities of the PFC-STAR method. The input image sequences of the PFC-STAR algorithm are first processed by the PFC algorithm and a Discrete Fourier Transform (DFT)-based filter. PFC is used to partition all pixels in these images into clusters and to mark them with different colors. The DFT filter is employed to remove nonpredictable scales from the images and to transform them into the Fourier domain. Secondly, the STAR technique is applied to predict the resulting image from all images in the Fourier domain and the clusters of pixels. It aims to find the weights for each cluster of pixels in each image in order to determine the predicted image. Finally, before the final predicted image is produced, noise such as salt-and-pepper noise is removed by the Adaptive Median Filtering method.

Fig. 1 The PFC-STAR algorithm

3.2.1 Training method

In order to enhance the accuracy of the predicted images, the STAR technique is used twice: once for determining the weights and once for training them. The first (N−2) images are used as the input of STAR to calculate the weights, and the images from \(T_{2}\) to \(T_{N-1}\) are used to train them. For example, in Fig. 2, suppose there are four images in a sequence. The first two images are used to calculate the weights that give the center pixels of image \(T_{3}\), and then images \(T_{2}\) and \(T_{3}\) are used to train the weights to obtain the center pixels of image \(T_{4}\). Consequently, the weights for predicting images can be calculated by (17) as follows.

$$ \left\{ \begin{array}{l} P\left( x,y,t-1\right)=\sum\limits_{k=t-T}^{t-2} {\sum\limits_{j=y-J}^{y+J} {\sum\limits_{i=x-I}^{x+I} {W_{i,j,k}^{G} P\left( {i,j,k} \right)}} } \\ P\left( {x,y,t} \right)=\sum\limits_{k=t-T+1}^{t-1} {\sum\limits_{j=y-J}^{y+J} {\sum\limits_{i=x-I}^{x+I} {W_{i,j,k}^{G} P\left( {i,j,k} \right)}} } \end{array} \right., $$
(17)

where 2I+1, 2J+1 and T are the total numbers of rows, columns and frames, respectively, included in the predictor set (neighborhood), \(W_{i,j,k}^{G} \) is the corresponding weight, whose superscript G indicates that the weight belongs to the G-th cluster, \(x=\overline {0,H} \) and \(y=\overline {0,K} \).

Fig. 2 Example of calculating and training the weights

The Discrete Fourier Transform (DFT) is the sampled Fourier Transform and therefore does not contain all frequencies forming an image, but only a set of samples that is large enough to fully describe the spatial domain image. The number of frequencies corresponds to the number of pixels in the spatial domain image, i.e. the images in the spatial and Fourier domains are of the same size. For an image of size H×K, the two-dimensional DFT is given by (18).

$$ P\left( {x,y,t} \right)=\sum\limits_{h=0}^{H-1} {\sum\limits_{k=0}^{K-1} {p\left( {h,k,t} \right)e^{-\mathrm{i}2\pi \left( {\frac{hx}{H}+\frac{ky}{K}} \right)}}} , $$
(18)

where \(p\left ({h,k,t} \right )\) is the image in the spatial domain at time t, \(\mathrm{i}\) is the imaginary unit, and the exponential term is the basis function corresponding to each point \(P\left ({x,y,t} \right )\) in the Fourier space. The inverse DFT from Fourier space into the spatial domain is described in (19).

$$ p\left( {x,y,t} \right)=\frac{1}{HK}\sum\limits_{h=0}^{H-1} {\sum\limits_{k=0}^{K-1} {P\left( {h,k,t} \right)e^{\mathrm{i}2\pi \left( {\frac{hx}{H}+\frac{ky}{K}} \right)}}} . $$
(19)
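The transform pair above matches the conventions of `numpy.fft.fft2` and `numpy.fft.ifft2` (forward transform unnormalized, inverse scaled by 1/(HK)), so a DFT-based filter can be sketched in a few lines. The text does not specify which frequencies count as nonpredictable scales; the square low-pass mask and the `keep` parameter below are assumptions chosen only to show the mechanics.

```python
import numpy as np

def dft_lowpass(p, keep=0.25):
    """Zero out high spatial frequencies of a real image p of shape (H, K).

    keep is the retained fraction of the frequency band along each axis
    (keep=1.0 leaves the image unchanged).
    """
    P = np.fft.fft2(p)                           # forward DFT
    fx = np.fft.fftfreq(p.shape[0])[:, None]     # cycles per pixel, [-0.5, 0.5)
    fy = np.fft.fftfreq(p.shape[1])[None, :]
    mask = (np.abs(fx) <= keep / 2) & (np.abs(fy) <= keep / 2)
    # The mask is symmetric in frequency, so the inverse is real up to
    # rounding error; the tiny imaginary part is dropped.
    return np.fft.ifft2(P * mask).real
```

Smaller values of `keep` discard finer spatial detail, which is the general idea behind removing small scales that are hard to predict.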

The predicted image may contain noise and out-of-bound pixels. The noise is removed by using the Adaptive Median Filtering method [1]. This method performs spatial processing to preserve detail and to smooth non-impulsive noise. A prime benefit of this adaptive approach to median filtering is that its adaptive windows do not erode edges or other small structures in the image. Additionally, out-of-bound pixels are normalized by (20).

$$ P\left( {x,y,t} \right)=\sum\limits_{k=t-T}^{t-1} {\sum\limits_{j=1}^{C} {\mu_{j}^{(k)} \left( {2-\xi_{j}^{(k)}} \right)V_{j}^{(k)}} } $$
(20)

Note that the components of this equation are calculated from the PFC model.

3.2.2 Remarks

Firstly, the PFC-STAR algorithm uses FC-PFS instead of other fuzzy clustering methods, so that pixels are partitioned into clusters of high clustering quality, which is useful for the prediction process. Secondly, the proposed algorithm combines the two processes of determining and training the weights for forecasting images, and can therefore produce more accurate predicted images. The weights are adapted using the training images, which makes them fit the subsequent images better.

However, PFC-STAR still has some limitations. Firstly, the use of STAR technique may lead to over-fitted outputs in the sense that the algorithm may be good with this dataset but bad for others. Secondly, PFC-STAR utilizes DFT transformation for preprocessing data and DFT inversion for generating outputs. Those procedures are time-consuming. Finally, STAR technique is employed twice to solve the equations of determining and training the weights, and this takes the algorithm more time to run.

3.3 PFC-PFR

The proposed algorithm is described in Fig. 3. The input image sequences of the PFC-PFR algorithm are first preprocessed by calculating the difference matrices of pixels between sequential images. These matrices are then processed by the AFC-PFS algorithm, which partitions them into an appropriate number of clusters in order to generate picture fuzzy rules. The parameters of the defuzzification function in this method are trained with the particle swarm optimization algorithm [5]. The picture fuzzy rules are then defuzzified with the trained parameters of the defuzzification function to obtain better predicted images.

Fig. 3 PFC-PFR schema

The picture fuzzy rule [20] was developed from the fuzzy rule, an IF-THEN rule involving linguistic terms proposed by [24]. The triangular picture fuzzy number (TPFN) for a picture fuzzy rule is described by five real numbers \(\left ({{a}^{\prime },a,b,c,{c}^{\prime }} \right )\) with \(\left ({{a}^{\prime }\le a\le b\le c\le {c}^{\prime }} \right )\) and two triangular functions, shown in (21) and (22) and in Fig. 4.

$$ u=\left\{ {\begin{array}{l} \frac{x-a}{b-a},\,\,\mathit{for}\,a\le x\le b \\ \frac{c-x}{c-b},\,\,\mathit{for}\,b\le x\le c \\ 0,\,\,\mathit{otherwise} \\ \end{array}} \right., $$
(21)
$$ \eta +\xi =\left\{ {\begin{array}{l} \frac{b-x}{b-{a}^{\prime}},\,\,\mathit{for}\,{a}^{\prime}\le x\le b \\ \frac{x-b}{{c}^{\prime}-b},\,\,\mathit{for}\,b\le x\le {c}^{\prime} \\ 1,\,\,\mathit{otherwise} \\ \end{array}} \right.. $$
(22)
Fig. 4 A triangular picture fuzzy number A

Integrating (21) and (22) with (16) and denoting L = η + ξ, the values of the neutral membership and the refusal degree are calculated by (23) and (24).

$$ \eta =\left( {1-\left( {1-u-L} \right)^{\alpha} } \right)^{\frac{1}{\alpha} }-u, $$
(23)
$$ \xi =1-\left( {u+\eta} \right)-\left( {1-\left( {u+\eta} \right)^{\alpha} } \right)^{\frac{1}{\alpha} }. $$
(24)
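Formulas (23) and (24) split the combined degree L = η + ξ back into its two parts for a given positive degree u. The short sketch below evaluates them and can be used to confirm numerically that the two results add up to L again. The function name, the default `alpha` and the two clamps against negative bases are assumptions of this sketch, which does no range checking on the results.

```python
def neutral_and_refusal(u, L, alpha=0.6):
    """Return (eta, xi) from (23) and (24) for positive degree u and L = eta + xi."""
    # Clamp guards against u + L slightly exceeding 1 through rounding.
    base = max(0.0, 1.0 - u - L)
    eta = (1.0 - base ** alpha) ** (1.0 / alpha) - u           # (23)
    s = u + eta
    # Clamp keeps the base of the fractional power non-negative.
    xi = 1.0 - s - max(0.0, 1.0 - s ** alpha) ** (1.0 / alpha)  # (24)
    return eta, xi
```

For `alpha = 1` the formulas reduce to η = L and ξ = 0, i.e. the whole of L is assigned to the neutral degree.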

\(DEF(A)\) is the defuzzified value of TPFN A (Fig. 4) and is calculated by (25).

$$ DEF(A )=\frac{{a}^{\prime}+2a+3b+2c+{c}^{\prime}}{9}. $$
(25)

The closest fuzzy rules with respect to the input observation are utilized to produce an interpolated conclusion for sparse fuzzy rule-based systems. The following picture fuzzy rules interpolation scheme illustrates that:

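Rule 1: If \(x_{1}=A_{1,1}\) and \(x_{2}=A_{2,1}\) and … and \(x_{d}=A_{d,1}\) Then \(y=B_{1}\)

…

Rule q: If \(x_{1}=A_{1,q}\) and \(x_{2}=A_{2,q}\) and … and \(x_{d}=A_{d,q}\) Then \(y=B_{q}\)

Observation: \(x_{1}=A_{1}^{\ast}\) and \(x_{2}=A_{2}^{\ast}\) and … and \(x_{d}=A_{d}^{\ast}\)

Conclusion: \(y=B^{\ast}\)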

where Rule j is the j-th fuzzy rule in the sparse fuzzy rule base, \(x_{k}\) denotes the k-th antecedent variable, y denotes the consequence variable, \(A_{k,j}\) denotes the k-th antecedent fuzzy set of Rule j, \(B_{j}\) denotes the consequence fuzzy set of Rule j, \(A_{k}^{\ast } \) denotes the k-th observation fuzzy set for the k-th antecedent variable \(x_{k}\), \(B^{\ast}\) denotes the interpolated consequence fuzzy set, d is the number of variables appearing in the antecedents of the fuzzy rules, q is the number of fuzzy rules, k=1,..,d, and j=1,..,q.

Suppose that we have a dataset with d input time series \(\left \{ {T_{1} \left (t \right ),T_{2} \left (t \right ),...,T_{d} \left (t \right )} \right \}\) and one output time series \(M\left (t \right )\), t=0,..,N. The proposed PFC-PFR method is summarized in Fig. 5.

Fig. 5 Steps in PFC-PFR algorithm

Step 1: Each element of the difference matrix is calculated by (26) as the variation rate \(R_{k} \left (i \right )\), i=1,..,N, of the k-th input time series \(T_{k} \left (i \right )\) at time i, where k=1,..,d.

$$ R_{k} \left( i \right)=\frac{T_{k} \left( i \right)-T_{k} \left( {i-1} \right)}{T_{k} \left( {i-1} \right)}\times 100~\% . $$
(26)

The variation rates \(\left \{ {R_{1} \left (i \right ),R_{2} \left (i \right ),...,R_{d} \left (i \right )} \right \}\) of the input time series \(\left \{ {T_{1} \left (t \right ),T_{2} \left (t \right ),...,T_{d} \left (t \right )} \right \}\), t=0,..,N, at time i are determined by (26). N training samples \(\left \{ {X_{1} ,X_{2} ,...,X_{N}} \right \}\) are constructed, where \(X_{i}\) is represented by \(\left \{ {R_{1} \left (i \right ),R_{2} \left (i \right ),...,R_{d} \left (i \right ),R_{0} \left (i \right )} \right \}\), i=1,..,N. Denote \(X_{i} =\left \{ {I_{i}^{(1)} ,I_{i}^{(2)},...,I_{i}^{(d)} ,O_{i}} \right \}=\left \{ {R_{1} \left (i \right ),R_{2} \left (i \right ),...,R_{d} \left (i \right ),R_{0} \left (i \right )} \right \}\), where \(I_{i}^{(k)} \) (\(O_{i}\)) is the k-th input (the output) of \(X_{i}\), k=1,..,d.
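The variation rate in (26) is a relative change between consecutive time points expressed in percent. A minimal sketch is given below; the function name `variation_rates` is hypothetical, and the handling of zero-valued previous samples (which make the ratio undefined) is left to the caller.

```python
import numpy as np

def variation_rates(series):
    """Percent change R(i) = (T(i) - T(i-1)) / T(i-1) * 100 for i = 1..N."""
    T = np.asarray(series, dtype=float)
    return (T[1:] - T[:-1]) / T[:-1] * 100.0
```

A series of N+1 values yields N rates, which matches the N training samples constructed in Step 1.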

Step 2: The proposed AFC-PFS algorithm presented above is used to partition the training sample into an appropriate number of clusters (C) \(\left \{ {P_{1} ,P_{2} ,...,P_{C}} \right \}\). The center \(V_{j}\) of cluster \(P_{j}\), the positive degree \(u_{ij}\), the neutral degree \(\eta_{ij}\) and the refusal degree \(\xi_{ij}\) of \(X_{i}\) are calculated by (13)-(16), j=1,..,C, i=1,..,N.

Step 3: The picture fuzzy rules using TPFN are constructed based on the clusters \(\left \{ {P_{1} ,P_{2} ,...,P_{C}} \right \}\), where Rule j corresponds to \(P_{j}\), as follows.

Rule j: If \(x_{1}=A_{1,j}\) and \(x_{2}=A_{2,j}\) and … and \(x_{d}=A_{d,j}\) Then \(y=B_{j}\)

where Rule j is the fuzzy rule corresponding to the cluster \(P_{j}\), \(x_{k}\) is the k-th antecedent variable, \(A_{k,j}\) is the k-th antecedent fuzzy set of Rule j, y is the consequence variable, \(B_{j}\) is the consequence fuzzy set of Rule j, j=1,..,C, k=1,..,d, and the real numbers \(\left ({{a}^{\prime },a,b,c,{c}^{\prime }} \right )\) of TPFN \(A_{k,j}\) are calculated by (27)-(31) with \(U_{ij} =\frac {u_{ij} +\eta _{ij}} {\left ({1+\xi _{ij}} \right )}\).

Besides the positive degree, the neutral and refusal degrees also play an important role in determining the appropriate boundary points that represent the rules. A large value of \(U_{ij}\) indicates a large range of possibility \(\left ({u_{ij} +\eta _{ij}} \right )\) and a small refusal term \(\left ({1+\xi _{ij}} \right )\). This means that a “good” point not only has high positive and neutral degrees but also has a small refusal degree. Therefore, FC-PFS gives more information and guides PFR to choose more appropriate points, and hence to make better rules, than other clustering algorithms such as FCM.

$$ {a}^{\prime}_{k,j} ={\min}_{i=1,2,...,n} I_{i}^{(k)} , $$
(27)
$$ {c}^{\prime}_{k,j} ={\max}_{i=1,2,...,n} I_{i}^{(k)} , $$
(28)
$$ b_{k,j} =I_{t}^{(k)} ,\text{ where }U_{t,j} =\max\limits_{1\le i\le n} \left( {U_{i,j}} \right), $$
(29)
$$ a_{k,j} =\frac{\sum\nolimits_{i=1,2,...,n\,and\,I_{i}^{(k)} \le b_{k,j}} {U_{i,j}} \times I_{i}^{(k)}} {\sum\nolimits_{i=1,2,...,n\,and\,I_{i}^{(k)} \le b_{k,j}} {U_{i,j}} } , $$
(30)
$$ c_{k,j} =\frac{\sum\nolimits_{i=1,2,...,n\,and\,I_{i}^{(k)} \ge b_{k,j}} {U_{i,j}} \times I_{i}^{(k)}} {\sum\nolimits_{i=1,2,...,n\,and\,I_{i}^{(k)} \ge b_{k,j}} {U_{i,j}} } . $$
(31)

where \(I_{i}^{(k)} \) is the k-th input of the training sample \(X_{i}\), j=1,..,C, k=1,..,d. The real numbers \(\left ({{a}^{\prime },a,b,c,{c}^{\prime }} \right )\) of TPFN \(B_{j}\) of Rule j are described in (32)-(36).

$$ {a}^{\prime}_{j} ={\min}_{i=1,2,...,n} O_{i} , $$
(32)
$$ {c}^{\prime}_{j} ={\max}_{i=1,2,...,n} O_{i} , $$
(33)
$$ b_{j} =O_{t} ,\text{ where }U_{t,j} =\max_{1\le i\le n} \left( {U_{i,j}} \right), $$
(34)
$$ a_{j} =\frac{\sum\nolimits_{i=1,2,...,n\,and\,O_{i} \le b_{j}} {U_{i,j}} \times O_{i}} {\sum\nolimits_{i=1,2,...,n\,and\,O_{i} \le b_{j}} {U_{i,j}} } , $$
(35)
$$ c_{j} =\frac{\sum\nolimits_{i=1,2,...,n\,and\,O_{i} \ge b_{j}} {U_{i,j}} \times O_{i}} {\sum\nolimits_{i=1,2,...,n\,and\,O_{i} \ge b_{j}} {U_{i,j}} } . $$
(36)

where \(O_{i}\) is the desired output of \(X_{i}\) and j=1,..,C. Based on (27)-(36), the TPFNs of the fuzzy rules are constructed.

Step 4: If some picture fuzzy rules are activated by the inputs of the i-th sample \(X_{i}\), that is, \(\min _{1\le k\le d} U_{A_{k,j}} \left ({I_{i}^{(k)}} \right )>0\), then calculate the inferred output \(O_{i}^{\ast } \) by (37) and move to Step 6. Otherwise, go to Step 5.

$$ O_{i}^{\ast} =\frac{\sum\nolimits_{j=1}^{q} {\min_{1\le k\le d} U_{A_{k,j}} \left( {I_{i}^{(k)}} \right)} \times DEF\left( {B_{j}} \right)}{\sum\nolimits_{j=1}^{q} {\min_{1\le k\le d} U_{A_{k,j}} \left( {I_{i}^{(k)}} \right)}}, $$
(37)

\(U_{A_{k,j}} \left ({I_{i}^{(k)}} \right )\) denotes the membership value of the input \(I_{i}^{(k)} \) in the triangular picture fuzzy set \(A_{k,j}\), j=1,..,q and k=1,..,d. It is calculated from the triangular picture fuzzy functions in (21), (23) and (24), where q denotes the number of activated picture fuzzy rules and \(DEF\left ({B_{j}} \right )\) is the defuzzified value of the consequence picture fuzzy set \(B_{j}\) of the activated picture fuzzy rule j, j=1,..,q, i=1,..,N.
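The inference in (37) is a firing-strength-weighted average of the defuzzified consequents. The sketch below illustrates it under a simplifying assumption: the membership value \(U_{A_{k,j}}\) is taken to be the positive degree u from (21) only, whereas the text combines (21), (23) and (24). The function names and the rule representation are hypothetical.

```python
import numpy as np

def tri_membership(x, a, b, c):
    """Positive degree u of a triangular number (a, b, c) with a <= b <= c, as in (21)."""
    if a <= x <= b:
        return 1.0 if b == a else (x - a) / (b - a)
    if b < x <= c:
        return (c - x) / (c - b)
    return 0.0

def inferred_output(inputs, antecedents, def_B):
    """Weighted average (37).

    antecedents[j][k] = (a, b, c) of A_{k,j}; def_B[j] = DEF(B_j).
    Returns None when no rule is activated (Step 5 then applies).
    """
    fire = np.array([
        min(tri_membership(x, *abc) for x, abc in zip(inputs, rule))
        for rule in antecedents
    ])
    if fire.sum() == 0:
        return None
    return float(fire @ np.asarray(def_B, dtype=float) / fire.sum())
```

Rules with zero firing strength contribute nothing to either sum, so summing over all rules is equivalent to summing over the q activated ones.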

Step 5: If no picture fuzzy rule is activated, calculate the weight \(W_{j}\) of Rule j with respect to the input observations \(x_{1} =I_{i}^{(1)} \), \(x_{2} =I_{i}^{(2)} \), ..., \(x_{d} =I_{i}^{(d)} \) by (38) and compute the inferred output \(O_{i}^{\ast } \) by (39). Here \(r^{\ast}\) denotes the input vector \(\left \{ {I_{i}^{(1)} ,I_{i}^{(2)} ,...,I_{i}^{(d)}} \right \}\), and \(r_{j}\) denotes the vector of the defuzzified values of the antecedent fuzzy sets of Rule j, \(\left \{ {DEF\left ({A_{1,j}} \right ),DEF\left ({A_{2,j}} \right ),...,DEF\left ({A_{d,j}} \right )} \right \}\). \(\left \| {r^{\ast } -r_{j}} \right \|\) is the Euclidean distance between the vectors \(r^{\ast}\) and \(r_{j}\). The constraints on the weights are \(0\le W_{j} \le 1\), j=1,..,C, and \(\sum \nolimits _{j=1}^{C} {W_{j} =1} \). \(DEF\left ({B_{j}} \right )\) is the defuzzified value of the consequence picture fuzzy set \(B_{j}\).

$$ W_{j} =\frac{1}{\sum\nolimits_{h=1}^{C} {\left( {\frac{\left\| {r^{\ast} -r_{j}} \right\|}{\left\| {r^{\ast} -r_{h}} \right\|}} \right)^{2}}} $$
(38)
$$ O_{i}^{\ast} =\sum\limits_{j=1}^{C} {W_{j} \times DEF\left( {B_{j}} \right)} , $$
(39)
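The weights in (38) are inverse-squared-distance weights normalized to sum to one, so rules whose antecedents lie closer to the observation contribute more to (39). A minimal sketch follows; the function name is hypothetical, and the special case for an observation that coincides exactly with a rule vector (where (38) would divide by zero) is an assumption of this sketch.

```python
import numpy as np

def interpolated_output(r_star, R, def_B):
    """Weights W_j from (38) and inferred output from (39).

    r_star: (d,) observation; R: (C, d) rows r_j; def_B: (C,) values DEF(B_j).
    """
    r_star = np.asarray(r_star, dtype=float)
    R = np.asarray(R, dtype=float)
    def_B = np.asarray(def_B, dtype=float)
    dist = np.linalg.norm(R - r_star, axis=1)
    if np.any(dist == 0):
        # Observation equals some r_j: share the weight among those rules.
        W = (dist == 0) / np.count_nonzero(dist == 0)
    else:
        # W_j = 1 / sum_h (||r* - r_j|| / ||r* - r_h||)^2
        W = 1.0 / ((dist[:, None] / dist[None, :]) ** 2).sum(axis=1)
    return W, float(W @ def_B)
```

For example, with two rules at distances 1 and 2 from the observation, the weights are 0.8 and 0.2.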

3.3.1 Training method

Step 6: In order to obtain a better image prediction, the PSO algorithm is employed to determine appropriate parameters for the defuzzification function. As in (25), the coefficients of the five real numbers are chosen unequally. We rewrite this equation by adding parameters \(\left ({z_{1} ,z_{2} ,z_{3} ,z_{4} ,z_{5}} \right )\), as in (40), in order to find the most appropriate ones.

$$ DEF\left( A \right)=\frac{z_{1}{a}^{\prime}+z_{2}a+z_{3}b+z_{4}c+z_{5}{c}^{\prime}}{z_{1}+z_{2}+z_{3}+z_{4}+z_{5}} $$
(40)
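The parameterized defuzzification in (40) is a weighted mean of the five TPFN numbers, and the fixed coefficients (1, 2, 3, 2, 1) recover (25). A short sketch, with a hypothetical function name:

```python
def defuzzify(tpfn, z=(1, 2, 3, 2, 1)):
    """Weighted mean (40) of a TPFN (a', a, b, c, c'); default z reproduces (25)."""
    total = sum(z)
    if total == 0:
        raise ValueError("the weights z must not sum to zero")
    return sum(zi * v for zi, v in zip(z, tpfn)) / total
```

The zero-sum check matters once the weights are tuned by an optimizer, since nothing in the update rule itself prevents the five parameters from cancelling out.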

The training of the defuzzification parameters uses the last two difference matrices, the \(\left ({m-1} \right )^{\text {th}}\) and the \(\left ({m-2} \right )^{\text {th}}\), as the testing sample and the input sample (X), respectively. We use the PSO algorithm [5], which models the movement of organisms in a bird flock or fish school, to optimize these parameters (Table 3).

Table 3 Training parameters by PSO

Suppose that there are popsize particles, each encoded with five parameters \(\left ({z_{1} ,z_{2} ,z_{3} ,z_{4} ,z_{5}} \right )\) corresponding to the weights for calculating the defuzzified value of a TPFN as a solution. If particle i achieves a better solution than its previous ones, it records it as its local optimal solution Pbest (\(z_{Pbest_{j}^{(i)}} ,j=\overline {1,5}\)). Denote by \(\delta _{j}^{(i)}\) the velocity of change of parameter \(z_{j}\) of particle i, j=1,..,5. The optimizing process continues over all particles until a given number of iterations is reached. The final solution, i.e. the most suitable values of the five parameters, is gathered from all particles through the best values of the particles (\(Pbest_{i}\)) and of the swarm (Gbest). Gbest includes \(z_{Gbest_{j}} \) (the defuzzification parameters that give the rules the best accuracy) and the Gbest value, which is the best fitness value achieved by all particles. The fitness function is calculated by (41) as the difference between the difference matrix generated from the Gbest parameters and the (m−1)th difference matrix.

$$ diff=\sum\limits_{i=1}^{N} {\left| {pix_{i}^{(m-1)} -pix_{i}^{(new)}} \right|} , $$
(41)

where \(pix_{i}^{(m-1)}\) is the i-th pixel value of the (m−1)th difference matrix and \(pix_{i}^{(new)}\) is the i-th pixel value of the new difference matrix generated from the Gbest parameters. Each particle i is updated by (42) and (43) as below.

$$ \delta_{j}^{(i)} =\delta_{j}^{(i)} +c_{1} \left( {z_{Pbest_{j}^{(i)}} -z_{j}^{(i)}} \right)+c_{2} \left( {z_{Gbest_{j}} -z_{j}^{(i)}} \right), $$
(42)
$$ z_{j}^{(i)} =z_{j}^{(i)} +\delta_{j}^{(i)} , $$
(43)

where \(c_{1} ,c_{2} \ge 0\) are the PSO parameters, which are usually set to 1. The details of this method are given in Table 3.
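The update rules (42) and (43) can be sketched as a small optimization loop. The sketch follows the two formulas literally, with no inertia weight or random factors, since none appear in them. The function names, the random initialization in [0, 1), and the generic `fitness` callable are assumptions of this sketch; in the method described above, the fitness would be the difference in (41).

```python
import numpy as np

def pso_step(z, delta, pbest, gbest, c1=1.0, c2=1.0):
    """Apply (42) then (43) to positions z and velocities delta."""
    delta = delta + c1 * (pbest - z) + c2 * (gbest - z)   # (42)
    return z + delta, delta                                # (43)

def pso_minimize(fitness, popsize=20, dim=5, iters=50, seed=0):
    """Minimise fitness over dim parameters; returns (gbest, gbest_value)."""
    rng = np.random.default_rng(seed)
    z = rng.random((popsize, dim))
    delta = np.zeros_like(z)
    pbest = z.copy()
    pbest_val = np.array([fitness(p) for p in z])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(iters):
        z, delta = pso_step(z, delta, pbest, gbest)
        vals = np.array([fitness(p) for p in z])
        # Record improved personal bests, then refresh the swarm best.
        better = vals < pbest_val
        pbest[better], pbest_val[better] = z[better], vals[better]
        g = pbest_val.argmin()
        gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, float(gbest_val)
```

Because the swarm best is only replaced by strictly better personal bests, the returned fitness value never gets worse from one iteration to the next.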

Step 7: Finally, we calculate the forecast value \(M_{Forecasted} \left (i \right )\) at time i by (44) from the predicted variation rate \(O_{i}^{\ast } \), where \(M\left ({i-1} \right )\) is the actual value at time i−1.

$$ M_{Forecasted} \left( i \right)=M\left( {i-1} \right)\times \left( {1+O_{i}^{\ast} } \right). $$
(44)
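Since the variation rate in (26) is expressed in percent, applying (44) in code requires converting the rate back to a fraction. The sketch below assumes the rate is passed as a percentage (for example, 10 for +10 %); the function name is hypothetical.

```python
import numpy as np

def forecast(prev_value, rate_percent):
    """Forecast (44): previous value scaled by the predicted percent change."""
    prev = np.asarray(prev_value, dtype=float)
    rate = np.asarray(rate_percent, dtype=float)
    return prev * (1.0 + rate / 100.0)
```

The function works element-wise, so a whole image of previous pixel values and a matching array of predicted rates can be passed at once.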

3.3.2 Remarks

The PFC-PFR method has some advantages. Firstly, PFC-PFR uses FC-PFS to partition the pixel differences between images, which makes the rule generation process more exact. Secondly, the use of interpolative picture fuzzy rules helps the algorithm avoid being over-fitted or inaccurate. Finally, the proposed method trains the defuzzification parameters with the PSO algorithm to improve the prediction accuracy of the output images. This method can produce more accurate predicted images than the STAR technique, because STAR relies only on an autoregressive model in which more than one set of parameters affects the output. However, the proposed method, like PFC-STAR, has a high computational time because of the training process with the PSO algorithm.

4 Evaluation

The inputs for weather nowcasting are sequences of satellite images extracted from [11], taken at the same location at successive times. The image collection includes three sets of images: Malaysia (Data 1), Luzon, Philippines (Data 2) and Jakarta, Indonesia (Data 3). Each set contains seven consecutive images from 7:30 to 13:30 on 28/11/2014. All images have the same size (100×100 pixels). These images are shown in Figs. 12-14. Each set of images is divided into training and testing subsets using the hold-out validation method, where the last 3 images, denoted as predicted images 1-3, are the ones to be predicted. In the experiments, we have implemented the following algorithms:

  • The proposed PFC-STAR and PFC-PFR methods.

  • PFC-STAR* and PFC-PFR*: PFC-STAR and PFC-PFR methods without training.

  • FCM-STAR method of [18].

  • FCM-PFR: a combination of Fuzzy C-Means (FCM) and Picture Fuzzy Rule (PFR). This algorithm follows the same hybrid scheme as the proposed methods and is included to check whether the hybridization in the new algorithms is better than other hybrid schemes. This also explains why we chose these standalone algorithms for the hybridization.

  • Fuzzy reasoning method (FIR) of [14].

  • ANN of [12].

The number of clusters used in these algorithms is 4 as recommended in [18]. The experimental results are taken as the average values after 50 runs. The accuracy of prediction is measured by Root Mean Squared Error (RMSE).
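For reference, RMSE between a predicted and an actual image is the square root of the mean squared pixel difference. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Squared Error between two arrays of the same shape."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

Lower values indicate predictions closer to the actual image, and identical images give an RMSE of zero.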

Table 4 reports the RMSE values of the algorithms, where bold values mark the best result for a given data set and predicted image. For instance, with Data 1 and predicted image 1, the RMSE values of PFC-PFR*, FCM-PFR, PFC-STAR*, FCM-STAR, FIR, ANN, PFC-PFR and PFC-STAR are 4.303, 4.371, 6.893, 8.661, 4.789, 5.001, 4.13 and 4.428, respectively. The RMSE value of PFC-PFR is the smallest of all, so this value is marked in bold in the table. The same comparison is performed for the other data sets and predicted images, giving the results in Table 4. Some remarks are as follows:

Table 4 RMSE values of the algorithms where bold values mean the best records for a given data and predicted image

Firstly, according to the number of bold values in the table, the two proposed methods (PFC-PFR and PFC-STAR) have a remarkable advantage over the others, including the new algorithms without the training methods (PFC-PFR*, PFC-STAR*), a hybrid of an existing clustering algorithm and the proposed component (FCM-PFR), and three relevant algorithms (FCM-STAR, FIR and ANN). PFC-PFR is slightly better than PFC-STAR, with 4 bold values compared with 3 for PFC-STAR.

Secondly, the proposed methods do not give the best results for the first predicted image. The average values of PFC-PFR and PFC-STAR over all data for predicted image 1 are 5.124 and 5.278, respectively, while those of PFC-PFR*, PFC-STAR*, FCM-PFR, FCM-STAR, FIR and ANN are 5.785, 8.382, 6.139, 10.92, 4.808 and 7.228, respectively. The best value in this case belongs to FIR. Nonetheless, PFC-PFR and PFC-STAR still have small RMSE values and are better than the remaining algorithms.

Thirdly, the proposed methods are effective for the second and third predicted images. For predicted image 2, the average values of the proposed methods (PFC-PFR and PFC-STAR), the variants without training (PFC-PFR* and PFC-STAR*), the hybrid FCM-PFR and the three relevant algorithms (FCM-STAR, FIR and ANN) are (8.226, 8.41), (9.474, 9.494), 9.661, and (11.29, 26.3, 10.58), respectively. PFC-PFR is the best method in this case. Analogously, for predicted image 3, the average values of the proposed methods (PFC-PFR and PFC-STAR), the variants without training (PFC-PFR* and PFC-STAR*), the hybrid FCM-PFR and the relevant algorithms (FCM-STAR, FIR and ANN) are (9.285, 9.266), (11.03, 11.12), 10.98, and (12.04, 19.38, 13.769), respectively. PFC-STAR is the best method in this case. These results show that the proposed methods can maintain high accuracy of the predicted images at longer forecast intervals. This is significant because the longer the forecast interval in weather nowcasting, the worse the accuracy of an algorithm's outputs tends to be. The RMSE values of the proposed methods change less between predicted images than those of the other algorithms.

Fourthly, Figs. 6, 7, and 8 illustrate the total RMSE values of the algorithms over all data for predicted images 1, 2 and 3, respectively. The total value is the sum of the RMSE values of all data for a predicted image. The figures also affirm that the proposed methods are better than the relevant ones (FCM-STAR, FIR and ANN). Moreover, the training methods in PFC-PFR and PFC-STAR are important because they significantly reduce the RMSE values, as illustrated by the comparison between the proposed methods (PFC-PFR and PFC-STAR) and the variants without training (PFC-PFR* and PFC-STAR*). The combination in the proposed methods is also better than the combination of a well-known clustering algorithm (FCM) with our picture fuzzy rule (PFR) method. This shows the role of picture fuzzy clustering (PFC) in enhancing forecast accuracy.

Fig. 6 Total RMSE values of the algorithms by all data in predicted image 1

Fig. 7 Total RMSE values of the algorithms by all data in predicted image 2

Fig. 8 Total RMSE values of the algorithms by all data in predicted image 3
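
Because the total RMSE is the sum over all data, it equals the average multiplied by the number of data sets, so the ordering of the algorithms by total matches the ordering by average. The sketch below illustrates this with the averages quoted above for predicted image 1. It assumes three data sets (Data 1, Data 2 and Data 3); since the quoted averages are rounded, the resulting totals are approximate and need not match the plotted values exactly. The helper name `totals_from_averages` is illustrative.

```python
# Average RMSE over all data for predicted image 1, as quoted in the text.
avg_rmse_image1 = {
    "PFC-PFR": 5.124, "PFC-STAR": 5.278,
    "PFC-PFR*": 5.785, "PFC-STAR*": 8.382,
    "FCM-PFR": 6.139, "FCM-STAR": 10.92,
    "FIR": 4.808, "ANN": 7.228,
}
N_DATASETS = 3  # assumption: Data 1, Data 2 and Data 3

def totals_from_averages(averages, n_datasets):
    """Total RMSE = sum over data sets = average * number of data sets."""
    return {name: avg * n_datasets for name, avg in averages.items()}

totals = totals_from_averages(avg_rmse_image1, N_DATASETS)
ranking = sorted(totals, key=totals.get)  # smallest total (best) first
for name in ranking:
    print(f"{name:10s} {totals[name]:7.3f}")
```

For predicted image 1 this ranking places FIR first, followed by PFC-PFR and PFC-STAR, in line with the second remark.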

Table 5 compares the RMSE values of the algorithms by setting each bold value in Table 4 to 1 and computing how many times larger the other values for the same data set and predicted image are than the bold value. These ratios are recorded in Table 5 to show the relative performance of the algorithms clearly. Again, this table confirms the remarks above.

Table 5 Comparison of RMSE of the algorithms
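
The normalization behind Table 5 can be sketched as follows, using the Data 1, Predicted image 1 values quoted earlier. The helper name `relative_to_best` is illustrative, and the computed ratios are only an assumption about how the table entries are derived; they are not copied from the table.

```python
# RMSE values for Data 1, Predicted image 1, as quoted in the text.
rmse_data1_image1 = {
    "PFC-PFR*": 4.303, "FCM-PFR": 4.371, "PFC-STAR*": 6.893,
    "FCM-STAR": 8.661, "FIR": 4.789, "ANN": 5.001,
    "PFC-PFR": 4.13, "PFC-STAR": 4.428,
}

def relative_to_best(row):
    """Divide every RMSE in a row by the smallest one, so the best becomes 1."""
    best = min(row.values())
    return {name: value / best for name, value in row.items()}

ratios = relative_to_best(rmse_data1_image1)
# Print the algorithms from best (ratio 1) to worst.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:10s} {ratio:.3f}")
```

Under this reading, PFC-PFR receives the value 1 for this row and FCM-STAR, the weakest entry, is roughly 2.1 times larger.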

Table 6 presents the standard deviation (Std.) of the RMSE values of the algorithms in Table 4. This shows the variation of the RMSE values of all algorithms.

Table 6 The Std. values for RMSE of the algorithms

Figs. 9, 10, and 11 illustrate the RMSE values, with error bars, of the PFC-PFR algorithm for different numbers of clusters on Data 1, 2 and 3, respectively. This shows the stability of the algorithm under different parameter settings.

Fig. 9 RMSE of PFC-PFR algorithm with different clusters of Data 1

Fig. 10 RMSE of PFC-PFR algorithm with different clusters of Data 2

Fig. 11 RMSE of PFC-PFR algorithm with different clusters of Data 3

Lastly, Figs. 12, 13, and 14 show illustrative results for all data, where the first four images (7h30, 8h30, 9h30 and 10h30) are used to predict the last three images (11h30, 12h30 and 13h30).

Fig. 12 Forecast results of Data 1 by PFC-PFR (A) and PFC-STAR (B)

Fig. 13 Forecast results of Data 2 by PFC-PFR (A) and PFC-STAR (B)

Fig. 14 Forecast results of Data 3 by PFC-PFR (A) and PFC-STAR (B)

5 Conclusions

In this paper, we proposed two novel hybrid forecast methods based on picture fuzzy clustering for weather nowcasting. The first method, PFC-STAR, combines picture fuzzy clustering with spatiotemporal regression. The second, PFC-PFR, integrates picture fuzzy clustering with picture fuzzy rules. Both algorithms employ training processes to enhance prediction accuracy. Experimental evaluation on satellite image sequences of Southeast Asia showed that the proposed methods are better than the relevant ones. The main contributions of this paper, the PFC-STAR and PFC-PFR algorithms, enrich the knowledge of deploying hybrid forecast methods on picture fuzzy sets for interdisciplinary problems. Further research could proceed in the following directions: i) investigate a distributed version of the algorithms; ii) consider parallel versions of the algorithms to reduce computational costs; iii) apply the algorithms to other forecast problems.