1 Introduction

The real-time simulation of pollutant dispersion, or of the accidental release of radioactive substances into the atmosphere, is a challenging task for many national services and agencies. In particular, releases of harmful radionuclides (e.g., Fukushima, Chernobyl) can be simulated and monitored [1, 10, 13, 20]. In this work we consider atmospheric compounds from the ENSEMBLE system [6,7,8]. ENSEMBLE is a web-based system that assists the analysis of multi-model data provided by many national meteorological services and environmental protection agencies worldwide. It is worth noting that, in a multi-model ensemble for atmospheric dispersion, the models are to some degree mutually dependent because of several intrinsic mechanisms (e.g., they often share features, initial/boundary data, numerical methods, parameterizations and emissions). For this reason, results obtained by ensemble analysis may lead to erroneous interpretations, and the effective number of models in a multi-model approach may be lower than the total number, since models can be linearly (or nonlinearly) dependent on each other.

To address this problem, a number of techniques have been proposed in the literature. In [15, 17, 18] the authors present a statistical analysis (i.e., Bayesian Model Averaging) for combining predictive distributions from the different sources of a multi-model ensemble, and in [16] some basic properties of multi-model ensemble systems are investigated. Moreover, cluster-based approaches have also been proposed [2,3,4]. In this paper, we introduce a methodology that improves the forecast by considering observations that may become available during the course of the event. The methodology is based on fuzzy similarity relations, which make it possible to combine multiple hierarchical agglomerations, each for a different forecast lead time. From the overall temporal agglomeration obtained through a consensus matrix, it is possible to select a subset of models and discard redundant information.

The remainder of the paper is organized as follows. In Sect. 2 the proposed methodology is detailed. In particular, some fundamental concepts on t-norms and fuzzy similarity relations are given in Sect. 2.2, and the agglomerative approach is described in Sect. 2.3. In Sect. 3 some experimental results, obtained by applying this methodology to an ensemble of prediction models, are described. Finally, conclusions and remarks on future work are given in Sect. 4.

2 Fuzzy Similarity and Agglomerative Clustering

In general, when one deals with clustering tasks, fuzzy logic makes it possible to obtain a soft clustering of the data instead of a hard (crisp or non-fuzzy) one. Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters, and it can be agglomerative or divisive. In this work we consider an agglomerative clustering approach. One of the main aspects of this methodology is the use of a measure of dissimilarity between sets of observations, defined through an appropriate metric. A dendrogram, in turn, is a tree diagram used to illustrate the results produced by hierarchical clustering. In the following, we show that a dendrogram can be associated with a fuzzy equivalence relation based on Łukasiewicz valued fuzzy similarities. Subsequently, a consensus matrix, which is the representative information of all dendrograms, is obtained by combining multiple temporal hierarchical agglomerations of dispersion models. The main steps of the proposed approach are

  1. Membership functions characterization;

  2. Fuzzy Similarity Matrix calculation (or dendrogram) for all the models at a fixed time;

  3. Consensus matrix construction for temporal hierarchical agglomerations.

2.1 Membership Functions

The effectiveness of fuzzy logic lies in the transformation of linguistic variables into fuzzy sets. Fuzzification is the process of changing a real scalar value into a fuzzy value, and it is achieved by using different types of membership functions. A membership function represents the degree of truth with which a given input belongs to a fuzzy set. In the proposed approach, fuzzy sets are described by the following membership functions [21]

$$\begin{aligned} \mu (\mathbf{x}_i) = \frac{\mathbf{x}_i - \min (\mathbf{x}_i)}{\max (\mathbf{x}_i) - \min (\mathbf{x}_i)}, \end{aligned}$$
(1)

where \(\mathbf{x}_i = [x_1^i, x_2^i, \ldots , x_L^i]\) is the i-th observation vector of the L considered models, and the minimum and maximum are taken over its L components.
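A minimal sketch of the fuzzification in Eq. 1 is given below. The function name `membership`, the sample concentrations and the handling of a constant vector (for which Eq. 1 is undefined) are illustrative assumptions, not part of the original method.

```python
def membership(values):
    """Min-max rescaling of one observation vector to [0, 1] (Eq. 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Eq. 1 is undefined for a constant vector; returning zeros is an
        # arbitrary convention adopted only for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical concentrations predicted by L = 4 models at one location.
x_i = [2.0, 4.0, 6.0, 10.0]
mu = membership(x_i)
print(mu)
```

The model with the lowest value is mapped to 0 and the one with the highest value to 1, so the memberships of different observation vectors become comparable on the unit interval.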

2.2 Fuzzy Similarity

Fuzzy sets can be combined via conjunction and disjunction operations, for which continuous triangular norms and co-norms are adopted, respectively. A triangular norm (t-norm for short) is a binary operation t on the unit interval [0, 1]. In particular, it is a function \(t:[0,1]^2 \rightarrow [0,1]\) that satisfies the following four axioms for all \(x,y,z \in [0,1]\) [11]

$$\begin{aligned} \begin{array}{lccrr} t(x,y) &{}=&{} t(y,x) &{} &{}{ ({ commutativity})}\\ \\ t(x,t(y,z)) &{}=&{} t(t(x,y),z) &{} &{} {({ associativity})} \\ \\ t(x,y) &{} \le &{} t(x,z) &{} \text {whenever } y \le z &{}{({ monotonicity})}\\ \\ t(x,1) &{}=&{} x &{} &{}{({\textit{boundary condition}})} \end{array} \end{aligned}$$
(2)

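In practical situations the following four basic t-norms are considered

$$\begin{aligned} \begin{array}{lcll} t_\mathbf{M}(x,y) &=& \min (x,y) & \text {(minimum)}\\ \\ t_\mathbf{P}(x,y) &=& x \cdot y & \text {(product)}\\ \\ t_\mathbf{L}(x,y) &=& \max (x + y - 1,0) & \text {(Łukasiewicz t-norm)}\\ \\ t_\mathbf{D}(x,y) &=& \left\{ \begin{array}{ll} 0 & \text {if } (x,y) \in [0,1)^2 \\ \min (x,y) & \text {otherwise} \end{array} \right. & \text {(drastic product)} \end{array} \end{aligned}$$
(3)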
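As an illustration, the sketch below implements four t-norms and checks the axioms of Eq. 2 numerically on a small grid. It assumes that the four basic t-norms are the ones listed in [11], namely the minimum, the product, the Łukasiewicz t-norm and the drastic product; the function names and the grid are hypothetical choices made for this example.

```python
import itertools


def t_min(x, y):
    return min(x, y)


def t_prod(x, y):
    return x * y


def t_luk(x, y):
    return max(x + y - 1.0, 0.0)


def t_drastic(x, y):
    # Equals min(x, y) when one argument is 1, and 0 otherwise.
    return min(x, y) if max(x, y) == 1.0 else 0.0


# Quarter steps are exactly representable, so equality tests are safe here.
GRID = [k / 4 for k in range(5)]


def satisfies_axioms(t):
    """Check the four axioms of Eq. 2 on every triple of grid points."""
    for x, y, z in itertools.product(GRID, repeat=3):
        if t(x, y) != t(y, x):                      # commutativity
            return False
        if t(x, t(y, z)) != t(t(x, y), z):          # associativity
            return False
        if y <= z and t(x, y) > t(x, z):            # monotonicity
            return False
    return all(t(x, 1.0) == x for x in GRID)        # boundary condition


for name, t in [("min", t_min), ("product", t_prod),
                ("Lukasiewicz", t_luk), ("drastic", t_drastic)]:
    print(name, satisfies_axioms(t))
```

A grid check of this kind is only a sanity test, not a proof; it is nevertheless useful when experimenting with parametric t-norms.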

In recent years, however, several parametric and non-parametric t-norms have been introduced [11] and generalized versions have also been studied [5]. In the following, we focus on the properties of the Łukasiewicz t-norm (\(t_\mathbf{L}\)). One of the main operators adopted in fuzzy-based systems (e.g., fuzzy inference systems) is the residuum \(\rightarrow _t\)

$$\begin{aligned} x \rightarrow _t y = \bigvee \left\{ z | t(z,x) \le y \right\} \end{aligned}$$
(4)

where \(\bigvee \) denotes the supremum (join). For the left-continuous basic t-norm \(t_\mathbf{L}\), the residuum is given by

$$\begin{aligned} x \rightarrow _\mathbf{L} y = \min (1 - x + y,1) \end{aligned}$$
(5)

Moreover, we also note that letting p be a fixed natural number in a generalized Łukasiewicz structure, we obtain

$$\begin{aligned} \begin{array}{lcc} t_\mathbf{L}(x,y) &{}=&{} \root p \of {\max (x^p + y^p - 1,0)} \\ x \rightarrow _\mathbf{L} y &{}=&{} \min (\root p \of {1 - x^p + y^p},1) \\ \end{array} \end{aligned}$$
(6)
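The generalized operators of Eq. 6 can be sketched as follows. The function names are hypothetical; the final check illustrates the defining property of the residuum in Eq. 4, namely that \(x \rightarrow y\) is the largest z with \(t(z,x) \le y\).

```python
def t_luk(x, y, p=1):
    """Generalized Lukasiewicz t-norm of Eq. 6 (p = 1 gives the basic one)."""
    return max(x**p + y**p - 1.0, 0.0) ** (1.0 / p)


def residuum_luk(x, y, p=1):
    """Generalized Lukasiewicz residuum of Eq. 6."""
    return min((1.0 - x**p + y**p) ** (1.0 / p), 1.0)


# Basic case p = 1.
print(t_luk(0.75, 0.5))         # 0.75 + 0.5 - 1
print(residuum_luk(0.75, 0.5))  # 1 - 0.75 + 0.5
print(residuum_luk(0.25, 0.5))  # x <= y, so the residuum saturates at 1

# Case p = 2: z = (x -> y) satisfies t(z, x) = y whenever x > y.
z = residuum_luk(0.5, 0.25, p=2)
print(t_luk(z, 0.5, p=2))
```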

Another fundamental operation on a residuated lattice is the bi-residuum that will be used for our construction of the fuzzy similarities. It is defined as

$$\begin{aligned} x \leftrightarrow _t y = (x \rightarrow _t y) \wedge (y \rightarrow _t x), \end{aligned}$$
(7)

where \(\wedge \) is the meet. In the case of the left-continuous basic t-norm \(t_\mathbf{L}\), we obtain the following bi-residuum

$$\begin{aligned} x \leftrightarrow _\mathbf{L} y = 1 - \max (x,y) + \min (x,y) \end{aligned}$$
(8)
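As a quick check, Eq. 8 can be derived from Eq. 7 by assuming the Łukasiewicz residuum in the form given by Eq. 6 with \(p=1\), i.e., \(x \rightarrow _\mathbf{L} y = \min (1 - x + y,1)\). Taking \(x \ge y\) without loss of generality,

$$\begin{aligned} x \rightarrow _\mathbf{L} y&= \min (1 - x + y,1) = 1 - x + y, \\ y \rightarrow _\mathbf{L} x&= \min (1 - y + x,1) = 1, \\ x \leftrightarrow _\mathbf{L} y&= \min (1 - x + y,1) = 1 - \max (x,y) + \min (x,y) = 1 - |x - y|. \end{aligned}$$

For instance, with the illustrative values \(x = 0.7\) and \(y = 0.4\) the bi-residuum equals \(1 - 0.3 = 0.7\): two membership values are the more similar the closer they are on the unit interval.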

On the other hand, a binary fuzzy relation R is defined on \(U \times V\) as a fuzzy set on \(U \times V\) (\(R \subseteq U \times V\)). A similarity matrix is a fuzzy relation \(S \subseteq U \times U\) such that, for each \(u,v,w \in U\), the following properties are satisfied

$$\begin{aligned} \begin{array}{lccr} S \langle u,u \rangle &{}=&{} 1 &{} {({\textit{everything is similar to itself}})}\\ \\ S \langle u,v \rangle &{}=&{} S \langle v,u \rangle &{} {({ symmetric})} \\ \\ t(S \langle u,v \rangle , S \langle v,w \rangle )&{} \le &{} S \langle u,w \rangle &{} {({\textit{weakly transitive}})}\\ \end{array} \end{aligned}$$
(9)

It is essential to observe that from fuzzy sets with membership functions \(\mu : X \rightarrow [0,1]\), a fuzzy similarity matrix S can be generated as

$$\begin{aligned} S \langle a,b \rangle = \mu (a) \leftrightarrow _t \mu (b) \end{aligned}$$
(10)

for all \(a,b \in X\).

Moreover, the construction of the fuzzy similarity matrix relies on the following result [19, 21]

Proposition 1

Consider n Łukasiewicz valued fuzzy similarities \(S_i\), \(i=1,\ldots ,n\) on a set X. Then

$$\begin{aligned} S \langle x,y \rangle = \frac{1}{n} \sum _{i=1}^{n} S_i\langle x,y \rangle \end{aligned}$$
(11)

is a Łukasiewicz valued fuzzy similarity on X.

In this work, we consider for Eq. 11

$$\begin{aligned} S_i\langle x,y \rangle = x \leftrightarrow _\mathbf{L} y. \end{aligned}$$
(12)
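A minimal sketch of Eqs. 11 and 12 follows. It assumes that each model is described by a vector of membership values (one per observation, as produced by Eq. 1) and that \(S_i\) compares two models on the i-th observation; the data layout and the names `biresiduum` and `similarity_matrix` are hypothetical.

```python
def biresiduum(a, b):
    """Lukasiewicz bi-residuum of Eq. 8: 1 - max(a, b) + min(a, b)."""
    return 1.0 - abs(a - b)


def similarity_matrix(memberships):
    """Average of the per-observation similarities (Eq. 11).

    memberships[m][i] is the membership value of model m for observation i.
    """
    n_models = len(memberships)
    n_obs = len(memberships[0])
    S = [[0.0] * n_models for _ in range(n_models)]
    for a in range(n_models):
        for b in range(n_models):
            total = sum(biresiduum(memberships[a][i], memberships[b][i])
                        for i in range(n_obs))
            S[a][b] = total / n_obs
    return S


# Three hypothetical models described by three membership values each.
mu = [[0.0, 0.5, 1.0],
      [0.0, 0.25, 1.0],
      [1.0, 0.5, 0.0]]
S = similarity_matrix(mu)
for row in S:
    print([round(v, 3) for v in row])
```

In this toy example models 0 and 1 are highly similar, while model 2 is far from both. In line with Proposition 1, the averaged matrix is reflexive, symmetric and weakly transitive with respect to \(t_\mathbf{L}\).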

Now, letting \(t_\mathbf{L}\) be the Łukasiewicz t-norm, it is worth noting that S is a fuzzy equivalence relation on X with respect to \(t_\mathbf{L}\) iff \(1 - S\) is a pseudo-metric on X.
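The reasoning behind this equivalence can be made explicit with a short derivation, written here with the shorthand \(d = 1 - S\) (a symbol introduced only for this illustration). Since \(S \langle u,w \rangle \ge 0\), weak transitivity with respect to \(t_\mathbf{L}\) reads

$$\begin{aligned} \max (S \langle u,v \rangle + S \langle v,w \rangle - 1,0) \le S \langle u,w \rangle&\iff S \langle u,v \rangle + S \langle v,w \rangle - 1 \le S \langle u,w \rangle \\&\iff 1 - S \langle u,w \rangle \le (1 - S \langle u,v \rangle ) + (1 - S \langle v,w \rangle ) \\&\iff d(u,w) \le d(u,v) + d(v,w), \end{aligned}$$

which is the triangle inequality for d. Likewise, \(S \langle u,u \rangle = 1\) corresponds to \(d(u,u) = 0\), and the symmetry of S corresponds to the symmetry of d.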

2.3 Dendrogram and Consensus Matrix

We also have to observe that if a similarity relation is min-transitive (\(t=\min \) in (9)) then it is a fuzzy-equivalence relation that can be graphically described by a dendrogram [12]. In other words, transitivity implies the existence of the dendrogram.

The min-transitive closure \(R^T\) of R can be obtained as follows [14]

$$\begin{aligned} R^T = \bigcup _{i=1}^{n-1} R^i \end{aligned}$$
(13)

where \(R^{i+1}\) is defined as

$$\begin{aligned} R^{i+1} = R^{i} \circ R, \end{aligned}$$
(14)

and n is the dimension of a relation matrix.

Considering two fuzzy relations R and S, we observe that the composition \(R \circ S\) is a fuzzy relation defined by

$$\begin{aligned} R \circ S \langle x,y \rangle = \text {Sup}_{z \in X} \{R \langle x,z \rangle \odot S \langle z,y \rangle \} \end{aligned}$$
(15)

\(\forall x,y \in X\), where \(\odot \) stands for a t-norm (e.g., min operator) [14]. Then we can conclude that the min-transitive closure \(R^T\) of a matrix R can be easily computed and the overall process is described in Algorithm 1.

Algorithm 1. Min-transitive closure \(R^T\) of a relation matrix R.
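Since the listing of Algorithm 1 is not available in this text, the following sketch shows one way to compute the min-transitive closure directly from Eqs. 13 to 15, with the union taken as the element-wise maximum and \(\odot = \min \). The function names and the sample matrix are assumptions made for illustration.

```python
def maxmin_compose(R, S):
    """Max-min composition of two square relation matrices (Eq. 15)."""
    n = len(R)
    return [[max(min(R[x][z], S[z][y]) for z in range(n))
             for y in range(n)] for x in range(n)]


def elementwise_max(R, S):
    """Fuzzy union of two relation matrices."""
    return [[max(a, b) for a, b in zip(row_r, row_s)]
            for row_r, row_s in zip(R, S)]


def min_transitive_closure(R):
    """Union of the powers R^1, ..., R^(n-1) (Eqs. 13 and 14)."""
    n = len(R)
    closure = [row[:] for row in R]
    power = [row[:] for row in R]
    for _ in range(n - 2):  # builds R^2, ..., R^(n-1)
        power = maxmin_compose(power, R)
        closure = elementwise_max(closure, power)
    return closure


# Hypothetical similarity matrix of three models.
R = [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.5],
     [0.2, 0.5, 1.0]]
T = min_transitive_closure(R)
for row in T:
    print(row)
```

In the example the entry linking models 0 and 2 is raised from 0.2 to 0.5, because the two models are connected through model 1 with strength min(0.8, 0.5).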

We also observe that, to perform an agglomerative clustering, a dissimilarity relation is needed. Here we consider the following result [14].

Lemma 1

Letting R be a similarity relation with the elements \(R \langle x,y \rangle \in [0,1]\) and letting D be a dissimilarity relation, which is obtained from R by

$$\begin{aligned} D(x,y) = 1 - R \langle x,y \rangle \end{aligned}$$
(16)

then D is ultrametric iff R is min-transitive.

In other words, we have a one-to-one correspondence between min-transitive similarity matrices and dendrograms, and between ultrametric dissimilarity matrices and dendrograms.
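Lemma 1 can be checked numerically with the sketch below, which also shows how cutting a min-transitive matrix at a level \(\alpha \) yields the clusters of the corresponding dendrogram. The helper names, the tolerance and the two sample matrices are hypothetical.

```python
def is_min_transitive(R, tol=1e-12):
    """min(R<x,z>, R<z,y>) <= R<x,y> for all x, y, z."""
    n = range(len(R))
    return all(min(R[x][z], R[z][y]) <= R[x][y] + tol
               for x in n for y in n for z in n)


def is_ultrametric(D, tol=1e-12):
    """D(x,y) <= max(D(x,z), D(z,y)) for all x, y, z."""
    n = range(len(D))
    return all(D[x][y] <= max(D[x][z], D[z][y]) + tol
               for x in n for y in n for z in n)


def dissimilarity(R):
    """D = 1 - R, as in Eq. 16."""
    return [[1.0 - r for r in row] for row in R]


def alpha_cut_clusters(T, alpha):
    """Clusters obtained by cutting a min-transitive matrix T at level alpha."""
    clusters = []
    for x in range(len(T)):
        for members in clusters:
            if T[x][members[0]] >= alpha:
                members.append(x)
                break
        else:
            clusters.append([x])
    return clusters


R = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]]  # not min-transitive
T = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.5], [0.5, 0.5, 1.0]]  # min-transitive

print(is_min_transitive(R), is_ultrametric(dissimilarity(R)))
print(is_min_transitive(T), is_ultrametric(dissimilarity(T)))
print(alpha_cut_clusters(T, 0.8), alpha_cut_clusters(T, 0.5))
```

Lowering \(\alpha \) merges clusters, which corresponds to moving up the dendrogram.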

Finally, after the dendrograms have been obtained at each time, a consensus matrix, that is the representative information of all temporal dendrograms, is obtained by combining the transitive closures by using Eq. 15 (i.e., max-min) [14]. The overall approach is described in Algorithm 2.

Algorithm 2. Consensus matrix construction from the temporal dendrograms.
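The listing of Algorithm 2 is not available in this text, so the sketch below is only one possible reading of the max-min combination: the transitive closures obtained at the different times are chained through the composition of Eq. 15, and the result is closed again so that it is min-transitive and can be drawn as a dendrogram. The function names, the final closure step and the sample matrices are assumptions.

```python
from functools import reduce


def maxmin_compose(R, S):
    """Max-min composition of two square relation matrices (Eq. 15)."""
    n = len(R)
    return [[max(min(R[x][z], S[z][y]) for z in range(n))
             for y in range(n)] for x in range(n)]


def min_transitive_closure(R):
    """Union (element-wise max) of R^1, ..., R^(n-1) (Eqs. 13 and 14)."""
    n = len(R)
    closure = [row[:] for row in R]
    power = [row[:] for row in R]
    for _ in range(n - 2):
        power = maxmin_compose(power, R)
        closure = [[max(a, b) for a, b in zip(rc, rp)]
                   for rc, rp in zip(closure, power)]
    return closure


def consensus_matrix(closures):
    """Assumed combination: compose the per-time closures, then close again."""
    combined = reduce(maxmin_compose, closures)
    return min_transitive_closure(combined)


# Hypothetical min-transitive matrices of three models at two lead times.
T1 = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.5], [0.5, 0.5, 1.0]]
T2 = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.9], [0.3, 0.9, 1.0]]
C = consensus_matrix([T1, T2])
for row in C:
    print(row)
```

Under this reading, two models are similar in the consensus if they are linked by a chain of similarities observed at any of the considered times. A stricter consensus (for example, an element-wise minimum) would be a different design choice and is not what is sketched here.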

3 Experimental Results

This section illustrates some results obtained by the proposed approach. In particular, we consider the multi-model ensemble simulated distributions of the ETEX-1 experiment [9]. The ETEX-1 experiment concerned the release of pseudo-radioactive material on 23 October 1994 at 16:00 UTC from Monterfil, southeast of Rennes (France). Briefly, a steady westerly flow of unstable air masses was present over central Europe. Such conditions persisted for the 90 h that followed the release, with frequent precipitation events over the advection area and a slow movement toward the North Sea region. As an example, in Fig. 1 we show the integrated concentration 78 h after the release. In the experiment, the main objective of the several independent groups worldwide (25 members) was to forecast the observations with different atmospheric dispersion models. Moreover, each simulation was based on weather fields generated by (most of the time) different Global Circulation Models (GCM), and all the simulations relate to the same release conditions. For further information on the involved groups and the adopted models the reader can refer to [8] and [9].

Fig. 1. ETEX-1 temporal integrated observations after 78 h.

Fig. 2. Representative dendrogram obtained from the consensus matrix: the x-axis reports the models and the y-axis reports the similarity between the model data.

Now we apply the proposed approach to analyze the data of the ETEX-1 experiment. The preliminary step is the fuzzification. In particular, Eq. 1 is applied to the concentrations estimated by the models at each time level. Subsequently, for the concentrations at each time a dendrogram (similarity matrix) is produced (Eq. 11 with the Łukasiewicz norm and \(p=1\)). Finally, the consensus matrix that describes the representative dendrogram is estimated by using the approach described in Algorithm 2. In Fig. 2 a detail of the representative dendrogram obtained after 78 h is shown. We observe that different clusters of similar models are obtained.

To highlight the clustering outcomes, in Fig. 3 we show some representative distributions of the clustered models. For example, as confirmed by the dendrogram, the distributions of models 22 and 24 are very close (compare Figs. 3a and b). Model 21, instead, has a very diffusive distribution, as highlighted by the dendrogram. This distribution is shown in Fig. 3c. At this point, we can identify models that have similar behavior by analyzing the different clusters. In order to identify the group of models that most appropriately describes the observations, we compare the distributions of the models by using the Kullback-Leibler divergence.

Fig. 3. Model distributions: (a) model 22; (b) model 24; (c) model 21.

Fig. 4. KL divergence as a function of the number of clusters.

The Kullback-Leibler (KL) divergence between two discrete n-dimensional probability distributions \({\mathbf p} = [p_1, \ldots , p_n]\) and \({\mathbf q} = [q_1, \ldots , q_n]\) is defined as

$$\begin{aligned} KL({\mathbf p}||{\mathbf q}) = \sum _{i=1}^n p_i \log \left( \frac{p_i}{q_i}\right) . \end{aligned}$$
(17)

This quantity is also known as the relative entropy. It satisfies Gibbs' inequality

$$\begin{aligned} KL({\mathbf p}||{\mathbf q}) \ge 0 \end{aligned}$$
(18)

where equality holds if and only if \({\mathbf p} \equiv {\mathbf q}\). In general, \(KL({\mathbf p}||{\mathbf q}) \ne KL({\mathbf q}||{\mathbf p})\). In our experiments we use the symmetric version [2], which can be defined as

$$\begin{aligned} KL = \frac{KL({\mathbf p}||{\mathbf q}) + KL({\mathbf q}||{\mathbf p})}{2}. \end{aligned}$$
(19)
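Eqs. 17 and 19 can be sketched as follows. The function names and the sample distributions are hypothetical, and the sketch assumes that the two distributions share the same support, since the divergence is infinite otherwise.

```python
import math


def kl(p, q):
    """KL(p || q) of Eq. 17; terms with p_i = 0 contribute zero.

    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def symmetric_kl(p, q):
    """Symmetric KL divergence of Eq. 19."""
    return 0.5 * (kl(p, q) + kl(q, p))


p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl(p, q), kl(q, p))   # the two directions differ
print(symmetric_kl(p, q))   # the symmetric version does not depend on the order
```

For these two distributions the symmetric divergence equals \(\frac{1}{8}\log 3\), which follows from writing Eq. 19 as \(\frac{1}{2}\sum _i (p_i - q_i)\log (p_i/q_i)\).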

First of all, we compute the KL divergence between each model and the median value of its cluster. Subsequently, for each cluster, the model with the minimum KL is selected. The median of the selected models is then compared with the real observations by KL. In Fig. 4 we show the KL obtained by varying the number of clusters.
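The selection procedure described above can be sketched as follows. Several details are assumptions made for illustration: the median is taken component-wise and renormalized so that it sums to one, all distributions are strictly positive, and the cluster memberships are given as lists of model indices. All names and numbers are hypothetical.

```python
import math
from statistics import median


def kl(p, q):
    """KL(p || q); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def symmetric_kl(p, q):
    return 0.5 * (kl(p, q) + kl(q, p))


def median_distribution(dists):
    """Component-wise median, renormalized to sum to one (an assumption)."""
    med = [median(column) for column in zip(*dists)]
    total = sum(med)
    return [m / total for m in med]


def select_representatives(models, clusters):
    """For each cluster, pick the model closest (in KL) to the cluster median."""
    reps = []
    for members in clusters:
        center = median_distribution([models[m] for m in members])
        reps.append(min(members,
                        key=lambda m: symmetric_kl(models[m], center)))
    return reps


def reduced_ensemble_kl(models, clusters, observed):
    """KL between the median of the selected models and the observations."""
    reps = select_representatives(models, clusters)
    reduced = median_distribution([models[m] for m in reps])
    return reps, symmetric_kl(reduced, observed)


# Four hypothetical model distributions over three bins, in two clusters.
models = [[0.50, 0.30, 0.20],
          [0.50, 0.25, 0.25],
          [0.45, 0.30, 0.25],
          [0.10, 0.20, 0.70]]
clusters = [[0, 1, 2], [3]]
observed = [0.30, 0.25, 0.45]

reps, score = reduced_ensemble_kl(models, clusters, observed)
print(reps, score)
```

Repeating the last call for the partitions obtained by cutting the representative dendrogram at different levels gives a curve of KL versus the number of clusters, of the kind summarized in Fig. 4.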

We observe that, by varying the number of clusters, this procedure makes it possible to select the models that best approximate the real observations (see [17] and [4] for more details). From our analysis, we conclude that the best approximation is obtained by using 6 clusters. Moreover, we stress that a lower KL does not necessarily correspond to the use of a large number of models. This suggests an approach for the systematic reduction of ensemble data complexity; moreover, the use of the consensus matrix makes it possible to obtain a more robust and realistic temporal analysis.

4 Conclusions

In this work we focused on model comparison in a multi-model air quality ensemble system. A methodology based on temporal hierarchical agglomeration has been introduced for the real-time simulation of pollutant dispersion or of the accidental release of radioactive nuclides into the atmosphere. The proposed methodology is able to combine multiple temporal hierarchical agglomerations of dispersion models, and it is based on fuzzy similarity relations combined by a transitive consensus matrix. The methodology has been adopted to identify the models that characterize the predicted atmospheric pollutants of the ETEX-1 experiment. The results show that this methodology is able to discard redundant temporal information, reducing the data complexity. In the near future, further experiments will be devoted to real pollutant dispersions (e.g., Fukushima) and to different similarity relations, also using ordinal sums.