1 Introduction

Multi-view data research has become increasingly important since large amounts of information are constantly being generated from different sources. These data are heterogeneous: the variables are naturally partitioned into groups, yet potential links remain between them. Each group of variables is referred to as a specific perspective (view), and the multiple views on a given issue can take different arrangements (Xu et al., 2015). Often, in multi-view data, the objects are represented by several matrices or views (e.g., sensor signal data can be decomposed into time and frequency data, image data can be described by texture and color data, and multimedia segments can be represented by video and audio signal data (Yang & Wang, 2018)). In such data, each view has its specific features for a certain knowledge discovery task. However, different views frequently contain complementary information that must be exploited (Wang et al., 2017).

According to Sun et al. (2019), multi-view learning can effectively improve learning performance even on natural single-view data, mainly because a single-view representation of the data is usually incomplete. In this sense, different views might provide complementary information for the learning problem (Wang et al., 2017). Concerning multi-view clustering, there are three combination strategies to perform the learning task (Cleuziou et al., 2009): the concatenation strategy, the distributed strategy, and the centralized strategy. The first strategy consists of concatenating the views into a single one, either directly, by juxtaposing the sets of features, or indirectly, by combining the proximity matrices derived from each view. The second strategy starts by clustering the objects from each view independently and then looks for a solution that represents a consensus among all the groups. Finally, the last strategy uses multiple views simultaneously to mine hidden patterns from the data (Cleuziou et al., 2009).

Throughout learning, multi-view approaches explicitly use distinct data representations that can either be the original data features or those obtained through computations (Sun et al., 2019). Moreover, each view can be represented by either vectorial or non-vectorial data. The former has received considerably more attention from unsupervised learning approaches, since most available machine learning and data analysis methods are based on a vector model, with each example being represented by a vector of quantitative values (Frigui et al., 2007). Unfortunately, many current datasets do not support this type of representation. There are categorical data, abstract data, and relational data, among others. In relational data, the objects are described through a relationship between data pairs that contains only information on the degrees to which pairs of objects in the dataset are related (Kaufman & Rousseeuw, 1987; Frigui et al., 2007). A way of dealing with these types of data is to consider the objects represented by a matrix of dissimilarities.

In dissimilarity data, each pair of objects is represented by a dissimilarity relationship (Rousseeuw & Kaufman, 2005). Single-view dissimilarity data is represented by a dissimilarity matrix defined as \(\textbf{D}=[d(e_k,e_l)] \, (1 \le k,l \le N)\), where \(d(e_k,e_l)\) is the dissimilarity between objects \(e_k\) and \(e_l\) on dissimilarity matrix \(\textbf{D}\). Unsupervised dissimilarity data methods introduce approaches to handle such data using the most appropriate dissimilarity function for the problem at hand. These approaches can handle heterogeneous data through different transformations.
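For concreteness, the sketch below builds such a dissimilarity matrix from a set of objects; it assumes numerical feature vectors and uses the Euclidean distance as the dissimilarity function, but any pairwise dissimilarity suited to the data type could be plugged in (the function name is ours).

```python
import numpy as np

def dissimilarity_matrix(X, dist=None):
    """Build D = [d(e_k, e_l)] for N objects.

    X    : (N, M) array whose rows describe the objects
    dist : any pairwise dissimilarity function (Euclidean distance by default)
    """
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)
    N = len(X)
    D = np.zeros((N, N))
    for k in range(N):
        for l in range(k + 1, N):
            D[k, l] = D[l, k] = dist(X[k], X[l])
    return D
```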

Therefore, dissimilarity data are more generic, since they can be applied to scenarios where objects cannot be represented by numerical features. They are also more useful when the distance measure is computed by a suitable algorithm rather than by the usual algebraic expression, or when sets of similar objects cannot be represented adequately by a prototype vector (Frigui et al., 2007).

Several traditional machine learning algorithms have been extended to cope effectively with a variety of multi-view and dissimilarity data problems (Gusmão & Carvalho, 2019; Dantas & Carvalho, 2011; Olteanu & Villa-Vialaneix, 2015). The Kohonen Self-Organizing Map (SOM) (Kohonen, 2001; Badran et al., 2005; Kohonen, 2013; Astudillo & Oommen, 2014; Cottrell et al., 2018) is a powerful tool for dealing with these kinds of challenges. SOM performs clustering and non-linear data projection at the same time, thus providing a strong visualization tool. The SOM has proven effective, especially when considering the faithfulness (precision) of the mapping from a high-dimensional space (Vatanen et al., 2015). In addition, the SOM has proven to be a reliable approach in a variety of fields, including finance, statistics, bio-medicine, industry, and many others (see Refs. Kaski et al. (1998); Oja et al. (2003); Domínguez-González et al. (2012); Astudillo and Oommen (2014); Kamimura (2019); Douzas et al. (2021) for details).

A SOM consists of neurons (vertices) usually arranged on a regular two- or three-dimensional grid (the map). Each neuron is associated with a cluster representative (prototype) of a data subset (a cluster). The cluster structure is imposed both by the data and by the a priori topology of the map. The SOM network can be trained either incrementally or batch-wise. According to Kohonen (2013), the batch variant of the SOM network is most suited for practical applications. However, incremental training is the preferable approach when data are given sequentially. Throughout the map training, each object must select its Best Matching Unit (BMU), i.e., the neuron with the prototype most similar to its description. Then, the prototype associated with the BMU and the prototypes of the neurons in the spatial neighborhood of the BMU are updated so that they better represent the object.

A SOM preserves the data topological properties, which implies that if two objects in the original description space are similar, the related BMU prototypes are also similar and will be associated with adjacent or close vertices on the map. Thus, the data are grouped by clusters such that the most similar prototypes are associated with adjacent vertices, while less similar prototypes are associated with distant vertices on the map (Kohonen, 2013).

There are different prototype-based approaches to determine the representative of a cluster concerning dissimilarity data, but only a few SOM algorithms have been proposed. Median Batch SOM (Kohonen, 2001) was the first extension of the original SOM for single-view dissimilarity data. In this regard, the cluster prototypes were represented by a single medoid in Ref. Kohonen (2001) and later by a set of medoids (set-medoids) in Ref. Golli et al. (2005). Furthermore, another method for extending SOM to single-view dissimilarity data was proposed, in which the cluster representative is a “normalized linear combination” of the objects from the whole dataset. Based on this latter approach, batch (Hasenfuss & Hammer, 2007) and on-line (Olteanu et al., 2012) versions of SOM for dissimilarity data are currently available. Finally, Ref. Mariño and Carvalho (2020) introduced a batch SOM for single-view dissimilarity data, with the cluster prototypes represented by vectors of weights. Each component of these vectors quantifies the significance of each object in a given cluster. Despite the growing interest in machine learning, to the best of our knowledge, only Refs. Dantas and Carvalho (2011) and Olteanu and Villa-Vialaneix (2015) have proposed SOM algorithms that can manage multi-view dissimilarity data. Ref. Olteanu and Villa-Vialaneix (2015) introduced an on-line extension of the relational SOM algorithm (Hasenfuss & Hammer, 2007), whereas Ref. Dantas and Carvalho (2011) proposed a batch extension of the median SOM (Golli et al., 2005; Kohonen, 2001) to the case where several dissimilarity matrices are available for describing the dataset.

Paper proposal

Herein, we propose two families of batch SOM algorithms for multi-view dissimilarity data in the framework of the centralized strategy: Batch SOM algorithms for multi-view dissimilarity data with weighted medoids as cluster representatives (MBSOM-MM\(\textbf{dd}\)) and Batch SOM algorithms for multi-view dissimilarity data with a normalized linear combination of the objects as cluster representatives (MRBSOM).

The new MBSOM-MM\(\textbf{dd}\) family extends the method of Ref. Mariño and Carvalho (2020) to manage datasets described by multiple dissimilarity matrices. For a fixed neighborhood, the new method provides the optimal solution for computing the representative (weighted medoids) associated with each neuron, for computing optimal adaptive relevance weights for the dissimilarity matrices, and for obtaining the optimal cluster partition.

Furthermore, MRBSOM, like existing batch SOM methods for relational data (Hasenfuss & Hammer, 2007), keeps the idea that each cluster representative is a normalized linear combination of the objects represented in the description space, but additionally uses different adaptive weights on the dissimilarity matrices, aiming to take into account the importance of each dissimilarity matrix in the unsupervised learning task.

In both families of models, the weights change at each algorithm iteration such that each matrix has a different influence on the training of the map. Each of the proposed families can compute relevance weights for each dissimilarity matrix, either locally, for each cluster, or globally, for the whole partition (Dantas & Carvalho, 2011).

The main contributions of our paper are the two families of Batch SOM algorithms that can manage multi-view datasets described by several dissimilarity matrices. More precisely, the paper provides:

  • The respective objective functions that, for a fixed neighborhood, should be optimized to learn the MBSOM-MM\(\textbf{dd}\) and MRBSOM model families;

  • For a fixed neighborhood radius, i) the optimal solution for computing the cluster representatives associated with each neuron in the proposed models; ii) the optimal solution for computing the relevance weights of the dissimilarity matrices on the training of the SOM; and iii) the optimal solution for the partition associated with the neurons of the proposed algorithms;

  • The time complexity of the proposed models;

  • A significant evaluation of the proposed methods compared with relevant batch SOM algorithms for multi-view dissimilarity data.

Thus, the proposed algorithms aim to improve on MBSOM-CM\(\textbf{dd}\) (Dantas & Carvalho, 2011), in which the number of objects (medoids) that represent a cluster may be insufficient to describe it. Moreover, MBSOM-CM\(\textbf{dd}\) ignores the relevance of the medoids: when several objects are selected as medoids, they are not necessarily equally important for the cluster and do not describe it in the same way. Additionally, in MBSOM-CM\(\textbf{dd}\), the cardinality of the set of medoids (the representative) is a parameter that must be provided a priori. Finally, the relevance of a specific data source can impact model performance. Nevertheless, the impact of the data sources, considering their relevance locally for each cluster and globally for each view, has not yet been studied with these approaches to cluster representation in SOM algorithms.

The paper is organized as follows: Sect. 2 describes the families of the proposed models MBSOM-MM\(\textbf{dd}\) and MRBSOM. In addition, we also provide an in-depth description and formalization of the algorithm MBSOM-CM\(\textbf{dd}\) introduced in the work of Ref. Dantas and Carvalho (2011). Moreover, we analyze the time complexity of the proposed algorithms. Section 3 presents the setup of our experiments. Section 4 provides the performance evaluation of the proposed algorithms against existing approaches in the literature, showing the results and discussing the main findings obtained. This section also provides further insights into the families of the models through an application concerning the dermatology dataset (Dua & Graff, 2017). Finally, Sect. 5 presents our final remarks.

2 Batch SOM algorithms for multi-view dissimilarity data

This section presents the batch SOM families for multi-view dissimilarity data MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM. Section 2.1 discusses the cluster representatives and the relevance weights of the dissimilarity matrices and provides the error functions of these batch SOM algorithms. Section 2.2 describes their three main steps (the computation of the cluster representatives, the relevance weights of the dissimilarity matrices, and the update of the clusters) and provides the main algorithm for each batch SOM family. Section 2.3 introduces some notations aiming to simplify the presentation of the methods according to the algorithms used and the relevance weights assigned to each dissimilarity matrix. Finally, Sect. 2.4 introduces the time complexity analysis of the proposed variants of the batch SOM families.

2.1 MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\) and MRBSOM self-organizing maps

This section provides a detailed presentation of the MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM SOM algorithms.

Let \(E = \{e_1,\ldots ,e_N\}\) be a set of objects and let \(\textbf{D}_p=[d_p(e_k,e_l)] \, (1 \le k,l \le N)\), for \(1 \le p \le P\), be P dissimilarity matrices, where \(d_p(e_k,e_l)\) is the dissimilarity between objects \(e_k\) and \(e_l\) on dissimilarity matrix \(\textbf{D}_p\).

A SOM consists of a low-dimensional (usually two-dimensional) regular grid (map), which contains C nodes (neurons). Each SOM map is associated with a partition in which each neuron, indexed by r, is associated with a cluster \(P_r\) and a representative (prototype).

Let \(\{1,\ldots ,C\}\) be the cluster index set and let f be the assignment function that maps each object to an index \(r=f(e_k) \in \{1,\ldots ,C\}\) of the cluster index set. The partition \(\mathcal {P}=\{P_1,\ldots ,P_C\}\) associated with a SOM is defined by the assignment function, which gives the index of the cluster of \(\mathcal {P}\) to which the object \(e_k\) belongs, i.e., \(P_r=\{e_k \in E: f(e_k) = r\}\).

Following (Golli et al., 2005), in MBSOM-CM\(\textbf{dd}\) (Dantas & Carvalho, 2011) it is assumed that the representative of each cluster is a set of objects (set-medoids), i.e., the prototype \(G_r\) of cluster \(P_r\) is a subset of fixed cardinality \(1 \le q \ll N\) of the set of objects E: \(G_r \in E^{(q)} = \{A \subset E: |A| = q\}\). Besides, \(\mathcal {G}=(G_1,\ldots ,G_r, \ldots , G_C)\) is the vector of cluster prototypes.

Moreover, following (Mariño & Carvalho, 2020), in MBSOM-MM\(\textbf{dd}\) it is assumed that the representative \(\textbf{v}_r = (v_{r1},\ldots ,v_{rN})\) of cluster \(P_r\) is an N-dimensional vector of weights whose components measure how much each object is weighted as a medoid of the cluster \(P_r\). Let \(\textbf{V}=(\textbf{v}_1, \ldots , \textbf{v}_C) = (v_{rj}) \, (1 \le r \le C; 1 \le j \le N)\) be the matrix of prototype weights of the objects regarding the clusters.

Regarding MRBSOM, Refs. Cottrell et al. (2018); Hammer and Hasenfuss (2007) pointed out that if the data are described by a dissimilarity matrix where each cell is the squared Euclidean distance, they can be embedded in a pseudo-Euclidean space in such a way that optimum prototypes can be expressed as linear combinations of data points. Therefore, the unknown distances \(\Vert \textbf{x}_k - \textbf{v}_r\Vert ^2\), where \(\textbf{x}_k=(\textbf{x}_{k(1)}, \ldots , \textbf{x}_{k(P)})\) is the description of the object \(e_k\) and \(\textbf{v}_r=(\textbf{v}_{r(1)},\ldots ,\textbf{v}_{r(P)})\) is the representative of cluster \(P_r\) both in the pseudo-Euclidean space, can be expressed in terms of known values of the squared Euclidean distances of the dissimilarity matrix.

Assuming that \(\Vert \textbf{x}_k - \textbf{v}_r\Vert ^2 = \sum _{p=1}^P \Vert \textbf{x}_{k(p)} - \textbf{v}_{r(p)}\Vert ^2\) and that the p-th component of the cluster representative is such that \(\textbf{v}_{r(p)} = \sum _{k=1}^N \alpha _{rk} \textbf{x}_{k(p)}\) where \(\sum _{k=1}^N \alpha _{rk} = 1\), according to Cottrell et al. (2018); Hasenfuss and Hammer (2007):

$$\begin{aligned} \Vert \textbf{x}_{k(p)} - \textbf{v}_{r(p)}\Vert ^2 = [\textbf{D}_p \varvec{\alpha }_r]_k - \frac{1}{2} \varvec{\alpha }_r^\top \textbf{D}_p \varvec{\alpha }_r \, (1 \le p \le P), \end{aligned}$$
(1)

where \([\textbf{D}_p \varvec{\alpha }_r]_k\) is the k-th component of \([\textbf{D}_p \varvec{\alpha }_r]\) and \(\varvec{\alpha }_r=(\alpha _{r1},\ldots ,\alpha _{rN}) \, (1 \le r \le C)\). Since \(\textbf{v}_{r(p)}\) is only implicitly defined in the pseudo-Euclidean space, it is the vector \(\varvec{\alpha }_r\) that is updated, and the distances between the prototypes and the objects are computed only indirectly through the coefficients \(\alpha _{rk}\). According to Ref. Hammer and Hasenfuss (2007), Eq. (1) still holds for any given dissimilarity matrix \(\textbf{D}_p\).
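As an illustration of Eq. (1), the following sketch computes the squared distances between every object and the prototype of a neuron r on view p, given the dissimilarity matrix \(\textbf{D}_p\) and the coefficient vector \(\varvec{\alpha }_r\) (the function and variable names are ours).

```python
import numpy as np

def relational_distances(D_p, alpha_r):
    """Squared distances ||x_k - v_r||^2 on view p, following Eq. (1).

    D_p     : (N, N) dissimilarity matrix of view p
    alpha_r : (N,) coefficient vector of prototype r, with alpha_r.sum() == 1
    Returns the N values [D_p alpha_r]_k - 0.5 * alpha_r^T D_p alpha_r.
    """
    D_alpha = D_p @ alpha_r                     # the vector [D_p alpha_r]
    return D_alpha - 0.5 * (alpha_r @ D_alpha)  # subtract the scalar correction term
```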

Therefore, in MRBSOM it is assumed that the representative \(\varvec{\alpha }_r = (\alpha _{r1}, \ldots , \alpha _{rN})\) of cluster \(P_r\) is an N-dimensional vector of coefficients \(\alpha _{rk}\). Let \(\mathcal {A}=(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_C)=(\alpha _{rk})_{\begin{array}{c} 1 \le r \le C\\ 1 \le k \le N \end{array}}\) be the matrix of coefficients \(\alpha _{rk}\).

Dissimilarity matrices can have different relevance to the training of the SOM. In most applications, some dissimilarity matrices may be irrelevant, while among those that are relevant, some may be more or less relevant than others.

Therefore, aiming to obtain a significant SOM from all dissimilarity matrices, the MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM SOM algorithms were designed to provide the clusters and their respective prototypes by simultaneously preserving the spatial order of the prototypes on the map, as well as to learn the relevance weight for each dissimilarity matrix by optimizing a suitable error function.

The relevance weights can be assigned to each dissimilarity matrix globally, for all the clusters, according to the \((P \times 1)\) matrix \(\textbf{W}=(w_{p}) \, (1 \le p \le P)\), with \(w_{p} \in \textrm{R}^+\). They can also be assigned to each dissimilarity matrix locally, for each cluster, according to the \((P \times C)\) matrix \(\textbf{W}=(w_{rp}) \, (1 \le p \le P; 1 \le r \le C)\), with \(w_{rp} \in \textrm{R}^+\).

The training of the MBSOM-CM\(\textbf{dd}\) algorithm provides the vector of prototypes \(\mathcal {G}\), the matrix of relevance weights \(\textbf{W}\), and the partition \(\mathcal {P}\) by iteratively minimizing the error function \(J_{MBSOM-CMdd}\). Likewise, the training of the MBSOM-MM\(\textbf{dd}\) algorithm provides the matrix \(\textbf{V}\) of prototype weights, the matrix of relevance weights \(\textbf{W}\), and the partition \(\mathcal {P}\) by iteratively minimizing the error function \(J_{MBSOM-MMdd}\). Finally, the training of the MRBSOM algorithm provides the matrix \(\mathcal {A}\) of coefficients, the matrix of relevance weights \(\textbf{W}\), and the partition \(\mathcal {P}\) by iteratively minimizing the error function \(J_{MRBSOM}\). Table 1 provides the error functions of the algorithms.

Table 1 Error functions of the SOM algorithms

For each object \(e_k\), the winning neuron, known as the best matching unit (BMU), is the neuron with the cluster representative closest to \(e_k\). The BMU is indexed by \(f(e_k)\) and is identified with the corresponding cluster representative.

The error measure of a BMU regarding the object \(e_k\) is computed by the generalized dissimilarity (Badran et al., 2005) function \(\Delta _{\textbf{W}}\), which allows comparing each object \(e_k\) to each cluster representative. Table 2 provides \(\Delta _{\textbf{W}}\) according to the SOM algorithms.

Table 2 Generalized dissimilarity functions of the SOM algorithms

In Eqs. (5), (6), and (7), \(h_{f(e_k),r}\) is the neighborhood kernel function that measures the influence of the BMU on neuron r. Several choices are possible; in Ref. Dantas and Carvalho (2011), it is defined as

$$\begin{aligned} h_{f(\textbf{e}_k),r} = \exp \left\{ -\frac{\Vert a_{f(\textbf{e}_k)} - a_r\Vert ^2}{2\sigma ^2}\right\} . \end{aligned}$$
(8)

where \(a_{f(\textbf{e}_k)}\) and \(a_r\) are the positions of the BMU and of neuron r in the grid, respectively. Moreover, \(\sigma \) is the neighborhood radius. The size of the neighborhood decreases with \(\sigma \): the smaller \(\sigma \), the fewer neurons belong to the effective neighborhood of a given BMU (Badran et al., 2005; Kohonen, 2001).
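For reference, a minimal sketch of the kernel of Eq. (8), computing the values \(h_{f(e_k),r}\) between a BMU and all neurons of the grid (array names are ours).

```python
import numpy as np

def neighborhood_kernel(grid_positions, bmu, sigma):
    """Gaussian neighborhood kernel of Eq. (8).

    grid_positions : (C, 2) array with the (x, y) position of each neuron on the map
    bmu            : index of the best matching unit
    sigma          : current neighborhood radius
    Returns the (C,) vector of kernel values between the BMU and every neuron.
    """
    sq_dist = np.sum((grid_positions - grid_positions[bmu]) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```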

Therefore, the generalized dissimilarity function \(\Delta _{\textbf{W}}\) is a weighted sum of the global match functions \(D_{\textbf{W}}\) computed between an object \(e_k\) and a cluster representative. Note that it considers all the neurons in the neighborhood of the BMU.

The function \(D_{\textbf{W}}\) computes the global matching between an object \(e_k\) and a cluster representative. Table 3 provides \(D_{\textbf{W}}\) according to the SOM algorithms when the weights are assigned globally.

Table 3 Global match functions: the weights are assigned globally

Table 4 provides \(D_{\textbf{W}}\) according to the SOM algorithms when the weights are assigned locally.

Table 4 Global match functions: the weights are assigned locally

The function \(D_p\) computes the local matching between an object \(e_k\) and a cluster representative on dissimilarity \(\textbf{D}_p \, (1 \le p \le P)\). Table 5 provides \(D_p\) according to the SOM algorithms.

Table 5 Local match functions

Note that because the global match functions are weighted sums of the local match functions, the different dissimilarity matrices must have been previously scaled by a suitable normalization to make their relative value ranges comparable. This prevents particular views from prevailing over the others in the computation of the global match functions merely because of different feature measurement units.

2.2 MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms

For a fixed radius \(\sigma \), starting from an initial solution, the trained maps of the SOM algorithms are obtained by minimizing the error functions of Table 1; this minimization is performed iteratively in three steps: representation, weighting, and assignment.

The representation step gives the optimal solution for computing the cluster representatives associated with the map neurons. The weighting step computes the relevance weights of the dissimilarity matrices to the training of the SOM. Finally, the assignment step provides the optimal solution for the clusters associated with the map neurons.

2.2.1 Representation step

The representation step gives the optimal solution for computing the cluster representatives associated with the map neurons. During the representation step, the matrix of relevance weights \(\textbf{W}\) and the partition \(\mathcal {P}\) are kept fixed.

MBSOM-CM\(\textbf{dd}\) algorithm

The error function \(J_{MBSOM-CMdd}\) is minimized regarding the vector of prototypes \(\mathcal {G}=(G_1,\ldots ,G_C)\).

The prototype \(G_r\) of cluster \(P_r\), which minimizes the error function \(J_{MBSOM-CMdd}\), either minimizes \(\sum _{k=1}^N h_{f(e_k),r} \sum _{p=1}^P w_p D_p(e_k,G_r)\) (if the weights are assigned globally) or minimizes \(\sum _{k=1}^N h_{f(e_k),r} \sum _{p=1}^P w_{rp} D_p(e_k,G_r)\) (if the weights are assigned locally). The prototype \(G_r\) is computed according to the following brute-force Algorithm 1 (Dantas & Carvalho, 2011):


Algorithm 1

MBSOM-MM\(\textbf{dd}\) algorithm

The error function \(J_{MBSOM-MMdd}\) is minimized regarding the matrix \(\textbf{V}=(v_{rj}) \, (1 \le r \le C; 1 \le j \le N)\) of prototype weights of the objects. To exclude the trivial null solution for \(\textbf{V}\), we consider a sum constraint: the minimization is carried out subject to \(\sum _{j=1}^N v_{rj} = 1\) and \(v_{rj} \ge 0\), whether the weights are assigned globally or locally.

Table 6 provides the Lagrangian functions either if the weights are assigned globally or locally, where \(\beta _r\) are the Lagrange multipliers.

Table 6 Lagrangian functions

Then, taking the partial derivatives of \(\mathcal {L}\) w.r.t \(v_{rj}\) and \(\beta _r\), and by setting the partial derivatives to zero, we obtain the optimal solutions to \(v_{rj}\), which are shown in Table 7.

Table 7 Prototype weights of the objects

Remark

Equation (20) allows us to conclude that, at the end of the training of the MBSOM-MM\(\textbf{dd}\) algorithm, when \(h_{f(e_k),r} \sim 0\) for \(f(e_k) \ne r\), the lower the \(\sum_{e_k \in P_r} \sum_{p=1}^P w_{rp} d_p(e_k,e_j)\), the higher the prototype weight \(v_{rj}\). In addition, Eq. (21) shows that the lower the \(\sum_{e_k \in P_r} \sum_{p=1}^P w_p d_p(e_k,e_j)\), the higher the prototype weight \(v_{rj}\).
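The exact closed forms are those of Table 7; the sketch below only illustrates their general shape, under the assumption that the error term for cluster r is \(\sum_j v_{rj}^n A_{rj}\), with \(A_{rj}\) the weighted dissimilarity sum appearing in the remark above, which yields the usual inverse-power normalization.

```python
import numpy as np

def prototype_weights(A_r, n):
    """Sketch of the prototype-weight update for one cluster r (cf. Table 7).

    A_r : (N,) vector, A_r[j] = sum_k h_{f(e_k),r} * sum_p w_p * d_p(e_k, e_j)
          (use w_rp instead of w_p for the locally weighted variant)
    n   : smoothness parameter (n > 1)
    Assumes an error term of the form sum_j v_rj**n * A_r[j] under
    sum_j v_rj = 1, whose Lagrangian solution is the inverse-power form below.
    """
    inv = A_r ** (-1.0 / (n - 1.0))
    return inv / inv.sum()
```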

MRBSOM algorithm

The error function \(J_{MRBSOM}\) is minimized regarding the matrix \(\mathcal {A}=(\alpha _{rk}) \, (1 \le r \le C; 1 \le k \le N)\) of coefficients. The error function \(J_{MRBSOM}\) can be written as

$$\begin{aligned} J_{MRBSOM} = \sum _{k=1}^N \sum _{r=1}^C h_{f(e_k),r} \sum _{p=1}^P (\textbf{x}_{k(p)} - \textbf{v}_{r(p)})^\top (\textbf{x}_{k(p)} - \textbf{v}_{r(p)}). \end{aligned}$$
(22)

Taking the partial derivative of \(J_{MRBSOM}\) w.r.t \(\textbf{v}_{r(p)}\) and by setting the partial derivative to zero, we obtain:

$$\begin{aligned} \textbf{v}_{r(p)} = \sum _{k=1}^N \frac{h_{f(e_k),r}}{\sum _{l=1}^N h_{f(e_l),r}} \textbf{x}_{k(p)} = \sum _{k=1}^N \alpha _{rk} \textbf{x}_{k(p)}. \end{aligned}$$
(23)

where \(\alpha _{rk}=\frac{h_{f(e_k),r}}{\sum _{l=1}^N h_{f(e_l),r}}\), considering that \(\sum _{k=1}^N \alpha _{rk} = 1\) and \(\alpha _{rk} \ge 0 \, (1 \le r \le C; 1 \le k \le N)\). Since \(\textbf{v}_{r(p)}\) is in the pseudo-Euclidean space, it is the matrix \(\mathcal {A}=(\alpha _{rk}) \, (1 \le r \le C; 1 \le k \le N)\) of coefficients that is updated as follows:

$$\begin{aligned} \alpha _{rk} = \frac{h_{f(e_k),r}}{\sum _{l=1}^N h_{f(e_l),r}} \, (1 \le r \le C; 1 \le k \le N). \end{aligned}$$
(24)
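A minimal sketch of this update, assuming the kernel values between all pairs of neurons and the current assignment f are available (array names are ours).

```python
import numpy as np

def update_alpha(H, assignment):
    """Update the coefficient matrix A = (alpha_rk) according to Eq. (24).

    H          : (C, C) array with H[s, r] = h_{s,r}, the kernel value between neurons s and r
    assignment : (N,) integer array with assignment[k] = f(e_k), the BMU index of object e_k
    Returns the (C, N) matrix with alpha_rk = h_{f(e_k),r} / sum_l h_{f(e_l),r}.
    """
    K = H[assignment]              # (N, C): K[k, r] = h_{f(e_k), r}
    return (K / K.sum(axis=0)).T   # normalize each column over the objects, then transpose
```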

2.2.2 Weighting step

During the weighting step, the cluster representatives and the partition \(\mathcal {P}\) are kept fixed. The error functions of Table 1 are minimized regarding the matrix of weights \(\textbf{W}\).

A trivial solution for this minimization problem is reached when \(\textbf{W}\) is null. To exclude the trivial solution, constraints on the elements of \(\textbf{W}\) are required. Two main types of constraints have been proposed: a constraint on the product of the weights (Diday & Govaert, 1977) and a constraint on the sum of the weights (Huang et al., 2005). Herein, we consider only the product constraint since the sum constraint requires fixing further hyper-parameters in advance.

Therefore, the error functions of Table 1 are minimized either regarding the matrix of weights \(\textbf{W}=(w_{p}) \, (1 \le p \le P)\), subject to \(\prod _{p=1}^P w_p =1, \, w_p > 0\), if the weights are assigned globally, or regarding the matrix of weights \(\textbf{W}=(w_{rp}) \, (1 \le r \le C; 1 \le p \le P)\), subject to \(\prod _{p=1}^P w_{rp} =1, \, w_{rp} > 0\), if the weights are assigned locally.

Table 8 provides the Lagrangian functions according to the SOM algorithms for weights assigned globally, where \(\beta \) is the Lagrange multiplier.

Table 8 Computation of the global weights: Lagrangian functions

Then, taking the partial derivatives of \(\mathcal {L}\) w.r.t \(w_p\) and \(\beta \), and by setting the partial derivatives to zero, we obtain the optimal solutions to \(w_p\) according to the SOM algorithms, which are shown in Table 9.

Table 9 Optimal global weights

Remark

Equation (28) allows us to conclude that, at the end of the training of the MBSOM-CM\(\textbf{dd}\) algorithm, when \(h_{f(e_k),r} \sim 0\) for \(f(e_k) \ne r\), the lower the \(\sum _{r=1}^C \sum _{e_k \in P_r} \sum _{e \in G_r} d_p(e_k,e)\), the higher the relevance weight \(w_p\) of the dissimilarity matrix \(\textbf{D}_p\). Similar remarks can be made for the MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms.
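The exact expressions are those of Table 9; as an illustration of the product constraint, the sketch below computes the global weights from the per-view error terms using the standard closed form for this kind of constrained problem (the geometric mean of the per-view errors divided by each view's error); the function name is ours.

```python
import numpy as np

def global_relevance_weights(J):
    """Sketch of the global relevance weights under prod_p w_p = 1 (cf. Table 9).

    J : (P,) vector of per-view error terms, e.g., for MBSOM-CMdd-G,
        J[p] = sum_r sum_k h_{f(e_k),r} * D_p(e_k, G_r)
    Assumes the standard solution w_p = (prod_q J_q)^(1/P) / J_p.
    """
    geometric_mean = np.exp(np.mean(np.log(J)))  # numerically safer than np.prod(J)**(1/P)
    return geometric_mean / J
```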

Table 10 provides the Lagrangian functions according to the SOM algorithms for weights assigned locally, where \(\beta _r\) are the Lagrange multipliers.

Table 10 Computation of the local weights: Lagrangian functions

Then, taking the partial derivatives of \(\mathcal {L}\) w.r.t \(w_{rp}\) and \(\beta _r\), and by setting the partial derivatives to zero, we obtain the optimal solutions to \(w_{rp}\), which are shown in Table 11.

Table 11 Optimal local weights

Remark

Equation (34) allows us to conclude that, at the end of the training of the MBSOM-CM\(\textbf{dd}\) algorithm, when \(h_{f(e_k),r} \sim 0\) for \(f(e_k) \ne r\), the lower the \(\sum _{e_k \in P_r} \sum _{e \in G_r} d_p(e_k,e)\), the higher the relevance weight \(w_{rp}\) of the dissimilarity matrix \(\textbf{D}_p\) on the cluster \(P_r\). Similar remarks can be made for the MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms.

2.2.3 Assignment step

During the assignment step, the cluster representatives and the matrix of relevance weights \(\textbf{W}\) are kept fixed. The aim is to minimize the error functions of Table 1 regarding the partition \(\mathcal {P}\). Regarding the MBSOM-CM\(\textbf{dd}\) algorithm, according to Eq. (2), the error function \(J_{MBSOM-CMdd}\) is minimized if \(\Delta _{\textbf{W}}(e_k,G_{f(e_k)})\) is minimized for each \(e_k \in E\). For a fixed vector of prototypes \(\mathcal {G}\) and a fixed matrix of relevance weights \(\textbf{W}\), \(\Delta _{\textbf{W}}(e_k,G_{f(e_k)})\) is minimized if \(f(e_k) = arg \min _{1 \le s \le C} \Delta _{\textbf{W}}(e_k,G_s)\).
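As an illustration, the sketch below performs this step for the set-medoids, globally weighted case; it assumes the local match function \(D_p(e_k,G_r)=\sum_{e \in G_r} d_p(e_k,e)\) suggested by the remarks on Eqs. (28) and (34), and the array names are ours.

```python
import numpy as np

def assign_objects(D, w, G, H):
    """Assignment step sketch for the set-medoids, globally weighted variant.

    D : (P, N, N) array stacking the P dissimilarity matrices
    w : (P,) vector of global relevance weights
    G : list of C index arrays; G[r] holds the medoid indices of cluster r
    H : (C, C) array of neighborhood kernel values h_{s,r}
    Returns f, the (N,) array of BMU indices minimizing
    Delta_W(e_k, G_s) = sum_r h_{s,r} * sum_p w_p * sum_{e in G_r} d_p(e_k, e).
    """
    P, N, _ = D.shape
    C = len(G)
    D_W = np.zeros((N, C))
    for r in range(C):
        # D_W[k, r] = sum_p w_p * sum_{j in G_r} d_p(e_k, e_j)
        D_W[:, r] = np.einsum('p,pkj->k', w, D[:, :, G[r]])
    Delta = D_W @ H.T              # Delta[k, s] = sum_r h_{s,r} * D_W[k, r]
    return np.argmin(Delta, axis=1)
```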

Following similar reasoning for the MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms, Table 12 provides the update rule for the clusters \(P_r \, (1 \le r \le C)\) according to the SOM algorithms and the relevance weights of the dissimilarity matrices.

Table 12 Update rules for the clusters

Remark

Note that, according to Eqs. 37 and 42, the objects are assigned to clusters by computing the dissimilarity between them and the cluster representatives. Dissimilarity matrices with a large (small) weight contribute strongly (weakly) to the computation of the dissimilarity between objects and cluster representatives. This is why a dissimilarity matrix with a large weight is more relevant to the clustering task than a dissimilarity matrix with a small weight.

These three steps are repeated until the fixed number of iterations \(N_{iter}\) (epochs) is reached. Algorithm 2 summarizes these steps.

Remark

  1. The performance of these SOM algorithms at the end of the training, and the associated partition, depend on the choice of their parameters \(N_{iter}\), \(\sigma _0\), \(\sigma _f\), q (MBSOM-CM\(\textbf{dd}\) family), and n (MBSOM-MM\(\textbf{dd}\) family);

  2. For a fixed radius, at each step of the algorithms (representation, weighting, and assignment), the objective functions are locally minimized and thus decrease. Since the radius is not a variable of the objective functions, its update cannot guarantee that the objective functions decrease with respect to the last step before the radius update; therefore, the convergence of the algorithms cannot be ensured.

  3. Furthermore, the final solution depends on the choice of the C initial cluster representatives (e.g., the C set-medoids, regarding the MBSOM-CM\(\textbf{dd}\) algorithm).

Algorithm 2

MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM algorithms

2.3 The methods

For a simpler presentation of the methods according to the algorithms used and the relevance weights assigned to each dissimilarity matrix, we adopted the following notations.

  • MBSOM-CM\(\textbf{dd}\)-G (Dantas & Carvalho, 2011), if each cluster representative is a set of objects (set-medoids) presenting a matching function described by Eq. (5) and the weights are assigned globally by Eq. (9).

  • MBSOM-CM\(\textbf{dd}\)-L (Dantas & Carvalho, 2011), if each cluster representative is a set of objects (set-medoids) presenting a matching function described by Eq. (5) and the weights are assigned locally by Eq. (12).

  • SBSOM-CM\(\textbf{dd}\), for the single-view SOM of Ref. Golli et al. (2005), in which each cluster representative is a set of objects (set-medoids).

  • MBSOM-MM\(\textbf{dd}\)-G, if each cluster representative is the entire set of objects weighted on this cluster (weighted medoids) based on Eq. (6) and the weights are assigned globally by Eq. (10).

  • MBSOM-MM\(\textbf{dd}\)-L, if each cluster representative is the entire set of objects weighted on this cluster (weighted medoids) based on Eq. (6) and the weights are assigned locally by Eq. (13).

  • SBSOM-MM\(\textbf{dd}\), for the single-view SOM of Ref. Mariño and Carvalho (2020), in which each cluster representative is the entire set of objects weighted on this cluster (weighted medoids).

  • MRBSOM-G, if each cluster representative is a normalized linear combination of the objects represented in the description space based on Eq. (7) and the weights are assigned globally by Eq. (11).

  • MRBSOM-L, if each cluster representative is a normalized linear combination of the objects represented in the description space based on Eq. (7) and the weights are assigned locally by Eq. (14).

  • SRBSOM, for the single-view batch SOM of Ref. Hasenfuss and Hammer (2007), in which each cluster representative is a normalized linear combination of the objects represented in the description space.

Note that we also consider related single-view batch SOM algorithms (SBSOM-CM\(\textbf{dd}\), SBSOM-MM\(\textbf{dd}\), and SRBSOM) in order to compare them with the proposed multiple-view approaches.

2.4 Complexity analysis

The complexity of the algorithms mainly depends on the matching function \(\Delta _{\textbf{W}}\) of Table 2, which allows comparing each object with the representatives of the clusters. Moreover, the final computational complexity considers the following three main steps: representation, weighting, and assignment.

  • Regarding the methods MBSOM-CM\(\textbf{dd}\)-G and MBSOM-CM\(\textbf{dd}\)-L, the time complexity of \(\Delta \) is \(O (P\times N \times C)\). The representation, weighting (local or global), and assignment steps are \(O (P\times N^2 \times C)\), \(O (P\times N \times C)\), and \(O (P\times N \times C^2)\), respectively. The final time complexity is \(O (N_{iter}\times P\times N^2\times C)\).

  • In methods MBSOM-MM\(\textbf{dd}\)-G and MBSOM-MM\(\textbf{dd}\)-L, the time complexity of \(\Delta \) is \(O (P\times N^2 \times C)\), which is the same in the representation and weighting (local or global) steps, whereas it is \(O (P\times N^2 \times C^2)\) for the assignment step. The final time complexity is \(O (N_{iter}\times P\times N^2\times C^2)\).

  • Finally, in methods MRBSOM-G and MRBSOM-L, the time complexity of \(\Delta \) is \(O (P\times N^3 \times C)\). In the representation, weighting (local or global), and assignment steps, it is \(O (N \times C)\), \(O (P\times N^2 \times C)\), and \(O (P \times N^3 \times C)\), respectively. The final time complexity is \(O (N_{iter}\times P \times N^3 \times C)\).

3 Experimental setting

The successful training of the batch SOM algorithms depends on the choice of their parameters (Badran et al., 2005; Kohonen, 2001, 2013). This section describes the experimental setting used to evaluate the proposed methods compared with other state-of-the-art batch SOM algorithms for multi-view and single-view relational data. The algorithms were implemented in the language “C” and executed on the same machine (OS: Windows 7 64-bits, Memory: 16 GB, Processor: Intel Core i7-X990 CPU @ 3.47 GHz).

A total of 168 distinct experiments were executed for the methods SBSOM-CM\(\textbf{dd}\), MBSOM-CM\(\textbf{dd}\)-L, MBSOM-CM\(\textbf{dd}\)-G, SBSOM-MM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\)-L, and MBSOM-MM\(\textbf{dd}\)-G. The methods SRBSOM, MRBSOM-L, and MRBSOM-G performed 56 experiments. The experiments were run with different parameter sets, map arrays, shapes, and initial kernel values (\(h_0\)). The search for the best configuration of the proposed methods is based on the trade-off between the internal validity indices Topographic Error (TE), which measures the quality of the SOM map (Kiviluoto, 1996), and the Silhouette Coefficient (SIL) (Rousseeuw & Kaufman, 2005), which measures the quality of the cluster partition. A configuration that provides a solution with a not-so-high TE and a not-so-low SIL is preferred to a configuration that provides a solution with a higher SIL but a high TE. The goal is to obtain a trained map and a cluster partition that are simultaneously of high quality as measured by these indices.

For instance, let us consider the silhouette and topographic error scores of two experiments of one of the assessed methods: experiment 1 (\(SIL = 0.70\) and \(TE = 0.45\)) and experiment 2 (\(SIL = 0.68\) and \(TE = 0.20\)). The setup of experiment 2 is the most suitable according to our methodology. The overall results are reported using the configuration that showed the best behavior in most scenarios according to the internal indices. This means that if the best results for all methods and scenarios are achieved with a square shape, the methods are compared with each other using this configuration with their best setup.

A remarkable feature of multi-view learning is that its performance on an original single-view dataset could still be improved by using manually generated multi-views (Zhao et al., 2017; Sun et al., 2019). Accordingly, datasets that were originally single-view were split into multiple datasets described by disjoint subsets of the original set of features. Fourteen datasets were considered in this study (see Sect. 1 of the Supplementary Material). Table 13 summarizes these datasets, in which P is the number of views, N is the number of objects, M is the number of a priori classes, and ARRAY is the dimension of the grid array. The maps are arrays of square (Sq.) or rectangle (Rect.) shapes. Table 13 also provides C, the number of neurons (clusters) in the maps for each dataset, which can be deduced from ARRAY. The number of neurons is roughly \(\lfloor \sqrt{N} \rfloor \) according to prior research (Vesanto et al., 1999).

Table 13 Summary of the multi-view datasets

For each dataset, the Euclidean distance is used to compute a dissimilarity matrix that simultaneously considers all the variables in each view. Regarding the IRIS and WINE datasets, for each variable describing the objects, a dissimilarity matrix is computed using the Euclidean distance.

Then, the matrices were normalized according to their overall dispersion so that their dynamic ranges are comparable: each dissimilarity \(d(e_k,e_l) \, (1 \le k,l \le N)\) in a given dissimilarity matrix \(\textbf{D}\) is normalized as \(\frac{d(e_k,e_l)}{T}\), where \(T=\sum _{k=1}^N d(e_k,g)\) is the overall dispersion and \(g = e_l \in E = \{e_1,\ldots ,e_N\}\) is the overall representative, which is computed according to \(l = arg \min _{1 \le h \le N} \sum _{k=1}^N d(e_k,e_h)\). The considered batch SOM algorithms operate on these normalized dissimilarity matrices.
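A minimal sketch of this normalization (the function name is ours):

```python
import numpy as np

def normalize_dissimilarity(D):
    """Normalize a dissimilarity matrix by its overall dispersion.

    The overall representative g is the object minimizing the sum of the
    dissimilarities of all objects to it; T = sum_k d(e_k, g) is the overall
    dispersion, and every entry of D is divided by T.
    """
    g = np.argmin(D.sum(axis=0))   # index of the overall representative
    T = D[:, g].sum()              # overall dispersion
    return D / T
```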

Regarding the single-view batch SOM algorithms of Refs. Golli et al. (2005); Hasenfuss and Hammer (2007); Mariño and Carvalho (2020), the single dissimilarity matrix of each dataset is obtained using a concatenation strategy, i.e., as the average of the dissimilarity matrices describing this dataset.

In addition, we consider the Gaussian neighborhood kernel function (Eq. 43) that ranges in [0, 1]:

$$\begin{aligned} h_{s,r} = \exp \left\{ -\frac{\Vert a_{s} - a_r\Vert ^2}{2\sigma _{(t)}^2}\right\} . \end{aligned}$$
(43)

where \(\Vert a_{s} - a_r\Vert ^2\) is the squared Euclidean distance in the topological space between neurons \(a_{s}\) and \(a_{r}\) and \(\sigma _{(t)}\) is the kernel width (radius) at the iteration t.

Given the fixed initial value \(\sigma _{0}=\sigma _{(0)}\) and final value \(\sigma _{f}=\sigma _{(N_{iter})}\) of the kernel width, the radius \(\sigma _{(t)}\) is updated at each iteration t according to Equation (44). Each method is executed using \(N_{iter}=50\) iterations (cf. Ref. Kohonen et al. (2014)).

$$\begin{aligned} \sigma _{(t)} = \sigma _{0} \left( \frac{\sigma _f}{\sigma _{0}}\right) ^{\frac{t}{N_{iter}}}. \end{aligned}$$
(44)

Regarding the initial and final values of the kernel width, some heuristic methods have been proposed in the literature, mainly related to the classical SOM assignment based only on the distance between objects and the BMU (Vesanto et al., 1999). In our case, since the assignment is based on the generalized dissimilarity functions (cf. Table 2), which leads to maps with a high topographic error, we propose to fix the values of \(\sigma _0\) and \(\sigma _f\) by using the heuristic method of Ref. Carvalho et al. (2022). The initial and final values of the radius \(\sigma \) are computed according to expressions (45) and (46), respectively. Thus, we initialized the map radius \(\sigma _{0}\) so that two neurons separated by the map diameter have a kernel value (\(h_0\)) equal to 0.1. In addition, we computed \(\sigma _f\) such that two neighboring neurons have a kernel value (\(h_f\)) equal to 0.01. The map diameter in the topological space is the largest topological distance between two neurons of the map; its square is computed as \((x_{max})^{2}+(y_{max})^{2}\), where \(x_{max}\) and \(y_{max}\) correspond to the grid sizes along the horizontal X-axis and vertical Y-axis, respectively.

$$\begin{aligned} \sigma _{0}= & {} \sqrt{\frac{-[(x_{max})^{2}+(y_{max})^{2}]}{2\ln ( h_{0})}}. \end{aligned}$$
(45)
$$\begin{aligned} \sigma _{f}= & {} \sqrt{\frac{-1}{2 \ln ( h_{f})}}. \end{aligned}$$
(46)
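The sketch below puts Eqs. (44)-(46) together, returning the initial radius, the final radius, and the full decay schedule for a given grid (function and parameter names are ours).

```python
import numpy as np

def radius_schedule(x_max, y_max, n_iter, h0=0.1, hf=0.01):
    """Kernel-width schedule following Eqs. (44)-(46).

    x_max, y_max : grid sizes along the horizontal X-axis and vertical Y-axis
    h0           : target kernel value for two neurons separated by the map diameter
    hf           : target kernel value for two neighboring neurons at the last iteration
    Returns (sigma_0, sigma_f, sigmas), where sigmas[t] is the radius at iteration t.
    """
    sigma_0 = np.sqrt(-(x_max ** 2 + y_max ** 2) / (2.0 * np.log(h0)))  # Eq. (45)
    sigma_f = np.sqrt(-1.0 / (2.0 * np.log(hf)))                        # Eq. (46)
    t = np.arange(n_iter + 1)
    sigmas = sigma_0 * (sigma_f / sigma_0) ** (t / n_iter)              # Eq. (44)
    return sigma_0, sigma_f, sigmas
```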

Specifically for the models SBSOM-CM\(\textbf{dd}\), MBSOM-CM\(\textbf{dd}\)-L, and MBSOM-CM\(\textbf{dd}\)-G, after some trials based on the trade-off between TE and SIL, the cardinal q of the set-medoids was fixed to 5. In turn, after some trials, the parameter n, which controls the level of smoothness of the distribution of prototype weights among all the objects in each of the clusters, was fixed to 1.1 for the models SBSOM-MM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\)-L, and MBSOM-MM\(\textbf{dd}\)-G based also on the trade-off between TE and SIL. The experiments were repeated and randomly initialized 10 times. For each loop, 50 iterations were performed.

Finally, to assess the degree of agreement between an a priori partition and a partition provided by the SOM algorithms, we used two external validity indices computed on datasets with labeled instances: the F-measure (Manning et al., 2008) and the Normalized Mutual Information (NMI) (Cover & Thomas, 2006). See Sect. 1 of the supplementary material for further information.

4 Experimental analysis and discussion

This section assesses the performance of the proposed multi-view SOM algorithms, namely MBSOM-MM\(\textbf{dd}\)-L, MBSOM-MM\(\textbf{dd}\)-G, MRBSOM-L, and MRBSOM-G, compared with existing multi-view SOM algorithms from the literature (Dantas & Carvalho, 2011): MBSOM-CM\(\textbf{dd}\)-L and MBSOM-CM\(\textbf{dd}\)-G. In addition, we compared them with the most related single-view methods SBSOM-CM\(\textbf{dd}\) (Golli et al., 2005), SBSOM-MM\(\textbf{dd}\) (Mariño & Carvalho, 2020), and SRBSOM (Hasenfuss & Hammer, 2007).

The algorithms are compared according to F-measure, NMI, TE, and SIL. Table 14 presents the average rank of each compared method on each metric. The average rank provides the criterion for evaluating the models: algorithms that achieve higher mean scores for F-measure, NMI, or Silhouette perform best, whereas algorithms with lower mean scores for Topographic Error outperform the others. Regarding average ranks, lower values imply better results. In this table, the best average rank provided by each index is in bold and italic.

According to Table 14, MRBSOM-G had the best performance regarding the average rank of F-measure, NMI, and TE. MBSOM-MM\(\textbf{dd}\)-L surpassed the other algorithms regarding the average rank of SIL. The multi-view variants improved on the single-view variants, and the proposed algorithms outperformed the multi-view benchmarks (set-medoids SOM) in most cases. The global-weighted algorithms outperformed the local-weighted algorithms regarding NMI and TE. Local-weighted algorithms performed best concerning SIL.

Table 14 Average Rank by index and algorithm

Based on the average ranks, we used Friedman's test (Friedman, 1940; Demšar, 2006) to test the null hypothesis that all models perform the same regarding the F-measure, NMI, TE, and SIL indices. The test reached p-values of \(9.899 \times 10^{-17}\) and \(3.422 \times 10^{-17}\) for TE and Silhouette, respectively. These p-values allow us to reject the null hypothesis that all the algorithms perform the same concerning these indices. To detect significant pairwise differences among all models, the Nemenyi test (Nemenyi, 1963; Demšar, 2006) was applied to the average ranks of TE and SIL, available in Table 14. For a significance level \(\alpha \), the test determines the critical difference (CD). If the difference between the average ranks of two algorithms is greater than the CD, the null hypothesis that the algorithms have the same performance is rejected.

Figures 1 and 2 show the CD plots visualizing the differences concerning the TE and SIL indices. Models that are not significantly different (with \(\alpha = 0.05\)) are connected (Demšar, 2006). The first positions in the figures indicate lower values of the average rank, hence the best algorithms. The critical difference (CD) determined by the Nemenyi post-test was 3.211.
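For reference, the CD of the Nemenyi test is \(CD = q_{\alpha}\sqrt{k(k+1)/(6N)}\), where k is the number of compared algorithms and N the number of datasets (Demšar, 2006); with the 9 algorithms and 14 datasets considered here, and \(q_{0.05}=3.102\), the short computation below reproduces the reported value.

```python
import numpy as np

# Nemenyi critical difference: CD = q_alpha * sqrt(k * (k + 1) / (6 * N))
k = 9            # number of compared algorithms
n_data = 14      # number of datasets
q_alpha = 3.102  # studentized range statistic for k = 9 and alpha = 0.05 (Demšar, 2006)
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_data))
print(round(cd, 3))  # 3.211
```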

Concerning TE, the algorithms MRBSOM-G, MRBSOM-L, and SRBSOM significantly improved on the algorithms SBSOM-CM\(\textbf{dd}\) and MBSOM-CM\(\textbf{dd}\)-L. Local-weighted algorithms performed better concerning SIL: the algorithms MBSOM-MM\(\textbf{dd}\)-L and MBSOM-CM\(\textbf{dd}\)-L presented significant improvements over SBSOM-MM\(\textbf{dd}\), which had the worst performance in this case. Finally, the proposed multi-view SOM algorithms presented better performance than the existing multi-view SOM algorithms; however, these improvements are not always statistically significant.

Fig. 1

Comparison of all models against each other through the Nemenyi test according to TE with \(\alpha = 0.05\)

Fig. 2

Comparison of all models against each other through the Nemenyi test according to SIL with \(\alpha = 0.05\)

Dermatology dataset

Next, we provide more detailed results for the Dermatology dataset (Dua & Graff, 2017) to demonstrate the usefulness of the proposed algorithms. Figure 3 displays the distribution of the objects over the 9 nodes of the \(3 \times 3\) grid provided by the proposed MBSOM-MM\(\textbf{dd}\)-G algorithm on the Dermatology dataset. Each node represents a cluster (a neuron). The size of each circle is proportional to the number of objects in the cluster, and its total area is divided among the areas corresponding to the a priori classes.

Fig. 3

SOM maps provided by the proposed method MBSOM-MM\(\textbf{dd}\)-G for the Dermatology dataset

In this figure, the map tends to mix the data of patients diagnosed with seboreic dermatitis and pityriasis rosea. This might be linked to both diseases presenting very similar manifestations. Similar behavior was observed in all other maps. Furthermore, this map, and overall those produced by the proposed methods, tend to better cluster data from patients previously diagnosed with pityriasis rubra pilaris, which are less well represented in the dataset. Finally, compared to the set-medoids SOM algorithms on this dataset, the clusters generated by the proposed MBSOM-MM\(\textbf{dd}\)-G algorithm (as well as by the other proposed methods) are more homogeneous.

If the a priori number of classes corresponds to the true and unknown number of clusters, the proposed SOM algorithms, even when they run with a number of neurons (clusters) larger than the number of clusters (classes), tend to be able to find the true number of clusters. Accordingly, the proposed algorithms tend to leave some clusters empty, and thus produce a better mapping, from a clustering point of view. The empty nodes help separate the main groups of diseases. For more details, Sect. 4 of the Supplementary Material describes this dataset and presents the learned maps provided by the SOM algorithms.

Table 15 shows the results for the internal and external indices computed for the solution with the best objective function value over the 10 executions for the Dermatology dataset. In this table, regarding each metric, the best score is highlighted in bold; the result is followed by the rank, which is shown in italic; and the best rank among all methods is shown in bold and italic.

Table 15 Results for the Dermatology dataset

Concerning F-measure and NMI, the multi-medoids SOM algorithms had the best performance and the relational SOM algorithms had the second best performance. Moreover, the set-medoids SOM algorithms had the worst performance.

Overall, the multi-view algorithms outperformed the single-view algorithms within the same family, except for MBSOM-MM\(\textbf{dd}\)-G, which is outperformed by SBSOM-MM\(\textbf{dd}\) in terms of F-measure, and MRBSOM-L, which is outperformed by SRBSOM in terms of NMI. Regarding TE and SIL, the MRBSOM-L and MRBSOM-G models achieved better results than the SRBSOM model. MBSOM-MM\(\textbf{dd}\)-L and MBSOM-MM\(\textbf{dd}\)-G outperformed SBSOM-MM\(\textbf{dd}\) concerning the SIL index, and MBSOM-CM\(\textbf{dd}\)-L and MBSOM-CM\(\textbf{dd}\)-G outperformed SBSOM-CM\(\textbf{dd}\) regarding the same index. Finally, the algorithms with the worst overall performance were SBSOM-CM\(\textbf{dd}\), MBSOM-CM\(\textbf{dd}\)-L, and MBSOM-CM\(\textbf{dd}\)-G. The proposed methods with the best performance were MBSOM-MM\(\textbf{dd}\)-L and MRBSOM-G.

In conclusion, regarding the overall mean ranking, the multi-view algorithms outperformed the single-view algorithms. The multi-medoids and relational SOM surpassed the set-medoids SOM.

For illustrative purposes, Table 16 shows the vectors of relevance weights locally and globally estimated for the solution with the best objective function value over the 10 executions for the Dermatology dataset. The view with the highest relevance to each of the globally weighted approaches and the most relevant view for the clusters concerning each one of the locally weighted approaches are highlighted in boldface in this table. Among all methods, View 2 had the greatest influence on the outputs. However, concerning the local-weighted algorithms, View 1 had the greatest influence on defining clusters 4 and 6 produced by the MBSOM-CM\(\textbf{dd}\)-L, as well as on defining cluster 2 produced by the MBSOM-MM\(\textbf{dd}\)-L and clusters 1 and 2 produced by the MRBSOM-L algorithm.

Table 16 Dermatology dataset: vectors of relevance weights

5 Final remarks

The paper proposes two new families of batch SOM algorithms for multi-view dissimilarity data: multi-medoids SOM and relational SOM. Both families of algorithms are designed to provide a crisp partition, to learn the relevance weight of each dissimilarity matrix, and to provide a representative for each cluster, based on a suitable objective function. The goal of the proposed algorithms is to preserve the topological properties of the data on the map. The relevance of each dissimilarity matrix can be computed locally, for each cluster, in the MBSOM-MM\(\textbf{dd}\)-L and MRBSOM-L methods, or globally, for the whole partition, in the MBSOM-MM\(\textbf{dd}\)-G and MRBSOM-G methods. MBSOM-MM\(\textbf{dd}\)-L and MBSOM-MM\(\textbf{dd}\)-G consider the cluster representatives as vectors of weights whose components measure how much each object is weighted as a medoid in a given cluster. In MRBSOM-L and MRBSOM-G, each cluster representative is a normalized linear combination of the objects represented in the description space.

Experiments with 14 datasets were performed by means of similar parametrizations for an efficient evaluation. The comparison was established using the multi-view SOM algorithms MBSOM-CM\(\textbf{dd}\)-G and MBSOM-CM\(\textbf{dd}\)-L (Dantas & Carvalho, 2011), as well as the single-view SOM algorithms SBSOM-CM\(\textbf{dd}\) (Golli et al., 2005), SRBSOM (Hasenfuss & Hammer, 2007), and SBSOM-MM\(\textbf{dd}\) (Mariño & Carvalho, 2020).

We tested the null hypothesis that all models perform the same; the non-parametric Friedman's test, together with the Nemenyi post-test, indicated that the observed differences in the performance measures are not merely random.

Our findings indicate that the proposed models presented better performance than the benchmarks, although the improvements were not always statistically significant. In most cases, MRBSOM-G, MRBSOM-L, SRBSOM (Hasenfuss & Hammer, 2007), MBSOM-MM\(\textbf{dd}\)-G, and MBSOM-MM\(\textbf{dd}\)-L outperformed the other methods. Specifically, the methods MRBSOM-G, MRBSOM-L, and SRBSOM (Hasenfuss & Hammer, 2007) significantly improved on the methods SBSOM-CM\(\textbf{dd}\) (Golli et al., 2005) and MBSOM-CM\(\textbf{dd}\)-L concerning TE. In addition, we provide more detailed results for the Dermatology dataset, for which the proposed methods were the best-ranked.

In conclusion, the multi-view models outperformed the single-view models. Moreover, the multi-medoids and relational SOM algorithms performed better than the set-medoids SOM algorithm. In further research, we aim to extend this study considering prior and new on-line versions of the proposed SOM algorithms.