1 Introduction

Multi-view data research has become increasingly important since large amounts of information are constantly being generated from different sources. These data are heterogeneous: the variables are naturally partitioned into groups, yet potential links remain between them. Each group of variables is referred to as a specific perspective (view), and the multiple views on a given issue can take different arrangements (Xu et al., 2015). Often, in multi-view data, the objects are represented by several matrices or views (e.g., sensor signal data can be decomposed into time and frequency data, image data can be described by texture and color data, and multimedia segments can be represented by video and audio signal data (Yang & Wang, 2018)). In such data, each view has its specific features for a certain knowledge discovery task. However, different views frequently contain complementary information that must be exploited (Wang et al., 2017).

According to Sun et al. (2019), multi-view learning can effectively improve learning performance even on natural single-view data, mainly because a single-view representation of the data is usually incomplete. In this sense, different views might provide complementary information for the learning problem (Wang et al., 2017). Concerning multi-view clustering, there are three combination strategies to perform the learning task (Cleuziou et al., 2009): the concatenation strategy, the distributed strategy, and the centralized strategy. The first strategy consists of concatenating the views into a single one, either directly, by juxtaposing the sets of features, or indirectly, by combining the proximity matrices derived from each view. The second strategy starts by clustering the objects from each view independently and then looks for a solution that represents a consensus among all the groups. Finally, the last strategy uses multiple views simultaneously to mine hidden patterns from the data (Cleuziou et al., 2009).

Throughout learning, multi-view approaches explicitly use distinct data representations that can either be the original data features or those obtained through computations (Sun et al., 2019). Moreover, each view can be represented by either vectorial or non-vectorial data. The former has received considerably more attention from unsupervised learning approaches, since most available machine learning and data analysis methods are based on a vector model, with each example being represented by a vector of quantitative values (Frigui et al., 2007). Unfortunately, many current datasets do not support this type of representation. There are categorical data, abstract data, and relational data, among others. In relational data, the objects are described through a relationship between data pairs that contains only information on the degrees to which pairs of objects in the dataset are related (Kaufman & Rousseeuw, 1987; Frigui et al., 2007). A way of dealing with these types of data is to consider the objects represented by a matrix of dissimilarities.

In dissimilarity data, each pair of objects is represented by a dissimilarity relationship (Rousseeuw & Kaufman, 2005). Single-view dissimilarity data is represented by a dissimilarity matrix defined as \(\textbf{D}=[d(e_k,e_l)] \, (1 \le k,l \le N)\), where \(d(e_k,e_l)\) is the dissimilarity between objects \(e_k\) and \(e_l\) on dissimilarity matrix \(\textbf{D}\). Unsupervised dissimilarity data methods introduce approaches to handle such data using the most appropriate dissimilarity function for the problem at hand. These approaches can handle heterogeneous data through different transformations.
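For concreteness, the sketch below builds such a dissimilarity matrix from a set of objects; it assumes numerical feature vectors and uses the Euclidean distance as the dissimilarity function, but any pairwise dissimilarity suited to the data type could be plugged in (the function name is ours).

```python
import numpy as np

def dissimilarity_matrix(X, dist=None):
    """Build D = [d(e_k, e_l)] for N objects.

    X    : (N, M) array whose rows describe the objects
    dist : any pairwise dissimilarity function (Euclidean distance by default)
    """
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)
    N = len(X)
    D = np.zeros((N, N))
    for k in range(N):
        for l in range(k + 1, N):
            D[k, l] = D[l, k] = dist(X[k], X[l])
    return D
```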

Therefore, dissimilarity data are more generic, since they can be applied to scenarios where objects cannot be represented by numerical features. They are also more useful when the distance measure is computed by a suitable algorithm rather than by the usual algebraic expression, or when sets of similar objects cannot be represented adequately by a prototype vector (Frigui et al., 2007).

Several traditional machine learning algorithms have been extended to cope effectively with a variety of multi-view and dissimilarity data problems (Gusmão & Carvalho, 2019; Dantas & Carvalho, 2011; Olteanu & Villa-Vialaneix, 2015). The Kohonen Self-Organizing Map (SOM) (Kohonen, 2001; Badran et al., 2005; Kohonen, 2013; Astudillo & Oommen, 2014; Cottrell et al., 2018) is a powerful tool for dealing with these kinds of challenges. SOM performs clustering and non-linear data projection at the same time, thus providing a strong visualization tool. The SOM has proven effective, especially when considering the faithfulness (precision) of the mapping from a high-dimensional space (Vatanen et al., 2015). In addition, the SOM has proven to be a reliable approach in a variety of fields, including finance, statistics, bio-medicine, industry, and many others (see Refs. Kaski et al. (1998); Oja et al. (2003); Domínguez-González et al. (2012); Astudillo and Oommen (2014); Kamimura (2019); Douzas et al. (2021) for details).

A SOM consists of neurons (vertices) usually arranged on a regular two- or three-dimensional grid (the map). Each neuron is associated with a cluster representative (prototype) of a data subset (a cluster). The cluster structure is imposed both by the data and by the a priori topology of the map. The SOM network can be trained either incrementally or batch-wise. According to Kohonen (2013), the batch variant of the SOM network is most suited for practical applications. However, incremental training is the preferable approach when data are given sequentially. Throughout the map training, each object must select its Best Matching Unit (BMU), i.e., the neuron with the prototype most similar to its description. Then, the prototype associated with the BMU and the prototypes of the neurons in the spatial neighborhood of the BMU are updated so that they better represent the object.

A SOM preserves the data topological properties, which implies that if two objects in the original description space are similar, the related BMU prototypes are also similar and will be associated with adjacent or close vertices on the map. Thus, the data are grouped by clusters such that the most similar prototypes are associated with adjacent vertices, while less similar prototypes are associated with distant vertices on the map (Kohonen, 2013).

There are different prototype-based approaches to determine the representative of a cluster concerning dissimilarity data, but only a few SOM algorithms have been proposed. Median Batch SOM (Kohonen, 2001) was the first extension of the original SOM for single-view dissimilarity data. In this regard, the cluster prototypes were represented by a single medoid in Ref. Kohonen (2001) and later by a set of medoids (set-medoids) in Ref. Golli et al. (2005). Furthermore, another method for extending SOM to single-view dissimilarity data was proposed, in which the cluster representative is a “normalized linear combination” of the objects from the whole dataset. Based on this latter approach, batch (Hasenfuss & Hammer, 2007) and on-line (Olteanu et al., 2012) versions of SOM for dissimilarity data are currently available. Finally, Ref. Mariño and Carvalho (2020) introduced a batch SOM for single-view dissimilarity data, with the cluster prototypes represented by vectors of weights. Each component of these vectors quantifies the significance of each object in a given cluster. Despite the growing interest in machine learning, to the best of our knowledge, only Refs. Dantas and Carvalho (2011) and Olteanu and Villa-Vialaneix (2015) have proposed SOM algorithms that can manage multi-view dissimilarity data. Ref. Olteanu and Villa-Vialaneix (2015) introduced an on-line extension of the relational SOM algorithm (Hasenfuss & Hammer, 2007), whereas Ref. Dantas and Carvalho (2011) proposed a batch extension of the median SOM (Golli et al., 2005; Kohonen, 2001) to the case where several dissimilarity matrices are available for describing the dataset.

Paper proposal

Herein, we propose two families of batch SOM algorithms for multi-view dissimilarity data in the framework of the centralized strategy: Batch SOM algorithms for multi-view dissimilarity data with weighted medoids as cluster representatives (MBSOM-MM\(\textbf{dd}\)) and Batch SOM algorithms for multi-view dissimilarity data with a normalized linear combination of the objects as cluster representatives (MRBSOM).

The new MBSOM-MM\(\textbf{dd}\) family extends the method of Ref. Mariño and Carvalho (2020) to manage datasets described by multiple dissimilarity matrices. For a fixed neighborhood, the new method provides the optimal solution for computing the representative (weighted medoids) associated with each neuron, for computing optimal adaptive relevance weights for the dissimilarity matrices, and for obtaining the optimal cluster partition.

Furthermore, MRBSOM, like existing batch SOM methods for relational data (Hasenfuss & Hammer, 2007), keeps the idea that each cluster representative is a normalized linear combination of the objects represented in the description space, but additionally uses different adaptive weights on the dissimilarity matrices, aiming to take into account the importance of each dissimilarity matrix in the unsupervised learning task.

In both families of models, the weights change at each algorithm iteration such that each matrix has a different influence on the training of the map. Each of the proposed families can compute relevance weights for each dissimilarity matrix, either locally, for each cluster, or globally, for the whole partition (Dantas & Carvalho, 2011).

The main contributions of our paper are the two families of Batch SOM algorithms that can manage multi-view datasets described by several dissimilarity matrices. More precisely, the paper provides:

  • The respective objective functions that, for a fixed neighborhood, should be optimized to learn the MBSOM-MM\(\textbf{dd}\) and MRBSOM model families;

  • For a fixed neighborhood radius, i) the optimal solution for computing the cluster representatives associated with each neuron in the proposed models; ii) the optimal solution for computing the relevance weights of the dissimilarity matrices on the training of the SOM; and iii) the optimal solution for the partition associated with the neurons of the proposed algorithms;

  • The time complexity of the proposed models;

  • A significant evaluation of the proposed methods compared with relevant batch SOM algorithms for multi-view dissimilarity data.

Thus, the proposed algorithms aim to improve on MBSOM-CM\(\textbf{dd}\) (Dantas & Carvalho, 2011), in which the number of objects (medoids) that represent a cluster may be insufficient to describe it. Moreover, MBSOM-CM\(\textbf{dd}\) ignores the relevance of the medoids: when several objects are selected as medoids, they are not necessarily equally important for the cluster and do not describe it in the same way. Additionally, in MBSOM-CM\(\textbf{dd}\), the cardinality of the set of medoids (the representative) is a parameter that must be provided a priori. Finally, the relevance of a specific data source can impact model performance. Nevertheless, the impact of the data sources, considering their relevance locally for each cluster and globally for each view, has not yet been studied with these approaches to cluster representation in SOM algorithms.

The paper is organized as follows: Sect. 2 describes the families of the proposed models MBSOM-MM\(\textbf{dd}\) and MRBSOM. In addition, we also provide an in-depth description and formalization of the algorithm MBSOM-CM\(\textbf{dd}\) introduced in the work of Ref. Dantas and Carvalho (2011). Moreover, we analyze the time complexity of the proposed algorithms. Section 3 presents the setup of our experiments. Section 4 provides the performance evaluation of the proposed algorithms against existing approaches in the literature, showing the results and discussing the main findings obtained. This section also provides further insights into the families of the models through an application concerning the dermatology dataset (Dua & Graff, 2017). Finally, Sect. 5 presents our final remarks.

2 Batch SOM algorithms for multi-view dissimilarity data

This section presents the batch SOM families for multi-view dissimilarity data MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM. Section 2.1 discusses the cluster representatives and the relevance weights of the dissimilarity matrices and provides the error functions of these batch SOM algorithms. Section 2.2 describes their three main steps (the computation of the cluster representatives, the relevance weights of the dissimilarity matrices, and the update of the clusters) and provides the main algorithm for each batch SOM family. Section 2.3 introduces some notations aiming to simplify the presentation of the methods according to the algorithms used and the relevance weights assigned to each dissimilarity matrix. Finally, Sect. 2.4 introduces the time complexity analysis of the proposed variants of the batch SOM families.

2.1 MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\) and MRBSOM self-organizing maps

This section provides a detailed presentation of the MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM SOM algorithms.

Let \(E = \{e_1,\ldots ,e_N\}\) be a set of objects and let \(\textbf{D}_p=[d_p(e_k,e_l)] \, (1 \le k,l \le N)\), for \(1 \le p \le P\), be P dissimilarity matrices, where \(d_p(e_k,e_l)\) is the dissimilarity between objects \(e_k\) and \(e_l\) on dissimilarity matrix \(\textbf{D}_p\).

A SOM consists of a low-dimensional (usually two-dimensional) regular grid (map), which contains C nodes (neurons). Each SOM map is associated with a partition in which each neuron, indexed by r, is associated with a cluster \(P_r\) and a representative (prototype).

Let \(\{1,\ldots ,C\}\) be the cluster index set and let f be the assignment function that maps each object to an index \(r=f(e_k) \in \{1,\ldots ,C\}\) of the cluster index set. The partition \(\mathcal {P}=\{P_1,\ldots ,P_C\}\) associated with a SOM is defined by the assignment function, which gives the index of the cluster of \(\mathcal {P}\) to which the object \(e_k\) belongs, i.e., \(P_r=\{e_k \in E: f(e_k) = r\}\).

Following (Golli et al., 2005), in MBSOM-CM\(\textbf{dd}\) (Dantas & Carvalho, 2011) it is assumed that the representative of each cluster is a set of objects (set-medoids), i.e., the prototype \(G_r\) of cluster \(P_r\) is a subset of fixed cardinality \(1 \le q \ll N\) of the set of objects E: \(G_r \in E^{(q)} = \{A \subset E: |A| = q\}\). Besides, \(\mathcal {G}=(G_1,\ldots ,G_r, \ldots , G_C)\) is the vector of cluster prototypes.

Moreover, following (Mariño & Carvalho, 2020), in MBSOM-MM\(\textbf{dd}\) it is assumed that the representative \(\textbf{v}_r = (v_{r1},\ldots ,v_{rN})\) of cluster \(P_r\) is an N-dimensional vector of weights whose components measure how much each object is weighted as a medoid of the cluster \(P_r\). Let \(\textbf{V}=(\textbf{v}_1, \ldots , \textbf{v}_C) = (v_{rj}) \, (1 \le r \le C; 1 \le j \le N)\) be the matrix of prototype weights of the objects regarding the clusters.

Regarding MRBSOM, Refs. Cottrell et al. (2018); Hammer and Hasenfuss (2007) pointed out that if the data are described by a dissimilarity matrix where each cell is the squared Euclidean distance, they can be embedded in a pseudo-Euclidean space in such a way that optimum prototypes can be expressed as linear combinations of data points. Therefore, the unknown distances \(\Vert \textbf{x}_k - \textbf{v}_r\Vert ^2\), where \(\textbf{x}_k=(\textbf{x}_{k(1)}, \ldots , \textbf{x}_{k(P)})\) is the description of the object \(e_k\) and \(\textbf{v}_r=(\textbf{v}_{r(1)},\ldots ,\textbf{v}_{r(P)})\) is the representative of cluster \(P_r\) both in the pseudo-Euclidean space, can be expressed in terms of known values of the squared Euclidean distances of the dissimilarity matrix.

Assuming that \(\Vert \textbf{x}_k - \textbf{v}_r\Vert ^2 = \sum _{p=1}^P \Vert \textbf{x}_{k(p)} - \textbf{v}_{r(p)}\Vert ^2\) and that the p-th component of the cluster representative is such that \(\textbf{v}_{r(p)} = \sum _{k=1}^N \alpha _{rk} \textbf{x}_{k(p)}\) where \(\sum _{k=1}^N \alpha _{rk} = 1\), according to Cottrell et al. (2018); Hasenfuss and Hammer (2007):

$$\begin{aligned} \Vert \textbf{x}_{k(p)} - \textbf{v}_{r(p)}\Vert ^2 = [\textbf{D}_p \varvec{\alpha }_r]_k - \frac{1}{2} \varvec{\alpha }_r^\top \textbf{D}_p \varvec{\alpha }_r \, (1 \le p \le P), \end{aligned}$$
(1)

where \([\textbf{D}_p \varvec{\alpha }_r]_k\) is the k-th component of \([\textbf{D}_p \varvec{\alpha }_r]\) and \(\varvec{\alpha }_r=(\alpha _{r1},\ldots ,\alpha _{rN}) \, (1 \le r \le C)\). Since \(\textbf{v}_{r(p)}\) is only implicitly defined in the pseudo-Euclidean space, it is the vector \(\varvec{\alpha }_r\) that is updated, and the distances between the prototypes and the objects are computed only indirectly through the coefficients \(\alpha _{rk}\). According to Ref. Hammer and Hasenfuss (2007), Eq. (1) still holds for any given dissimilarity matrix \(\textbf{D}_p\).
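As an illustration of Eq. (1), the following sketch computes the squared distances between every object and the prototype of a neuron r on view p, given the dissimilarity matrix \(\textbf{D}_p\) and the coefficient vector \(\varvec{\alpha }_r\) (the function and variable names are ours).

```python
import numpy as np

def relational_distances(D_p, alpha_r):
    """Squared distances ||x_k - v_r||^2 on view p, following Eq. (1).

    D_p     : (N, N) dissimilarity matrix of view p
    alpha_r : (N,) coefficient vector of prototype r, with alpha_r.sum() == 1
    Returns the N values [D_p alpha_r]_k - 0.5 * alpha_r^T D_p alpha_r.
    """
    D_alpha = D_p @ alpha_r                     # the vector [D_p alpha_r]
    return D_alpha - 0.5 * (alpha_r @ D_alpha)  # subtract the scalar correction term
```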

Therefore, in MRBSOM it is assumed that the representative \(\varvec{\alpha }_r = (\alpha _{r1}, \ldots , \alpha _{rN})\) of cluster \(P_r\) is an N-dimensional vector of coefficients \(\alpha _{rk}\). Let \(\mathcal {A}=(\varvec{\alpha }_1,\ldots ,\varvec{\alpha }_C)=(\alpha _{rk})_{\begin{array}{c} 1 \le r \le C\\ 1 \le k \le N \end{array}}\) be the matrix of coefficients \(\alpha _{rk}\).

Dissimilarity matrices can have different relevance to the training of the SOM. In most applications, some dissimilarity matrices may be irrelevant, while among those that are relevant, some may be more or less relevant than others.

Therefore, aiming to obtain a significant SOM from all dissimilarity matrices, the MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM SOM algorithms were designed to provide the clusters and their respective prototypes by simultaneously preserving the spatial order of the prototypes on the map, as well as to learn the relevance weight for each dissimilarity matrix by optimizing a suitable error function.

The relevance weights can be assigned to each dissimilarity matrix globally, for all the clusters, according to the \((P \times 1)\) matrix \(\textbf{W}=(w_{p}) \, (1 \le p \le P)\), with \(w_{p} \in \textrm{R}^+\). They can also be assigned to each dissimilarity matrix locally, for each cluster, according to the \((P \times C)\) matrix \(\textbf{W}=(w_{rp}) \, (1 \le p \le P; 1 \le r \le C)\), with \(w_{rp} \in \textrm{R}^+\).

The training of the MBSOM-CM\(\textbf{dd}\) algorithm provides the vector of prototypes \(\mathcal {G}\), the matrix of relevance weights \(\textbf{W}\), and the partition \(\mathcal {P}\) by iteratively minimizing the error function \(J_{MBSOM-CMdd}\). Likewise, the training of the MBSOM-MM\(\textbf{dd}\) algorithm provides the matrix \(\textbf{V}\) of prototype weights, the matrix of relevance weights \(\textbf{W}\), and the partition \(\mathcal {P}\) by iteratively minimizing the error function \(J_{MBSOM-MMdd}\). Finally, the training of the MRBSOM algorithm provides the matrix \(\mathcal {A}\) of coefficients, the matrix of relevance weights \(\textbf{W}\), and the partition \(\mathcal {P}\) by iteratively minimizing the error function \(J_{MRBSOM}\). Table 1 provides the error functions of the algorithms.

Table 1 Error functions of the SOM algorithms

For each object \(e_k\), the winning neuron, known as the best matching unit (BMU), is the neuron with the cluster representative closest to \(e_k\). The BMU is indexed by \(f(e_k)\) and is identified with the corresponding cluster representative.

The error measure of a BMU regarding the object \(e_k\) is computed by the generalized dissimilarity (Badran et al., 2005) function \(\Delta _{\textbf{W}}\), which allows comparing each object \(e_k\) to each cluster representative. Table 2 provides \(\Delta _{\textbf{W}}\) according to the SOM algorithms.

Table 2 Generalized dissimilarity functions of the SOM algorithms

In Eqs. (5), (6), and (7), \(h_{f(e_k),r}\) is the neighborhood kernel function that measures the influence of the BMU on neuron r. Several choices are possible; in Ref. Dantas and Carvalho (2011), it is defined as

$$\begin{aligned} h_{f(\textbf{e}_k),r} = \exp \left\{ -\frac{\Vert a_{f(\textbf{e}_k)} - a_r\Vert ^2}{2\sigma ^2}\right\} . \end{aligned}$$
(8)

where \(a_{f(\textbf{e}_k)}\) and \(a_r\) are the positions of the BMU and of neuron r in the grid, respectively. Moreover, \(\sigma \) is the neighborhood radius. The size of the neighborhood decreases with \(\sigma \): the smaller \(\sigma \), the fewer neurons belong to the effective neighborhood of a given BMU (Badran et al., 2005; Kohonen, 2001).
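For reference, a minimal sketch of the kernel of Eq. (8), computing the values \(h_{f(e_k),r}\) between a BMU and all neurons of the grid (array names are ours).

```python
import numpy as np

def neighborhood_kernel(grid_positions, bmu, sigma):
    """Gaussian neighborhood kernel of Eq. (8).

    grid_positions : (C, 2) array with the (x, y) position of each neuron on the map
    bmu            : index of the best matching unit
    sigma          : current neighborhood radius
    Returns the (C,) vector of kernel values between the BMU and every neuron.
    """
    sq_dist = np.sum((grid_positions - grid_positions[bmu]) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```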

Therefore, the generalized dissimilarity function \(\Delta _{\textbf{W}}\) is a weighted sum of the global match functions \(D_{\textbf{W}}\) computed between an object \(e_k\) and a cluster representative. Note that it considers all the neurons in the neighborhood of the BMU.

The function \(D_{\textbf{W}}\) computes the global matching between an object \(e_k\) and a cluster representative. Table 3 provides \(D_{\textbf{W}}\) according to the SOM algorithms when the weights are assigned globally.

Table 3 Global match functions: the weights are assigned globally

Table 4 provides \(D_{\textbf{W}}\) according to the SOM algorithms when the weights are assigned locally.

Table 4 Global match functions: the weights are assigned locally

The function \(D_p\) computes the local matching between an object \(e_k\) and a cluster representative on dissimilarity \(\textbf{D}_p \, (1 \le p \le P)\). Table 5 provides \(D_p\) according to the SOM algorithms.

Table 5 Local match functions

Note that because the global match functions are weighted sums of the local match functions, the different dissimilarity matrices must have been previously scaled by a suitable normalization to make their relative value ranges comparable. This prevents particular views from prevailing over the others in the computation of the global match functions merely because of different feature measurement units.

2.2 MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms

For a fixed radius \(\sigma \), starting from an initial solution, the trained maps of the SOM algorithms are obtained by minimizing the error functions of Table 1; this minimization is performed iteratively in three steps: representation, weighting, and assignment.

The representation step gives the optimal solution for computing the cluster representatives associated with the map neurons. The weighting step computes the relevance weights of the dissimilarity matrices to the training of the SOM. Finally, the assignment step provides the optimal solution for the clusters associated with the map neurons.

2.2.1 Representation step

The representation step gives the optimal solution for computing the cluster representatives associated with the map neurons. During the representation step, the matrix of relevance weights \(\textbf{W}\) and the partition \(\mathcal {P}\) are kept fixed.

MBSOM-CM\(\textbf{dd}\) algorithm

The error function \(J_{MBSOM-CMdd}\) is minimized regarding the vector of prototypes \(\mathcal {G}=(G_1,\ldots ,G_C)\).

The prototype \(G_r\) of cluster \(P_r\), which minimizes the error function \(J_{MBSOM-CMdd}\), either minimizes \(\sum _{k=1}^N h_{f(e_k),r} \sum _{p=1}^P w_p D_p(e_k,G_r)\) (if the weights are assigned globally) or minimizes \(\sum _{k=1}^N h_{f(e_k),r} \sum _{p=1}^P w_{rp} D_p(e_k,G_r)\) (if the weights are assigned locally). The prototype \(G_r\) is computed according to the following brute-force Algorithm 1 (Dantas & Carvalho, 2011):


Algorithm 1

MBSOM-MM\(\textbf{dd}\) algorithm

The error function \(J_{MBSOM-MMdd}\) is minimized regarding the matrix \(\textbf{V}=(v_{rj}) \, (1 \le r \le C; 1 \le j \le N)\) of prototype weights of the objects. To exclude the trivial null solution for \(\textbf{V}\), we consider a sum constraint: the minimization is carried out subject to \(\sum _{j=1}^N v_{rj} = 1\) and \(v_{rj} \ge 0\), whether the weights are assigned globally or locally.

Table 6 provides the Lagrangian functions either if the weights are assigned globally or locally, where \(\beta _r\) are the Lagrange multipliers.

Table 6 Lagrangian functions

Then, taking the partial derivatives of \(\mathcal {L}\) w.r.t \(v_{rj}\) and \(\beta _r\), and by setting the partial derivatives to zero, we obtain the optimal solutions to \(v_{rj}\), which are shown in Table 7.

Table 7 Prototype weights of the objects

Remark

Equation (20) allows us to conclude that, at the end of the training of the MBSOM-MM\(\textbf{dd}\) algorithm, when \(h_{f(e_k),r} \sim 0\) for \(f(e_k) \ne r\), the lower the \(\sum_{e_k \in P_r} \sum_{p=1}^P w_{rp} d_p(e_k,e_j)\), the higher the prototype weight \(v_{rj}\). In addition, Eq. (21) shows that the lower the \(\sum_{e_k \in P_r} \sum_{p=1}^P w_p d_p(e_k,e_j)\), the higher the prototype weight \(v_{rj}\).
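The exact closed forms are those of Table 7; the sketch below only illustrates their general shape, under the assumption that the error term for cluster r is \(\sum_j v_{rj}^n A_{rj}\), with \(A_{rj}\) the weighted dissimilarity sum appearing in the remark above, which yields the usual inverse-power normalization.

```python
import numpy as np

def prototype_weights(A_r, n):
    """Sketch of the prototype-weight update for one cluster r (cf. Table 7).

    A_r : (N,) vector, A_r[j] = sum_k h_{f(e_k),r} * sum_p w_p * d_p(e_k, e_j)
          (use w_rp instead of w_p for the locally weighted variant)
    n   : smoothness parameter (n > 1)
    Assumes an error term of the form sum_j v_rj**n * A_r[j] under
    sum_j v_rj = 1, whose Lagrangian solution is the inverse-power form below.
    """
    inv = A_r ** (-1.0 / (n - 1.0))
    return inv / inv.sum()
```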

MRBSOM algorithm

The error function \(J_{MRBSOM}\) is minimized regarding the matrix \(\mathcal {A}=(\alpha _{rk}) \, (1 \le r \le C; 1 \le k \le N)\) of coefficients. The error function \(J_{MRBSOM}\) can be written as

$$\begin{aligned} J_{MRBSOM} = \sum _{k=1}^N \sum _{r=1}^C h_{f(e_k),r} \sum _{p=1}^P (\textbf{x}_{k(p)} - \textbf{v}_{r(p)})^\top (\textbf{x}_{k(p)} - \textbf{v}_{r(p)}). \end{aligned}$$
(22)

Taking the partial derivative of \(J_{MRBSOM}\) w.r.t \(\textbf{v}_{r(p)}\) and by setting the partial derivative to zero, we obtain:

$$\begin{aligned} \textbf{v}_{r(p)} = \sum _{k=1}^N \frac{h_{f(e_k),r}}{\sum _{l=1}^N h_{f(e_l),r}} \textbf{x}_{k(p)} = \sum _{k=1}^N \alpha _{rk} \textbf{x}_{k(p)}. \end{aligned}$$
(23)

where \(\alpha _{rk}=\frac{h_{f(e_k),r}}{\sum _{l=1}^N h_{f(e_l),r}}\), considering that \(\sum _{k=1}^N \alpha _{rk} = 1\) and \(\alpha _{rk} \ge 0 \, (1 \le r \le C; 1 \le k \le N)\). Since \(\textbf{v}_{r(p)}\) is in the pseudo-Euclidean space, it is the matrix \(\mathcal {A}=(\alpha _{rk}) \, (1 \le r \le C; 1 \le k \le N)\) of coefficients that is updated as follows:

$$\begin{aligned} \alpha _{rk} = \frac{h_{f(e_k),r}}{\sum _{l=1}^N h_{f(e_l),r}} \, (1 \le r \le C; 1 \le k \le N). \end{aligned}$$
(24)
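A minimal sketch of this update, assuming the kernel values between all pairs of neurons and the current assignment f are available (array names are ours).

```python
import numpy as np

def update_alpha(H, assignment):
    """Update the coefficient matrix A = (alpha_rk) according to Eq. (24).

    H          : (C, C) array with H[s, r] = h_{s,r}, the kernel value between neurons s and r
    assignment : (N,) integer array with assignment[k] = f(e_k), the BMU index of object e_k
    Returns the (C, N) matrix with alpha_rk = h_{f(e_k),r} / sum_l h_{f(e_l),r}.
    """
    K = H[assignment]              # (N, C): K[k, r] = h_{f(e_k), r}
    return (K / K.sum(axis=0)).T   # normalize each column over the objects, then transpose
```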

2.2.2 Weighting step

During the weighting step, the cluster representatives and the partition \(\mathcal {P}\) are kept fixed. The error functions of Table 1 are minimized regarding the matrix of weights \(\textbf{W}\).

A trivial solution for this minimization problem is reached when \(\textbf{W}\) is null. To exclude the trivial solution, constraints on the elements of \(\textbf{W}\) are required. Two main types of constraints have been proposed: a constraint on the product of the weights (Diday & Govaert, 1977) and a constraint on the sum of the weights (Huang et al., 2005). Herein, we consider only the product constraint since the sum constraint requires fixing further hyper-parameters in advance.

Therefore, the error functions of Table 1 are minimized either regarding the matrix of weights \(\textbf{W}=(w_{p}) \, (1 \le p \le P)\), subject to \(\prod _{p=1}^P w_p =1, \, w_p > 0\), if the weights are assigned globally, or regarding the matrix of weights \(\textbf{W}=(w_{rp}) \, (1 \le r \le C; 1 \le p \le P)\), subject to \(\prod _{p=1}^P w_{rp} =1, \, w_{rp} > 0\), if the weights are assigned locally.

Table 8 provides the Lagrangian functions according to the SOM algorithms for weights assigned globally, where \(\beta \) is the Lagrange multiplier.

Table 8 Computation of the global weights: Lagrangian functions

Then, taking the partial derivatives of \(\mathcal {L}\) w.r.t \(w_p\) and \(\beta \), and by setting the partial derivatives to zero, we obtain the optimal solutions to \(w_p\) according to the SOM algorithms, which are shown in Table 9.

Table 9 Optimal global weights

Remark

Equation (28) allows us to conclude that, at the end of the training of the MBSOM-CM\(\textbf{dd}\) algorithm, when \(h_{f(e_k),r} \sim 0\) for \(f(e_k) \ne r\), the lower the \(\sum _{r=1}^C \sum _{e_k \in P_r} \sum _{e \in G_r} d_p(e_k,e)\), the higher the relevance weight \(w_p\) of the dissimilarity matrix \(\textbf{D}_p\). Similar remarks can be made for the MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms.
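The exact expressions are those of Table 9; as an illustration of the product constraint, the sketch below computes the global weights from the per-view error terms using the standard closed form for this kind of constrained problem (the geometric mean of the per-view errors divided by each view's error); the function name is ours.

```python
import numpy as np

def global_relevance_weights(J):
    """Sketch of the global relevance weights under prod_p w_p = 1 (cf. Table 9).

    J : (P,) vector of per-view error terms, e.g., for MBSOM-CMdd-G,
        J[p] = sum_r sum_k h_{f(e_k),r} * D_p(e_k, G_r)
    Assumes the standard solution w_p = (prod_q J_q)^(1/P) / J_p.
    """
    geometric_mean = np.exp(np.mean(np.log(J)))  # numerically safer than np.prod(J)**(1/P)
    return geometric_mean / J
```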

Table 10 provides the Lagrangian functions according to the SOM algorithms for weights assigned locally, where \(\beta _r\) are the Lagrange multipliers.

Table 10 Computation of the local weights: Lagrangian functions

Then, taking the partial derivatives of \(\mathcal {L}\) w.r.t \(w_{rp}\) and \(\beta _r\), and by setting the partial derivatives to zero, we obtain the optimal solutions to \(w_{rp}\), which are shown in Table 11.

Table 11 Optimal local weights

Remark

Equation (34) allows us to conclude that, at the end of the training of the MBSOM-CM\(\textbf{dd}\) algorithm, when \(h_{f(e_k),r} \sim 0\) for \(f(e_k) \ne r\), the lower the \(\sum _{e_k \in P_r} \sum _{e \in G_r} d_p(e_k,e)\), the higher the relevance weight \(w_{rp}\) of the dissimilarity matrix \(\textbf{D}_p\) on the cluster \(P_r\). Similar remarks can be made for the MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms.

2.2.3 Assignment step

During the assignment step, the cluster representatives and the matrix of relevance weights \(\textbf{W}\) are kept fixed. The aim is to minimize the error functions of Table 1 regarding the partition \(\mathcal {P}\). Regarding the MBSOM-CM\(\textbf{dd}\) algorithm, according to Eq. (2), the error function \(J_{MBSOM-CMdd}\) is minimized if \(\Delta _{\textbf{W}}(e_k,G_{f(e_k)})\) is minimized for each \(e_k \in E\). For a fixed vector of prototypes \(\mathcal {G}\) and a fixed matrix of relevance weights \(\textbf{W}\), \(\Delta _{\textbf{W}}(e_k,G_{f(e_k)})\) is minimized if \(f(e_k) = arg \min _{1 \le s \le C} \Delta _{\textbf{W}}(e_k,G_s)\).
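As an illustration, the sketch below performs this step for the set-medoids, globally weighted case; it assumes the local match function \(D_p(e_k,G_r)=\sum_{e \in G_r} d_p(e_k,e)\) suggested by the remarks on Eqs. (28) and (34), and the array names are ours.

```python
import numpy as np

def assign_objects(D, w, G, H):
    """Assignment step sketch for the set-medoids, globally weighted variant.

    D : (P, N, N) array stacking the P dissimilarity matrices
    w : (P,) vector of global relevance weights
    G : list of C index arrays; G[r] holds the medoid indices of cluster r
    H : (C, C) array of neighborhood kernel values h_{s,r}
    Returns f, the (N,) array of BMU indices minimizing
    Delta_W(e_k, G_s) = sum_r h_{s,r} * sum_p w_p * sum_{e in G_r} d_p(e_k, e).
    """
    P, N, _ = D.shape
    C = len(G)
    D_W = np.zeros((N, C))
    for r in range(C):
        # D_W[k, r] = sum_p w_p * sum_{j in G_r} d_p(e_k, e_j)
        D_W[:, r] = np.einsum('p,pkj->k', w, D[:, :, G[r]])
    Delta = D_W @ H.T              # Delta[k, s] = sum_r h_{s,r} * D_W[k, r]
    return np.argmin(Delta, axis=1)
```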

Following similar reasoning for the MBSOM-MM\(\textbf{dd}\) and MRBSOM algorithms, Table 12 provides the update rule for the clusters \(P_r \, (1 \le r \le C)\) according to the SOM algorithms and the relevance weights of the dissimilarity matrices.

Table 12 Update rules for the clusters

Remark

Note that, according to Eqs. 37 and 42, the objects are assigned to clusters by computing the dissimilarity between them and the cluster representatives. Dissimilarity matrices with a large (small) weight contribute strongly (weakly) to the computation of the dissimilarity between objects and cluster representatives. This is why a dissimilarity matrix with a large weight is more relevant to the clustering task than a dissimilarity matrix with a small weight.

These three steps are repeated until the fixed number of iterations \(N_{iter}\) (epochs) is reached. Algorithm 2 summarizes these steps.

Remark

  1. The performance of these SOM algorithms at the end of the training, and the associated partition, depend on the choice of their parameters \(N_{iter}\), \(\sigma _0\), \(\sigma _f\), q (MBSOM-CM\(\textbf{dd}\) family), and n (MBSOM-MM\(\textbf{dd}\) family);

  2. For a fixed radius, at each step of the algorithms (representation, weighting, and assignment), the objective functions are locally minimized and thus decrease. Since the radius is not a variable of the objective functions, its update cannot guarantee that the objective functions decrease with respect to the last step before the radius update; therefore, the convergence of the algorithms cannot be ensured.

  3. Furthermore, the final solution depends on the choice of the C initial cluster representatives (e.g., the C set-medoids, regarding the MBSOM-CM\(\textbf{dd}\) algorithm).

Algorithm 2

MBSOM-CM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\), and MRBSOM algorithms

2.3 The methods

For a simpler presentation of the methods according to the algorithms used and the relevance weights assigned to each dissimilarity matrix, we adopted the following notations.

  • MBSOM-CM\(\textbf{dd}\)-G (Dantas & Carvalho, 2011), if each cluster representative is a set of objects (set-medoids) presenting a matching function described by Eq. (5) and the weights are assigned globally by Eq. (9).

  • MBSOM-CM\(\textbf{dd}\)-L (Dantas & Carvalho, 2011), if each cluster representative is a set of objects (set-medoids) presenting a matching function described by Eq. (5) and the weights are assigned locally by Eq. (12).

  • SBSOM-CM\(\textbf{dd}\), for the single-view SOM of Ref. Golli et al. (2005), in which each cluster representative is a set of objects (set-medoids).

  • MBSOM-MM\(\textbf{dd}\)-G, if each cluster representative is the entire set of objects weighted on this cluster (weighted medoids) based on Eq. (6) and the weights are assigned globally by Eq. (10).

  • MBSOM-MM\(\textbf{dd}\)-L, if each cluster representative is the entire set of objects weighted on this cluster (weighted medoids) based on Eq. (6) and the weights are assigned locally by Eq. (13).

  • SBSOM-MM\(\textbf{dd}\), for the single-view SOM of Ref. Mariño and Carvalho (2020), in which each cluster representative is the entire set of objects weighted on this cluster (weighted medoids).

  • MRBSOM-G, if each cluster representative is a normalized linear combination of the objects represented in the description space based on Eq. (7) and the weights are assigned globally by Eq. (11).

  • MRBSOM-L, if each cluster representative is a normalized linear combination of the objects represented in the description space based on Eq. (7) and the weights are assigned locally by Eq. (14).

  • SRBSOM, for the single-view batch SOM of Ref. Hasenfuss and Hammer (2007), in which each cluster representative is a normalized linear combination of the objects represented in the description space.

Note that we also consider related single-view batch SOM algorithms (SBSOM-CM\(\textbf{dd}\), SBSOM-MM\(\textbf{dd}\), and SRBSOM) in order to compare them with the proposed multiple-view approaches.

2.4 Complexity analysis

The complexity of the algorithms mainly depends on the matching function \(\Delta _{\textbf{W}}\) of Table 2, which allows comparing each object with the representatives of the clusters. Moreover, the final computational complexity considers the following three main steps: representation, weighting, and assignment.

  • Regarding the methods MBSOM-CM\(\textbf{dd}\)-G and MBSOM-CM\(\textbf{dd}\)-L, the time complexity of \(\Delta \) is \(O (P\times N \times C)\). The representation, weighting (local or global), and assignment steps are \(O (P\times N^2 \times C)\), \(O (P\times N \times C)\), and \(O (P\times N \times C^2)\), respectively. The final time complexity is \(O (N_{iter}\times P\times N^2\times C)\).

  • In methods MBSOM-MM\(\textbf{dd}\)-G and MBSOM-MM\(\textbf{dd}\)-L, the time complexity of \(\Delta \) is \(O (P\times N^2 \times C)\), which is the same in the representation and weighting (local or global) steps, whereas it is \(O (P\times N^2 \times C^2)\) for the assignment step. The final time complexity is \(O (N_{iter}\times P\times N^2\times C^2)\).

  • Finally, in methods MRBSOM-G and MRBSOM-L, the time complexity of \(\Delta \) is \(O (P\times N^3 \times C)\). In the representation, weighting (local or global), and assignment steps, it is \(O (N \times C)\), \(O (P\times N^2 \times C)\), and \(O (P \times N^3 \times C)\), respectively. The final time complexity is \(O (N_{iter}\times P \times N^3 \times C)\).

3 Experimental setting

The successful training of the batch SOM algorithms depends on the choice of their parameters (Badran et al., 2005; Kohonen, 2001, 2013). This section describes the experimental setting used to evaluate the proposed methods compared with other state-of-the-art batch SOM algorithms for multi-view and single-view relational data. The algorithms were implemented in the language “C” and executed on the same machine (OS: Windows 7 64-bits, Memory: 16 GB, Processor: Intel Core i7-X990 CPU @ 3.47 GHz).

A total of 168 distinct experiments were executed for the methods SBSOM-CM\(\textbf{dd}\), MBSOM-CM\(\textbf{dd}\)-L, MBSOM-CM\(\textbf{dd}\)-G, SBSOM-MM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\)-L, and MBSOM-MM\(\textbf{dd}\)-G. The methods SRBSOM, MRBSOM-L, and MRBSOM-G performed 56 experiments. The experiments were run with different parameter sets, map arrays, shapes, and initial kernel values (\(h_0\)). The search for the best configuration of the proposed methods is based on the trade-off between the internal validity indices Topographic Error (TE), which measures the quality of the SOM map (Kiviluoto, 1996), and the Silhouette Coefficient (SIL) (Rousseeuw & Kaufman, 2005), which measures the quality of the cluster partition. A configuration that provides a solution with a not-so-high TE and a not-so-low SIL is preferred to a configuration that provides a solution with a higher SIL but a high TE. The goal is to obtain a trained map and a cluster partition that are simultaneously of high quality as measured by these indices.

For instance, let us consider the silhouette and topographic error scores of two experiments of one of the assessed methods: experiment 1 (\(SIL = 0.70\) and \(TE = 0.45\)) and experiment 2 (\(SIL = 0.68\) and \(TE = 0.20\)). The setup of experiment 2 is the most suitable according to our methodology. The overall results are reported using the configuration that showed the best behavior in most scenarios according to the internal indices. This means that if the best results for all methods and scenarios are achieved with a square shape, the methods are compared with each other using this configuration with their best setup.

A remarkable feature of multi-view learning is that its performance on an original single-view dataset could still be improved by using manually generated multi-views (Zhao et al., 2017; Sun et al., 2019). Accordingly, datasets that were originally single-view were split into multiple datasets described by disjoint subsets of the original set of features. Fourteen datasets were considered in this study (see Sect. 1 of the Supplementary Material). Table 13 summarizes these datasets, in which P is the number of views, N is the number of objects, M is the number of a priori classes, and ARRAY is the dimension of the grid array. The maps are arrays of square (Sq.) or rectangle (Rect.) shapes. Table 13 also provides C, the number of neurons (clusters) in the maps for each dataset, which can be deduced from ARRAY. The number of neurons is roughly \(\lfloor \sqrt{N} \rfloor \) according to prior research (Vesanto et al., 1999).

Table 13 Summary of the multi-view datasets

For each dataset, the Euclidean distance is used to compute a dissimilarity matrix that simultaneously considers all the variables in each view. Regarding the IRIS and WINE datasets, for each variable describing the objects, a dissimilarity matrix is computed using the Euclidean distance.

Then, the matrices were normalized according to their overall dispersion so that their dynamic ranges are comparable: each dissimilarity \(d(e_k,e_l) \, (1 \le k,l \le N)\) in a given dissimilarity matrix \(\textbf{D}\) is normalized as \(\frac{d(e_k,e_l)}{T}\), where \(T=\sum _{k=1}^N d(e_k,g)\) is the overall dispersion and \(g = e_l \in E = \{e_1,\ldots ,e_N\}\) is the overall representative, which is computed according to \(l = arg \min _{1 \le h \le N} \sum _{k=1}^N d(e_k,e_h)\). The considered batch SOM algorithms operate on these normalized dissimilarity matrices.
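A minimal sketch of this normalization (the function name is ours):

```python
import numpy as np

def normalize_dissimilarity(D):
    """Normalize a dissimilarity matrix by its overall dispersion.

    The overall representative g is the object minimizing the sum of the
    dissimilarities of all objects to it; T = sum_k d(e_k, g) is the overall
    dispersion, and every entry of D is divided by T.
    """
    g = np.argmin(D.sum(axis=0))   # index of the overall representative
    T = D[:, g].sum()              # overall dispersion
    return D / T
```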

Regarding the single-view batch SOM algorithms of Refs. Golli et al. (2005); Hasenfuss and Hammer (2007); Mariño and Carvalho (2020), the single dissimilarity matrix of each dataset is obtained using a concatenation strategy, i.e., as the average of the dissimilarity matrices describing this dataset.

In addition, we consider the Gaussian neighborhood kernel function (Eq. 43) that ranges in [0, 1]:

$$\begin{aligned} h_{s,r} = \exp \left\{ -\frac{\Vert a_{s} - a_r\Vert ^2}{2\sigma _{(t)}^2}\right\} . \end{aligned}$$
(43)

where \(\Vert a_{s} - a_r\Vert ^2\) is the squared Euclidean distance in the topological space between neurons \(a_{s}\) and \(a_{r}\) and \(\sigma _{(t)}\) is the kernel width (radius) at the iteration t.

Given the fixed initial value \(\sigma _{0}=\sigma _{(0)}\) and final value \(\sigma _{f}=\sigma _{(N_{iter})}\) of the kernel width, the radius \(\sigma _{(t)}\) is updated at each iteration t according to Equation (44). Each method is executed using \(N_{iter}=50\) iterations (cf. Ref. Kohonen et al. (2014)).

$$\begin{aligned} \sigma _{(t)} = \sigma _{0} \left( \frac{\sigma _f}{\sigma _{0}}\right) ^{\frac{t}{N_{iter}}}. \end{aligned}$$
(44)

Regarding the initial and final values of the kernel width, some heuristic methods have been proposed in the literature, mainly related to the classical SOM assignment based only on the distance between objects and the BMU (Vesanto et al., 1999). In our case, since the assignment is based on the generalized dissimilarity functions (cf. Table 2), which leads to maps with a high topographic error, we propose to fix the values of \(\sigma _0\) and \(\sigma _f\) by using the heuristic method of Ref. Carvalho et al. (2022). The initial and final values of the radius \(\sigma \) are computed according to expressions (45) and (46), respectively. Thus, we initialized the map radius \(\sigma _{0}\) so that two neurons separated by the map diameter have a kernel value (\(h_0\)) equal to 0.1. In addition, we computed \(\sigma _f\) such that two neighboring neurons have a kernel value (\(h_f\)) equal to 0.01. The map diameter in the topological space is the largest topological distance between two neurons of the map; its square is computed as \((x_{max})^{2}+(y_{max})^{2}\), where \(x_{max}\) and \(y_{max}\) correspond to the grid sizes along the horizontal X-axis and vertical Y-axis, respectively.

$$\begin{aligned} \sigma _{0}= & {} \sqrt{\frac{-[(x_{max})^{2}+(y_{max})^{2}]}{2\ln ( h_{0})}}. \end{aligned}$$
(45)
$$\begin{aligned} \sigma _{f}= & {} \sqrt{\frac{-1}{2 \ln ( h_{f})}}. \end{aligned}$$
(46)
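The sketch below puts Eqs. (44)-(46) together, returning the initial radius, the final radius, and the full decay schedule for a given grid (function and parameter names are ours).

```python
import numpy as np

def radius_schedule(x_max, y_max, n_iter, h0=0.1, hf=0.01):
    """Kernel-width schedule following Eqs. (44)-(46).

    x_max, y_max : grid sizes along the horizontal X-axis and vertical Y-axis
    h0           : target kernel value for two neurons separated by the map diameter
    hf           : target kernel value for two neighboring neurons at the last iteration
    Returns (sigma_0, sigma_f, sigmas), where sigmas[t] is the radius at iteration t.
    """
    sigma_0 = np.sqrt(-(x_max ** 2 + y_max ** 2) / (2.0 * np.log(h0)))  # Eq. (45)
    sigma_f = np.sqrt(-1.0 / (2.0 * np.log(hf)))                        # Eq. (46)
    t = np.arange(n_iter + 1)
    sigmas = sigma_0 * (sigma_f / sigma_0) ** (t / n_iter)              # Eq. (44)
    return sigma_0, sigma_f, sigmas
```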

Specifically for the models SBSOM-CM\(\textbf{dd}\), MBSOM-CM\(\textbf{dd}\)-L, and MBSOM-CM\(\textbf{dd}\)-G, after some trials based on the trade-off between TE and SIL, the cardinal q of the set-medoids was fixed to 5. In turn, after some trials, the parameter n, which controls the level of smoothness of the distribution of prototype weights among all the objects in each of the clusters, was fixed to 1.1 for the models SBSOM-MM\(\textbf{dd}\), MBSOM-MM\(\textbf{dd}\)-L, and MBSOM-MM\(\textbf{dd}\)-G based also on the trade-off between TE and SIL. The experiments were repeated and randomly initialized 10 times. For each loop, 50 iterations were performed.

Finally, to assess the degree of agreement between an a priori partition and a partition provided by the SOM algorithms, we used two external validity indices computed on datasets with labeled instances: the F-measure (Manning et al., 2008) and the Normalized Mutual Information (NMI) (Cover & Thomas, 2006). See Sect. 1 of the supplementary material for further information.

4 Experimental analysis and discussion

This section assesses the performance of the proposed multi-view SOM algorithms, namely MBSOM-MM\(\textbf{dd}\)-L, MBSOM-MM\(\textbf{dd}\)-G, MRBSOM-L, and MRBSOM-G, compared with existing multi-view SOM algorithms from the literature (Dantas & Carvalho, 2011): MBSOM-CM\(\textbf{dd}\)-L and MBSOM-CM\(\textbf{dd}\)-G. In addition, we compared them with the most related single-view methods SBSOM-CM\(\textbf{dd}\) (Golli et al., 2005), SBSOM-MM\(\textbf{dd}\) (Mariño & Carvalho, 2020), and SRBSOM (Hasenfuss & Hammer, 2007).

The algorithms are compared according to F-measure, NMI, TE, and SIL. Table 14 presents the average rank of each compared method on each metric. The average rank provides the criterion for evaluating the models: algorithms that achieve higher mean scores for F-measure, NMI, or Silhouette perform best, whereas algorithms with lower mean scores for Topographic Error outperform the others. Regarding average ranks, lower values imply better results. In this table, the best average rank provided by each index is in bold and italic.

According to Table 14, MRBSOM-G had the best performance regarding the average rank of F-measure, NMI, and TE. MBSOM-MM\(\textbf{dd}\)-L surpassed the other algorithms regarding the average rank of SIL. The multi-view variants improved on the single-view variants, and the proposed algorithms outperformed the multi-view benchmarks (set-medoids SOM) in most cases. The global-weighted algorithms outperformed the local-weighted algorithms regarding NMI and TE. Local-weighted algorithms performed best concerning SIL.

Table 14 Average Rank by index and algorithm

Based on the average ranks, we used Friedman's test (Friedman, 1940; Demšar, 2006) to test the null hypothesis that all models perform the same regarding the F-measure, NMI, TE, and SIL indices. The test reached p-values of \(9.899 \times 10^{-17}\) and \(3.422 \times 10^{-17}\) for TE and Silhouette, respectively. These p-values allow us to reject the null hypothesis that all the algorithms perform the same concerning these indices. To detect significant pairwise differences among all models, the Nemenyi test (Nemenyi, 1963; Demšar, 2006) was applied to the average ranks of TE and SIL, available in Table 14. For a significance level \(\alpha \), the test determines the critical difference (CD). If the difference between the average ranks of two algorithms is greater than the CD, the null hypothesis that the algorithms have the same performance is rejected.

Figures 1 and 2 show the CD plots visualizing the differences concerning the TE and SIL indices. Models that are not significantly different (with \(\alpha = 0.05\)) are connected (Demšar, 2006). The first positions in the figures indicate lower values of the average rank, hence the best algorithms. The critical difference (CD) determined by the Nemenyi post-test was 3.211.
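For reference, the CD of the Nemenyi test is \(CD = q_{\alpha}\sqrt{k(k+1)/(6N)}\), where k is the number of compared algorithms and N the number of datasets (Demšar, 2006); with the 9 algorithms and 14 datasets considered here, and \(q_{0.05}=3.102\), the short computation below reproduces the reported value.

```python
import numpy as np

# Nemenyi critical difference: CD = q_alpha * sqrt(k * (k + 1) / (6 * N))
k = 9            # number of compared algorithms
n_data = 14      # number of datasets
q_alpha = 3.102  # studentized range statistic for k = 9 and alpha = 0.05 (Demšar, 2006)
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_data))
print(round(cd, 3))  # 3.211
```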

Concerning TE, the algorithms MRBSOM-G, MRBSOM-L, and SRBSOM significantly improved on the algorithms SBSOM-CM\(\textbf{dd}\) and MBSOM-CM\(\textbf{dd}\)-L. Local-weighted algorithms performed better concerning SIL: the algorithms MBSOM-MM\(\textbf{dd}\)-L and MBSOM-CM\(\textbf{dd}\)-L presented significant improvements over SBSOM-MM\(\textbf{dd}\), which had the worst performance in this case. Finally, the proposed multi-view SOM algorithms presented better performance than the existing multi-view SOM algorithms; however, these improvements are not always statistically significant.

Fig. 1

Comparison of all models against each other through the Nemenyi test according to TE with \(\alpha = 0.05\)

Fig. 2

Comparison of all models against each other through the Nemenyi test according to SIL with \(\alpha = 0.05\)

Dermatology dataset

Next, we provide more detailed results for the Dermatology dataset (Dua & Graff, 2017) to demonstrate the usefulness of the proposed algorithms. Figure 3 displays the distribution of the objects over the 9 nodes of the \(3 \times 3\) grid provided by the proposed MBSOM-MM\(\textbf{dd}\)-G algorithm on the Dermatology dataset. Each node represents a cluster (a neuron). The size of each circle is proportional to the number of objects in the cluster, and its total area is divided among the areas corresponding to the a priori classes.

Fig. 3

SOM maps provided by the proposed method MBSOM-MM\(\textbf{dd}\)-G for the Dermatology dataset

In this figure, the map tends to mix the data of patients diagnosed with seboreic dermatitis and pityriasis rosea. This might be linked to both diseases presenting very similar manifestations. Similar behavior was observed in all other maps. Furthermore, this map, and overall those produced by the proposed methods, tend to better cluster data from patients previously diagnosed with pityriasis rubra pilaris, which are less well represented in the dataset. Finally, compared to the set-medoids SOM algorithms on this dataset, the clusters generated by the proposed MBSOM-MM\(\textbf{dd}\)-G algorithm (as well as by the other proposed methods) are more homogeneous.

If the a priori number of classes corresponds to the true and unknown number of clusters, the proposed SOM algorithms, even when they run with a number of neurons (clusters) larger than the number of clusters (classes), tend to be able to find the true number of clusters. Accordingly, the proposed algorithms tend to leave some clusters empty, and thus produce a better mapping, from a clustering point of view. The empty nodes help separate the main groups of diseases. For more details, Sect. 4 of the Supplementary Material describes this dataset and presents the learned maps provided by the SOM algorithms.

Table 15 shows the results for the internal and external indices computed for the solution with the best objective function value over the 10 executions for the Dermatology dataset. In this table, regarding each metric, the best score is highlighted in bold; the result is followed by the rank, which is shown in italic; and the best rank among all methods is shown in bold and italic.

Table 15 Results for the Dermatology dataset

Concerning F-measure and NMI, the multi-medoids SOM algorithms had the best performance and the relational SOM algorithms had the second best performance. Moreover, the set-medoids SOM algorithms had the worst performance.

Overall, the multi-view algorithms outperformed the single-view algorithms within the same family, except for MBSOM-MM\(\textbf{dd}\)-G, which is outperformed by SBSOM-MM\(\textbf{dd}\) in terms of F-measure, and MRBSOM-L, which is outperformed by SRBSOM in terms of NMI. Regarding TE and SIL, the MRBSOM-L and MRBSOM-G models achieved better results than the SRBSOM model. MBSOM-MM\(\textbf{dd}\)-L and MBSOM-MM\(\textbf{dd}\)-G outperformed SBSOM-MM\(\textbf{dd}\) concerning the SIL index, and MBSOM-CM\(\textbf{dd}\)-L and MBSOM-CM\(\textbf{dd}\)-G outperformed SBSOM-CM\(\textbf{dd}\) regarding the same index. Finally, the algorithms with the worst overall performance were SBSOM-CM\(\textbf{dd}\), MBSOM-CM\(\textbf{dd}\)-L, and MBSOM-CM\(\textbf{dd}\)-G. The proposed methods with the best performance were MBSOM-MM\(\textbf{dd}\)-L and MRBSOM-G.

In conclusion, regarding the overall mean ranking, the multi-view algorithms outperformed the single-view algorithms. The multi-medoids and relational SOM surpassed the set-medoids SOM.

For illustrative purposes, Table 16 shows the vectors of relevance weights locally and globally estimated for the solution with the best objective function value over the 10 executions for the Dermatology dataset. The view with the highest relevance to each of the globally weighted approaches and the most relevant view for the clusters concerning each one of the locally weighted approaches are highlighted in boldface in this table. Among all methods, View 2 had the greatest influence on the outputs. However, concerning the local-weighted algorithms, View 1 had the greatest influence on defining clusters 4 and 6 produced by the MBSOM-CM\(\textbf{dd}\)-L, as well as on defining cluster 2 produced by the MBSOM-MM\(\textbf{dd}\)-L and clusters 1 and 2 produced by the MRBSOM-L algorithm.

Table 16 Dermatology dataset: vectors of relevance weights

5 Final remarks

The paper proposes two new families of batch SOM algorithms for multi-view dissimilarity data: multi-medoids SOM and relational SOM. Both families of algorithms are designed to provide a crisp partition, to learn the relevance weight of each dissimilarity matrix, and to provide a representative for each cluster, based on a suitable objective function. The goal of the proposed algorithms is to preserve the topological properties of the data on the map. The relevance of each dissimilarity matrix can be computed locally, for each cluster, in the MBSOM-MM\(\textbf{dd}\)-L and MRBSOM-L methods, or globally, for the whole partition, in the MBSOM-MM\(\textbf{dd}\)-G and MRBSOM-G methods. MBSOM-MM\(\textbf{dd}\)-L and MBSOM-MM\(\textbf{dd}\)-G consider the cluster representatives as vectors of weights whose components measure how much each object is weighted as a medoid in a given cluster. In MRBSOM-L and MRBSOM-G, each cluster representative is a normalized linear combination of the objects represented in the description space.

Experiments with 14 datasets were performed by means of similar parametrizations for an efficient evaluation. The comparison was established using the multi-view SOM algorithms MBSOM-CM\(\textbf{dd}\)-G and MBSOM-CM\(\textbf{dd}\)-L (Dantas & Carvalho, 2011), as well as the single-view SOM algorithms SBSOM-CM\(\textbf{dd}\) (Golli et al., 2005), SRBSOM (Hasenfuss & Hammer, 2007), and SBSOM-MM\(\textbf{dd}\) (Mariño & Carvalho, 2020).

We tested the null hypothesis that all models perform the same; the non-parametric Friedman's test, together with the Nemenyi post-test, indicated that the observed differences in the performance measures are not merely random.

Our findings indicate that the proposed models presented better performance than the benchmarks, although the improvements were not always statistically significant. In most cases, MRBSOM-G, MRBSOM-L, SRBSOM (Hasenfuss & Hammer, 2007), MBSOM-MM\(\textbf{dd}\)-G, and MBSOM-MM\(\textbf{dd}\)-L outperformed the other methods. Specifically, the methods MRBSOM-G, MRBSOM-L, and SRBSOM (Hasenfuss & Hammer, 2007) significantly improved on the methods SBSOM-CM\(\textbf{dd}\) (Golli et al., 2005) and MBSOM-CM\(\textbf{dd}\)-L concerning TE. In addition, we provide more detailed results for the Dermatology dataset, for which the proposed methods were the best-ranked.

In conclusion, the multi-view models outperformed the single-view models. Moreover, the multi-medoids and relational SOM algorithms performed better than the set-medoids SOM algorithm. In further research, we aim to extend this study considering prior and new on-line versions of the proposed SOM algorithms.