1 Introduction

According to Rosch [24, 25], prototypes are those elements that represent a category better than others. The degree of representativeness can be measured using a distance function to a salient entity of the category, i.e., a prototype [13]. Prototypes can be observed or unobserved (abstract), and they can be represented by a single value or by interval-valued variables. Several numerical techniques to find prototypes in a given multivariate dataset have been proposed in the statistical literature based on different criteria. The most widely used techniques are generally based on non-hierarchical clustering algorithms [11, 18, 27], although many other approaches can be adopted [19, 20, 22].

This paper proposes a two-step procedure based on consensus clustering (CC) to find prototypes within multidimensional data. The first step defines two partitions of the N × J data matrix X into K groups, where K is assumed to be known; the second step finds the correspondence between these two partitions and defines the final partition as the compromise between them [1, 7, 17, 29].

This work presents a new approach to finding a set of K prototypes through CC, pairing the partitions obtained via two different partitioning methods: fuzzy c-means (FCM) [5] and archetypal analysis (AA) [2, 8]. The former seeks K homogeneous groups vis-à-vis their barycenters, while the latter identifies a set of K extreme points, called archetypes, and creates a group around each archetype. Formally, AA minimizes the sum of distances between each point and a set of K archetypes, defined as convex combinations of extreme points. K is either given or selected by running the algorithm for different values of K and choosing the desired value according to the most commonly used methods, which are generally based on graphical displays.

The paper is structured as follows: Sect. 2 presents a brief background on CC, FCM, and AA; Sect. 3 then demonstrates a simulation study in order to examine the reliability of the method. Finally, Sect. 4 reports on the results of an application using real data.

2 Methodology

Given a multivariate dataset and two or more partitioning criteria, consensus analysis aims to find a compromise in the set of the partitions [17]. Consensus clustering aims for the same goal among two or more partitions obtained via cluster analysis approaches. Taking into account the final aim of the analysis, the researcher chooses a consensus measure. The present proposal considers two fuzzy approaches as partitioning methods that optimize two different criteria, given the number of groups K: fuzzy c-means (FCM) and archetypal analysis (AA). Proposed by Bezdek et al. [6], the former aims to maximize the homogeneity within the K groups, while the latter, proposed by Cutler and Breiman [8], identifies the K groups with respect to a set of K extreme points, called archetypes, and aims to maximize the heterogeneity among the K groups.

FCM and AA can be defined in terms of a factorization problem of the data matrix X under different constraints. Formally, let X be a generic N × J data matrix, and let P be an unknown K × J prototypes matrix; FCM and AA are based on the solution of the following non-negative factorization problem [4]:

$$\displaystyle{ f(\mathbf{Y},\mathbf{P}) =\arg \min _{\mathbf{Y,P}}\left \|\mathbf{X - YP}\right \|_{2}^{2},\; }$$
(1)

where the notation \(\left \|.\right \|_{2}^{2}\) denotes the squared L2 norm, Y is the generic N × K memberships matrix, and P refers to the matrix of the centers in the FCM context and to the archetypes matrix in the AA context. In order to avoid any confusion in the remainder of this paper, the Y matrix will be referred to as \(\boldsymbol{\varGamma }\) and \(\boldsymbol{\varDelta }\), and the P matrix as C and A, when we refer to the FCM or to the AA, respectively. The generic element \(y_{ik}\) varies in [0, 1] and represents the membership degree of the generic unit \(\mathbf{x}_{i}\) to the generic prototype \(\mathbf{p}_{k}\).

2.1 Fuzzy c-Means

Both the fuzzy c-means clustering method [5, 6] and the traditional k-means method minimize the sum of the weighted squared distances between the N units and the K centers. Formally, given an N × J data matrix X, FCM minimizes the objective function shown in Eq. (2).

$$\displaystyle{ f(\varGamma,\mathbf{C}) = \left \|\mathbf{X} -\varGamma \mathbf{C}\right \|_{2}^{2},\; }$$
(2)

where Γ represents the memberships matrix with elements \(\gamma_{ik}\). The function in Eq. (2) is minimized under the constraints \(\sum_{k=1}^{K}\gamma_{ik} = 1\) and \(\gamma_{ik} \geq 0\). The elements \(\gamma_{ik}\) of the Γ matrix are defined according to Eq. (3), while the C matrix is defined according to Eq. (4).

$$\displaystyle{ \gamma _{ik} = \left (\sum _{k^{{\prime}}=1}^{K}\left ( \frac{\left \|\mathbf{x}_{i} -\mathbf{c}_{k}\right \|_{2}} {\left \|\mathbf{x}_{i} -\mathbf{c}_{k^{{\prime}}}\right \|_{2}}\right )^{ \frac{2} {m-1} }\right )^{-1},\; }$$
(3)
$$\displaystyle{ \mathbf{C} = (\varGamma ^{T}\varGamma )^{-1}\varGamma ^{T}\mathbf{X}\; }$$
(4)

Note that m is the fuzzifier parameter, commonly set to 2 [6]. Substituting Eq. (4) into Eq. (2), the objective function becomes:

$$\displaystyle{ f(\varGamma ) = \left \|\mathbf{X} -\varGamma (\varGamma ^{T}\varGamma )^{-1}\varGamma ^{T}\mathbf{X}\right \|_{ 2}^{2}\; }$$
(5)

Then, once the number of groups K is fixed, the FCM algorithm runs through the following steps [6, 9, 30]; a minimal code sketch is given after the list:

  1. Randomly initialize the cluster centers \(\mathbf{C}^{(t)}\) and set t = 0;

  2. Calculate \(\gamma_{ik}\) using Eq. (3);

  3. Calculate \(\mathbf{C}^{(t+1)}\) using Eq. (4);

  4. If \(\left \|\mathbf{C}^{(t)} -\mathbf{C}^{(t+1)}\right \|_{2}^{2} \leq \epsilon\), go to Step 5; otherwise, set \(\mathbf{C}^{(t)} = \mathbf{C}^{(t+1)}\) and t = t + 1, and go to Step 2;

  5. Print the centers matrix C and the membership matrix Γ;

  6. Stop.
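For concreteness, the listing below gives a minimal Python sketch of these steps. It is an illustrative implementation, not the authors' code: the function name fuzzy_c_means and its default settings are our own assumptions, and the center update follows the least-squares form of Eq. (4) rather than the classical fuzzified-weight update.

```python
import numpy as np

def fuzzy_c_means(X, K, m=2.0, eps=1e-6, max_iter=200, seed=0):
    """Return the N x K membership matrix Gamma and the K x J centers C."""
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    # Step 1: initialize the centers with K randomly chosen units.
    C = X[rng.choice(N, size=K, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: membership degrees, Eq. (3).
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # N x K distances
        d = np.fmax(d, 1e-12)                                      # guard against zero distances
        Gamma = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        # Step 3: center update as in Eq. (4), C = (Gamma' Gamma)^{-1} Gamma' X
        # (the classical FCM update would weight the units by Gamma**m instead).
        C_new = np.linalg.lstsq(Gamma, X, rcond=None)[0]
        # Step 4: stop when the centers are stable.
        if np.sum((C_new - C) ** 2) <= eps:
            return Gamma, C_new
        C = C_new
    return Gamma, C
```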

2.2 Archetypal Analysis

The term archetype is used in the literature with different meanings. In the prototyping approach, the challenge is to find a few points (archetypes), not necessarily observed, in a set of multivariate observations such that all the data can be well represented as convex combinations of the archetypes.

Formally, given an N × J data matrix X, archetypal analysis [8, 10, 12] finds a set of archetypes \(\{\mathbf{a}_{1},\ldots,\mathbf{a}_{K}\}\) that are linear combinations of the data points, as shown in Eq. (6).

$$\displaystyle{ \mathbf{A} = \mathbf{BX},\; }$$
(6)

where B is the K × N coefficients matrix with \(\sum_{i=1}^{N}\beta_{ki} = 1\) and \(\beta_{ki} \geq 0\), such that the archetypes resemble the data as convex mixtures. For a given choice of archetypes, AA minimizes the objective function shown in Eq. (7).

$$\displaystyle{ f(\varDelta,\mathbf{A}) = \left \|\mathbf{X} -\varDelta \mathbf{A}\right \|_{2}^{2},\; }$$
(7)

under the constraints \(\sum_{k=1}^{K}\delta_{ik} = 1\) and \(\delta_{ik} \geq 0\). Substituting Eq. (6) into Eq. (7), the objective function becomes:

$$\displaystyle{ f(\varDelta,\mathbf{B}) = \left \|\mathbf{X} -\varDelta \mathbf{BX}\right \|_{2}^{2}.\; }$$
(8)

Once the number of groups K is fixed, the AA algorithm then runs through the following steps [2, 8, 12]; a minimal code sketch is given after the list:

  1. Randomly initialize the matrix \(\mathbf{B}^{(t)}\) and set t = 0;

  2. Find the coefficients matrix \(\boldsymbol{\varDelta}^{(t)}\) by solving the problem in Eq. (8) under the constraints \(\delta_{ik} \geq 0\) and \(\sum_{k=1}^{K}\delta_{ik} = 1\);

  3. Given the coefficients \(\delta_{ik}^{(t)}\), compute the intermediate archetypes by solving Eq. (8) for \(\mathbf{A}^{(t)}\);

  4. Minimize Eq. (8) over B under the constraints \(\sum_{i=1}^{N}\beta_{ki} = 1\) and \(\beta_{ki} \geq 0\);

  5. Set t = t + 1, \(\mathbf{B}^{(t)} = \mathbf{B}^{(t+1)}\), and calculate \(\mathbf{A}^{(t)} = \mathbf{B}^{(t)}\mathbf{X}\);

  6. Compute the objective function and, unless it falls below a given threshold, return to Step 2;

  7. Stop.
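The listing below gives a minimal Python sketch of this alternating scheme. It is again an assumed implementation, not the authors' code: the names archetypal_analysis and _simplex_ls are hypothetical, and the sum-to-one constraints of Steps 2 and 4 are handled with the common penalty-row device combined with non-negative least squares.

```python
import numpy as np
from scipy.optimize import nnls

def _simplex_ls(F, G, M=200.0):
    """For each row g of G, find w >= 0 with sum(w) = 1 minimizing ||g - w F||^2.
    The sum-to-one constraint is imposed through a large penalty row appended to F;
    M should dominate the scale of the data."""
    K = F.shape[0]
    Fp = np.vstack([F.T, M * np.ones(K)])          # (J + 1) x K design matrix
    W = np.empty((G.shape[0], K))
    for i, g in enumerate(G):
        W[i], _ = nnls(Fp, np.append(g, M))
    return W

def archetypal_analysis(X, K, eps=1e-6, max_iter=100, seed=0):
    """Return the N x K coefficients Delta and the K x J archetypes A = B X."""
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    # Step 1: initialize B so that the starting archetypes are K random units.
    B = np.zeros((K, N))
    B[np.arange(K), rng.choice(N, size=K, replace=False)] = 1.0
    A = B @ X
    rss_old = np.inf
    for _ in range(max_iter):
        # Step 2: Delta given the current archetypes (convex coefficients per unit).
        Delta = _simplex_ls(A, X)
        # Step 3: intermediate archetypes, solving Eq. (8) for A.
        A_tilde = np.linalg.lstsq(Delta, X, rcond=None)[0]
        # Step 4: B given the intermediate archetypes.
        B = _simplex_ls(X, A_tilde)
        # Step 5: updated archetypes.
        A = B @ X
        # Step 6: stop when the objective of Eq. (8) no longer decreases.
        rss = np.sum((X - Delta @ A) ** 2)
        if rss_old - rss <= eps:
            break
        rss_old = rss
    return Delta, A
```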

Note that the matrices \((\boldsymbol{\varGamma }^{T}\boldsymbol{\varGamma })^{-1}\boldsymbol{\varGamma }^{T}\) and B play the same role, i.e., they project the single points into a K-dimensional space. In fact, defining \((\boldsymbol{\varGamma }^{T}\boldsymbol{\varGamma })^{-1}\boldsymbol{\varGamma }^{T} = \mathbf{B}^{{\ast}}\), it is possible to show that the objective functions of FCM (9) and AA (10) optimize an equivalent criterion.

$$\displaystyle\begin{array}{rcl} f(\varGamma,\mathbf{B}^{{\ast}})& =& \left \|\mathbf{X} -\varGamma \mathbf{B}^{{\ast}}\mathbf{X}\right \|_{ 2}^{2},\;{}\end{array}$$
(9)
$$\displaystyle\begin{array}{rcl} f(\varDelta,\mathbf{B})& =& \left \|\mathbf{X} -\varDelta \mathbf{BX}\right \|_{2}^{2}.\;{}\end{array}$$
(10)

2.3 Consensus Clustering

In this paper, the consensus clustering procedure is structured in two fundamental steps: (1) finding and representing the consensus between the fuzzy c-means and archetypal analysis partitions through correspondence analysis [3, 15, 16] and (2) measuring the consensus through the principal indices of CC.

Let X be an N × J data matrix, and let \(T = \{T_{1},\ldots,T_{R}\}\) and \(V = \{V_{1},\ldots,V_{C}\}\) be two partitions of X: the consensus between partitions T and V is found starting from the entries of the contingency table obtained by cross-classifying the two partitions (Table 1) [17].

Table 1 Cross-table between partition T and partition V

Many proposals have been put forth in the literature for measuring consensus, including Boulis and Ostendorf [7], Fowlkes and Mallows [14], Hubert and Arabie [17], Steinley [28], Strehl and Ghosh [29], and Yeung and Ruzzo [31]. Among these options, this paper uses the Adjusted Rand Index (ARI) [23, 26]. The ARI was first proposed in this context by Hubert and Arabie [17]; the index assumes a generalized hypergeometric distribution as the null hypothesis, under which the two clusterings are drawn at random with a fixed number of clusters and a fixed number of elements in each cluster. The ARI is then the normalized difference between the Rand Index and its expected value under the null hypothesis, and it is defined as shown in Eq. (11).

$$\displaystyle{ ARI = \frac{\sum _{r=1}^{R}\sum _{c=1}^{C}\binom{n_{rc}}{2} -\binom{n}{2}^{-1}\sum _{r=1}^{R}\binom{n_{r.}}{2}\sum _{c=1}^{C}\binom{n_{.c}}{2}} {\frac{1} {2}\left [\sum _{r=1}^{R}\binom{n_{r.}}{2} +\sum _{ c=1}^{C}\binom{n_{.c}}{2}\right ] -\binom{n}{2}^{-1}\sum _{r=1}^{R}\binom{n_{r.}}{2}\sum _{c=1}^{C}\binom{n_{.c}}{2}}.\; }$$
(11)

ARI has an expected value of 0 for independent clusterings and a maximum value of 1 for identical clusterings.
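As an illustration, the short sketch below (our own, not part of the paper) cross-tabulates two hard partitions, obtained here by assigning each unit to its highest membership degree, and computes the ARI of Eq. (11) from the resulting contingency table; the function name adjusted_rand_index is hypothetical, and sklearn.metrics.adjusted_rand_score provides an equivalent ready-made computation.

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_t, labels_v):
    """ARI between two labelings, computed from their contingency table (Table 1)."""
    _, t_idx = np.unique(labels_t, return_inverse=True)
    _, v_idx = np.unique(labels_v, return_inverse=True)
    n_rc = np.zeros((t_idx.max() + 1, v_idx.max() + 1), dtype=int)
    np.add.at(n_rc, (t_idx, v_idx), 1)                  # cross-classification counts
    n = int(n_rc.sum())
    sum_cells = sum(comb(int(x), 2) for x in n_rc.ravel())
    sum_rows = sum(comb(int(x), 2) for x in n_rc.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in n_rc.sum(axis=0))
    expected = sum_rows * sum_cols / comb(n, 2)         # expectation under the null hypothesis
    max_index = 0.5 * (sum_rows + sum_cols)
    return (sum_cells - expected) / (max_index - expected)

# Hard partitions from the fuzzy memberships (an assumption on our part):
# labels_fcm = Gamma.argmax(axis=1); labels_aa = Delta.argmax(axis=1)
# ari = adjusted_rand_index(labels_fcm, labels_aa)
```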

3 Simulation Study

This section presents an application to simulated data; in particular, it evaluates the consensus between fuzzy c-means and archetypal analysis under different experimental conditions.

Following Fordellone and Palumbo [13], data were generated from four multivariate Gaussian distributions, each with four dimensions. The first is a multivariate Gaussian distribution with μ = [0, 0, 0, 0]T and \(\pmb{\varSigma }= \mathbf{I}\) (i.e., noise); the last three are multivariate Gaussian distributions that simulate three groups of units according to the experimental conditions shown in Table 2. The groups follow the scheme shown below; a minimal data-generation sketch in code is given after the list:

Group 1::

    \(\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu } = [-20,\ 10,\ 30,\ 15]^{T},\pmb{\varSigma })\)

Group 2::

    \(\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu } = [0,\ 20,\ 15,\ -5]^{T},\pmb{\varSigma })\)

Group 3::

    \(\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu } = [15,\ 5,\ -7,\ 20]^{T},\pmb{\varSigma })\)
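A minimal sketch of this data-generation scheme is shown below. The group sizes and the baseline covariance \(\pmb{\varSigma } = \mathbf{I}\) are our own assumptions; the correlation and kurtosis manipulations of Table 2 are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, J = 100, 4            # assumed group size (not stated in the text)

group_means = np.array([[-20, 10, 30, 15],
                        [  0, 20, 15, -5],
                        [ 15,  5, -7, 20]], dtype=float)
Sigma = np.eye(J)                  # baseline condition; Table 2 varies correlation and kurtosis

noise = rng.multivariate_normal(np.zeros(J), np.eye(J), size=n_per_group)
groups = [rng.multivariate_normal(mu, Sigma, size=n_per_group) for mu in group_means]
X = np.vstack([noise] + groups)                      # N x J data matrix
true_labels = np.repeat(np.arange(4), n_per_group)   # 0 = noise, 1-3 = simulated groups
```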

Table 2 Consensus results from simulated data

Table 2 also shows the consensus results obtained under the eight experimental conditions; in particular, the Rand and Adjusted Rand indices are reported, which measure the agreement/disagreement (i.e., the consensus) between two different partitions (FCM and AA in this case). It is worth noting that the maximum consensus was achieved in the first two experimental conditions, where there was a low correlation between the variables and a normal kurtosis level, whereas the minimum level of consensus occurred in the experimental conditions with platykurtic distributions (rows 3, 4, 7, and 8 of the table).

The aim of the simulation study was to establish the degree of reliability of consensus prototyping under several different hypotheses. In the eight proposed cases, the lowest levels of consensus occurred most notably when platykurtic distributions were present; this is because a low level of kurtosis favors the presence of outlying points, and AA is very sensitive to extreme points.

4 Finding Personality Traits

In psychology, trait theory (also called dispositional theory) is an approach to the study of human personality. Trait theorists are primarily interested in the measurement of traits, which can be defined as habitual patterns of behavior, thought, and emotion [21, 22].

Consensus clustering between fuzzy c-means and archetypal analysis is used in this section to delineate personality traits (http://personality-testing.info/) in a sample of 898 adults (493 males and 405 females). The dataset consists of 40 ordinal items (from “Strongly disagree” to “Strongly agree”), with ten items for each of four numeric measurement scales. The four scales come from an experimental “DISC” personality test from the International Personality Item Pool (http://ipip.ori.org/newCPIKey.htm):

  • Assertiveness (AS): the quality of being self-assured and confident without being aggressive;

  • Social confidence (SC): generally described as a state of being certain about something;

  • Adventurousness (AD): represented by a desire to engage in activities with some potential for physical danger;

  • Dominance (DO): conceptualized as a measure of individual differences in levels of group-based discrimination.

The remainder of this section shows the results obtained by applying the consensus-prototyping approach to the four quantitative measurement scales. The scales were computed by applying principal component analysis (PCA) to the available items. The number of groups (K = 3) was chosen according to the FCM and AA scree-plots shown in Fig. 1, which display the reduction in the objective functions for different values of K.

Fig. 1 Scree-plots of the fuzzy c-means and archetypal analysis approaches for different numbers of groups

The results of CC are shown in the consensus table (Table 3). A correspondence analysis is also applied to the consensus table to illustrate the consensus level graphically; a sketch of this step is given after Fig. 2. The results are displayed in Fig. 2, where the corresponding groups of the two partitions appear very close to each other along the directions of maximum inertia. Moreover, as Table 3 shows, the consensus prototypes represent roughly 76% of the sample; the Adjusted Rand Index equals 0.499, while the Rand Index equals 0.768.

Table 3 Consensus table between the fuzzy c-means and archetypal analysis approaches
Fig. 2 Correspondence analysis applied to contingency Table 3
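As an illustration of this correspondence-analysis step, the sketch below computes row and column principal coordinates of a contingency table such as Table 3 through the usual SVD of the standardized residuals. It is an assumed implementation, not the authors' code, and the function name correspondence_analysis is hypothetical.

```python
import numpy as np

def correspondence_analysis(N_table):
    """Row and column principal coordinates of a contingency table."""
    P = N_table / N_table.sum()                            # correspondence matrix
    r = P.sum(axis=1)                                      # row masses
    c = P.sum(axis=0)                                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))     # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * s) / np.sqrt(r)[:, None]                      # row principal coordinates
    G = (Vt.T * s) / np.sqrt(c)[:, None]                   # column principal coordinates
    return F, G, s ** 2                                    # s**2 are the principal inertias
```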

Figure 3 shows the defined groups and the distributions of the individual scales. The different colors (purple, blue, and red) and the different symbols (•, △, and +) of the points represent the three prototypes. The white points in Fig. 3 represent the observations without consensus.

Fig. 3 Scatterplots of the three prototypes computed via consensus clustering

Due to space considerations, we cannot discuss in depth the profiles of the prototypes that were found. Looking at the scatterplot matrix shown in Fig. 3, however, we can state that prototype 1 (the red • symbol) is characterized by a high level of the AD scale; prototype 2 (the blue △ symbol) presents low values on the SC, DO, and AS scales; and the last prototype (the purple + symbol) is characterized by high levels of the SC and AS scales and a low level of AD.

5 Concluding Remarks

The empirical results presented in this paper lead us to argue that when the groups are well defined (thus avoiding any overlap), consensus clustering between the two different partitioning methods (i.e., fuzzy c-means and archetypal analysis) clarifies the presence of well-definable prototypes. The simulation study was helpful in appreciating which factors can deeply affect the consensus between the two approaches: the platykurtic level and the presence of multivariate outliers in the data greatly affected the performance of the consensus analysis, while high correlation among the variables also worsened performance, though to a lesser extent than the previous perturbing effects.