
1 Introduction

Compositional data are by nature multivariate. Indeed, vectors with positive components are considered as compositional data when the interest lies in the relative information between their components: this implies that they can be represented by a unique element of a simplex by dividing each component by their sum. Classical statistical techniques need to be adapted to deal with the resulting constraints (positivity, sum equal to one). A common approach consists in transforming the data using the centered log-ratio (clr) or the isometric log-ratio (ilr) transformations (see Egozcue et al. 2011) and in applying standard techniques in this coordinate space. Filzmoser et al. (2012) propose to use the ilr transformation and detect outliers with the usual or the robust version of the Mahalanobis distance. Because of the affine invariance property of the Mahalanobis distance, the authors notice that the identified outliers do not depend on the choice of the ilr transformation. Moreover, they propose some graphical tools in coordinate space based on robust principal component analysis (PCA) and biplot representations in order to interpret the outliers. Their interpretation is only done in coordinate space. This is also the case for Filzmoser et al. (2014), who propose tools based on pairwise Mahalanobis distances for detecting local outliers in data that are both compositional and spatial.

In the present work, we consider adapting the invariant coordinate selection (ICS) technique for outlier detection to compositional data. ICS is a multivariate statistical method based on the joint diagonalization of two scatter matrices and aimed at detecting interesting features in multivariate datasets such as outliers or clusters (see, e.g., Tyler et al. 2009 and Archimbaud et al. 2018a). Compared to the Mahalanobis distance criterion, ICS includes a dimension reduction step. Compared to PCA, the components of ICS are invariant under affine transformations. We first propose to introduce ICS in coordinate space using an ilr transformation. Following Archimbaud et al. (2018a), we focus on the case of a small proportion of outliers and use the invariant components associated with the largest eigenvalues of the joint diagonalization of two particular scatter matrices. As with the Mahalanobis distance, the identification of outliers with ICS does not depend on the choice of the ilr transformation (see also Muehlmann et al. 2021). In order to go beyond coordinate space and interpret the outliers in the simplex, we introduce new algebra tools and define eigen-elements of endomorphisms of the simplex. We also introduce a class of one-step M-scatter estimators and elliptical distributions in the simplex. Thanks to these tools, we are able to write a reconstruction formula that decomposes the data in the simplex in a way suited to outlier identification and interpretation using ternary diagrams.

In Sect. 2, we recall some facts about the ICS method and its application to outlier detection. Section 3 is a reminder about compositional data analysis. In Sect. 4, we develop some tools necessary for Sect. 5. First come some properties of the algebra of D × D matrices with the zero-sum property: in particular, their rank, their inverses, and their eigen-elements. Then Sect. 4.2 defines one-step M-scatter functionals for simplex-valued random variables together with an adapted version of the Mahalanobis distance. Finally, Sect. 4.3 introduces the family of elliptical distributions in the simplex. Section 5 first introduces ICS in coordinate space and then reformulates ICS directly in the simplex. In Sect. 5.3, we present a formula for reconstructing the data from ICS in coordinate space and in the simplex. Section 6 is dedicated to three applications, with two toy datasets (of small and large dimensions) and a real marketing application from the automobile industry.

2 Reminder About ICS and Outlier Detection

Invariant coordinate (or component) selection is a multivariate statistical method based on the simultaneous diagonalization of two scatter matrices. As detailed in Nordhausen and Ruiz-Gazen (2022), the method belongs to a large family of multivariate statistical methods and is useful in particular for outlier detection as described below.

2.1 Scatter Matrices

The family of scatter matrices generalizes the notion of covariance matrix (see Nordhausen and Tyler 2015; Tyler et al. 2009, among others), and it has the following functional definition. For a p-dimensional vector X with distribution function FX, a functional S(FX), also denoted by S(X), is called a scatter functional if it is a p × p symmetric positive-definite and affine equivariant matrix. We recall that an affine equivariant matrix S(X) is such that

$$\displaystyle \begin{aligned} \mathbf S(\mathbf A \mathbf X + \mathbf b) = \mathbf A \mathbf S(\mathbf X) \mathbf A^T, \end{aligned}$$

where T denotes the transpose operator, A is any full rank p × p matrix, and b any p-vector.

For a p-variate dataset Xn = (x1, …, xn)T, the empirical version S(Xn) of a scatter functional is the scatter functional S(Fn), where Fn is the empirical distribution function. Thus, a scatter matrix estimator is a p × p symmetric positive-definite and affine equivariant matrix such that

$$\displaystyle \begin{aligned} \mathbf S(\mathbf X_n \mathbf A + \mathbf 1_n \mathbf b^T) = \mathbf A^T \mathbf S(\mathbf X_n) \mathbf A, \end{aligned}$$

where A is any full rank p × p matrix, b any p-vector, and 1n an n-dimensional vector of ones.

There exist many scatter matrices, as detailed for example in Tyler et al. (2009). The best-known scatter matrix is the covariance matrix. Like many other scatter matrices, the covariance involves the mean, which is an affine equivariant location estimator. We recall that an affine equivariant location estimator T is such that:

$$\displaystyle \begin{aligned} \mathbf T(\mathbf A \mathbf X + \mathbf b) = \mathbf A \mathbf T(\mathbf X) + \mathbf b, \end{aligned}$$

for the functional version, and

$$\displaystyle \begin{aligned} \mathbf T(\mathbf X_n \mathbf A + \mathbf 1_n \mathbf b^T) = \mathbf A^T \mathbf T(\mathbf X_n) + \mathbf b, \end{aligned}$$

for the empirical version where A is any full rank p × p matrix and b any p-vector.

A general class of scatter matrices is the class of one-step M-estimators with a functional defined by

$$\displaystyle \begin{aligned} {\mathbf{COV}}_{w}(\mathbf X)= {{\mathbf E}}\left[w(M^2(\mathbf X))(\mathbf X - {\mathbf E}(\mathbf X))(\mathbf X - {\mathbf E}(\mathbf X))^T\right], \end{aligned}$$

where w is a non-negative and continuous weight function and

$$\displaystyle \begin{aligned} M^2(\mathbf X) = (\mathbf X - {\mathbf E}(\mathbf X))^T {\mathbf{COV}}(\mathbf X) ^{-1} (\mathbf X - {\mathbf E}(\mathbf X)) \end{aligned} $$
(1)

is the square Mahalanobis distance with E(X) the expectation of X and COV(X) its covariance matrix. The sample version of one-step M-estimators is

$$\displaystyle \begin{aligned} {\mathbf{COV}}_{w}(\mathbf X_n)=\frac{1}{n} \sum_{i=1}^n w(M^2(\mathbf x_i))(\mathbf x_i - \bar{\mathbf x}_n)(\mathbf x_i - \bar{\mathbf x}_n)^T, \end{aligned}$$

where \(\bar {\mathbf x}_n=1/n \sum _{i=1}^n \mathbf x_i\) is the empirical mean and

$$\displaystyle \begin{aligned} M^2(\mathbf x_i)=(\mathbf x_i - \bar{\mathbf x}_n)^T {\mathbf{COV}}(\mathbf X_n) ^{-1} (\mathbf x_i - \bar{\mathbf x}_n) \end{aligned}$$

is the empirical version of the square Mahalanobis distance.

Note that the covariance matrix COV is obtained with w(d) = 1, while the fourth-moment-based estimator COV4 is obtained with w(d) = d∕(p + 2). COV4 is widely used in the blind source separation literature (see, e.g., Nordhausen and Virta 2019; Theis and Inouye 2006) but also in the context of outlier detection (see Archimbaud et al. 2018a).
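To fix ideas, the sample versions above can be computed in a few lines of base R. The sketch below is only an illustration under the 1∕n convention used in this section; the function names (cov_w, cov4) are ours and do not refer to any existing package.

```r
# Sample version of the one-step M-estimator COV_w and of COV4
# (a minimal base R sketch; the function names are ours, not from an R package)
cov_w <- function(X, w = function(d) rep(1, length(d))) {
  X <- as.matrix(X)
  n <- nrow(X)
  xbar <- colMeans(X)
  Xc <- sweep(X, 2, xbar)                    # centered observations
  S <- crossprod(Xc) / n                     # covariance matrix (1/n convention)
  d2 <- rowSums((Xc %*% solve(S)) * Xc)      # squared Mahalanobis distances
  crossprod(Xc * w(d2), Xc) / n              # weighted covariance COV_w
}

# fourth-moment-based scatter: weight w(d) = d / (p + 2)
cov4 <- function(X) cov_w(X, w = function(d) d / (ncol(as.matrix(X)) + 2))
```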

For elliptical distributions with second moments, scatter functionals are all proportional to the covariance matrix (see, e.g., Bilodeau and Brenner 2008). We recall that an elliptical distribution is obtained as an affine transformation of a spherical distribution, that is, a distribution invariant under orthogonal transformations. The multivariate normal and Student distributions belong to this family of distributions.

2.2 ICS Principle

Let S1(X) and S2(X) be two scatter functionals. ICS consists of the simultaneous diagonalization of S1(X) and S2(X). If the random vector X follows an elliptical distribution, the two scatter matrices will be proportional, and the result will be uninformative. However, as mentioned in Tyler et al. (2009), comparing two different scatter functionals may help reveal interesting departures from an elliptical distribution. This is the case in particular for anomaly detection. The method searches for a p × p matrix H(X) and a diagonal matrix Λ(X) so that

$$\displaystyle \begin{aligned} \mathbf H(\mathbf X)^T \mathbf S_{1}(\mathbf X) \mathbf H(\mathbf X) = \mathbf I_p \;\; \mbox{ and } \;\; \mathbf H(\mathbf X)^T \mathbf S_{2}(\mathbf X) \mathbf H(\mathbf X) = \boldsymbol{\Lambda}(\mathbf X), \end{aligned} $$
(2)

where Ip denotes the p × p identity matrix. The matrix Λ(X) contains the eigenvalues of S1(X)−1S2(X) in decreasing order, while the columns of the matrix H(X) = (h1, …, hp) contain the corresponding eigenvectors so that

$$\displaystyle \begin{aligned} \mathbf S_{2}(\mathbf X) \mathbf H(\mathbf X) = \mathbf S_{1}(\mathbf X)\mathbf H(\mathbf X) \boldsymbol{\Lambda}(\mathbf X) \end{aligned}$$
$$\displaystyle \begin{aligned} \mbox{ or equivalently } \;\; \mathbf S_{1}(\mathbf X)^{-1} \mathbf S_{2}(\mathbf X) \mathbf H(\mathbf X) = \mathbf H(\mathbf X) \boldsymbol{\Lambda}(\mathbf X). \end{aligned}$$

These eigenvalues and eigenvectors can also be derived through the spectral decomposition of the following symmetric matrix:

$$\displaystyle \begin{aligned} \mathbf S_{1}(\mathbf X)^{-1/2} \mathbf S_{2}(\mathbf X)\mathbf S_{1}(\mathbf X)^{-1/2} = \mathbf U(\mathbf X) \boldsymbol{\Lambda}(\mathbf X) \mathbf U(\mathbf X)^T, \end{aligned} $$
(3)

with U(X) a p × p orthogonal matrix and the same eigenvalues in the diagonal matrix Λ(X). We have

$$\displaystyle \begin{aligned} \mathbf H(\mathbf X)=\mathbf S_{1}(\mathbf X)^{-1/2}\mathbf U(\mathbf X) \end{aligned}$$

and

$$\displaystyle \begin{aligned} \mathbf H(\mathbf X) \mathbf H(\mathbf X)^T=\mathbf S_{1}(\mathbf X)^{-1} \mbox{ and } \mathbf H(\mathbf X)^{-1}=\mathbf U(\mathbf X)^T \mathbf S_{1}(\mathbf X)^{1/2}. \end{aligned}$$

Tyler et al. (2009) give an interesting interpretation of the eigenvalues λ1, …, λp in terms of kurtosis. Using the optimality property of eigen-elements, it is easy to see that h1 maximizes the ratio:

$$\displaystyle \begin{aligned} \frac{\mathbf h^T \mathbf S_2(\mathbf X) \mathbf h}{\mathbf h^T \mathbf S_1(\mathbf X) \mathbf h} \end{aligned}$$

over all possible vectors h in \(\mathbb {R}^p\) and that λ1 is equal to the maximum ratio. This ratio of two scale measures can be viewed as a generalized measure of kurtosis, and λ1 can thus be interpreted as a maximum kurtosis. The other eigenvalues and eigenvectors can be defined in a similar way by maximizing the same ratio over vectors h that satisfy additional orthogonality conditions (see Tyler et al. (2009) for details).

Using any affine equivariant location estimator T(X), the ICS scores Z = (z1, …, zp)T are defined by

$$\displaystyle \begin{aligned} {} \mathbf Z = \mathbf H(\mathbf X)^T(\mathbf X- \mathbf T(\mathbf X)), \end{aligned}$$

or equivalently by zk = < hk, X − T(X) >, where < ., . > denotes the standard scalar product. The scores define the affine invariant coordinates or components. The square Euclidean norm of these coordinates is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbf Z^T \mathbf Z & = &\displaystyle (\mathbf X- \mathbf T(\mathbf X))^T \mathbf H(\mathbf X) \mathbf H(\mathbf X)^T (\mathbf X- \mathbf T(\mathbf X))\\ & = &\displaystyle (\mathbf X- \mathbf T(\mathbf X))^T \mathbf S_{1}(\mathbf X)^{-1} (\mathbf X- \mathbf T(\mathbf X)). \end{array} \end{aligned} $$

The last expression is a generalization of the square Mahalanobis distance (1) of X, with the location parameter T(X) (instead of E(X)) and with respect to the scatter matrix S1(X) (instead of COV(X)). In the special case where T(X) = E(X) and S1(X) = COV(X), we have

$$\displaystyle \begin{aligned} \mathbf Z^T \mathbf Z =\sum_{k=1}^p z_k^2= M^2(\mathbf X). \end{aligned} $$
(4)

The empirical version of ICS consists of the joint diagonalization of a pair of scatter estimators S1(Xn) and S2(Xn). It yields a p × p matrix H(Xn) and a diagonal matrix Λ(Xn) such that

$$\displaystyle \begin{aligned} \mathbf H(\mathbf X_n)^T \mathbf S_{1}(\mathbf X_n) \mathbf H(\mathbf X_n) = \mathbf I_p \;\; \mbox{ and } \;\; \mathbf H(\mathbf X_n)^T \mathbf S_{2}(\mathbf X_n) \mathbf H(\mathbf X_n) = \boldsymbol{\Lambda}(\mathbf X_n). \end{aligned}$$

Using any affine equivariant location estimator T(Xn), the ICS scores are given by

$$\displaystyle \begin{aligned} {} \mathbf Z_n =(\mathbf z_1,\ldots,\mathbf z_n)^T= (\mathbf X_n- \mathbf 1_n \mathbf T(\mathbf X_n)^T)\mathbf H(\mathbf X_n) \end{aligned}$$

and are affine invariant. As in (4), if \(T(\mathbf X_n)=\bar {\mathbf x}_n\) and S1(Xn) = COV(Xn), we have

$$\displaystyle \begin{aligned} M^2(\mathbf x_i)= \mathbf z_i^T \mathbf z_i. \end{aligned}$$
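The empirical joint diagonalization can be illustrated with the following base R sketch, which reuses cov_w() and cov4() from the previous sketch and takes the empirical mean as location estimator; it is a minimal illustration, not the implementation of the ICS or ICSOutlier packages.

```r
# Empirical ICS with the scatter pair S1 = COV and S2 = COV4
# (a minimal sketch reusing cov_w() and cov4() from the previous sketch)
ics_cov_cov4 <- function(X) {
  X <- as.matrix(X)
  S1 <- cov_w(X)                             # COV (weight identically 1)
  S2 <- cov4(X)                              # COV4
  e1 <- eigen(S1, symmetric = TRUE)
  S1_inv_sqrt <- e1$vectors %*% diag(1 / sqrt(e1$values)) %*% t(e1$vectors)
  e2 <- eigen(S1_inv_sqrt %*% S2 %*% S1_inv_sqrt, symmetric = TRUE)  # Eq. (3)
  H <- S1_inv_sqrt %*% e2$vectors            # columns h_1, ..., h_p
  Z <- sweep(X, 2, colMeans(X)) %*% H        # invariant coordinates, T = mean
  list(eigenvalues = e2$values, H = H, scores = Z)
}
```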

2.3 ICS for Outlier Detection

As already stated in Tyler et al. (2009), one possible application of ICS is outlier detection. The Mahalanobis distance is a well-known tool to detect outliers (see Rousseeuw and Van Zomeren 1990), but it does not offer the possibility of dimension reduction. ICS gives the possibility of selecting components that are helpful in detecting anomalies (see Archimbaud et al. (2018a) for details). In the case of a small proportion of outliers, the theoretical properties of ICS (see Archimbaud et al. (2018a) for details) lead us to focus only on the invariant components associated with the largest kurtosis and thus the largest eigenvalues. In this context, Archimbaud et al. (2018a) show that the scatter pair S1(X) = COV(X) and S2(X) = COV4(X) is not only simple and fast to compute but also effective in detecting outliers when compared to other pairs that involve robust scatter estimators. Archimbaud et al. (2018a) propose different automatic procedures for invariant component selection based on hypothesis testing. In short, the idea is to test sequentially the normality of each invariant component using classical tests such as the D’Agostino test and to select the first k components for which the Gaussian assumption is rejected. After selecting k invariant components among p, the last step of the procedure is outlier identification. Let us consider the empirical version of ICS. For each observation i = 1, …, n, the square “ICS distance” is the square Euclidean norm in the invariant coordinate system accounting for the first k coordinates:

$$\displaystyle \begin{aligned} \left(\mbox{ICS distance}\right)^2_{i,k}=\sum_{j=1}^k \left(z_i^j \right)^2, \end{aligned} $$
(5)

where \(z_i^j\) denotes the jth value of the score zi. In Archimbaud et al. (2018a), an observation is flagged as an outlier when its ICS distance using k components is larger than a cutoff based on Monte Carlo simulations from the standard Gaussian distribution. Given a data dimension, a scatter pair, and a number k of selected components, many Gaussian samples are generated, and the ICS distances are computed. A cutoff is derived for a fixed level γ as the mean of the (1 − γ)-quantiles of these distances over the replications. The whole ICS procedure for outlier detection is available in the R package ICSOutlier described in Archimbaud et al. (2018b) and used in Sect. 6 below.
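The following sketch illustrates the ICS distance of Eq. (5) and the Monte Carlo cutoff just described. It is a simplification of the procedure implemented in ICSOutlier (the function names are ours, and the default level gamma is only an example); it reuses ics_cov_cov4() from the sketch of Sect. 2.2.

```r
# ICS distances on the first k invariant components and a Monte Carlo cutoff
# (a simplified sketch of the procedure described above, not the ICSOutlier code)
ics_dist <- function(scores, k) sqrt(rowSums(scores[, 1:k, drop = FALSE]^2))

ics_cutoff <- function(n, p, k, gamma = 0.10, n_rep = 100) {
  q <- replicate(n_rep, {
    Xsim <- matrix(rnorm(n * p), n, p)       # standard Gaussian sample
    quantile(ics_dist(ics_cov_cov4(Xsim)$scores, k), 1 - gamma)
  })
  mean(q)                                    # mean of the (1 - gamma)-quantiles
}
```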

3 Reminder About Compositional Data Analysis

A D-part composition u is a vector of D parts (or shares) of some whole that carries relative information. There exists a unique representation of this vector in the unit simplex

$$\displaystyle \begin{aligned} {\mathbf{S}}^D = \left\{ \mathbf{u} = (u_1,\ldots,u_D)^T: u_m >0, m=1,...,D;\sum_{m=1}^D u_m =1 \right\}. \end{aligned}$$

For any vector \(\mathbf {w} \in {{\mathbb R}^{+D}},\) its representative in the simplex is obtained by the closure operation

$$\displaystyle \begin{aligned} \mathcal{C}(\mathbf{w}) = \left(\frac{w_1}{\sum_{m=1}^D w_m}, \cdots, \frac{w_D}{\sum_{m=1}^D w_m}\right). \end{aligned}$$

The following operations endow the unit simplex with a vector space structure:

  1.

    ⊕ is the perturbation operation, corresponding to the addition in \({\mathbb R}^D\):

    $$\displaystyle \begin{aligned} \text{For} \quad \mathbf{u},\mathbf{v} \in {\mathbf{S}}^D, \quad \mathbf{u} \oplus \mathbf{v} = \mathcal{C}(u_1v_1,\ldots,u_Dv_D). \end{aligned}$$
  2.

    ⊙ is the power operation, corresponding to the scalar multiplication in \({\mathbb R}^D\):

    $$\displaystyle \begin{aligned} \text{For} \quad \lambda \in {\mathbb R}, \mathbf{u} \in {\mathbf{S}}^D, \quad \lambda \odot \mathbf{u} = \mathcal{C}(u_1^\lambda,\ldots,u_D^\lambda). \end{aligned}$$

The subtraction operation can be naturally defined by \(\mathbf {u} \ominus \mathbf {v} = \mathcal {C}(u_1/v_1,\ldots , u_D/v_D)\). Compositional data analysis uses log-ratio transformations such as the centered log-ratio (clr) and the isometric log-ratio (ilr) transformations. The clr vector components specify the relative dominance of each compositional part over the whole composition (see, for example, Filzmoser et al. 2018). Formally, the clr transformation of a vector u ∈ SD is defined by

$$\displaystyle \begin{aligned} \text{clr}(\mathbf{u}) = {\mathbf{G}}_D \ln{\mathbf{u}}, \end{aligned}$$

where \( {\mathbf {G}}_D = {\mathbf {I}}_D - \frac {1}{D}{\mathbf {1}}_{D}{{\mathbf {1}}_{D}}^T\), ID is the D × D identity matrix, 1D is the D-vector of ones, and the logarithm of u ∈ SD is understood componentwise.

For a vector u in the orthogonal space \({\mathbf {1}}_D^\perp \) (orthogonality with respect to the standard scalar product of \({\mathbb R}^D\)), the inverse clr transformation is defined by

$$\displaystyle \begin{aligned} \text{clr}^{-1}(\mathbf{u}) = \mathcal{C}(\exp(\mathbf{u})). \end{aligned}$$

The simplex \(\mathcal {S}^D\) of dimension D − 1 can be equipped with the Aitchison scalar product

$$\displaystyle \begin{aligned} <\mathbf{u},\mathbf{v}>_A = <\text{clr}(\mathbf{u}),\text{clr}(\mathbf{v})>, \end{aligned}$$

where the right-hand side scalar product is the standard scalar product in \(\mathbb {R}^D\).

The clr coordinates sum to zero, inducing a degeneracy. For this reason, the class of isometric log-ratio coordinates has been introduced, providing orthonormal and non-singular coordinates. For any given orthonormal basis (e1, ⋯ , eD−1) of SD, orthonormality being understood here with respect to the Aitchison scalar product, one can define a so-called contrast matrix V of dimension D × (D − 1) (e.g., Pawlowsky-Glahn et al. 2015) given by V = clr(e1, ⋯ , eD−1), where clr is applied columnwise. To each such matrix is associated an isometric log-ratio transformation defined by

$$\displaystyle \begin{aligned} {\mathbf{u}}^* = \text{ilr}_V(\mathbf{u}) = {\mathbf{V}}^T \text{clr}(\mathbf{u}) = {\mathbf{V}}^T \ln{\mathbf{u}}. \end{aligned}$$

The inverse transformation, for any vector u of \({\mathbb R}^{D-1},\) is given by

$$\displaystyle \begin{aligned} \mathbf{u} = \text{ilr}_V^{-1}({\mathbf{u}}^*) = \mathcal{C}(\exp (\mathbf{V} {\mathbf{u}}^*)). \end{aligned} $$

The link between the ilr and clr transformations is clr(u) = VilrV(u).
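These transformations are straightforward to code. The sketch below, in base R, implements clr, ilr, and their inverses; contrast_helmert() builds one valid contrast matrix (a Helmert-type construction chosen for illustration, not necessarily the one used later in the examples).

```r
# clr and ilr transformations (a minimal base R sketch)
clr     <- function(u) log(u) - mean(log(u))
clr_inv <- function(x) exp(x) / sum(exp(x))

contrast_helmert <- function(D) {
  V <- matrix(0, D, D - 1)
  for (j in 1:(D - 1)) {
    V[1:j, j]   <- 1 / sqrt(j * (j + 1))
    V[j + 1, j] <- -j / sqrt(j * (j + 1))
  }
  V                                          # V^T V = I_{D-1}, V^T 1_D = 0
}

ilr     <- function(u, V) as.vector(t(V) %*% clr(u))   # ilr_V(u) = V^T clr(u)
ilr_inv <- function(ustar, V) clr_inv(as.vector(V %*% ustar))
```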

4 Multivariate Tools for Compositional Data

In order to work with scatter matrices for compositional data, we need some algebraic tools concerning the matrices of endomorphisms of the simplex.

4.1 Algebra of Endomorphisms of the Simplex and Eigendecomposition

Let \(\mathcal {A}\) be the set of D × D matrices such that A1D = 0D and AT1D = 0D, where 0D denotes the D-dimensional column vector of zeros: this condition is called the zero-sum property. Pawlowsky-Glahn et al. (2015) define endomorphisms of the simplex using the ilr transformation and prove that they can be associated with a matrix belonging to \(\mathcal {A}\) (see Property 4.16 and pages 55–58). The linearity here refers to the vector space structure of the simplex based on the perturbation and powering operations. Let us introduce an equivalent formulation based on the clr transformation: for u ∈ SD and \(\mathbf A \in \mathcal {A}\), endomorphisms of the simplex are defined by the maps \(\mathbf {u} \mapsto \mathbf A \boxdot \mathbf u := \text{clr}^{-1}(\mathbf A\,\text{clr}(\mathbf u)).\)

The composition of endomorphisms corresponds to the ordinary matrix product since it is clear that \(\mathbf {A} \boxdot (\mathbf {B} \boxdot \mathbf {u}) = \mathbf {AB} \boxdot \mathbf {u}\), and therefore, \(\mathcal {A}\) is an algebra with neutral element GD. We are now going to extend the definition of the ilr transformation to matrices of \(\mathcal {A}\).

Theorem 1

Let V be a D × (D − 1) contrast matrix, and let PV be the D × D block matrix \([\mathbf {V} \; \frac {1}{\sqrt {D}}{\mathbf {1}}_D].\) For a D × D matrix \(\mathbf {A} \in \mathcal {A},\) the (D − 1) × (D − 1) matrix A∗ := ilrV(A) = VTAV is such that \(\mathbf {A} =\mathit{\text{ilr}}^{-1}_V({\mathbf {A}}^*)= \mathbf {V}{\mathbf {A}}^*{\mathbf {V}}^T= {\mathbf {P}}_V \begin {pmatrix} \mathbf A^* & \mathbf 0_{D-1} \\ \mathbf 0_{D-1}^T & 0\end {pmatrix} {\mathbf {P}}_V^T\) and satisfies the following properties:

  1.

    The rank of A is equal to the rank of ilrV(A).

  2.

    If ilrV(A) is invertible, then A is invertible in \(\mathcal {A}\), and we have the following expressions for its \(\mathcal A\)-inverse

    $$\displaystyle \begin{aligned} \mathbf A^{-1} = (\mathbf A + \frac{1}{D} {\mathbf{1}}_{D}{\mathbf{1}}_{D}^T)^{-1} - \frac{1}{D} {\mathbf{1}}_{D}{\mathbf{1}}_{D}^T = \mathbf V(\mathbf V^T\mathbf A\mathbf V)^{-1}\mathbf V^T= \mathbf P_V \begin{pmatrix} {\mathbf A^*}^{-1} & \mathbf 0_{D-1}\\ \mathbf 0_{D-1}^T & 0\end{pmatrix} \mathbf P_V^T. \end{aligned}$$
  3.

    ilrV(AB) = ilrV(A)ilrV(B). If A is invertible, then ilrV(A−1) = (ilrV(A))−1. If (ilrV(A))1∕2 exists, then ilrV(A1∕2) = (ilrV(A))1∕2.

Note that a matrix A of the algebra \(\mathcal {A}\) is never invertible in the space of matrices in the classical sense. But it may be invertible in the sense of the algebra, and its \(\mathcal A\)-inverse then coincides with the Moore–Penrose pseudo-inverse of A in the usual sense. The matrix ilrV(A) is simply the matrix corresponding to A in coordinate space when the coordinates are defined by ilrV. We also extend the definition of the clr transformations to matrices.
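The following base R sketch illustrates Theorem 1 on a randomly generated element of \(\mathcal {A}\), using contrast_helmert() from the sketch of Sect. 3; the numerical check at the end verifies that the \(\mathcal A\)-inverse composed with A gives the neutral element GD.

```r
# ilr of a matrix of the algebra and its algebra inverse (sketch;
# reuses contrast_helmert() from the sketch of Sect. 3)
D   <- 4
G_D <- diag(D) - matrix(1 / D, D, D)          # neutral element G_D
V   <- contrast_helmert(D)

set.seed(1)
B <- crossprod(matrix(rnorm(D * D), D, D))    # an arbitrary symmetric matrix
A <- G_D %*% B %*% G_D                        # an element of the algebra (zero-sum)

A_star <- t(V) %*% A %*% V                    # ilr_V(A) = V^T A V
A_inv  <- V %*% solve(A_star) %*% t(V)        # algebra inverse V (V^T A V)^{-1} V^T

max(abs(A %*% A_inv - G_D))                   # ~ 0: A A^{-1} is the neutral element
```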

Theorem 2

For a D × D matrixB, let us define its clr transformation by

$$\displaystyle \begin{aligned} {} \mathit{\text{clr}}(\mathbf{B}) ={\mathbf{G}}_D\mathbf{B}{\mathbf{G}}_D. \end{aligned}$$

We then have the following properties:

  1.

    If \(\mathbf {A} \in \mathcal {A},\) then clr(A) = A.

  2.

    If \(\mathbf {B} \notin \mathcal {A},\) then \( \mathit{\text{clr}}(\mathbf {B}) \in \mathcal {A}\) and for any \(\mathbf {x} \in \mathcal {S}^D\)

    $$\displaystyle \begin{aligned} \mathbf{B} \boxdot \mathbf{x} := \mathit{\text{clr}}^{-1}(\mathit{\text{clr}}(\mathbf{B})\mathit{\text{clr}}(\mathbf{x})) = \mathit{\text{clr}}(\mathbf{B}) \boxdot \mathbf{x}. \end{aligned} $$
    (6)
  3.

    If \(\mathbf {B} \notin \mathcal {A},\) then the unique element \(\mathbf {A} \in \mathcal {A}\) such that ilrV(A) = ilrV(B) is A = clr(B).

  4.

    For any contrast matrix V and any \(\mathbf {A} \in \mathcal {A}\), we have clr(A) = V ilrV(A) VT.

Note that the matrix product \(\boxdot \) can be defined even when the matrix B does not belong to \(\mathcal {A}\), but in that case it is not linear. Note also that the ilr and clr transformations preserve symmetry.

Theorem 3 below links the eigen-elements of A to those of ilrV(A). Let us first define the notion of \(\mathcal {A}\)-diagonalizability for a matrix of \(\mathcal {A}\).

Definition 1

A matrix \(\mathbf {A} \in \mathcal {A}\) is said to be \(\mathcal {A}\)-diagonalizable if there exist a basis e1, …, eD−1 of \(\mathcal {S}^D\) and D − 1 real numbers λj (j = 1, …, D − 1) such that

$$\displaystyle \begin{aligned} \mathbf{A} \boxdot {\mathbf{e}}_j = \lambda_j \odot {\mathbf{e}}_j \quad \forall j =1, \ldots, D-1. \end{aligned} $$
(7)

We will say that ej is an \(\mathcal {A}\)-eigenvector of A. It is clear that clr(ej) is then an eigenvector of clr(A) = A and that, for any contrast matrix V, ilrV(ej) is an eigenvector of ilrV(A). Note that 1D is an eigenvector of any matrix of \(\mathcal {A}\), associated with the eigenvalue 0. It is natural to say that a matrix \(\mathbf {A}\in \mathcal {A}\) is diagonal in a given basis e1, …, eD−1 of \(\mathcal {S}^D\) if Eq. (7) is satisfied for these vectors.

Theorem 3

Let V be a D × (D − 1) contrast matrix. For a D × D matrix \(\mathbf {A} \in \mathcal {A},\) we have the following properties:

  1.

    If \(\mathbf e_j^* \in \mathbb {R}^{D-1}\) is an eigenvector of ilrV(A), then \(\mathbf e_j =\mathit{\text{ilr}}_V^{-1}(\mathbf e_j^*) \in \mathcal {S}^D\) is an \(\mathcal {A}\)-eigenvector of A and \(\mathbf w_j=\mathit{\text{clr}}(\mathbf e_j) \in \mathbb {R}^D\) an eigenvector of A.

  2.

    The set of eigenvalues of A contains the eigenvalue 0. The other D − 1 eigenvalues of A coincide with the eigenvalues of ilrV(A) for any contrast matrix V.

  3.

    ilrV(A) is diagonalizable if and only if A is diagonalizable, and if and only if A is \(\mathcal {A}\)-diagonalizable.

All symmetric matrices in \(\mathcal {A}\) are \(\mathcal {A}\)-diagonalizable. Note that the vectors \(\mathbf e_j = \text{clr}^{-1}(\mathbf e_j^*)\) are independent of the contrast matrix V. Let A be a symmetric matrix of \(\mathcal {A}\). Since the vector 1D is an eigenvector of A, A cannot be diagonal in the canonical basis of \(\mathbb {R}^D\), but it can be diagonal in a basis obtained by completing \({\mathbf {w}}_D=\frac {1}{D}{\mathbf {1}}_D\) with D − 1 orthogonal eigenvectors in \({\mathbf {1}}_D^\perp \), say w1, …, wD−1. Then \({\mathbf {e}}_j=\text{clr}^{-1}({\mathbf {w}}_j)\in \mathcal {S}^D\) (j = 1, …, D − 1) is an orthonormal basis of \(\mathcal {S}^D\) for the Aitchison metric since < ei, ej >A =< wi, wj >E = δij, where δij = 1 if i = j and 0 otherwise, and these vectors are \(\mathcal {A}\)-eigenvectors of A. If W = [w1…, wD−1] is the corresponding contrast matrix, then \(\text{ilr}_W(\mathbf {A})_{ij}={\mathbf {w}}_i^T\mathbf {A}{\mathbf {w}}_j= \lambda _j {\mathbf {w}}_i^T{\mathbf {w}}_j=\lambda _i \delta _{ij}\), which shows that ilrW(A) is the (D − 1) × (D − 1) diagonal matrix Λ with the λi as diagonal elements. Then using Theorem 1, we can write that \(\mathbf {A} = {\mathbf {P}}_W \begin {pmatrix} \boldsymbol {\Lambda } & \mathbf 0_{D-1} \\\ \mathbf 0_{D-1}^T & 0\end {pmatrix} {\mathbf {P}}_W^T\)showing that A is similar to the diagonal matrix \(\begin {pmatrix} \boldsymbol {\Lambda } & \mathbf 0_{D-1} \\\ \mathbf 0_{D-1}^T& 0\end {pmatrix}.\) This last result gives us the general form of diagonal matrices of \(\mathcal {A}\) with the corresponding spectral representation \(\mathbf {A} = \sum _{i=1}^{D-1}\lambda _i {\mathbf {w}}_i{\mathbf {w}}_i^T.\)
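Continuing the sketch given after Theorem 1 (and reusing clr() and clr_inv() from Sect. 3), the following lines illustrate the spectral representation and Eq. (7) numerically.

```r
# A-eigenvectors and spectral representation of a symmetric matrix of the algebra
# (continues the sketch after Theorem 1: A and D are defined there)
eig    <- eigen(A, symmetric = TRUE)
lambda <- eig$values[1:(D - 1)]               # the remaining eigenvalue is 0
W      <- eig$vectors[, 1:(D - 1)]            # eigenvectors orthogonal to 1_D

max(abs(A - W %*% diag(lambda) %*% t(W)))     # spectral representation, ~ 0

# Eq. (7) in the simplex: A boxdot e_1 = lambda_1 odot e_1
e1  <- clr_inv(W[, 1])                        # A-eigenvector e_1 in S^D
lhs <- clr_inv(as.vector(A %*% clr(e1)))
rhs <- clr_inv(lambda[1] * clr(e1))
max(abs(lhs - rhs))                           # ~ 0
```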

4.2 One-Step M-Scatter Functionals of a Compositional Random Vector

For a simplex-valued random vector X (see Pawlowsky-Glahn et al. 2015), let us recall the following definition of expectation:

$$\displaystyle \begin{aligned} {\mathbf E}^\oplus \mathbf{X} := \text{clr}^{-1}( {\mathbf E}\text{clr} (\mathbf{X})) \end{aligned}$$

and the following definition of the (clr-)covariance matrix COV⊕X (see Aitchison 1982) given by the D × D matrix

$$\displaystyle \begin{aligned} {\mathbf{COV}}^\oplus\mathbf{X}:={\mathbf{COV}}(\text{clr}(\mathbf{X})). \end{aligned}$$

Note that, by Theorem 2, COV⊕X is also equal to clr−1(COV(clr(X))). Using the same principle, let us now introduce a simplex-adapted definition of the square Mahalanobis distance as the square Mahalanobis distance, in the usual sense, of the clr coordinates of X

$$\displaystyle \begin{aligned} M^2({\mathbf X}) = (\text{clr}(\mathbf{X}) - {\mathbf E}\text{clr}(\mathbf{X}))^T ({{\mathbf{COV}}}^\oplus\mathbf{X})^{-1} (\text{clr}(\mathbf{X}) - {\mathbf E}\text{clr}(\mathbf{X})). \end{aligned}$$

Along the same lines, let us define the following one-step M-scatter matrix of a simplex-valued random vector as the corresponding scatter of its clr coordinates

$$\displaystyle \begin{aligned} {\mathbf{COV}}_w^\oplus\mathbf{X} &:= {\mathbf{COV}}_w(\text{clr}(\mathbf{X}))\\ &\ = {\mathbf E}[w(M^2({\mathbf X}))(\text{clr}(\mathbf{X}) - {\mathbf E}\text{clr}(\mathbf{X})) (\text{clr}(\mathbf{X})- {\mathbf E}\text{clr}(\mathbf{X}))^T]. \end{aligned} $$

For w(d) = d∕(D + 2), we get the fourth-moment-based scatter matrix \({\mathbf {COV}}_4^\oplus \mathbf {X}\):

$$\displaystyle \begin{aligned} {\mathbf{COV}}_4^\oplus\mathbf{X} &:= {\mathbf{COV}}_4(\text{clr}(\mathbf{X}))\\ &\ = \frac{1}{D+2} {\mathbf E}[M^2({\mathbf X})(\text{clr}(\mathbf{X}) -{\mathbf E}\text{clr}(\mathbf{X}))(\text{clr}(\mathbf{X}) - {\mathbf E}\text{clr}(\mathbf{X}))^T]. \end{aligned} $$

All these characteristics can also be expressed using the ilr coordinates associated to any contrast matrix V by the following formulas:

$$\displaystyle \begin{aligned} {{\mathbf E}}^\oplus \mathbf{X} = \text{ilr}_V^{-1}( {{\mathbf E}}\text{ilr}_V(\mathbf{X})), \end{aligned}$$
$$\displaystyle \begin{aligned} {\mathbf{COV}}^\oplus\mathbf{X}={\mathbf{COV}}(\text{clr}(\mathbf X))={\mathbf{COV}} (\mathbf V \text{ilr}_V( \mathbf{X})), \end{aligned}$$

and thus

$$\displaystyle \begin{aligned} {\mathbf{COV}}^\oplus\mathbf{X}=\mathbf V {\mathbf{COV}} (\text{ilr}_V(\mathbf{X})) \mathbf V^T= \text{ilr}_V^{-1}({\mathbf{COV}} (\text{ilr}_V(\mathbf{X}))), \end{aligned}$$
$$\displaystyle \begin{aligned} M^2(\mathbf{X}) = M^2(\text{ilr}_V(\mathbf{X})), \end{aligned}$$

and similarly

$$\displaystyle \begin{aligned} {\mathbf{COV}}_w^\oplus(\mathbf{X})=\mathbf V {\mathbf{COV}}_w (\text{ilr}_V(\mathbf{X})) \mathbf V^T= \text{ilr}_V^{-1}({\mathbf{COV}}_w (\text{ilr}_V(\mathbf{X}))). \end{aligned}$$

Note that the scatter functionals \({\mathbf {COV}}_w^\oplus \mathbf {X}\) belong to the algebra \(\mathcal {A}\), and thus we also have

$$\displaystyle \begin{aligned} {\mathbf{COV}}_w^\oplus(\mathbf{X})=\text{clr}^{-1}({\mathbf{COV}}_w(\text{clr}(\mathbf X))). \end{aligned}$$

Given a sample of size n, the empirical versions of the previous scatter matrices can be derived easily.
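For illustration, these empirical versions can be computed through the ilr coordinates as in the formulas above. The base R sketch below reuses cov_w() and cov4() from Sect. 2.1 and the clr/ilr helpers from Sect. 3; the Helmert-type contrast matrix is only a convenient choice, since the resulting quantities do not depend on it.

```r
# Empirical compositional moments through ilr coordinates (sketch)
comp_moments <- function(X) {                 # X: n x D matrix of compositions
  D <- ncol(X)
  V <- contrast_helmert(D)
  Xstar <- t(apply(X, 1, ilr, V = V))         # ilr coordinates, n x (D - 1)
  list(
    mean = ilr_inv(colMeans(Xstar), V),       # empirical E^+ X
    cov  = V %*% cov_w(Xstar) %*% t(V),       # empirical COV^+ X (clr-covariance)
    # empirical COV4^+ X; the scaling convention of the weight only rescales the
    # eigenvalues and is harmless for ICS
    cov4 = V %*% cov4(Xstar) %*% t(V)
  )
}
```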

4.3 Elliptical Distribution in the Simplex

Nguyen (2019) introduces the Student distribution in the simplex with an application to political economy. Mateu-Figueras et al. (2021) review some distributions in the simplex, including the multivariate Student distribution. We define a new family of elliptical distributions with second moments in the simplex. A random vector X with values in \(\mathcal {S}^D\) is said to follow an elliptical distribution if its vector of ilr coordinates follows an elliptical distribution with second moments in \(\mathbb {R}^{D-1}\) for any contrast matrix. This definition makes sense thanks to the following theorem.

Theorem 4

Given two contrast matrices V and W, if \({\mathbf X}_V^* = \mathit{\text{ilr}}_V(\mathbf X)\) follows an elliptical distribution with parameters \(\boldsymbol \mu _V^*=\mathbf E(\mathbf X_V^*)\) and \(\boldsymbol \Sigma _V^*={\mathbf {COV}}(\mathbf X_V^*)\), then \(\mathbf X_W^* = \mathit{\text{ilr}}_W(\mathbf X)\) follows an elliptical distribution with parameters \((\boldsymbol \mu _W^*,\boldsymbol \Sigma _W^*)\) with

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbf W \boldsymbol{\mu}_W^* & = &\displaystyle \mathbf V \boldsymbol{\mu}_V^*,\\ \mathbf W \boldsymbol{\Sigma}_W^*\mathbf W^T& =&\displaystyle \mathbf V\boldsymbol{\Sigma}_V^*\mathbf V^T ,\\ \mathbf W{\boldsymbol{\Sigma}_W^*}^{-1}\mathbf W^T& =&\displaystyle \mathbf V{\boldsymbol{\Sigma}_V^*}^{-1}\mathbf V^T. \end{array} \end{aligned} $$

From this theorem, we can say that μclr = VμV∗ = E(clr(X)) is an invariant that characterizes the location parameter of the elliptical distribution in clr coordinate space, and μ = clr−1(μclr) = E⊕X is an invariant that characterizes the location parameter in the simplex. Moreover, \(\boldsymbol {\Sigma } = \mathbf V \boldsymbol {\Sigma }^*_V \mathbf V^T = \text{ilr}_V^{-1}(\boldsymbol {\Sigma }_V^*)= \mathbf {COV}^{\oplus }\mathbf {X}\) is an invariant that characterizes the scatter matrix in the simplex. Similarly, \(\mathbf Q= \mathbf V{\boldsymbol {\Sigma }_V^*}^{-1}\mathbf V^T = \text{ilr}_V^{-1}({\boldsymbol {\Sigma }^*_{\boldsymbol {V}}}^{-1})\) is an invariant that characterizes the precision matrix of this distribution in the simplex. As in Pawlowsky-Glahn et al. (2015), it is easy to write the density of this distribution with respect to the Lebesgue measure in coordinate space as well as with respect to the Aitchison measure in the simplex. As in Comas-Cufí et al. (2016), we can extend this definition to a mixture of elliptical distributions.
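A simple way to simulate from such a distribution is to draw the ilr coordinates and back-transform, as in the following sketch (using mvrnorm() from the MASS package and ilr_inv() from the sketch of Sect. 3); by Theorem 4, the resulting distribution in the simplex does not depend on the chosen contrast matrix.

```r
# Sampling an elliptical (here Gaussian) distribution on the simplex (sketch)
library(MASS)                                 # for mvrnorm()

rnorm_simplex <- function(n, mu_star, Sigma_star, V) {
  Xstar <- mvrnorm(n, mu = mu_star, Sigma = Sigma_star)  # ilr coordinates
  t(apply(Xstar, 1, ilr_inv, V = V))          # n x D matrix of compositions
}
```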

5 ICS for Compositional Data

5.1 ICS in Coordinate Space

With the definitions introduced in Sect. 4.2, we can now define ICS for a compositional random vector X. For a given choice of contrast matrix V, let X∗ = ilrV(X). In the ilr coordinate space, ICS consists of the joint diagonalization of two scatter matrices S1(X∗) and S2(X∗). Following Archimbaud et al. (2018a), let us focus on S1(X∗) = COV(X∗) and S2(X∗) = COV4(X∗). From Eq. (3) in Sect. 2.2, we can derive the affine invariant coordinates by diagonalizing the (D − 1) × (D − 1) symmetric matrix

$$\displaystyle \begin{aligned} {} {\mathbf{L}}^*={\mathbf{COV}}(\mathbf X^*)^{-1/2}{\mathbf{COV}}_4(\mathbf X^*) {\mathbf{COV}}(\mathbf X^*)^{-1/2}. \end{aligned}$$

Let λ1 ≥ … ≥ λD−1 be the eigenvalues of L∗ in descending order, and Λ be the (D − 1) × (D − 1) diagonal matrix with the vector of eigenvalues on its diagonal. Let \({\mathbf {u}}^*_k\), k ranging from 1 to D − 1, be the D − 1 corresponding eigenvectors of L∗ and \({\mathbf {U}}^* = [{\mathbf {u}}^*_1 \ldots {\mathbf {u}}^*_{D-1}]\) be the matrix whose columns are these eigenvectors. By construction, the matrix U∗ is orthogonal (with respect to the standard scalar product in \({\mathbb R}^{D-1}\)). We have for all k = 1, …, D − 1:

$$\displaystyle \begin{aligned} \mathbf L^* \mathbf u_k^*= \lambda_k \mathbf u_k^*. \end{aligned}$$

If we denote by \(\mathbf h_k^*\), k = 1, …, D − 1, the column vectors of H∗ = COV(X∗)−1∕2U∗, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{H}}^{*T}{\mathbf{COV}}({\mathbf{X}}^*) {\mathbf{H}}^{*} & = &\displaystyle {\mathbf{I}}_{D-1}, {} \end{array} \end{aligned} $$
(8)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{H}}^{*T}{\mathbf{COV}}_4({\mathbf{X}}^*) {\mathbf{H}}^{*} & = &\displaystyle \boldsymbol{\Lambda}. {} \end{array} \end{aligned} $$
(9)

Equations (8) and (9) correspond to the joint diagonalization of COV(X∗) and COV4(X∗). As for Eq. (2), we also have

$$\displaystyle \begin{aligned} {\mathbf{COV}}_4({\mathbf{X}}^*) \mathbf H^* = {\mathbf{COV}}({\mathbf{X}}^*)\mathbf H^* \boldsymbol{\Lambda}. \end{aligned}$$

The scores or invariant coordinates of X are given by

$$\displaystyle \begin{aligned} \mathbf Z^*= \mathbf H^{*T}({\mathbf{X}}^* - {\mathbf E}{\mathbf{X}}^*) \end{aligned} $$
(10)

or equivalently by \(z_k^*=<\mathbf h_k^*,{\mathbf {X}}^* - {\mathbf E}{\mathbf {X}}^*>\), k = 1, …, D − 1.

5.2 ICS in the Simplex

Let us now use Sect. 4 to obtain a formulation of the previous results back in the simplex. This presentation of ICS involves elements (scatter matrices, eigenvalues, and eigenvectors) that are independent of the particular choice of contrast matrix, thus justifying this approach. Let us denote by L the following matrix:

$$\displaystyle \begin{aligned} \mathbf{L}=({\mathbf{COV}}^\oplus\mathbf{X})^{-1/2}{\mathbf{COV}}_4^\oplus\mathbf{X} ({\mathbf{COV}}^\oplus\mathbf{X})^{-1/2}. \end{aligned} $$
(11)

By Theorem 1, we have that

$$\displaystyle \begin{aligned} \text{ilr}_{V}(\mathbf{L}) = {\mathbf{L}}^*, \end{aligned} $$
(12)

and by Theorem 3, we have that, for k = 1, …, D,

$$\displaystyle \begin{aligned} \mathbf L \boxdot \mathbf u_k = \lambda_k \odot \mathbf u_k, \end{aligned}$$

where \(\mathbf u_k= \text{ilr}_{V}^{-1}(\mathbf u_k^*)\) for k = 1, …, D − 1, and \({\mathbf {u}}_D = \mathbf 1_D/\sqrt {D}\) corresponding to λD = 0. We have < uk, ul >A  = δkl, for k, l = 1, …, D. The vectors uk are the \(\mathcal {A}\)-eigenvectors of L. We can write the following spectral representation of L:

$$\displaystyle \begin{aligned} {} \mathbf L=\sum_{k=1}^{D-1} \lambda_k \text{clr}(\mathbf u_k) \text{clr}(\mathbf u_k)^T. \end{aligned}$$

If we denote by \(\mathbf h_k=\text{ilr}_{V}^{-1}(\mathbf h_k^*)= ({\mathbf {COV}}^\oplus \mathbf {X})^{-1/2} \boxdot \mathbf u_k\), k = 1, …, D, we get

$$\displaystyle \begin{aligned} {\mathbf{COV}}_4^\oplus\mathbf{X} \boxdot \mathbf h_k= \lambda_k \odot {\mathbf{COV}}^\oplus\mathbf{X} \boxdot \mathbf h_k \end{aligned}$$

and

$$\displaystyle \begin{aligned} ({\mathbf{COV}}^\oplus\mathbf{X})^{-1} {\mathbf{COV}}_4^\oplus\mathbf{X} \boxdot \mathbf h_k= \lambda_k \odot \mathbf h_k. \end{aligned}$$

The scores \(\mathbf Z^*=(z_1^*,\ldots ,z_{D-1}^*)\) defined by (10) do not depend on the contrast matrix, as already mentioned in Muehlmann et al. (2021), and are given by

$$\displaystyle \begin{aligned} z_k^*=<\mathbf h_k^*, \mathbf X^* - {\mathbf E}{\mathbf{X}}^*> \, =\, <\mathbf h_k,\mathbf X \ominus {{\mathbf E}}^\oplus \mathbf{X} >_A. \end{aligned} $$
(13)

This equation shows that the scores can be used for outlier detection independently of the contrast matrix.
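This invariance can be checked numerically: in the sketch below (which reuses contrast_helmert(), ilr(), and ics_cov_cov4() from the earlier sketches), the scores obtained from two different contrast matrices coincide up to the sign of each component, the usual indeterminacy of eigenvectors when the eigenvalues are distinct.

```r
# The ICS scores do not depend on the contrast matrix (numerical check)
set.seed(2)
n <- 100; D <- 4
X <- t(apply(matrix(rexp(n * D), n, D), 1, function(w) w / sum(w)))  # toy compositions

V1 <- contrast_helmert(D)
Q  <- qr.Q(qr(matrix(rnorm((D - 1)^2), D - 1, D - 1)))   # a random orthogonal matrix
V2 <- V1 %*% Q                                           # another valid contrast matrix

Z1 <- ics_cov_cov4(t(apply(X, 1, ilr, V = V1)))$scores
Z2 <- ics_cov_cov4(t(apply(X, 1, ilr, V = V2)))$scores
max(abs(abs(Z1) - abs(Z2)))                   # ~ 0 (components are defined up to sign)
```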

5.3 Reconstruction Formula

From (10), it is easy to derive the reconstruction formula in coordinate space:

$$\displaystyle \begin{aligned} \mathbf X^*={\mathbf E}{\mathbf{X}}^* + (\mathbf H^{*T})^{-1}\mathbf Z^*. \end{aligned} $$
(14)

Let \(\mathbf a_k^*\) denote the column vectors of the matrix (H∗T)−1 = COV(X∗)1∕2U∗ for k = 1, …, D − 1. Let us define the scalar product with respect to the metric COV(X∗)−1 by

$$\displaystyle \begin{aligned} < \mathbf u^*, \mathbf v^*>_{ {\mathbf{COV}}({\mathbf{X}}^*)^{-1}} = \mathbf u^{*T} {\mathbf{COV}}({\mathbf{X}}^*)^{-1} \mathbf v^* . \end{aligned}$$

Equation (8) shows that the vectors \(\mathbf a_k^*\), k = 1, …, D − 1, are orthonormal in the sense of this scalar product since the equation can be rewritten as

$$\displaystyle \begin{aligned} \begin{array}{rcl} (\mathbf H^{*})^{-1} {\mathbf{COV}}({\mathbf{X}}^*)^{-1} (\mathbf H^{*T})^{-1} & = &\displaystyle \mathbf I_{D-1}.{} \end{array} \end{aligned} $$
(15)

This orthogonality implies that the reconstruction formula can also be obtained by

$$\displaystyle \begin{aligned} \mathbf X^*- {\mathbf E}{\mathbf{X}}^* = \sum_{k=1}^{D-1} < \mathbf a_k^*, \mathbf X^*-{\mathbf E}{\mathbf{X}}^*>_{ {\mathbf{COV}}({\mathbf{X}}^*)^{-1}} \mathbf a_k^*. \end{aligned} $$
(16)

The scalar products \(< \mathbf a_k^*, \mathbf X^*-{\mathbf E}{\mathbf {X}}^*>_{ {\mathbf {COV}}({\mathbf {X}}^*)^{-1}}\), k = 1, …, D − 1, are the coordinates of the (D − 1) vector:

$$\displaystyle \begin{aligned} (\mathbf H^{*})^{-1} {\mathbf{COV}}({\mathbf{X}}^*)^{-1} (\mathbf X^*-{\mathbf E}{\mathbf{X}}^* ). \end{aligned}$$

Using (14), this vector can be written:

$$\displaystyle \begin{aligned} \begin{array}{rcl} (\mathbf H^{*})^{-1} {\mathbf{COV}}({\mathbf{X}}^*)^{-1} (\mathbf X^*-{\mathbf E}{\mathbf{X}}^* ) & = &\displaystyle (\mathbf H^{*})^{-1} {\mathbf{COV}}({\mathbf{X}}^*)^{-1} (\mathbf H^{*T})^{-1}\mathbf Z^*. {} \end{array} \end{aligned} $$
(17)

Using (17) and (15), we get

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} (\mathbf H^{*})^{-1} {\mathbf{COV}}({\mathbf{X}}^*)^{-1} (\mathbf X^* -{\mathbf E}{\mathbf{X}}^*) & = &\displaystyle \mathbf Z^*, \end{array} \end{aligned} $$

and thus

$$\displaystyle \begin{aligned} < \mathbf a_k^*,\mathbf X^*-{\mathbf E}{\mathbf{X}}^* >_{ {\mathbf{COV}}({\mathbf{X}}^*)^{-1}} = z_k^*, \end{aligned}$$

where \((z_1^*,\ldots ,z_{D-1}^*)\) denote the coordinates of Z∗.

Combining with (16), we get the final reconstruction formula in coordinate space

$$\displaystyle \begin{aligned} \mathbf X^* = {\mathbf E}{\mathbf{X}}^* + \sum_{k=1}^{D-1} z_k^* \mathbf a_k^*. \end{aligned} $$
(18)

Applying \(\text{ilr}_{V}^{-1}\) to Eq. (18), we get the following simplex version of the reconstruction formula

$$\displaystyle \begin{aligned} \mathbf X= {\mathbf E}^\oplus\mathbf{X} \bigoplus_{k=1}^{D-1} z_k^* \odot \mathbf a_k, \end{aligned} $$
(19)

where

$$\displaystyle \begin{aligned} \mathbf a_k= \text{ilr}_{V}^{-1}(\mathbf a_k^*)= ({\mathbf{COV}}^\oplus\mathbf{X})^{1/2} \boxdot \mathbf u_k. \end{aligned} $$
(20)

The vectors ak are related to the \(\mathcal {A}\)-eigenvectors uk of L by (20). They generate simplex lines called ICS axes that are the sets of vectors α ⊙ak, for \(\alpha \in \mathbb {R}.\) In the next section, we use the empirical version of the reconstruction formula (19) in order to plot the projection of the data on the vector a1 in some ternary diagrams in situations where the number of selected invariant coordinates is one.

We can also write (19) in terms of the vectors hk:

$$\displaystyle \begin{aligned} \mathbf X= {\mathbf E}^\oplus\mathbf{X} \bigoplus_{k=1}^{D-1} <\mathbf h_k,\mathbf X \ominus {{\mathbf E}}^\oplus \mathbf{X} >_A \odot \left( {\mathbf{COV}}^\oplus\mathbf{X} \boxdot \mathbf h_k\right). \end{aligned}$$
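The empirical version of the reconstruction formula can be checked on the toy compositions of the previous sketch (Sect. 5.2), reusing cov_w(), cov4(), ilr(), and ilr_inv() defined earlier; reconstructing the coordinates with Eq. (18) and applying ilr−1 recovers the original compositions, as in Eq. (19).

```r
# Empirical reconstruction formula (sketch, continuing the example of Sect. 5.2)
Xstar <- t(apply(X, 1, ilr, V = V1))
S1 <- cov_w(Xstar); S2 <- cov4(Xstar)
eS1 <- eigen(S1, symmetric = TRUE)
S1_sqrt     <- eS1$vectors %*% diag(sqrt(eS1$values)) %*% t(eS1$vectors)
S1_inv_sqrt <- eS1$vectors %*% diag(1 / sqrt(eS1$values)) %*% t(eS1$vectors)
U <- eigen(S1_inv_sqrt %*% S2 %*% S1_inv_sqrt, symmetric = TRUE)$vectors

A_mat <- S1_sqrt %*% U                          # columns a_k* = columns of (H*^T)^{-1}
Z     <- sweep(Xstar, 2, colMeans(Xstar)) %*% (S1_inv_sqrt %*% U)  # scores z*

# Eq. (18) in coordinates, then Eq. (19) back in the simplex
Xstar_rec <- sweep(Z %*% t(A_mat), 2, colMeans(Xstar), "+")
X_rec     <- t(apply(Xstar_rec, 1, ilr_inv, V = V1))
max(abs(X_rec - X))                             # ~ 0
```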

6 Examples of Application

We first consider two artificial datasets following a mixture of two normal distributions with 10% of observations that differ from the 90% constituting the main bulk of the data. The dimension is D = 3 for the first example and D = 20 for the second one. The contrast matrices we use for the ilr transformations in this section are triangular Helmert matrices corresponding to the original ilr transformation defined by Egozcue et al. (2003).

6.1 Toy Examples

For the first example, the contrast matrix is given by \(\mathbf V^T=\begin {pmatrix} \frac {1}{\sqrt {2}} & -\frac {1}{\sqrt {2}} & 0\\ -\frac {1}{2}\sqrt {\frac {2}{3}} & -\frac {1}{2}\sqrt {\frac {2}{3}} & \sqrt {\frac {2}{3}} \end {pmatrix}.\)

In this toy example, n = 100 observations are generated in the ilr space with D − 1 = 2 dimensions from a mixture of two Gaussian distributions. The mean and the covariance matrix of the first 90% of the observations (sample 1) are, respectively,

$$\displaystyle \begin{aligned} \boldsymbol \mu_1^*=(0,0)^T \mbox{ and } \boldsymbol \Sigma_1^*=0.02 \mathbf I_2+0.02 \mathbf 1_2 \mathbf 1_2^T, \end{aligned}$$

while the mean vector and the covariance matrix of the remaining 10% (sample 2) are

$$\displaystyle \begin{aligned} \boldsymbol \mu_2^*=\left(\frac{2}{\sqrt{2}}\log2,\frac{-1}{\sqrt{6}}\log2\right)^T \mbox{ and } \boldsymbol \Sigma_2^*=0.05 \mathbf I_2. \end{aligned}$$
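A minimal R sketch of this data-generating mechanism is given below (mvrnorm() comes from the MASS package; the seed is arbitrary, so the sample reproduces the mechanism, not the exact dataset shown in the figures).

```r
# Generating data with the same structure as the first toy example (sketch)
library(MASS)                                   # for mvrnorm()
set.seed(3)                                     # arbitrary seed

V <- t(matrix(c(1 / sqrt(2), -1 / sqrt(2), 0,
                -sqrt(2/3) / 2, -sqrt(2/3) / 2, sqrt(2/3)),
              nrow = 2, byrow = TRUE))          # the contrast matrix given above

mu1 <- c(0, 0)
Sigma1 <- 0.02 * diag(2) + 0.02                 # 0.02 I_2 + 0.02 1_2 1_2^T
mu2 <- c(2 / sqrt(2) * log(2), -log(2) / sqrt(6))
Sigma2 <- 0.05 * diag(2)

Xstar <- rbind(mvrnorm(90, mu1, Sigma1), mvrnorm(10, mu2, Sigma2))
X <- t(apply(Xstar, 1, function(u) {            # back to the simplex: ilr_V^{-1}
  w <- exp(V %*% u); as.vector(w / sum(w))
}))
```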

Figure 1 on the left (resp., in the middle) shows the dataset in the simplex \(\mathcal {S}^3\) (resp., in the ilr space). The points in cyan (resp., magenta) belong to sample 1 (resp., sample 2), and we can see that component x2 has higher values in sample 2 than in sample 1, to the detriment of x1 and x3. We perform the ICS method in the ilr space using the ICSOutlier package (Archimbaud et al. 2018b). The eigenvalues are 1.57 and 0.81, and the D’Agostino test for normality leads to the selection of a single invariant component. Note that this test is based on the ICS scores and thus does not depend on the ilr transformation (see Archimbaud et al. 2018a, for more details). Figure 1 on the right reports the ICS distances as in Eq. (5) for each observation. The horizontal line represents a cutoff value based on Monte Carlo simulations and a 90% quantile. The quantile order can be chosen according to the expected percentage of outliers in the data. The ICS distances and the cutoff are also independent of the ilr transformation since they depend only on the ICS scores. Figure 1 on the right allows us to identify the outliers, represented by filled circles. In this example, all 10 observations from sample 2 are identified as outliers, whereas only 1 of the 90 observations from sample 1 is incorrectly flagged (right at the cutoff value).

Fig. 1

First toy example: data in the simplex (left), data in the ilr space (middle), identification of the outlying observations using ICS (right)

The two vectors generating the ICS axes (dashed lines in Fig. 2) are equal to \(\mathbf a_1^*=(0.31,-0.1)\) and \(\mathbf a_2^*=(0.12,0.22)\) in the ilr space and a1 = (0.27, 0.43, 0.30) and a2 = (0.28, 0.33, 0.39) in the simplex space. To better understand the role of the ICS components and how they discriminate the observations, we represent in Fig. 2 the projections of the observations on the first ICS axis (left plots) and the second ICS axis (right plots) in the ilr space (top plots) and in the simplex space (bottom plots). The first ICS axis discriminates the observations with a high value of x2 relative to the other shares and results in a good separation of the two groups. On the contrary, the second axis, which seems to separate observations with high values of x1 from observations with high values of x3, does not discriminate the two groups.

Fig. 2

First toy example: plot of the ICS axes and projections of the data on the ICS axes (ICS 1 on the left and ICS 2 on the right) in the ilr space (top plots) and in the simplex (bottom plots)

Finally, using the cutoff value, we represent in gray in Fig. 3 the zones of the ilr space (left plot) and of the simplex (right plot) where observations are considered outliers. This confirms that the observations with a large or a small value of x2 relative to the other shares lie in the outlying zone.

Fig. 3

First toy example: outliers zones in gray in the ilr space (left) and in the simplex space (right)

For the second toy example, we generate a higher dimensional dataset with D = 20, using two multivariate Gaussian distributions. The first sample is of size n1 = 90 with

$$\displaystyle \begin{aligned} \boldsymbol \mu_1^*=(0,0,\ldots,0)^T \;\; \mbox{and }\; \boldsymbol \Sigma_1^*=0.02 \mathbf I_{D-1}+0.02 \mathbf 1_{D-1}^{ } \mathbf 1_{D-1}^T, \end{aligned}$$

and the second sample is of size n2 = 10 with

$$\displaystyle \begin{aligned} \boldsymbol \mu_2^*&=\left(\frac{2}{\sqrt{2}}\log2,\frac{-1}{\sqrt{6}}\log2,0, \ldots,0\right)^T \text{ and }\\ \boldsymbol \Sigma_2^*&=\begin{pmatrix} 0.05 \mathbf I_2& \mathbf 0 \\ \mathbf 0 & 0.02 \mathbf I_{D-3}+0.02 \mathbf 1_{D-3}^{ } \mathbf 1_{D-3}^T \end{pmatrix}. \end{aligned} $$

When D > 3, several options can be used for representing compositional data. One possibility is to plot ternary diagrams using sub-compositions, as described in van den Boogaart and Tolosana-Delgado (2008). An alternative is to plot a ternary diagram with x1, x2, and the sum of the remaining parts x3 + … + xD. Another possibility is to replace the sum of the remaining parts by their geometric mean. If D > 3 is not too large, the resulting D(D − 1)∕2 sub-ternary diagrams can be gathered in a square matrix layout.

In order to identify the outliers, we implement the ICS method using ICSOutlier in coordinate space. The procedure selects only the first invariant component. The left plot of Fig. 4 displays the ICS distances and the cutoff value as a horizontal line to identify outliers. This plot is the same for all ilr transformations. Nine of the 10 observations from sample 2 are detected as outliers, while none of the observations from sample 1 is flagged. The symbols for the points are as in Fig. 1.

Fig. 4

Second toy example: ICS distances (left), sub-ternary diagrams of the first five composition parts (right), with cyan (resp., magenta) for sample 1 (resp., 2) and filled circles for detected outliers

The right plot represents several sub-ternary diagrams, but not all of them because of the large dimension D = 20. The selected ternary diagrams plot two parts among x1 to x5 against the geometric mean of the rest, denoted by ∗. The diagrams that are not shown are very similar to the ones that focus on x3, x4, and x5 (see rows and columns 3, 4, and 5 of the matrix plot). Observations with the cross (resp., circle) symbol belong to sample 1 (resp., sample 2). The sub-ternary diagrams confirm that x1 and x2 are the composition parts that explain the outlyingness of the detected observations. In fact, the two samples are clearly separated in the diagram built from the x1 and x2 components and the geometric mean of the other parts. On the contrary, in the ternary diagrams that do not take x1 and x2 separately from the other parts, the outliers are not distinct from the other observations.

We represent in Fig. 5 the sub-ternary diagram (x1, x2, ∗) (where ∗ represents the geometric mean of the rest), with small circles in cyan (resp., magenta) for sample 1 (resp., sample 2). The vector a1 is plotted together with the ICS axis represented by a dashed line. We see that the data projected on the first ICS axis are clearly discriminated by high values of x2 relative to x1.

Fig. 5

Second toy example: plot of the ICS axis a1 and projections of the data on this axis in the ternary diagram (x1, x2, ∗)

6.2 Market Shares Example

This market share dataset has been simulated from a model fitted on the real European car market in 2015 and is available in Barreiro et al. (2022). The plot at the top of Fig. 6 represents the shares in the French automobile market of 5 segments (D = 5), from January 2003 to August 2015, denoted by A, B, C, D, and E (European car market segments, from the cheapest cars to the most powerful and luxurious ones). We perform the ICS method in the ilr space and represent at the bottom of Fig. 6 the ICS distances for each observation. The normality test of the ICS procedure reveals that only the first component is important for outlier identification. The cutoff value is based on the quantile of order 97.5%. All the identified outliers are concentrated in the time interval between September 2008 and May 2009. It turns out that during this period the global automobile market was undergoing a crisis, with worldwide sales significantly down, and policy responses such as the scrapping bonus were introduced at the end of 2008.

Fig. 6

French Market automobile shares example: from January 2003 to August 2015 in 5 segments (top) and identification of the outlying observations using ICS distances (bottom). The dotted vertical lines represent the period during which outliers were identified (September 2008 to May 2009)

As before, in Fig. 7, we represent the matrix of sub-ternary diagrams with the detected outliers highlighted. The ternary diagram vertices consist of two selected parts, and the third part, indicated by ∗, corresponds to the geometric mean of the remaining parts. Among all ternary diagrams, the ones including segment A appear to be the most useful for identifying the outliers. More precisely, the sub-ternary diagram that includes segments A and D, together with the geometric mean of the others, separates the two groups best. Thus, we plot in Fig. 8 the data in the sub-ternary diagram (A, D, ∗). We also represent the vector a1, the ICS axis, and the projections of the data on this axis.

Fig. 7

French Market automobile shares example: outlier identification on the matrix sub-ternary diagram

Fig. 8

French Market automobile shares example: projection of the data on the first ICS axis a1 in the sub-ternary diagram defined by A, D, and the amalgamation of other components (left). Zoom on the interesting part of the ternary diagram (right)

The time points detected as outlying correspond to observations with high values of segment A, compared to more usual values of D and low values of the geometric mean of B, C, and E. This interpretation is confirmed by the top plot of Fig. 6.

7 Conclusion

The present contribution extends ICS for outlier detection to the context of compositional data. As for standard data, ICS with the scatter pair COV and COV4 is a powerful tool to detect a small proportion of outliers. The definition of ICS in coordinate space is straightforward, and the identification of outliers does not depend on the choice of the isometric log-ratio transformation. The definition of ICS in the simplex is more challenging, and some algebra tools have been introduced to tackle the problem. Using a reconstruction formula, ICS axes can be plotted on ternary diagrams that help interpret the outliers. Further interpretation tools are work in progress. Among the perspectives, we can mention the extension of ICS to compositional functional data (see Rieser and Filzmoser (2022) and Archimbaud et al. (2022)). Supplementary material is available at https://github.com/tibo31/ics_coda to allow reproduction of the empirical analyses presented in this paper.