1 Introduction

Principal component analysis (PCA) and canonical correlation analysis (CCA) are two dimension-reduction techniques of widespread use in statistics. For a random vector \({{\textbf {z}}}\) in the Euclidean space of dimension m, with positive definite covariance matrix \(\varvec{\varSigma }\), PCA looks for the spectral decomposition of \(\varvec{\varSigma }\) in terms of the eigenvectors \({{\textbf {t}}}_{1}^{1},\ldots ,{{\textbf {t}}}_{m}^{1}\) associated with the corresponding m-tuple of eigenvalues \((\gamma _{1}^{1},\ldots ,\gamma _{m}^{1})\) in decreasing order, \(\gamma _{j}^{1}\ge \gamma _{j+1}^{1}>0\) for all \(1\le j\le m-1\); that is,

$$\begin{aligned} \varvec{\varSigma }={{\textbf {T}}}^{1}\varvec{\varDelta }^{1}\left( {{\textbf {T}}} ^{1}\right) ^{t}=\sum _{i=1}^{m}\gamma _{i}^{1}{{\textbf {t}}}_{i}^{1}\left( {{\textbf {t}}}_{i}^{1}\right) ^{t}, \end{aligned}$$
(1)

where \({{\textbf {T}}}^{1}\in {\mathbb {R}}^{m\times m}\) is an orthonormal matrix whose columns are \({{\textbf {t}}}_{j}^{1},j=1,\ldots ,m\), and \(\varvec{\varDelta }^{1}=diag\left( \gamma _{1}^{1},\ldots ,\gamma _{m}^{1}\right) \). The variables \(\left( {{\textbf {t}}}_{1}^{1}\right) ^{t}({{\textbf {z}}}-E{{\textbf {z}}}),\dots ,\left( {{\textbf {t}}}_{m}^{1}\right) ^{t}({{\textbf {z}}}-E{{\textbf {z}}})\) are usually referred to as principal components. The eigenvalues and eigenvectors can also be obtained through an optimization scheme (Seber 2004, p. 181). On the other hand, the principal components yield the best linear predictors for \({{\textbf {z}}}-E{{\textbf {z}}}\) among linear combinations \(\sum _{k=1}^{p}({{\textbf {a}}}_{k}^{t}({{\textbf {z}}}-E{{\textbf {z}}})){{\textbf {a}}}_{k}\) based on an orthonormal set \(\left\{ {{\textbf {a}}}_{1},\dots ,{{\textbf {a}}}_{p},\dots ,{{\textbf {a}}}_{m}\right\} \), \(p<m\). Let \({{\textbf {z}}}\sim F\); then the principal components solve the optimization problem

$$\begin{aligned} ({\varvec{\mu }}_{{{\textbf {z}}}},V_{p})&=\arg \min _{{\varvec{\mu }}\in {\mathbb {R}}^{m},V}E_F\left\| \left( {{\textbf {z}}}-{\varvec{\mu }}\right) -{{\textbf {P}}}_{V}\left( {{\textbf {z}}}-{\varvec{\mu }}\right) \right\| ^{2} \nonumber \\&=\arg \min _{{\varvec{\mu }}\in {\mathbb {R}}^{m},V}E_F\left\| {{\textbf {P}}}_{V^{\perp }}\left( {{\textbf {z}}}-{\varvec{\mu }}\right) \right\| ^{2}, \end{aligned}$$
(2)

where \({{\textbf {P}}}_{V}\) stands for the orthogonal projection onto a subspace V of dimension \(p<m\), \(V=\left\langle {{\textbf {a}}}_{1},\ldots ,{{\textbf {a}}}_{p}\right\rangle \) means that V is generated by the orthonormal set \(\left\{ {{\textbf {a}}}_{1},\ldots ,{{\textbf {a}}}_{p}\right\} \) and \(V^{\perp }=\left\langle {{\textbf {a}}}_{p+1},\ldots ,{{\textbf {a}}}_{m}\right\rangle \) denotes the orthogonal complement of V. The solutions \(({\varvec{\mu }}_{{{\textbf {z}}}},V_{p})\) of (2) are given by \({\varvec{\mu }}_{{{\textbf {z}}}}=E{{\textbf {z}}},\ \ V_{p}=\left\langle {{\textbf {t}}}_{1}^{1},\ldots ,{{\textbf {t}}}_{p}^{1}\right\rangle \ \text {and}\ {{\textbf {P}}}_{V_{p}}({{\textbf {z}}})=\sum _{k=1}^{p}(\left( {{\textbf {t}}}_{k}^{1}\right) ^{t}\left( {{\textbf {z}}}-E{{\textbf {z}}}\right) ){{\textbf {t}}}_{k}^{1}.\)
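For concreteness, the following minimal numpy sketch illustrates (1) and (2) on simulated data (the toy covariance and variable names are ours, purely for illustration): at the optimum, the mean squared norm of the residual \({{\textbf {P}}}_{V^{\perp }}({{\textbf {z}}}-\bar{{{\textbf {z}}}})\) equals the sum of the \(m-p\) smallest eigenvalues of the empirical covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# n x m sample from a Gaussian with a nontrivial covariance structure
z = rng.multivariate_normal([1.0, -2.0, 0.5],
                            [[4.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=2000)

Sigma = np.cov(z, rowvar=False, bias=True)   # empirical covariance (1/n)
gamma, T = np.linalg.eigh(Sigma)             # eigenvalues in increasing order
gamma, T = gamma[::-1], T[:, ::-1]           # reorder to decreasing, as in (1)

p = 2
mu = z.mean(axis=0)
P_V = T[:, :p] @ T[:, :p].T                  # projection onto <t_1, ..., t_p>
resid = (z - mu) - (z - mu) @ P_V            # P_{V^perp}(z - mu), row-wise
print(np.mean(np.sum(resid ** 2, axis=1)))   # objective in (2) at the optimum
print(gamma[p:].sum())                       # sum of the m - p smallest eigenvalues
```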

Let us denote by \({\mathscr {O}} _{r,m}=\left\{ {{\textbf {A}}}\in {\mathbb {R}} ^{r\times m}:{\textbf {AA}}^{t}={{\textbf {I}}}_{r}\right\} \) the set of \(r\times m\) matrices with orthonormal rows and set \(p^{\prime }=m-p\). According to (2), it is easy to see that

$$\begin{aligned} \min _{{\varvec{{\mu }}},V}E_{F}\left\| {{\textbf {z}}}-{\varvec{\mu }} -{{\textbf {P}}}_{V}({{\textbf {z}}}-{\varvec{\mu }})\right\| ^{2}= & {} \min _{\left( {{\textbf {D}}},{{\textbf {a}}}\right) \in {\mathscr {B}}_{p^{\prime },m}^{1}}E_{F}\left\| {\textbf {Dz}}-{{\textbf {a}}}\right\| ^{2} \end{aligned}$$
(3)

with

$$\begin{aligned} {\mathscr {B}}_{p^{\prime },m}^{1}=\left\{ ({{\textbf {D}}},{{\textbf {a}}}):{{\textbf {D}}} \in {\mathscr {O}}_{p^{\prime },m},{{\textbf {a}}}\in {\mathbb {R}} ^{p^{\prime }}\right\} . \end{aligned}$$
(4)

To downweight outlying observations in (3), Maronna (2005) defined SM-estimators for principal vectors in PCA through the equations

$$\begin{aligned} \sigma ^1({{\textbf {D}}},{{\textbf {a}}})&=\min \left\{ s>0:E_{F}\chi \left( \frac{\Vert {\textbf {Dz}}-{{\textbf {a}}}\Vert ^{2}}{s}\right) \le \delta \right\} , \nonumber \\ ({{\textbf {D}}}_{SM}^{1},{{\textbf {a}}}_{SM}^{1})&=\arg \min _{({{\textbf {D}}},{{\textbf {a}}})\in {\mathscr {B}}_{p^{\prime },m}^{1}}\sigma ^1({{\textbf {D}}},{{\textbf {a}}}), \end{aligned}$$
(5)

with \(\chi :[0,\infty )\rightarrow \left[ 0,1\right] \). He also considered SL-estimators for principal vectors by minimization of an L-scale rather than an M-scale as in (5).
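The M-scale in (5) is straightforward to compute for an empirical distribution: since \(\chi \) is nondecreasing, \(E_{F}\chi (\Vert {\textbf {Dz}}-{{\textbf {a}}}\Vert ^{2}/s)\) is nonincreasing in s, so the minimal s can be found by root search. A minimal sketch, assuming a smooth bounded Tukey-type \(\chi \) of our own choosing (illustrative only, not necessarily the loss used by Maronna 2005):

```python
import numpy as np
from scipy.optimize import brentq

def chi(t, c=1.0):
    """A bounded nondecreasing loss with chi(0) = 0 and chi(t) = 1 for t >= c."""
    u = np.minimum(t / c, 1.0)
    return 1.0 - (1.0 - u) ** 3

def m_scale(r2, delta=0.5, c=1.0):
    """Smallest s > 0 with mean(chi(r2 / s)) <= delta, as in (5)."""
    g = lambda s: np.mean(chi(r2 / s, c)) - delta
    hi = 1.0
    while g(hi) > 0:              # enlarge bracket until the mean drops below delta
        hi *= 2.0
    return brentq(g, 1e-12, hi)   # sign change is guaranteed for continuous data

# Toy usage: squared residuals of a rank-1 projection of Gaussian data
rng = np.random.default_rng(1)
z = rng.standard_normal((500, 3))
D = np.linalg.qr(rng.standard_normal((3, 1)))[0].T   # D in O_{1,3}
a = np.zeros(1)
r2 = np.sum((z @ D.T - a) ** 2, axis=1)
print(m_scale(r2))
```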

CCA was proposed by Hotelling (1936) to describe the relationship between two sets of variables by transforming the vectors \({{\textbf {x}}}\) and \({{\textbf {y}}}\) into two lower-dimensional vectors \({{\textbf {z}}}\) and \({{\textbf {w}}}\) whose association is as strong as possible (see Das and Sen 1998 for a very thorough account of CCA and its wide variety of applications). In recent years, CCA has also gained popularity as a method for the analysis of genomic data, since it has the potential to be a powerful tool for identifying relationships between genotype and gene expression. It has also been used in geostatistical applications (see Furrer and Genton 2011). CCA is closely related to multivariate regression when the vectors \({{\textbf {x}}}\) and \({{\textbf {y}}}\) are not treated symmetrically (see Yohai and García Ben 1980). Given two random vectors \({{\textbf {x}}}\) and \({{\textbf {y}}}\) of dimensions p and q, respectively, the joint covariance matrix is given by

$$\begin{aligned} \varvec{\varSigma }=\left( \begin{array}{cc} E({\textbf {x}}-E{\textbf {x)}}({\textbf {x}}-E{\textbf {x)}}^{t} &{} E({\textbf {x}}-E {\textbf {x)}}({\textbf {y}}-E{\textbf {y)}}^{t} \\ E({\textbf {y}}-E{\textbf {y)}}({\textbf {x}}-E{\textbf {x)}}^{t} &{} E({\textbf {y}}-E {\textbf {y)}}({\textbf {y}}-E{\textbf {y)}}^{t} \end{array} \right) =\left( \begin{array}{cc} \varvec{\varSigma }_{{\textbf {xx}}} &{} \varvec{\varSigma }_{{\textbf {xy}}} \\ \varvec{\varSigma }_{{\textbf {yx}}} &{} \varvec{\varSigma }_{{\textbf {yy}}} \end{array} \right) , \end{aligned}$$
(6)
$$\begin{aligned} \det (\varvec{\varSigma }_{{\textbf {xx}}})>0<\det (\varvec{\varSigma }_{{\textbf {yy}} }),0<r=rank(\varvec{\varSigma }_{{\textbf {xy}}})\le \min (p,q)=s. \end{aligned}$$

CCA seeks sets \(\left\{ {\varvec{\alpha }}_{1},\ldots ,{\varvec{\alpha }}_{r}\right\} \subset {{\mathbb {R}}}^{p}\) and \(\left\{ {\varvec{\beta }}_{1},\ldots ,{\varvec{\beta }}_{r}\right\} \subset {\mathbb {R}}^{q}\) that yield uncorrelated standardized linear combinations of the variables in \({{\textbf {x}}}\) and the variables in \({{\textbf {y}}}\) that are maximally correlated with each other. We can define the canonical vectors \({\varvec{\alpha }}_{j},{\varvec{\beta }}_{j},\) \(j=1,\ldots ,r\) (except for the signs) as solutions to an optimization problem (Seber 2004, p. 258). Suppose we take vectors \(\left( {{\textbf {a}}},{{\textbf {b}}}\right) \in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q}\) such that \(Var({{\textbf {a}}}^{t}{{\textbf {x}}})=1=Var({{\textbf {b}}}^{t}{{\textbf {y}}})\) and \(Corr({{\textbf {b}}}^{t}{{\textbf {y}}},{\varvec{\beta }}_{j}^{t}{{\textbf {y}}})=0=Corr({{\textbf {a}}}^{t}{{\textbf {x}}},{\varvec{\alpha }}_{j}^{t}{{\textbf {x}}}), j=1,2,\ldots ,k-1\), where Var and Corr stand for the variance and correlation operators for random variables. Under this constraint, we choose \(({\varvec{\alpha }}_{k},{\varvec{\beta }}_{k})\) to yield the maximum squared correlation between \({{\textbf {a}}}^{t}{{\textbf {x}}}\) and \({{\textbf {b}}}^{t}{{\textbf {y}}}\). If \(\rho _{k}\) stands for the positive correlation between \(\varvec{\alpha }_{k}^{t}{{\textbf {x}}}\) and \(\varvec{\beta }_{k}^{t}{{\textbf {y}}}\) (the k-th canonical correlation), then \(\rho _{k}^{2}=\left( Corr(\varvec{\alpha }_{k}^{t}{{\textbf {x}}},\varvec{\beta }_{k}^{t}{{\textbf {y}}})\right) ^{2}\) and one gets a decreasing sequence of squared canonical correlations, \(\rho _{1}^{2}\ge \cdots \ge \rho _{r}^{2}\). \(\varvec{\alpha }_{k}\) and \(\varvec{\beta }_{k}\) are unique (apart from signs) if the canonical correlations are distinct. It is well known that the optimization problem is equivalent to solving the eigensystem

$$\begin{aligned} \varvec{\varSigma }_{{\textbf {xx}}}^{-1}\varvec{\varSigma } _{{\textbf {xy}}}\varvec{\varSigma } _{ {\textbf {yy}}}^{-1}\varvec{\varSigma }_{{\textbf {yx}}}\varvec{\alpha }_{k}=\rho _{k}^{2}\varvec{\alpha }_{k},k=1,\ldots ,r \end{aligned}$$
(7)
$$\begin{aligned} \varvec{\varSigma }_{{\textbf {yy}}}^{-1}\varvec{\varSigma }_{{\textbf {yx}}}\varvec{ \varSigma }_{{\textbf {xx}}}^{-1}\varvec{\varSigma }_{{\textbf {xy}}}\varvec{\beta } _{k}=\rho _{k}^{2}\varvec{\beta }_{k},k=1,\ldots ,r, \end{aligned}$$
(8)

which makes the search computationally more tractable. Classical estimators are obtained by replacing \(\varvec{\varSigma }\) in (7) and (8) by the sample covariance matrix. A robust counterpart of (7) and (8) is easily obtained by solving the eigensystem, for \(k=1,\ldots ,r\),

$$\begin{aligned}{} & {} \varvec{\varSigma }^{(R)}=\left( \begin{array}{cc} \varvec{\varSigma }_{{\textbf {xx}}}^{(R)} &{} \varvec{\varSigma }_{{\textbf {xy}}}^{(R)}\\ \varvec{\varSigma }_{{\textbf {yx}}}^{(R)} &{} \varvec{\varSigma }_{{\textbf {yy}}}^{(R)} \end{array} \right) , \nonumber \\{} & {} \begin{array}{lr} \left( \varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\right) ^{-1}\varvec{\varSigma }_{{\textbf {xy}}}^{(R)}\left( \varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\right) ^{-1}\varvec{\varSigma }_{{\textbf {yx}}}^{(R)}{\varvec{\alpha }}_{k}^{(R)}&{} =\left( \rho _{k}^{(R)}\right) ^{2}{\varvec{\alpha }}_{k}^{(R)}, \\ \left( \varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\right) ^{-1}\varvec{\varSigma }_{{\textbf {yx}}}^{(R)}\left( \varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\right) ^{-1}\varvec{\varSigma }_{{\textbf {xy}}}^{(R)}{\varvec{\beta }}_{k}^{(R)}&{} =\left( \rho _{k}^{(R)}\right) ^{2}{\varvec{\beta }} _{k}^{(R)}, \end{array} \end{aligned}$$
(9)

with \(\varvec{\varSigma }^{(R)}\) a robust dispersion estimator.
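A minimal numpy sketch of (7)–(9) follows: the classical estimator plugs the sample covariance into the eigensystems, and a robust version only swaps in a robust scatter estimate. The toy covariance below and the mention of sklearn's MinCovDet are our illustrative choices, not a prescription:

```python
import numpy as np

def cca_from_cov(S, p, q):
    """Solve the eigensystems (7)-(8) given a joint (p+q) x (p+q) scatter S."""
    Sxx, Sxy = S[:p, :p], S[:p, p:]
    Syx, Syy = S[p:, :p], S[p:, p:]
    Mx = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Syx)   # matrix in (7)
    My = np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Sxy)   # matrix in (8)
    rho2_x, alphas = np.linalg.eig(Mx)
    rho2_y, betas = np.linalg.eig(My)
    ix = np.argsort(rho2_x.real)[::-1]
    iy = np.argsort(rho2_y.real)[::-1]
    # columns give alpha_k, beta_k up to the Var(.) = 1 normalization and sign
    rho = np.sqrt(np.clip(rho2_x.real[ix], 0.0, 1.0))
    return rho, alphas.real[:, ix], betas.real[:, iy]

rng = np.random.default_rng(2)
joint_cov = np.array([[1.0, 0.5, 0.0, 0.7, 0.0],
                      [0.5, 1.0, 0.0, 0.0, 0.3],
                      [0.0, 0.0, 1.0, 0.0, 0.0],
                      [0.7, 0.0, 0.0, 1.0, 0.0],
                      [0.0, 0.3, 0.0, 0.0, 1.0]])
xy = rng.multivariate_normal(np.zeros(5), joint_cov, size=1000)
rho, A, B = cca_from_cov(np.cov(xy, rowvar=False), p=3, q=2)     # classical plug-in
print(rho)                                  # close to (0.7, 0.3)
# Robust plug-in as in (9): swap in a robust scatter estimate, e.g.
# from sklearn.covariance import MinCovDet
# rho_R, A_R, B_R = cca_from_cov(MinCovDet().fit(xy).covariance_, p=3, q=2)
```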

The canonical variables \({{\textbf {z}}}=({\varvec{\alpha }}_{1}^{t}({{\textbf {x}}}-E{{\textbf {x}}}),\ldots ,{\varvec{\alpha }}_{r}^{t}({{\textbf {x}}}-E{{\textbf {x}}}))^{t}\) and \({{\textbf {w}}}=({\varvec{\beta }}_{1}^{t}({{\textbf {y}}}-E{{\textbf {y}}}),\ldots ,{\varvec{\beta }}_{r}^{t}({{\textbf {y}}}-E{{\textbf {y}}}))^{t}\) are also the best linear combinations for predicting each other, in the sense of making the mean squared loss \(E_F\left\| {{\textbf {z}}}-{{\textbf {w}}}\right\| ^{2}\) as small as possible (see Seber 2004, p. 260), since they solve the optimization problem

$$\begin{aligned} \left( {{\textbf {A}}}_{C},{{\textbf {B}}}_{C},{\varvec{\mu }}_{{{\textbf {x}}}},{\varvec{\mu }} _{{{\textbf {y}}}}\right) =\underset{\left( \varvec{{\bar{A}}},\varvec{{\bar{B}}}, {\varvec{\mu }},{\varvec{\nu }}\right) \in {\mathscr {C}}}{\arg \min }E_F \left\| \varvec{{\bar{A}}}\left( {{\textbf {x}}}-{\varvec{\mu }} \right) -\varvec{{\bar{B}}}({{\textbf {y}}}-{\varvec{\nu }})\right\| ^{2} \end{aligned}$$
(10)

with

$$\begin{aligned} \mathscr {C}=\left\{ \left( \varvec{{\bar{A}}},\varvec{{\bar{B}}},{\varvec{\mu }},{\varvec{\nu }}\right) :\varvec{{\bar{A}}}\in {\mathbb {R}}^{r\times p},\varvec{{\bar{B}}}\in {\mathbb {R}}^{r\times q},{\varvec{\mu }}\in {\mathbb {R}}^{p},{{\varvec{\nu }}}\in {\mathbb {R}}^{q},\varvec{{\bar{A}}}\varvec{\varSigma } _{{\textbf {xx}}}\varvec{{\bar{A}}}^{t}={{\textbf {I}}} _{r}=\varvec{{\bar{B}}}\varvec{\varSigma }_{{\textbf {yy}}}\varvec{{\bar{B}}}^{t}\right\} , \nonumber \\ \end{aligned}$$
(11)

with \({{\textbf {I}}}_{r}\) the \(r\times r\) identity matrix, \({\varvec{\mu }}_{{{\textbf {x}}}}=E{{\textbf {x}}}\) and \({\varvec{\mu }}_{{{\textbf {y}}}}=E{{\textbf {y}}}\). The subscript C stands for "classical."

Adrover and Donato (2015) introduced SM-estimators for canonical vectors in CCA as follows. Given the matrices \(\varvec{{\bar{A}}}\in {\mathbb {R}}^{r\times p}\) and \(\varvec{{\bar{B}}}\in {\mathbb {R}}^{r\times q},\) let us take \({{\textbf {A}}}=\varvec{{\bar{A}}}\varvec{\varSigma }_{{\textbf {xx}}}^{1/2},\) \({{\textbf {B}}}=\varvec{{\bar{B}}}\varvec{\varSigma }_{{\textbf {yy}}}^{1/2},\) with \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\) given in (6), \({{\textbf {D}}}=\left( \begin{array}{cc} {{\textbf {A}}}&-{{\textbf {B}}} \end{array} \right) \in {\mathbb {R}}^{r\times m},\) \(m=p+q\), and the random vector \({{\textbf {z}}}=({{\textbf {x}}}^{t}\varvec{\varSigma }_{{\textbf {xx}}}^{-1/2},{{\textbf {y}}}^{t}\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2})^{t}\). By reformulating (10) and (11) for the standardized vectors \(\varvec{\varSigma }_{{\textbf {xx}}}^{-1/2}{{\textbf {x}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2}{{\textbf {y}}},\) we have

$$\begin{aligned} \underset{\left( \varvec{{\bar{A}}},\varvec{{\bar{B}}},{\varvec{\mu }},{\varvec{\nu }}\right) \in {\mathscr {C}}}{\min }E_F \left\| \varvec{{\bar{A}}}{{\textbf {x}}}-\varvec{{\bar{B}}}{{\textbf {y}}}-(\varvec{{\bar{A}}}{\varvec{\mu }}-\varvec{{\bar{B}}}{\varvec{\nu }})\right\| ^{2} =\min _{\left( {{\textbf {D}}},{{\textbf {a}}}\right) \in {\mathscr {B}}_{r,m}^{0}}E_F \left\| {\textbf {Dz}}-{{\textbf {a}}}\right\| ^{2} \end{aligned}$$
(12)

with \({\mathscr {B}}_{r,m}^{0}=\left\{ \left( {{\textbf {D}}},{{\textbf {a}}}\right) :{{\textbf {a}}}\in {\mathbb {R}}^{r},{{\textbf {D}}}=\left( \begin{array}{cc} {{\textbf {A}}}&-{{\textbf {B}}} \end{array} \right) \in {\mathbb {R}} ^{r\times m},{{\textbf {A}}}\in {\mathscr {O}}_{r,p},{{\textbf {B}}}\in {\mathscr {O}} _{r,q}\right\} .\)

The covariance matrix of the standardized random vector \({{\textbf {z}}}\) is given by

$$\begin{aligned} {{\textbf {M}}}=\left( \begin{array}{cc} {{\textbf {I}}}_{p} &{} \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2}\varvec{\varSigma }_{ {\textbf {xy}}}\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2} \\ \varvec{\varSigma }_{{\textbf {yy}}}^{-1/2}\varvec{\varSigma }_{{\textbf {yx}}}\varvec{ \varSigma }_{{\textbf {xx}}}^{-1/2} &{} {{\textbf {I}}}_{q} \end{array} \right) , \end{aligned}$$
(13)

To evaluate the “largeness” of the “residuals” \(\left\| {{\textbf {A}}}{\varvec{{\tilde{x}}}}-{{\textbf {B}}}{\varvec{{\tilde{y}}}}-{{\textbf {a}}}\right\| ^{2},\) an M-scale \(\sigma =\sigma ({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}})\) is computed implicitly through

$$\begin{aligned} \sigma ({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}})=\min \left\{ \sigma >0:E_{F}\chi \left( \frac{\left\| {{\textbf {A}}}{\varvec{{\tilde{x}}}}-\textbf{B}{\varvec{{\tilde{y}}}}-{{\textbf {a}}}\right\| ^{2}}{\sigma }\right) \le \delta \right\} , \end{aligned}$$
(14)

with \(\chi :[0,\infty )\rightarrow [0,1]\), \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) robust dispersion estimators for \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\), respectively, \(\varvec{{\tilde{x}}}=\left( \varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\right) ^{-1/2}{{\textbf {x}}}\), \(\varvec{{\tilde{y}}}=\left( \varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\right) ^{-1/2}{{\textbf {y}}}\) and \(\left( \left( \begin{array}{cc} {{\textbf {A}}}&-{{\textbf {B}}} \end{array} \right) ,{{\textbf {a}}}\right) \in {\mathscr {B}}_{r,m}^{0}\). Then, the robust standardized SM-canonical vectors are defined through the equation

$$\begin{aligned} \left( {{\textbf {A}}}^{o}_{SM},{{\textbf {B}}}^{o}_{SM},{{\textbf {a}}}^{o}_{SM}\right) =\arg \min _{\left( {{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}}\right) \in {\mathscr {B}} _{r,m}^{0}}\sigma ({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}}), \end{aligned}$$
(15)

and the final SM-canonical vectors are defined as

$$\begin{aligned} {{\textbf {A}}}_{SM}={{\textbf {A}}}^{o}_{SM}\left( \varvec{\varSigma }_{{\textbf {xx}} }^{(R)}\right) ^{-1/2}\text {, }{{\textbf {B}}}_{SM}={{\textbf {B}}}^{o}_{SM}\left( \varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\right) ^{-1/2}. \end{aligned}$$

If we have a random sample \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) and \(F=F_{n}\) stands for the empirical distribution function based on \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\), the sample version of the estimates is simply obtained by replacing the population expectation by the empirical expectation. That is, \({\hat{\sigma }}\) is the robust scale in (14) computed from the sample, and \(\left( \varvec{{\hat{A}}}_{SM}^{o},\varvec{{\hat{B}}}_{SM}^{o},\varvec{{\hat{a}}}_{SM}^{o}\right) \) are the solutions to (15) using \({\hat{\sigma }}\), namely

$$\begin{aligned} \left( \varvec{{\hat{A}}}_{SM}^{o},\varvec{{\hat{B}}}_{SM}^{o},\varvec{{\hat{a}}} _{SM}^{o}\right)= & {} \arg \min _{\left( {{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}}\right) \in {\mathscr {B}} _{r,m}^{0}}{\hat{\sigma }}({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}}), \end{aligned}$$
(16)
$$\begin{aligned} \varvec{{\hat{A}}}_{SM}= & {} \varvec{{\hat{A}}}_{SM}^{o}\left( \varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)}\right) ^{-1/2}\text {, }\varvec{{\hat{B}}} _{SM}=\varvec{{\hat{B}}}_{SM}^{o}\left( \varvec{{\hat{\varSigma }}}_{{\textbf {yy}} }^{(R)}\right) ^{-1/2}. \end{aligned}$$
(17)
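As an illustration of (14), (16) and (17), the following sketch evaluates the empirical objective \({\hat{\sigma }}({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}})\). The loss \(\chi \), the scale solver and the plug-in scatter estimates are our own simplified choices, and the actual constrained minimization over \({\mathscr {B}}_{r,m}^{0}\) (via Lagrange multipliers, as in Adrover and Donato 2015) is not reproduced here:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import brentq

def chi(t, c=1.0):
    u = np.minimum(t / c, 1.0)
    return 1.0 - (1.0 - u) ** 3          # bounded loss, chi(0) = 0, chi(t >= c) = 1

def m_scale(r2, delta=0.5, c=1.0):
    g = lambda s: np.mean(chi(r2 / s, c)) - delta
    hi = 1.0
    while g(hi) > 0:
        hi *= 2.0
    return brentq(g, 1e-12, hi)

def sigma_hat(A, B, a, x, y, Sxx_R, Syy_R, delta=0.5):
    """Empirical M-scale (14) of the residuals A x~ - B y~ - a (A, B row-orthonormal)."""
    xt = x @ np.linalg.inv(np.real(sqrtm(Sxx_R)))    # rows of x tilde
    yt = y @ np.linalg.inv(np.real(sqrtm(Syy_R)))    # rows of y tilde
    r2 = np.sum((xt @ A.T - yt @ B.T - a) ** 2, axis=1)
    return m_scale(r2, delta)
```

Minimizing sigma_hat over row-orthonormal A, B and intercepts a gives (16); multiplying the minimizers by the inverse square roots of the robust scatters, as in (17), gives the SM-canonical vectors on the original scale.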

The algorithm to compute the SM-estimators follows readily from the fact that we face a constrained minimization, so the Lagrange multipliers method applies (Adrover and Donato 2015). In the context of sparsity, (10) was also considered by Wilms and Croux (2015), and a robust proposal was given in Wilms and Croux (2016).

The outstanding robust performance of (5) and (15) suggests studying their asymptotic properties. Li and Chen (1985) dealt with a robust procedure by considering a robust dispersion S rather than the Var operator in the optimization scheme for PCA, Cui et al. (2003) obtained the asymptotic distribution of the procedure, and Croux and Ruiz-Gazen (2005) tackled the influence function. Draǎković et al. (2019) derived the asymptotic behavior of robust PCA in the context of complex elliptically symmetric distributions based on the spectral decomposition of Maronna's monotone multivariate dispersion estimator (Maronna 1976), extending previous results (Tyler 1981; Boente 1987; Croux and Haesbroeck 2000). Croux et al. (2017) considered trimmed estimators in the PCA context (in particular, the SL-estimator given in Maronna 2005 is also included), studying theoretical properties such as consistency, influence function and breakdown point. ten Berge (1979) and Lemma 3 in Adrover and Donato (2015) explored the close relationship between PCA and CCA, which comes from the fact that the principal vectors of \({{\textbf {M}}}\) defined in (13) comprise the transformed canonical vectors of \(\varvec{\varSigma }\). Thus, the asymptotics and influence function for the SM-estimator given by Maronna (2005) in the PCA context are easily derived from the arguments used for the CCA case in this paper and are therefore omitted.

Anderson (1999) derived the asymptotic distribution of the canonical correlations and canonical vectors when sampling from the normal distribution. Taskinen et al. (2006) stated asymptotic properties for CCA based on robust estimators of the covariance matrix as in (9). Alfons et al. (2017) treated asymptotic properties of projection pursuit estimators (see Branco et al. 2005), namely their asymptotic distribution and influence function.

In Sect. 2, we establish the consistency under elliptical distributions of the SM-estimators given in (16) and (17). In Sect. 3, the asymptotic distribution of the SM-estimators is derived, and Sect. 4 analyzes the influence function (IF) of the proposal. In Sect. 5, we review the concept of association between random variables by analyzing some proposals for robust correlation measures which exploit the concept of residual smallness as in (2) and (10). In Sect. 6, we include some concluding remarks. Some relevant proofs are deferred to the “Appendix”.

2 Consistency of SM-estimators for CCA

In the multivariate location and dispersion model (MLDM), we have an m-dimensional random vector \({{\textbf {z}}}=(z_{1},\dots ,z_{m})^{t}\) with distribution \(F_{{\varvec{\mu }},\varvec{\varSigma }}(B)=F_{{{\textbf {0}}}}\left( \varvec{\varSigma }^{-1/2}(B-\varvec{\mu })\right) ,\) where \(F_{{{\textbf {0}}}}\) is a known distribution in \({\mathbb {R}}^{m}\), B is a Borel set in \({\mathbb {R}}^{m},\) \(\varvec{\mu }\in {\mathbb {R}}^{m}\) and \(\varvec{\varSigma }\in S_{m},\) the set of \(m\times m\) positive definite matrices. An important case is the family of elliptical distributions. The elliptical model allows for a great variety of distributions, comprising the majority of the distributions used in practice, not only the multivariate normal distribution but also distributions without finite moments. We say that an m-dimensional random vector has an elliptical distribution if it has a density of the form

$$\begin{aligned} f({{\textbf {z}}},{\varvec{\mu }}_{0},\varvec{\varSigma }_{0})=\frac{1}{(\det \varvec{\varSigma }_{0})^{1/2}}f_{0}\left( ({{\textbf {z}}}-{\varvec{\mu }}_{0})^{t}\varvec{\varSigma }_{0}^{-1}({{\textbf {z}}}-{\varvec{\mu }}_{0})\right) , \end{aligned}$$
(18)

where \(f_{0}: {\mathbb {R}} ^{+}\rightarrow {\mathbb {R}}^{+}\) (we denote this by \({{\textbf {z}}}\sim E_m({\varvec{\mu }}_{0},\varvec{\varSigma }_{0})\)). If \({{\textbf {z}}}\sim E_m({{\textbf {0}}},{\varvec{I}})\), then \({{\textbf {a}}}^{t}{{\textbf {z}}}\) has the same distribution for all \({{\textbf {a}}}\in S^{m-1}=\{{{\textbf {a}}}\in {\mathbb {R}}^{m}:\Vert {{\textbf {a}}}\Vert =1\}\). In case of having \({{\textbf {z}}}=({{\textbf {x}}}^{t},{{\textbf {y}}}^{t})^{t}\sim E_m({\varvec{\mu }}_{0},\varvec{\varSigma }_{0})\), with the location and dispersion parameters partitioned as

$$\begin{aligned} {\varvec{\mu }}_{0}=\left( \begin{array}{c} {\varvec{\mu }}_{0,{{\textbf {x}}}} \\ {\varvec{\mu }}_{0,{{\textbf {y}}}} \end{array} \right) \text { and } \varvec{\varSigma }_{0}=\left( \begin{array}{cc} \varvec{\varSigma }_{0,{\textbf {xx}}} &{} \varvec{\varSigma }_{0,{\textbf {xy}}} \\ \varvec{\varSigma }_{0,{\textbf {yx}}} &{} \varvec{\varSigma }_{0,{\textbf {yy}}} \end{array} \right) , \end{aligned}$$

respectively, then \({{\textbf {x}}}\sim E_p({\varvec{\mu }}_{0,{{\textbf {x}}}},\varvec{\varSigma }_{0,{\textbf {xx}}})\) and \({{\textbf {y}}}\sim E_q({\varvec{\mu }}_{0,{{\textbf {y}}}},\varvec{\varSigma }_{0,{\textbf {yy}}})\). Let us now take the random vector \({{\textbf {z}}}_0=({{\textbf {x}}}^{t} \varvec{\varSigma }_{0,{\textbf {xx}}}^{-1/2},{{\textbf {y}}}^{t}\varvec{\varSigma }_{0,{\textbf {yy}}}^{-1/2})^{t}\); then \({{\textbf {z}}}_0\sim E_m(\tilde{{\varvec{\mu }}}_{0},{\varvec{M}}_{0})\), with the location and dispersion parameters partitioned as

$$\begin{aligned} \tilde{{\varvec{\mu }}}_{0}=\left( \begin{array}{c} \varvec{\varSigma }_{0,{\textbf {xx}}}^{-1/2}{\varvec{\mu }}_{0,{{\textbf {x}}}} \\ \varvec{\varSigma }_{0,{\textbf {yy}}}^{-1/2}{\varvec{\mu }}_{0,{{\textbf {y}}}} \end{array} \right) \text { and } {{\textbf {M}}}_0=\left( \begin{array}{cc} {{\textbf {I}}}_{p} &{} \varvec{\varSigma }_{0,{\textbf {xx}}}^{-1/2}\varvec{\varSigma }_{0,{\textbf {xy}}}\varvec{\varSigma }_{0,{\textbf {yy}}}^{-1/2} \\ \varvec{\varSigma }_{0,{\textbf {yy}}}^{-1/2}\varvec{\varSigma }_{0,{\textbf {yx}}}\varvec{\varSigma }_{0,{\textbf {xx}}}^{-1/2} &{} {{\textbf {I}}}_{q} \end{array} \right) . \end{aligned}$$

If \({{\textbf {z}}}\sim E_m({\varvec{\mu }}_{0},\varvec{\varSigma }_{0})\) has finite second moments, then \(E{{\textbf {z}}}={\varvec{\mu }}_0\), the covariance matrix \(\varvec{\varSigma }\) and \(\varvec{\varSigma }_{0}\) are equal up to a constant, that is, \(\varvec{\varSigma }=c\varvec{\varSigma }_{0}\) for some positive constant c, and \({{\textbf {M}}}_0={{\textbf {M}}}\) with \({{\textbf {M}}}\) given in (13). For the sake of simplicity, we will keep the notation \({{\textbf {M}}}\) for either \({{\textbf {M}}}\) or \({{\textbf {M}}}_0\) since they coincide in the case of finite second moments. From now on, we will use the symbols \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\) to refer to the multivariate dispersion parameters of the elliptical model. The possibility of considering a general elliptical distribution rather than a multivariate normal distribution allows for a broader modeling scenario.

Let us take the spectral decomposition for \({{\textbf {M}}}\), \({{\textbf {M}}} =\sum _{i=1}^{p+q}\gamma _{i}^0{{\textbf {t}}}_{i}^{0}\left( {{\textbf {t}}}_{i}^{0}\right) ^{t}\), with eigenvalues \(\gamma _{1}^0\ge \cdots \ge \gamma _{p+q}^0 \ge 0\) and eigenvectors \(\left\{ {{\textbf {t}}}_{j}^{0}\right\} _{j=1}^{p+q},\) \(\left( {{\textbf {t}}}_{i}^{0}\right) ^{t}{{\textbf {t}}}_{j}^{0}=\delta _{ij},1\le i,j\le p+q,\) with \(\delta _{ij}\) the Kronecker delta. Then, \({{\textbf {M}}}={{\textbf {P}}} _{0}\varvec{\varGamma }_{0}{{\textbf {P}}}_{0}^{t},\) with \(\varvec{\varGamma } _{0}=diag\left( \gamma _{1}^{0},\gamma _{2}^{0},\ldots ,\gamma _{p+q}^{0}\right) \) and \({{\textbf {P}}}_{0}\) an orthogonal matrix whose columns are \({{\textbf {t}}} _{1}^{0},\ldots ,{{\textbf {t}}}_{p+q}^{0}\). Take \({{\textbf {v}}}_{i}^{0}\in {\mathbb {R}}^{p}\) and \({{\textbf {w}}}_{i}^{0}\in \) \({\mathbb {R}} ^{q}\), \(i=1,\ldots ,p+q,\) as \({{\textbf {t}}}_{i}^{0}=(({{\textbf {v}}} _{i}^{0})^t,({{\textbf {w}}}_{i}^{0})^t)^{t}\). Then, let us call

$$\begin{aligned} \begin{aligned} {{\textbf {A}}}_{o}&=\left( \frac{{{\textbf {v}}}_{p+q-r+1}^{0}}{\left\| {{\textbf {v}}}_{p+q-r+1}^{0}\right\| },\dots ,\frac{{{\textbf {v}}}_{p+q}^{0}}{\left\| {{\textbf {v}}}_{p+q}^{0}\right\| } \right) ^{t}\in {\mathbb {R}}^{r\times p},\\ {{\textbf {B}}}_{o}&=\left( \frac{{{\textbf {w}}}_{p+q-r+1}^{0}}{\left\| {{\textbf {w}}}_{p+q-r+1}^{0}\right\| },\dots ,\frac{{{\textbf {w}}}_{p+q}^{0}}{\left\| {{\textbf {w}}}_{p+q}^{0}\right\| } \right) ^{t}\in {\mathbb {R}}^{r\times q},\\ {{\textbf {a}}}_{o}&={{\textbf {A}}}_{o}\varvec{\varSigma }_{{\textbf {xx}}}^{-1/2}{\varvec{\mu }}_{{{\textbf {x}}}}-{{\textbf {B}}}_{o}\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2}{\varvec{\mu }}_{{{\textbf {y}}}},\\ \sigma _{o}&=\sigma ({{\textbf {A}}}_{o},{{\textbf {B}}}_{o},{{\textbf {a}}}_{o}). \end{aligned} \end{aligned}$$
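In other words, \({{\textbf {A}}}_{o}\) and \({{\textbf {B}}}_{o}\) collect the normalized \({{\textbf {x}}}\)- and \({{\textbf {y}}}\)-blocks of the r eigenvectors of \({{\textbf {M}}}\) attached to its smallest eigenvalues. A minimal numpy sketch with a toy \({{\textbf {M}}}\) (the numbers are ours, for illustration only):

```python
import numpy as np

p, q, r = 3, 2, 2
K = np.array([[0.7, 0.0],
              [0.0, 0.3],
              [0.0, 0.0]])                 # plays Sxx^{-1/2} Sxy Syy^{-1/2}
M = np.block([[np.eye(p), K], [K.T, np.eye(q)]])

gamma, T = np.linalg.eigh(M)               # eigenvalues in increasing order
V, W = T[:p, :r], T[p:, :r]                # blocks of the r bottom eigenvectors
A_o = (V / np.linalg.norm(V, axis=0)).T    # r x p, unit-norm rows (up to sign)
B_o = (W / np.linalg.norm(W, axis=0)).T    # r x q
print(gamma)                               # here: 1 -/+ the canonical correlations, and 1
```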

A zero \((r\times s)\)-matrix is a matrix all of whose entries are zero and we denote it as \({{\textbf {0}}}_{r\times s}\). Given square matrices \({{\textbf {N}}}_i\in {\mathbb {R}}^{p_i\times p_i}\), \(i=1,\ldots ,k\), set the matrix \({{\textbf {N}}}\) \(=\) \(diag({{\textbf {N}}}_1,\ldots ,{{\textbf {N}}}_k)\) \(\in \) \({\mathbb {R}}^{\sum _{i=1}^k p_i\times \sum _{i=1}^kp_i}\) whose entries off the blocks \({{\textbf {N}}}_1,\ldots ,{{\textbf {N}}}_k\) are zero. \(\left\{ {{\textbf {f}}} _{1}^{(s)},\dots ,{{\textbf {f}}}_{s}^{(s)}\right\} \) stands for the canonical basis in \( {\mathbb {R}}^{s}.\)

If \({{\textbf {w}}}\in {\mathbb {R}}^{m}\), \({\mathscr {L}}({{\textbf {w}}})\) denotes the distribution of \({{\textbf {w}}}\) and \({\mathscr {L}}_{m}\) stands for the set of such distributions, a multivariate location and dispersion functional is a map \(({{\textbf {T}}},S): {\mathscr {L}}_{m}\rightarrow {\mathbb {R}}^{m}\times {\mathbb {R}}^{m\times m}\) such that (i) \(S({\mathscr {L}}({{\textbf {w}}}))\in S_{m}\) and (ii) it is affine equivariant, i.e., given a nonsingular matrix \({{\textbf {G}}}\in {\mathbb {R}}^{m\times m}\) and a vector \({{\textbf {b}}}\in {\mathbb {R}}^{m},\) \({{\textbf {T}}}({\mathscr {L}}({\textbf {Gw}}+{{\textbf {b}}}))={\textbf {GT}}({\mathscr {L}}({{\textbf {w}}}))+{{\textbf {b}}}\) and \(S({\mathscr {L}}({\textbf {Gw}}+{{\textbf {b}}}))={{\textbf {G}}}S({\mathscr {L}}({{\textbf {w}}})){{\textbf {G}}}^{t}.\)

Let us take location and dispersion functionals \(({{\textbf {T}}},S)\) such that \(({{\textbf {T}}}_{{{\textbf {x}}}},S_{{{\textbf {x}}}})=({{\textbf {T}}}(\mathscr {L}({{\textbf {x}}})),S(\mathscr {L}({{\textbf {x}}})))\) and \(({{\textbf {T}}}_{{{\textbf {y}}}},S_{{{\textbf {y}}}})=({{\textbf {T}}}(\mathscr {L}({{\textbf {y}}})),S(\mathscr {L}({{\textbf {y}}})))\), respectively. Take \({{\textbf {x}}}_{c}={{\textbf {x}}}-{{\textbf {T}}}_{{{\textbf {x}}}}\), \(\varvec{{\bar{x}}}={{\textbf {S}}}_{{{\textbf {x}}}}^{-1/2}{{\textbf {x}}}_{c}\), \({{\textbf {y}}}_{c}={{\textbf {y}}}-{{\textbf {T}}}_{{{\textbf {y}}}}\), \(\varvec{{\bar{y}}}={{\textbf {S}}}_{{{\textbf {y}}}}^{-1/2}{{\textbf {y}}}_{c}\) and \({{\textbf {z}}}_{c}={{\textbf {z}}}-{{\textbf {T}}}_{{{\textbf {z}}}}.\)

The functional \(\left( {\textbf {T,S}}\right) \) for the location and dispersion parameters at MLDM is said to be Fisher consistent if \( {{\textbf {T}}}(F_{{\varvec{\mu }}, \varvec{\varSigma }})={\varvec{\mu }}\) and \({{\textbf {S}}}(F_{ {\varvec{\mu }},\varvec{\varSigma }})=\varvec{\varSigma } \). Adrover and Donato (2015) gave the definition of a Fisher consistent CCA functional, and they showed the Fisher consistency of the SM-estimator for CCA. For this purpose, Fisher consistent functionals for \(\varvec{\varSigma }_{{\textbf {xx}}} \) and \(\varvec{\varSigma }_{{\textbf {yy}}}\) are required.

To have SM-estimators that are well defined as well as consistent and asymptotically normal, some conditions are required:

C0:

\({{\textbf {M}}}\) has eigenvalues \((\gamma _1^0,\ldots ,\gamma _m^0)\in \varGamma \), with

$$\begin{aligned} \varGamma =\left\{ (\gamma _1,\ldots ,\gamma _{p+q}): \gamma _{1}>\cdots>\gamma _{r+1}\ge \cdots \ge \gamma _{p+q-r}>\cdots >\gamma _{p+q}\ge 0\right\} . \end{aligned}$$

C1:

\(\chi \left( \cdot \right) \) is nondecreasing.

C2:

\(\chi (x)\) is left continuous for \(x>0\).

C3:

\(\chi (0)=0.\)

C4:

\(\chi \) is continuous at 0.

C5:

\(\lim _{x\rightarrow \infty }\chi (x)=1.\)

C6:

There exists \(c_{0}\in \left( 0,\infty \right) \) such that \(\chi (x)<1\) if \(0\le x<c_0\).

C7:

\(\chi (x)=1\) for \(c_{0}<x<\infty \), with \(c_0\) as in C6.

To obtain the asymptotic behavior of the estimates, the following conditions are also assumed regarding the model density \(f_0\) as well as the parameters \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\), which are estimated before the SM-procedure applies.

C8:

\(f_{0}\) is nonincreasing, and there exists \(\xi \le \infty \) such that \(f_{0}(x)>0\) if \(x<\xi \) and \(f_{0}(x)=0\) if \(x>\xi .\)

C9:

Let \(\xi \) be as in C8. The functions \(f_{0}(\cdot )\) and \(\chi \left( \cdot \right) \) have at least one common point of strict monotonicity; that is, there exist \(d<\xi \) and a nondegenerate interval I such that \(d\in I\) and, for all \(u,v\in I\) with \(u<d<v\), it holds that \(\chi (u)<\chi (d)<\chi (v)\) and \(f_{0}(u)>f_{0}(d)>f_{0}(v)\).

C10:

Let \(\xi \) and d be as in C8 and C9 and \(\varGamma \) as in C0. Let \(\left\{ \lambda _{1},\ldots ,\lambda _{r}\right\} \subset \varGamma \) such that \(\lambda _{j}\ge \lambda _{j+1}\) and \(\varLambda =diag\left( \lambda _{1},\dots ,\lambda _{r},0,\dots ,0\right) \). Let \(\sigma (\varLambda )\) be such that

$$\begin{aligned} \int \chi \left( {{\textbf {w}}}^{t}\varLambda {{\textbf {w}}}/\sigma (\varLambda ) \right) f_{0}\left( {{\textbf {w}}}^{t}{{\textbf {w}}}\right) d{{\textbf {w}}}=\delta \end{aligned}$$

and \(\sigma _{0}\) \(=\) \(\min _{\varLambda }\left\{ \sigma (\varLambda )\right\} \), then \(d\sigma _{0}/\gamma _{m-1}^{0} <\xi .\)

C11a:

Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution with density (18). Then, \(\left\{ \varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)}\right\} _{n=p+1}^{\infty }\subseteq S_{p}\) and \(\left\{ \varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)}\right\} _{n=q+1}^{\infty }\subseteq S_{q}\) based on \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) are consistent estimators of \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\), respectively.

C11b:

If \(({{\textbf {x}}}^{t},{{\textbf {y}}}^{t})^{t}\sim F\) with density (18), then the multivariate dispersion functionals satisfy \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}(F)=\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}(F)=\varvec{\varSigma }_{{\textbf {yy}}}.\)

C0 allows for a simpler presentation of the consistency because the relevant eigenspaces have dimension 1. Conditions C1–C7 on the loss function \(\chi \) are similar to those considered in the robust literature for redescending estimators (see Davies 1987; Maronna 2005; Maronna et al. 2019). C1 keeps the monotonicity displayed by the square loss function in the classical case, letting larger residuals have larger weights. C2 allows for the minimum rather than the infimum in the definition of the M-scale given in (5). C3 stands for the fact that a zero residual has zero weight. C4 captures the intuitive fact that, while C3 holds, increasingly small residuals cannot have weights above a positive threshold. C5 ensures that \(\chi \) is bounded, which is instrumental in obtaining robust estimators able to cope with a large proportion of outlying observations. C6 and C7 are required for technical reasons to derive the consistency. C8, C9 and C10 are crucial for Fisher consistency: they prevent parameters other than the elliptical model parameters from yielding the minimum in (15). C11 stands for the consistency and Fisher consistency of the preliminary estimators and functionals corresponding to the parameters \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\). The restrictions on the loss function \(\chi \) are not relevant in practice, since they do not impose restrictions on the data distribution. The loss function affects the asymptotic properties as well as the robustness properties that the estimators will possess.

Next, we include a useful concept from the robustness literature, whose fulfillment is required to ensure that (12) can be solved properly. We say that a sample \(\left\{ {{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\right\} \) is in r-general position if any linear manifold \({{\textbf {Z}}}_{n}=\left\{ {{\textbf {z}}}\in {\mathbb {R}}^{m}:{\textbf {Cz=a}}\right\} \) with \(\left( {{\textbf {C}}},{{\textbf {a}}}\right) \in {\mathscr {B}}_{r,m}^{1}\) contains at most \(m-r+1\) points of the sample.

Lemma 1

Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}} ^{m}\) from an elliptical distribution with density (18). Let us suppose that conditions C1-C7 hold. If the sample \(\left\{ {{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\right\} \) is in r-general position and \(m-r+1<n(1-\delta )\), then (12) has at least one solution with probability 1.

To show the consistency of a sequence of SM-estimators, Theorem 4.2, p. 665 of Rao (1962) is required. A useful generalization of this result is as follows.

Lemma 2

Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}} ^{m}\) from an elliptical distribution F with density (18). Let us suppose that conditions C1–C7 hold. Let \(S_{m,o}\subset {\mathbb {R}} ^{m\times m}\) be the set of nonnegative symmetric matrices and \(F_{n}\) stands for the empirical distribution function based on the sample \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\). Then, it holds that

$$\begin{aligned} P\left( \lim _{n\rightarrow \infty }\sup _{\begin{array}{c} {{\textbf {G}}}\in S _{m,o} \\ {\varvec{\mu }}\in {\mathbb {R}}^{m} \\ \sigma \in \left( 0,\infty \right) \end{array}}\left| E_{F_{n}}\chi \left( \frac{\left( {{\textbf {z}}}-{\varvec{\mu }}\right) ^{t}{{\textbf {G}}}\left( {{\textbf {z}}}-{\varvec{\mu }}\right) }{\sigma }\right) -E_{F}\chi \left( \frac{\left( {{\textbf {z}}}-{\varvec{\mu }}\right) ^{t}{{\textbf {G}}}\left( {{\textbf {z}}}-{\varvec{\mu }}\right) }{\sigma }\right) \right| =0\right) =1. \end{aligned}$$

In what follows, let \(\varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)}\) and \(\varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)}\) be consistent estimators of \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\), respectively. The Fisher consistency, the existence of the estimator for random samples and Lemma 2 entail the consistency of the SM-estimators; the details are omitted.

Theorem 1

Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution F with density (18). Suppose that conditions C0–C11a hold and \(m-r+1<n\left( 1-\delta \right) .\) Let \(\left( \varvec{{\hat{A}}}_{SM}^{o},\varvec{{\hat{B}}}_{SM}^{o},{{\varvec{{\hat{a}}}}}_{SM}^{o}\right) \) be solutions of (16), then

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( \varvec{{\hat{A}}}_{SM}^{o},\varvec{{\hat{B}}} _{SM}^{o},\varvec{{\hat{a}}}_{SM}^{o}\right)= & {} \left( {{\textbf {A}}}_{o},{\textbf { B}}_{o},{{\textbf {a}}}_{o}\right) \text { a. s.}, \\ \lim _{n\rightarrow \infty }\left( \varvec{{\hat{A}}}_{SM}^{o}\left[ \varvec{ {\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)}\right] ^{-1/2},\varvec{{\hat{B}}}_{SM}^{o} \left[ \varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)}\right] ^{-1/2},\varvec{{\hat{a}}}_{SM}^{o}\right)= & {} \left( {{\textbf {A}}}_{o}\varvec{\varSigma }_{{\textbf {xx}} }^{-1/2},{{\textbf {B}}}_{o}\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2},{{\textbf {a}}} _{o}\right) \text { a. s.} \end{aligned}$$

Proof

It is deferred to the “Appendix”. \(\square \)

The previous theorem shows the consistency of the vectors minimizing the M-scale. To derive the asymptotic behavior of the SM-estimators, we consider the critical points obtained from the constrained minimization

$$\begin{aligned} h({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varTheta },\varvec{\varXi })= & {} \sigma ( {{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}})+tr({\textbf {AA}}^{t}\varvec{\varXi }^{t})+tr( {\textbf {BB}}^{t}\varvec{\varTheta }^{t})-tr(\varvec{\varTheta }^{t})-tr(\varvec{\varXi }^{t}) \nonumber \\ \end{aligned}$$
(19)

with \(\varvec{\varTheta },\varvec{\varXi }\in {\mathbb {R}}^{r\times r}\). Thus, we take a Lagrangian whose set of critical points contains the critical points of \(h({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varTheta },\varvec{\varXi })\) in (19), that is, we consider

$$\begin{aligned} \sigma ({{\textbf {D}}},{{\textbf {a}}})+tr({{\textbf {D}}}{{\textbf {D}}}^{t}\varvec{\varLambda }^{t})-tr\left( \varvec{\varLambda }^{t}\right) ,\quad {{\textbf {D}}}=\frac{1}{\sqrt{2}}\left( \begin{array}{cc} {{\textbf {A}}}&-{{\textbf {B}}} \end{array}\right) \in {\mathscr {B}}_{r,m}^{1},\ \varvec{\varLambda }\in {\mathbb {R}}^{r\times r}. \end{aligned}$$
(20)

Let us introduce some notation for the covariance matrices later used in the derivation of the asymptotics. Set the parameter \(\widetilde{\varvec{\varSigma }}=diag\left( \varvec{\varXi }^{-1/2},\varvec{\varTheta }^{-1/2}\right) \) with \(\varvec{\varXi }\in S_{p}\) and \(\varvec{\varTheta }\in S_{q}\). Let us take the functionals \(\varvec{{\tilde{\varSigma }}}(H)=diag\left( \left[ \varvec{\varSigma }_{{\textbf {xx}}}^{(R)}(H)\right] ^{-1/2},\left[ \varvec{\varSigma }_{{\textbf {yy}}}^{(R)}(H)\right] ^{-1/2}\right) \) and \(\widetilde{\varvec{\varSigma }}_{\varepsilon }=\widetilde{\varvec{\varSigma }}(F_{\varepsilon })\). Consider the estimators \(\widehat{\widetilde{\varvec{\varSigma }}}=\varvec{{\tilde{\varSigma }}}(F_{n})\), \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}(F_{n})=\varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}(F_{n})=\varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)}\). If we are dealing with Fisher consistent functionals, we have that \(\varvec{{\tilde{\varSigma }}}(F)=\varvec{{\tilde{\varSigma }}}_{o}=diag\left( \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2},\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2}\right) \).

Take \(\varvec{\tilde{{\textbf {z}}}}=\left( \widetilde{{\textbf {x}}}^t, \widetilde{{\textbf {y}}}^t \right) ^t\); by differentiating (20) we get the equivalent system,

$$\begin{aligned} \begin{array}{rrl} \displaystyle \frac{1}{n}\sum _{i=1}^{n}\chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}} \varvec{\tilde{{\textbf {z}}}}_{i}-{{\textbf {a}}}\right\| ^{2}}{\sigma }\right) \left( {{\textbf {D}}}\varvec{\tilde{{\textbf {z}}}}_{i}-{{\textbf {a}}}\right) &{}=&{}{{\textbf {0}}} _{r}, \\ \left[ {{\textbf {I}}}_{m}-{{\textbf {D}}}^{t}{{\textbf {D}}}\right] \displaystyle \frac{1}{n}\sum _{i=1}^{n}\chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}} \varvec{\tilde{{\textbf {z}}}}_{i}-{{\textbf {a}}}\right\| ^{2}}{\sigma }\right) \varvec{\tilde{{\textbf {z}}}}_{i}\left( {{\textbf {D}}}\varvec{\tilde{{\textbf {z}}}}_{i}-{{\textbf {a}}} \right) ^{t} &{}=&{}{{\textbf {0}}}_{m\times r}. \end{array} \end{aligned}$$
(21)

If an iterative procedure for solving (21) is initialized with a sequence of consistent estimators, the resulting sequence of critical points is also consistent in the CCA context. Analogous results are available for some robust regression methods, and the proof is similar to theirs [see, for instance, Theorem 3.2 in Yohai (1987) and Theorem 4.1 in Yohai and Zamar (1988)].

Proposition 1

Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution F with density (18). Suppose that conditions C0–C10 hold and, moreover, that \(\chi \) is twice differentiable and concave in \(\left[ 0,c_{0}\right] ,\) with \(c_{0}\) given in C6. If \(m-r+1<n\left( 1-\delta \right) ,\) \(\left( \varvec{{\hat{A}}}_{o}^{(0)},\varvec{{\hat{B}}}_{o}^{(0)},\varvec{{\hat{a}}}^{(0)}\right) \) is a sequence of consistent estimators for \(\left( {{\textbf {A}}}_{o},{{\textbf {B}}}_{o},{{\textbf {a}}}_{o}\right) \) and the solutions \(\left( \varvec{{\hat{A}}}^{o},\varvec{{\hat{B}}}^{o},\varvec{{\hat{a}}}^{o}\right) \) of (21) verify \({\hat{\sigma }}\left( \varvec{{\hat{A}}}^{o},\varvec{{\hat{B}}}^{o},\varvec{{\hat{a}}}^{o}\right) \le {\hat{\sigma }}\left( \varvec{{\hat{A}}}_{o}^{(0)},\varvec{{\hat{B}}}_{o}^{(0)},\varvec{{\hat{a}}}^{(0)}\right) ,\) then \(\lim _{n\rightarrow \infty }\left( \varvec{{\hat{A}}}^{o},\varvec{{\hat{B}}}^{o},\varvec{{\hat{a}}}^{o}\right) =\left( {{\textbf {A}}}_{o},{{\textbf {B}}}_{o},{{\textbf {a}}}_{o}\right) \text { almost surely}.\)
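The weighting structure in (21) suggests a simple fixed-point pass: each observation receives the weight \(\chi ^{\prime }(\Vert {{\textbf {D}}}\varvec{\tilde{{\textbf {z}}}}_{i}-{{\textbf {a}}}\Vert ^{2}/\sigma )\), the intercept solves the first equation as a weighted mean, and the rows of \({{\textbf {D}}}\) are refreshed from the smallest-variance directions of the weighted scatter. The sketch below illustrates this idea under our bisquare-type \(\chi \); it is not the authors' exact algorithm, and in particular it does not enforce the block structure \({{\textbf {D}}}=2^{-1/2}\left( {{\textbf {A}}}\ -{{\textbf {B}}}\right) \) with row-orthonormal \({{\textbf {A}}},{{\textbf {B}}}\), which the actual procedure must impose:

```python
import numpy as np
from scipy.optimize import brentq

def chi(t, c=1.0):
    u = np.minimum(t / c, 1.0)
    return 1.0 - (1.0 - u) ** 3

def chi_prime(t, c=1.0):
    return np.where(t < c, 3.0 * (1.0 - t / c) ** 2 / c, 0.0)

def m_scale(r2, delta=0.5, c=1.0):
    g = lambda s: np.mean(chi(r2 / s, c)) - delta
    hi = 1.0
    while g(hi) > 0:
        hi *= 2.0
    return brentq(g, 1e-12, hi)

def sm_fixed_point(zt, r, n_iter=50, delta=0.5, c=1.0):
    """Crude fixed-point pass over (21) for standardized data zt (n x m)."""
    n, m = zt.shape
    rng = np.random.default_rng(3)
    D = np.linalg.qr(rng.standard_normal((m, r)))[0].T   # initial D, orthonormal rows
    a = D @ zt.mean(axis=0)
    for _ in range(n_iter):
        r2 = np.sum((zt @ D.T - a) ** 2, axis=1)
        sigma = m_scale(r2, delta, c)
        w = chi_prime(r2 / sigma, c)                     # per-observation weights
        mw = (w[:, None] * zt).sum(axis=0) / w.sum()
        zc = zt - mw
        Cw = (w[:, None] * zc).T @ zc / w.sum()          # weighted scatter matrix
        vals, vecs = np.linalg.eigh(Cw)
        D = vecs[:, :r].T        # smallest-variance directions of the weighted scatter
        a = D @ mw               # solves the first equation in (21)
    return D, a, sigma
```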

3 Asymptotic behavior of SM-estimators for CCA

3.1 Notation

To deal with the asymptotic behavior of the SM-estimators, let us introduce some notation. The parameters in our setting are taken from the sets

$$\begin{aligned} \begin{array}{l} {\mathscr {G}}={\mathscr {O}}_{r,p}\times {\mathscr {O}}_{r,q}\times {\mathbb {R}}^{r}\times S_{p}\times S_{q}\times (0,\infty ), \\ \overline{{\mathscr {G}}}={ {\mathbb {R}}}^{r\times m}\times {\mathbb {R}}^{r}\times S_{p}\times S_{q}\times (0,\infty ). \end{array} \end{aligned}$$
(22)

The asymptotic distribution of the SM-estimators and their influence curve, a useful tool to evaluate the behavior under infinitesimal contaminations, need to be established over some set of distributions. Let \({\mathscr {D}}\) be the set of distributions on \({\mathbb {R}}^{m}\), and let \(G_{n}\) stand for the empirical distribution function based on n points in \({\mathbb {R}}^{m}\); call \({{\mathscr {F}}}_{n}\) the subset of such distributions. Thus, let us define the following subsets of distributions,

$$\begin{aligned}{} & {} {\mathscr {E}} =\left\{ F\in {\mathscr {D}}:F\text { is elliptical}\right\} , \\{} & {} C_{\varepsilon }(\mathscr {E},{\mathscr {F}}_{1}) =\left\{ G\in {\mathscr {D}}:G=(1-\varepsilon )F+\varepsilon \delta _{{{\textbf {z}}}},\ F\in {\mathscr {E}},\ \delta _{{{\textbf {z}}}}\in {\mathscr {F}}_{1},\ {{\textbf {z}}}\in {\mathbb {R}}^{m}\right\} ,\ \varepsilon \in \left[ 0,1\right] . \end{aligned}$$

Let \({\mathscr {H}}\) be a subset of \({\mathscr {D}}\) such that \(\mathscr {E}\cup {\mathscr {F}}_{n}\cup C_{\varepsilon }\left( {\mathscr {E}},{\mathscr {F}} _{1}\right) {\subset {\mathscr {H}}}\). Then, the SM-functional \(\varvec{{\tilde{\theta }}}{} {\textbf {:}}{\mathscr {H}} {{\rightarrow }}{\mathscr {G}}\) is defined as

$$\begin{aligned} \varvec{{\tilde{\theta }}}(H)=\left( {{\textbf {A}}}_{SM}^{o}(H),{{\textbf {B}}}_{SM}^{o}(H),{{\textbf {a}}}_{SM}^{o}(H),\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}(H),\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}(H),\sigma _{o}(H)\right) . \end{aligned}$$

If \(vec:{\mathbb {R}}^{m\times r}\rightarrow {\mathbb {R}}^{mr}\) stands for the operator which vectorizes a matrix by stacking its columns on top of each other, we establish some useful notation for the parameters. Set \(\varvec{{\tilde{\theta }}}=\left( {{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varTheta },\sigma \right) \) and \(\varvec{\theta }=\left( \left[ vec\left( {{\textbf {D}}}^{t}\right) \right] ^{t},{{\textbf {a}}}^{t}\right) ^{t}\). The SM-estimators turn out to be \({{\hat{{\textbf {D}}}}}_{SM}=\frac{1}{\sqrt{2}}\left( \begin{array}{cc} {{\hat{{\textbf {A}}}}}_{SM}^{o}&-{{\hat{{\textbf {B}}}}}_{SM}^{o} \end{array} \right) \) and \(\varvec{{\hat{\theta }}}=\varvec{{\tilde{\theta }}}(F_{n})=\left( {{\hat{{\textbf {A}}}}}_{SM}^{o},{{\hat{{\textbf {B}}}}}_{SM}^{o},{{\hat{{\textbf {a}}}}}_{SM},\varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)},\varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)},{\hat{\sigma }}\right) \), with \(F_{n}\in {\mathscr {F}}_{n}\). Let us now denote some functionals: \({{\textbf {D}}}_{SM}\left( \cdot \right) =\frac{1}{\sqrt{2}}\left( \begin{array}{cc} {{\textbf {A}}}_{SM}^{o}(\cdot )&-{{\textbf {B}}}_{SM}^{o}(\cdot ) \end{array} \right) \), \({{\textbf {D}}}_{\varepsilon }={{\textbf {D}}}_{SM}\left( F_{\varepsilon }\right) \), \(\varvec{{\tilde{\theta }}}(F_{\varepsilon })=\varvec{{\tilde{\theta }}}_{\varepsilon }\), with \(F_{\varepsilon }\in C_{\varepsilon }\left( {\mathscr {E}},{\mathscr {F}}_{1}\right) \), \(\varvec{{\tilde{\theta }}}_{o}=\left( {{\textbf {A}}}_{o},{{\textbf {B}}}_{o},{{\textbf {a}}}_{o},\varvec{\varSigma }_{{\textbf {xx}}},\varvec{\varSigma }_{{\textbf {yy}}},\sigma _{o}\right) \) and \({{\textbf {D}}}_{o}=\frac{1}{\sqrt{2}}\left( \begin{array}{cc} {{\textbf {A}}}_{o}&-{{\textbf {B}}}_{o} \end{array} \right) \). Take \(\varvec{\theta }_{o}=\left( \left[ vec\left( {{\textbf {D}}}_{o}^{t}\right) \right] ^{t},{{\textbf {a}}}_{o}^{t}\right) ^{t}\) and \(\varvec{\theta }_{SM}=\left( \left[ vec\left( {{\textbf {D}}}_{SM}^{t}(\cdot )\right) \right] ^{t},{{\textbf {a}}}^{t}_{SM}(\cdot )\right) ^{t}\).

If \({{\textbf {D}}}^{*}={{\textbf {D}}}\widetilde{\varvec{\varSigma }},\) set \(\varvec{\theta }^{*}=\left( \left\{ vec\left[ \left( {{\textbf {D}}}^{*}\right) ^{t}\right] \right\} ^{t}, {{\textbf {a}}}^{t}\right) ^{t} \). If \(\varvec{{\hat{D}}}_{SM}^{*}=\varvec{{\hat{D}}}_{SM}\widehat{ \widetilde{\varvec{\varSigma }}}\), set

$$\begin{aligned} \varvec{{\hat{\theta }}}_{SM}= & {} \left( \left[ vec\left( {{\hat{{\textbf {D}}}}}^{t}_{SM}\right) \right] ^{t}, {{\hat{{\textbf {a}}}}}_{SM}^{t} \right) ^{t} \end{aligned}$$
(23)
$$\begin{aligned} \varvec{{\hat{\theta }} }^{*}_{SM}= & {} \left( \left\{ vec\left[ \left( \hat{{{\textbf {D}}}}^{*}_{SM}\right) ^{t}\right] \right\} ^{t}, \hat{{{\textbf {a}}}}^{t}_{SM}\right) ^{t},\end{aligned}$$
(24)
$$\begin{aligned} {{\hat{{\textbf {v}}}}}_{SM,k}= & {} \left( {{\hat{{\textbf {A}}}}}_{SM}\right) ^{t} {{\textbf {f}}}_{k}^{(r)} \text { and } {{\hat{{\textbf {w}}}}}_{SM,k}=\left( {{\hat{{\textbf {B}}}}}_{SM}\right) ^{t} {{\textbf {f}}}_{k}^{(r)}, \text { for } k=1,\ldots ,r. \end{aligned}$$
(25)

If \({{\textbf {D}}}_{o}^{*}={{\textbf {D}}} _{o}\widetilde{\varvec{\varSigma }}_{o}\), put \(\varvec{\theta }^{*}_{o}=\left( \left\{ vec\left[ \left( {{\textbf {D}}}^{*}_{o}\right) ^{t}\right] \right\} ^{t}, {{\textbf {a}}}^{t}_{o}\right) ^{t}\), \({{\textbf {v}}}_{k}\) \(=\) \(\left( \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2} {{\textbf {A}}}_{o}^t\right) {{\textbf {f}}}_{k}^{(r)}\), and \({{\textbf {w}}}_{k}=\left( \varvec{\varSigma }_{{\textbf {yy}}}^{-1/2} {{\textbf {B}}}_{o}^t\right) {{\textbf {f}}}_{k}^{(r)}\), \(k=1,\ldots ,r\). Then, we can rewrite (21) in the following way,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}} ^{*}{{\textbf {z}}}_{i}-{{\textbf {a}}}\right\| ^{2}}{\sigma }\right) \left( {{\textbf {D}}}^{*}{{\textbf {z}}}_{i}-{{\textbf {a}}}\right)= & {} {{\textbf {0}}}_{r}, \\ \widehat{\widetilde{\varvec{\varSigma }}}^{-1}\left[ \widehat{ \widetilde{\varvec{\varSigma }}}^{2}-\left( {{\textbf {D}}}^{*}\right) ^{t} {{\textbf {D}}}^{*}\right] \frac{1}{n}\sum _{i=1}^{n}\chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}}^{*}{{\textbf {z}}}_{i}-{{\textbf {a}}}\right\| ^{2} }{\sigma }\right) {{\textbf {z}}}_{i}\left( {{\textbf {D}}}^{*}{{\textbf {z}}}_{i}- {{\textbf {a}}}\right) ^{t}= & {} {{\textbf {0}}}_{m\times r}. \end{aligned}$$

Then, let us take the functions \(\phi _{1}:{\mathbb {R}}^{m}\times {\mathscr {G}}\rightarrow {\mathbb {R}}^{rm}\), \(\phi _{2}:{\mathbb {R}}^{m}\times {\mathscr {G}}\rightarrow {\mathbb {R}}^{r}\), \(\phi :{\mathbb {R}}^{m}\times {\mathscr {G}}\rightarrow {\mathbb {R}}^{r(m+1)}\), \({\bar{\phi }}_{1}:{\mathbb {R}}^{m}\times {\bar{\mathscr {G}}}\rightarrow {\mathbb {R}}^{rm}\), \({\bar{\phi }}_{2}:{\mathbb {R}}^{m}\times {\bar{\mathscr {G}}}\rightarrow {\mathbb {R}}^{r}\) and \({\bar{\phi }}:{\mathbb {R}}^{m}\times {\bar{\mathscr {G}}}\rightarrow {\mathbb {R}}^{r(m+1)}\) as follows,

$$\begin{aligned} \phi _{1}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right)= & {} \chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}}\varvec{\tilde{\varvec{\varSigma }}}{{\textbf {z}}}- {{\textbf {a}}}\right\| ^{2}}{\sigma }\right) vec\left[ \left( {{\textbf {I}}} _{m}- {{\textbf {D}}}^{t}{{\textbf {D}}}\right) \varvec{\tilde{ \varSigma }}{{\textbf {z}}}\left( {{\textbf {D}}}\varvec{{\tilde{\varSigma }}} {{\textbf {z}}}-{{\textbf {a}}}\right) ^{t}\right] \nonumber \\ \phi _{2}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right)= & {} \chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}}\varvec{{\tilde{\varSigma }}}{{\textbf {z}}}- {{\textbf {a}}}\right\| ^{2}}{\sigma }\right) \left( {{\textbf {D}}}\varvec{ {\tilde{\varSigma }}}{{\textbf {z}}}-{{\textbf {a}}}\right) , \quad \phi =\left( \begin{array}{c} \phi _{1} \\ \phi _{2} \end{array} \right) , \end{aligned}$$
(26)
$$\begin{aligned} {\bar{\phi }}_{1}\left( {{\textbf {z}}},{{\textbf {D}}}^{*},{{\textbf {a}}},\sigma \right)= & {} \chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}}^{*}{{\textbf {z}}}-{{\textbf {a}}}\right\| ^{2}}{\sigma }\right) vec\left[ \varvec{{\tilde{\varSigma }}}^{-1}\left( \varvec{{\tilde{\varSigma }}}^{2}-\left( {{\textbf {D}}}^{*}\right) ^{t}{{\textbf {D}}}^{*}\right) {{\textbf {z}}}\left( {{\textbf {D}}}^{*}{{\textbf {z}}}-{{\textbf {a}}}\right) ^{t}\right] \\ {\bar{\phi }}_{2}\left( {{\textbf {z}}},{{\textbf {D}}}^{*},{{\textbf {a}}},\sigma \right)= & {} \chi ^{\prime }\left( \frac{\left\| {{\textbf {D}}}^{*}{{\textbf {z}}}-{{\textbf {a}}}\right\| ^{2}}{\sigma }\right) \left( {{\textbf {D}}}^{*}{{\textbf {z}}}-{{\textbf {a}}}\right) , \quad {\bar{\phi }}=\left( \begin{array}{c} {\bar{\phi }}_{1} \\ {\bar{\phi }}_{2} \end{array} \right) . \end{aligned}$$

The corresponding expected values and covariance matrices are denoted by

$$\begin{aligned}{} & {} \varPhi _{1}: {\mathscr {G}} \rightarrow {\mathbb {R}}^{rm}, \varPhi _{2}:{\mathscr {G}}\rightarrow {\mathbb {R}}^{r}\text { and }\varPhi :{\mathscr {G}} \rightarrow {\mathbb {R}}^{r(m+1)} \\{} & {} {\bar{\varPhi }}_{1}: {\bar{\mathscr {G}}} \rightarrow {\mathbb {R}}^{rm}, {\bar{\varPhi }}_{2}:{\bar{\mathscr {G}}} \rightarrow {\mathbb {R}}^{r}\text { and }{\bar{\varPhi }}:{\bar{\mathscr {G}}} \rightarrow {\mathbb {R}}^{r(m+1)} \\{} & {} \varPhi _{h}\left( {\varvec{\tilde{\theta }}}\right) = E_{F}\left( \phi _{h}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right) \right) ,h=1,2;\varPhi \left( {\varvec{\tilde{\theta }}}\right) =E_{F}\phi \left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right) =\left( \varPhi _{1}^{t}\left( {\varvec{\tilde{\theta }}}\right) ,\varPhi _{2}^{t}\left( {\varvec{\tilde{\theta }}}\right) \right) ^{t} \\{} & {} {\bar{\varPhi }}_{h}\left( {\varvec{\tilde{\theta }}}\right) = E_{F}\left( {\bar{\phi }}_{h}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right) \right) ,h=1,2; {\bar{\varPhi }}\left( {\varvec{\tilde{\theta }}}\right) =E_{F}{\bar{\phi }}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right) =\left( {\bar{\varPhi }}_{1}^{t}\left( {\varvec{\tilde{\theta }}}\right) ,{\bar{\varPhi }}_{2}^{t}\left( {\varvec{\tilde{\theta }}}\right) \right) ^{t} \\{} & {} V({\varvec{\tilde{\theta }}}) = E_{F}\phi \left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right) \left( \phi \left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}\right) \right) ^{t}\text { and }V_{o}=V({\varvec{\tilde{\theta }}}_{o}). \end{aligned}$$

3.2 Conditions, asymptotic normality and variances

To deal with the asymptotic behavior of SM-estimators, let us introduce some extra conditions.

C12:

\(\chi :[0,\infty )\rightarrow \left[ 0,1\right] \) is twice continuously differentiable.

C13:

\(E_{F}\left[ \left\| {{\textbf {z}}}\right\| ^{8}\right] <\infty .\)

C14:

\(\varPhi :{\mathbb {R}}^{r(m+1)}\rightarrow {\mathbb {R}}^{r(m+1)}\) and \({\bar{\varPhi }}:{\mathbb {R}}^{r(m+1)}\rightarrow {\mathbb {R}} ^{r(m+1)}\) are continuously differentiable and the matrices

$$\begin{aligned} \varvec{\varOmega }=\frac{\partial \varPhi \left( \varvec{\theta }\right) }{\partial \varvec{\theta }}\left( \tilde{\varvec{\theta }}_{o}\right) \in {\mathbb {R}}^{r(m+1)\times r\left( m+1\right) }\text { and }\varvec{\varOmega }^{*}= \frac{\partial {\bar{\varPhi }}\left( \varvec{\theta }^{*}\right) }{ \partial \varvec{\theta }^{*}}\left( \tilde{\varvec{\theta }}_{o}\right) \in {\mathbb {R}}^{r(m+1)\times r\left( m+1\right) } \end{aligned}$$

are nonsingular.

The usual way to derive the asymptotic normality of an estimator obtained as a zero of an equation, \({{\textbf {0}}}=\frac{1}{n}\sum _{i=1}^{n}\phi \left( {{\textbf {z}}}_{i},{\varvec{{\hat{\theta }}}}\right) \), is to consider a Taylor expansion of the form

$$\begin{aligned} \varPhi \left( \varvec{{\tilde{\theta }}}\right) =\varPhi \left( \varvec{{\tilde{\theta }}_o}\right) + \frac{\partial \varPhi \left( \varvec{\theta }\right) }{\partial \varvec{\theta }}\left( \varvec{{\tilde{\theta }}_o}\right) \left( \varvec{\theta }-\varvec{\theta _o}\right) + R\left( \varvec{{\tilde{\theta }}}\right) \left( \varvec{\theta }- \varvec{\theta _o}\right) , \end{aligned}$$
(27)

where \(R\left( \varvec{{\tilde{\theta }}}\right) \rightarrow 0\) as \(\varvec{{\tilde{\theta }}}\rightarrow \varvec{{\tilde{\theta }}}_o\). After some manipulations, adding and subtracting suitable terms, we come up with an expression of the form

$$\begin{aligned} {{\textbf {0}}} =\varPhi (\hat{\varvec{\theta }})-\varPhi (\tilde{\varvec{\theta }}_o)+ {{\textbf {Z}}}_n + {{\textbf {W}}}_n,\end{aligned}$$
(28)

where \(\sqrt{n}{{\textbf {Z}}}_n\) converges in distribution to a multivariate normal distribution with mean \({{\textbf {0}}}\) and covariance matrix \(E\phi \phi ^t\), and \({{\textbf {W}}}_n=o_P\left( 1/\sqrt{n}\right) \) by arguments from empirical process theory. C13 is a restrictive condition on the model, needed to show that the functions given in (26) belong to a Euclidean class (Pakes and Pollard 1989) and thereby control \({{\textbf {W}}}_n\). By combining (27) and (28), we get the asymptotic distribution of \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_o\right) \). This brief account of the asymptotic normality clarifies the need for conditions C12 and C14. Kudraszow and Maronna (2010, 2011) used a similar approach to derive the asymptotic distribution of MM-estimators for the multivariate regression model.
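Explicitly, since \(R(\varvec{{\hat{\theta }}})\) is asymptotically negligible, a heuristic reading of (27) evaluated at \(\varvec{{\hat{\theta }}}\) together with (28) gives, in our notation and under C12 and C14,

$$\begin{aligned} \sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_{o}\right) =-\varvec{\varOmega }^{-1}\sqrt{n}{{\textbf {Z}}}_{n}+o_{P}(1), \end{aligned}$$

so that \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_{o}\right) \) is asymptotically normal with mean \({{\textbf {0}}}\) and the sandwich covariance matrix \(\varvec{\varOmega }^{-1}{{\textbf {V}}}_{o}\left( \varvec{\varOmega }^{-1}\right) ^{t}\).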

If F is elliptically contoured with density (18), let us call \({{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) ={{\textbf {D}}}_{o}^{*}{{\textbf {z}}}-{{\textbf {a}}}_{o}\), \({{\textbf {C}}}=\sum _{1\le i,k\le r}\left[ {{\textbf {f}}}_{i}^{(r)}\left( {{\textbf {f}}}_{k}^{(r)}\right) ^{t}\otimes {{\textbf {t}}}_{i}{{\textbf {t}}}_{k}^{t}\right] \), \(\varvec{\varLambda }_{o}=diag\left( \gamma ^0_{1},\ldots ,\gamma _{r}^{0}\right) \) and \({{\textbf {P}}}_{r}={{\textbf {I}}}_{m}-{{\textbf {D}}}_{o}^{t}{{\textbf {D}}}_{o}\). Then, under C12 and C14, \(\varvec{\varOmega }\) and \(\varvec{\varOmega }^{*}\) turn out to be

$$\begin{aligned} \varvec{\varOmega }=\left( \begin{array}{cc} \varvec{\varOmega }_{11} &{} \varvec{\varOmega }_{12}\\ \varvec{\varOmega }_{21} &{} \varvec{\varOmega }_{22} \end{array} \right) \text { and }\varvec{\varOmega }^{*}=\left( \begin{array}{cc} \varvec{\varOmega }_{11}\left( {{\textbf {I}}}_{r}\otimes \left( \varvec{{\tilde{\varSigma }}} _{o}\right) ^{-1}\right) &{} \varvec{\varOmega }_{12} \\ \varvec{\varOmega }_{21}\left( {{\textbf {I}}}_{r}\otimes \left( \varvec{{\tilde{\varSigma }}}_{o}\right) ^{-1}\right) &{} \varvec{\varOmega }_{22} \end{array} \right) , \end{aligned}$$
(29)

with \(\varvec{\varOmega }_{11}\in {\mathbb {R}}^{rm\times rm}\), \(\varvec{\varOmega }_{12}\in {\mathbb {R}}^{rm\times r}\), \(\varvec{\varOmega }_{21}\in {\mathbb {R}}^{r\times rm}\) and \(\varvec{\varOmega }_{22}\in {\mathbb {R}}^{r\times r}\). Take the matrix

$$\begin{aligned} \varvec{\varPsi }({{\textbf {z}}}) = \frac{2}{\sigma _{o}}\chi ^{\prime \prime }\left( \frac{\left\| {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) \right\| ^{2}}{\sigma _{o}}\right) {{\textbf {r}}}_{o}\left( {{\textbf {z}}} \right) {{\textbf {r}}}_{o}^{t}\left( {{\textbf {z}}}\right) +\chi ^{\prime }\left( \frac{\left\| {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) \right\| ^{2}}{\sigma _{o}}\right) {{\textbf {I}}}_{r}. \end{aligned}$$

Thus, we get

$$\begin{aligned} \varvec{\varOmega }_{11}= & {} \left( {{\textbf {I}}}_{r}\otimes {{\textbf {P}}} _{r}\right) E_{F}\left[ \varvec{\varPsi }({{\textbf {z}}}) \otimes \left( \varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) \left( \varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) ^{t}\right] -\left( \varvec{\varLambda }_{o}\otimes {{\textbf {I}}}_{m}\right) \left( {{\textbf {I}}} _{rm}+{{\textbf {C}}}\right) , \\ \varvec{\varOmega }_{22}= & {} -E_{F}\varvec{\varPsi }({{\textbf {z}}}), \\ \varvec{\varOmega }_{12}= & {} -\left( {{\textbf {I}}}_{r}\otimes {{\textbf {P}}}_{r}\right) E_F\left[ \varvec{\varPsi }({{\textbf {z}}})\otimes \left( \varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) \right] , \\ \varvec{\varOmega }_{21}= & {} E_F\left[ \varvec{\varPsi }({{\textbf {z}}})\otimes \left( \varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) ^t\right] . \end{aligned}$$

The matrix \({{\textbf {V}}}_{o}=E_{F}\phi \left( {{\textbf {z}}},\varvec{{\tilde{\theta }}} _{o}\right) \left( \phi \left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}_{o}\right) \right) ^{t}\) turns out to be \({{\textbf {V}}}_{o}=\left( \begin{array}{cc} {{\textbf {V}}}_{11} &{} {{\textbf {V}}}_{12} \\ {{\textbf {V}}}_{21}&{} {{\textbf {V}}}_{22} \end{array} \right) ,\) with \({{\textbf {V}}}_{11}\in {\mathbb {R}}^{rm\times rm}\), \({{\textbf {V}}}_{12}\in {\mathbb {R}}^{rm\times r}\) and \({{\textbf {V}}}_{22}\in {\mathbb {R}} ^{r\times r},\) that is,

$$\begin{aligned} {{\textbf {V}}}_{11}= & {} E_{F}\left\{ \left[ \chi ^{\prime }\left( \frac{ \left\| {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) \right\| ^{2}}{ \sigma _{o}}\right) {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) {{\textbf {r}}} _{o}^{t}\left( {{\textbf {z}}}\right) \right] \otimes \left[ {{\textbf {P}}}_{r}\left( \varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) \left( \varvec{ {\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) ^{t}{{\textbf {P}}}_{r}\right] \right\} , \\ {{\textbf {V}}}_{22}= & {} E_{F}\left[ \chi ^{\prime }\left( \frac{\left\| {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) \right\| ^{2}}{\sigma _{o}} \right) {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) {{\textbf {r}}} _{o}^{t}\left( {{\textbf {z}}}\right) \right] , \\ {{\textbf {V}}}_{12}= & {} E_{F}\left\{ \left[ \chi ^{\prime }\left( \frac{ \left\| {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) \right\| ^{2}}{ \sigma _{o}}\right) {{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) {{\textbf {r}}} _{o}^{t}\left( {{\textbf {z}}}\right) \right] \otimes \left[ {{\textbf {P}}}_r\left( \varvec{\tilde{\varSigma }}_{o}{{\textbf {z}}}\right) \right] \right\} . \end{aligned}$$

The derivation of the asymptotic behavior of the SM-estimator is based on the empirical processes theory (see Pakes and Pollard 1989) and the following theorem establishes the convergence in distribution for the SM-estimators.

Theorem 2

Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution F with density (18), location parameter \({\varvec{\mu }}_{0}\) and dispersion parameter \(\varvec{\varSigma }_{0}\). Suppose that conditions C0-C14 hold. Let \(\varvec{{\hat{\theta }}}_{SM}\) as in (23), \(\varvec{{\hat{\theta }}}_{SM}^*\) as in (24), and \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) as in (25) be sequences of SM-estimators, and let \(\overset{{\mathscr {D}}}{\rightarrow }\) stand for convergence in distribution. Then, it holds that

  1. (i)

    \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta } _{o}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{r(m+1)}\left( {\textbf { 0,V}}\right) \), where the asymptotic covariance matrix is given by \({{\textbf {V}}} =\varvec{\varOmega }^{-1}{{\textbf {V}}}_{o} \varvec{\varOmega }^{-t},\)

  2. (ii)

    \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}^{*}-\varvec{\theta }_{o}^{*}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{r(m+1)}\left( {\textbf {0,V}} ^{*}\right) \), with the asymptotic covariance matrix given by \({\textbf {V}}^{*}=\left( \varvec{\varOmega } ^{*}\right) ^{-1}{\varvec{V}}_{o}\left( \varvec{\varOmega }^{*}\right) ^{-t},\)

  3. (iii)

    \( \sqrt{n}\left( \varvec{{\hat{v}}}_{SM,k}-{{\textbf {v}}}_{k}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{p}\left( {\textbf {0,V}}_{\alpha ,k}^{*}\right) \) and \(\sqrt{n}\left( \varvec{{\hat{w}}}_{SM,k}-{{\textbf {w}}}_{k}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{q}\left( {\textbf {0,V}}_{\beta ,k}^{*}\right) ,\) for \(k =1,\ldots ,r\), with

    $$\begin{aligned} {{\textbf {V}}}_{\alpha ,k}^{*}= & {} \sum _{i,j=1}^{p}{{\textbf {f}}}_{j}^{(p)} {{\textbf {f}}}_{(k-1)m+j}^{(m)t}{\varvec{V}}^{*}{{\textbf {f}}}_{(k-1)m+i}^{(m)}\left( {{\textbf {f}}}_{i}^{(p)}\right) ^{t},\\ {{\textbf {V}}}_{\beta ,k}^{*}= & {} \sum _{i,j=1}^{q}{{\textbf {f}}}_{j}^{(q)}{{\textbf {f}}}_{km-q+j}^{(m)t}{\varvec{V}}^{*} {{\textbf {f}}}_{km-q+i}^{(m)}\left( {{\textbf {f}}}_{i}^{(q)}\right) ^{t}. \end{aligned}$$
  4. (iv)

    If \(\varvec{\varOmega }_{22}\) is a nonsingular matrix, then

$$\begin{aligned} {{\textbf {V}}}_{\alpha ,k}^{*}= & {} \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2} {{\textbf {V}}}_{\alpha ,k}\varvec{\varSigma }_{{\textbf {xx}}}^{-1/2},~\text {with}~{{\textbf {V}}}_{\alpha ,k}=\sum _{i,j=1}^{p}{{\textbf {f}}}_{j}^{(p)}\left( {{\textbf {f}}} _{(k-1)m+j}^{(m)}\right) ^{t}{{\textbf {V}}}{{\textbf {f}}}_{(k-1)m+i}^{(m)}\left( {{\textbf {f}}}_{i}^{(p)}\right) ^{t} \\ {{\textbf {V}}}_{\beta ,k}^{*}= & {} \varvec{\varSigma }_{{\textbf {yy}}}^{-1/2} {{\textbf {V}}}_{\beta ,k}\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2},~\text {with}~{{\textbf {V}}}_{\beta ,k}=\sum _{i,j=1}^{q}{{\textbf {f}}}_{j}^{(q)}\left( {{\textbf {f}}} _{km-q+j}^{(m)}\right) ^{t}{{\textbf {V}}}{{\textbf {f}}}_{km-q+i}^{(m)}\left( {{\textbf {f}}}_{i}^{(q)}\right) ^{t}. \end{aligned}$$

Proof

It is deferred to the “Appendix”. \(\square \)
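In practice, if \({{\textbf {f}}}_{j}^{(m)}\) denotes the j-th canonical basis vector of \({\mathbb {R}}^{m}\), the double sums in (iii) and (iv) simply extract square blocks of the \(r(m+1)\times r(m+1)\) asymptotic covariance matrix. A minimal sketch under that reading (function names and the 0-based indexing are illustrative):

```python
import numpy as np

def v_alpha_k(V_star: np.ndarray, k: int, m: int, p: int) -> np.ndarray:
    """p x p block of V* at (0-based) offset (k - 1) * m: the asymptotic
    covariance matrix of the k-th canonical x-vector."""
    i0 = (k - 1) * m
    return V_star[i0:i0 + p, i0:i0 + p]

def v_beta_k(V_star: np.ndarray, k: int, m: int, q: int) -> np.ndarray:
    """q x q block of V* at (0-based) offset k * m - q: the asymptotic
    covariance matrix of the k-th canonical y-vector."""
    i0 = k * m - q
    return V_star[i0:i0 + q, i0:i0 + q]
```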

Remark 1

The asymptotic results for the SM-estimators in the PCA case given in (5) are derived in a manner similar to that of (i) in Theorem 2.

3.3 Asymptotic relative efficiency in the Gaussian case

In order to assess the loss of efficiency under normality, the asymptotic covariance of the SM-estimators is compared with that of the maximum likelihood estimator. Let \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) be the k-th canonical vectors as in (25), and \(\varvec{{\hat{v}}}_{C,k}\) and \(\varvec{{\hat{w}}}_{C,k}\) the classical estimators obtained by solving (7) and (8) with the population covariances replaced by the empirical ones. Let \({{\textbf {V}}}_{\alpha ,k}^{*}\left( F\right) \) and \({{\textbf {V}}}_{\alpha ,k}^{*,C}\left( F\right) \) be the asymptotic covariance matrices of \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{v}}}_{C,k}\) when the underlying distribution is F (\({{\textbf {V}}}_{\beta ,k}^{*}\left( F\right) \) and \({{\textbf {V}}}_{\beta ,k}^{*,C}\left( F\right) \) for \(\varvec{{\hat{w}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{C,k}\), respectively). \({{\textbf {V}}}_{\alpha ,k}^{*,C}\left( F\right) \) can be derived from the asymptotic distribution of the SM-estimators stated above by taking \(\chi \) as the identity function. Then, a measure of asymptotic relative efficiency is given by

$$\begin{aligned} ARE\left( \varvec{{\hat{v}}}_{SM,k},F\right) =\frac{trace\left( {{\textbf {V}}} _{\alpha ,k}^{*,C}\left( F\right) \right) }{trace\left( {{\textbf {V}}} _{\alpha ,k}^{*}\left( F\right) \right) },\qquad ARE\left( \varvec{{\hat{w}}}_{SM,k},F\right) =\frac{trace\left( {{\textbf {V}}}_{\beta ,k}^{*,C}\left( F\right) \right) }{trace\left( {{\textbf {V}}}_{\beta ,k}^{*}\left( F\right) \right) }. \end{aligned}$$
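Numerically, each ARE is just a ratio of traces; a minimal sketch, assuming the two asymptotic covariance matrices are available as arrays:

```python
import numpy as np

def are(V_classical: np.ndarray, V_robust: np.ndarray) -> float:
    """Trace-based asymptotic relative efficiency of a robust estimator
    with respect to the classical one."""
    return float(np.trace(V_classical) / np.trace(V_robust))
```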

To illustrate the behavior of the SM-estimators with respect to the classical estimator, we take \(F=N_{4}\left( {{\textbf {0}}},\varvec{\varSigma }\right) \), with the partitioned matrix \(\varvec{\varSigma }\) given in (6) and \(\varvec{\varSigma }_{{\textbf {xx}}}=\varvec{\varSigma }_{{\textbf {yy}}}={{\textbf {I}}}_{2}\), \(\varvec{\varSigma }_{{\textbf {xy}}}=diag\left( 0.9,0.5\right) \). Since \(\varvec{\varSigma }_{{\textbf {xx}}}=\varvec{\varSigma }_{{\textbf {yy}}}={{\textbf {I}}}_{2}\), we have that \(\varvec{\varOmega }=\varvec{\varOmega }^{*}\) and \({{\textbf {V}}}^{*}={{\textbf {V}}}\). Thus, the efficiency of the SM-estimators of the canonical vectors coincides with that of the standardized SM-estimators.

Table 1 Asymptotic relative efficiency for the SM-estimators of canonical vectors as a function of the parameter \(\delta \)

Table 1 displays the asymptotic relative efficiency of \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) for different values of \(\delta \) in the grid \(G=\left\{ 0.5,0.45,0.4,0.35,0.3\right\} \). The larger the parameter \(\delta \), the smaller the relative efficiency: \(\delta \) is related to the robustness of the estimator, whose breakdown point increases with \(\delta \) (Adrover and Donato 2015). The second canonical vector seems to be much more affected by the trade-off between high breakdown point and efficiency.

4 Robustness measures: qualitative robustness and influence function

4.1 Qualitative robustness

Qualitative robustness was introduced by Hampel (1971); the concept captures the desirable property that a robust estimator based on random samples from nearby distributions should induce nearby distributions, uniformly in the sample size. More precisely, a sequence of estimators \(\left\{ T_{n}\right\} _{n=1}^{\infty }\) is qualitatively robust at the probability measure F if and only if for every \(\varepsilon >0\) there exists \(\delta >0\) such that, for all G and all n, \(d_{PR}\left( F,G\right) <\delta \) implies \(d_{PR}\left( {\mathscr {L}}_{F}\left( T_{n}\right) ,{\mathscr {L}}_{G}\left( T_{n}\right) \right) <\varepsilon \), where \(d_{PR}\) stands for the Prohorov distance between probability measures and \({\mathscr {L}}_{F}\left( T_{n}\right) \) is the distribution induced by \(T_{n}\) when the random sample comes from F.

A related concept is the continuity of estimators. Hampel (1971) proves that estimators \(\left\{ T_{n}\right\} _{n=1}^{\infty }\) obtained from a functional which is continuous on the class of empirical distributions \({\mathscr {F}}_{n}\) are qualitatively robust. A sequence of estimators \(\left\{ T_{n}\right\} _{n=1}^{\infty }\) is continuous at the probability measure F if and only if for all \(\varepsilon >0\) there exist \(\delta >0\) and \(n_{o}\) such that, for all \(n,m\ge n_{o}\) and all \(F_{n}\in {\mathscr {F}}_{n}\), \(F_{m}\in {\mathscr {F}}_{m}\), the conditions \(d_{PR}\left( F,F_{n}\right) <\delta \) and \(d_{PR}\left( F,F_{m}\right) <\delta \) imply \(\left| T\left( F_{n}\right) -T\left( F_{m}\right) \right| <\varepsilon \).

We then proceed similarly to the proof of Theorem 1 to derive the continuity of the SM-functionals.

Theorem 3

Let \({{\textbf {z}}}=\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\in {\mathbb {R}}^{m}\) follow an elliptical distribution with density (18). Suppose that conditions C0-C10 hold. Let us take any sequence of distributions \(\left\{ F_{n}\right\} _{n=1}^{\infty }\) such that \(F_{n}\) weakly converges to F. Assume that the dispersion functionals \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) are continuous at F. Let \(({{\textbf {A}}}_{SM}^{o},{{\textbf {B}}}_{SM}^{o},{{\textbf {a}}}_{SM}^{o})\) be the SM functional for CCA defined in (15). Then, it holds that \(\lim _{n\rightarrow \infty }({{\textbf {A}}}_{SM}^{o}(F_{n}),{{\textbf {B}}}_{SM}^{o}(F_{n}),{{\textbf {a}}}_{SM}^{o}(F_{n}))=({{\textbf {A}}}_{SM}^{o}(F),{{\textbf {B}}}_{SM}^{o}(F),{{\textbf {a}}}_{SM}^{o}(F))\).

4.2 Influence functions

To quantify the effect of contamination on the estimators and to make comparisons among different proposals, the IF measures the rate of change of a functional T under infinitesimal contamination by point masses, that is, for every \({{\textbf {z}}}_{0}\in {\mathbb {R}}^{m}\), \(m\ge 1\), we have

$$\begin{aligned} IF({{\textbf {z}}}_{0},T,F)=\lim _{\varepsilon \rightarrow 0^{+}}\frac{ T((1-\varepsilon )F+\varepsilon \delta _{{{\textbf {z}}}_{0}})-T(F)}{\varepsilon } =\left. \frac{\partial }{\partial \varepsilon }T(F_{\varepsilon })\right| _{\varepsilon =0}, \end{aligned}$$

with \(F_{\varepsilon }=(1-\varepsilon )F+\varepsilon \delta _{{{\textbf {z}}}_{0}}\), where \(\delta _{{{\textbf {z}}}_{0}}\) is the point mass distribution at \({{\textbf {z}}}_{0}\).
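When no closed form is available, the IF can be approximated numerically with a finite \(\varepsilon \). A sketch, assuming (hypothetically) a functional T implemented to accept a data matrix together with per-observation probability weights:

```python
import numpy as np

def empirical_influence(T, sample: np.ndarray, z0: np.ndarray, eps: float = 1e-3):
    """Approximates IF(z0, T, F) at the empirical distribution F_n by
    [T((1 - eps) F_n + eps * delta_z0) - T(F_n)] / eps."""
    n = sample.shape[0]
    w0 = np.full(n, 1.0 / n)                  # weights representing F_n
    data = np.vstack([sample, z0])            # support of the contaminated mixture
    w_eps = np.append((1.0 - eps) * w0, eps)  # (1 - eps) F_n + eps * delta_z0
    return (T(data, w_eps) - T(sample, w0)) / eps
```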

Adrover and Donato (2015) treated two different versions of a robust canonical correlation. The matrix \({{\textbf {M}}}=\sum _{i=1}^{p+q}\gamma _{i}^{0}{{\textbf {t}}}_{i}^{0}\left( {{\textbf {t}}}_{i}^{0}\right) ^{t}\) in (13) has its eigenvalues in decreasing order. The \(k\)-th canonical correlation is given by \(\rho _{k}^{2}=\left( \gamma _{m-k+1}^{0}-1\right) ^{2}\). To construct the SM-functional for the square of the \(k\)-th canonical correlation, we take

$$\begin{aligned} r({{\textbf {z}}},H)=\frac{\left\| {{\textbf {D}}}_{SM}\left( H\right) \varvec{{\tilde{\varSigma }}}\left( H\right) {{\textbf {z}}}-{{\textbf {a}}}_{SM}\right\| ^{2}}{\sigma \left( H\right) }, \quad \varvec{{\tilde{\mu }}}\left( H\right) = \frac{E_{H}\left[ \chi ^{\prime }\left( r({{\textbf {z}}},H)\right) \varvec{{\tilde{\varSigma }}}\left( H\right) {{\textbf {z}}}\right] }{E_{H}\left[ \chi ^{\prime }\left( r({{\textbf {z}}},H)\right) \right] }. \end{aligned}$$

Given \(R=E_{H}\left[ \chi ^{\prime }\left( r({{\textbf {z}}},H)\right) \left( \varvec{{\tilde{\varSigma }}}\left( H\right) {{\textbf {z}}}-\varvec{{\tilde{\mu }}}\left( H\right) \right) \left( \varvec{{\tilde{\varSigma }}}\left( H\right) {{\textbf {z}}}-\varvec{{\tilde{\mu }}}\left( H\right) \right) ^{t}\right] \), we call \({{\textbf {M}}}_{{\textbf {xy}}}^{0}\) the matrix obtained after pre- and post-multiplying R by \(\left( \begin{array}{cc} {{\textbf {I}}}_{p}&{{\textbf {0}}}_{p\times q} \end{array} \right) \) and \(\left( \begin{array}{cc} {{\textbf {0}}}_{q\times p}&{{\textbf {I}}}_{q} \end{array} \right) ^t \). Set

$$\begin{aligned} {{\textbf {M}}}^{0}\left( H\right)= & {} \left( \begin{array}{cc} {{\textbf {I}}}_{p} &{} {{\textbf {M}}}_{{\textbf {xy}}}^{0}\left( H\right) \\ \left( {{\textbf {M}}}_{{\textbf {xy}}}^{0}\left( H\right) \right) ^{t} &{} {{\textbf {I}}}_{q} \end{array} \right) , \end{aligned}$$

and take \(\left\{ \gamma _{j}^{0}\left( H\right) ,{{\textbf {t}}}_{SM,j}^0\left( H\right) \right\} _{j=1}^{m}\) to be the eigenvalues and eigenvectors of the functional \({{\textbf {M}}}^{0}\left( H\right) \), that is, \({{\textbf {M}}}^{0}\left( H\right) {{\textbf {t}}}^0_{SM,m-k+1}\left( H\right) =\gamma _{m-k+1}^{0}\left( H\right) {{\textbf {t}}}_{SM,m-k+1}^0\left( H\right) \). The SM \(k\)-th canonical correlation is given by \(\rho _{SM,k}^{2}\left( H\right) =\left( \gamma _{m-k+1}^{0}\left( H\right) -1\right) ^{2}\), \(k=1,\ldots ,r\).
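Computationally, once \({{\textbf {M}}}^{0}\left( H\right) \) is available, the squared canonical correlations follow from its ordered spectrum; a minimal sketch:

```python
import numpy as np

def sm_canonical_correlations(M0: np.ndarray, r: int) -> np.ndarray:
    """rho_k^2 = (gamma_{m-k+1} - 1)^2, k = 1, ..., r, where gamma_1 >= ...
    >= gamma_m are the eigenvalues of the symmetric matrix M0."""
    m = M0.shape[0]
    gammas = np.sort(np.linalg.eigvalsh(M0))[::-1]   # decreasing order
    return np.array([(gammas[m - k] - 1.0) ** 2 for k in range(1, r + 1)])
```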

Another concept for measuring the association between canonical variates was given by Branco et al. (2005). Given \(\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\sim H\) and \(\left( {{\textbf {v}}}_{k}^{t}(H){{\textbf {x}}},{{\textbf {w}}}_{k}^{t}(H){{\textbf {y}}}\right) \sim H_{k}\), let us take a robust bivariate dispersion functional \(\varvec{\varSigma }^{(R)}\left( H_{k}\right) \) whose \((i,j)\) entry is \(\sigma _{ij}^{(R)}\left( H_{k}\right) \), \(1\le i,j\le 2\). Thus, a robust correlation functional is easily obtained from \(\varvec{\varSigma }^{(R)}\left( H_{k}\right) \),

$$\begin{aligned} \rho _{C,k}^{2}\left( H_{k}\right) =\left( \sigma _{12}^{(R)}\right) ^{2}\left( H_{k}\right) / \left[ \left( \sigma _{11}^{(R)}\right) ^{2}\left( H_{k}\right) \left( \sigma _{22}^{(R)}\right) ^{2}\left( H_{k}\right) \right] . \end{aligned}$$
(30)

Let us define some quantities related to the derivation of the IF. Given \({{\textbf {z}}}_{0}=\left( {{\textbf {x}}}_{0}^{t},{{\textbf {y}}}_{0}^{t}\right) ^{t}\in {\mathbb {R}}^{p+q}\) and \(\left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| \sim {\tilde{F}}\) if \({{\textbf {z}}}\sim F\), let \(\sigma \) be an M-scale functional based on \(\left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| \) defined in (5) and (14) with influence function \(IFS=IF\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}}_{0})\right\| ,\sigma ,{\tilde{F}}\right) \). Let \(\varvec{{\tilde{\varSigma }}}\in {\mathbb {R}}^{m\times m}\) be a functional with \(IFD=IF\left( {{\textbf {z}}}_{0},vec\left( \varvec{{\tilde{\varSigma }}}\right) ,F\right) \). Put

$$\begin{aligned} O(\varvec{{\tilde{\varSigma }}}_{o},{{\textbf {D}}}_{o},{{\textbf {a}}}_{o},{{\textbf {z}}})= & {} \left( \left( vec\left( \varvec{{\tilde{\varSigma }}}_{o}\right) \right) ^{t}\left( {{\textbf {D}}}_{o}^{t}{{\textbf {D}}}_{o}\otimes {{\textbf {z}}}{{\textbf {z}}}^{t}\right) -\left( vec\left( {{\textbf {z}}}{{\textbf {a}}}_{o}^{t}{{\textbf {D}}}_{o}\right) \right) ^{t}\right) , \\ \xi _{o}= & {} E_{F}\left[ \chi ^{\prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}\right] , \end{aligned}$$

and

$$\begin{aligned} {{\textbf {T}}}_{1}= & {} E_{F}\left[ \chi ^{\prime \prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}\left( {{\textbf {r}}}_{o}({{\textbf {z}}})\otimes {{\textbf {P}}}_{r}\varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) \right] (IFS), \\ {{\textbf {T}}}_{2}= & {} 2E_{F}\left[ \chi ^{\prime \prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) \left( {{\textbf {r}}}_{o}({{\textbf {z}}})\otimes {{\textbf {P}}}_{r}\varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) O(\varvec{{\tilde{\varSigma }}}_{o},{{\textbf {D}}}_{o},{{\textbf {a}}}_{o},{{\textbf {z}}})\right] (IFD), \\ {{\textbf {T}}}_{3}= & {} E_{F}\left[ \chi ^{\prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) \left[ \left( {{\textbf {z}}}^{t}\otimes {{\textbf {D}}}_{o}\right) \left( IFD\right) \right] \otimes \left( {{\textbf {P}}}_{r}\varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}\right) \right] , \\ {{\textbf {T}}}_{4}= & {} E_{F}\left[ \chi ^{\prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) \left( {{\textbf {r}}}_{o}({{\textbf {z}}})\otimes \left( {{\textbf {z}}}^{t}\otimes {{\textbf {I}}}_{m}\right) \right) (IFD)\right] , \\ {{\textbf {T}}}_{5}= & {} \chi ^{\prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}}_{0})\right\| ^{2}/\sigma _{o}\right) \left( {{\textbf {r}}}_{o}({{\textbf {z}}}_{0})\otimes {{\textbf {P}}}_{r}\varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}_{0}\right) , \\ {{\textbf {T}}}_{1,0}= & {} E_{F}\left[ \chi ^{\prime \prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}{{\textbf {r}}}_{o}({{\textbf {z}}})\right] , \\ {{\textbf {T}}}_{2,0}= & {} 2E_{F}\left[ \chi ^{\prime \prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) {{\textbf {r}}}_{o}({{\textbf {z}}})O(\varvec{{\tilde{\varSigma }}}_{o},{{\textbf {D}}}_{o},{{\textbf {a}}}_{o},{{\textbf {z}}})\right] , \\ {{\textbf {T}}}_{3,0}= & {} E_{F}\left[ \chi ^{\prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| ^{2}/\sigma _{o}\right) \left( {{\textbf {z}}}^{t}\otimes {{\textbf {D}}}_{o}\right) \right] , \\ {{\textbf {T}}}_{4,0}= & {} \chi ^{\prime }\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}}_{0})\right\| ^{2}/\sigma _{o}\right) {{\textbf {r}}}_{o}({{\textbf {z}}}_{0}). \end{aligned}$$

Then, we can establish the following result for the influence function.

Theorem 4

Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m},m=p+q,\) from an elliptical distribution F with density (18), location parameter \({\varvec{\mu }}_{0}\) and dispersion parameter \(\varvec{\varSigma }_{0}\). Suppose that conditions C0-C14 hold and the functionals \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) are Fisher-consistent. Let \(\varvec{\varOmega }\in {\mathbb {R}}^{r\left( m+1\right) \times r\left( m+1\right) }\) be the invertible matrix defined in (29). Then,

  1. (i)

    If \({{\textbf {T}}}_{{{\textbf {v}}}}=-\frac{1}{\sigma _{o}^2}{{\textbf {T}}}_{1}+\frac{2}{\xi _{o}}\left( {{\textbf {T}}}_{2}+\sigma _{o}\left( {{\textbf {T}}}_{3}+{{\textbf {T}}}_{4}\right) \right) +{{\textbf {T}}}_{5}\) and \({{\textbf {T}}}_{{{\textbf {a}}}}=\frac{1}{\sigma _{o}}{{\textbf {T}}}_{1,0}IFS-\frac{1}{\xi _{o}}\left( {{\textbf {T}}}_{2,0}+\sigma _{o}{{\textbf {T}}}_{3,0}\right) \left( IFD\right) -{{\textbf {T}}}_{4,0}\), we get \(IF\left( {{\textbf {z}}}_{0},\varvec{\theta }_{SM},F\right) =\varvec{\varOmega }^{-1}{{\textbf {T}}}\left( {{\textbf {z}}}_{0},F\right) \), with \({{\textbf {T}}}\left( {{\textbf {z}}}_{0},F\right) =\left( {{\textbf {T}}}_{{{\textbf {v}}}}^{t},{{\textbf {T}}}_{{{\textbf {a}}}}^{t}\right) ^{t}\).

  2. (ii)

    The IF for \(\rho _{SM,k}^{2}\) is \(IF\left( {{\textbf {z}}}_{0},\rho _{SM,k}^{2},F\right) \) \(=\) \(2\left( \gamma _{k}^{0}-1\right) \left( {{\textbf {t}}}_{k}^{0}\right) ^{t}IF\left( {{\textbf {z}}} _{0},{{\textbf {M}}}^{0},F\right) {{\textbf {t}}}_{k}^{0}\), \(k=1,\ldots ,r\).

  3. (iii)

    Let \(\left( x_{0},y_{0}\right) =\left( \left( {{\textbf {v}}}_{k}^0\right) ^{t}{{\textbf {x}}}_{0},\left( {{\textbf {w}}}_{k}^0\right) ^{t}{{\textbf {y}}}_{0}\right) \) and \(\left( \left( {{\textbf {v}}}_{k}^0\right) ^{t}{{\textbf {x}}},\left( {{\textbf {w}}}_{k}^0\right) ^{t}{{\textbf {y}}}\right) \sim {\tilde{F}}_{k}\) for \(\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\sim F\). Take the functionals \({{\textbf {v}}}_{k}\) and \({{\textbf {w}}}_{k}\) corresponding to the \(k\)-th canonical vector such that their influence functions \(IFV_{k}=IF\left( \left( {{\textbf {x}}}_{0},{{\textbf {y}}}_{0}\right) ,{{\textbf {v}}}_{k},F\right) \) and \(IFW_{k}=IF\left( \left( {{\textbf {x}}}_{0},{{\textbf {y}}}_{0}\right) ,{{\textbf {w}}}_{k},F\right) \) exist. Let \(g_{i+j-1}\left( {\mathscr {L}}\left( {{\textbf {x}}},{{\textbf {y}}}\right) ,{{\textbf {v}}},{{\textbf {w}}}\right) =\left( \sigma _{ij}^{\left( R\right) }\right) ^{2}\left( {\mathscr {L}}\left( {{\textbf {v}}}^{t}{{\textbf {x}}},{{\textbf {w}}}^{t}{{\textbf {y}}}\right) \right) \), \(1\le i\le j\le 2\), be such that \(g_{l,k,{{\textbf {v}}}}=\frac{\partial g_{l}}{\partial {{\textbf {v}}}}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) \) and \(g_{l,k,{{\textbf {w}}}}=\frac{\partial g_{l}}{\partial {{\textbf {w}}}}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) \) exist. Then, the IF of \(\rho _{C,k}^{2}\) is given by

$$\begin{aligned}{} & {} IF\left( \left( {{\textbf {x}}}_{0},{{\textbf {y}}}_{0}\right) ,\rho _{C,k}^{2},F\right) \\{} & {} \quad =\frac{1}{g_{1}^{2}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) g_{3}^{2}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) } \\{} & {} \quad \times \frac{1}{2}\left\{ \left[ \sum _{\begin{array}{c} 1\le i,j,l\le 3 \\ i\ne j,i\ne l,j\ne l \end{array}}\left( -1\right) ^{i+j}g_{i}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) g_{j}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) IF\left( \left( x_{0},y_{0}\right) ,g_{l},{\tilde{F}}_k\right) \right] \right. \\{} & {} \quad +\left[ \sum _{\begin{array}{c} 1\le i,j,l\le 3 \\ i\ne j,i\ne l,j\ne l \end{array}}\left( -1\right) ^{i+j}g_{i}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) g_{j}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) g_{l,k,{{\textbf {v}}}}\right] IFV_{k} \\{} & {} \quad +\left. \left[ \sum _{\begin{array}{c} 1\le i,j,l\le 3 \\ i\ne j,i\ne l,j\ne l \end{array}}\left( -1\right) ^{i+j}g_{i}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) g_{j}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) g_{l,k,{{\textbf {w}}}}\right] IFW_{k}\right\} . \end{aligned}$$

Proof

It is deferred to the “Appendix”. \(\square \)

4.3 Discussion

The term \({{\textbf {T}}}_{5}\), through the vector \({{\textbf {P}}}_{r}\varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}_{0}\), is the only one affected by the contaminating point \({{\textbf {z}}}_{0}=\left( {{\textbf {x}}}_{0},{{\textbf {y}}}_{0}\right) \), yielding an unbounded IF, which coincides with the usual behavior of S-estimation. This term shows that the worst-case scenario for the IF, the one yielding unboundedness, is given by outliers lying in the subspace orthogonal to the subspace generated by the eigenvectors of \({{\textbf {M}}}\) under the elliptical model. It is worth noting that even \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) with bounded IF do not guarantee a bounded IF for the CCA functionals. We next display some plots to show the behavior of the IF under two point mass contaminations. Let us consider a vector \({{\textbf {z}}}=\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\sim N_{4}\left( {\varvec{\mu }},\varvec{\varSigma }\right) \) with \({\varvec{\mu }}=\left( 0.5,1,1.5,2\right) ^{t}\), \(\varvec{\varSigma }_{{\textbf {xx}}}=\varvec{\varSigma }_{{\textbf {yy}}}={{\textbf {I}}}_{2}\), \(\varvec{\varSigma }_{{\textbf {xy}}}=diag\left( 0.9,0.5\right) \). To ease the computation, the covariance matrices \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\) are assumed to be known and therefore are not estimated from the data. The eigenvectors associated with the largest eigenvalues are \({{\textbf {t}}}_{1}^0=\left( {1}/{\sqrt{2}},0,-{1}/{\sqrt{2}},0\right) ^{t}\) and \({{\textbf {t}}}_{2}^0=\left( 0,-{1}/{\sqrt{2}},0,{1}/{\sqrt{2}}\right) ^{t}\), and the point mass is taken as either \({{\textbf {z}}}_{0}=\left( x_{01},x_{02},0,0\right) \) or \({{\textbf {z}}}_{0}=\left( x_{01},0,y_{01},0\right) \). The plots display unstandardized influence functions to depict the behavior under point mass contamination, \(\left( x_{01},x_{02}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{SM,1},F\right) \right\| \) and \(\left( x_{01},x_{02}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{C,1},F\right) \right\| \), with \({{\textbf {w}}}_{C,1}\) and \({{\textbf {w}}}_{SM,1}\) the classical estimator and the SM-estimator, respectively, of the first canonical vector associated with the random vector \({{\textbf {y}}}\).

Fig. 1

The plot \(\left( x_{01},x_{02}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{C,1},F\right) \right\| \) (above on the left) and the plot \(\left( x_{01},x_{02}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{SM,1},F\right) \right\| \) (above on the right) correspond to the IF of the classical and SM estimators for the first canonical vector associated with the vector \({{\textbf {y}}}\) and \({{\textbf {z}}}_{0}=\left( x_{01},x_{02},0,0\right) \). The plot \(\left( x_{01},y_{01}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{C,1},F\right) \right\| \) (below on the left) and the plot \(\left( x_{01},y_{01}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{SM,1},F\right) \right\| \) (below on the right) correspond to the IF of the classical and SM estimators for the first canonical vector associated with the vector \({{\textbf {y}}}\) and \({{\textbf {z}}}_{0}=\left( x_{01},0,y_{01},0\right) \)

5 Robust proposals for measuring association

The classical scenario for CCA involves the correlation as a measure of the maximal association between linear combinations of two random vectors, and the correlation between those linear combinations is called the canonical correlation. Equation (30) gives a robust correlation functional to measure association, obtained from a bivariate robust dispersion estimator. A special case arises from the S-estimators defined by Davies (1987) for multivariate location and scatter \(\left( \varvec{\mu },\varvec{\varSigma }\right) \in {\mathbb {R}}^{p}\times {\mathscr {O}}_{p}\). Given the set

$$\begin{aligned} \mathscr {D}=\left\{ \left( \varvec{\mu ,\varvec{\varSigma }}\right) \in {\mathbb {R}}^{p}\times {\mathscr {O}}_{p}:E\chi \left[ \left( {{\textbf {x}}}-{\varvec{\mu }} \right) ^{t}\varvec{\varSigma }^{-1}\left( {{\textbf {x}}}-{\varvec{\mu }}\right) \right] \le \delta \right\} , \end{aligned}$$

S-estimators are defined as the solutions \(\left( \varvec{{\hat{\mu }}},\hat{\varvec{\varSigma }}\right) =\arg \min _{{\mathscr {D}}}\det \left( \varvec{\varSigma }\right) \). This formulation is equivalent to taking the M-scale \(s\left( \varvec{\mu },\varvec{\varSigma }\right) =\min \big \{ s>0:E\chi \big [ \left( {{\textbf {x}}}-{\varvec{\mu }}\right) ^{t}\varvec{\varSigma }^{-1}\big ( {{\textbf {x}}}-{\varvec{\mu }}\big ) /s\big ] \le \delta ,\ \det \left( \varvec{\varSigma }\right) =1\big \} \), \(\left( {\varvec{\mu }}^{*},\varvec{\varSigma }^{*}\right) =\arg \min s\left( \varvec{\mu },\varvec{\varSigma }\right) \), and \(\varvec{\varSigma }=\left[ s\left( {\varvec{\mu }}^{*},\varvec{\varSigma }^{*}\right) \right] ^{1/p}\varvec{\varSigma }^{*}\). If \(p=2\), let \(\hat{\varvec{\mu }}\), \({\hat{\sigma }}_{11}\) and \({\hat{\sigma }}_{22}\) be preliminary estimates of the location and dispersion parameters \({\varvec{\mu }}\), \(\sigma _{11}\) and \(\sigma _{22}\). Set \(\widehat{\widetilde{\varvec{\varSigma }}}=\left( \begin{array}{cc} {\hat{\sigma }}_{11}&{}0\\ 0&{}{\hat{\sigma }}_{22} \end{array}\right) \) and take the standardized vector \(\varvec{{\tilde{x}}}=\widehat{\widetilde{\varvec{\varSigma }}}({{\textbf {x}}}-\hat{\varvec{\mu }})\). Then, we may estimate the correlation parameter as

$$\begin{aligned} b^{*}=\arg \min _{b} s\left( b\right) , \end{aligned}$$
(31)

with s(b) defined as

$$\begin{aligned} s\left( b\right)= & {} \min \left\{ s>0:E\chi \left( \frac{\varvec{{\tilde{x}}} ^{t}\varvec{\varSigma }^{-1}\varvec{{\tilde{x}}}}{s}\right) \le \delta ,\varvec{\varSigma }=\frac{1}{\left( 1-b^{2}\right) ^{1/2}}\left( \begin{array}{cc} 1 &{} b \\ b &{} 1 \end{array} \right) \right\} . \end{aligned}$$

We next motivate the robust measure of association \(b^{*}\) from another point of view, closely related to the notion of closeness between two random variables that underlies the proposals (5) and (15). We also discuss another robust procedure for correlation. We then analyze, for both procedures, the properties usually required of a measure of association.

5.1 Motivating the robust correlation \(b^*\) and association properties

Let us suppose that we have two random variables X and Y with distribution functions \(F_{X}\) and \(F_{Y}\), respectively. Let T(.) and S(.) be two equivariant estimators of location and dispersion, which allow us to take the standardized random variables \(U=(X-T(F_{X}))/S(F_{X})\) and \(V=(Y-T(F_{Y}))/S(F_{Y})\). If second moments are finite and we take T as the expected value and S as the standard deviation, minimizing either of the objective functions \(E(U-\lambda V)^{2}\) or \(E(V-\lambda U)^{2}\) yields \({\hat{\lambda }}=E(UV)=\rho \), the Pearson correlation between U and V, as the global minimizer. A measure of association \(\nu \) must be symmetric, that is, it should satisfy \(\nu (U,V)=\nu \left( V,U\right) \), which leads us to consider \(E\left[ (U-\lambda V)^{2}+(V-\lambda U)^{2}\right] \) as the objective function. The argument of the expected value is a quadratic form with eigenvalues \(\left( 1-\lambda \right) ^{2}\) and \(\left( 1+\lambda \right) ^{2}\) associated with the eigenvectors \(2^{-1/2}\left( 1,1\right) ^{t}\) and \(2^{-1/2}\left( 1,-1\right) ^{t}\), respectively, that is,

$$\begin{aligned} (U-\lambda V)^{2}+(V-\lambda U)^{2} =\left( 1-\lambda \right) ^{2}\left( \frac{U+V}{\sqrt{2}}\right) ^{2}+\left( 1+\lambda \right) ^{2}\left( \frac{U-V}{\sqrt{2}}\right) ^{2} \nonumber \\ \end{aligned}$$
(32)

The variables \(Z=(U+V)/\sqrt{2}\) and \(W=(U-V)/\sqrt{2}\) are uncorrelated, with \(Var(Z)=1+\rho \) and \(Var(W)=1-\rho \), where \(\rho =E(UV)\). The minimization of the objective function \(E\left[ \left( 1-\lambda \right) ^{2}Z^{2}+\left( 1+\lambda \right) ^{2}W^{2}\right] \) suggests considering a more general expression, by taking any pair of uncorrelated zero-mean random variables Z and W whose joint distribution is parametrized by \(\eta \in C\subset {\mathbb {R}}\), that is, \(E_{\eta }Z=0=E_{\eta }W\), \(E_{\eta }\left( ZW\right) =0\), \(Var_{\eta }(Z)=z(\eta )\), \(Var_{\eta }(W)=w(\eta )\). Given functions \(a:C\rightarrow {\mathbb {R}}\) and \(b:C\rightarrow {\mathbb {R}}\), we proceed similarly to (10) and (11), coming up with a new pair of variables \(a(\lambda )Z\) and \(b(\lambda )W\) “as close as possible”, that is, we look for \({\hat{\lambda }}\) such that

$$\begin{aligned} E_{\eta }(a({\hat{\lambda }})Z-b({\hat{\lambda }})W)^{2}\le E_{\eta }(a(\lambda )Z-b(\lambda )W)^{2}\text { for all }\lambda ,\eta \in C\subset {\mathbb {R}} \end{aligned}$$
(33)

subject to

$$\begin{aligned} 0<Var_{\eta }(a(\eta )Z)=Var_{\eta }(b(\eta )W). \end{aligned}$$
(34)

We want to derive which smooth functions a and b should be chosen so that the minimum is attained at the underlying parameter \(\eta \). The objective function to be minimized subject to the constraint (34) is \(E_{\eta }(a(\lambda )Z-b(\lambda )W)^{2}=a^{2}(\lambda )z(\eta )+b^{2}(\lambda )w(\eta )\). If \(h(\lambda )=a^{2}(\lambda )\) and \(g(\lambda )=b^{2}(\lambda )\) are differentiable, the critical points solve the equation \(h^{\prime }(\lambda )z(\eta )+g^{\prime }(\lambda )w(\eta )=0\). In order to make the true parameter \(\eta \) a critical point of the function \(E_{\eta }(a(\lambda )Z-b(\lambda )W)^{2}\), it must hold that \(h^{\prime }(\eta )z(\eta )=-g^{\prime }(\eta )w(\eta )\), which allows us to take \(g(\eta )=C^{\prime }\left( z(\eta )/w(\eta )\right) ^{1/2}\) and \(h(\eta )=C^{\prime }\left( w(\eta )/z(\eta )\right) ^{1/2}\) with \(C^{\prime }>0\). Observe that the function \((a(\lambda )Z-b(\lambda )W)^{2}\) then has a cross term whose coefficient \(a(\lambda )b(\lambda )=C^{\prime }\) does not depend on \(\lambda \), which leads us to consider the quadratic form

$$\begin{aligned} q(\lambda )= & {} h(\lambda )Z^{2}+g(\lambda )W^{2}\nonumber \\= & {} \left( \frac{w(\lambda )}{ z(\lambda )}\right) ^{1/2}\left( \frac{U+V}{\sqrt{2}}\right) ^{2}+\left( \frac{z(\lambda )}{w(\lambda )}\right) ^{1/2}\left( \frac{U-V}{\sqrt{2}} \right) ^{2}. \end{aligned}$$
(35)
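As a quick check that these choices indeed make \(\eta \) a critical point, write \(z=z(\eta )\), \(w=w(\eta )\) and differentiate:

$$\begin{aligned} h^{\prime }(\eta )z+g^{\prime }(\eta )w=\frac{C^{\prime }}{2\sqrt{zw}}\left[ \left( w^{\prime }z-wz^{\prime }\right) +\left( z^{\prime }w-zw^{\prime }\right) \right] =0. \end{aligned}$$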

Let \(\chi \) satisfy the conditions C1, C2, C3 and C5. If \(0<\delta <1\) and \(\phi (\lambda )=\left( w(\lambda )/z(\lambda )\right) ^{1/2}=\theta \), we can define an M-scale to assess the magnitude of \(q(\lambda )\) through the equation

$$\begin{aligned} s(\theta )=\min \left\{ s>0:E\chi \left( \frac{\theta \left( \frac{U+V}{ \sqrt{2}}\right) ^{2}+\theta ^{-1}\left( \frac{U-V}{\sqrt{2}}\right) ^{2}}{s} \right) \le \delta \right\} , \end{aligned}$$
(36)

and the estimator is defined as \({\hat{\theta }}=\arg \min _{\theta \in \phi (C)}s(\theta )\). If the function \(\phi \) is invertible, the estimator of \(\lambda \) is given by \({\hat{\lambda }}=\phi ^{-1}({\hat{\theta }})\). Going back to Z and W, with \(z(\eta )=1+\eta \) and \(w(\eta )=1-\eta \), the correlation parameter \(\eta \) is estimated through the quadratic form \(q(\lambda )=2^{-1}\left[ \lambda _{1}\left( U+V\right) ^{2}+\lambda _{2}\left( U-V\right) ^{2}\right] \) with eigenvalues \(\lambda _{1}=\left( (1-\lambda )/(1+\lambda )\right) ^{1/2}\) and \(\lambda _{2}=\left( (1+\lambda )/(1-\lambda )\right) ^{1/2}\). Then, the M-scale in (36) can be defined as

$$\begin{aligned} s(\lambda )=\min \left\{ s>0:E\chi \left( \frac{q(\lambda )}{s}\right) \le \delta \right\} ,\lambda \in (-1,1). \end{aligned}$$
(37)

Put \(s(1)=\liminf _{\lambda \uparrow 1}s(\lambda )\) and \(s(-1)=\liminf _{\lambda \downarrow -1}s(\lambda )\). Then, we define the SM-estimator of correlation as

$$\begin{aligned} {\hat{\lambda }}=\arg \min _{\lambda \in \left[ -1,1\right] }s(\lambda ). \end{aligned}$$
(38)

If we use an M-scale based on the quadratic form given in (32), similarly to (37) and (38), we obtain that the solution is \({\hat{\lambda }}=\left( 1-\sqrt{1-\rho ^{2}}\right) /\rho \) for \(\rho \ne 0\) (and \({\hat{\lambda }}=0\) for \(\rho =0\)) when (X, Y) is elliptically distributed with correlation \(\rho \), so that \(\rho \) is recovered as \(\rho =2{\hat{\lambda }}/(1+{\hat{\lambda }}^{2})\).
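Indeed, writing \(t={\hat{\lambda }}=\left( 1-\sqrt{1-\rho ^{2}}\right) /\rho \), one checks directly that

$$\begin{aligned} 1+t^{2}=\frac{\rho ^{2}+1-2\sqrt{1-\rho ^{2}}+1-\rho ^{2}}{\rho ^{2}}=\frac{2\left( 1-\sqrt{1-\rho ^{2}}\right) }{\rho ^{2}},\qquad \text {so that}\qquad \frac{2t}{1+t^{2}}=\rho . \end{aligned}$$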

It is easily seen that (38) and (31) yield the same estimate.

Lemma 3

Let \({\hat{\lambda }}\) be as in (38). Then, \({\hat{\lambda }} =b^{*}.\)

Proof

It is deferred to the “Appendix”. \(\square \)

Since the eigenvalues of the quadratic form (32) do not follow (35), \(\rho \) is obtained after transforming \({\hat{\lambda }}\). The next lemma shows that the estimator (38) has the usual properties required of an association estimator, except in the cases \({\hat{\lambda }}=-1\) or \({\hat{\lambda }}=1\), which entail that the random variables X and Y are linearly related with probability greater than or equal to \(1-\delta \).

Lemma 4

Let \({\hat{\lambda }}\) be as in (38). Then, (i) The estimator is location and scale invariant. (ii) \({\hat{\lambda }}\in \left[ -1,1\right] \). (iii) If \({\hat{\lambda }}=1\), then \(P\left( X=aY+b\right) \ge 1-\delta \), with \(a=S(F_{X})/S(F_{Y})\) and \(b=-S(F_{X})T(F_{Y})/S(F_{Y})+T(F_{X})\) (respectively, if \({\hat{\lambda }}=-1\), then \(P\left( X=cY+d\right) \ge 1-\delta \), with \(c=-S(F_{X})/S(F_{Y})\) and \(d=S(F_{X})T(F_{Y})/S(F_{Y})+T(F_{X})\)). (iv) If (X, Y) is elliptically distributed with correlation \(\rho \), then \({\hat{\lambda }}=\rho \).

Proof

It is deferred to the “Appendix”. \(\square \)

5.2 A “depth-based” correlation measure and association properties

The quadratic form (32) can be used to derive another robust procedure to recover \(\rho \) consistently, reminiscent of depth-based procedures (see Adrover et al. 2002). We can pit a candidate value of \(\lambda \) against adversaries as follows. Let us take as a measure of how badly \(\lambda \) performs the worst probability that another fit \(\theta \) attains lower residuals, that is, \(p(\lambda )=\max _{\theta }P\left( (U-\theta V)^{2}+(V-\theta U)^{2}<(U-\lambda V)^{2}+(V-\lambda U)^{2}\right) \). Finally, the parameter with the best worst-case performance is chosen: \({\hat{\lambda }}=\arg \min _{\lambda }p(\lambda )\). It is easy to see that the maximum in \(p(\lambda )\) is attained as \(\theta \rightarrow \lambda \) and

$$\begin{aligned} p(\lambda )=\max \left( P\left( (2UV)/(U^{2}+V^{2})<\lambda \right) ,P\left( (2UV)/(U^{2}+V^{2})>\lambda \right) \right) . \end{aligned}$$

Then, the minimum is attained at \({\hat{\lambda }}=med\left( (2UV)/(U^{2}+V^{2})\right) \).
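This median-based measure is straightforward to compute; a minimal sketch, taking the median and the MAD as illustrative choices of T(.) and S(.) (by invariance, the MAD's normalizing constant is immaterial):

```python
import numpy as np

def depth_association(x: np.ndarray, y: np.ndarray) -> float:
    """med(2UV / (U^2 + V^2)) after median/MAD standardization."""
    u = (x - np.median(x)) / np.median(np.abs(x - np.median(x)))
    v = (y - np.median(y)) / np.median(np.abs(y - np.median(y)))
    return float(np.median(2.0 * u * v / (u**2 + v**2)))
```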

The following lemma shows the properties that \({\hat{\lambda }}\) possesses as an association measure.

Lemma 5

Let \({\hat{\lambda }}=\text {med}\left( \frac{2UV}{U^{2}+V^{2}}\right) \). Then, (i) The estimator is location and scale invariant. (ii) \({\hat{\lambda }}\in \left[ -1,1\right] \). (iii) If \({\hat{\lambda }}=1\), then \(P\left( X=aY+b\right) \ge 0.5\), with \(a=S(F_{X})/S(F_{Y})\) and \(b=-aT(F_{Y})+T(F_{X})\) (respectively, if \({\hat{\lambda }}=-1\), then \(P\left( X=cY+d\right) \ge 0.5\), with \(c=-S(F_{X})/S(F_{Y})\) and \(d=-cT(F_{Y})+T(F_{X})\)). (iv) If (X, Y) is elliptically distributed with correlation \(\rho \), then \({\hat{\lambda }}=\rho \).

Proof

It is deferred to the “Appendix”. \(\square \)

6 Concluding remarks

Maronna (2005) shows the remarkable prediction behavior of SM-estimation for PCA compared with some other robust procedures. Adrover and Donato (2015) reach a similar conclusion in the CCA context, by considering either the mean squared error or the relative prediction error as performance measures under contamination scenarios. For these reasons, SM-estimation deserves to be thoroughly analyzed, studying asymptotic properties such as consistency and asymptotic normality. Adrover and Donato (2015) especially highlighted the relationship between PCA and CCA (see also ten Berge 1979); therefore, the derivation of these asymptotic properties, as well as that of the IF, is essentially the same for both procedures. We also considered robustness properties such as qualitative robustness and the influence function; the unboundedness of the IF, a usual feature of S-estimation in other models, is also shown. Finally, we turned our attention to a basic problem, robust association or correlation. Reasoning as for SM-estimation in CCA, we came up with a robust correlation measure which is closely related to S-estimation for bivariate dispersion. We also discussed another consistent robust correlation measure using the approach to regression depth of Adrover et al. (2002). The usual properties required of a correlation measure are discussed for both proposals.