Abstract
Principal component analysis (PCA) and canonical correlation analysis (CCA) are dimension-reduction techniques in which either a random vector is approximated in a lower dimensional subspace or two random vectors from high dimensional spaces are reduced to a new pair of low dimensional vectors after applying linear transformations to each of them. In both techniques, the concern is the closeness between the high dimensional vector and its lower dimensional representation, with closeness measured through a robust function. Robust SM-estimation has already been treated in the context of PCA and CCA, showing outstanding performance under casewise contamination, which encourages the study of its asymptotic properties. We analyze consistency and asymptotic normality for the SM-canonical vectors. As a by-product of the CCA derivations, the asymptotics for PCA can also be obtained. A classical measure of robustness, the influence function, is analyzed, showing the performance that S-estimation typically displays in different statistical models. The general ideas behind SM-estimation in either PCA or CCA are specially tailored to the context of association, yielding robust measures of association between random variables. By these means, a robust correlation measure is derived and its connection with the association measure provided by S-estimation for bivariate scatter is analyzed. In addition, we propose a second robust correlation measure which is reminiscent of depth-based procedures.
1 Introduction
Principal component analysis (PCA) and canonical correlation analysis (CCA) are two dimension-reduction techniques of widespread use in statistics. For a random vector \({{\textbf {z}}}\) in the Euclidean space of dimension m, with positive definite covariance matrix \(\varvec{\varSigma }\), PCA looks for the spectral decomposition of \(\varvec{\varSigma }\): the eigenvectors \({{\textbf {t}}} _{1}^{1},\ldots ,{{\textbf {t}}}_{m}^{1}\) associated with the corresponding m-tuple of eigenvalues \((\gamma _{1}^{1},\ldots ,\gamma _{m}^{1})\) in decreasing order, \(\gamma _{j}^{1}\ge \gamma _{j+1}^{1}>0\) for all \(1\le j\le m-1\), that is,
where \({{\textbf {T}}}^{1}\in {\mathbb {R}}^{m\times m}\) is an orthonormal matrix whose columns are \({{\textbf {t}}} _{j}^{1},j=1,\ldots ,m\), and \(\varvec{\varDelta }^{1}=diag\left( \gamma _{1}^{1},\ldots ,\gamma _{m}^{1}\right) \). The variables \(\left( {{\textbf {t}}} _{1}^{1}\right) ^{t}({{\textbf {z}}}-E{{\textbf {z}}}),\ldots ,\left( {{\textbf {t}}} _{m}^{1}\right) ^{t}({{\textbf {z}}}-E{{\textbf {z}}})\) are usually referred to as the principal components. The eigenvalues and eigenvectors can also be obtained through an optimization scheme (Seber 2004, p. 181). On the other hand, the principal components are the best linear predictors of \({{\textbf {z}}}-E{{\textbf {z}}}\) when looking for linear combinations \(\sum _{k=1}^{p}({{\textbf {a}}}_{k}^{t}({{\textbf {z}}}-E{{\textbf {z}}})){{\textbf {a}}} _{k}\) based on an orthonormal set \(\left\{ {{\textbf {a}}}_{1},\dots ,{{\textbf {a}}} _{p},\dots ,{{\textbf {a}}}_{m}\right\} \), \(p<m\). Let \({{\textbf {z}}}\sim F\); then the principal components solve the optimization problem
where \({{\textbf {P}}}_{V}\) stands for the orthogonal projection onto a subspace V of dimension \(p<m\), \(V=\left\langle {{\textbf {a}}}_{1},\ldots ,{{\textbf {a}}}_{p}\right\rangle \) means that V is generated by the orthonormal set \(\left\{ {{\textbf {a}}}_{1},\ldots ,{{\textbf {a}}}_{p}\right\} \) and \(V^{\perp }=\left\langle {{\textbf {a}}}_{p+1},\ldots ,{{\textbf {a}}} _{m}\right\rangle \) denotes the orthogonal complement of V. Then, the solutions \(({\varvec{\mu }}_{{{\textbf {z}}}},V_{p})\) to (2) are given by \({\varvec{\mu }}_{{{\textbf {z}}}}=E{{\textbf {z}}},\ \ V_{p}=\left\langle {{\textbf {t}}} _{1}^{1},\ldots ,{{\textbf {t}}}_{p}^{1}\right\rangle \ \text {and}\ {{\textbf {P}}}_{V_{p}}({{\textbf {z}}})=\sum _{k=1}^{p}\left( \left( {{\textbf {t}}}_{k}^{1}\right) ^{t}\left( {{\textbf {z}}}-E{{\textbf {z}}}\right) \right) {{\textbf {t}}}_{k}^{1}.\)
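As a numerical illustration of the characterization above (our own sketch, not part of the paper), the following NumPy snippet builds a covariance matrix with a known spectrum and checks that projecting onto the span of the top-p eigenvectors leaves an expected squared residual equal to the sum of the m-p smallest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 4, 2

# Covariance with known spectrum: Sigma = Q diag(eigvals) Q^t.
eigvals = np.array([5.0, 3.0, 1.0, 0.5])
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))   # random orthonormal basis
Sigma = Q @ np.diag(eigvals) @ Q.T

# Spectral decomposition with eigenvalues in decreasing order.
gamma, T = np.linalg.eigh(Sigma)
order = np.argsort(gamma)[::-1]
gamma, T = gamma[order], T[:, order]

# Orthogonal projector onto V_p, the span of the top-p eigenvectors.
P = T[:, :p] @ T[:, :p].T

# Expected squared residual E||z - mu - P_V(z - mu)||^2 = tr((I - P) Sigma),
# which for the optimal V_p equals the sum of the m - p smallest eigenvalues.
loss = np.trace((np.eye(m) - P) @ Sigma)
```

Here the projector is formed explicitly for clarity; in practice one would retain only the first p eigenvectors.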
Let us define the set \({\mathscr {O}} _{r,m}=\left\{ {{\textbf {A}}}\in {\mathbb {R}} ^{r\times m}:{\textbf {AA}}^{t}={{\textbf {I}}}_{r}\right\} \) and set \(p^{\prime }=m-p\). According to (2), it is easy to see that
with
To downweight outlying observations in (3), Maronna (2005) defined SM-estimators for principal vectors in PCA through the equations
with \(\chi :[0,\infty )\rightarrow \left[ 0,1\right] \). He also considered SL-estimators for principal vectors by minimization of an L-scale rather than an M-scale as in (5).
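For intuition on how an M-scale is computed, the following sketch solves the sample version \(n^{-1}\sum _{i=1}^{n}\chi (r_{i}/\sigma )=\delta \) by bisection, exploiting that the left-hand side is nonincreasing in \(\sigma \). The capped-linear \(\chi \) and the tuning constants are illustrative choices of ours, not the ones used in the paper.

```python
import numpy as np

def chi_cap(u, c=1.0):
    """A minimal chi: chi(0) = 0, nondecreasing, chi(u) = 1 for u >= c."""
    return np.minimum(np.asarray(u, dtype=float) / c, 1.0)

def m_scale(sq_res, delta=0.5, c=1.0, n_iter=200):
    """Solve (1/n) sum_i chi(r_i / sigma) = delta for sigma by bisection,
    where sq_res holds the squared residuals r_i; the left-hand side is
    nonincreasing in sigma, so the bracket shrinks monotonically."""
    sq_res = np.asarray(sq_res, dtype=float)
    g = lambda s: chi_cap(sq_res / s, c).mean() - delta
    lo, hi = 1e-12, 1e6 * max(sq_res.max(), 1.0)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

Note that the resulting scale is equivariant: multiplying all squared residuals by a constant multiplies the scale by the same constant.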
CCA was proposed by Hotelling (1936) to determine the relationship between two sets of variables by transforming the vectors \({{\textbf {x}}}\) and \({{\textbf {y}}}\) into two lower dimensional vectors \({{\textbf {z}}}\) and \({{\textbf {w}}}\) whose association is greatly strengthened (see Das and Sen 1998 for a very thorough account of CCA and its wide variety of applications). In recent years, CCA has also gained popularity as a method for the analysis of genomic data, since it has the potential to be a powerful tool for identifying relationships between genotype and gene expression. It has also been used in geostatistical applications (see Furrer and Genton 2011). CCA is closely related to multivariate regression when the vectors \({{\textbf {x}}}\) and \({{\textbf {y}}}\) are not treated symmetrically (see Yohai and García Ben 1980). Given two random vectors \({{\textbf {x}}}\) and \({{\textbf {y}}}\) of dimensions p and q, respectively, the joint covariance matrix is given by
CCA seeks sets \(\left\{ {\varvec{{\alpha }}}_{1},\ldots ,{\varvec{{\alpha }}} _{r}\right\} \subset {{\mathbb {R}}}^{p}\) and \(\left\{ {\varvec{\beta }}_{1},\ldots ,{\varvec{\beta }}_{r}\right\} \subset {\mathbb {R}}^{q}\) yielding uncorrelated standardized linear combinations of the variables in \({{\textbf {x}}}\) and the variables in \({{\textbf {y}}}\) that are maximally correlated with each other. We can define the canonical vectors \({\varvec{\alpha }}_{j},{\varvec{\beta }}_{j},\) \(j=1,\ldots ,r\) (up to sign) as solutions to an optimization problem (Seber 2004, p. 258). Suppose we take vectors \(\left( {{\textbf {a}}},{{\textbf {b}}}\right) \in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q}\) such that \(Var({{\textbf {a}}}^{t}{{\textbf {x}}})=1=Var({{\textbf {b}}}^{t}{{\textbf {y}}})\) and \(Corr({{\textbf {b}}}^{t}{{\textbf {y}}},{\varvec{\beta }}_{j}^{t}{{\textbf {y}}})=0=Corr({{\textbf {a}}}^{t}{{\textbf {x}}},{\varvec{\alpha }}_{j}^{t}{{\textbf {x}}}),\) \(j=1,2,\ldots ,k-1\), where Var and Corr stand for the variance and correlation operators for random variables. Under these constraints, we choose \(({\varvec{\alpha }}_{k},{\varvec{\beta }}_{k})\) to yield the maximum squared correlation between \({{\textbf {a}}}^{t}{{\textbf {x}}}\) and \({{\textbf {b}}}^{t}{{\textbf {y}}}\). If \(\rho _{k}\) stands for the positive correlation between \(\varvec{\alpha }_{k}^{t}{{\textbf {x}}}\) and \(\varvec{\beta }_{k}^{t}{{\textbf {y}}}\) (the k-th canonical correlation), then \(\rho _{k}^{2}=\left( Corr(\varvec{\alpha }_{k}^{t}{{\textbf {x}}},\varvec{\beta }_{k}^{t}{{\textbf {y}}})\right) ^{2}\) and one gets a decreasing sequence of squared canonical correlations, \(\rho _{1}^{2}\ge \cdots \ge \rho _{r}^{2}\). The vectors \(\varvec{\alpha }_{k}\) and \(\varvec{\beta }_{k}\) are unique (up to sign) if the canonical correlations are distinct. It is well known that the optimization problem is equivalent to solving the eigensystem
which makes the search computationally more tractable. Classical estimators are obtained by replacing the covariance matrix in (7) and (8) by the sample covariance matrix. A robust counterpart of (7) and (8) is easily obtained by solving the linear system, for \(k=1,\ldots ,r\),
with \(\varvec{\varSigma }^{(R)}\) a robust dispersion estimator.
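As a sketch of this plug-in strategy (an illustration of ours, not the authors' implementation), the function below recovers canonical correlations and vectors from any joint dispersion matrix by solving the generalized eigensystem; a robust dispersion estimate \(\varvec{\varSigma }^{(R)}\) can be passed in place of the toy matrix constructed at the end.

```python
import numpy as np
from scipy.linalg import eigh, solve

def cca_from_dispersion(S, p):
    """Canonical correlations and vectors from a joint dispersion matrix S
    (first p coordinates belong to x).  Any robust dispersion estimate can
    be plugged in for S."""
    Sxx, Sxy = S[:p, :p], S[:p, p:]
    Syx, Syy = S[p:, :p], S[p:, p:]
    # Generalized eigensystem:  Sxy Syy^{-1} Syx alpha = rho^2 Sxx alpha,
    # with eigenvectors normalized so that alpha^t Sxx alpha = 1.
    rho2, A = eigh(Sxy @ solve(Syy, Syx), Sxx)
    order = np.argsort(rho2)[::-1]
    rho, A = np.sqrt(np.clip(rho2[order], 0.0, None)), A[:, order]
    # beta_k is proportional to Syy^{-1} Syx alpha_k; renormalize so that
    # beta_k^t Syy beta_k = 1 (assumes all canonical correlations > 0).
    B = solve(Syy, Syx @ A)
    B = B / np.sqrt(np.einsum('ik,ij,jk->k', B, Syy, B))
    return rho, A, B

# Toy dispersion with known canonical correlations 0.8 and 0.3.
Sxy = np.diag([0.8, 0.3])
S = np.block([[np.eye(2), Sxy], [Sxy.T, np.eye(2)]])
rho, A, B = cca_from_dispersion(S, p=2)
```

The normalization conventions (unit dispersion of each canonical variable) follow the classical ones; signs of the vectors remain arbitrary.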
The canonical variables \({{\textbf {z}}}=(\varvec{\alpha } _{1}^{t}({{\textbf {x}}}-E{{\textbf {x}}}),\ldots ,\varvec{\alpha } _{r}^{t}({{\textbf {x}}}-E{{\textbf {x}}}))^{t}\) and \({{\textbf {w}}}=(\varvec{\beta } _{1}^{t}({{\textbf {y}}}-E{{\textbf {y}}}),\ldots ,\varvec{\beta } _{r}^{t}({{\textbf {y}}}-E{{\textbf {y}}}))^{t}\) are also the best linear combinations for predicting each other, in the sense of making the mean squared loss \(E_F\left\| {{\textbf {z}}}-{{\textbf {w}}}\right\| ^{2}\) as small as possible (see Seber 2004, p. 260), since they solve the optimization problem
with
\({{\textbf {I}}}_{r}\) an \(r\times r\) identity matrix, \({\varvec{\mu }}_{{{\textbf {x}}}}=E{{\textbf {x}}}\) and \({\varvec{\mu }}_{{{\textbf {y}}}}=E{{\textbf {y}}}\). The subscript C stands for Classical.
Adrover and Donato (2015) introduced SM-estimators for canonical vectors in CCA as follows. Given the matrices \(\varvec{{\bar{A}}}\in {\mathbb {R}}^{r\times p}\) and \(\varvec{{\bar{B}}}\in {\mathbb {R}} ^{r\times q},\) let us take \({{\textbf {A}}}=\varvec{{\bar{A}}}\varvec{\varSigma }_{{\textbf {xx}} }^{1/2},\) \({{\textbf {B}}}=\varvec{{\bar{B}}}\varvec{\varSigma }_{{\textbf {yy}}}^{1/2},\) with \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \( \varvec{\varSigma }_{{\textbf {yy}}}\) given in (6), \({{\textbf {D}}}=\left( \begin{array}{cc} {{\textbf {A}}}&-{{\textbf {B}}} \end{array} \right) \in {\mathbb {R}} ^{r\times m},\) \(m=p+q\) and the random vector \({{\textbf {z}}}=({{\textbf {x}}}^{t} \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2},{{\textbf {y}}}^{t}\varvec{\varSigma }_{ {\textbf {yy}}}^{-1/2})^{t}\). By reformulating (10) and (11) for the standardized vectors \(\ \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2}{{\textbf {x}}}\) and \(\varvec{\varSigma }_{ {\textbf {yy}}}^{-1/2}{} {\textbf {y,}}\) we have
with \({\mathscr {B}}_{r,m}^{0}=\left\{ \left( {{\textbf {D}}},{{\textbf {a}}}\right) :{{\textbf {a}}}\in {\mathbb {R}}^{r},{{\textbf {D}}}=\left( \begin{array}{cc} {{\textbf {A}}}&-{{\textbf {B}}} \end{array} \right) \in {\mathbb {R}} ^{r\times m},{{\textbf {A}}}\in {\mathscr {O}}_{r,p},{{\textbf {B}}}\in {\mathscr {O}} _{r,q}\right\} .\)
Since the covariance matrix for the standardized random vector \({{\textbf {z}}}\) is given by
to evaluate the “largeness” of the “residuals” \(\left\| {{\textbf {A}}}{\varvec{{\tilde{x}}}}-{{\textbf {B}}}{\varvec{{\tilde{y}}}}-{{\textbf {a}}}\right\| ^{2},\) an M-scale \( \sigma =\sigma ({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}})\) is computed implicitly through
with \(\chi :[0,\infty )\rightarrow [0,1]\), \(\varvec{\varSigma }_{{\textbf {xx }}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) robust dispersion estimators for \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{ {\textbf {yy}}}\), respectively, \(\varvec{{\tilde{x}}}=\left( \varvec{\varSigma }_{ {\textbf {xx}}}^{(R)}\right) ^{-1/2}{{\textbf {x}}}\), \(\varvec{{\tilde{y}}=}\) \( \left( \varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\right) ^{-1/2}{{\textbf {y}}}\) and \( \left( \left( \begin{array}{cc} {{\textbf {A}}}&-{{\textbf {B}}} \end{array} \right) ,{{\textbf {a}}}\right) {\in }{\mathscr {B}}_{r,m}^{0}\). Then, the robust standardized SM-canonical vectors are defined through the equation
and the final SM-canonical vectors are defined as
If we have a random sample \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) and \(F=F_{n}\) stands for the empirical distribution function based on \({{\textbf {z}}}_{1},\ldots , {{\textbf {z}}}_{n}\), the sample version of the estimates is simply obtained by replacing the population expectation by the empirical expectation, that is, \({\hat{\sigma }}\) is the robust scale based on the sample given in (14) and \(\left( \varvec{{\hat{A}}}_{SM}^{o},\varvec{{\hat{B}}}_{SM}^{o},\varvec{{\hat{a}}}_{SM}^{o}\right) \) the solutions to (15) using \({\hat{\sigma }}\), that is
The algorithm to compute the SM-estimators is easily derived from the fact that we have a constrained minimization and the Lagrange multipliers method applies (Adrover and Donato 2015). In the context of sparsity, (10) was also considered by Wilms and Croux (2015) as well as a robust proposal in Wilms and Croux (2016).
The outstanding robust performance of (5) and (15) suggests the study of asymptotic properties. Li and Chen (1985) dealt with a robust procedure by considering a robust dispersion S rather than the Var operator in the optimization scheme for PCA, Cui et al. (2003) obtained the asymptotic distribution of the procedure, and Croux and Ruiz-Gazen (2005) tackled the problem of the influence function. Drašković et al. (2019) derived the asymptotic behavior of robust PCA in the context of complex elliptically symmetric distributions based on the spectral decomposition of Maronna's monotone multivariate dispersion estimator (Maronna 1976), extending previous results (Tyler 1981; Boente 1987; Croux and Haesbroeck 2000). Croux et al. (2017) considered trimmed estimators in the PCA context (in particular, the SL-estimator given in Maronna 2005 is also included), studying theoretical properties such as consistency, influence function and breakdown point. ten Berge (1979) and Lemma 3 in Adrover and Donato (2015) explored the close relationship between PCA and CCA, namely that the principal vectors of \({{\textbf {M}}}\) defined in (13) comprise the transformed canonical vectors of \(\varvec{\varSigma }\). Thus, the asymptotics and influence function for the SM-estimator given by Maronna (2005) in the context of PCA are easily derived from the arguments used for the CCA case in this paper and are therefore omitted.
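The PCA-CCA relationship just mentioned can be verified numerically: if \({{\textbf {M}}}\) has identity diagonal blocks and standardized cross block \({{\textbf {R}}}\), its leading eigenvalues are \(1+\rho _k\) and the two halves of each leading eigenvector carry the standardized canonical directions. The sketch below is our own construction, with \({{\textbf {R}}}\) built from prescribed canonical correlations:

```python
import numpy as np

rng = np.random.default_rng(1)
p = q = 2

# Standardized cross block R = Sxx^{-1/2} Sxy Syy^{-1/2}, built with
# known canonical correlations via its singular value decomposition.
rho = np.array([0.7, 0.2])
U, _ = np.linalg.qr(rng.standard_normal((p, p)))
V, _ = np.linalg.qr(rng.standard_normal((q, q)))
R = U @ np.diag(rho) @ V.T

# M with identity diagonal blocks and standardized cross blocks.
M = np.block([[np.eye(p), R], [R.T, np.eye(q)]])
gamma, T = np.linalg.eigh(M)
gamma, T = gamma[::-1], T[:, ::-1]          # decreasing order

# The halves (v, w) of the leading eigenvector carry the first pair of
# standardized canonical directions, each with squared norm 1/2.
v1, w1 = T[:p, 0], T[p:, 0]
```

The eigenvalues of this block matrix come in pairs \(1\pm \rho _k\), so the spectrum of \({{\textbf {M}}}\) encodes the canonical correlations directly.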
Anderson (1999) derived the asymptotic distribution of the canonical correlations and canonical vectors when sampling from the normal distribution. Taskinen et al. (2006) established asymptotic properties for CCA based on robust estimators of the covariance matrix as in (9). Alfons et al. (2017) treated the asymptotic properties of projection pursuit estimators (see Branco et al. 2005), deriving their asymptotic distribution and influence function.
In Sect. 2, we establish the consistency under elliptical distributions for the SM-estimators given in (16) and (17). In Sect. 3, the asymptotic distribution is derived for the SM-estimators and Sect. 4 analyzes the influence function (IF) for the proposal. In Sect. 5, we revise the concept of association between random variables by analyzing some proposals for robust correlation measures which exploit the concept of residual smallness as in (2) and (10). In Sect. 6, we include some concluding remarks. Some relevant proofs are deferred to the “Appendix”.
2 Consistency of SM-estimators for CCA
In the multivariate location and dispersion model (MLDM), we have an m -dimensional random vector \({{\textbf {z}}}=(z_{1},\dots ,z_{m})^{t}\) with distribution \(F_{{\varvec{\mu }},\varvec{\varSigma }}(B)=F_{{{\textbf {0}}}}\left( \varvec{ \varSigma }^{-1/2}(B-\varvec{\mu })\right) {,}\) where \(F_{{{\textbf {0}}}}\) is a known distribution in \({\mathbb {R}}^{m}\), B is a Borel set in \({\mathbb {R}} ^{m},\) \(\varvec{\mu }\in \) \({\mathbb {R}}^{m}\) and \(\varvec{\varSigma }\in S_{m},\) the set of \(m\times m\) positive definite matrices. An important case is the family of elliptical distributions. The elliptical model allows for a great variety of distributions which comprises the majority of the distributions used in practice, not only the multivariate normal distribution but also distributions without finite moments. We say that an m-dimensional random vector has an elliptical distribution if it has a density of the form
where \(f_{0}: {\mathbb {R}} ^{+}\rightarrow {\mathbb {R}}^{+}\) (it is denoted by \({{\textbf {z}}}\sim E_m({\varvec{\mu }}_{0},\varvec{\varSigma }_{0})\)). If \({{\textbf {z}}}\sim E_m({{\textbf {0}}},{\varvec{I}})\), then \({{\textbf {a}}}^{t}{{\textbf {z}}}\) has the same distribution for all \({{\textbf {a}}}\in S^{m-1}=\{{{\textbf {a}}}\in {\mathbb {R}}^{m}{} {\textbf {:||a||=}}1{\}}\). In case of having \({\textbf { z=(x}}^{t},{{\textbf {y}}}^{t})^{t}\sim E_m({\varvec{\mu }}_{0},\varvec{\varSigma }_{0})\), with the location and dispersion parameters partitioned as
respectively, then \({{\textbf {x}}}\sim E_p({\varvec{\mu }}_{0,{{\textbf {x}}}},\varvec{\varSigma }_{0,{\textbf {xx}}})\) and \({{\textbf {y}}}\sim E_q({\varvec{\mu }}_{0,{{\textbf {y}}}},\varvec{\varSigma }_{0,{\textbf {yy}}})\). Let us now take the random vector \({{\textbf {z}}}_0=({{\textbf {x}}}^{t} \varvec{\varSigma }_{0,{\textbf {xx}}}^{-1/2},{{\textbf {y}}}^{t}\varvec{\varSigma }_{ 0,{\textbf {yy}}}^{-1/2})^{t}\); then \({{\textbf {z}}}_0\sim E_m(\tilde{{\varvec{\mu }}}_{0},{\varvec{M}}_{0})\), with the location and dispersion parameters partitioned as
If \({{\textbf {z}}}\sim E_m({\varvec{\mu }}_{0},\varvec{\varSigma }_{0})\) has finite second moments, then \(E{{\textbf {z}}}={\varvec{\mu }}_0\), the covariance matrix \(\varvec{\varSigma }\) and \(\varvec{\varSigma }_{0}\) are equal up to a constant, that is, \(\varvec{\varSigma }=c\varvec{\varSigma }_{0}\) for some positive constant c, and \({{\textbf {M}}}_0={{\textbf {M}}}\), with \({{\textbf {M}}}\) given in (13). For the sake of simplicity, we will keep only the notation \({{\textbf {M}}}\) for either \({{\textbf {M}}}\) or \({{\textbf {M}}}_0\) since they coincide in the case of finite second moments. From now on, we will use the symbols \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\) to refer to the multivariate dispersion parameters at the elliptical model. Considering a general elliptical distribution rather than a multivariate normal distribution allows us to deal with a broader scenario for modeling.
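Elliptical vectors admit the stochastic representation \({{\textbf {z}}}={\varvec{\mu }}_0+r\,\varvec{\varSigma }_0^{1/2}{{\textbf {u}}}\), with \({{\textbf {u}}}\) uniform on the unit sphere and \(r\ge 0\) a radial variable determined by \(f_0\). The sketch below (our illustration; the radial law and \(\varvec{\varSigma }_0\) are arbitrary choices) uses the chi radial law, which reproduces the multivariate normal:

```python
import numpy as np

def sample_elliptical(n, mu, Sigma0, radial, rng):
    """Draw z = mu + r * Sigma0^{1/2} u, with u uniform on the unit
    sphere S^{m-1} and r >= 0 from `radial`; the radial law fixes f_0."""
    m = len(mu)
    L = np.linalg.cholesky(Sigma0)
    g = rng.standard_normal((n, m))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on sphere
    r = radial(n, rng)
    return mu + (r[:, None] * u) @ L.T

rng = np.random.default_rng(2)
mu = np.zeros(3)
Sigma0 = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.3],
                   [0.0, 0.3, 1.0]])

# r ~ chi_m reproduces N(mu, Sigma0); heavier-tailed radial laws give
# elliptical models without finite moments.
chi_radial = lambda n, rng: np.sqrt(rng.chisquare(df=3, size=n))
z = sample_elliptical(200_000, mu, Sigma0, chi_radial, rng)
```

Swapping the radial law (e.g., for a heavy-tailed one) changes \(f_0\) while keeping the same dispersion shape \(\varvec{\varSigma }_0\) up to a constant.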
Let us take the spectral decomposition for \({{\textbf {M}}}\), \({{\textbf {M}}} =\sum _{i=1}^{p+q}\gamma _{i}^0{{\textbf {t}}}_{i}^{0}\left( {{\textbf {t}}}_{i}^{0}\right) ^{t}\), with eigenvalues \(\gamma _{1}^0\ge \cdots \ge \gamma _{p+q}^0 \ge 0\) and eigenvectors \(\left\{ {{\textbf {t}}}_{j}^{0}\right\} _{j=1}^{p+q},\) \(\left( {{\textbf {t}}}_{i}^{0}\right) ^{t}{{\textbf {t}}}_{j}^{0}=\delta _{ij},1\le i,j\le p+q,\) with \(\delta _{ij}\) the Kronecker delta. Then, \({{\textbf {M}}}={{\textbf {P}}} _{0}\varvec{\varGamma }_{0}{{\textbf {P}}}_{0}^{t},\) with \(\varvec{\varGamma } _{0}=diag\left( \gamma _{1}^{0},\gamma _{2}^{0},\ldots ,\gamma _{p+q}^{0}\right) \) and \({{\textbf {P}}}_{0}\) an orthogonal matrix whose columns are \({{\textbf {t}}} _{1}^{0},\ldots ,{{\textbf {t}}}_{p+q}^{0}\). Take \({{\textbf {v}}}_{i}^{0}\in {\mathbb {R}}^{p}\) and \({{\textbf {w}}}_{i}^{0}\in \) \({\mathbb {R}} ^{q}\), \(i=1,\ldots ,p+q,\) as \({{\textbf {t}}}_{i}^{0}=(({{\textbf {v}}} _{i}^{0})^t,({{\textbf {w}}}_{i}^{0})^t)^{t}\). Then, let us call
A zero \((r\times s)\)-matrix is a matrix all of whose entries are zero and we denote it as \({{\textbf {0}}}_{r\times s}\). Given square matrices \({{\textbf {N}}}_i\in {\mathbb {R}}^{p_i\times p_i}\), \(i=1,\ldots ,k\), set the matrix \({{\textbf {N}}}\) \(=\) \(diag({{\textbf {N}}}_1,\ldots ,{{\textbf {N}}}_k)\) \(\in \) \({\mathbb {R}}^{\sum _{i=1}^k p_i\times \sum _{i=1}^kp_i}\) whose entries off the blocks \({{\textbf {N}}}_1,\ldots ,{{\textbf {N}}}_k\) are zero. \(\left\{ {{\textbf {f}}} _{1}^{(s)},\dots ,{{\textbf {f}}}_{s}^{(s)}\right\} \) stands for the canonical basis in \( {\mathbb {R}}^{s}.\)
If \({{\textbf {w}}}\in {\mathbb {R}} ^{m}\) and \({\mathscr {L}}_{m}\) stands for the set of distributions of \({{\textbf {w}}}\), denoted by \(\mathscr {L}({{\textbf {w}}})\), a multivariate location and dispersion functional is a map \(({{\textbf {T}}},S): {\mathscr {L}}_{m}\rightarrow {\mathbb {R}}^{m}\times {\mathbb {R}} ^{m\times m}\) such that (i) \(S({\mathscr {L}}({{\textbf {w}}}))\in S_{m}\) and (ii) it is affine equivariant, i.e., given a nonsingular matrix \({{\textbf {G}}}\in {\mathbb {R}}^{m\times m}\) and a vector \({{\textbf {b}}}\in {\mathbb {R}}^{m}\), \({{\textbf {T}}}({\mathscr {L}}({\textbf {Gw}}+{{\textbf {b}}}))={\textbf {GT}}({\mathscr {L}}({{\textbf {w}}}))+{{\textbf {b}}}\) and \(S({\mathscr {L}}({\textbf {Gw}}+{{\textbf {b}}}))={{\textbf {G}}}S({\mathscr {L}}({{\textbf {w}}})){{\textbf {G}}}^{t}\).
Let us take location and dispersion functionals \(({{\textbf {T}}},S)\) such that \(( {{\textbf {T}}}_{{{\textbf {x}}}},S_{{{\textbf {x}}}})=({{\textbf {T}}}(\mathscr {L}({{\textbf {x}}} {)})\), \(S(\mathscr {L}({{\textbf {x}}}{)}))\) and \(({{\textbf {T}}}_{ {{\textbf {y}}}},S_{{{\textbf {y}}}})=({{\textbf {T}}}(\mathscr {L}({{\textbf {y}}}{)}),S( \mathscr {L}({{\textbf {y}}}{)}))\), respectively. Take \({{\textbf {x}}}_{c}= {\textbf {x-T}}_{{{\textbf {x}}}}\), \(\varvec{{\bar{x}}}\) \(=\) \({{\textbf {S}}}_{{{\textbf {x}}}}^{-1/2}{{\textbf {x}}} _{c}\), \({{\textbf {y}}}_{c}={{\textbf {y}}}-{{\textbf {T}}}_{{{\textbf {y}}}}\), \(\varvec{{\bar{y}} =S}_{{{\textbf {y}}}}^{-1/2}{{\textbf {y}}}_{c}\) and \({{\textbf {z}}}_{c}={\textbf {z-T}}_{ {{\textbf {z}}}}{} {\textbf {.}}\)
The functional \(\left( {\textbf {T,S}}\right) \) for the location and dispersion parameters at MLDM is said to be Fisher consistent if \( {{\textbf {T}}}(F_{{\varvec{\mu }}, \varvec{\varSigma }})={\varvec{\mu }}\) and \({{\textbf {S}}}(F_{ {\varvec{\mu }},\varvec{\varSigma }})=\varvec{\varSigma } \). Adrover and Donato (2015) gave the definition of a Fisher consistent CCA functional, and they showed the Fisher consistency of the SM-estimator for CCA. For this purpose, Fisher consistent functionals for \(\varvec{\varSigma }_{{\textbf {xx}}} \) and \(\varvec{\varSigma }_{{\textbf {yy}}}\) are required.
To ensure that the SM-estimators are well defined, consistent and asymptotically normal, some conditions are required:
- C0: \({{\textbf {M}}}\) has eigenvalues \((\gamma _1^0,\ldots ,\gamma _m^0)\in \varGamma \), with
$$\begin{aligned} \varGamma = \left\{ (\gamma _1,\ldots ,\gamma _{p+q}): \gamma _{1}>\cdots>\gamma _{r+1}\ge \cdots \ge \gamma _{p+q-r}>\cdots >\gamma _{p+q}\ge 0\right\} . \end{aligned}$$
- C1: \(\chi \left( \cdot \right) \) is nondecreasing.
- C2: \(\chi (x)\) is left continuous for \(x>0\).
- C3: \(\chi (0)=0.\)
- C4: \(\chi \) is continuous at 0.
- C5: \(\lim _{x\rightarrow \infty }\chi (x)=1.\)
- C6: There exists \(c_{0}\in \left( 0,\infty \right) \) such that \(\chi (x)<1\) if \(0\le x<c_0\).
- C7: \(\chi (x)=1\) for \(c_{0}<x<\infty \), with \(c_0\) as in C6.
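A concrete loss satisfying C1-C7 is a bisquare-type \(\chi \); the following sketch is our illustration, with an arbitrary tuning constant \(c_0\):

```python
import numpy as np

def chi_bisquare(u, c0=1.0):
    """Bisquare-type chi: chi(0)=0 (C3), nondecreasing (C1), continuous
    (hence C2 and C4), chi(u) < 1 for u < c0 (C6) and chi(u) = 1 for
    u >= c0 (C5, C7).  The tuning constant c0 is an arbitrary choice."""
    u = np.asarray(u, dtype=float)
    return np.where(u < c0, 1.0 - (1.0 - np.minimum(u, c0) / c0) ** 3, 1.0)
```

This is the standard Tukey bisquare rho written in the squared-residual argument; rescaling \(c_0\) trades efficiency against robustness in the usual way.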
To obtain the asymptotic behavior of the estimates, the following conditions on the model density \(f_0\) and on the parameters \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\), which are estimated before the SM-procedure applies, are also assumed.
- C8: \(f_{0}\) is nonincreasing, and there exists \(\xi \le \infty \) such that \(f_{0}(x)>0\) if \(x<\xi \) and \(f_{0}(x)=0\) if \(x>\xi .\)
- C9: Let \(\xi \) be as in C8. The functions \(f_{0}(\cdot )\) and \(\chi \left( \cdot \right) \) have at least one common point of strict monotonicity, that is, there exist \(d<\xi \) and a nondegenerate interval \(I\) such that \(d\in I\) and for all \(u,v\in I\) with \(u<d<v\) it holds that \(\chi (u)<\chi (d)<\chi (v)\) and \(f_{0}(u)>f_{0}(d)>f_{0}(v)\).
- C10: Let \(\xi \) and d be as in C8 and C9 and \(\varGamma \) as in C0. Let \(\left\{ \lambda _{1},\ldots ,\lambda _{r}\right\} \subset \varGamma \) be such that \(\lambda _{j}\ge \lambda _{j+1}\) and \(\varLambda =diag\left( \lambda _{1},\dots ,\lambda _{r},0,\dots ,0\right) \). Let \(\sigma (\varLambda )\) be such that
$$\begin{aligned} \int \chi \left( {{\textbf {w}}}^{t}\varLambda {{\textbf {w}}}/\sigma (\varLambda ) \right) f_{0}\left( {{\textbf {w}}}^{t}{{\textbf {w}}}\right) d{{\textbf {w}}} = \delta \end{aligned}$$
and \(\sigma _{0} = \min _{\varLambda }\left\{ \sigma (\varLambda )\right\} \); then \(d\sigma _{0}/\gamma _{m-1}^{0} <\xi .\)
- C11 a.: Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution with density (18). Then, \(\left\{ \varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)}\right\} _{n=p+1}^{\infty }\subseteq S_{p}\) and \(\left\{ \varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)}\right\} _{n=q+1}^{\infty }\subseteq S_{q}\) based on \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) are consistent estimators of \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\), respectively.
- C11 b.: If \(({{\textbf {x}}}^{t},{{\textbf {y}}}^{t})^{t}\sim F\) with density (18), then the multivariate dispersion functionals satisfy \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}(F)=\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}(F)=\varvec{\varSigma }_{{\textbf {yy}}}.\)
C0 allows for a simpler presentation of the consistency results, since the eigenspaces involved have dimension 1. The conditions C1–C7 regarding the loss function \(\chi \) are similar to the ones considered in the robust literature for redescending estimators (see Davies 1987; Maronna 2005; Maronna et al. 2019). C1 keeps the monotonicity displayed by the square loss function in the classical case, letting larger residuals have larger weights. C2 allows for the minimum rather than the infimum in the definition of the M-scale given in (5). C3 states that a zero residual has zero weight. C4 reflects the intuitive fact that, given C3, increasingly small residuals cannot have weights bounded away from zero. C5 makes \(\chi \) bounded, which is instrumental in obtaining robust estimators able to cope with a large proportion of outlying observations. C6 and C7 are required for technical reasons to derive the consistency. C8, C9 and C10 are crucial for Fisher consistency: they prevent parameters other than the elliptical model parameters from yielding the minimum in (15). C11 states the consistency and Fisher consistency of the preliminary estimators and functionals corresponding to the parameters \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\). The restrictions on the loss function \(\chi \) are not relevant in practice, since they impose no conditions on the data distribution. The loss function does affect the asymptotic properties as well as the robustness properties of the estimators.
Next, we include a useful concept from the robustness literature, whose fulfillment ensures that (12) can be solved properly. We say that a sample \(\left\{ {{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\right\} \) is in r-general position if any linear manifold \({{\textbf {Z}}}_{n}=\left\{ {{\textbf {z}}}\in {\mathbb {R}}^{m}:{\textbf {Cz=a}}\right\} \) with \(\left( {{\textbf {C}}},{{\textbf {a}}}\right) \in {\mathscr {B}}_{r,m}^{1}\) contains at most \(m-r+1\) points of the sample.
Lemma 1
Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}} ^{m}\) from an elliptical distribution with density (18). Let us suppose that conditions C1-C7 hold. If the sample \(\left\{ {{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\right\} \) is in r-general position and \(m-r+1<n(1-\delta )\), then (12) has at least one solution with probability 1.
To show the consistency of a sequence of SM-estimators, Theorem 4.2, p. 665 of Rao (1962) is required. A useful generalization of this result is as follows.
Lemma 2
Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}} ^{m}\) from an elliptical distribution F with density (18). Let us suppose that conditions C1–C7 hold. Let \(S_{m,o}\subset {\mathbb {R}} ^{m\times m}\) be the set of nonnegative symmetric matrices and \(F_{n}\) stands for the empirical distribution function based on the sample \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\). Then, it holds that
In what follows, let \(\varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)}\) and \( \varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)}\) be consistent estimators of \( \varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\), respectively. The Fisher consistency, the existence of the estimator for random samples and Lemma 2 entail the consistency of the SM-estimators; the proof is omitted.
Theorem 1
Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}} ^{m}\) from an elliptical distribution F with density (18). Suppose that conditions C0-C11a. hold and \(m-r+1<n\left( 1-\delta \right) .\) Let \(\left( \varvec{{\hat{A}}}_{SM}^{o},\varvec{{\hat{B}}}_{SM}^{o},{{\varvec{{\hat{a}}}}}_{SM}^{o}\right) \) be solutions of (16), then
Proof
It is deferred to the “Appendix”. \(\square \)
The previous theorem shows the consistency of the vectors minimizing the M-scale. To derive the asymptotic behavior of the SM-estimators, we consider the critical points obtained from the constrained minimization
with \(\varvec{\varTheta },\varvec{\varXi }\in {\mathbb {R}}^{r\times r}\). Thus, we take a Lagrangian whose set of critical points contains the critical points of \( h({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varTheta },\varvec{\varXi })\) in (19), that is, we consider
Let us introduce some notation for the covariance matrices later used in the derivation of the asymptotics. Set the parameter \(\widetilde{\varvec{\varSigma }}=diag\left( \varvec{\varXi }^{-1/2}, \varvec{\varTheta }^{-1/2}\right) \) with \(\varvec{\varXi }\in S_{p}\) and \(\varvec{\varTheta }\in S_{q}\). Let us take the functionals \(\varvec{{\tilde{\varSigma }}}(H) = diag\left( \left[ \varvec{\varSigma }_{{\textbf {xx}}}^{(R)}(H)\right] ^{-1/2},\left[ \varvec{\varSigma }_{{\textbf {yy}}}^{(R)}(H)\right] ^{-1/2}\right) \) and \(\widetilde{\varvec{\varSigma }}_{\varepsilon }= \widetilde{\varvec{\varSigma }}(F_{\varepsilon })\). Consider the estimators \(\widehat{\widetilde{\varvec{\varSigma }}}=\varvec{{\tilde{\varSigma }}} (F_{n})\), \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}(F_{n}) =\varvec{{\hat{\varSigma }}}_{ {\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}(F_{n})= \varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)}\). If we are dealing with Fisher consistent functionals, we have \(\varvec{{\tilde{\varSigma }}}(F) = \varvec{{\tilde{\varSigma }}}_{o} = diag\left( \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2},\varvec{\varSigma }_{{\textbf {yy}}}^{-1/2}\right) \).
Take \(\varvec{\tilde{{\textbf {z}}}}=\left( \widetilde{{\textbf {x}}}^t, \widetilde{{\textbf {y}}}^t \right) ^t\); by differentiating (20), we get the equivalent system,
If a sequence of consistent estimators initializes an iterative procedure producing a sequence of critical points that solve (21), we can ensure that the sequence of critical points is also consistent in the CCA context. This result is also available for some robust regression methods, and the proof is similar to those cases [see, for instance, Theorem 3.2 in Yohai (1987) and Theorem 4.1 in Yohai and Zamar (1988)].
Proposition 1
Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}} ^{m}\) from an elliptical distribution F with density (18). Suppose that conditions C0-C10 hold, moreover \(\chi \) is twice differentiable and concave in \(\left[ 0,c_{0}\right] ,\) \(c_{0}\) given in C6. If \(m-r+1<n\left( 1-\delta \right) ,\) \(\left( \varvec{{\hat{A}}} _{o}^{(0)},\varvec{{\hat{B}}}_{o}^{(0)},\varvec{{\hat{a}}}^{(0)}\right) \) is a sequence of consistent estimators for \(\left( {{\textbf {A}}}_{o},{{\textbf {B}}}_{o}, {{\textbf {a}}}_{o}\right) \) and the solutions of (21) \(\ \left( \varvec{{\hat{A}}}^{o},\varvec{{\hat{B}}} ^{o},\varvec{{\hat{a}}}^{o}\right) \) verify that \({\hat{\sigma }}\left( \varvec{{\hat{A}}}^{o},\varvec{{\hat{B}}}^{o},\varvec{{\hat{a}}} ^{o}\right) \le {\hat{\sigma }}\left( \varvec{{\hat{A}}}_{o}^{(0)},\varvec{\hat{B }}_{o}^{(0)},\varvec{{\hat{a}}}^{(0)}\right) ,\) then \(\lim _{n\rightarrow \infty }\left( \varvec{{\hat{A}}}^{o},\varvec{{\hat{B}}}^{o}, \varvec{{\hat{a}}}^{o}\right) =\left( {{\textbf {A}}}_{o},{{\textbf {B}}}_{o},{{\textbf {a}}} _{o}\right) \text { almost surely}.\)
3 Asymptotic behavior of SM-estimators for CCA
3.1 Notation
To deal with the asymptotic behavior of the SM-estimators, let us introduce some notation. The parameters in our setting are taken from the sets
The asymptotic distribution of the SM-estimators and their influence function, a useful tool to evaluate the behavior under infinitesimal contamination, need to be established over a suitable set of distributions. Let \({\mathscr {D}}\) be the set of distributions on \({\mathbb {R}}^{m}\), let \(G_{n}\) stand for an empirical distribution function based on n points in \({\mathbb {R}}^{m}\), and call \({{\mathscr {F}}}_{n}\) the subset of such empirical distributions. Thus, let us define the following subsets of distributions,
Let \({\mathscr {H}}\) be a subset of \({\mathscr {D}}\) such that \({\mathscr {E}}\cup {\mathscr {F}}_{n}\cup C_{\varepsilon }\left( {\mathscr {E}},{\mathscr {F}}_{1}\right) \subset {\mathscr {H}}\). Then, the SM-functional \(\varvec{{\tilde{\theta }}}:{\mathscr {H}}\rightarrow {\mathscr {G}}\) is defined as
If \(vec:{\mathbb {R}}^{m\times r}\rightarrow {\mathbb {R}}^{mr}\) stands for the operator which vectorizes a matrix by stacking its columns on top of each other, we can establish some useful notation for the parameters. Set \(\varvec{{\tilde{\theta }}}=\left( {{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varTheta },\sigma \right) \) and \(\varvec{\theta }=\left( \left[ vec\left( {{\textbf {D}}}^{t}\right) \right] ^{t},{{\textbf {a}}}^{t}\right) ^{t}\). The SM-estimators turn out to be \({{\hat{{\textbf {D}}}}}_{SM}=\frac{1}{\sqrt{2}}\left( \begin{array}{cc} {{\hat{{\textbf {A}}}}}_{SM}^{o}&-{{\hat{{\textbf {B}}}}}_{SM}^{o} \end{array}\right) \) and \(\varvec{{\hat{\theta }}}=\varvec{{\tilde{\theta }}}(F_{n})=\left( {{\hat{{\textbf {A}}}}}_{SM}^{o},{{\hat{{\textbf {B}}}}}_{SM}^{o},{{\hat{{\textbf {a}}}}}_{SM},\varvec{{\hat{\varSigma }}}_{{\textbf {xx}}}^{(R)},\varvec{{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)},{\hat{\sigma }}\right) \), with \(F_{n}\in {\mathscr {F}}_{n}\). Let us also denote the functionals \({{\textbf {D}}}_{SM}\left( \cdot \right) =\frac{1}{\sqrt{2}}\left( \begin{array}{cc} {{\textbf {A}}}_{SM}^{o}(\cdot )&-{{\textbf {B}}}_{SM}^{o}(\cdot ) \end{array}\right) \), \({{\textbf {D}}}_{\varepsilon }={{\textbf {D}}}_{SM}\left( F_{\varepsilon }\right) \), \(\varvec{{\tilde{\theta }}}_{\varepsilon }=\varvec{{\tilde{\theta }}}(F_{\varepsilon })\), with \(F_{\varepsilon }\in C_{\varepsilon }\left( {\mathscr {E}},{\mathscr {F}}_{1}\right) \), \(\varvec{{\tilde{\theta }}}_{o}=\left( {{\textbf {A}}}_{o},{{\textbf {B}}}_{o},{{\textbf {a}}}_{o},\varvec{\varSigma }_{{\textbf {xx}}},\varvec{\varSigma }_{{\textbf {yy}}},\sigma _{o}\right) \) and \({{\textbf {D}}}_{o}=\frac{1}{\sqrt{2}}\left( \begin{array}{cc} {{\textbf {A}}}_{o}&-{{\textbf {B}}}_{o} \end{array}\right) \).
Take \(\varvec{\theta }_{o}\) \(=\) \(\left( \left[ vec \left( {{\textbf {D}}}_{o}^{t}\right) \right] ^{t},{{\textbf {a}}}_{o}^{t}\right) ^{t}\), \(\varvec{\theta }_{SM}\) \(=\) \(\left( \left[ vec\left( {{\textbf {D}}}_{SM}^{t}(\cdot )\right) \right] ^{t},{{\textbf {a}}}^{t}_{SM}(\cdot )\right) ^{t}\).
If \({{\textbf {D}}}^{*}={{\textbf {D}}}\widetilde{\varvec{\varSigma }},\) set \(\varvec{\theta }^{*}=\left( \left\{ vec\left[ \left( {{\textbf {D}}}^{*}\right) ^{t}\right] \right\} ^{t}, {{\textbf {a}}}^{t}\right) ^{t} \). If \(\varvec{{\hat{D}}}_{SM}^{*}=\varvec{{\hat{D}}}_{SM}\widehat{ \widetilde{\varvec{\varSigma }}}\), set
If \({{\textbf {D}}}_{o}^{*}={{\textbf {D}}} _{o}\widetilde{\varvec{\varSigma }}_{o}\), put \(\varvec{\theta }^{*}_{o}=\left( \left\{ vec\left[ \left( {{\textbf {D}}}^{*}_{o}\right) ^{t}\right] \right\} ^{t}, {{\textbf {a}}}^{t}_{o}\right) ^{t}\), \({{\textbf {v}}}_{k}\) \(=\) \(\left( \varvec{\varSigma }_{{\textbf {xx}}}^{-1/2} {{\textbf {A}}}_{o}^t\right) {{\textbf {f}}}_{k}^{(r)}\), and \({{\textbf {w}}}_{k}=\left( \varvec{\varSigma }_{{\textbf {yy}}}^{-1/2} {{\textbf {B}}}_{o}^t\right) {{\textbf {f}}}_{k}^{(r)}\), \(k=1,\ldots ,r\). Then, we can rewrite (21) in the following way,
Then, let us take the functions \(\phi _{1}:{\mathbb {R}}^{m}\times {\mathscr {G}}\rightarrow {\mathbb {R}}^{rm}\), \(\phi _{2}:{\mathbb {R}}^{m}\times {\mathscr {G}}\rightarrow {\mathbb {R}}^{r}\), \({\bar{\phi }}_{1}:{\mathbb {R}}^{m}\times {\bar{\mathscr {G}}}\rightarrow {\mathbb {R}}^{rm}\), \({\bar{\phi }}_{2}:{\mathbb {R}}^{m}\times {\bar{\mathscr {G}}}\rightarrow {\mathbb {R}}^{r}\) and \({\bar{\phi }}:{\mathbb {R}}^{m}\times {\bar{\mathscr {G}}}\rightarrow {\mathbb {R}}^{r(m+1)}\) as follows,
The corresponding expected values and covariance matrices are denoted by
3.2 Conditions, asymptotic normality and variances
To deal with the asymptotic behavior of SM-estimators, let us introduce some extra conditions.
- C12: \(\chi :[0,\infty )\rightarrow \left[ 0,1\right] \) is twice continuously differentiable.
- C13: \(E_{F}\left[ \left\| {{\textbf {z}}}\right\| ^{8}\right] <\infty .\)
- C14: \(\varPhi :{\mathbb {R}}^{r(m+1)}\rightarrow {\mathbb {R}}^{r(m+1)}\) and \({\bar{\varPhi }}:{\mathbb {R}}^{r(m+1)}\rightarrow {\mathbb {R}}^{r(m+1)}\) are continuously differentiable and the matrices
$$\begin{aligned} \varvec{\varOmega }=\frac{\partial \varPhi \left( \varvec{\theta }\right) }{\partial \varvec{\theta }}\left( \tilde{\varvec{\theta }}_{o}\right) \in {\mathbb {R}}^{r(m+1)\times r\left( m+1\right) }\text { and }\varvec{\varOmega }^{*}=\frac{\partial {\bar{\varPhi }}\left( \varvec{\theta }^{*}\right) }{\partial \varvec{\theta }^{*}}\left( \tilde{\varvec{\theta }}_{o}\right) \in {\mathbb {R}}^{r(m+1)\times r\left( m+1\right) } \end{aligned}$$
are nonsingular.
The usual way to derive the asymptotic normality of an estimator obtained as a zero of an equation, \({{\textbf {0}}}=\frac{1}{n}\sum _{i=1}^{n}\phi \left( {{\textbf {z}}}_{i},{\varvec{{\hat{\theta }}}}\right) \), is to consider a Taylor expansion of the form
where \(R\left( \varvec{{\tilde{\theta }}}\right) \rightarrow 0\) as \(\varvec{{\tilde{\theta }}}\rightarrow \varvec{{\tilde{\theta }}}_o\). After some manipulations, adding and subtracting suitable terms, we come up with an expression of the form
where \(\sqrt{n}{{\textbf {Z}}}_n\) converges in distribution to a multivariate normal distribution with mean \({{\textbf {0}}}\) and covariance matrix \(E\phi \phi ^t\), and \({{\textbf {W}}}_n=o_P\left( 1/\sqrt{n}\right) \) by arguments from empirical process theory. C13 is a restrictive condition on the model which ensures that the functions given in (26) belong to a Euclidean class (Pakes and Pollard 1989), from which the behavior of \({{\textbf {W}}}_n\) follows. By combining (27) and (28), we get the asymptotic distribution of \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_o\right) \). This brief account of the derivation makes clearer the need for conditions C12 and C14. Kudraszow and Maronna (2010, 2011) used a similar approach to derive the asymptotic distribution of MM-estimators for the multivariate regression model.
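The Taylor-expansion argument above can be illustrated on a toy case. The following sketch is not the SM-estimator: a univariate Huber location M-estimator with known unit scale is used as a stand-in, and the tuning constant \(c=1.345\) is a conventional choice assumed here. It checks the resulting sandwich variance \(E\psi ^{2}/(E\psi ^{\prime })^{2}\) against a Monte Carlo experiment.

```python
import numpy as np
from math import erf, exp, pi, sqrt

# Toy illustration of the expansion (not the SM-estimator): a univariate
# Huber location M-estimator with known unit scale satisfies
# sqrt(n)(theta_hat - theta_o) -> N(0, E[psi^2] / (E[psi'])^2);
# we check this sandwich variance by Monte Carlo.
c = 1.345  # Huber tuning constant (an assumed, conventional choice)

def psi(u):
    return np.clip(u, -c, c)

rng = np.random.default_rng(0)
n, reps = 200, 3000
x = rng.standard_normal((reps, n))      # true location theta_o = 0

# Solve (1/n) sum_i psi(x_i - theta) = 0 in each replication by Newton steps.
theta = np.median(x, axis=1)
for _ in range(50):
    u = x - theta[:, None]
    theta = theta + psi(u).mean(axis=1) / (np.abs(u) < c).mean(axis=1)

# Closed-form moments under N(0,1): E[psi'] = P(|Z| < c) and
# E[psi^2] = E[Z^2 1{|Z| < c}] + c^2 P(|Z| > c).
phi_c = exp(-c * c / 2) / sqrt(2 * pi)
p_in = erf(c / sqrt(2))
e_psi2 = (p_in - 2 * c * phi_c) + c * c * (1 - p_in)
v_theory = e_psi2 / p_in**2

v_mc = n * theta.var()   # Monte Carlo variance of sqrt(n) * theta_hat
print(v_theory, v_mc)    # the two values should agree closely
```

The same sandwich structure \(\varvec{\varOmega }^{-1}{{\textbf {V}}}_{o}\varvec{\varOmega }^{-t}\) reappears below for the full SM-estimators, with \(\varvec{\varOmega }\) playing the role of \(E\psi ^{\prime }\) and \({{\textbf {V}}}_{o}\) the role of \(E\psi ^{2}\).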
If F is elliptically contoured with density (18), let us set \({{\textbf {r}}}_{o}\left( {{\textbf {z}}}\right) ={{\textbf {D}}}_{o}^{*}{{\textbf {z}}}-{{\textbf {a}}}_{o}\), \({{\textbf {C}}}=\sum _{1\le i,k\le r}\left[ {{\textbf {f}}}_{i}^{(r)}\left( {{\textbf {f}}}_{k}^{(r)}\right) ^{t}\otimes {{\textbf {t}}}_{i}{{\textbf {t}}}_{k}^{t}\right] \), \(\varvec{\varLambda }_{o}=diag\left( \gamma ^0_{1},\ldots ,\gamma _{r}^{0}\right) \), \({{\textbf {P}}}_{r}={{\textbf {I}}}_{m}-{{\textbf {D}}}_{o}^{t}{{\textbf {D}}}_{o}\). Then, under C12 and C14, \(\varvec{\varOmega }\) and \(\varvec{\varOmega }^{*}\) turn out to be
with \(\varvec{\varOmega }_{11}\in {\mathbb {R}}^{rm\times rm}\), \(\varvec{\varOmega }_{12}\in {\mathbb {R}}^{rm\times r}\), \(\varvec{\varOmega }_{21}\in {\mathbb {R}}^{r\times rm}\) and \(\varvec{\varOmega }_{22}\in {\mathbb {R}}^{r\times r}\). Take the matrix
Thus, we get
The matrix \({{\textbf {V}}}_{o}=E_{F}\left[ \phi \left( {{\textbf {z}}},\varvec{{\tilde{\theta }}}_{o}\right) \phi \left( {{\textbf {z}}},\varvec{{\tilde{\theta }}}_{o}\right) ^{t}\right] \) turns out to be \({{\textbf {V}}}_{o}=\left( \begin{array}{cc} {{\textbf {V}}}_{11} & {{\textbf {V}}}_{12} \\ {{\textbf {V}}}_{21} & {{\textbf {V}}}_{22} \end{array}\right) ,\) with \({{\textbf {V}}}_{11}\in {\mathbb {R}}^{rm\times rm}\), \({{\textbf {V}}}_{12}\in {\mathbb {R}}^{rm\times r}\) and \({{\textbf {V}}}_{22}\in {\mathbb {R}}^{r\times r},\) that is,
The derivation of the asymptotic behavior of the SM-estimators is based on empirical process theory (see Pakes and Pollard 1989), and the following theorem establishes the convergence in distribution of the SM-estimators.
Theorem 2
Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution F with density (18), location parameter \({\varvec{\mu }}_{0}\) and dispersion parameter \(\varvec{\varSigma }_{0}\). Suppose that conditions C0-C14 hold. If \(\varvec{{\hat{\theta }}}_{SM}\) as in (23), \(\varvec{{\hat{\theta }}}_{SM}^*\) as in (24), and \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) as in (25) are sequences of SM-estimators, and \(\overset{{\mathscr {D}}}{\rightarrow }\) stands for convergence in distribution, then it holds that
- (i) \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_{o}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{r(m+1)}\left( {{\textbf {0}}},{{\textbf {V}}}\right) \), where the asymptotic covariance matrix is given by \({{\textbf {V}}}=\varvec{\varOmega }^{-1}{{\textbf {V}}}_{o}\varvec{\varOmega }^{-t},\)
- (ii) \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}^{*}-\varvec{\theta }_{o}^{*}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{r(m+1)}\left( {{\textbf {0}}},{{\textbf {V}}}^{*}\right) \), with the asymptotic covariance matrix given by \({{\textbf {V}}}^{*}=\left( \varvec{\varOmega }^{*}\right) ^{-1}{\varvec{V}}_{o}\left( \varvec{\varOmega }^{*}\right) ^{-t},\)
- (iii) \(\sqrt{n}\left( \varvec{{\hat{v}}}_{SM,k}-{{\textbf {v}}}_{k}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{p}\left( {{\textbf {0}}},{{\textbf {V}}}_{\alpha ,k}^{*}\right) \) and \(\sqrt{n}\left( \varvec{{\hat{w}}}_{SM,k}-{{\textbf {w}}}_{k}\right) \overset{{\mathscr {D}}}{\rightarrow }N_{q}\left( {{\textbf {0}}},{{\textbf {V}}}_{\beta ,k}^{*}\right) ,\) for \(k=1,\ldots ,r\), with
$$\begin{aligned} {{\textbf {V}}}_{\alpha ,k}^{*}= & {} \sum _{i,j=1}^{p}{{\textbf {f}}}_{j}^{(p)}{{\textbf {f}}}_{(k-1)m+j}^{(m)t}{\varvec{V}}^{*}{{\textbf {f}}}_{(k-1)m+i}^{(m)}\left( {{\textbf {f}}}_{i}^{(p)}\right) ^{t},\\ {{\textbf {V}}}_{\beta ,k}^{*}= & {} \sum _{i,j=1}^{q}{{\textbf {f}}}_{j}^{(q)}{{\textbf {f}}}_{km-q+j}^{(m)t}{\varvec{V}}^{*}{{\textbf {f}}}_{km-q+i}^{(m)}\left( {{\textbf {f}}}_{i}^{(q)}\right) ^{t}. \end{aligned}$$
- (iv) If \(\varvec{\varOmega }_{22}\) is a nonsingular matrix, then
Proof
It is deferred to the “Appendix”. \(\square \)
Remark 1
The asymptotics for the SM-estimators in the PCA case given in (5) can be derived in a manner similar to that of (i) in Theorem 2.
3.3 Asymptotic relative efficiency in the Gaussian case
In order to assess the loss of efficiency under normality, the asymptotic covariance matrix of the SM-estimators is compared with that of the maximum likelihood estimator. Let \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) be the k-th canonical vectors as in (25), and \(\varvec{{\hat{v}}}_{C,k}\) and \(\varvec{{\hat{w}}}_{C,k}\) the classical estimators obtained by solving (7) and (8) with the population covariances replaced by the empirical ones. Let \({{\textbf {V}}}_{\alpha ,k}^{*}\left( F\right) \) and \({{\textbf {V}}}_{\alpha ,k}^{*,C}\left( F\right) \) be the asymptotic variances of \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{v}}}_{C,k}\) when the underlying distribution is F (respectively, \({{\textbf {V}}}_{\beta ,k}^{*}\left( F\right) \) and \({{\textbf {V}}}_{\beta ,k}^{*,C}\left( F\right) \) for \(\varvec{{\hat{w}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{C,k}\)). \({{\textbf {V}}}_{\alpha ,k}^{*,C}\left( F\right) \) can be derived from the asymptotic distribution of the SM-estimators stated above by taking \(\chi \) as the identity function. Then, a measure of asymptotic relative efficiency is given by
To illustrate the behavior of the SM-estimators with respect to the classical estimator, we take \(F=N_{4}\left( {{\textbf {0}}},\varvec{\varSigma }\right) \), with the partitioned matrix \(\varvec{\varSigma }\) given in (6) and \(\varvec{\varSigma }_{{\textbf {xx}}}=\varvec{\varSigma }_{{\textbf {yy}}}={{\textbf {I}}}_{2}\), \(\varvec{\varSigma }_{{\textbf {xy}}}=diag\left( 0.9,0.5\right) .\) Since \(\varvec{\varSigma }_{{\textbf {xx}}}=\varvec{\varSigma }_{{\textbf {yy}}}={{\textbf {I}}}_{2}\), we have that \(\varvec{\varOmega }=\varvec{\varOmega }^{*}\) and \({{\textbf {V}}}^{*}={{\textbf {V}}}.\) Thus, the efficiency of the SM-estimators for the canonical vectors coincides with that of the standardized SM-estimators.
Table 1 displays the asymptotic relative efficiency of \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) for different values of \(\delta \) in the grid \(G=\left\{ 0.5,0.45,0.4,0.35,0.3\right\} \). The larger \(\delta \), the smaller the relative efficiency, since \(\delta \) is related to the robustness of the estimator, whose breakdown point increases with \(\delta \) (Adrover and Donato 2015). The second canonical vector seems to be much more affected by the trade-off between high breakdown point and efficiency.
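The kind of comparison summarized in Table 1 can be imitated in a small simulation. The sketch below does not use the SM-estimator: as a simple robust stand-in we take the Spearman correlation corrected for Fisher consistency at the normal model, \({\hat{\rho }}=2\sin (\pi r_{s}/6)\), so the numbers do not reproduce the table, but they display the same efficiency price paid under normality.

```python
import numpy as np

# Finite-sample analogue of the relative-efficiency comparison (a sketch,
# not the SM-estimator): classical Pearson correlation versus the Spearman
# correlation corrected for Fisher consistency at the normal model.
rng = np.random.default_rng(1)
rho, n, reps = 0.0, 100, 4000
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

def spearman(u, v):
    ru = np.argsort(np.argsort(u))      # ranks of u
    rv = np.argsort(np.argsort(v))      # ranks of v
    return np.corrcoef(ru, rv)[0, 1]

classical, robust = [], []
for _ in range(reps):
    z = rng.standard_normal((n, 2)) @ L.T
    classical.append(np.corrcoef(z[:, 0], z[:, 1])[0, 1])
    robust.append(2.0 * np.sin(np.pi * spearman(z[:, 0], z[:, 1]) / 6.0))

eff = np.var(classical) / np.var(robust)
print(eff)  # below 1: the robust estimator loses efficiency at the normal
```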
4 Robustness measures: qualitative robustness and influence function
4.1 Qualitative robustness
Qualitative robustness was introduced by Hampel (1971); the concept captures the desirable property that a robust estimator based on random samples from close distributions should induce close distributions, for any sample size. More precisely, a sequence of estimators \(\left\{ T_{n}\right\} _{n=1}^{\infty }\) is qualitatively robust at the probability measure F if and only if for all \(\varepsilon >0\) there exists \(\delta >0\) such that, for all G and for all n, \(d_{PR}\left( F,G\right) <\delta \) implies \(d_{PR}\left( {\mathscr {L}}_{F}\left( T_{n}\right) ,{\mathscr {L}}_{G}\left( T_{n}\right) \right) <\varepsilon \), where \(d_{PR}\) stands for the Prohorov distance between probability measures and \({\mathscr {L}}_{F}\left( T_{n}\right) \) is the distribution induced by \(T_{n}\) when the random sample comes from F.
A related concept is the continuity of estimators. Hampel (1971) proves that estimators \(\left\{ T_{n}\right\} _{n=1}^{\infty }\) obtained from a functional which is continuous on the class of empirical distributions \({\mathscr {F}}_{n}\) are qualitatively robust. A sequence of estimators \(\left\{ T_{n}\right\} _{n=1}^{\infty }\) is continuous at the probability measure F if and only if for all \(\varepsilon >0\) there exist \(\delta >0\) and \(n_{o}\) such that, for all \(n,m\ge n_{o}\), \(F_{n}\in {\mathscr {F}}_{n}\), \(F_{m}\in {\mathscr {F}}_{m}\), \(d_{PR}\left( F,F_{n}\right) <\delta \) and \(d_{PR}\left( F,F_{m}\right) <\delta \) imply \(\left| T\left( F_{n}\right) -T\left( F_{m}\right) \right| <\varepsilon \).
Then, we proceed similarly to the proof of Theorem 1 to derive the continuity of the SM-functionals.
Theorem 3
Let \({{\textbf {z}}}=\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\in {\mathbb {R}}^{m}\) be from an elliptical distribution with density (18). Suppose that conditions C0-C10 hold. Let us take any sequence of distributions \(\left\{ F_{n}\right\} _{n=1}^{\infty }\) such that \(F_{n}\) weakly converges to F. Assume that the dispersion functionals \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) are continuous at F. Let \(({{\textbf {A}}}_{SM}^{o},{{\textbf {B}}}_{SM}^{o},{{\textbf {a}}}_{SM}^{o})\) be the SM-functional for CCA defined in (15). Therefore, it holds that \(\lim _{n\rightarrow \infty }({{\textbf {A}}}_{SM}^{o}(F_{n}),{{\textbf {B}}}_{SM}^{o}(F_{n}),{{\textbf {a}}}_{SM}^{o}(F_{n}))=({{\textbf {A}}}_{SM}^{o}(F),{{\textbf {B}}}_{SM}^{o}(F),{{\textbf {a}}}_{SM}^{o}(F))\).
4.2 Influence functions
To quantify the effect of contamination on the estimators and make comparisons among different proposals, the IF measures the rate of change of a functional under infinitesimal point-mass contamination, that is, for every \({{\textbf {z}}}_{0}\in {\mathbb {R}}^{m}\), \(m\ge 1\), we have
with \(F_{\varepsilon }=(1-\varepsilon )F+\varepsilon \delta _{{{\textbf {z}}}_{0}} \) and \(\delta _{{{\textbf {z}}}_{0}}\) is a point mass distribution at \({{\textbf {z}}} _{0}.\)
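As an illustration of this definition (for the classical Pearson correlation functional rather than for the SM-functionals), the IF can be approximated numerically by evaluating the functional at the mixture \(F_{\varepsilon }\) for a small \(\varepsilon \); at the standardized bivariate normal the limit is known in closed form, \(IF((x,y),\rho ,F)=xy-\rho (x^{2}+y^{2})/2\).

```python
import numpy as np

# Numerical approximation of the influence function of the Pearson
# correlation functional at F = N2(0, [[1, rho], [rho, 1]]).
# Under F_eps = (1 - eps) F + eps * delta_{z0}, the moments of the mixture
# are available in closed form, so rho(F_eps) can be computed exactly.
rho = 0.6
Sigma = np.array([[1.0, rho], [rho, 1.0]])

def rho_mixture(z0, eps):
    z0 = np.asarray(z0, dtype=float)
    mean = eps * z0                                      # mixture mean
    second = (1 - eps) * Sigma + eps * np.outer(z0, z0)  # mixture E[z z^t]
    cov = second - np.outer(mean, mean)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

def if_numeric(z0, eps=1e-6):            # finite-eps version of the limit
    return (rho_mixture(z0, eps) - rho) / eps

def if_closed_form(z0):                  # known IF at the bivariate normal
    x, y = z0
    return x * y - rho * (x**2 + y**2) / 2

z0 = (2.0, -1.0)
print(if_numeric(z0), if_closed_form(z0))  # both close to -3.5
```

The quadratic growth in \(z_{0}\) already anticipates the unboundedness discussed in Sect. 4.3.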
Adrover and Donato (2015) treated two different versions of a robust canonical correlation. The matrix \({{\textbf {M}}}=\sum _{i=1}^{p+q}\gamma _{i}^{0}{{\textbf {t}}}_{i}^{0}\left( {{\textbf {t}}}_{i}^{0}\right) ^{t}\) in (13) has its eigenvalues in decreasing order. The k-th canonical correlation is given by \(\rho _{k}^{2}=\left( \gamma _{m-k+1}^{0}-1\right) ^{2}.\) To construct the SM-functional for the square of the k-th canonical correlation, we take
Given \(R=E_{H}\left[ \chi ^{\prime }\left( r({{\textbf {z}}},H)\right) \left( \varvec{{\tilde{\varSigma }}}\left( H\right) {{\textbf {z}}}-{\tilde{\mu }}\left( H\right) \right) \left( \varvec{{\tilde{\varSigma }}}\left( H\right) {{\textbf {z}}}-{\tilde{\mu }}\left( H\right) \right) ^{t}\right] \), we call \({{\textbf {M}}}_{{\textbf {xy}}}^{0}\) the matrix obtained after pre- and post-multiplying R by \(\left( \begin{array}{cc} {{\textbf {I}}}_{p}&{{\textbf {0}}}_{p\times q} \end{array}\right) \) and \(\left( \begin{array}{cc} {{\textbf {0}}}_{q\times p}&{{\textbf {I}}}_{q} \end{array}\right) ^t \). Set
and take \(\left\{ \gamma _{j}^{0}\left( H\right) ,{{\textbf {t}}}_{SM,j}^0\left( H\right) \right\} _{j=1}^{m}\) the eigenvalues and eigenvectors of the functional \({{\textbf {M}}}^{0}\left( H\right) \), that is, \({{\textbf {M}}}^{0}\left( H\right) {{\textbf {t}}}^0_{SM,m-k+1}\left( H\right) =\gamma _{m-k+1}^{0}\left( H\right) {{\textbf {t}}}_{SM,m-k+1}^0\left( H\right) .\) The k-th SM-canonical correlation is given by \(\rho _{SM,k}^{2}\left( H\right) =\left( \gamma _{m-k+1}^{0}\left( H\right) -1\right) ^{2},\ k=1,\ldots ,r.\)
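In the classical Gaussian case (with \(\chi ^{\prime }\) constant, \({{\textbf {M}}}^{0}\) is, up to normalization, the standardized covariance matrix), the relation \(\rho _{k}^{2}=(\gamma _{m-k+1}^{0}-1)^{2}\) can be checked numerically; the sketch below uses the cross-covariance \(diag(0.9,0.5)\) of Sect. 3.3.

```python
import numpy as np

# Classical sanity check of rho_k^2 = (gamma_{m-k+1} - 1)^2: with
# Sigma_xx = Sigma_yy = I_2 the standardized matrix reduces to Sigma itself,
# whose eigenvalues are 1 +/- rho_k, so the smallest eigenvalues recover
# the squared canonical correlations.
Sxy = np.diag([0.9, 0.5])      # the same cross-covariance used in Sect. 3.3
Sigma = np.block([[np.eye(2), Sxy], [Sxy.T, np.eye(2)]])

m, r = 4, 2
gammas = np.sort(np.linalg.eigvalsh(Sigma))[::-1]   # decreasing order
rho2 = np.array([(gammas[m - k] - 1.0) ** 2 for k in range(1, r + 1)])
print(rho2)  # the squared canonical correlations, 0.81 and 0.25
```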
Another concept for measuring the association between canonical variates was given by Branco et al. (2005). Given \(\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\sim H\) and \(\left( {{\textbf {v}}}_{k}^{t}(H){{\textbf {x}}},{{\textbf {w}}}_{k}^{t}(H){{\textbf {y}}}\right) \sim H_{k}\), let us take a robust bivariate dispersion functional \(\varvec{\varSigma }^{(R)}\left( H_{k}\right) \) whose (i, j) entry is \(\sigma _{ij}^{(R)}\left( H_{k}\right) \), \(1\le i,j\le 2.\) Thus, a robust correlation functional is easily obtained from \(\varvec{\varSigma }^{(R)}\left( H_{k}\right) \),
Let us define some quantities related to the derivation of the IF. Given \( {{\textbf {z}}}_{0}=\left( {{\textbf {x}}}_{0}^{t},{{\textbf {y}}}_{0}^{t}\right) ^{t}\in {\mathbb {R}} ^{p+q}\) and \( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| \sim {\tilde{F}}\) if \({{\textbf {z}}}\sim F,\) let \(\sigma \) be an M-scale functional based on \(\left\| {{\textbf {r}}}_{o}({{\textbf {z}}})\right\| \) defined in (5) and (14) with influence function \(IFS=IF\left( \left\| {{\textbf {r}}}_{o}({{\textbf {z}}}_{0})\right\| ,\sigma ,{\tilde{F}}\right) \). Let \(\varvec{{\tilde{\varSigma }}}\in {\mathbb {R}}^{m\times m}\) be a functional with \(IFD=IF\left( {{\textbf {z}}}_{0},vec\left( {\varvec{\tilde{\varSigma }}}\right) ,F\right) \ \). Put
and
Then, we can establish the following result for the influence function.
Theorem 4
Let \({{\textbf {z}}}_{1},\ldots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\), \(m=p+q\), from an elliptical distribution F with density (18), location parameter \({\varvec{\mu }}_{0}\) and dispersion parameter \(\varvec{\varSigma }_{0}\). Suppose that conditions C0-C14 hold and that the functionals \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) are Fisher-consistent. Let \(\varvec{\varOmega }\in {\mathbb {R}}^{r\left( m+1\right) \times r\left( m+1\right) }\) be the invertible matrix defined in (29). Then,
- (i) If \({{\textbf {T}}}_{{{\textbf {v}}}}=-\frac{1}{\sigma _{o}^2}{{\textbf {T}}}_{1}+\frac{2}{\xi _{o}}\left( {{\textbf {T}}}_{2}+\sigma _{o}\left( {{\textbf {T}}}_{3}+{{\textbf {T}}}_{4}\right) \right) +{{\textbf {T}}}_{5}\) and \({{\textbf {T}}}_{{{\textbf {a}}}}=\frac{1}{\sigma _{o}}{{\textbf {T}}}_{1,0}IFS-\frac{1}{\xi _{o}}\left( {{\textbf {T}}}_{2,0}+\sigma _{o}{{\textbf {T}}}_{3,0}\right) \left( IFD\right) -{{\textbf {T}}}_{4,0}\), we get \(IF\left( {{\textbf {z}}}_{0},\varvec{\theta }_{SM},F\right) =\varvec{\varOmega }^{-1}{{\textbf {T}}}\left( {{\textbf {z}}}_{0},F\right) \), with \({{\textbf {T}}}\left( {{\textbf {z}}}_{0},F\right) =\left( {{\textbf {T}}}_{{{\textbf {v}}}}^{t},{{\textbf {T}}}_{{{\textbf {a}}}}^{t}\right) ^{t}\).
- (ii) The IF for \(\rho _{SM,k}^{2}\) is \(IF\left( {{\textbf {z}}}_{0},\rho _{SM,k}^{2},F\right) =2\left( \gamma _{k}^{0}-1\right) \left( {{\textbf {t}}}_{k}^{0}\right) ^{t}IF\left( {{\textbf {z}}}_{0},{{\textbf {M}}}^{0},F\right) {{\textbf {t}}}_{k}^{0}\), \(k=1,\ldots ,r\).
- (iii) Let \(\left( x_{0},y_{0}\right) =\left( \left( {{\textbf {v}}}_{k}^0\right) ^{t}{{\textbf {x}}}_{0},\left( {{\textbf {w}}}_{k}^0\right) ^{t}{{\textbf {y}}}_{0}\right) \) and \(\left( \left( {{\textbf {v}}}_{k}^0\right) ^{t}{{\textbf {x}}},\left( {{\textbf {w}}}_{k}^0\right) ^{t}{{\textbf {y}}}\right) \sim {\tilde{F}}_{k}\) for \(\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\sim F\). Take the functionals \({{\textbf {v}}}_{k}\) and \({{\textbf {w}}}_{k}\) corresponding to the k-th canonical vector such that their influence functions exist, \(IFV_{k}=IF\left( \left( {{\textbf {x}}}_{0},{{\textbf {y}}}_{0}\right) ,{{\textbf {v}}}_{k},F\right) \) and \(IFW_{k}=IF\left( \left( {{\textbf {x}}}_{0},{{\textbf {y}}}_{0}\right) ,{{\textbf {w}}}_{k},F\right) \), respectively. Let \(g_{i+j-1}\left( {\mathscr {L}}\left( {{\textbf {x}}},{{\textbf {y}}}\right) ,{{\textbf {v}}},{{\textbf {w}}}\right) =\left( \sigma _{ij}^{\left( R\right) }\right) ^{2}\left( {\mathscr {L}}\left( {{\textbf {v}}}^{t}{{\textbf {x}}},{{\textbf {w}}}^{t}{{\textbf {y}}}\right) \right) \), \(1\le i\le j\le 2\), such that \(g_{l,k,{{\textbf {v}}}}=\frac{\partial g_{l}}{\partial {{\textbf {v}}}}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) \) and \(g_{l,k,{{\textbf {w}}}}=\frac{\partial g_{l}}{\partial {{\textbf {w}}}}\left( {\tilde{F}}_k,{{\textbf {v}}}_{k}^0,{{\textbf {w}}}_{k}^0\right) \) exist. Then, the IF of \(\rho _{C,k}^{2}\) is given by
Proof
It is deferred to the “Appendix”. \(\square \)
4.3 Discussion
The term \({{\textbf {T}}}_{5}\), through the vector \({{\textbf {P}}}_{r}\varvec{{\tilde{\varSigma }}}_{o}{{\textbf {z}}}_{0}\), is the only one affected by the contaminated point \({{\textbf {z}}}_{0}=\left( {{\textbf {x}}}_{0},{{\textbf {y}}}_{0}\right) \), yielding an unbounded IF, which coincides with the usual behavior of S-estimation. This term shows that the worst scenario, in which the IF becomes unbounded, is given by outliers lying in the subspace orthogonal to the one spanned by the eigenvectors of \({{\textbf {M}}}\) under the elliptical model. It is worth noting that even \(\varvec{\varSigma }_{{\textbf {xx}}}^{(R)}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}^{(R)}\) with bounded IF do not guarantee a bounded IF for the CCA functionals. We next display some figures to show the behavior of the IF under two point-mass contaminations. Let us consider a vector \({{\textbf {z}}}=\left( {{\textbf {x}}}^{t},{{\textbf {y}}}^{t}\right) ^{t}\sim N_{4}\left( {\varvec{\mu }},\varvec{\varSigma }\right) \) with \({\varvec{\mu }}=\left( 0.5,1,1.5,2\right) ^{t}\), \(\varvec{\varSigma }_{{\textbf {xx}}}=\varvec{\varSigma }_{{\textbf {yy}}}={{\textbf {I}}}_{2}\), \(\varvec{\varSigma }_{{\textbf {xy}}}=diag\left( 0.9,0.5\right) \). To ease the computation, the covariance matrices \(\varvec{\varSigma }_{{\textbf {xx}}}\) and \(\varvec{\varSigma }_{{\textbf {yy}}}\) are assumed to be known and are therefore not estimated from the data.
The eigenvectors associated with the largest eigenvalues are \({{\textbf {t}}}_{1}^0=\left( {1}/{\sqrt{2}},0,-{1}/{\sqrt{2}},0\right) ^{t}\) and \({{\textbf {t}}}_{2}^0=\left( 0,-{1}/{\sqrt{2}},0,{1}/{\sqrt{2}}\right) ^{t}\), and the point mass is taken as either \({{\textbf {z}}}_{0}=\left( x_{01},x_{02},0,0\right) \) or \({{\textbf {z}}}_{0}=\left( x_{01},0,y_{01},0\right) \). The plots display unstandardized influence functions to depict the behavior under point-mass contamination, \(\left( x_{01},x_{02}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{SM,1},F\right) \right\| \) and \(\left( x_{01},x_{02}\right) \) vs \(\left\| IF\left( {{\textbf {z}}}_{0},{{\textbf {w}}}_{C,1},F\right) \right\| \), with \({{\textbf {w}}}_{C,1}\) and \({{\textbf {w}}}_{SM,1}\) the classical estimator and the SM-estimator, respectively, for the first canonical vector associated with the random vector \({{\textbf {y}}}\).
5 Robust proposals for measuring association
The classical scenario for CCA involves the correlation as a measure of the maximal association between linear combinations of two random vectors, the correlation between those linear combinations being called the canonical correlation. Expression (30) is a robust correlation functional, obtained from a bivariate robust dispersion estimator, to measure association. A special case arises from the S-estimators defined by Davies (1987) for multivariate location and scatter \(\left( \varvec{\mu },\varvec{\varSigma }\right) \in {\mathbb {R}}^{p}\times {\mathscr {O}}_{p}\). Given the set
S-estimators are defined as the solutions \(\left( \varvec{{\hat{\mu }}},\varvec{{\hat{\varSigma }}}\right) =\arg \min _{{\mathscr {D}}}\det \left( \varvec{\varSigma }\right) \). This formulation is equivalent to taking the M-scale \(s\left( \varvec{\mu },\varvec{\varSigma }\right) =\min \big \{ s>0:E\chi \big [ \left( {{\textbf {x}}}-{\varvec{\mu }}\right) ^{t}\varvec{\varSigma }^{-1}\big ( {{\textbf {x}}}-{\varvec{\mu }}\big ) /s\big ] \le \delta ,\ \det \left( \varvec{\varSigma }\right) =1\big \} \), \(\left( {\varvec{\mu }}^{*},\varvec{\varSigma }^{*}\right) =\arg \min s\left( \varvec{\mu },\varvec{\varSigma }\right) \), and \(\varvec{\varSigma }=\left[ s\left( {\varvec{\mu }}^{*},\varvec{\varSigma }^{*}\right) \right] ^{1/p}\varvec{\varSigma }^{*}\). If \(p=2\), let \(\hat{\varvec{\mu }}\), \({\hat{\sigma }}_{11}\) and \({\hat{\sigma }}_{22}\) be preliminary estimates for the location and dispersion parameters \({\varvec{\mu }}\), \(\sigma _{11}\) and \(\sigma _{22}\). Set \(\widehat{\widetilde{\varvec{\varSigma }}}=\left( \begin{array}{cc} {\hat{\sigma }}_{11} & 0\\ 0 & {\hat{\sigma }}_{22} \end{array}\right) \) and take the standardized vector \(\varvec{{\tilde{x}}}=\widehat{\widetilde{\varvec{\varSigma }}}({{\textbf {x}}}-\hat{\varvec{\mu }})\). Then, we might estimate the correlation parameter as
with s(b) defined as
We next motivate the robust measure of association \(b^{*}\) from another point of view, closely related to the notion of closeness between two random variables which is behind the proposals (5) and (15). We also discuss another procedure for robust correlation. We later analyze, for both procedures, the properties usually required of a measure of association.
5.1 Motivating the robust correlation \(b^*\) and association properties
Let us suppose that we have two random variables X and Y with distribution functions \(F_{X}\) and \(F_{Y}\), respectively. Let T(.) and S(.) be two equivariant estimators of location and dispersion, which allow us to take the standardized random variables \(U=(X-T(F_{X}))/S(F_{X})\) and \(V=(Y-T(F_{Y}))/S(F_{Y})\). If second moments are finite and T is taken as the expected value and S as the standard deviation, then minimizing either \(E(U-\lambda V)^{2}\) or \(E(V-\lambda U)^{2}\) yields \({\hat{\lambda }}=E(UV)=\rho \), the Pearson correlation between U and V, as the global minimizer. A measure of association \(\nu \) must be symmetric, that is, it should verify \(\nu (U,V)=\nu \left( V,U\right) \), which leads us to consider \(E\left[ (U-\lambda V)^{2}+(V-\lambda U)^{2}\right] \) as the objective function. The argument of the expected value is a quadratic form with eigenvalues \(\left( 1-\lambda \right) ^{2}\) and \(\left( 1+\lambda \right) ^{2}\) associated with the eigenvectors \(2^{-1/2}\left( 1,1\right) ^{t}\) and \(2^{-1/2}\left( 1,-1\right) ^{t}\), respectively, that is,
The variables \(Z=(U+V)/\sqrt{2}\) and \(W=(U-V)/\sqrt{2}\) are uncorrelated, with \(Var(Z)=1+\rho \) and \(Var(W)=1-\rho \), where \(\rho =E(UV).\) The minimization of the objective function \(E\left[ \left( 1-\lambda \right) ^{2}Z^{2}+\left( 1+\lambda \right) ^{2}W^{2}\right] \) suggests considering a more general expression, taking any pair of uncorrelated zero-mean random variables Z and W whose joint distribution is parametrized by \(\eta \in C\subset {\mathbb {R}}\), that is, \(E_{\eta }Z=0=E_{\eta }W,\) \(E_{\eta }\left( ZW\right) =0,\) \(Var_{\eta }(Z)=z(\eta ),\) \(Var_{\eta }(W)=w(\eta )\). Given functions \(a:C\rightarrow {\mathbb {R}}\) and \(b:C\rightarrow {\mathbb {R}},\) we proceed similarly to (10) and (11), coming up with a new pair of variables \(\alpha Z\) and \(\beta W\) “as close as possible”, that is, we look for \({\hat{\lambda }}\) such that
subject to
We want to derive which smooth functions a and b should be chosen so that the minimum is attained at the underlying parameter \(\eta \). The objective function to be minimized, subject to the constraint (34), is \(E_{\eta }(a(\lambda )Z-b(\lambda )W)^{2}=a^{2}(\lambda )z(\eta )+b^{2}(\lambda )w(\eta )\). If \(h(\lambda )=a^{2}(\lambda )\) and \(g(\lambda )=b^{2}(\lambda )\) are differentiable, the critical points solve the equation \(h^{\prime }(\lambda )z(\eta )+g^{\prime }(\lambda )w(\eta )=0.\) In order to make the true parameter \(\eta \) a critical point of \(E_{\eta }(a(\lambda )Z-b(\lambda )W)^{2}\), it must hold that \(h^{\prime }(\eta )z(\eta )=-g^{\prime }(\eta )w(\eta ),\) which leads to \(g(\eta )=C^{\prime }\left( z(\eta )/w(\eta )\right) ^{1/2}\) and \(h(\eta )=C^{\prime }\left( w(\eta )/z(\eta )\right) ^{1/2}\) with \(C^{\prime }>0.\) Observe that the function \((a(\lambda )Z-b(\lambda )W)^{2}\) has one term which is independent of \(\lambda \), which leads us to consider the quadratic form
Let \(\chi \) verify conditions C1, C2, C3 and C5. If \(0<\delta <1\) and \(\phi (\lambda )=\left( w(\lambda )/z(\lambda )\right) ^{1/2}=\theta \), we can define an M-scale to evaluate the largeness of \(q(\lambda )\) through the equation
and the estimator is defined to be \({\hat{\theta }}=\arg \min _{\theta \in \phi (C)}s(\theta )\). If the function \(\phi \) is invertible, the estimator for \( \lambda \) is given by \({\hat{\lambda }}=\phi ^{-1}({\hat{\theta }})\). Then, going back to Z and W, with \(z(\eta )=1+\eta \) and \(w(\eta )=1-\eta ,\) we have that the correlation parameter \(\eta \) is estimated through the quadratic form \(q(\lambda )=2^{-1}\left[ \lambda _{1}\left( U+V\right) ^{2}+\lambda _{2}\left( U-V\right) ^{2}\right] \) with eigenvalues \(\lambda _{1}=\left( (1-\lambda )/(1+\lambda )\right) ^{1/2}\) and \(\lambda _{2}=\left( (1+\lambda )/(1-\lambda )\right) ^{1/2}\). Then, the M-scale in (36) can be defined as
Put \(s(1)=\liminf _{\lambda \uparrow 1}s(\lambda )\) and \( s(-1)=\liminf _{\lambda \downarrow -1}s(\lambda )\). Then, we define the SM-estimator for correlation as
If we use an M-scale based on the quadratic form given in (32), similarly to (37) and (38), we obtain that the solution is \({\hat{\lambda }}=\frac{1-\sqrt{1-\rho ^{2}}}{\rho } 1_{(0,1)}\left( \rho \right) +\frac{1+\sqrt{1-\rho ^{2}}}{\rho } 1_{(-1,0)}\left( \rho \right) \) when (X, Y) is elliptically distributed with correlation \(\rho ,\) and \(\rho \) is obtained as \(\rho =2{\hat{\lambda }} /(1+{\hat{\lambda }}^{2})\).
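To make the back-transformation concrete, here is a quick numerical check (illustrative only, not part of the formal development): composing the map \(\rho \mapsto {\hat{\lambda }}\) quoted above with \(\rho =2{\hat{\lambda }}/(1+{\hat{\lambda }}^{2})\) recovers \(\rho \) on \((-1,0)\cup (0,1)\).

```python
import math

def lam_from_rho(rho):
    # Closed-form solution quoted in the text for an elliptical (X, Y)
    # with correlation rho (rho != 0): the (0,1) and (-1,0) branches.
    if rho > 0:
        return (1 - math.sqrt(1 - rho**2)) / rho
    return (1 + math.sqrt(1 - rho**2)) / rho

def rho_from_lam(lam):
    # Back-transformation rho = 2*lam / (1 + lam**2).
    return 2 * lam / (1 + lam**2)

# The composition is the identity on both branches.
for rho in (-0.9, -0.3, 0.3, 0.9):
    assert abs(rho_from_lam(lam_from_rho(rho)) - rho) < 1e-12
```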
It is easily seen that (38) and (31) yield the same estimate.
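The estimation scheme leading to (38) can be sketched numerically. The following Monte Carlo implementation is only illustrative and rests on assumed choices not fixed by the text: a bisquare-type bounded \(\chi \) acting on the already-squared form, \(\delta =0.5\), bisection to solve the M-scale equation, and a grid search over \(\lambda \). For Gaussian data the minimizer should land near the true correlation, as Lemma 4(iv) predicts.

```python
import numpy as np

def chi(t):
    # Bounded, nondecreasing rho-function in "squared" units:
    # chi(0) = 0 and chi(t) = 1 for t >= 1 (tuning absorbed into the scale).
    t = np.minimum(t, 1.0)
    return 3 * t - 3 * t**2 + t**3

def m_scale(q, delta=0.5, n_iter=60):
    # Solve mean(chi(q / s)) = delta for s by bisection;
    # the left-hand side is decreasing in s.
    lo, hi = 1e-8, 10.0 * float(np.max(q)) + 1e-8
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.mean(chi(q / mid)) > delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sm_correlation(u, v):
    # Minimize s(lambda), the M-scale of the quadratic form
    # q(lam) = (1/2) [ l1 (u+v)^2 + l2 (u-v)^2 ],
    # with l1 = sqrt((1-lam)/(1+lam)) and l2 = 1/l1, over a grid in (-1, 1).
    best_lam, best_s = 0.0, np.inf
    for lam in np.linspace(-0.99, 0.99, 199):
        l1 = np.sqrt((1.0 - lam) / (1.0 + lam))
        q = 0.5 * (l1 * (u + v)**2 + (u - v)**2 / l1)
        s = m_scale(q)
        if s < best_s:
            best_lam, best_s = lam, s
    return best_lam

rng = np.random.default_rng(0)
rho = 0.6
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=20000)
lam_hat = sm_correlation(xy[:, 0], xy[:, 1])  # should land near rho
```

The grid search stands in for a proper optimizer purely for transparency; any one-dimensional minimizer over \((-1,1)\) would do.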
Lemma 3
Let \({\hat{\lambda }}\) be as in (38). Then, \({\hat{\lambda }} =b^{*}.\)
Proof
It is deferred to the “Appendix”. \(\square \)
Since the eigenvalues of the quadratic form (32) do not follow (35), \(\rho \) is obtained after transforming \({\hat{\lambda }}\). The next lemma shows that the estimator (38) has the usual properties required of an association estimator, except in the cases \({\hat{\lambda }}=-1\) or \({\hat{\lambda }}=1\), which entail that the random variables X and Y are linearly related with probability greater than or equal to \(1-\delta .\)
Lemma 4
Let \({\hat{\lambda }}\) be as in (38). Then, (i) the estimator is location and scale invariant. (ii) \({\hat{\lambda }}\in \left[ -1,1 \right] .\) (iii) If \({\hat{\lambda }}=1\), then \(P\left( X=aY+b\right) \ge 1-\delta \), with \(a=S(F_{X})/S(F_{Y})\) and \(b=-(S(F_{X})T(F_{Y}))/S(F_{Y})+T(F_{X})\) (respectively, if \({\hat{\lambda }}=-1\), then \(P\left( X=cY+d\right) \ge 1-\delta \), with \(c=-S(F_{X})/S(F_{Y})\) and \(d=S(F_{X})T(F_{Y})/S(F_{Y})+T(F_{X})\)). (iv) If (X, Y) is elliptically distributed with correlation \(\rho ,\) then \({\hat{\lambda }}=\rho \).
Proof
It is deferred to the “Appendix”. \(\square \)
5.2 A “depth-based” correlation measure and association properties
The quadratic form (32) can be used to derive another robust procedure to estimate \(\rho \) consistently, which is reminiscent of depth-based procedures (see Adrover et al. 2002). We can gauge a candidate fit against adversaries as follows. Take as a measure of how badly \(\lambda \) performs the probability that another fit \(\theta \) attains lower residuals, that is, \(p(\lambda )=\max _{\theta }P\left( (U-\theta V)^{2}+(V-\theta U)^{2}<(U-\lambda V)^{2}+(V-\lambda U)^{2}\right) \). Finally, the parameter with the best worst-case performance is chosen: \({\hat{\lambda }}=\arg \min _{\lambda }p(\lambda ).\) It is easy to see that the maximum in \( p(\lambda )\) is attained at \(\theta =\lambda \) and
Then, the minimum occurs at \( {\hat{\lambda }}=\text {med}\left( (2UV)/(U^{2}+V^{2})\right) .\)
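Since the estimator is an explicit median, it is straightforward to compute. The sketch below is illustrative only and assumes particular choices of the location and scale functionals T and S (the median and the normalized MAD); for Gaussian data the median should approximate \(\rho \), in line with Lemma 5(iv).

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.6
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=100_000)

def standardize(w):
    # Robust standardization: median for T and normalized MAD for S
    # (one concrete choice of location/scale functionals).
    med = np.median(w)
    mad = 1.4826 * np.median(np.abs(w - med))
    return (w - med) / mad

u = standardize(xy[:, 0])
v = standardize(xy[:, 1])
lam_hat = np.median(2 * u * v / (u**2 + v**2))  # should approximate rho
```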
The following lemma shows the properties that \({\hat{\lambda }}\) possesses as an association measure.
Lemma 5
Let \({\hat{\lambda }}=\text {med}\left( \frac{2UV}{U^{2}+V^{2}}\right) .\) Then, (i) the estimator is location and scale invariant. (ii) \({\hat{\lambda }}\in \left[ -1,1 \right] .\) (iii) If \({\hat{\lambda }}=1\), then \(P\left( X=aY+b\right) \ge 0.5\), with \(a=S(F_{X})/S(F_{Y})\) and \(b=-aT(F_{Y})+T(F_{X})\) (respectively, if \({\hat{\lambda }}=-1\), then \(P\left( X=cY+d\right) \ge 0.5\), with \(c=-S(F_{X})/S(F_{Y})\) and \(d=-cT(F_{Y})+T(F_{X})\)). (iv) If (X, Y) is elliptically distributed with correlation \(\rho \), then \({\hat{\lambda }}=\rho \).
Proof
It is deferred to the “Appendix”. \(\square \)
6 Concluding remarks
Maronna (2005) shows the remarkable prediction behavior of SM-estimation for PCA compared with several other robust procedures. Adrover and Donato (2015) reach a similar conclusion in the CCA context, considering either mean squared error or relative prediction error as performance measures under contamination scenarios. For these reasons, SM-estimation deserved to be thoroughly analyzed by studying its asymptotic properties, namely consistency and asymptotic normality. Adrover and Donato (2015) specially highlighted the relationship between PCA and CCA (see also ten Berge 1979); therefore, the derivation of these asymptotic properties, as well as of the influence function, is entirely analogous for both procedures. We also considered robustness properties such as qualitative robustness and the influence function, and the unbounded influence we obtain matches the usual behavior of S-estimation in other models. Finally, we turned our attention to a basic problem, robust association or correlation. Reasoning as in SM-estimation for CCA, we arrived at a robust correlation measure which is closely related to S-estimation for bivariate dispersion. We also discussed another consistent robust correlation measure based on an approach to regression depth from Adrover et al. (2002), and the usual properties required of a correlation measure were analyzed.
References
Adrover J, Donato SM (2015) A robust predictive approach for canonical correlation analysis. J Multivar Anal 133:356–376
Adrover J, Maronna R, Yohai V (2002) Relationships between maximum depth and projection estimates. J Stat Plan Inference 105:363–375
Alfons A, Croux C, Filzmoser P (2017) Robust maximum association estimators. J Am Stat Assoc 112(517):436–445
Anderson TW (1999) Asymptotic theory for canonical correlation analysis. J Multivar Anal 70(1):1–29
Boente G (1987) Asymptotic theory for robust principal components. J Multivar Anal 21(1):67–78
Branco JA, Croux C, Filzmoser P, Oliveira MR (2005) Robust canonical correlations: a comparative study. Comput Stat 20(2):203–229
Croux C, Haesbroeck G (2000) Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87:603–618
Croux C, Ruiz-Gazen A (2005) High breakdown estimators for principal components: the projection-pursuit approach revisited. J Multivar Anal 95:206–226
Croux C, García-Escudero LA, Gordaliza A, Ruwet C, San Martín R (2017) Robust PCA based on trimming. Stat Sin 27:1437–1459
Cui H, He X, Ng KW (2003) Asymptotic distributions of principal components based on robust dispersions. Biometrika 90:953–966
Das S, Sen PK (1998) Canonical correlations. In: Armitage P, Colton T (eds) Encyclopedia of biostatistics. Wiley, New York, pp 468–482
Davies PL (1987) Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann Stat 15:1269–1292
Drašković G, Breloy A, Pascal F (2019) On the asymptotics of Maronna’s robust PCA. IEEE Trans Signal Process 67(19):4964–4975
Furrer R, Genton MG (2011) Aggregation-cokriging for highly multivariate spatial data. Biometrika 98:615–631
Hampel FR (1971) A general qualitative definition of robustness. Ann Math Stat 42(6):1887–1896
Hotelling H (1936) Relations between two sets of variables. Biometrika 28:321–377
Kudraszow N, Maronna RA (2010) Estimates of MM type for the multivariate linear model. Technical report. arXiv:1004.4883
Kudraszow N, Maronna RA (2011) Estimates of MM type for the multivariate linear model. J Multivar Anal 102:1280–1292
Li G, Chen Z (1985) Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo. J Am Stat Assoc 80:759–766
Maronna R (1976) Robust M-estimators of multivariate location and scatter. Ann Stat 4(1):51–67
Maronna R (2005) Principal components and orthogonal regression based on robust scales. Technometrics 47(3):264–273
Maronna RA, Martin RD, Yohai VJ, Salibián-Barrera M (2019) Robust statistics: theory and methods, 2nd edn. Wiley Series in Probability and Statistics. Wiley, New York
Pakes A, Pollard D (1989) Simulation and the asymptotics of optimization estimators. Econometrica 57(5):1027–1057
Rao CR (1962) Relations between weak and uniform convergence of measures with applications. Ann Math Stat 33:659–680
Seber GAF (2004) Multivariate observations, 2nd edn. Wiley, New York
Taskinen S, Croux C, Kankainen A, Ollila E, Oja H (2006) Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices. J Multivar Anal 97:359–384
ten Berge JMF (1979) On the equivalence of two oblique congruence rotation methods and orthogonal approximations. Psychometrika 44:359–364
Tyler D (1981) Asymptotic inference for eigenvectors. Ann Stat 9(4):725–736
Wilms I, Croux C (2015) Sparse canonical correlation analysis from a predictive point of view. Biom J 57(5):834–851
Wilms I, Croux C (2016) Robust sparse canonical correlation analysis. BMC Syst Biol 10:72. https://doi.org/10.1186/s12918-016-0317-9
Yohai V (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15(2):642–658
Yohai V, García Ben M (1980) Canonical variables as optimal predictors. Ann Stat 8(4):865–869
Yohai V, Zamar R (1988) High breakdown-point estimates of regression by means of the minimization of an efficient scale. J Am Stat Assoc 83(402):406–413
Acknowledgements
We would like to thank two anonymous referees and the Associate Editor for their comments and suggestions that have resulted in a much improved paper.
Jorge G. Adrover: Research partially supported by Grants PICT 0821 and 0397 from ANPCYT, Grant 05/B424 Secyt, UNC, and Grant 20020100100276 from Secyt, UBA, Argentina. Stella M. Donato: Research partially supported by Grants PICT 0397 from ANPCYT and Secyt, UNC, Argentina.
Appendix
Proof of Theorem 1
Let \({\mathscr {G}}\) be as in (22), \(\varvec{\varXi }_o=\varvec{\varSigma }_{{{\textbf {x}}}{{\textbf {x}}}}\), \(\varvec{\varGamma }_o=\varvec{\varSigma }_{{{\textbf {y}}}{{\textbf {y}}}}\) and the sets
The function \(g_P({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varGamma ,\sigma })= E_P\left( \frac{{{\textbf {A}}}\varvec{\varXi }^{-1/2}{{\textbf {x}}}-{{\textbf {B}}}\varvec{\varGamma }^{-1/2}{{\textbf {y}}}}{\sigma }\right) \) is continuous in \({\mathscr {G}}\) for \(P=F\). Given \(\epsilon >0\), by using Theorem 1 in Adrover and Donato (2015), we get that there exist \(\eta >0\) and \(0<\tilde{\eta }<\delta \)
Then, if \(({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varGamma })\in E_{\eta }\), there exists \(n_0(\epsilon )\) such that the M-scale \({\hat{\sigma }}\in [\sigma _o-\epsilon ,\sigma _o+\epsilon ]\) for all \(n>n_0(\epsilon )\). If \(({{\textbf {A}}},{{\textbf {B}}},{{\textbf {a}}},\varvec{\varXi },\varvec{\varGamma })\notin E_{\eta }\), there exist \(n_1(\epsilon )\) and \(M>0\) such that for all \(n>n_1(\epsilon )\) and \(F_{n}\in {\mathscr {F}}_{n}\), it holds that
Consequently, we can conclude that the SM-estimators in (15) belong to a closed bounded set. Moreover, \(\lim _{n\rightarrow \infty }{\hat{\sigma }}=\sigma _o\). Therefore, the Fisher consistency given in Theorem 1 in Adrover and Donato (2015) lets us conclude that any convergent subsequence must converge to \(({{\textbf {A}}}_o,{{\textbf {B}}}_o,{{\textbf {a}}}_o)\), and the consistency follows. \(\square \)
The following technical lemma is needed to prove the asymptotic normality.
Lemma A.1
Let \({{\textbf {z}}}_{1},\dots ,{{\textbf {z}}}_{n}\) be a random sample in \({\mathbb {R}}^{m}\) from an elliptical distribution F with density (18), location parameter \(\varvec{\mu }_{0}\) and dispersion parameter \(\varvec{\varSigma }_{0}\). Suppose that conditions C0–C14 hold. Let \({\mathscr {G}}\) be as in (22). Let \(\phi _{1}\) and \(\phi _{2}\) be defined as in (26). Then, there exist a function \({\tilde{\theta }}:{\mathscr {H}}\rightarrow {\mathscr {G}}\) and a bounded set \({\mathscr {C}}\subset {\mathscr {G}}\) such that \({\tilde{\theta }}_{o}\) is an interior point of \({\tilde{\theta }}\left( {\mathscr {C}}\right) \) and the sets \({\mathscr {F}}_{1i} =\left\{ \phi _{1i}\left( {{\textbf {z}}},{\tilde{\theta }}\left( \xi \right) \right) :\xi \in {\mathscr {C}}\right\} \), \(i=1,\ldots ,rm\) and \({\mathscr {F}}_{2k} =\left\{ \phi _{2k}\left( {{\textbf {z}}},{\tilde{\theta }} \left( \xi \right) \right) :\xi \in {\mathscr {C}}\right\} \), \(k=1,\ldots ,r\) are Euclidean classes with envelopes \(F_{1i},\) \(i=1,\ldots ,rm\) and \(F_{2k},\) \( k=1,\ldots ,r\), such that \(E_{F}\left( F_{1i}\right) ^{2} <\infty \) and \( E_{F}\left( F_{2k}\right) ^{2} <\infty \).
Proof of Theorem 2(i)
Given \({\varvec{\tilde{\theta }}}_{o},\) Lemma A.1 says that \(\phi _{1i}\left( {\textbf {z,}}{\varvec{\tilde{\theta }}}_{o}\right) \in {\mathscr {F}} _{1i}\), \(i=1,\ldots ,mr\) and \(\phi _{2k}\left( {\textbf {z,}}{\varvec{\tilde{\theta }}} _{o}\right) \in {\mathscr {F}}_{2k}\), \(k=1,\ldots ,r.\) Moreover, since Theorem 1 ensures the consistency of \(\varvec{{\hat{\theta }}}=\left( \hat{{\textbf {{A}}}} _{SM}^{o},\hat{{\textbf {{B}}}}_{SM}^{o},\hat{{{\textbf {a}}}}_{SM}{} {\textbf {,}}\hat{{\varvec{\varSigma }}}_{{\textbf {xx}}}^{(R)},{\varvec{\hat{\varSigma }}}_{{\textbf {yy}}}^{(R)},{\hat{\sigma }}\right) \) to \({\varvec{\tilde{\theta }}}_{o},\) given \(\varepsilon _{0}>0\) we can find \(n_{0}\) such that for any \(n\ge n_{0}\), it holds that \(P\left( \phi _{1i}\left( {{\textbf {z}}}, \hat{\varvec{\theta }}\right) \in {\mathscr {F}} _{1i},\right. \) \(\phi _{2k}\left( {{\textbf {z}}},{\varvec{\hat{\theta }}}\right) \) \(\in \) \(\left. {\mathscr {F}} _{2k}\right) >1-\varepsilon _{0}\) for all \(i\in \left\{ 1,\ldots ,mr\right\} \) and \(k\in \left\{ 1,\ldots ,r\right\} \). Given \({\mathscr {F}}\) a Euclidean class and \(\delta >0\), set \([\delta ]=\left\{ (f_1,f_2)\in {\mathscr {F}}\times {\mathscr {F}}:\int (f_1-f_2)^2dP<\delta ^2\right\} \). Given a sequence of independent identically distributed random variables \(\xi _1,\dots ,\xi _n\) such that \(\xi _1\sim P\), set
Given \(\varepsilon >0\) and \(\eta >0\), Lemma 2.16 of Pakes and Pollard (1989), C12 and C13 say that there exist \(\delta >0\) and \(n_{1}\in {\mathbb {N}}\) such that, for all \(n\ge n_{1}\), \(\left( \phi _{1j}\left( {{\textbf {z}}},{\varvec{\hat{\theta }}}\right) ,\phi _{1j}\left( {{\textbf {z}}},{\varvec{\tilde{\theta }}}_{o}\right) \right) \in \) \(\left[ \delta \right] \) and \(\underset{n\rightarrow \infty }{\lim \sup }\ \; P\left\{ \sup _{\left[ \delta \right] }\left| \nu _{n}\left( \phi _{1j}\left( \cdot ,\varvec{{\hat{\theta }}} \right) \right) -\nu _{n}\left( \phi _{1j}\left( \cdot ,\varvec{{\tilde{\theta }}}_{o}\right) \right) \right| >\eta \right\} <\varepsilon .\) Then, we can conclude that
Since \(\varvec{\varPhi }\left( \varvec{{\tilde{\theta }}}_{o}\right) ={{\textbf {0}}},\) by summing up and subtracting some terms we have
By C14 and (27), it holds that \(-\frac{1}{\sqrt{n}}\nu _{n}\left( \phi \left( \cdot ,\varvec{{\tilde{\theta }}} _{o}\right) \right) =\left[ \varvec{\varOmega } +o_{P}\left( 1\right) \right] \left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_{o}\right) +o_{P}\left( 1/\sqrt{n}\right) ,\) and by the Central Limit Theorem we have that \(\nu _{n}\left( \phi \left( \cdot ,\varvec{{\tilde{\theta }}}_{o}\right) \right) \overset{{\mathscr {D}}}{\rightarrow } N_{r\left( m+1\right) }\left( {{\textbf {0}}},{{\textbf {V}}}\right) \). Since \(\varvec{\varOmega } \) is invertible, we obtain that \(\sqrt{n}\left( \varvec{{\hat{\theta }}}_{SM}-\varvec{\theta }_{o}\right) \overset{{\mathscr {D}}}{\rightarrow } N_{r\left( m+1\right) } \left( {{\textbf {0}}},\varvec{\varOmega } ^{-1}{{\textbf {V}}}_{o}\left( \varvec{\varOmega }^{-1}\right) ^{t}\right) .\) A straightforward computation yields the explicit form of the asymptotic dispersion matrix, and the proof of (i) follows. (ii) follows closely from (i). (iii) follows from the fact that \(\varvec{{\hat{v}}}_{SM,k}\) and \(\varvec{{\hat{w}}}_{SM,k}\) are subvectors of \(\varvec{{\hat{\theta }}}_{SM}^{*}\) given in (ii). (iv) gives a simpler form of the asymptotic covariance matrix obtained in (iii) when the matrix of derivatives \(\frac{\partial \varPhi _2\left( \varvec{\theta }\right) }{\partial {{\textbf {a}}}}\left( \varvec{{\tilde{\theta }}_o}\right) \) is non-singular. \(\square \)
Proof of Theorem 4
Given \(F_{\varepsilon }=\left( 1-\varepsilon \right) F+\varepsilon \delta _{{{\textbf {z}}}_{0}}\), we have to look for the SM-functionals defined through \(g\left( {{\textbf {D}}},{\varvec{\tilde{\varSigma }}}_{\varepsilon },{{\textbf {a}}},\sigma _{\varepsilon }\right) =E_{F_{\varepsilon }}\chi \left( \frac{\left\| {{\textbf {D}}}{\varvec{\tilde{\varSigma }}}_{\varepsilon }{{\textbf {z}}}-{{\textbf {a}}}\right\| ^{2} }{\sigma _{\varepsilon }\left( {{\textbf {D}}},{{\textbf {a}}}\right) }\right) =\delta .\) Then, we look for a restricted minimum over \({{\textbf {D}}}\in {\mathscr {O}}_{r,m}\); letting \({{\textbf {t}}}_{1},\ldots ,{{\textbf {t}}}_{r}\) denote the rows of \({{\textbf {D}}}\), the Lagrangian can be expressed as
where \({{\textbf {t}}}_{1,\varepsilon },\ldots ,{{\textbf {t}}}_{r,\varepsilon },{{\textbf {a}}} _{\varepsilon }\) are critical points of L. The proof follows closely that of Theorem 1 in Croux and Ruiz-Gazen (2005). \(\square \)
Proof of Lemma 3
The eigenvectors \(\left( 1,1\right) \) and \(\left( 1,-1\right) \) of \({\varvec{\tilde{\varSigma }}}\) correspond to the eigenvalues \(\left( (1-b)/(1+b)\right) ^{1/2} \) and \(\left( (1+b)/(1-b)\right) ^{1/2}\), respectively. Then, the quadratic forms in both definitions coincide and \({\hat{\lambda }}=b^{*}.\) \(\square \)
Proof of Lemma 4
(i) and (ii) are easily derived. (iii) \({\hat{\lambda }}=1\) implies that \(s(1)\le s(\lambda )\) for all \(\lambda \in \left[ -1,1\right] \). Let \(q(\lambda )\) be as in (35). Since \(\lim _{s\rightarrow \infty }E\chi \left( q(\lambda )/s\right)=0 \le \delta ,\) we have that \(s(\lambda )<\infty \) and \(s(1)<\infty \). Thus, \(\delta \ge \lim _{\lambda \rightarrow 1^{-}}E\chi \left( q\left( \lambda \right) /s(\lambda )\right) \) and \(P(U=V)\ge 1-\delta \), which says that \(P\left( X=aY+b\right) \ge 1-\delta \), with \(a=S(F_{X})/S(F_{Y})\) and \(b=-aT(F_{Y})+T(F_{X})\). In case \({\hat{\lambda }}=-1\), we get \(P(U=-V)\ge 1-\delta \) and \(P\left( X=cY+d\right) \ge 1-\delta \), with \(c=-S(F_{X})/S(F_{Y})\) and \(d=S(F_{X})T(F_{Y})/S(F_{Y})+T(F_{X})\). Thus, (iii) is proved. (iv) If (X, Y) is elliptically distributed with correlation \(\rho ,\) Lemma 3 lets us affirm that \({\hat{\lambda }}=\rho .\) \(\square \)
Proof of Lemma 5
(i) is easily derived and (iii) follows as in Lemma 4 (iii). (ii) Since \((U\mp V)^2\ge 0\), we have \(\pm 2UV\le U^2+V^2\), so \(|2UV/(U^2+V^2)|\le 1\). Therefore, \(\text {med}(2UV/(U^2+V^2))\in [-1,1]\). (iv) If (X, Y) is elliptically distributed with correlation \(\rho ,\) then (U, V) is elliptically distributed with density \(f(u,v)=K^{-1}f_0\left((u^2+2\rho uv+v^{2})/\sqrt{1-\rho ^{2}}\right)\) and \(K=\pi (1-\rho ^2)^{-1/2}(F_0( \infty )-F_0(0))\) with \(F_0\) a primitive of \(f_0\). To see that \(P_{c}=P\left( 2UV/(U^{2}+V^{2})\le \rho \right) =0.5\), we perform some changes of variables to get that
Using polar coordinates, the inequality \(1+\rho \cos 2\theta \ge 0\), and the fact that \( \frac{\cos 2\theta +\rho }{1+\rho \cos 2\theta }\le \rho \) if and only if \(\cos 2\theta \le 0,\) we have
which shows that \({\hat{\lambda }}=\rho \) and the result follows. \(\square \)
Cite this article
Adrover, J.G., Donato, S.M. Aspects of robust canonical correlation analysis, principal components and association. TEST 32, 623–650 (2023). https://doi.org/10.1007/s11749-023-00846-1