1 Introduction

Since Hotelling [1] proposed canonical correlation analysis (CCA) as a method for describing the relation between the scores of a set of observation units on two groups of variables, CCA has become an important tool in numerous applications, including cluster analysis, principal component analysis, multidimensional scaling, pattern recognition, artificial intelligence, and bioinformatics. Several generalizations of canonical correlation analysis to multiple sets of variables have been proposed by Steel [2], Kettenring [3], Horst [4], Van de Geer [5], Hanafi and Kiers [6], Via et al. [7, 8], and other scholars. Among the five CCA criteria in [3], the sum-of-correlations (SUMCOR) criterion, or its covariance-based version called Maxbet in [5], is the most popular one in practical applications and has been studied extensively. In this paper, we concern ourselves with the Genvar criterion originally proposed by Steel [2], which can be introduced briefly as follows.

Let \(y_i=\left( \begin{array}{c} y_{i,1}\\ \vdots \\ y_{i,n_i} \end{array} \right) \),   \(i=1,\ldots ,m\), be m sets of random variables. Suppose the covariance matrix A of \(y=\left( \begin{array}{c} y_1\\ \vdots \\ y_m \end{array} \right) \) exists and is given by

$$\begin{aligned} A =\left( \varSigma _{ij}\right) _{m\times m},\quad \varSigma _{ii}\in \mathbb {R}^{n_i\times n_i}. \end{aligned}$$

Here \(\varSigma _{ij}\) is the covariance matrix between \(y_i\) and \(y_j\), and we suppose that all of the covariance matrices \(\varSigma _{ii}\) of \(y_i\) are invertible. Then the correlation matrix of \(y=\left( \begin{array}{c} y_1\\ \vdots \\ y_m \end{array} \right) \) is given by

$$\begin{aligned} R =\left( R_{ij}\right) _{m\times m},\quad R_{ij}=\varSigma ^{-\frac{1}{2}}_{ii}\varSigma _{ij}\varSigma ^{-\frac{1}{2}}_{jj}. \end{aligned}$$
(1.1)

Here \(\varSigma _{ii}^{\frac{1}{2}}\) denotes the square root of \(\varSigma _{ii}\).

Let

$$\begin{aligned} \varSigma _m= & {} \left\{ t=\left( \begin{array}{c} t_1\\ \vdots \\ t_m \end{array}\right) \in \mathbb {R}^n:t_i\in \mathbb {R}^{n_i},\quad \parallel t_i\parallel _2=1\right\} , \end{aligned}$$
(1.2)
$$\begin{aligned} R(t)= & {} \left( t^T_iR_{ij}t_j\right) _{m\times m}. \end{aligned}$$
(1.3)

The Genvar criterion proposed by Steel [2] is to find a minimizer of the following optimization problem

$$\begin{aligned} \min \ \det (R(t)),\quad \ s.t.\ \ t\in \varSigma _m. \end{aligned}$$
(1.4)
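For readers who wish to experiment with the criterion, the following MATLAB sketch shows one way to form the correlation matrix R of (1.1) from a covariance matrix A and to evaluate the Genvar objective \(\det (R(t))\) of (1.3)–(1.4). The function names (demo_genvar_objective, corr_from_cov, block_corr, normalize_blocks, blk_idx) and the block sizes are our own illustrative choices, not part of the original formulation; the later sketches in this paper reuse these hypothetical helpers.

function demo_genvar_objective
  n  = [2 3 2];                           % hypothetical block sizes n_i
  N  = sum(n);
  B  = randn(N, 5*N);  A = B*B'/(5*N);    % a random covariance matrix (SPD)
  R  = corr_from_cov(A, n);               % correlation matrix of (1.1)
  t  = normalize_blocks(randn(N,1), n);   % a random feasible point in Sigma_m
  rho = det(block_corr(R, t, n))          % Genvar objective det(R(t)) of (1.4)
end

function R = corr_from_cov(A, n)
% R_{ij} = Sigma_ii^{-1/2} * Sigma_ij * Sigma_jj^{-1/2}, cf. (1.1)
  idx = blk_idx(n);  R = A;
  for i = 1:numel(n)
    S = sqrtm(A(idx{i}, idx{i}));         % symmetric square root of Sigma_ii
    R(idx{i}, :) = S \ R(idx{i}, :);
    R(:, idx{i}) = R(:, idx{i}) / S;
  end
end

function Rt = block_corr(R, t, n)
% Rt(i,j) = t_i' * R_{ij} * t_j, cf. (1.3)
  idx = blk_idx(n);  m = numel(n);  Rt = zeros(m);
  for i = 1:m
    for j = 1:m
      Rt(i,j) = t(idx{i})' * R(idx{i}, idx{j}) * t(idx{j});
    end
  end
end

function t = normalize_blocks(t, n)
% scale each block t_i to unit 2-norm so that t lies in Sigma_m of (1.2)
  idx = blk_idx(n);
  for i = 1:numel(n), t(idx{i}) = t(idx{i}) / norm(t(idx{i})); end
end

function idx = blk_idx(n)
% index ranges of the blocks t_1,...,t_m inside the stacked vector t
  e = cumsum(n);  s = e - n + 1;
  idx = arrayfun(@(a,b) a:b, s, e, 'UniformOutput', false);
end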

Because \(\varSigma _m\) is a closed and bounded set in \(\mathbb {R}^n\) and \(\det (R(t))\) is a continuous function, a solution of Genvar always exists. One important feature of the Genvar criterion is that it reduces to Hotelling’s CCA when the number of sets is only two. Another feature of Genvar is that it is related to the multi-normal negative log-likelihood [2, 9], and its covariance-based version is closely related to the Box–Cox approach in regression [9]. In [10], some interesting statistical properties of Genvar are proved. In the recent paper [11], the authors point out that the multi-feature fusion method based on SUMCOR has difficulty revealing the integral relation among multiple sets of variables, and they therefore suggest applying Genvar to extract integral canonical correlation features of multiple sets.

In the present paper, we mainly focus on the computational aspects of Genvar. The Genvar criterion is a constrained nonlinear optimization problem and can only be solved by iteration. General-purpose iterative methods (see, e.g., [12]) can be employed. However, these methods stop at a stationary point, which is only a necessary condition for a global minimum of Genvar. For statistical applications, on the other hand, a global solution is quite desirable and is of substantial importance for the reliability of the statistical prediction. In fact, finding a global solution of the general Genvar problem is very hard, because a particular instance of Genvar with \(m=n\) is equivalent to the so-called Boolean quadratic program, which is long known to be computationally difficult and belongs to the class of NP-hard problems (see, e.g., [13]). To tackle Genvar, Steel [2] used a compound matrix approach to derive a system of nonlinear equations. These equations seem quite difficult to solve except in special cases. Kettenring [3] outlined an alternating variable method (AVM, see Algorithm 2.2 in Sect. 2), which is a descent-type iterative procedure. To the best of our knowledge, this is the state-of-the-art algorithm for Genvar. The AVM algorithm has several disadvantages. Numerical experiments indicate that AVM depends strongly on the starting point, and unless the starting point is specially selected, the AVM algorithm has a high probability of stopping at a local but non-global minimum of Genvar. In addition, in some cases the AVM algorithm with a random starting point converges very slowly. In the present paper, we numerically demonstrate the performance of AVM and present some theoretical results. An inexact version of AVM is presented. Two starting point strategies are suggested to improve these iterative methods, both in reducing the number of iterations and in increasing the probability of finding a global solution of (1.4).

This paper is organised as follows. Section 2 focuses on the optimality conditions for (1.4) and suggests a Gauss–Seidel type method. We present upper and lower bounds of the optimal objective function value in Sect. 3. In Sect. 4, we suggest two starting point strategies. We present numerical test results in Sect. 5, followed by some concluding remarks in Sect. 6.

2 Optimality Conditions

Let \(R_i(t)\) denote the matrix obtained by deleting the i-th row and column of R(t). We note that \(R_i(t)\) is independent of \(t_i\).

Let

$$\begin{aligned} G_i=\left( \begin{array}{c} t_1^TR_{1,i}\\ \vdots \\ t_{i-1}^TR_{i-1,i}\\ t_{i+1}^TR_{i+1,i}\\ \vdots \\ t_m^TR_{m,i}\\ \end{array} \right) \end{aligned}$$

Theorem 2.1

Suppose \(t\in \varSigma _m\) is a solution of (1.4). Then there exist real numbers \(u_1,\ldots ,u_m\) such that

$$\begin{aligned} \left( \begin{array}{ccc} \det (R_1(t))G_1^TR_1(t)^{-1}G_1 &{} &{} \\ &{}\ddots &{} \\ &{} &{}\det (R_m(t))G_m^TR_m(t)^{-1}G_m \end{array}\right) t=\left( \begin{array}{ccc} u_1I_{n_1} &{} &{} \\ &{}\ddots &{} \\ &{} &{}u_mI_{n_m} \end{array}\right) t\nonumber \\ \end{aligned}$$
(2.1)

Proof

Consider the Lagrangian function of (1.4)

$$\begin{aligned} L(t,\ u)=\det (R(t))+\sum \limits _{j=1}^m u_j\left( t_j^Tt_j-1\right) . \end{aligned}$$

We see that

$$\begin{aligned} \det (R(t))=\det (R_i(t))\left( 1-(G_it_i)^TR_i(t)^{-1}G_it_i\right) .\ \end{aligned}$$
(2.2)

Therefore,

$$\begin{aligned} \frac{\partial \det (R(t))}{\partial t_i}=-2\det (R_i(t))G_i^TR_i(t)^{-1}G_it_i. \end{aligned}$$

Consequently,

$$\begin{aligned} \frac{\partial L(t,\ u)}{\partial t}=-2 diag\left( \det (R_1(t))G_1^TR_1(t)^{-1}G_1,\ldots ,\det (R_m(t))G_m^TR_m(t)^{-1}G_m\right) t+2\varOmega t. \end{aligned}$$

Here \(\varOmega \equiv diag\left( u_1I_{n_1},\ldots ,u_mI_{n_m}\right) .\) It follows that (2.1) holds. \(\square \)
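The Schur-complement identity (2.2) is the key ingredient of the algorithms below. The following MATLAB fragment is a small numerical check (our own, not from the paper) of (2.2) at a random feasible point; it reuses the hypothetical helpers corr_from_cov, block_corr, normalize_blocks and blk_idx introduced after (1.4).

n = [2 3 2];  N = sum(n);  m = numel(n);  idx = blk_idx(n);
B = randn(N, 5*N);  A = B*B'/(5*N);       % a random covariance matrix
R = corr_from_cov(A, n);
t = normalize_blocks(randn(N,1), n);      % a random point in Sigma_m
i = 2;  others = [1:i-1, i+1:m];
G = zeros(m-1, n(i));                     % G_i: rows t_j' * R_{j,i}, j ~= i
for r = 1:m-1
  j = others(r);
  G(r, :) = t(idx{j})' * R(idx{j}, idx{i});
end
Rt  = block_corr(R, t, n);
Ri  = Rt(others, others);                 % R_i(t): delete i-th row and column
a   = G * t(idx{i});                      % G_i * t_i
lhs = det(Rt);                            % det(R(t))
rhs = det(Ri) * (1 - a' * (Ri \ a));      % right-hand side of (2.2)
disp([lhs, rhs])                          % the two values should agree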

We note that (2.1) can be rewritten as

$$\begin{aligned} \det (R_i(t))G_i^TR_i(t)^{-1}G_it_i=u_it_i,\quad \parallel t_i\parallel _2=1,\quad i=1,2,\ldots ,m, \end{aligned}$$
(2.3)

and \(G_i\) and \(R_i(t)\) are independent of \(t_i\), but depend on \(t_j\ (j\ne i)\). Hence, it is natural to suggest the following Gauss–Seidel type method to solve (2.3).

Algorithm 2.1

(Gauss–Seidel type method, G–S method)

  (1)

    Take \(t^{(0)}=\left( \begin{array}{c} t_1^{(0)}\\ \vdots \\ t_m^{(0)}\\ \end{array} \right) \in \varSigma _m.\)

  (2)

    Suppose that the current approximate point is \(t^{(k)}=\left( \begin{array}{c} t_1^{(k)}\\ \vdots \\ t_m^{(k)}\\ \end{array} \right) \in \varSigma _m\). Then compute \(t^{(k+1)}\) as follows.

    $$\begin{aligned} y_i^{(k)}= & {} G_i\left( z^{(i)}\right) ^TR_i\left( z^{(i)}\right) ^{-1}G_i\left( z^{(i)}\right) t_i^{(k)},\nonumber \\ u_i^{(k)}= & {} \parallel y_i^{(k)}\parallel _2,\nonumber \\ t_i^{(k+1)}= & {} \frac{1}{u_i^{(k)}}y_i^{(k)}. \end{aligned}$$
    (2.4)

    Here

    $$\begin{aligned} z^{(i)}=\left( t_1^{(k+1)^T},\ldots ,t_{i-1}^{(k+1)^T},t_i^{(k)^T},\ldots ,t_m^{(k)^T}\right) ^T,\quad i=1,2,\ldots ,m. \end{aligned}$$
  (3)

    If \(\mid \det (R(t^{(k+1)}))-\det (R(t^{(k)}))\mid \leqslant \varepsilon \), then stop the algorithm with the approximate solution \(t^{(k+1)}\).
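A compact MATLAB sketch of Algorithm 2.1 might look as follows. It relies on the hypothetical helpers introduced after (1.4), and the names genvar_gs, tol and maxit are ours; this is only an illustration of the update (2.4) and the stopping rule in step (3), not the authors' code.

function [t, rho] = genvar_gs(R, n, t0, tol, maxit)
% Gauss-Seidel type method (Algorithm 2.1) for min det(R(t)) over Sigma_m
  idx = blk_idx(n);  m = numel(n);  t = t0;
  rho = det(block_corr(R, t, n));
  for k = 1:maxit
    for i = 1:m
      others = [1:i-1, i+1:m];
      G = zeros(m-1, n(i));               % G_i evaluated at the current point
      for r = 1:m-1
        j = others(r);
        G(r, :) = t(idx{j})' * R(idx{j}, idx{i});
      end
      Rt = block_corr(R, t, n);
      Ri = Rt(others, others);            % R_i(z^(i)) of (2.4)
      y  = G' * (Ri \ (G * t(idx{i})));   % y_i^{(k)} of (2.4)
      t(idx{i}) = y / norm(y);            % normalized update, stays on Sigma_m
    end
    rho_new = det(block_corr(R, t, n));
    if abs(rho_new - rho) <= tol          % stopping rule of step (3)
      rho = rho_new;  return;
    end
    rho = rho_new;
  end
end

A typical call, with A and the block sizes n as above, would be [t, rho] = genvar_gs(corr_from_cov(A, n), n, normalize_blocks(randn(sum(n),1), n), 1e-5, 500).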

The expression (2.2) of the objective function motivates us to employ an alternating variable method (AVM) to solve (1.4), which is a natural extension of the so-called coordinate descent method [12, p. 53] and is described in Algorithm 2.2.

Algorithm 2.2

(Alternating variable method [3], AVM)

  (1)

    Take \(t^{(0)}=\left( \begin{array}{c} t_1^{(0)}\\ \vdots \\ t_m^{(0)}\\ \end{array} \right) \in \varSigma _m.\)

  (2)

    Suppose that the current approximate point is \(t^{(k)}=\left( \begin{array}{c} t_1^{(k)}\\ \vdots \\ t_m^{(k)}\\ \end{array} \right) \in \varSigma _m\). Then compute \(t^{(k+1)}\) by

    $$\begin{aligned} t^{(k+1)}_i=\arg \min \limits _{\parallel t_i\parallel _2=1}\det \left( R(t_1^{(k+1)},\ldots ,t_{i-1}^{(k+1)},t_i,t^{(k)}_{i+1},\ldots ,t_m^{(k)})\right) ,\quad i=1,2,\ldots ,m.\nonumber \\ \end{aligned}$$
    (2.5)
  (3)

    If \(\mid \det (R(t^{(k+1)}))-\det (R(t^{(k)}))\mid \leqslant \varepsilon \), then stop the algorithm, and \(t^{(k+1)}\) is an approximate solution.

This algorithm was outlined by Kettenring [3]. We will present some analysis below and numerically demonstrate its performance in Sect. 5.

Let

$$\begin{aligned} \rho (t)=\det (R(t)),\ \ \hat{z}^{(i)}=\left( t_1^{(k+1)T},\ldots ,t_{i-1}^{(k+1)T},t_i^{(k)T},\ldots ,t_m^{(k)T}\right) ^T. \end{aligned}$$

We see that

$$\begin{aligned} \rho \left( t^{(k)}\right) \ge \rho \left( \hat{z}^{(2)}\right) \ge \cdots \ge \rho \left( \hat{z}^{(m+1)}\right) =\rho \left( t^{(k+1)}\right) . \end{aligned}$$
(2.6)

Thus the sequence \(\{\rho (t^{(k)})\}\) is monotonically non-increasing and bounded below, which directly implies the convergence of \(\{\rho (t^{(k)})\}\).

Let

$$\begin{aligned} G_i^{(k)}=\left( \begin{array}{c} t_1^{(k+1)T}R_{1,i}\\ \vdots \\ t_{i-1}^{(k+1)T}R_{i-1,i}\\ t_{i+1}^{(k)T}R_{i+1,i}\\ \vdots \\ t_m^{(k)T}R_{m,i}\\ \end{array} \right) . \end{aligned}$$

Then (2.2) implies that (2.5) is equivalent to the following optimization problem:

$$\begin{aligned} \max \limits _{\parallel t_i\parallel _2=1}t_i^TG_i^{(k)T}R_i(z^{(i)})^{-1}G_i^{(k)}t_i, \end{aligned}$$
(2.7)

and its solution \(t_i^{(k+1)}\) is the unit eigenvector corresponding to the largest eigenvalue of \(G_i^{(k)T}R_i(z^{(i)})^{-1}G_i^{(k)}\). There are several efficient algorithms to compute \(t_i^{(k+1)}\); see, e.g., [14].
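For illustration, when the block sizes \(n_i\) are moderate the exact block update of Algorithm 2.2 can be realized with a dense eigen-decomposition. The following sketch (ours, with the hypothetical name avm_block_update) solves (2.7) for one block; replacing the normalization step in the Gauss–Seidel sketch after Algorithm 2.1 by this update turns that sketch into an implementation of Algorithm 2.2.

function ti = avm_block_update(G, Ri)
% exact solution of (2.7): unit eigenvector of M_i = G_i' * R_i(t)^{-1} * G_i
% associated with its largest eigenvalue
  M = G' * (Ri \ G);
  M = (M + M') / 2;                       % symmetrize against round-off
  [V, D] = eig(M);
  [~, p] = max(diag(D));
  ti = V(:, p);
end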

Next, we will show that Algorithm 2.1 is closely related to Algorithm 2.2. Indeed, we will apply the iterative majorization principle (see, e.g., [15]) to (2.7) to demonstrate that Algorithm 2.1 is an inexact version of Algorithm 2.2. To this end, let

$$\begin{aligned} M_{ik}&=G_i^{(k)T}R_i(z^{(i)})^{-1}G_i^{(k)},\\ g(t_i)&=t_i^TM_{ik}t_i. \end{aligned}$$

Then \(M_{ik}\) is a symmetric positive semi-definite matrix, and \(g(t_i)\) is a convex function. We note that

$$\begin{aligned} g(t_i)= & {} g\left( t_i^{(k)}\right) +2 \left( M_{ik}t_i^{(k)}\right) ^T \left( t_i-t_i^{(k)}\right) + \left( t_i-t_i^{(k)}\right) ^TM_{ik}\left( t_i-t_i^{(k)}\right) \nonumber \\\ge & {} h_i(t_i)\equiv g\left( t_i^{(k)}\right) +2\left( M_{ik}t_i^{(k)}\right) ^T\left( t_i-t_i^{(k)}\right) . \end{aligned}$$
(2.8)

Noticing that \(g(t_i^{(k)})=h_i(t_i^{(k)})\), (2.7) can be approximately solved by

$$\begin{aligned} t_i^{(k+1)}= & {} \arg \ \max \limits _{\parallel t_i\parallel _2=1}h_i(t_i)\nonumber \\= & {} y_i^{(k)}/\parallel y_i^{(k)}\parallel _2. \end{aligned}$$
(2.9)

Here \(y_i^{(k)}=M_{ik}t_i^{(k)}\).

We see that (2.9) is the same as (2.4). Therefore, Algorithm 2.1 can be seen as an inexact version of the AVM algorithm. From (2.8), we see that the sequence \(\{t^{(k)}\}\) generated by Algorithm 2.1 has the following property:

$$\begin{aligned} \rho \left( t^{(k)}\right) \ge \rho \left( z^{(2)}\right) \ge \cdots \ge \rho \left( z^{(m)}\right) \ge \rho \left( t^{(k+1)}\right) . \end{aligned}$$
(2.10)

That is, Algorithm 2.1 is also a descent algorithm. Furthermore, we see that \(\rho (t^{(k)})=\rho (t^{(k+1)})\) if and only if there exist \(u_i^{(k)}\ge 0\), \(i=1,\ldots ,m\), such that

$$\begin{aligned} M_{ik}t_i^{(k)}=u_i^{(k)}t_i^{(k)}. \end{aligned}$$

In other words, for Algorithm 2.1, \(\rho (t^{(k)})=\rho (t^{(k+1)})\) if and only if \(t^{(k)}\) is a stationary point of the Genvar criterion.
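In practice, one can monitor this stationarity condition directly. The following sketch (ours, with the hypothetical name stationarity_residual, reusing the helpers introduced after (1.4)) measures, for a given feasible t, how far each block violates (2.3); the positive scalar \(\det (R_i(t))\) in (2.3) is absorbed into \(u_i\) and therefore omitted.

function res = stationarity_residual(R, n, t)
% largest block-wise residual || M_i t_i - u_i t_i ||_2 with u_i = t_i' M_i t_i
  idx = blk_idx(n);  m = numel(n);  res = 0;
  for i = 1:m
    others = [1:i-1, i+1:m];
    G = zeros(m-1, n(i));
    for r = 1:m-1
      j = others(r);
      G(r, :) = t(idx{j})' * R(idx{j}, idx{i});
    end
    Rt = block_corr(R, t, n);
    y  = G' * (Rt(others, others) \ (G * t(idx{i})));   % M_i * t_i
    u  = t(idx{i})' * y;                                 % Rayleigh quotient
    res = max(res, norm(y - u * t(idx{i})));
  end
end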

Algorithms 2.1 and 2.2 mainly deal with the first stage of Genvar; that is, they produce a first-stage canonical vector \(s^{(1)}\). We can extend these algorithms successively to later stages as follows. Suppose we have obtained the first \(p-1\) canonical vectors \(s^{(i)}=\left( \begin{array}{c}s^{(i)}_1\\ \vdots \\ s^{(i)}_{m}\end{array}\right) \in \varSigma _m,\quad i=1,2,\ldots ,p-1;\ \ p\le \min \limits _{1\le i\le m}n_i\). To compute the p-th stage canonical vector \(s^{(p)}=\left( \begin{array}{c}s^{(p)}_1\\ \vdots \\ s^{(p)}_{m}\end{array}\right) \), we consider the following optimization problem

$$\begin{aligned} \min \det \left( R(t)\right) ,\quad s.t.\ t\in \varSigma _m, t^T_is^{(l)}_i=0,\quad \mathrm{for}\quad l=1,\ldots ,p-1;\ i=1,\ldots ,m.\nonumber \\ \end{aligned}$$
(2.11)

Let \(Q^{(1)}_i=[s^{(1)}_i,\ldots ,s^{(p-1)}_i]\) and let \(Q_i=[Q^{(1)}_i,X_i]\) be an \(n_i\times n_i\) orthogonal matrix. Let

$$\begin{aligned} \tilde{R}&=\left( \tilde{R}_{ij}\right) _{m\times m},\quad \tilde{R}_{ij}\equiv X^T_iR_{ij}X_j,\\ \tilde{\varSigma }_m&=\left\{ \tilde{t}=\left( \begin{array}{c}\tilde{t}_1\\ \vdots \\ \tilde{t}_{m}\end{array}\right) :\ \tilde{t}_i\in \mathbb {R}^{n_i-p+1},\quad \parallel \tilde{t}_i\parallel _2=1\right\} ,\\ \tilde{R}(\tilde{t})&=\left( \tilde{t}^T_i\tilde{R}_{ij}\tilde{t}_j\right) _{m\times m}. \end{aligned}$$

The constraints in (2.11) mean that \(t_{i}\) can be expressed as \(t_i=X_i\tilde{t}_i\) and so we consider the following optimization problem

$$\begin{aligned} \min \det \left( \tilde{R}(\tilde{t})\right) ,\quad s.t.\ \tilde{t}\in \tilde{\varSigma }_m. \end{aligned}$$
(2.12)

Suppose \(\tilde{t}^{(p)}=\left( \begin{array}{c}\tilde{t}^{(p)}_1\\ \vdots \\ \tilde{t}^{(p)}_{m}\end{array}\right) \) is a solution of (2.12). Then \(s^{(p)}=\left( \begin{array}{c}s^{(p)}_1\\ \vdots \\ s^{(p)}_{m}\end{array}\right) \) with \(s^{(p)}_i=X_i\tilde{t}_i^{(p)}\ (i=1,2,\ldots ,m)\) is the p-th stage canonical vector of Genvar. Obviously, we can apply Algorithms 2.1 and 2.2 to solve problem (2.12).
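To make the reduction (2.11)–(2.12) concrete, the following sketch (ours; the names genvar_deflate and S are hypothetical, and blk_idx is the helper introduced after (1.4)) builds \(X_i\) as an orthonormal basis of the orthogonal complement of \(\mathrm{span}\{s_i^{(1)},\ldots ,s_i^{(p-1)}\}\) via a full QR factorization and assembles \(\tilde{R}\); a solution \(\tilde{t}^{(p)}\) of (2.12) obtained by either algorithm is then mapped back through \(s^{(p)}_i=X_i\tilde{t}_i^{(p)}\).

function [Rtilde, X] = genvar_deflate(R, n, S)
% S{i} is the n_i-by-(p-1) matrix [s_i^(1),...,s_i^(p-1)]; X{i} spans its
% orthogonal complement, and Rtilde is the block matrix of (2.12)
  idx = blk_idx(n);  m = numel(n);  X = cell(1, m);
  for i = 1:m
    [Qi, ~] = qr(S{i});                    % full QR: Q_i = [Q_i^(1), X_i]
    X{i} = Qi(:, size(S{i}, 2)+1:end);
  end
  blocks = cell(m, m);
  for i = 1:m
    for j = 1:m
      blocks{i, j} = X{i}' * R(idx{i}, idx{j}) * X{j};   % Rtilde_{ij} = X_i' R_{ij} X_j
    end
  end
  Rtilde = cell2mat(blocks);
end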

3 Upper and lower bounds of the optimal objective function value

In this section, we present some upper and lower bounds of the optimal objective function value of (1.4). Throughout this section we suppose that R is an invertible matrix (and hence a symmetric positive definite matrix), and therefore it can be factorized as

$$\begin{aligned} R=P^TP,\ \ P=[P_1,\ldots ,P_m],\quad P_j\in \mathbb {R}^{n\times n_j}. \end{aligned}$$
(3.1)

Clearly, \(P_j^TP_j=I_{n_j}\quad \ (j=1,2,\ldots ,m)\).

Let

$$\begin{aligned} R_m(t)&=(t_i^TR_{ij}t_j)_{1\le i,j\le m-1},\\ a_m(t)&=\left( \begin{array}{c} t_1^{T}R_{1,m}t_m\\ \vdots \\ t_{m-1}^{T}R_{m-1,m}t_m\\ \end{array} \right) . \end{aligned}$$

Then, we see that

$$\begin{aligned} \det (R(t))=\det (R_m(t))(1-a_m^TR_m(t)^{-1}a_m). \end{aligned}$$
(3.2)

Let

$$\begin{aligned} L(t)=[P_1t_1,\ldots ,P_mt_m],\quad L_{m-1}(t)=[P_1t_1,\ldots ,P_{m-1}t_{m-1}]. \end{aligned}$$

The following auxiliary result presents a least squares expression of \(1-a_m^TR_m(t)^{-1}a_m\).

Lemma 3.1

$$\begin{aligned} 1-a_m^TR_m(t)^{-1}a_m=\min \limits _{y\in \mathbb {R}^{m-1}}\quad \parallel P_mt_m-L_{m-1}(t)y\parallel _2^2. \end{aligned}$$
(3.3)

Proof

The solution of the least squares problem (3.3) is

$$\begin{aligned} y=L_{m-1}(t)^{\dagger }P_mt_m. \end{aligned}$$

Consequently,

$$\begin{aligned} \gamma _m&\equiv \ \min \limits _{y\in \mathbb {R}^{m-1}}\parallel P_mt_m-L_{m-1}(t)y\parallel _2^2\\&=\parallel P_mt_m-L_{m-1}L_{m-1}^{\dagger }P_mt_m\parallel _2^2\\&=\parallel P_mt_m\parallel _2^2-\parallel L_{m-1}L_{m-1}^{\dagger }P_mt_m\parallel _2^2\\&=1-\left( L_{m-1}^TP_mt_m\right) ^TR_m(t)^{-1}\left( L_{m-1}^TP_mt_m\right) \\&=1-a_m^TR_m(t)^{-1}a_m. \end{aligned}$$

\(\square \)

Now, we present a lower bound of \(\min \limits _{t\in \varSigma _m}\det (R(t))\). Let

$$\begin{aligned} H_{m-1}=\left[ P_1,\ldots ,P_{m-1}\right] . \end{aligned}$$
(3.4)

Then,

$$\begin{aligned} \gamma _m= & {} \min \limits _{y\in \mathbb {R}^{m-1}}\parallel P_mt_m-H_{m-1}\left( \begin{array}{c} y_1t_1\\ \vdots \\ y_{m-1}t_{m-1} \end{array}\right) \parallel _2^2\nonumber \\\ge & {} \min \limits _{z\in \mathbb {R}^{n-n_m}}\parallel P_mt_m-H_{m-1}z\parallel _2^2\nonumber \\= & {} 1-\parallel H_{m-1}H_{m-1}^{\dagger }P_mt_m\parallel _2^2. \end{aligned}$$
(3.5)

Theorem 3.1

Let \(H_k=[P_1,\dots ,P_k]\), then,

$$\begin{aligned} \det (R(t))\ge \prod \limits _{k=1}^{m-1}\left( 1-\parallel H_kH_k^{\dagger }P_{k+1}\parallel _2^2\right) ,\quad \hbox {for all}\quad t\in \varSigma _m. \end{aligned}$$
(3.6)

Proof

From (3.5), we see that

$$\begin{aligned} \det (R(t))&=\det (R_m(t))(1-a_m^TR_m(t)^{-1}a_m)\\&\ge \det (R_m(t))(1-\parallel H_{m-1}H_{m-1}^{\dagger }P_m\parallel _2^2). \end{aligned}$$

Recursively, it follows that (3.6) holds. \(\square \)
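The lower bound (3.6) is easy to evaluate once a factorization (3.1) is available. The following sketch (ours, with the hypothetical name genvar_lower_bound) uses a Cholesky factorization of R, which automatically satisfies \(P_j^TP_j=I_{n_j}\) because the diagonal blocks of R are identities, and forms the projections \(H_kH_k^{\dagger }P_{k+1}\) by least squares.

function lb = genvar_lower_bound(R, n)
% lower bound (3.6): R = P'*P via Cholesky, H_k = [P_1,...,P_k],
% and lb = prod_k ( 1 - || H_k H_k^+ P_{k+1} ||_2^2 )
  P = chol(R);                             % upper triangular, R = P'*P
  e = cumsum(n);  lb = 1;
  for k = 1:numel(n)-1
    Hk   = P(:, 1:e(k));                   % H_k
    Pk1  = P(:, e(k)+1:e(k+1));            % P_{k+1}
    proj = Hk * (Hk \ Pk1);                % orthogonal projection onto range(H_k)
    lb   = lb * (1 - norm(proj)^2);        % norm(.) is the spectral norm
  end
end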

An upper bound of \(\min \limits _{t\in \varSigma _m}\ \det (R(t))\) is given below. Let

$$\begin{aligned} \rho ^{(1)}&=\parallel R_{i_1,j_1}\parallel _2=\max \limits _{i\ne j}\parallel R_{i,j}\parallel _2,\\ \rho ^{(2)}&=\parallel R_{i_2,j_2}\parallel _2=\max \limits _{{\begin{array}{c} i\ne i_1\\ j\ne j_1 \end{array}}} \parallel R_{i,j}\parallel _2,\\&\vdots \\ \rho ^{(k)}&=\parallel R_{i_k,j_k}\parallel _2=\max \limits _{{\begin{array}{c} i\ne i_1,\ldots ,i_{k-1}\\ j\ne j_1,\ldots ,j_{k-1} \end{array}}}\parallel R_{i,j}\parallel _2,\\ k&\equiv \left[ \frac{m}{2}\right] . \end{aligned}$$

Theorem 3.2

$$\begin{aligned} \min \limits _{t\in \varSigma _m}\ \det (R(t))\le \prod \limits _{j=1}^k(1-\rho ^{(j)^2}). \end{aligned}$$
(3.7)

Proof

Without loss of generality, we suppose that

$$\begin{aligned} (i_1,j_1)=(1,2),\ \ (i_s,j_s)=(2s-1,2s),\quad \ s=2,3,\ldots ,k. \end{aligned}$$

Let \(t_{2s-1}\) and \(t_{2s}\) be, respectively, the unit left and right singular vectors corresponding to the largest singular value of \(R_{2s-1,2s}\). Then

$$\begin{aligned} R_{2s-1,2s}t_{2s}=\parallel R_{2s-1,2s}\parallel _2t_{2s-1},\ \ \ s=1,2,\ldots ,k. \end{aligned}$$

The Hadamard inequality implies that

$$\begin{aligned} \min \limits _{t\in \varSigma _m}\ \det (R(t))&\le \prod \limits _{s=1}^k\det \left( \begin{array}{cc} 1&{}t_{2s-1}^TR_{2s-1,2s}t_{2s}\\ t_{2s-1}^TR_{2s-1,2s}t_{2s}&{}1 \end{array}\right) \\&=\prod \limits _{s=1}^k\left( 1-\rho ^{(s)^2}\right) . \end{aligned}$$

\(\square \)
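The upper bound (3.7) can be computed with a greedy selection of the pairs \((i_s,j_s)\). The following sketch (ours, with the hypothetical name genvar_upper_bound, reusing blk_idx from the sketch after (1.4)) reads the definition of \(\rho ^{(s)}\) as a choice over disjoint pairs of blocks, as in the proof above.

function ub = genvar_upper_bound(R, n)
% upper bound (3.7): greedily pick disjoint pairs (i_s, j_s) of blocks with the
% largest spectral norm ||R_{i,j}||_2 and multiply the factors 1 - rho^(s)^2
  idx = blk_idx(n);  m = numel(n);
  free = true(1, m);  ub = 1;
  for s = 1:floor(m/2)
    best = -1;  bi = 0;  bj = 0;
    for i = find(free)
      for j = find(free)
        if i < j && norm(R(idx{i}, idx{j})) > best
          best = norm(R(idx{i}, idx{j}));  bi = i;  bj = j;
        end
      end
    end
    free([bi, bj]) = false;
    ub = ub * (1 - best^2);
  end
end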

From the viewpoint of practical applications, the covariance matrix A is usually unknown, so A must be replaced by a sample covariance matrix; consequently, a perturbed correlation matrix R is used in Genvar.

If \(\rho _{min}\equiv \min \limits _{t\in \varSigma _m}\det \left( R(t)\right) \) is very small, then a perturbation of R may result in a large relative perturbation of \(\rho _{min}\). The inequality (3.7) means that if any of the \(\rho ^{(s)}\) is close to 1, then \(\rho _{min}\) will be small. We note that \(\rho ^{(s)}\) is the (largest) canonical correlation between \(y_{2s-1}\) and \(y_{2s}\) [1]. Furthermore, by applying the interlacing property of the eigenvalues of a real symmetric matrix, it is easy to see that \(1-\rho ^{(s)}\) is an upper bound of the smallest eigenvalue of R.

4 Starting point strategies

Although they enjoy monotone convergence, neither Algorithm 2.1 nor Algorithm 2.2 can guarantee convergence to a global solution of (1.4). So in this section we consider a preconditioning technique to improve the performance of Algorithms 2.1 and 2.2. The idea of the preconditioning technique is to choose a good approximate solution \(t^{(0)}\) of (1.4) as a starting point for both algorithms. Since Algorithms 2.1 and 2.2 are monotone, and

$$\begin{aligned} \rho (t^{(0)})-\rho (t^{(k)})=\sum \limits _{i=1}^{k}(\rho (t^{(i-1)})-\rho (t^{(i)})), \end{aligned}$$
(4.1)

we see that if \(\rho (t^{(0)})\) is already close to the optimal objective function value of Genvar, then (4.1) implies that the sequence \(\{\rho (t^{(i-1)})-\rho (t^{(i)})\}\) converges quickly to zero, and therefore a carefully selected \(t^{(0)}\) increases the rate of convergence. On the other hand, if \(t^{*}\) is an accumulation point of \(\{t^{(k)}\}\), then

$$\begin{aligned} \rho (t^{*})\le \rho (t^{(0)}). \end{aligned}$$

This result implies that some stationary points of (1.4) can be ruled out as limiting points of Algorithms 2.1 and 2.2. In this sense, a good starting point \(t^{(0)}\) can increase the probability of finding a global solution.

Let the eigenvalues of R(t) be \(\lambda _1(t)\ge \ldots \ge \lambda _m(t)\). Then

$$\begin{aligned}&\lambda _1(t)+\cdots +\lambda _m(t)=m, \end{aligned}$$
(4.2)
$$\begin{aligned}&\det (R(t))=\prod \limits _{j=1}^{m}\lambda _j(t), \end{aligned}$$
(4.3)
$$\begin{aligned}&\lambda _{min}(R)\le \lambda _j(t)\le \lambda _{max}(R). \end{aligned}$$
(4.4)

Noticing that the maximum of \(\prod \limits _{j=1}^ms_j\) subject to \(s_1+\cdots +s_m=m\) and \(s_j\ge 0\) is attained at \(s_1=\cdots =s_m=1\), we see from (4.2) and (4.3) that spreading the eigenvalues of R(t) away from one tends to decrease \(\det (R(t))\). We therefore consider the following two starting point strategies, whose aim is to make \(\lambda _1(t)\) as large as possible (Strategy 4.1) or \(\lambda _m(t)\) as small as possible (Strategy 4.2).

Strategy 4.1

Compute the eigenvector \(y=\left( \begin{array}{c} y_1\\ \vdots \\ y_m \end{array} \right) \) corresponding to the largest eigenvalue \(\lambda _{max}(R)\) of R, then

$$\begin{aligned} t_i^{(0)}=\left\{ \begin{array}{cc} y_i/\parallel y_i\parallel _2,\ \ &{}if \parallel y_i\parallel _2>0,\\ u_i,&{}otherwise. \end{array}\right. \end{aligned}$$
(4.5)

Here, \(u_i\in \mathbb {R}^{n_i}\) is an arbitrary unit vector, \(\parallel u_i\parallel _2=1\).

Strategy 4.2

Compute the eigenvector \(z=\left( \begin{array}{c} z_1\\ \vdots \\ z_m \end{array} \right) \) corresponding to the smallest eigenvalue \(\lambda _{min}(R)\) of R, then

$$\begin{aligned} t_i^{(0)}=\left\{ \begin{array}{cc} z_i/\parallel z_i\parallel _2,\ \ &{}if \parallel z_i\parallel _2>0,\\ \nu _i,&{}otherwise. \end{array}\right. \end{aligned}$$
(4.6)

Here, \(\nu _i\in \mathbb {R}^{n_i}\) is an arbitrary unit vector, \(\parallel \nu _i\parallel _2=1\).

We note that the following results have been proved by Kettenring [3].

$$\begin{aligned} \min \limits _{t\in \varSigma _m}\lambda _m(t)= & {} \lambda _{min}(R),\nonumber \\ \max \limits _{t\in \varSigma _m}\lambda _1(t)= & {} \lambda _{max}(R). \end{aligned}$$
(4.7)

These results imply that the \(t^{(0)}\) generated by Strategy 4.1 is a solution of the so-called Maxvar criterion, and the \(t^{(0)}\) generated by Strategy 4.2 is a solution of the so-called Minvar criterion. Since Maxvar, Minvar and Genvar are all generalizations of CCA, intuitively \(t^{(0)}\) should be a good approximate solution of the Genvar criterion.
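Both strategies require only one extreme eigenpair of R. A minimal MATLAB sketch is given below (ours; the name genvar_start is hypothetical, blk_idx is the helper introduced after (1.4), and a zero block of the eigenvector is replaced by an arbitrary unit vector as in (4.5)–(4.6)).

function t0 = genvar_start(R, n, strategy)
% Strategy 4.1 ('4.1'): block-normalize an eigenvector for lambda_max(R);
% Strategy 4.2 ('4.2'): block-normalize an eigenvector for lambda_min(R).
  [V, D] = eig((R + R') / 2);
  [~, p] = sort(diag(D), 'descend');
  if strcmp(strategy, '4.1')
    v = V(:, p(1));                        % eigenvector for the largest eigenvalue
  else
    v = V(:, p(end));                      % eigenvector for the smallest eigenvalue
  end
  idx = blk_idx(n);  t0 = v;
  for i = 1:numel(n)
    vi = v(idx{i});
    if norm(vi) > 0
      t0(idx{i}) = vi / norm(vi);
    else
      t0(idx{i}) = [1; zeros(n(i)-1, 1)];  % arbitrary unit vector, cf. (4.5)-(4.6)
    end
  end
end

The resulting \(t^{(0)}\) can then be passed directly to the sketches of Algorithms 2.1 and 2.2 given in Sect. 2.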

If \(\lambda _{min}(R)\) is small, Strategy 4.2 should be employed; if \(\lambda _{max}(R)\) is large, Strategy 4.1 should be employed. We now present some estimates of \(\lambda _{max}(R)\) and \(\lambda _{min}(R)\).

Proposition 4.1

Let \(Rx=\lambda _{max}(R)x\), \(\parallel x\parallel _2=1\), \(x=\left( \begin{array}{c}x_1\\ \vdots \\ x_m\end{array}\right) \), \(x_j\in \mathbb {R}^{n_j}\). Then

$$\begin{aligned} \lambda _{max}(R)\le \left( \sum \limits _{j=1}^m\parallel x_j\parallel _2\right) ^2. \end{aligned}$$

Proof

Consider the decomposition of R (block Cholesky decomposition)

$$\begin{aligned} R=L^TL,\ \ L=[L_1,\ldots ,L_m],\quad L_i\in \mathbb {R}^{n\times n_i}. \end{aligned}$$

Then,

$$\begin{aligned} \lambda _{max}(R)&=x^TRx\\&=\parallel Lx\parallel _2^2\\&=\left( \parallel \sum \limits _{j=1}^mL_jx_j\parallel _2\right) ^2\\&\le \left( \sum \limits _{j=1}^m\parallel x_j\parallel _2\right) ^2. \end{aligned}$$

\(\square \)

The proof of Proposition 4.1 shows that, unless \(L_jx_j\approx L_1x_1,\ \ j=2,\ldots ,m\), \(\lambda _{max}(R)\) is not very close to m. This seems to imply that Strategy 4.2 is usually better than Strategy 4.1. The following result is a consequence of the interlacing property of the eigenvalues of a real symmetric matrix.

Proposition 4.2

$$\begin{aligned} \lambda _{min}(R)\le 1-\rho _{max}^2. \end{aligned}$$

Here, \(\rho _{max}=\max \limits _{i\ne j}\parallel R_{ij}\parallel _2\).

5 Numerical Examples

In this section, we present numerical experiments with Algorithms 2.1 and 2.2 combined with Strategies 4.1 and 4.2 to show their efficiency and, most importantly, the effectiveness of Strategies 4.1 and 4.2 both in reducing the number of iterations and in seeking the global minimum of Genvar. All of our tests were conducted in MATLAB on a PC with an Intel(R) Pentium(R) 4 processor at 3.20 GHz. The default value of \(\varepsilon \) is \(10^{-5}\).

For our comparison, we first consider two small examples, Examples 5.1 and 5.2. Then, in Examples 5.3 and 5.4, the matrices are randomly generated and of size \(100\times 100\), and we ran both algorithms for each matrix starting from 100 random initial points. The test results are documented in Tables 1, 2 and 3, respectively. The columns “\(Avg.Iter\sharp \)” give the average numbers of iterations needed to meet the stopping criterion. The columns “Avg.Time” give the average CPU times (in seconds, measured by the MATLAB function cputime), including the cost of computing the initial point. The columns “\(\bar{\rho }_{Genvar}\)” give the average objective function values. The columns “\(Total\ Global\%\)” give the sample probabilities, out of all converged objective function values for a fixed A, of hitting the best value we observed.

First, we list the related algorithms below:

  • Gauss-R: Algorithm 2.1 with random starting point.

  • Gauss-4.1: Algorithm 2.1 with Strategy 4.1.

  • Gauss-4.2: Algorithm 2.1 with Strategy 4.2.

  • AVM-R: Algorithm 2.2 with random starting point.

  • AVM-4.1: Algorithm 2.2 with Strategy 4.1.

  • AVM-4.2: Algorithm 2.2 with Strategy 4.2.

Example 5.1

The matrix X is given by:

We centered each column of X to obtain a matrix \(\hat{X}\). Then the matrix A was formed as \(A=\hat{X}^T\hat{X}\). Here, \(m=4, n_1=4, n_2=3, n_3=4, n_4=3\). This example, which is based on data from sensory evaluation of port wines, was also used by Hanafi and Kiers [6].

Example 5.2

The matrix A is given by

with \(m=3, n_1=3, n_2=3, n_3=4\).

The numerical results for Examples 5.1 and 5.2 are documented in Table 1. These results show that, for these examples, Strategies 4.1 and 4.2 improved the performance of Algorithms 2.1 and 2.2 in terms of iteration numbers and CPU time.

Table 1 Numerical results for Examples 5.1 and 5.2

Example 5.3

In this example, we randomly generated 100 matrices of size \(100\times 100\), with \(m = 4,\ n_1=10,\ n_i=n_1+(i-1)10,\ i=2,3,4\). These matrices were constructed by the following MATLAB code.

$$\begin{aligned} C&=rand(100,100);\\ [Q,R]&=qr(C);\\ \varLambda&=diag(rand(1,100));\\ A&=Q*\varLambda *Q'. \end{aligned}$$

For each matrix A, we ran Algorithms 2.1 and 2.2 starting from 100 randomly generated initial points. Thus for each A, we obtained 204 objective function values. The numerical results are documented in Table 2.

Table 2 Numerical results for Example 5.3

The results in Table 2 show that Strategy 4.2 significantly improved the performance of Algorithms 2.1 and 2.2 in terms of iterative steps, CPU time and “Total Global”. Strategy 4.2 is much better than Strategy 4.1. In addition, for this example, Algorithm 2.1 is faster than Algorithm 2.2.

Example 5.4

In this example, we randomly generated 100 matrices of size \(100\times 100\), with \(m = 4,\ n_1=10,\ n_i=n_1+(i-1)10,\ i=2,3,4\). These matrices were constructed by the following MATLAB code.

$$\begin{aligned} C&=rand(100,100);\\ [Q,R]&=qr(C);\\ \varLambda&=diag([100*rand(1,20),rand(1,80)]);\\ A&=Q*\varLambda *Q'. \end{aligned}$$

The numerical results are documented in Table 3.

Table 3 Numerical results for Example 5.4

The results in Table 3 show that Strategy 4.1 significantly improved the performance of Algorithms 2.1 and 2.2 in terms of iterative steps, CPU time and “Total Global”. Strategy 4.1 is much better than Strategy 4.2.

Summarizing the results of Examples 5.1–5.4, we can make the following observations.

  (i)

    Generally, both the AVM and the Gauss–Seidel method converge to the global minimum with low probability if the starting point is randomly selected.

  (ii)

    Strategies 4.1 and 4.2 can significantly improve these iterative methods in terms of iterative steps and CPU time.

  (iii)

    Strategy 4.1 and/or Strategy 4.2 can significantly increase the probability of obtaining a “better” solution of Genvar.

  (iv)

    The Gauss–Seidel method frequently converges faster than the AVM in terms of CPU time.

6 Final Remarks

In this paper, the AVM algorithm proposed by Kettenring for the Genvar criterion is analysed. An inexact version of AVM, a Gauss–Seidel type method, is presented. We suggest two starting point strategies to improve these iterative methods in terms of the rate of convergence and the probability of finding a global solution of Genvar.

We also ran Examples 5.3 and 5.4 with larger m. The results show that, as m increases, the efficiency of Strategies 4.1 and 4.2 decreases. Hence, how to construct other starting point strategies is a topic for future research. We hope the results in Sect. 3 can shed light on this topic.