1 Introduction

In recent years, we have witnessed a burst of quaternion representations in many fields, including color imaging [9,10,11, 27, 39], signal processing [14, 15], robotics [12], rolling bearing fault diagnosis [42], quaternion convolutional neural networks (QCNNs) [29, 43], etc. Moreover, there have been some noticeable steps towards optimizing the corresponding quaternion-represented problems. Specifically, Qi et al. [31, 32] conducted a systematic study on quaternion matrix optimization, and Flamant et al. [16] proposed a general framework for constrained convex quaternion optimization. In terms of algorithms in the quaternion domain, affine projection algorithms [38] and learning algorithms [21] based on the gradient and Hessian have been proposed and analyzed. Hence, the increasing number of quaternion-represented applications and the studies on the associated optimization problems call for a deeper understanding of the quaternion structure that could lead to efficient solution methods. In this paper, we focus on one algebraic quaternion structure, matrix rank-one decomposition, and show that such a decomposition admits a stronger property than its counterparts in the real and complex domains by leveraging the intrinsic quaternion nature. We further show that this merit of the rank-one decomposition extends to some of its theoretical implications, such as the \({\mathcal {S}}\)-Procedure, the joint numerical range and quadratically constrained quadratic programming (QCQP) in the quaternion domain, improving the associated results in the real and complex domains.

The matrix rank-one decomposition that we discuss in this paper is a technique of decomposing a positive semidefinite Hermitian matrix into a sum of rank-one matrices satisfying the so-called equal inner product property, i.e., the inner product between some given matrices and each rank-one term in the decomposition has the same value. The first such decomposition was introduced by Sturm and Zhang in [36], with the equal inner product property valid for one matrix, and it was used as a key technique to establish the linear matrix inequality representation of a class of matrix cones whose quadratic forms are co-positive over the real domain. Moreover, such a decomposition technique was found to be useful in quadratic minimization [41] and in designing approximation algorithms for biquadratic optimization [25]. Soon after the work of [36], Huang and Zhang [20] extended the matrix rank-one decomposition to the complex domain such that the equal inner product property holds for two matrices. Interestingly, we find that this property remains valid for four matrices when the rank-one decomposition is conducted in the quaternion domain. Moreover, our proof is fully constructive, and the corresponding computational procedure is summarized in Algorithm 1.

Theoretical implications of our novel matrix decomposition technique are quite versatile and yield stronger results than those in the real and complex domains. The first two theoretical implications concern the joint numerical range and the \({\mathcal {S}}\)-Procedure, both of which have fundamental impact and wide applications in many fields. In particular, the joint numerical range is an important tool in linear algebra and convex analysis, and it is found to be useful in spectrum analysis [33] and quantum computing [13, 35]. The \({\mathcal {S}}\)-Procedure occupies a crucial position in the fields of robust optimization [2, 3, 17], statistics [19] and signal processing [26], among others [30]. With our matrix rank-one decomposition result, we manage to establish the convexity of the joint numerical range [4, 28] for five matrices and the losslessness of the \({\mathcal {S}}\)-Procedure for four Hermitian forms. As a comparison, similar results hold only for fewer matrices in the real and complex domains. In addition, our result can also be applied to quadratically constrained quadratic programming (QCQP). To be specific, when the number of quadratic constraints is no larger than 4, we show that a rank-one solution of the SDP relaxation of (QCQP), which hence yields an optimal solution of (QCQP), can be recovered from our matrix rank-one decomposition technique.

This paper is organized as follows. In Sect. 2, we introduce some notations and definitions used throughout this paper. Section 3 is devoted to the new quaternion matrix rank-one decomposition theorem. To showcase the capability of our new theorem, we present improved results on the joint numerical range in Sect. 4 and on the \({\mathcal {S}}\)-Procedure in Sect. 5. Finally, we show how to solve (QCQP) with our novel decomposition technique as another theoretical implication of our result in Sect. 6.

2 Preliminaries

In this section, we introduce some basic algebraic operations in the quaternion domain for scalars, vectors and matrices.

2.1 Quaternion operations for scalars

We define the set of quaternions \({\mathbb {H}}\) as a 4-dimensional normed division algebra over the real numbers \({\mathbb {R}}\). It has a canonical basis \(\{1, {\varvec{i}}, {\varvec{j}}, {\varvec{k}}\}\), where \({\varvec{i}}, {\varvec{j}}, {\varvec{k}}\) are imaginary units such that

$$\begin{aligned} {\varvec{i}}^2={\varvec{j}}^2={\varvec{k}}^2={\varvec{i}}{\varvec{j}}{\varvec{k}}=-1,\quad {\varvec{i}}{\varvec{j}}=-{\varvec{j}}{\varvec{i}}={\varvec{k}}. \end{aligned}$$
(1)

Then, any quaternion \(q \in {\mathbb {H}}\) can be written as

$$\begin{aligned} q = q_a + q_b{\varvec{i}}+ q_c{\varvec{j}}+ q_d{\varvec{k}}, \end{aligned}$$

where \(q_a,q_b,q_c,q_d \in {\mathbb {R}}\) are the components of q. The real and imaginary parts of q are denoted as \(\mathrm{Re~}q=q_a\) and \(\mathrm{Im~}q=q_b{\varvec{i}}+ q_c{\varvec{j}}+ q_d{\varvec{k}}\), respectively. Note that, in contrast with the product operation in the real and complex domains, the product operation in the quaternion domain is noncommutative, i.e., \(q \cdot p \ne p \cdot q\) in general for \(p, q \in {\mathbb {H}}\).

We denote by \({\overline{q}}=\mathrm{Re~}q-\mathrm{Im~}q\) the quaternion conjugate of q, and it holds that \(\overline{(p \cdot q)}={\overline{q}} \cdot {\overline{p}}\). For a given quaternion \(q \in {\mathbb {H}}\), |q| denotes its modulus and can be expressed as

$$\begin{aligned} |q|=\sqrt{q \cdot {\overline{q}}}=\sqrt{{\overline{q}} \cdot q}=\sqrt{q_a^2+q_b^2+q_c^2+q_d^2}. \end{aligned}$$

At last, any non-zero quaternion q has an inverse \(q^{-1}={\overline{q}}/|q|^2\) and the inverse of the product of two quaternions is \((p \cdot q)^{-1}=q^{-1} \cdot p^{-1}\).
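As a small added illustration (a sketch of our own, not part of the paper), these scalar operations can be coded directly by storing a quaternion as a length-4 real array \((q_a,q_b,q_c,q_d)\); leading axes then give quaternion vectors and matrices, a layout the later sketches reuse:

```python
import numpy as np

# A quaternion q = q_a + q_b*i + q_c*j + q_d*k is stored as a length-4 real
# array (q_a, q_b, q_c, q_d); leading axes give quaternion vectors/matrices.

def qmul(p, q):
    """Hamilton product p*q (noncommutative), broadcast over leading axes."""
    pa, pb, pc, pd = np.moveaxis(p, -1, 0)
    qa, qb, qc, qd = np.moveaxis(q, -1, 0)
    return np.stack([pa*qa - pb*qb - pc*qc - pd*qd,
                     pa*qb + pb*qa + pc*qd - pd*qc,
                     pa*qc - pb*qd + pc*qa + pd*qb,
                     pa*qd + pb*qc - pc*qb + pd*qa], axis=-1)

def qconj(q):
    """Quaternion conjugate: negate the three imaginary components."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qabs(q):
    """Modulus |q| = sqrt(q_a^2 + q_b^2 + q_c^2 + q_d^2)."""
    return np.sqrt((q * q).sum(axis=-1))

def qinv(q):
    """Inverse of a nonzero quaternion: conj(q) / |q|^2."""
    return qconj(q) / (q * q).sum(axis=-1, keepdims=True)
```

For example, `qmul` applied to \({\varvec{i}}\) and \({\varvec{j}}\) (i.e., `[0,1,0,0]` and `[0,0,1,0]`) returns \({\varvec{k}}\) (`[0,0,0,1]`), while swapping the arguments returns \(-{\varvec{k}}\), matching (1) and the noncommutativity noted above.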

2.2 Quaternion vectors and quaternion matrices

Similar to the scalar case, a quaternion vector \({\varvec{q}}\in {\mathbb {H}}^n\) can be written as

$$\begin{aligned} {\varvec{q}}= {\varvec{q}}_a + {\varvec{q}}_b{\varvec{i}}+ {\varvec{q}}_c{\varvec{j}}+ {\varvec{q}}_d{\varvec{k}}, \end{aligned}$$

where \({\varvec{q}}_a,{\varvec{q}}_b,{\varvec{q}}_c,{\varvec{q}}_d \in {\mathbb {R}}^n\) are the components of \({\varvec{q}}\). For a quaternion vector \({\varvec{q}}\), \({\varvec{q}}^{\top }\) denotes the transpose of \({\varvec{q}}\), and \({\varvec{q}}^{H}=(\overline{{\varvec{q}}})^{\top } = \overline{({\varvec{q}}^{\top })}\) denotes its conjugate transpose. Any quaternion matrix \(A\in {\mathbb {H}}^{m\times n}\) can be expressed as

$$\begin{aligned} A = A_a+A_b{\varvec{i}}+A_c{\varvec{j}}+A_d{\varvec{k}}\end{aligned}$$

with \(A_a,A_b,A_c,A_d \in {\mathbb {R}}^{m\times n}\). The transpose and the conjugate transpose of A are \(A^{\top }\) and \(A^H=({\overline{A}})^{\top } = \overline{(A^{\top })}\), respectively. For two quaternion matrices \(A \in {\mathbb {H}}^{m \times n}\) and \(B \in {\mathbb {H}}^{n \times p}\), we have \((AB)^H = B^HA^H\); however, due to the noncommutativity of the product operation, in general \((AB)^\top \not = B^\top A^\top \) and \(\overline{(AB)} \not = {\overline{A}} \ {\overline{B}}\). For two vectors \({\varvec{q}}, {\varvec{p}}\in {\mathbb {H}}^n\), their inner product is defined as

$$\begin{aligned} {\varvec{q}}\bullet {\varvec{p}}= \mathrm{Re~}({\varvec{q}}^H{\varvec{p}}) = {\varvec{q}}_a^\top {\varvec{p}}_a+{\varvec{q}}_b^\top {\varvec{p}}_b+{\varvec{q}}_c^\top {\varvec{p}}_c+{\varvec{q}}_d^\top {\varvec{p}}_d. \end{aligned}$$

Similarly, for two matrices \(A ,B \in {\mathbb {H}}^{n \times n}\), their inner product is defined as

$$\begin{aligned} A \bullet B := \mathrm{Re~}(\mathrm{tr}\, A^HB)=\mathrm{tr}\, (A_a^\top B_a + A_b^\top B_b + A_c^\top B_c + A_d^\top B_d), \end{aligned}$$

where ‘tr’ denotes the trace of a matrix.
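Continuing the added sketch from Sect. 2.1 (the stacked component layout is our assumption, not the paper's notation), matrix-vector products, quadratic forms and the inner products above can be coded directly; note that both inner products reduce to a sum of elementwise products over all four real components:

```python
def qmatvec(A, x):
    """(A x)_i = sum_j A[i, j] * x[j] (Hamilton products, order preserved).
    A has shape (m, n, 4) and x has shape (n, 4)."""
    return qmul(A, x[None, :, :]).sum(axis=1)

def qform(x, A, y):
    """x^H A y as a quaternion scalar of shape (4,); real when A = A^H, x = y."""
    return qmul(qconj(x), qmatvec(A, y)).sum(axis=0)

def inner(P, Q):
    """Real inner product P . Q = Re(tr P^H Q): the sum of the elementwise
    products of all four real components (valid for vectors and matrices)."""
    return float((P * Q).sum())
```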

2.3 Hermitian and positive semidefinite matrices

We call X a quaternion Hermitian matrix if it satisfies \(X = X^H\). The set of \(n\times n\) quaternion Hermitian matrices is denoted by \({\mathcal {H}}^{n}\). Then \({\varvec{u}}^HX{\varvec{u}}\) is real for all \({\varvec{u}}\in {\mathbb {H}}^n\) if and only if \(X \in {\mathcal {H}}^{n}\). We denote by \({\mathcal {S}}_{+}^{n}\) (\({\mathcal {S}}_{++}^{n}\)), \({\mathcal {C}}_{+}^{n}\) (\({\mathcal {C}}_{++}^{n}\)) and \({\mathcal {H}}_{+}^{n}\) (\({\mathcal {H}}_{++}^{n}\)) the cones of real symmetric positive semidefinite (positive definite), complex Hermitian positive semidefinite (positive definite) and quaternion Hermitian positive semidefinite (positive definite) matrices, respectively. The notation \(X \succeq 0\) (\(X \succ 0\)) means that X is positive semidefinite (positive definite). Then, for any matrix \(X \succeq 0\) (\(X \succ 0\)), the quantity \({\varvec{u}}^HX{\varvec{u}}\) is real and nonnegative (positive) for all \({\varvec{u}}\in {\mathbb {H}}^n\) (\(0 \ne {\varvec{u}}\in {\mathbb {H}}^n\)).

3 The quaternion matrix rank-one decomposition method

In this section, we discuss a particular rank-one decomposition of matrices over the quaternion domain such that the inner product between some given matrices and each rank-one term in the decomposition has the same value.

In the real domain, Sturm and Zhang [36] showed that for a rank-r symmetric positive semidefinite matrix \(Y \in {\mathcal {S}}_+^{n}\) and a real symmetric matrix \(B \in {\mathcal {S}}^{n}\), there is a rank-one decomposition of Y such that:

$$\begin{aligned} Y = \sum _{i=1}^r{\varvec{y}}_i{\varvec{y}}_i^\top ~~\mathrm{and}~~{\varvec{y}}_i^\top B{\varvec{y}}_i= ({\varvec{y}}_i{\varvec{y}}_i^\top ) \bullet B = \frac{B \bullet Y}{r},~~\text{ for }~i=1,2,\cdots ,r. \end{aligned}$$

Subsequently, Huang and Zhang [20] proved that the above decomposition could hold for two matrices in the complex domain. In particular, suppose \(Z\in {\mathcal {C}}_+^n\) is a complex Hermitian positive semidefinite matrix of rank r, and \(C_1,C_2 \in {\mathcal {C}}^n\) are two given complex Hermitian matrices. Then, there is a rank-one decomposition of Z such that:

$$\begin{aligned} Z = \sum _{i=1}^r{\varvec{z}}_i{\varvec{z}}_i^H~~\mathrm{and}~~{\varvec{z}}_i^H C_k{\varvec{z}}_i = \frac{C_k \bullet Z}{r}, ~~\text{ for }~i=1,2,\cdots ,r~\mathrm{and}~k=1,2. \end{aligned}$$

Thus, a natural question arises: can a similar decomposition hold for more matrices over the quaternion domain?

3.1 The main result

The following theorem is the main result of this paper and gives an affirmative answer to the question above.

Theorem 1

Let \(A_k \in {\mathcal {H}}^{n}\), \(k = 1, 2, 3, 4\), and let \(X \succeq 0\) be a rank-\(r\) quaternion Hermitian matrix. There exists a rank-one decomposition of X such that

$$\begin{aligned} X = \sum _{i=1}^r{\varvec{x}}_i{\varvec{x}}_i^H~~\mathrm{and}~~{\varvec{x}}_i^HA_k{\varvec{x}}_i = \frac{A_k \bullet X}{r},~~{\mathrm{for}}~~i = 1,\cdots ,r \end{aligned}$$
(2)

and \(k = 1, 2, 3, 4\).

Proof

We start with a weaker version of (2) such that it holds only for two matrices \(A_1\) and \(A_2\), i.e., there exists a rank-one decomposition of X such that:

$$\begin{aligned} X = \sum _{i=1}^r{\varvec{u}}_i{\varvec{u}}_i^H ~~\mathrm{and}~~ {\varvec{u}}_i^HA_k{\varvec{u}}_i = \frac{A_k \bullet X}{r}, ~\mathrm{for}~ i = 1,\cdots ,r~\text{ and }~ k = 1, 2 \end{aligned}$$
(3)

by an argument similar to that of Theorem 2.1 in [20]. The rest of the proof consists of two steps.

Step 1: prove (2) for \(k=1,2,3\)

The conclusion follows if the decomposition in (3) is also valid for matrix \(A_3\), i.e., \({\varvec{u}}_i^HA_3{\varvec{u}}_i = \frac{A_3 \bullet X}{r}\), \(i = 1,\cdots ,r\). Otherwise, without loss of generality, there exist two vectors \({\varvec{u}}_1\) and \({\varvec{u}}_2\) such that

$$\begin{aligned} {\varvec{u}}_1^HA_3{\varvec{u}}_1 > \frac{A_3 \bullet X}{r} \ \mathrm{and} \ {\varvec{u}}_2^HA_3{\varvec{u}}_2 < \frac{A_3 \bullet X}{r}. \end{aligned}$$

Denote \({\varvec{u}}_1^HA_1{\varvec{u}}_2 = a_1 + b_1{\varvec{i}}+ c_1{\varvec{j}}+ d_1{\varvec{k}}\) and \({\varvec{u}}_1^HA_2{\varvec{u}}_2 = a_2 + b_2{\varvec{i}}+ c_2{\varvec{j}}+ d_2{\varvec{k}}\). Let \(\omega = \omega _a + \omega _b{\varvec{i}}+ \omega _c{\varvec{j}}+ \omega _d{\varvec{k}}\in {\mathbb {H}}\) and construct

$$\begin{aligned} {\varvec{v}}_1 = \frac{{\varvec{u}}_1\omega + {\varvec{u}}_2}{\sqrt{1 + |\omega |^2}} \ , \ {\varvec{v}}_2 = \frac{-{\varvec{u}}_1 + {\varvec{u}}_2\overline{\omega }}{\sqrt{1 + |\omega |^2}}. \end{aligned}$$

Then, it is easy to verify that

$$\begin{aligned}&{\varvec{v}}_1{\varvec{v}}_1^H + {\varvec{v}}_2{\varvec{v}}_2^H \nonumber \\&\quad = \frac{1}{1 + |\omega |^2}\left( ({\varvec{u}}_1\omega + {\varvec{u}}_2)({\varvec{u}}_1\omega + {\varvec{u}}_2)^H\right) + \frac{1}{1 + |\omega |^2}\left( (-{\varvec{u}}_1 + {\varvec{u}}_2\overline{\omega })(-{\varvec{u}}_1 + {\varvec{u}}_2\overline{\omega })^H\right) \nonumber \\&\quad = \frac{1}{1 + |\omega |^2}\big (({\varvec{u}}_1\omega \overline{\omega }{\varvec{u}}_1^H + {\varvec{u}}_1\omega {\varvec{u}}_2^H + {\varvec{u}}_2\overline{\omega }{\varvec{u}}_1^H + {\varvec{u}}_2{\varvec{u}}_2^H) \nonumber \\&\qquad + ({\varvec{u}}_1{\varvec{u}}_1^H - {\varvec{u}}_1\omega {\varvec{u}}_2^H - {\varvec{u}}_2\overline{\omega }{\varvec{u}}_1^H + {\varvec{u}}_2\overline{\omega }\omega {\varvec{u}}_2^H)\big ) \nonumber \\&\quad = \frac{|\omega |^2{\varvec{u}}_1{\varvec{u}}_1^H + {\varvec{u}}_2{\varvec{u}}_2^H + {\varvec{u}}_1{\varvec{u}}_1^H + |\omega |^2{\varvec{u}}_2{\varvec{u}}_2^H}{1 + |\omega |^2} \nonumber \\&\quad = {\varvec{u}}_1{\varvec{u}}_1^H + {\varvec{u}}_2{\varvec{u}}_2^H. \end{aligned}$$
(4)

Note that the identity above holds for any \(\omega \), which will allow us some flexibility to find an appropriate \(\omega \) such that

$$\begin{aligned} (1 + |\omega |^2){\varvec{v}}_1^HA_k{\varvec{v}}_1=(1 + |\omega |^2)\frac{A_k \bullet X}{r},~~\text{ for }~~k=1,2. \end{aligned}$$
(5)

In particular, observe that

$$\begin{aligned}&(1 + |\omega |^2){\varvec{v}}_1^HA_k{\varvec{v}}_1 \nonumber \\&\quad =(\overline{\omega }{\varvec{u}}_1^H + {\varvec{u}}_2^H)A_k({\varvec{u}}_1\omega + {\varvec{u}}_2) \nonumber \\&\quad = \overline{\omega }{\varvec{u}}_1^HA_k{\varvec{u}}_1\omega + {\varvec{u}}_2^HA_k{\varvec{u}}_2 + \overline{\omega }{\varvec{u}}_1^HA_k{\varvec{u}}_2 + {\varvec{u}}_2^HA_k{\varvec{u}}_1\omega \nonumber \\&\quad = (1 + |\omega |^2)\frac{A_k \bullet X}{r} + 2\mathrm{Re~}(\overline{\omega }{\varvec{u}}_1^HA_k{\varvec{u}}_2) \nonumber \\&\quad = (1 + |\omega |^2)\frac{A_k \bullet X}{r}, ~\text{ for } ~k=1,2 \end{aligned}$$
(6)

as long as

$$\begin{aligned}&\mathrm{Re~}(\overline{\omega }{\varvec{u}}_1^HA_k{\varvec{u}}_2) \nonumber \\&\quad = \mathrm{Re~}\left( (\omega _a - \omega _b{\varvec{i}}- \omega _c{\varvec{j}}- \omega _d{\varvec{k}})(a_k + b_k{\varvec{i}}+ c_k{\varvec{j}}+ d_k{\varvec{k}})\right) \nonumber \\&\quad = a_k\omega _a + b_k\omega _b + c_k\omega _c + d_k\omega _d=0. \end{aligned}$$
(7)

Therefore, any solution of the following equations

$$\begin{aligned} {\left\{ \begin{array}{ll} a_1\omega _a + b_1\omega _b + c_1\omega _c + d_1\omega _d = 0 \\ a_2\omega _a + b_2\omega _b + c_2\omega _c + d_2\omega _d = 0, \end{array}\right. } \end{aligned}$$
(8)

will make (5) valid. Since (8) is a homogeneous linear system of two equations in four unknowns, it always admits a nonzero solution, which can be normalized to a particular solution \((\omega _{a*}, \omega _{b*}, \omega _{c*}, \omega _{d*})\) with \(\omega _{a*}^2 + \omega _{b*}^2 + \omega _{c*}^2+ \omega _{d*}^2 =1\). Moreover, by homogeneity, \(\omega (\alpha ) = \alpha (\omega _{a*}+ \omega _{b*}{\varvec{i}}+ \omega _{c*}{\varvec{j}}+ \omega _{d*}{\varvec{k}})\) is also a solution of (8) with \( |\omega (\alpha )|=\alpha \) for any \(\alpha > 0\). To extend the validity of (5) to matrix \(A_3\) for some \(\alpha \), i.e.,

$$\begin{aligned} (1 + \alpha ^2){\varvec{v}}_1^HA_3{\varvec{v}}_1= (1 + |\omega (\alpha )|^2){\varvec{v}}_1^HA_3{\varvec{v}}_1=(1 + \alpha ^2)\frac{A_3 \bullet X}{r}, \end{aligned}$$
(9)

we compute

$$\begin{aligned} \begin{aligned}&(1 + |\omega (\alpha )|^2){\varvec{v}}_1^HA_3{\varvec{v}}_1 \\&\quad = (\overline{\omega (\alpha )}{\varvec{u}}_1^H + {\varvec{u}}_2^H)A_3({\varvec{u}}_1\omega (\alpha ) + {\varvec{u}}_2) \\&\quad = \overline{\omega (\alpha )}{\varvec{u}}_1^HA_3{\varvec{u}}_1\omega (\alpha ) + {\varvec{u}}_2^HA_3{\varvec{u}}_2 + 2\mathrm{Re~}(\overline{\omega (\alpha )}{\varvec{u}}_1^HA_3{\varvec{u}}_2) \\&\quad = |\omega (\alpha )|^2{\varvec{u}}_1^HA_3{\varvec{u}}_1 + {\varvec{u}}_2^HA_3{\varvec{u}}_2 + 2\mathrm{Re~}(\overline{\omega (\alpha )}{\varvec{u}}_1^HA_3{\varvec{u}}_2)\\&\quad = \alpha ^2{\varvec{u}}_1^HA_3{\varvec{u}}_1 + {\varvec{u}}_2^HA_3{\varvec{u}}_2 + 2\mathrm{Re~}(\overline{\omega (\alpha )}{\varvec{u}}_1^HA_3{\varvec{u}}_2). \end{aligned} \end{aligned}$$
(10)

Combining (10) with \(\mathrm{Re~}(\overline{\omega (\alpha )}{\varvec{u}}_1^HA_3{\varvec{u}}_2)=\alpha \mathrm{Re~}(\overline{\omega ^*}{\varvec{u}}_1^HA_3{\varvec{u}}_2)\), the identity (9) that we want to prove is equivalent to

$$\begin{aligned} \left( {\varvec{u}}_1^HA_3{\varvec{u}}_1 - \frac{A_3 \bullet X}{r}\right) \alpha ^2 + 2\iota \cdot \alpha + \left( {\varvec{u}}_2^HA_3{\varvec{u}}_2 - \frac{A_3 \bullet X}{r}\right) = 0, \end{aligned}$$
(11)

where \(\iota = \mathrm{Re~}(\overline{\omega ^*}{\varvec{u}}_1^HA_3{\varvec{u}}_2)\) and \(\omega ^*=\omega _{a*}+\omega _{b*}{\varvec{i}}+\omega _{c*}{\varvec{j}}+\omega _{d*}{{\varvec{k}}}\). Then (11) is just a quadratic equation in \(\alpha \), and it must have two real roots with opposite signs, since its leading coefficient is positive by \({\varvec{u}}_1^HA_3{\varvec{u}}_1 > \frac{A_3 \bullet X}{r}\) and its constant term is negative by \({\varvec{u}}_2^HA_3{\varvec{u}}_2 < \frac{A_3 \bullet X}{r}\). Let \(\alpha _*\) be the positive root of (11) and construct \(\omega _* = \alpha _*(\omega _{a*} + \omega _{b*}{\varvec{i}}+ \omega _{c*}{\varvec{j}}+ \omega _{d*}{\varvec{k}})\). Then (9) holds with \(\alpha = \alpha _*\), and thus \({\varvec{v}}_1^HA_k{\varvec{v}}_1 = \frac{A_k \bullet X}{r}\) for \(k = 1, 2, 3\). Since \(X \overset{(3)}{=} \sum _{i=1}^r{\varvec{u}}_i{\varvec{u}}_i^H\), we have

$$\begin{aligned} X - {\varvec{v}}_1{\varvec{v}}_1^H \overset{(4)}{=} \sum _{j = 3}^r{\varvec{u}}_j{\varvec{u}}_j^H + {\varvec{v}}_2{\varvec{v}}_2^H \succeq 0. \end{aligned}$$

Therefore, rank\((X - {\varvec{v}}_1{\varvec{v}}_1^H) = r -1\), \({\varvec{u}}_i^HA_k{\varvec{u}}_i \overset{(3)}{=} \frac{A_k \bullet X}{r}\), \(i = 3,\cdots ,r\) for \(k = 1, 2\), and

$$\begin{aligned} A_k \bullet (X - {\varvec{v}}_1{\varvec{v}}_1^H) = A_k \bullet X - \frac{A_k \bullet X}{r} = (r-1)\,\frac{A_k \bullet X}{r},~~k = 1, 2, 3. \end{aligned}$$

Then, we can recursively repeat the above process on \(X - {\varvec{v}}_1{\varvec{v}}_1^H\) and finally obtain a rank-one decomposition of X such that

$$\begin{aligned} X = \sum _{i=1}^r{\varvec{v}}_i{\varvec{v}}_i^H~~\mathrm{and}~~{\varvec{v}}_i^HA_k{\varvec{v}}_i = \frac{A_k \bullet X}{r},~\text{ for }~i = 1,\cdots ,r~\text{ and }~k=1,2,3. \end{aligned}$$
(12)

Step 2: prove (2) for \(k=1,2,3,4\)

The conclusion follows if the decomposition in (12) is also valid for matrix \(A_4\), i.e., \({\varvec{v}}_i^HA_4{\varvec{v}}_i = \frac{A_4 \bullet X}{r}\), \(i = 1,\cdots ,r\). Otherwise, without loss of generality, there exist two vectors \({\varvec{v}}_1\) and \({\varvec{v}}_2\) such that

$$\begin{aligned} {\varvec{v}}_1^HA_4{\varvec{v}}_1 > \frac{A_4 \bullet X}{r} \ \mathrm{and} \ {\varvec{v}}_2^HA_4{\varvec{v}}_2 < \frac{A_4 \bullet X}{r}. \end{aligned}$$

Denote \({\varvec{v}}_1^HA_1{\varvec{v}}_2 = {\hat{a}}_1 + {\hat{b}}_1{\varvec{i}}+ {\hat{c}}_1{\varvec{j}}+ {\hat{d}}_1{\varvec{k}}\), \({\varvec{v}}_1^HA_2{\varvec{v}}_2 = {\hat{a}}_2 + {\hat{b}}_2{\varvec{i}}+ {\hat{c}}_2{\varvec{j}}+ {\hat{d}}_2{\varvec{k}}\) and \({\varvec{v}}_1^HA_3{\varvec{v}}_2 = {\hat{a}}_3 + {\hat{b}}_3{\varvec{i}}+ {\hat{c}}_3{\varvec{j}}+ {\hat{d}}_3{\varvec{k}}\). Let \({\hat{\omega }} = {\hat{\omega }}_a + {\hat{\omega }}_b{\varvec{i}}+ {\hat{\omega }}_c{\varvec{j}}+ {\hat{\omega }}_d{\varvec{k}}\in {\mathbb {H}}\) and construct

$$\begin{aligned} {\varvec{x}}_1 = \frac{{\varvec{v}}_1{\hat{\omega }} + {\varvec{v}}_2}{\sqrt{1 + |{\hat{\omega }}|^2}} \ , \ {\varvec{x}}_2 = \frac{-{\varvec{v}}_1 + {\varvec{v}}_2\overline{{\hat{\omega }}}}{\sqrt{1 + |{\hat{\omega }}|^2}}. \end{aligned}$$

Then, similar to (4), we can verify that

$$\begin{aligned} {\varvec{x}}_1{\varvec{x}}_1^H + {\varvec{x}}_2{\varvec{x}}_2^H={\varvec{v}}_1{\varvec{v}}_1^H + {\varvec{v}}_2{\varvec{v}}_2^H \end{aligned}$$
(13)

holds for any \({\hat{\omega }}\). Moreover, when \(\mathrm{Re~}(\overline{{\hat{\omega }}}{\varvec{v}}_1^HA_k{\varvec{v}}_2) = 0\), or equivalently \({\hat{a}}_k{\hat{\omega }}_a + {\hat{b}}_k{\hat{\omega }}_b + {\hat{c}}_k{\hat{\omega }}_c + {\hat{d}}_k{\hat{\omega }}_d = 0\), it holds that

$$\begin{aligned} (1 + |{\hat{\omega }}|^2){\varvec{x}}_1^HA_k{\varvec{x}}_1=(1 + |{\hat{\omega }}|^2)\frac{A_k \bullet X}{r},~~\text{ for }~~k=1,2,3. \end{aligned}$$
(14)

In other words, any solution of the following equations

$$\begin{aligned} {\left\{ \begin{array}{ll} {\hat{a}}_1{\hat{\omega }}_a + {\hat{b}}_1{\hat{\omega }}_b + {\hat{c}}_1{\hat{\omega }}_c + {\hat{d}}_1{\hat{\omega }}_d = 0 \\ {\hat{a}}_2{\hat{\omega }}_a + {\hat{b}}_2{\hat{\omega }}_b + {\hat{c}}_2{\hat{\omega }}_c + {\hat{d}}_2{\hat{\omega }}_d = 0 \\ {\hat{a}}_3{\hat{\omega }}_a + {\hat{b}}_3{\hat{\omega }}_b + {\hat{c}}_3{\hat{\omega }}_c + {\hat{d}}_3{\hat{\omega }}_d = 0, \end{array}\right. } \end{aligned}$$
(15)

will make (14) valid. Since (15) is a homogeneous linear system of three equations in four unknowns, it always admits a nonzero solution; normalizing it, we can find a particular solution \(({\hat{\omega }}_{a*}, {\hat{\omega }}_{b*}, {\hat{\omega }}_{c*}, {\hat{\omega }}_{d*})\) of (15) with \({\hat{\omega }}_{a*}^2 + {\hat{\omega }}_{b*}^2 + {\hat{\omega }}_{c*}^2+ {\hat{\omega }}_{d*}^2 =1\) such that \({\hat{\omega }}({\hat{\alpha }}) = {\hat{\alpha }}({\hat{\omega }}_{a*}+ {\hat{\omega }}_{b*}{\varvec{i}}+ {\hat{\omega }}_{c*}{\varvec{j}}+ {\hat{\omega }}_{d*}{\varvec{k}})\) is also a solution of (15) with \( |{\hat{\omega }}({\hat{\alpha }})|={\hat{\alpha }}\) for any \({\hat{\alpha }} > 0\). To extend the validity of (14) to matrix \(A_4\) for some \({\hat{\alpha }}\), i.e.,

$$\begin{aligned} (1 + {\hat{\alpha }}^2){\varvec{x}}_1^HA_4{\varvec{x}}_1= (1 + |{\hat{\omega }}({\hat{\alpha }})|^2){\varvec{x}}_1^HA_4{\varvec{x}}_1=(1 + {\hat{\alpha }}^2)\frac{A_4 \bullet X}{r}, \end{aligned}$$
(16)

we compute

$$\begin{aligned} \begin{aligned}&(1 + |{\hat{\omega }}({\hat{\alpha }})|^2){\varvec{x}}_1^HA_4{\varvec{x}}_1 \\&\quad = (\overline{{\hat{\omega }}({\hat{\alpha }})}{\varvec{v}}_1^H + {\varvec{v}}_2^H)A_4({\varvec{v}}_1{\hat{\omega }}({\hat{\alpha }}) + {\varvec{v}}_2) \\&\quad = |{\hat{\omega }}({\hat{\alpha }})|^2{\varvec{v}}_1^HA_4{\varvec{v}}_1 + {\varvec{v}}_2^HA_4{\varvec{v}}_2 + 2\mathrm{Re~}(\overline{{\hat{\omega }}({\hat{\alpha }})}{\varvec{v}}_1^HA_4{\varvec{v}}_2)\\&\quad = {\hat{\alpha }}^2{\varvec{v}}_1^HA_4{\varvec{v}}_1 + {\varvec{v}}_2^HA_4{\varvec{v}}_2 + 2\mathrm{Re~}(\overline{{\hat{\omega }}({\hat{\alpha }})}{\varvec{v}}_1^HA_4{\varvec{v}}_2). \end{aligned} \end{aligned}$$
(17)

Combining \(\mathrm{Re~}(\overline{{\hat{\omega }}({\hat{\alpha }})}{\varvec{v}}_1^HA_4{\varvec{v}}_2)={\hat{\alpha }}\,\mathrm{Re~}(\overline{{\hat{\omega }}^*}{\varvec{v}}_1^HA_4{\varvec{v}}_2)\) with (17), the identity (16) that we want to prove is equivalent to the following real quadratic equation in \({\hat{\alpha }}\):

$$\begin{aligned} \left( {\varvec{v}}_1^HA_4{\varvec{v}}_1 - \frac{A_4 \bullet X}{r}\right) {\hat{\alpha }}^2 + 2{\hat{\iota }}{\hat{\alpha }} + \left( {\varvec{v}}_2^HA_4{\varvec{v}}_2 - \frac{A_4 \bullet X}{r}\right) = 0, \end{aligned}$$
(18)

where \({\hat{\iota }}=\mathrm{Re~}(\overline{{\hat{\omega }}^*}{\varvec{v}}_1^HA_4{\varvec{v}}_2)\) is independent of \({{\hat{\alpha }}}\), as \({\hat{\omega }}^*={\hat{\omega }}_{a*} + {\hat{\omega }}_{b*}{\varvec{i}}+ {\hat{\omega }}_{c*}{\varvec{j}}+ {\hat{\omega }}_{d*}{\varvec{k}}\) and \({\varvec{v}}_1^HA_4{\varvec{v}}_2\) are both independent of \({{\hat{\alpha }}}\). Note that (18) must have two real roots with opposite signs since \({\varvec{v}}_1^HA_4{\varvec{v}}_1 > \frac{A_4 \bullet X}{r}\) and \({\varvec{v}}_2^HA_4{\varvec{v}}_2 < \frac{A_4 \bullet X}{r}\). Let \({\hat{\alpha }}_*\) be the positive root of (18) and construct \({\hat{\omega }}_* = {\hat{\alpha }}_*({\hat{\omega }}_{a*} + {\hat{\omega }}_{b*}{\varvec{i}}+ {\hat{\omega }}_{c*}{\varvec{j}}+ {\hat{\omega }}_{d*}{\varvec{k}})\). Then (16) holds with \({\hat{\alpha }} = {\hat{\alpha }}_*\), and thus \({\varvec{v}}_1^HA_k{\varvec{v}}_1 = \frac{A_k \bullet X}{r}\) for \(k = 1, 2, 3, 4\). Recalling that \(X \overset{(12)}{=} \sum _{i=1}^r{\varvec{v}}_i{\varvec{v}}_i^H\), we have

$$\begin{aligned} X - {\varvec{x}}_1{\varvec{x}}_1^H \overset{(13)}{=} \sum _{j = 3}^r{\varvec{v}}_j{\varvec{v}}_j^H + {\varvec{x}}_2{\varvec{x}}_2^H \succeq 0. \end{aligned}$$

Therefore, rank\((X - {\varvec{x}}_1{\varvec{x}}_1^H) = r -1\), \( {\varvec{v}}_i^HA_k{\varvec{v}}_i \overset{(12)}{=} \frac{A_k \bullet X}{r}\), \(i = 3,\cdots ,r\) for \(k = 1, 2, 3\), and

$$\begin{aligned} A_k \bullet (X - {\varvec{x}}_1{\varvec{x}}_1^H) = A_k \bullet X - \frac{A_k \bullet X}{r} = (r-1)\,\frac{A_k \bullet X}{r},~~k = 1, 2, 3, 4. \end{aligned}$$

Repeating the above process recursively on \(X - {\varvec{x}}_1{\varvec{x}}_1^H\), we will finally obtain a rank-one decomposition of X such that (2) holds for \(k=1,2,3,4\). \(\square \)

3.2 The algorithm and numerical verification

Note that the rank-one decomposition procedure we provide in the proof of Theorem 1 is actually implementable. To make it clear, we summarize all the computational procedures in Algorithm 1 below. In particular, Step 1 and Step 2 in the proof are described by Algorithm 1 with \(\ell =3\) and \(\ell =4\), respectively.

[Algorithm 1: the quaternion matrix rank-one decomposition procedure]
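Since the journal renders Algorithm 1 as an image, we add a Python sketch reconstructed from the constructive proof above (our own illustration, not the authors' code; it assumes the `qmul`, `qconj` and `qform` helpers from Sect. 2, and that X is supplied in factored form \(X = YY^H\) with Y of shape (n, r, 4)):

```python
import numpy as np

def balance_pair(u1, u2, A_done, A_new, delta):
    """One repair step from the proof of Theorem 1: given u1, u2 with
    u1^H A_new u1 > delta > u2^H A_new u2, and matrices A_done already
    balanced on u1, u2, return v1, v2 with v1 v1^H + v2 v2^H equal to
    u1 u1^H + u2 u2^H and v1 balanced on A_done and A_new."""
    # Homogeneous linear system (8)/(15): Re(conj(w) u1^H A u2) = 0.
    if A_done:
        M = np.array([qform(u1, A, u2) for A in A_done])
        w_star = np.linalg.svd(M)[2][-1]        # unit null-space vector
    else:
        w_star = np.array([1.0, 0.0, 0.0, 0.0])  # any unit w works here
    # Quadratic (11)/(18) in alpha: two real roots of opposite signs.
    p1 = float(qform(u1, A_new, u1)[0]) - delta
    p2 = float(qform(u2, A_new, u2)[0]) - delta
    iota = float(w_star @ qform(u1, A_new, u2))  # Re(conj(w*) u1^H A_new u2)
    alpha = float(max(np.roots([p1, 2.0 * iota, p2]).real))
    w = alpha * w_star
    v1 = (qmul(u1, w[None, :]) + u2) / np.sqrt(1.0 + alpha**2)
    v2 = (-u1 + qmul(u2, qconj(w)[None, :])) / np.sqrt(1.0 + alpha**2)
    return v1, v2

def rank_one_decompose(Y, As, tol=1e-10):
    """Sketch of Algorithm 1: X = Y Y^H with r = rank(X) columns; returns
    x_1, ..., x_r with X = sum_i x_i x_i^H and x_i^H A x_i = (A . X) / r
    for up to four quaternion Hermitian matrices A in As."""
    xs = [Y[:, i, :].copy() for i in range(Y.shape[1])]
    r = len(xs)
    targets = [sum(float(qform(x, A, x)[0]) for x in xs) / r for A in As]
    for l, (A, delta) in enumerate(zip(As, targets)):
        done = []                          # terms already balanced on A
        while len(xs) > 1:
            vals = [float(qform(x, A, x)[0]) for x in xs]
            hi, lo = int(np.argmax(vals)), int(np.argmin(vals))
            if vals[hi] - delta < tol or delta - vals[lo] < tol:
                break                      # remaining terms agree with delta
            v1, v2 = balance_pair(xs[hi], xs[lo], As[:l], A, delta)
            done.append(v1)                # v1 is now balanced on As[:l+1]
            xs = [x for i, x in enumerate(xs) if i not in (hi, lo)] + [v2]
        xs = done + xs
    return xs
```

Each pair operation preserves the sum of the rank-one terms by (4)/(13) and produces one exactly balanced direction, so every stage terminates after at most \(r-1\) repairs.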

Next, we verify Theorem 1 by implementing Algorithm 1 and decomposing some randomly generated quaternion Hermitian positive semidefinite matrices. The details of generating those matrices are explained below.

  1.

    Randomly generate four real \(n\times n\) matrices \(A_a, A_b, A_c, A_d\) with each component following the standard normal distribution. Construct quaternion matrix \(A=A_a+A_b{\varvec{i}}+A_c{\varvec{j}}+A_d{\varvec{k}}\).

  2.

    Construct a \(4n\times 4n\) real matrix

    $$\begin{aligned} A_R=\begin{bmatrix} A_a &{} -A_b &{} -A_c &{} -A_d \\ A_b &{} ~~A_a &{} -A_d &{} ~~A_c \\ A_c &{} ~~A_d &{} ~~A_a &{} -A_b \\ A_d &{} -A_c &{} ~~A_b &{} ~~A_a \end{bmatrix}; \end{aligned}$$
  3.

    Compute \(X_R=A_RA_R^{\top }\), and \(X_R\) can be written in the form of

    $$\begin{aligned} X_R=\begin{bmatrix} X_a &{} -X_b &{} -X_c &{} -X_d \\ X_b &{} ~~X_a &{} -X_d &{} ~~X_c \\ X_c &{} ~~X_d &{} ~~X_a &{} -X_b \\ X_d &{} -X_c &{} ~~X_b &{} ~~X_a \end{bmatrix}; \end{aligned}$$
  4.

    Construct \(X=X_a+X_b{\varvec{i}}+X_c{\varvec{j}}+X_d{\varvec{k}}\). Then, X is a quaternion Hermitian positive semidefinite matrix as its real representation \(X_R\) [22, 24, 34] is positive semidefinite.

The procedure of randomly generating a quaternion Hermitian matrix is the same as above, except that in step 3, we let \(B_R = A_R+A_R^{\top }\) and write it as

$$\begin{aligned} B_R=\begin{bmatrix} B_a &{} -B_b &{} -B_c &{} -B_d \\ B_b &{} ~~B_a &{} -B_d &{} ~~B_c \\ B_c &{} ~~B_d &{} ~~B_a &{} -B_b \\ B_d &{} -B_c &{} ~~B_b &{} ~~B_a \end{bmatrix}. \end{aligned}$$

Then \(B=B_a+B_b{\varvec{i}}+B_c{\varvec{j}}+B_d{\varvec{k}}\) is a quaternion Hermitian matrix as the real representation \(B_R\) [22, 24, 34] is symmetric.
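As an added sketch (assuming the stacked component layout of the earlier snippets), the generation procedure above can be transcribed as:

```python
import numpy as np

def real_rep(A):
    """4n x 4n real representation of a stacked quaternion matrix A."""
    Aa, Ab, Ac, Ad = np.moveaxis(A, -1, 0)
    return np.block([[Aa, -Ab, -Ac, -Ad],
                     [Ab,  Aa, -Ad,  Ac],
                     [Ac,  Ad,  Aa, -Ab],
                     [Ad, -Ac,  Ab,  Aa]])

def random_hermitian_psd(n, rng):
    """Steps 1-4: a random quaternion Hermitian PSD matrix, shape (n, n, 4)."""
    A = rng.standard_normal((n, n, 4))
    X_R = real_rep(A) @ real_rep(A).T      # PSD and keeps the block structure
    return np.stack([X_R[i*n:(i+1)*n, :n] for i in range(4)], axis=-1)

def random_hermitian(n, rng):
    """The Hermitian variant: B_R = A_R + A_R^T."""
    A = rng.standard_normal((n, n, 4))
    B_R = real_rep(A) + real_rep(A).T
    return np.stack([B_R[i*n:(i+1)*n, :n] for i in range(4)], axis=-1)
```

The block extraction in the last line of each function reads off \(X_a, X_b, X_c, X_d\) (respectively \(B_a, B_b, B_c, B_d\)) from the first block column of the real representation, exactly as displayed above.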

Now, we start to verify Theorem 1 with a toy example. In particular, we apply the above procedures to randomly generate a \(2\times 2\) quaternion Hermitian positive semidefinite matrix of rank 2:

$$\begin{aligned} X=\left[ \begin{array}{ll} 21.3453 &{} \quad 4.0942+5.8149{\varvec{i}}+11.5374{\varvec{j}}-3.1914{\varvec{k}}\\ 4.0942-5.8149{\varvec{i}}-11.5374{\varvec{j}}+3.1914{\varvec{k}}&{} \quad 22.8604 \end{array}\right] , \end{aligned}$$

and randomly generate four quaternion Hermitian matrices:

$$\begin{aligned} \begin{aligned} A_1&=\left[ \begin{array}{ll} -0.1241 &{} \quad 1.4494+0.9624{\varvec{i}}-0.1539{\varvec{j}}+0.8378{\varvec{k}}\\ 1.4494-0.9624{\varvec{i}}+0.1539{\varvec{j}}-0.8378{\varvec{k}}&{} \quad 1.4172 \end{array}\right] ,\\ A_2&=\left[ \begin{array}{ll} -1.0689 &{} \quad -1.8769+1.0626{\varvec{i}}+0.2803{\varvec{j}}-0.0674{\varvec{k}}\\ -1.8769-1.0626{\varvec{i}}-0.2803{\varvec{j}}+0.0674{\varvec{k}}&{} \quad 1.4384 \end{array}\right] ,\\ A_3&=\left[ \begin{array}{ll} 1.0933 &{} \quad 0.1228+0.5533{\varvec{i}}-0.2985{\varvec{j}}+0.2600{\varvec{k}}\\ 0.1228-0.5533{\varvec{i}}+0.2985{\varvec{j}}-0.2600{\varvec{k}}&{} \quad 0.0774 \end{array}\right] ,\\ A_4&=\left[ \begin{array}{ll} 1.5442 &{} \quad -0.7028-1.4830{\varvec{i}}-0.8267{\varvec{j}}-0.3328{\varvec{k}}\\ -0.7028+1.4830{\varvec{i}}+0.8267{\varvec{j}}+0.3328{\varvec{k}}&{} \quad -0.7423 \end{array}\right] . \end{aligned} \end{aligned}$$

Then, we apply Algorithm 1 to perform the quaternion matrix rank-one decomposition with accuracy \(\epsilon =10^{-8}\), and obtain:

$$\begin{aligned} \begin{aligned} {\varvec{x}}_1&=\begin{pmatrix} -0.4022+1.4665{\varvec{i}}-1.9469{\varvec{j}}+1.7181{\varvec{k}}\\ -1.2756+2.3439{\varvec{i}}+1.5973{\varvec{j}}+1.6337{\varvec{k}}\end{pmatrix},\\ {\varvec{x}}_2&= \begin{pmatrix} -0.3345+2.3804{\varvec{i}}+2.3264{\varvec{j}}+1.0489{\varvec{k}}\\ ~~2.4951+0.2452{\varvec{i}}-0.5890{\varvec{j}}+1.9713{\varvec{k}}\end{pmatrix}. \end{aligned} \end{aligned}$$

We compute the decomposition residuals

$$\begin{aligned} \mathrm{res}_{ki}={\varvec{x}}_i^HA_k{\varvec{x}}_i-\frac{A_k \bullet X }{r}, \text{ for } k=1,2,3,4 \text{ and } i = 1,\ldots ,r, \end{aligned}$$
(19)

and obtain

$$\begin{aligned} \begin{aligned}&\mathrm{res}_{11}=-3.5527\times 10^{-15},\qquad \mathrm{res}_{12}=~~3.5527\times 10^{-15};\\&\mathrm{res}_{21}=~~1.7764\times 10^{-15},\qquad \mathrm{res}_{22}=-3.5527\times 10^{-15};\\&\mathrm{res}_{31}=-7.1054\times 10^{-15},\qquad \mathrm{res}_{32}=~~1.7764\times 10^{-15};\\&\mathrm{res}_{41}=~~0, \qquad \qquad \qquad \quad ~ \mathrm{res}_{42}=-5.3291\times 10^{-15}. \end{aligned} \end{aligned}$$

Thus, we can see that Theorem 1 is numerically verified as the residuals are very small.
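For completeness, the residual check (19) can be scripted with the `qform` and `inner` helpers from Sect. 2 (again an added sketch):

```python
def residuals(xs, As, X):
    """res_ki = x_i^H A_k x_i - (A_k . X) / r for each k and i, cf. (19)."""
    r = len(xs)
    return [[float(qform(x, A, x)[0]) - inner(A, X) / r for x in xs]
            for A in As]
```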

To further verify Theorem 1 for the more general case, we randomly generate 600 instances. In each instance, matrices \(X, A_1, A_2, A_3, A_4\) are generated according to the above procedure. The size of those matrices is also randomly generated under the following two settings:

$$\begin{aligned} (1)\; n=6+ \lfloor 30\,\xi \rfloor , \quad (2)\; n=50+ \lfloor 30\,\xi \rfloor , \end{aligned}$$

where \(\xi \) is uniformly drawn from [0, 1]. In particular, half of the 600 instances are generated under setting (1) and the other half under setting (2). After applying Algorithm 1, the performance on each instance is measured by the mean residual, i.e., the average of the 4r terms defined in (19). We plot the performance of our algorithm for the 600 random instances in Fig. 1 and Fig. 2, where the red line is the mean error, i.e., the average of the mean residuals over all instances. As we can see, Algorithm 1 is remarkably stable and accurate, with the average mean residual on the order of \(10^{-13}\), which further numerically verifies Theorem 1.

Fig. 1 Performance of Algorithm 1 for 300 random instances under setting (1)

4 The joint numerical range

Numerical range is an important tool in linear algebra and convex analysis, and it has wide applications in spectrum analysis [33], quantum computing [13, 35], engineering, etc. The joint numerical range, first proposed in 1979 as an extension of the numerical range [4], mainly concerns geometric properties, such as convexity, of the joint field of values of several matrices. The joint numerical range has applications as wide as those of the numerical range, and theoretically, it has a close connection to other fundamental results such as the \({\mathcal {S}}\)-Lemma, which will be discussed in the next section. On the other hand, the results on joint numerical ranges have been generalized in various ways [1, 8, 23, 37]. In this section, we show how to further extend the classical results on the convexity of joint numerical ranges [4, 28] to the quaternion domain via the rank-one decomposition of quaternion matrices studied in the last section. We start with some basic concepts that will be used later.

Fig. 2 Performance of Algorithm 1 for 300 random instances under setting (2)

Definition 1

Let A be any \(n \times n\) quaternion matrix. The numerical range of A is given by

$$\begin{aligned} {\mathcal {F}}(A) := \left\{ {\varvec{x}}^HA{\varvec{x}}\ : \ {{\varvec{x}}^H{\varvec{x}}= 1}, {\varvec{x}}\in {\mathbb {H}}^n \right\} \subseteq {\mathbb {H}}. \end{aligned}$$

With this concept in hand, we formally define the joint numerical range of a set of matrices as follows.

Definition 2

The \(joint \ numerical \ range\) of \(n \times n\) quaternion matrices \(A_1, \cdots , A_m\) is defined to be

$$\begin{aligned} {\mathcal {F}}(A_1, \cdots , A_m) := \left\{ \begin{pmatrix} {\varvec{x}}^HA_1{\varvec{x}}\\ {\varvec{x}}^HA_2{\varvec{x}}\\ \vdots \\ {\varvec{x}}^HA_m{\varvec{x}}\end{pmatrix} ~: ~{{\varvec{x}}^H{\varvec{x}}= 1}, {\varvec{x}}\in {\mathbb {H}}^n \right\} . \end{aligned}$$

Regarding the convexity of the joint numerical ranges, Hausdorff [18] showed that

“If \(A_1\) and \(A_2\) are complex Hermitian, then \({\mathcal {F}}(A_1, A_2)\) is a convex set.”

In a slightly different form, Brickman [8] extended the above result to three matrices, i.e., suppose \(A_1, A_2, A_3\) are \(n \times n\) complex Hermitian matrices, then

$$\begin{aligned} \left\{ \begin{pmatrix} {\varvec{x}}^HA_1{\varvec{x}}\\ {\varvec{x}}^HA_2{\varvec{x}}\\ {\varvec{x}}^HA_3{\varvec{x}}\end{pmatrix} ~:~ {\varvec{x}}\in {\mathbb {C}}^n \right\} \end{aligned}$$
(20)

is a convex set, where \({\mathbb {C}}^n\) is the set of complex \(n\)-dimensional vectors.

As a matter of fact, the convexity of the joint numerical ranges in complex domain can also be extended to quaternion domain for more matrices. In particular, Au-Yeung and Poon [4] established the following result for quaternion Hermitian matrices.

Theorem 2

(Au-Yeung and Poon [4]) If \(n \not = 2\) and \(A_1, \cdots , A_5 \in {\mathcal {H}}^{n}\) are quaternion Hermitian matrices, then

$$\begin{aligned} \left\{ \begin{pmatrix} {\varvec{x}}^HA_1{\varvec{x}}\\ {\varvec{x}}^HA_2{\varvec{x}}\\ \vdots \\ {\varvec{x}}^HA_5{\varvec{x}}\end{pmatrix} ~:~ {\varvec{x}}^H{\varvec{x}}= 1, {\varvec{x}}\in {\mathbb {H}}^n \right\} \end{aligned}$$
(21)

is a convex set.

Note that the above theorem assumes \(n \not = 2\), and this condition is necessary, as a counterexample is presented in [34]. However, such a condition can be relaxed if we consider the slightly different form of (20) due to Brickman, where the restriction \({\varvec{x}}^H{\varvec{x}}=1\) is dropped. We present this result in Theorem 3, which we prove via our quaternion matrix rank-one decomposition theorem.

Theorem 3

Suppose that \(A_k \in {\mathcal {H}}^{n}\), \(k = 1, \cdots , 5\). Then

$$\begin{aligned} \left\{ \begin{pmatrix} {\varvec{x}}^HA_1{\varvec{x}}\\ {\varvec{x}}^HA_2{\varvec{x}}\\ \vdots \\ {\varvec{x}}^HA_5{\varvec{x}}\end{pmatrix} ~:~ {\varvec{x}}\in {\mathbb {H}}^n \right\} \end{aligned}$$
(22)

is a convex set.

Proof

To complete the proof, it suffices to show

$$\begin{aligned} \left\{ \begin{pmatrix} {\varvec{x}}^HA_1{\varvec{x}}\\ {\varvec{x}}^HA_2{\varvec{x}}\\ \vdots \\ {\varvec{x}}^HA_5{\varvec{x}}\end{pmatrix} ~:~ {\varvec{x}}\in {\mathbb {H}}^n \right\} = \left\{ \begin{pmatrix} A_1 \bullet X \\ A_2 \bullet X \\ \vdots \\ A_5 \bullet X \end{pmatrix} ~:~ X \succeq 0 \right\} \end{aligned}$$

as the right-hand side is a convex set. Noting that the right-hand side is the convex hull of the left-hand side, we only need to show that the right-hand side is contained in the left-hand side.

Take any nonzero vector

$$\begin{aligned} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_5 \end{pmatrix} \in \left\{ \begin{pmatrix} A_1 \bullet X \\ A_2 \bullet X \\ \vdots \\ A_5 \bullet X \end{pmatrix} ~:~ X \succeq 0 \right\} . \end{aligned}$$

Without loss of generality, suppose \(v_5 \not = 0\). Since \(A_k \bullet X = v_k\) for each k, it holds that \((A_k - \frac{v_k}{v_5}A_5) \bullet X = 0\) for \(k = 1, 2, 3, 4\). Then, by Theorem 1, there exists a rank-one decomposition \(X = \sum _{i=1}^r{\varvec{x}}_i{\varvec{x}}_i^H\) such that

$$\begin{aligned} {\varvec{x}}_i^H\left( A_k - \frac{v_k}{v_5}A_5\right) {\varvec{x}}_i = \frac{(A_k - \frac{v_k}{v_5}A_5) \bullet X}{r} = 0, \ k = 1,2, 3, 4 \end{aligned}$$
(23)

for \(i = 1, \cdots , r\). Since \(v_5 = A_5 \bullet X = \sum _{i=1}^r{\varvec{x}}_i^HA_5{\varvec{x}}_i\), there is at least one vector, say \({\varvec{x}}_1\), such that \({\varvec{x}}_1^HA_5{\varvec{x}}_1\) has the same sign as \(v_5\). Let \(\rho = \sqrt{\frac{v_5}{{\varvec{x}}_1^HA_5{\varvec{x}}_1}}\) and \({\varvec{x}}= \rho {\varvec{x}}_1\); then we have

$$\begin{aligned} {\varvec{x}}^HA_5{\varvec{x}}= \rho ^2{\varvec{x}}_1^HA_5{\varvec{x}}_1 = v_5, \end{aligned}$$

and (23) also holds for \({\varvec{x}}\). Plugging \({\varvec{x}}\) and \({\varvec{x}}^HA_5{\varvec{x}}= v_5\) into (23), we have

$$\begin{aligned} {\varvec{x}}^HA_k{\varvec{x}}={\varvec{x}}^H\frac{v_k}{v_5}A_5{\varvec{x}}=\frac{v_k}{v_5}\cdot v_5=v_k \end{aligned}$$

for \(k = 1, 2, 3, 4\). Hence the vector \( \begin{pmatrix} v_1 \\ v_2 \\ \cdots \\ v_5 \end{pmatrix} \in \left\{ \begin{pmatrix} {\varvec{x}}^HA_1{\varvec{x}}\\ {\varvec{x}}^HA_2{\varvec{x}}\\ \cdots \\ {\varvec{x}}^HA_5{\varvec{x}}\end{pmatrix} ~:~ x \in {\mathbb {H}}^n \right\} \) and the conclusion follows. \(\square \)

5 S-Procedure

It is well known that the \({\mathcal {S}}\)-procedure plays an important role in robust optimization [2, 3, 17], statistics [19], signal processing [26], and control and stability problems [5, 6], among others. In this section, we discuss another interesting theoretical implication of our new rank-one decomposition (Theorem 1) for the \({\mathcal {S}}\)-procedure. We first recall the following lemma, which is due to [40], about the \({\mathcal {S}}\)-procedure in the real domain.

Lemma 1

(\({\mathcal {S}}\)-Procedure) Let \(F({\varvec{x}})={\varvec{x}}^{\top }A_0{\varvec{x}}+ 2{\varvec{b}}_0^{\top }{\varvec{x}}+ c_0\) and \(G_i({\varvec{x}}) = {\varvec{x}}^{\top }A_i{\varvec{x}}+ 2{\varvec{b}}_i^{\top }{\varvec{x}}+ c_i\), \(i=1,\ldots ,m\), be quadratic functions of \({\varvec{x}}\in {\mathbb {R}}^n\). Then \(F({\varvec{x}}) \ge 0\) for all \({\varvec{x}}\) such that \(G_i({\varvec{x}})\ge 0\), \(i=1,\ldots ,m\), if there exist \(\tau _i \ge 0 \) such that

$$\begin{aligned} \left[ \begin{array}{cc}c_0 &{} {\varvec{b}}_0^{\top }\\ {\varvec{b}}_0 &{} A_{0}\\ \end{array} \right] - \sum \limits _{i=1}^{m}\tau _i \cdot \left[ \begin{array}{cc}c_i &{} {\varvec{b}}_i^{\top }\\ {\varvec{b}}_i &{} A_i \end{array} \right] \succeq 0. \end{aligned}$$

Moreover, if \(m=1\), then the converse holds provided there exists \({\varvec{x}}_0\) such that \(G_1({\varvec{x}}_0)>0\).

We remark that the above result essentially studies the relationship between

$$\begin{aligned} G_1({\varvec{x}}) \ge 0, G_2({\varvec{x}}) \ge 0, \cdots , G_m({\varvec{x}}) \ge 0 \Rightarrow F({\varvec{x}}) \ge 0, \end{aligned}$$
(24)

and

$$\begin{aligned} \exists \tau _1 \ge 0, \tau _2 \ge 0, \cdots , \tau _m \ge 0 \ \mathrm{such \ that} \ F({\varvec{x}}) - \sum _{i=1}^m \tau _iG_i({\varvec{x}}) \ge 0 \ \forall {\varvec{x}}. \end{aligned}$$
(25)

It is obvious that (25) implies (24), and the converse also holds when \(m=1\) under the strict feasibility assumption in Lemma 1. In the following, we say the \({\mathcal {S}}\)-procedure is lossless if (24) and (25) are equivalent. Although the \({\mathcal {S}}\)-procedure was first studied in the real domain, interestingly, a stronger result has been established in the complex domain.

Lemma 2

(Yakubovich [40]) Suppose \(F, G_1, G_2\) are complex Hermitian forms satisfying (24), and there is \({\varvec{x}}_0 \in {\mathbb {C}}^n\) such that \(G_i({\varvec{x}}_0) > 0\), \(i = 1,2\). Then the \({\mathcal {S}}\)-procedure is lossless for \(m = 2\).

Inspired by this result, we further consider the \({\mathcal {S}}\)-procedure in the quaternion domain and manage to show, via Theorem 1, that it is lossless for \(m = 4\).

Theorem 4

Suppose \(F, G_1, G_2, G_3, G_4\) are quaternion Hermitian forms and satisfy (24). Moreover, there is

$$\begin{aligned} {\varvec{x}}_0 \in {\mathbb {H}}^n~\text{ such } \text{ that }~G_i({\varvec{x}}_0) > 0,~ i = 1, 2, 3, 4. \end{aligned}$$
(26)

Then the S-procedure is lossless for \(m = 4\).

Proof

To prove the conclusion, it suffices to establish (25) for \(m=4\). Let \(F({\varvec{x}}) = {\varvec{x}}^HA_0{\varvec{x}}\) and \(G_i({\varvec{x}}) = {\varvec{x}}^HA_i{\varvec{x}}\) for \(i = 1, 2, 3, 4\). Consider the following set

$$\begin{aligned} {\mathbb {D}}=\left\{ \begin{pmatrix} {\varvec{x}}^HA_0{\varvec{x}}\\ {\varvec{x}}^HA_1{\varvec{x}}\\ \vdots \\ {\varvec{x}}^HA_4{\varvec{x}}\end{pmatrix} ~:~ {\varvec{x}}\in {\mathbb {H}}^n \right\} , \end{aligned}$$

which is a convex cone in \({\mathbb {R}}^5\): it is convex by Theorem 3, and it is a cone since replacing \({\varvec{x}}\) with \(\sqrt{t}\,{\varvec{x}}\) scales each entry by \(t \ge 0\). Moreover, since \(F({\varvec{x}})\) and \(G_i({\varvec{x}})\), \(i = 1, \cdots , 4\), satisfy (24), i.e.,

$$\begin{aligned} G_1({\varvec{x}}) \ge 0, \cdots , G_4({\varvec{x}}) \ge 0 \Rightarrow F({\varvec{x}}) \ge 0 , \end{aligned}$$

we have \({\mathbb {D}}\cap {\mathbb {E}}=\emptyset \), where \({\mathbb {E}} = \{(y_0,y_1,y_2,y_3,y_4) |~ y_0<0,~ y_i>0,~i=1,2,3,4\}\). Noting that \({\mathbb {E}} \) is also a convex cone, the standard separating hyperplane theorem [7] implies that there exists \({\varvec{t}}=(t_0, t_1, t_2, t_3, t_4) \not = 0\) such that

$$\begin{aligned} \sum _{k = 0}^4t_ky_k \le 0 ~\mathrm{for~ all}~ (y_0, y_1, y_2, y_3, y_4)\in {\mathbb {E}} \end{aligned}$$
(27)

and

$$\begin{aligned} \sum _{k = 0}^4t_k{\varvec{x}}^HA_k{\varvec{x}}\ge 0 ~\mathrm{for~ all}~ {\varvec{x}}\in {\mathbb {H}}^n. \end{aligned}$$
(28)

Then we must have \(t_0 \ge 0\) and \(t_k \le 0\) for \(k = 1, 2, 3, 4\) by inequality (27). Moreover, we can confirm that \(t_0 > 0\): otherwise, since \({\varvec{t}}\ne 0 \), there exists some \(k \in \{1,2,3,4\}\) such that \(t_k < 0\), and thus for \({\varvec{x}}_0\) in (26) it holds that \(\sum _{k = 0}^4t_k{\varvec{x}}_0^HA_k{\varvec{x}}_0 = \sum _{k = 1}^4t_k{\varvec{x}}_0^HA_k{\varvec{x}}_0 < 0\), which contradicts (28). Finally, the validity of (25) with \(m=4\) follows from (28) by dividing both sides of (28) by \(t_0\) and letting \(\tau _k = \frac{-t_k}{t_0} \ge 0 \) for \(k=1,2,3,4\). \(\square \)
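As a practical aside (an added illustration, not part of the original argument), Theorem 4 guarantees that under (24) and (26) the multipliers in (25) exist, and they can be searched for by a semidefinite feasibility problem over the real representation, since a quaternion Hermitian matrix is positive semidefinite exactly when its real representation is (cf. Sect. 3.2). A hedged sketch, assuming the cvxpy package and the `real_rep` helper from the Sect. 3.2 snippet:

```python
import cvxpy as cp

def s_procedure_multipliers(A0, As):
    """Search for tau >= 0 with A0 - sum_i tau_i A_i PSD, i.e., (25) for
    homogeneous quaternion Hermitian forms; Theorem 4 guarantees such tau
    exist when m = 4, (24) holds and a Slater point (26) exists."""
    tau = cp.Variable(len(As), nonneg=True)
    M = real_rep(A0) - sum(tau[i] * real_rep(A) for i, A in enumerate(As))
    cp.Problem(cp.Minimize(0), [(M + M.T) / 2 >> 0]).solve()
    return tau.value
```

Here `(M + M.T) / 2` merely symmetrizes an already-symmetric affine expression so that cvxpy accepts the semidefinite constraint.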

6 Quaternion quadratically constrained quadratic optimization

In this section, we consider the following quadratically constrained quadratic programming in the quaternion domain:

$$\begin{aligned} \begin{aligned} \mathrm{(QCQP)} \quad \max \quad&{\varvec{x}}^HQ{\varvec{x}}+ 2\mathrm{Re}({\varvec{x}}^Hq)\\ \mathrm{s.t.} \quad&{\varvec{x}}^HA_j{\varvec{x}}+ 2\mathrm{Re}({\varvec{x}}^Hb_j) + c_j \le 0, \ j = 1,\cdots ,m, \end{aligned} \end{aligned}$$

where \(Q, A_j \in {\mathcal {H}}^n\), \(b_j, q \in {\mathbb {H}}^n\), \(c_j \in {\mathbb {R}}\), \(j = 1,\cdots ,m\). In general, solving the above problem is very challenging. However, when the number of constraints m is small, the problem is possibly tractable. For instance, in the complex domain, Huang and Zhang [20] showed that the problem (QCQP) with \(m=2\) could be cast as an SDP and thus solved in polynomial time. Regarding the quaternion domain, we shall show that a similar result holds for a larger value of m (in particular, for \(m=4\)) via the aforementioned rank-one decomposition in Theorem 1.

To relate this problem to SDP, we rewrite (QCQP) as the following matrix form:

$$\begin{aligned} \begin{aligned} \quad \max \quad&B_0 \bullet \begin{bmatrix} 1 &{} {\varvec{x}}^H \\ {\varvec{x}}&{} {\varvec{x}}{\varvec{x}}^H \\ \end{bmatrix}\\ \mathrm{s.t.} \quad&B_j \bullet \begin{bmatrix} 1 &{} {\varvec{x}}^H \\ {\varvec{x}}&{} {\varvec{x}}{\varvec{x}}^H \\ \end{bmatrix} \le 0, ~j = 1,\cdots ,m, \end{aligned} \end{aligned}$$
(29)

where \(B_0 = \begin{bmatrix} 0 &{} q^H \\ q &{} Q \\ \end{bmatrix}, \ B_j = \begin{bmatrix} c_j &{} b_j^H \\ b_j &{} A_j \\ \end{bmatrix}, ~j = 1,\cdots ,m\). Problem (29) can be homogenized by introducing a new variable t and requiring \(|t|^2=1\):

$$\begin{aligned} \begin{aligned} \mathrm{(HQCQP)} \quad \max \quad&B_0 \bullet \begin{bmatrix} |t|^2 &{} t \cdot {\varvec{x}}^H \\ {\varvec{x}}\cdot {\bar{t}} &{} {\varvec{x}}{\varvec{x}}^H \\ \end{bmatrix}\\ \mathrm{s.t.} \quad&B_j \bullet \begin{bmatrix} |t|^2 &{} t \cdot {\varvec{x}}^H \\ {\varvec{x}}\cdot {\bar{t}} &{} {\varvec{x}}{\varvec{x}}^H \\ \end{bmatrix} \le 0, ~j = 1,\cdots ,m, \\&B_{m+1} \bullet \begin{bmatrix} |t|^2 &{} t \cdot {\varvec{x}}^H \\ {\varvec{x}}\cdot {\bar{t}} &{} {\varvec{x}}{\varvec{x}}^H \\ \end{bmatrix} = 1, \end{aligned} \end{aligned}$$

where \( B_{m+1} = \begin{bmatrix} 1 &{} 0 \\ 0 &{} 0 \\ \end{bmatrix}\). Then for any solution \(\begin{pmatrix} t \\ {\varvec{x}}\\ \end{pmatrix}\) of (HQCQP), since \(|t|^2=1\), \({\varvec{x}}\cdot {\bar{t}}\) is a solution to (29) and hence a solution to (QCQP). Therefore, we shall work on (HQCQP) in the rest of this section. By letting

$$\begin{aligned} X = \begin{bmatrix} |t|^2 &{} t \cdot {\varvec{x}}^H \\ {\varvec{x}}\cdot {\bar{t}} &{} {\varvec{x}}{\varvec{x}}^H \\ \end{bmatrix}~\text{ and } \text{ dropping } \text{ the } \text{ constraint: }~\mathrm{rank}\, (X) =1, \end{aligned}$$

we obtain an SDP relaxation of (HQCQP):

$$\begin{aligned} \begin{aligned} \mathrm{(QCQPR)} \quad \max \quad&B_0 \bullet X \\ \mathrm{s.t.} \quad&B_j \bullet X \le 0, ~j = 1,\cdots ,m, \\&B_{m+1} \bullet X = 1, \\&X \succeq 0, \end{aligned} \end{aligned}$$

whose dual is given by

$$\begin{aligned} \begin{aligned} \mathrm{(DQCQPR)} \quad \min \quad&y_0 \\ \mathrm{s.t.} \quad&Y = \sum _{j=1}^m y_jB_j - B_0 + y_0B_{m+1} \succeq 0, \\&y_j \ge 0,~j = 1,\cdots ,m,~y_0 \ \mathrm{free}. \end{aligned} \end{aligned}$$

To proceed, we assume (QCQP) satisfies the Slater condition, that is, there exists \({\varvec{x}}_0 \in {\mathbb {H}}^n\) such that

$$\begin{aligned} q_j({\varvec{x}}_0) := B_j \bullet \begin{bmatrix} 1 &{} {\varvec{x}}_0^H \\ {\varvec{x}}_0 &{} {\varvec{x}}_0{\varvec{x}}_0^H \\ \end{bmatrix} < 0, j = 1,\cdots ,m. \end{aligned}$$
(30)

Accordingly, (QCQPR) satisfies the Slater condition as well. Now we are ready to present the main theorem of this section, which states that (QCQP) is essentially equivalent to an SDP when \(m = 4\).

Theorem 5

Suppose (QCQP) satisfies the Slater condition (30) and \(m = 4\). Then (QCQP) and (QCQPR) have the same optimal value. Moreover, an optimal solution to (QCQP) can be constructed from that of (QCQPR).

Proof

Let ‘\(\trianglelefteq _j\)’ denote either ‘<’ or ‘=’, depending on whether the j-th constraint is inactive or active at the primal optimal solution \(X^*\). The first group of constraints in (QCQPR) can then be rewritten as

$$\begin{aligned} B_j \bullet X^* \trianglelefteq _j 0, ~j = 1,2,3,4. \end{aligned}$$

Letting \(r=\mathrm{rank}\, X^*\), by Theorem 1, there exist some non-zero quaternion vectors \({\tilde{{\varvec{x}}}}_k = \begin{pmatrix} t_k \\ {{\varvec{x}}}_k \\ \end{pmatrix} \in {\mathbb {H}}^{n+1}, ~k = 1,\cdots ,r\), such that

$$\begin{aligned} X^* = \sum _{k=1}^r {\tilde{{\varvec{x}}}}_k{\tilde{{\varvec{x}}}}_k^H \ \mathrm{~and~} \ B_j \bullet {\tilde{{\varvec{x}}}}_k{\tilde{{\varvec{x}}}}_k^H \trianglelefteq _j 0,~\text{ for }~ \ j = 1,2,3,4, \ k = 1,\cdots ,r. \end{aligned}$$
(31)

Since \(B_{m+1} \bullet X^* = 1\), we have \(\sum _{k=1}^{r}|t_k|^2 = X^*_{11} = 1\), and there exists \(\ell \in \left\{ 1, \cdots , r \right\} \) such that \(t_\ell \not = 0\). Therefore, according to (31), it holds that

$$\begin{aligned} q_j\left( {{\varvec{x}}}_\ell \cdot \frac{1}{t_\ell }\right)= & {} B_j \bullet \begin{bmatrix} 1 &{} \left( {{\varvec{x}}}_\ell \cdot \frac{1}{t_\ell }\right) ^H \\ \left( {{\varvec{x}}}_\ell \cdot \frac{1}{t_\ell }\right) &{} \left( {{\varvec{x}}}_\ell \cdot \frac{1}{t_\ell }\right) \left( {{\varvec{x}}}_\ell \cdot \frac{1}{t_\ell }\right) ^H \\ \end{bmatrix} \\= & {} \frac{1}{|t_\ell |^2} \cdot B_j \bullet \begin{bmatrix} |t_\ell |^2 &{} {t_\ell } \cdot {{\varvec{x}}}_\ell ^H \\ {{\varvec{x}}}_\ell \cdot \bar{t_\ell } &{} {{\varvec{x}}}_\ell {{\varvec{x}}}_{\ell }^H \\ \end{bmatrix}\\= & {} \frac{1}{|t_\ell |^2} B_j \bullet {\tilde{{\varvec{x}}}}_{\ell }{\tilde{{\varvec{x}}}}_{\ell }^H \trianglelefteq _j 0, j = 1,2,3,4, \end{aligned}$$

which further implies \({\varvec{x}}_{\ell } \cdot \frac{1}{t_\ell }\) is a feasible solution to (QCQP).

Suppose \((y_0^*, y_1^*, y_2^*, y_3^*, y_4^*, Y^*)\) is a dual optimal solution. Since the Slater condition holds, strong duality is valid between (QCQPR) and (DQCQPR). Consequently, the complementary slackness condition holds, i.e.,

$$\begin{aligned} \sum _{i=1}^{r} Y^* \bullet {\tilde{{\varvec{x}}}}_i{\tilde{{\varvec{x}}}}_i^H = Y^* \bullet X^* = 0,~\text{ and }~y_j^*\cdot (B_j \bullet X^*) = 0 ~\mathrm{for}~ j = 1,2,3,4. \end{aligned}$$

Moreover, since \(X^*\succeq 0\) and \(Y^*\succeq 0\), we have \(Y^* \bullet {\tilde{{\varvec{x}}}}_{\ell }{\tilde{{\varvec{x}}}}_{\ell }^H = 0\). We also observe from (31) that

$$\begin{aligned} B_j \bullet {\tilde{{\varvec{x}}}}_{\ell }{\tilde{{\varvec{x}}}}_{\ell }^H \trianglelefteq _j 0 \Leftrightarrow B_j \bullet X^* \trianglelefteq _j 0 ,~\text{ for }~ \ j = 1,2,3,4, \end{aligned}$$

which indicates that

$$\begin{aligned} y_j^*\cdot \left( B_j \bullet {\tilde{{\varvec{x}}}}_{\ell }{\tilde{{\varvec{x}}}}_{\ell }^H\right) = y_j^*\cdot (B_j \bullet X^*) = 0 ~\mathrm{for}~ j = 1,2,3,4. \end{aligned}$$

Combining the above results with \(\begin{pmatrix} 1 \\ {{\varvec{x}}}_{\ell } \cdot \frac{1}{t_{\ell }}\\ \end{pmatrix} \begin{pmatrix} 1 \\ {{\varvec{x}}}_{\ell } \cdot \frac{1}{t_{\ell }}\\ \end{pmatrix}^H = \frac{1}{|t_{\ell }|^2}\cdot {\tilde{{\varvec{x}}}}_{\ell }{\tilde{{\varvec{x}}}}_{\ell }^H\) yields that \(\begin{pmatrix} 1 \\ {{\varvec{x}}}_{\ell } \cdot \frac{1}{t_{\ell }}\\ \end{pmatrix} \begin{pmatrix} 1 \\ {{\varvec{x}}}_{\ell } \cdot \frac{1}{t_{\ell }}\\ \end{pmatrix}^H\) is complementary to the dual optimal solution \((y_0^*, y_1^*, y_2^*, y_3^*, y_4^*, Y^*)\); being also feasible, it is an optimal solution to (QCQPR). Therefore \({{\varvec{x}}}_{\ell } \cdot \frac{1}{t_{\ell }}\) is optimal for (QCQP), and the conclusion follows. \(\square \)
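The constructive part of this argument is straightforward to script. Below is a minimal added sketch (our illustration, not the authors' code), assuming the stacked quaternion layout and the `qconj`, `qmul`, `qinv`, `qabs` and `rank_one_decompose` helpers from the earlier sketches; it assembles the data matrices of (29)/(HQCQP) and recovers a feasible point from a factored optimal solution of (QCQPR):

```python
import numpy as np

def embed(c, b, A):
    """B = [[c, b^H], [b, A]] in the stacked layout, shape (n+1, n+1, 4)."""
    n = A.shape[0]
    B = np.zeros((n + 1, n + 1, 4))
    B[0, 0, 0] = c            # real (1,1) entry
    B[0, 1:, :] = qconj(b)    # the row b^H
    B[1:, 0, :] = b           # the column b
    B[1:, 1:, :] = A
    return B

def qcqp_data(Q, q, As, bs, cs):
    """Data matrices B_0; B_1, ..., B_m; and B_{m+1} of (29) and (HQCQP)."""
    B0 = embed(0.0, q, Q)
    Bs = [embed(c, b, A) for A, b, c in zip(As, bs, cs)]
    E = np.zeros_like(B0)
    E[0, 0, 0] = 1.0          # B_{m+1} = diag(1, 0, ..., 0)
    return B0, Bs, E

def recover(Y, Bs):
    """Given a (QCQPR)-optimal X* = Y Y^H (Y of shape (n+1, r, 4)), rebalance
    via Theorem 1 and return x_l * (1/t_l) for a term with t_l != 0."""
    xs = rank_one_decompose(Y, Bs)         # Bs holds the m = 4 matrices B_j
    x = max(xs, key=lambda v: qabs(v[0]))  # sum_k |t_k|^2 = 1, so some t != 0
    return qmul(x[1:], qinv(x[0])[None, :])
```

Choosing the term with the largest \(|t|\) is safe because \(\sum _{k}|t_k|^2 = X^*_{11} = 1\) guarantees that some \(t_k \not = 0\), exactly as in the proof.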