Abstract
The class of quaternion matrix optimization (QMO) problems, with quaternion matrices as decision variables, has been widely used in color image processing and other engineering areas in recent years. However, optimization theory for QMO is far from adequate. The main objective of this paper is to provide the necessary theoretical foundations on optimality analysis, in order to enrich the contents of optimization theory and to pave the way for the design of efficient numerical algorithms. We achieve this goal by conducting a thorough study on the first-order and second-order (sub)differentiation of real-valued functions in quaternion matrices, with a newly introduced operation called the R-product as the key tool for our calculus. Combining this with classical optimization theory, we establish the first-order and the second-order optimality analysis for QMO. Particular treatments on convex functions, the \(\ell _0\)-norm and the rank function in quaternion matrices are tailored for a sparse low rank QMO model, arising from color image denoising, to establish its optimality conditions via stationarity.
1 Introduction
Quaternion matrices have been widely used in color image processing, including color image denoising and inpainting, color face recognition, color image classification, etc., [4, 5, 9, 11, 17,18,19, 21, 23, 26]. Quaternions have also been widely used in other engineering areas [20]. The core mathematical models among most of these works can be formulated by optimization problems with quaternion matrices as decision variables, such as the low rank quaternion decomposition (LRQD) model proposed in [4], the robust quaternion matrix completion (RQMC) model proposed in [9], and the low rank quaternion approximation (LRQA) model proposed in [5]. All of these motivate us to focus on the quaternion matrix optimization (QMO) where quaternion matrices are acting as decision variables.
Compared with classical optimization defined in the real field, optimization theory for QMO is far from complete, even though all involved objective and constraint functions are real-valued. The main difficulty comes from the calculus for quaternions. In this paper, we intend to provide the necessary theoretical foundations on optimality analysis for QMO, in order to enrich the contents of optimization theory for QMO, and to pave the way for further developments in algorithm design for solving QMO.
Various real-valued functions of quaternion matrix variables, such as Frobenius norms, nuclear norms, spectral norms, \(\ell _1\)-norms, traces, \(\ell _0\)-norms and rank functions of quaternion matrices, arise from engineering applications. To optimize such real-valued functions, we need derivatives, subdifferentials and generalized subdifferentials to handle them.
For the color image denoising problem, optimization models may involve \(\ell _0\)-norms and rank functions of quaternion matrix variables, which are discrete. In sparse optimization [2, 12], generalized subdifferentials of \(\ell _0\)-norms and rank functions of real matrix variables are used; this relies on knowledge from variational analysis [6, 8, 10, 15, 16]. In this paper, we aim to develop adequate analysis tools for derivatives, subdifferentials and generalized subdifferentials of real-valued functions of quaternion matrix variables, so that we can establish optimality conditions for such optimization models and pave the way for future work on convergence analysis of algorithms for such models.
In the literature, there are some works on derivatives of real-valued functions of quaternion variables [14, 20], and on subgradients of norms of quaternion matrix variables [9]. There is no discussion of generalized subdifferentials of \(\ell _0\)-norms and rank functions of quaternion matrix variables.
In [4], Chen, Qi, Zhang and Xu formulated the color image inpainting problem as an equality constrained optimization problem of real-valued functions in quaternion matrix variables, and proposed a lower rank quaternion decomposition (LRQD) algorithm to solve it. To conduct convergence analysis for their algorithm, they introduced a concise form
for the gradient of a real-valued function f in a quaternion matrix variable \(X=X_0 + X_1\mathbf {i}+ X_2\mathbf {j}+ X_3 \mathbf {k}\), where \(\mathbf {i}, \mathbf {j}\) and \(\mathbf {k}\) are three imaginary units of quaternions. This form is different from the generalized HR calculus studied in [20]. The first-order optimality necessary condition of their quaternion least squares problem has a simple expression with this form. With this tool, convergence and convergence rate of their algorithm are established.
To explore the above approach further, we introduce R-product and R-linear independence of quaternion matrix vectors, and establish the first-order necessary optimality condition for the QMO problem. We also present a method to calculate the second-order partial derivatives of real-valued functions in quaternion matrix variables, and establish the second-order necessary optimality condition and second-order sufficient optimality condition for the QMO problem.
Norms of quaternion matrix variables may not be continuously differentiable, but they are always convex. In 2019, Jia, Ng and Song [9] introduced subgradients for norms of quaternion matrix variables.
In this paper, we show that our approach is consistent with the subgradient concept for norms of quaternion matrix variables, introduced in [9]. We also establish the relations between subgradients and derivatives of real-valued functions of quaternion matrix variables.
The \(\ell _0\)-norms and the ranks of quaternion matrices are not continuous, but are very useful in applications [5, 21]. In this paper, we develop the concepts of generalized subdifferentials of proper functions of quaternion matrices, and use them to analyze the optimality conditions of a sparse color image denoising (SCID) model. We show that the real representation set of low rank quaternion matrices is closed and semi-algebraic.
The generalized subdifferential calculus is totally new in the literature and useful in color image applications.
Some important real-valued functions of quaternion matrix variables, such as the square of the Frobenius norm, are separable in the sense that they can be calculated with respect to each real matrix component and then summed together; the rank function of quaternion matrix variables is not separable. We treat this in a nontrivial way by considering the real representation of a quaternion matrix, and show that the real representation set of low rank quaternion matrices is semi-algebraic. This will be useful for convergence analysis of some first-order algorithms.
In the next section, we present some necessary preliminary knowledge on quaternions and quaternion matrices. The QMO problem and some of its special cases are presented in Sect. 3. In Sect. 4, we introduce R-product and R-linear independence as our key tools for defining first- and second-order derivatives of real-valued functions of quaternion matrix variables, and then present first- and second-order optimality conditions for QMO. In Sect. 5, we study convex functions of quaternion matrix variables, their subdifferentials, and their relations with derivatives of real-valued functions of quaternion matrix variables. We introduce the generalized subdifferentials of proper functions of quaternion matrices, and use them to analyze the optimality conditions of the SCID problem in Sect. 6. Some final remarks are made in Sect. 7.
2 Preliminaries
2.1 Quaternions
In general, we use the notation in [22, 24]. We denote the real field, the complex field and the quaternion algebra by \({{\mathbb {R}}}\), \({\mathbb {C}}\) and \({\mathbb {Q}}\), respectively. Scalars, vectors and matrices are denoted by small letters, bold small letters and capital letters, respectively. Vectors with matrix components are denoted by bold capital letters. For example, we have \(\mathbf{X} = (W, Y, Z)\). They are called matrix component vectors. We use \(\mathbf {0}, O\) and \(\mathbf{O}\) to denote the zero vector, zero matrix and zero matrix component vector with appropriate dimensions. The three imaginary units of quaternions are denoted by \(\mathbf {i}, \mathbf {j}\) and \(\mathbf {k}\), which satisfy
\(\mathbf {i}^2 = \mathbf {j}^2 = \mathbf {k}^2 = \mathbf {i}\mathbf {j}\mathbf {k}= -1.\)
These rules, along with the distributive law, determine the product of two quaternions; in particular, \(\mathbf {i}\mathbf {j}= -\mathbf {j}\mathbf {i}= \mathbf {k}\), \(\mathbf {j}\mathbf {k}= -\mathbf {k}\mathbf {j}= \mathbf {i}\) and \(\mathbf {k}\mathbf {i}= -\mathbf {i}\mathbf {k}= \mathbf {j}\). The multiplication of quaternions is noncommutative.
Let \(x = x_0 + x_1\mathbf {i}+ x_2\mathbf {j}+ x_3\mathbf {k}\in {{\mathbb {Q}}}\), where \(x_0, x_1, x_2, x_3 \in {{\mathbb {R}}}\). The conjugate of x is
\(x^* = x_0 - x_1\mathbf {i}- x_2\mathbf {j}- x_3\mathbf {k},\)
the modulus of x is
\(|x| = \sqrt{xx^*} = \sqrt{x_0^2 + x_1^2 + x_2^2 + x_3^2},\)
and if \(x \not = 0\), then \(x^{-1} = {x^* \over |x|^2}\).
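These rules are straightforward to mirror numerically. The following minimal sketch (ours, not from the paper) stores a quaternion as a length-4 real array, with `qmul` encoding the Hamilton product determined by the rules above:

```python
import numpy as np

# A quaternion x = x0 + x1*i + x2*j + x3*k stored as a length-4 real array.
def qmul(x, y):
    """Hamilton product of two quaternions (noncommutative)."""
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return np.array([x0*y0 - x1*y1 - x2*y2 - x3*y3,
                     x0*y1 + x1*y0 + x2*y3 - x3*y2,
                     x0*y2 - x1*y3 + x2*y0 + x3*y1,
                     x0*y3 + x1*y2 - x2*y1 + x3*y0])

def qconj(x):
    """Conjugate x^* = x0 - x1*i - x2*j - x3*k."""
    return np.array([x[0], -x[1], -x[2], -x[3]])

def qmod(x):
    """Modulus |x| = sqrt(x0^2 + x1^2 + x2^2 + x3^2)."""
    return np.sqrt(np.sum(np.square(x)))

def qinv(x):
    """Inverse x^{-1} = x^* / |x|^2 for x != 0."""
    return qconj(x) / qmod(x) ** 2

i, j = np.array([0., 1, 0, 0]), np.array([0., 0, 1, 0])
assert np.allclose(qmul(i, j), [0, 0, 0, 1])        # ij = k
assert np.allclose(qmul(j, i), [0, 0, 0, -1])       # ji = -k: noncommutative
x = np.array([1.0, 2.0, -1.0, 0.5])
assert np.allclose(qmul(x, qinv(x)), [1, 0, 0, 0])  # x * x^{-1} = 1
```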
2.2 Quaternion Matrices
The collections of real, complex and quaternion \(m \times n\) matrices are denoted by \({{\mathbb {R}}}^{m \times n}\), \({{\mathbb {C}}}^{m \times n}\) and \({{\mathbb {Q}}}^{m \times n}\), respectively.
A quaternion matrix \(A= (a_{ij}) \in {{\mathbb {Q}}}^{m \times n}\) can be denoted as
\(A = A_0 + A_1\mathbf {i}+ A_2\mathbf {j}+ A_3\mathbf {k},\)
where \(A_0, A_1, A_2, A_3 \in {{\mathbb {R}}}^{m \times n}\). The transpose of A is \(A^\top = (a_{ji})\). The conjugate of A is \(\bar{A} = (a_{ij}^*)\). The conjugate transpose of A is \(A^* = (a_{ji}^*) = {\bar{A}}^\top \). For \(A, B \in {{\mathbb {Q}}}^{m \times n}\), their inner product is defined as
\(\langle A, B \rangle := \mathrm{Tr}(A^*B),\)
where \(\mathrm{Tr}(A^*B)\) denotes the trace of \(A^*B\). The Frobenius norm of A is
\(\Vert A \Vert _F = \sqrt{\mathrm{Tr}(A^*A)} = \sqrt{\sum _{i=1}^m \sum _{j=1}^n |a_{ij}|^2},\)
the \(\ell _1\)-norm of \(A = (a_{ij}) \in {{\mathbb {Q}}}^{m \times n}\) is defined by \(\Vert A\Vert _1 = \sum _{i=1}^m \sum _{j=1}^n |a_{ij}|\), and the \(\ell _\infty \)-norm of A is defined by \(\Vert A\Vert _\infty = \max _{i, j} |a_{ij}|\) [9]. Analogous to real/complex vectors or matrices, we can also introduce the so-called \(\ell _0\)-norm for a quaternion matrix to measure the sparsity among quaternion entries. Specifically, the \(\ell _0\)-norm of \(A\in {{\mathbb {Q}}}^{m \times n}\), termed as \(\Vert A\Vert _0\), is defined as the number of nonzero entries in A. It is not a true norm as it does not satisfy the positive homogeneity as required for norms, and it is discrete. However, it plays an important role in sparse color image processing [21]. In some papers, it is called the counting function [10].
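For concreteness, the sketch below (our storage convention, not from the paper) computes these norms for a quaternion matrix stored as a real array of shape (4, m, n) holding the components \(A_0, A_1, A_2, A_3\):

```python
import numpy as np

# A quaternion matrix A = A0 + A1*i + A2*j + A3*k stored with shape (4, m, n).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))
A[:, 0, 0] = 0.0                       # plant one zero quaternion entry

mod = np.sqrt((A ** 2).sum(axis=0))    # entrywise moduli |a_ij|, shape (m, n)

fro  = np.sqrt((mod ** 2).sum())       # Frobenius norm
ell1 = mod.sum()                       # l1-norm: sum of entry moduli
linf = mod.max()                       # l_infinity-norm: largest entry modulus
ell0 = int((mod > 0).sum())            # l0-"norm": number of nonzero entries
print(fro, ell1, linf, ell0)           # here ell0 = 14 of the 15 entries
```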
The following theorem for the Quaternion Singular Value Decomposition (QSVD) of a quaternion matrix was proved by Zhang [24].
Theorem 2.1
(Zhang 1997) Any quaternion matrix \(A \in {{\mathbb {Q}}}^{m \times n}\) has the following QSVD form
\(A = U\left( \begin{array}{cc} \Sigma _r & O \\ O & O \end{array}\right) V^*,\)
where \(U \in {{\mathbb {Q}}}^{m \times m}\) and \(V \in {{\mathbb {Q}}}^{n \times n}\) are unitary, and \(\Sigma _r = \mathrm{diag}\{ \sigma _1, \cdots , \sigma _r\}\) is a real positive \(r \times r\) diagonal matrix, with \(\sigma _1 \ge \cdots \ge \sigma _r\) as the singular values of A.
Similar to the real/complex matrix case, the rank of a quaternion matrix A, denoted as rank(A), is defined as the number r of its positive singular values; the spectral norm and the nuclear norm of A are defined by \(\Vert A\Vert _S = \max \{ \sigma _1, \cdots , \sigma _r \}\) and \(\Vert A\Vert _* = \sum _{i=1}^r \sigma _i\), respectively (see, e.g., [9, Definitions 1 & 2]). Similar to the case of real matrices, it is revealed in [9, Proposition 2] that the convex envelope of the quaternion matrix rank function on the unit ball in the sense of the spectral norm is exactly the nuclear norm, which indicates that the nuclear norm can be regarded as the best convex and continuous surrogate for the rank function in quaternion matrices.
For a quaternion matrix \(A = A_0 + A_1\mathbf {i}+ A_2\mathbf {j}+ A_3 \mathbf {k}\in {{\mathbb {Q}}}^{m \times n}\), its real representation [22] is the \(4m\times 4n\) real matrix
\(A^R = \left( \begin{array}{rrrr} A_0 & -A_1 & -A_2 & -A_3 \\ A_1 & A_0 & -A_3 & A_2 \\ A_2 & A_3 & A_0 & -A_1 \\ A_3 & -A_2 & A_1 & A_0 \end{array}\right) .\)
A color image dataset can be represented as a pure quaternion matrix
\(A = A_1\mathbf {i}+ A_2\mathbf {j}+ A_3\mathbf {k},\)
where \(A_1, A_2\) and \(A_3\) are real \(m \times n\) matrices. We use \(O_{m \times n}\) to denote the zero matrix in \({{\mathbb {Q}}}^{m \times n}\). More knowledge of quaternion matrices can be found in [22, 24].
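To see the rank relation used later (Theorem 1.8.4 of [22]: rank\((A^R) = 4\,\)rank(A)) at work, here is a small sketch; the sign pattern follows the display above, though conventions for \(A^R\) vary across references:

```python
import numpy as np

def real_rep(A):
    """4m x 4n real representation A^R of A = A0 + A1*i + A2*j + A3*k,
    with the sign pattern of the display above (conventions vary in the
    literature, so treat this layout as one admissible choice)."""
    A0, A1, A2, A3 = A
    return np.block([[A0, -A1, -A2, -A3],
                     [A1,  A0, -A3,  A2],
                     [A2,  A3,  A0, -A1],
                     [A3, -A2,  A1,  A0]])

# A rank-one pure quaternion matrix: rank(A) = 1, so rank(A^R) = 4 is expected.
u = np.array([[1.0], [2.0]])
B = u @ u.T
A = np.stack([np.zeros((2, 2)), B, 2 * B, -B])   # A = B i + 2B j - B k
print(np.linalg.matrix_rank(real_rep(A)))        # 4
```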
3 The QMO and Its Motivation
Consider a matrix component vector \(\mathbf{X} := (W, Y, Z)\), where \(W \in {{\mathbb {Q}}}^{m_1 \times n_1}\), \(Y \in {{\mathbb {Q}}}^{m_2 \times n_2}\) and \(Z \in {{\mathbb {Q}}}^{m_3 \times n_3}\). Matrix component vectors may have more components, while three components are enough to illustrate the problem. In this paper, we are interested in the QMO problem, which takes the following form
\(\min f(\mathbf{X}) \quad \mathrm{s.t.} \quad h_j(\mathbf{X}) = 0, \ j = 1, \cdots , p, \quad g_k(\mathbf{X}) \le 0, \ k = 1, \cdots , q,\)
where \(f, h_j, g_k :{{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\) are real-valued functions, for \(j = 1, \cdots , p\) and \(k = 1, \cdots , q\), with \({{\mathbb {H}}} := {{\mathbb {Q}}}^{m_1 \times n_1} \times {{\mathbb {Q}}}^{m_2 \times n_2} \times {{\mathbb {Q}}}^{m_3 \times n_3}\).
For \(\mathbf{X} = (W, Y, Z)\), suppose that
\(W = W_0 + W_1\mathbf {i}+ W_2\mathbf {j}+ W_3\mathbf {k}, \quad Y = Y_0 + Y_1\mathbf {i}+ Y_2\mathbf {j}+ Y_3\mathbf {k}, \quad Z = Z_0 + Z_1\mathbf {i}+ Z_2\mathbf {j}+ Z_3\mathbf {k},\)
where \(W_i \in {{\mathbb {R}}}^{m_1 \times n_1}\), \(Y_i \in {\mathbb R}^{m_2 \times n_2}\) and \(Z_i \in {{\mathbb {R}}}^{m_3 \times n_3}\), for \(i = 0, 1, 2, 3\). Denote
\(R(\mathbf{X}) := \left( W_0, W_1, W_2, W_3, Y_0, Y_1, Y_2, Y_3, Z_0, Z_1, Z_2, Z_3\right) \)
and call it the real form of \(\mathbf{X}\). Note that f can be regarded as a function in \(R(\mathbf{X})\). To avoid notational confusion, we introduce the real representation of f, termed as \(f^R\), with the relation that
\(f^R(R(\mathbf{X})) = f(\mathbf{X}).\)
Similarly, we introduce \({h_j^R}\) and \(g_k^R\) for the real representations of \(h_j\) and \(g_k\), respectively. Such real representations of real-valued quaternion matrix functions will greatly facilitate the subsequent analysis in calculus. The first benefit is the characterization of differentiability. Given \(f:{{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\), we say that f is locally Lipschitz continuous (continuously differentiable, or twice continuously differentiable, respectively) in \(\mathbf{X}\) if \(f^R\) is locally Lipschitz continuous (continuously differentiable, or twice continuously differentiable, respectively) in \(R(\mathbf{X})\). Based on these three types of differentiation properties, we introduce the following three assumptions for the QMO model (3).
Normal Assumption: f, \(h_j\)’s and \(g_k\)’s in (3) are all locally Lipschitz continuous in \(\mathbf{X}\).
Middle Assumption: f, \(h_j\)’s and \(g_k\)’s in (3) are all continuously differentiable in \(\mathbf{X}\).
Strong Assumption: f, \(h_j\)’s and \(g_k\)’s in (3) are all twice continuously differentiable in \(\mathbf{X}\).
The QMO model (3) provides a unified form for many quaternion matrix optimization problems arising from color image processing. Four selected ones are elaborated as below for illustration.
3.1 LRQD: Color Image Inpainting
The LRQD model, arising from color image inpainting [4, 5], takes the form of
\(\min \Vert YZ - W\Vert _F^2 \quad \mathrm{s.t.} \quad W_\Omega = D_\Omega ,\)
where \(W \in {{\mathbb {Q}}}^{m \times n}\), \(Y \in {{\mathbb {Q}}}^{m \times r}\) and \(Z \in {{\mathbb {Q}}}^{r \times n}\) are decision variables, \(D\in {{\mathbb {Q}}}^{m \times n}\) is a given quaternion dataset matrix, \(\Omega \) denotes the index set of the observed part of D, and the equality \(W_\Omega = D_\Omega \) means that the corresponding entries of W and D are equal on the index set \(\Omega \). The inherent low-rankness of the quaternion matrix for the given color image is exploited in terms of the decomposition YZ with a relatively small integer \(r<\min \{m,n\}\).
Clearly, if we take \(\mathbf{X} =(W,Y,Z)\), \(f(\mathbf{X}) = \left\| YZ-W\right\| _F^2\), and \(h_j(\mathbf{X}) = w_{il}-d_{il}\), with (i, l) the jth index in \(\Omega \), for \(j= 1, \cdots , |\Omega |\) (the cardinality of \(\Omega \)), then (3) reduces to (5). Thus, (5) is a special case of the QMO problem which satisfies the strong assumption.
3.2 RQMC: Color Image Inpainting
In [9], Jia, Ng and Song proposed the following RQMC model for color image inpainting:
\(\min \Vert Y\Vert _* + \lambda \Vert Z\Vert _1 \quad \mathrm{s.t.} \quad (Y + Z)_\Omega = D_\Omega ,\)
where \(Y, Z, D \in {{\mathbb {Q}}}^{m \times n}\), D is a quaternion dataset matrix, \(\Omega \) denotes the index set of the observed part of D, and \(\lambda \) is a positive parameter.
Clearly, the RQMC problem (6) is a special case of the QMO problem (3). The objective function here is not differentiable, but locally Lipschitz continuous, and hence it satisfies the normal assumption. It is worth mentioning that the objective function is convex, with the first term the convex surrogate for the rank function, and the second term the convex relaxation for the \(\ell _0\)-norm.
3.3 LRQA: Color Image Denoising and Inpainting
In [5], Chen, Xiao and Zhou proposed the following LRQA model for color image denoising and inpainting:
where \(W \in {{\mathbb {Q}}}^{m \times n}\) is the decision variable, a surrogate quaternion matrix for the given quaternion dataset matrix \(D\in {{\mathbb {Q}}}^{m \times n}\), \(\sigma _i(W)\) denotes the ith largest singular value of W, \(\phi \) is a nonconvex surrogate for the rank function, such as the Schatten-\(\gamma \) function, the logarithm-determinant function and the Laplace function, and \(\gamma \) is a nonnegative parameter related to the specific \(\phi \). As one can see, the LRQA problem (7) is also a special case of the QMO problem (3).
3.4 SCID: Color Image Denoising and Inpainting
Inspired by the promising performance of the greedy-type methods tackling the original \(\ell _0\)-norm and rank function in vector/matrix cases [13, 25], we formulate the color image denoising and inpainting by the following sparse color image denoising (SCID) model
\(\min f(\mathbf{X}) := {1 \over 2}\left\| L(Y + Z) - D\right\| _F^2 + \lambda \Vert Z\Vert _0 \quad \mathrm{s.t.} \quad \mathrm{rank}(Y) \le r,\)
where \(D \in {{\mathbb {Q}}}^{m \times n}\) is an observed quaternion matrix of the color image, \(Y \in {{\mathbb {Q}}}^{m \times n}\) is a low rank quaternion matrix to approximate D, r is a prescribed integer bounding the rank of Y from above, \(Z \in {\mathbb Q}^{m \times n}\) is the color image noise to be detected, \(\lambda > 0\) is a prescribed parameter, and \(L : {{\mathbb {Q}}}^{m \times n} \rightarrow {{\mathbb {Q}}}^{m \times n}\) is a linear operator, for example, a projection operator indicating the observed area in the color image inpainting problem. Here \(\mathbf{X} = (Y, Z)\) is the quaternion matrix vector variable, and \(f : {{\mathbb {M}}} := {{\mathbb {Q}}}^{m \times n} \times {{\mathbb {Q}}}^{m \times n} \rightarrow {{\mathbb {R}}}\) is the objective function. The SCID problem is also a special case of the QMO problem (3). Since the \(\ell _0\)-norm in the objective function and the rank function in the constraint are both discontinuous, the SCID problem does not satisfy even the normal assumption. Thus, a special treatment for the SCID will be provided in Sect. 6 by exploiting properties of the \(\ell _0\)-norm and the rank function of quaternion matrices.
Besides the aforementioned instances of QMO, more specific models in the class of QMO can be found in the color image recognition problem [26] and the color image classification problem [23], etc.
4 R-Product and R-Linear Independence
In this section, we introduce R-product and R-linear independence, use them to calculate the first-order derivatives and the second-order partial derivatives of real-valued functions in quaternion matrix variables, and establish first- and second-order optimality conditions of the QMO problem.
Suppose that A and E are two real matrices with the same dimension, say \(A = (a_{ij})\) and \(E = (e_{ij})\) are \(m \times n\) real matrices. Define the R-product (real product) of A and E as
\(A \cdot E := \sum _{i=1}^m \sum _{j=1}^n a_{ij} e_{ij}.\)
Consider the matrix component vector space \({{\mathbb {H}}}\).
For \(\mathbf{A} := (B, C, D) \in {{\mathbb {H}}}\) and \(\mathbf{H} := (E, F, G) \in {{\mathbb {H}}}\), we may define the inner product of \(\mathbf{A}\) and \(\mathbf{H}\) as
\(\langle \mathbf{A}, \mathbf{H} \rangle := \langle B, E \rangle + \langle C, F \rangle + \langle D, G \rangle .\)
Then define the R-product (real product) of \(\mathbf{A}\) and \(\mathbf{H}\) as the real part of \(\langle \mathbf{A}, \mathbf{H} \rangle \):
\(\mathbf{A} \cdot \mathbf{H} := \mathrm{Re}\, \langle \mathbf{A}, \mathbf{H} \rangle .\)
We have the following proposition.
Proposition 4.1
Suppose that \(\mathbf{A} = (B, C, D) \in {{\mathbb {H}}}\) and \(\mathbf{H} = (E, F, G) \in {{\mathbb {H}}}\). Assume that \(B = B_0 + B_1\mathbf {i}+ B_2\mathbf {j}+ B_3\mathbf {k}, \cdots , G = G_0 + G_1\mathbf {i}+ G_2\mathbf {j}+ G_3\mathbf {k}\), where \(B_i, C_i, D_i, E_i, F_i\) and \(G_i\) are real matrices with corresponding dimensions. Then the R-product of \(\mathbf{A}\) and \(\mathbf{H}\) has the following form:
\(\mathbf{A} \cdot \mathbf{H} = \sum _{i=0}^3 \left( B_i \cdot E_i + C_i \cdot F_i + D_i \cdot G_i \right) .\)
Proof
Let \(B = (b_{pq})\) and \(E = (e_{pq})\), where \(b_{pq} = b_{pq0} + b_{pq1}\mathbf {i}+ b_{pq2}\mathbf {j}+ b_{pq3}\mathbf {k}\) and \(e_{pq} = e_{pq0} + e_{pq1}\mathbf {i}+ e_{pq2}\mathbf {j}+ e_{pq3}\mathbf {k}\). Then
\(\langle B, E \rangle = \mathrm{Tr}(B^*E) = \sum _{p=1}^{m_1} \sum _{q=1}^{n_1} b_{pq}^* e_{pq}.\)
We see that the real part of \(\left( b_{pq0} - b_{pq1}\mathbf {i}- b_{pq2}\mathbf {j}- b_{pq3}\mathbf {k}\right) \left( e_{pq0} + e_{pq1}\mathbf {i}+ e_{pq2}\mathbf {j}+ e_{pq3}\mathbf {k}\right) \) is
\(b_{pq0}e_{pq0} + b_{pq1}e_{pq1} + b_{pq2}e_{pq2} + b_{pq3}e_{pq3}.\)
This implies that the real part of \(\langle B, E \rangle \) is \(\sum _{i=0}^3 B_i \cdot E_i\). The terms for C, F and for D, G are handled in the same way. Hence, the conclusion holds.
\(\square \)
This proposition reveals the meaning of the R-product. In Sect. 5, it connects our derivative concept with the subgradient concept of Jia, Ng and Song [9].
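Numerically, Proposition 4.1 can be checked as follows (our sketch; `qmatmul` expands the quaternion matrix product from the multiplication rules in Sect. 2.1, with quaternion matrices stored as real arrays of shape (4, m, n)):

```python
import numpy as np

def qmatmul(A, B):
    """Quaternion matrix product for arrays of shape (4, m, r) and (4, r, n),
    expanded via the rules i^2 = j^2 = k^2 = ijk = -1."""
    A0, A1, A2, A3 = A
    B0, B1, B2, B3 = B
    return np.stack([A0 @ B0 - A1 @ B1 - A2 @ B2 - A3 @ B3,
                     A0 @ B1 + A1 @ B0 + A2 @ B3 - A3 @ B2,
                     A0 @ B2 - A1 @ B3 + A2 @ B0 + A3 @ B1,
                     A0 @ B3 + A1 @ B2 - A2 @ B1 + A3 @ B0])

def rproduct(A, H):
    """R-product: the sum of componentwise real inner products (Prop. 4.1)."""
    return float((A * H).sum())

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3, 4))
H = rng.standard_normal((4, 3, 4))

# Cross-check: rproduct(A, H) equals the real part of <A, H> = Tr(A^* H).
Astar = np.stack([A[0].T, -A[1].T, -A[2].T, -A[3].T])   # conjugate transpose
re_tr = np.trace(qmatmul(Astar, H)[0])                  # trace of real component
assert np.isclose(rproduct(A, H), re_tr)
```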
Suppose that \(\mathbf{A^{(j)}} := (A^{(j)}, B^{(j)}, C^{(j)}) \in {{\mathbb {H}}}\) for \(j = 1, \cdots , p\). We say that \(\left\{ \mathbf{A^{(j)}} : j = 1, \cdots , p \right\} \) is R-linearly independent if there do not exist real numbers \(\alpha _j\), \(j = 1, \cdots , p\), not all zero, such that
\(\sum _{j=1}^p \alpha _j \mathbf{A^{(j)}} = \mathbf{O}.\)
4.1 First-Order Derivatives
Suppose that \(f(\mathbf{X}) : {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\) satisfies the normal assumption, with \(\mathbf{X} = (W, Y, Z)\), and f is differentiable at \(\mathbf{X}\) with respect to \(W_i, Y_i\) and \(Z_i\) for \(i = 0, 1, 2, 3\). Then we define the partial derivatives and gradient of f at \(\mathbf{X}\) as:
\({\partial \over \partial W}f(\mathbf{X}) := {\partial \over \partial W_0}f(\mathbf{X}) + {\partial \over \partial W_1}f(\mathbf{X})\mathbf {i}+ {\partial \over \partial W_2}f(\mathbf{X})\mathbf {j}+ {\partial \over \partial W_3}f(\mathbf{X})\mathbf {k},\)
similarly for \({\partial \over \partial Y}f(\mathbf{X})\) and \({\partial \over \partial Z}f(\mathbf{X})\), and
\(\nabla f(\mathbf{X}) := \left( {\partial \over \partial W}f(\mathbf{X}), {\partial \over \partial Y}f(\mathbf{X}), {\partial \over \partial Z}f(\mathbf{X})\right) .\)
We see that \(\nabla f(\mathbf{X}) \in {{\mathbb {H}}}\).
Define the directional derivative of f at \(\mathbf{X} = (W, Y, Z) \in {{\mathbb {H}}}\) in the direction \(\varvec{\Delta X} = (\Delta W, \Delta Y, \Delta Z) \in {{\mathbb {H}}}\) as
\(f'(\mathbf{X}; \varvec{\Delta X}) := \lim _{t \rightarrow 0+} {f(\mathbf{X} + t\varvec{\Delta X}) - f(\mathbf{X}) \over t}.\)
Note that while the gradient of f is in \({{\mathbb {H}}}\), the directional derivative of f is real. They are connected via the R-product operation in \({{\mathbb {H}}}\).
Proposition 4.2
Suppose that \(\mathbf{X} = (W, Y, Z) \in {{\mathbb {H}}}\), \(\varvec{\Delta X} = (\Delta W, \Delta Y, \Delta Z) \in {{\mathbb {H}}}\), and f satisfies the middle assumption. Then
\(f'(\mathbf{X}; \varvec{\Delta X}) = \nabla f(\mathbf{X}) \cdot \varvec{\Delta X}.\)
Furthermore, we have
\(f(\mathbf{X} + \varvec{\Delta X}) = f(\mathbf{X}) + \nabla f(\mathbf{X}) \cdot \varvec{\Delta X} + o\left( \Vert \varvec{\Delta X}\Vert _F\right) .\)
Proof
We may regard \(f(\mathbf{X})\) as a function of \(W_i, Y_i\) and \(Z_i\), \(i = 0, 1, 2, 3\). Then
\(f'(\mathbf{X}; \varvec{\Delta X}) = \sum _{i=0}^3 \left( {\partial \over \partial W_i}f(\mathbf{X}) \cdot \Delta W_i + {\partial \over \partial Y_i}f(\mathbf{X}) \cdot \Delta Y_i + {\partial \over \partial Z_i}f(\mathbf{X}) \cdot \Delta Z_i\right) = \nabla f(\mathbf{X}) \cdot \varvec{\Delta X},\)
where the last equality follows from Proposition 4.1. Furthermore, the expansion follows from the first-order Taylor expansion of \(f^R\) at \(R(\mathbf{X})\).
\(\square \)
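Proposition 4.2 is easy to verify numerically. Below is a finite-difference sanity check (our sketch, with a single quaternion matrix component for brevity) for \(f(\mathbf{X}) = \Vert W\Vert _F^2\), whose gradient under the componentwise definition is 2W:

```python
import numpy as np

# f = ||W||_F^2 for one quaternion matrix component W stored (4, m, n):
# Proposition 4.2 says the directional derivative equals grad f(X) . DX.
rng = np.random.default_rng(2)
W  = rng.standard_normal((4, 3, 4))
DW = rng.standard_normal((4, 3, 4))

f     = lambda W: (W ** 2).sum()      # ||W||_F^2 as a sum over real components
grad  = 2 * W                         # partial derivatives in W_0, ..., W_3
rprod = (grad * DW).sum()             # R-product of gradient and direction

t = 1e-6
fd = (f(W + t * DW) - f(W)) / t       # one-sided difference quotient
assert abs(fd - rprod) < 1e-3
```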
Now, we may study the first-order optimality conditions of (3).
Theorem 4.3
Suppose that the functions \(f, h_j\) and \(g_k\) for \(j = 1, \cdots , p\) and \(k = 1, \cdots , q\) satisfy the middle assumption. Assume that \(\mathbf{X}^{\#} = \left( W^{\#}, Y^{\#}, Z^{\#}\right) \in {{\mathbb {H}}}\) is an optimal solution of (3). If
\(\left\{ \nabla h_j(\mathbf{X}^{\#}), \ \nabla g_k(\mathbf{X}^{\#}) : j = 1, \cdots , p, \ k \ \mathrm{such \ that} \ g_k(\mathbf{X}^{\#}) = 0 \right\} \)
is R-linearly independent, then there are Lagrangian multipliers \(\lambda _j, \mu _k \in {{\mathbb {R}}}\) for \(j = 1, \cdots , p\) and \(k = 1, \cdots , q\), such that
\(\nabla f(\mathbf{X}^{\#}) + \sum _{j=1}^p \lambda _j \nabla h_j(\mathbf{X}^{\#}) + \sum _{k=1}^q \mu _k \nabla g_k(\mathbf{X}^{\#}) = \mathbf{O},\)
\(\mu _k \ge 0, \quad k = 1, \cdots , q,\)
\(\mu _k g_k(\mathbf{X}^{\#}) = 0, \quad k = 1, \cdots , q.\)
Proof
Again, we may regard \(f, h_j, g_k\) for \(j = 1, \cdots , p\), \(k = 1, \cdots , q\) as functions of \(W_i, Y_i\) and \(Z_i\), \(i = 0, 1, 2, 3\). Then (10) is equivalent to the linear independence constraint qualification of such an optimization problem in \(W_i, Y_i\) and \(Z_i\), \(i = 0, 1, 2, 3\). Then from the first-order optimality condition for real constrained optimization, we have (11)-(13). \(\square \)
Note that the Lagrangian multipliers are real numbers. We call \(\mathbf{X}^{\#} \in {{\mathbb {H}}}\), which satisfies (11), (12) and (13) with some Lagrangian multipliers, a stationary point of (3). We may reduce the R-linear independence condition in Theorem 4.3 to other constraint qualifications for nonlinear programs.
We now consider the LRQD problem (5). Let \(f(\mathbf{X}) = {1 \over 2}\Vert YZ-W\Vert _F^2\). Then by [4], we have
\(\nabla f(\mathbf{X}) = \left( W - YZ, \ (YZ-W)Z^*, \ Y^*(YZ-W)\right) \)
and the following theorem.
Theorem 4.4
Assume that \(\mathbf{X}^{\#} = \left( W^{\#}, Y^{\#}, Z^{\#}\right) \in {{\mathbb {H}}}\) is an optimal solution of (5). Then \(\mathbf{X}^{\#}\) is a stationary point of (5), i.e.,
\(\left( Y^{\#}Z^{\#} - W^{\#}\right) _{\Omega ^C} = O, \quad \left( Y^{\#}Z^{\#} - W^{\#}\right) \left( Z^{\#}\right) ^* = O, \quad \left( Y^{\#}\right) ^*\left( Y^{\#}Z^{\#} - W^{\#}\right) = O, \quad W^{\#}_\Omega = D_\Omega ,\)
where \(\Omega ^C\) is the complementary set of \(\Omega \).
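The gradient displayed before Theorem 4.4 is straightforward to check numerically. The sketch below (ours; quaternion matrices stored as (4, m, n) real arrays) compares the Y-block of the gradient with a finite-difference quotient:

```python
import numpy as np

def qmatmul(A, B):
    """Quaternion matrix product for arrays of shape (4, m, r) and (4, r, n),
    expanded via i^2 = j^2 = k^2 = ijk = -1 (repeated for self-containment)."""
    A0, A1, A2, A3 = A
    B0, B1, B2, B3 = B
    return np.stack([A0 @ B0 - A1 @ B1 - A2 @ B2 - A3 @ B3,
                     A0 @ B1 + A1 @ B0 + A2 @ B3 - A3 @ B2,
                     A0 @ B2 - A1 @ B3 + A2 @ B0 + A3 @ B1,
                     A0 @ B3 + A1 @ B2 - A2 @ B1 + A3 @ B0])

def qconj_t(A):
    """Conjugate transpose A^*."""
    return np.stack([A[0].T, -A[1].T, -A[2].T, -A[3].T])

rng = np.random.default_rng(3)
W = rng.standard_normal((4, 5, 6))
Y = rng.standard_normal((4, 5, 2))
Z = rng.standard_normal((4, 2, 6))

f = lambda W, Y, Z: 0.5 * ((qmatmul(Y, Z) - W) ** 2).sum()  # (1/2)||YZ-W||_F^2
R = qmatmul(Y, Z) - W
gW, gY, gZ = -R, qmatmul(R, qconj_t(Z)), qmatmul(qconj_t(Y), R)

# Finite-difference check of the Y-block against the directional derivative:
DY = rng.standard_normal(Y.shape)
t = 1e-6
fd = (f(W, Y + t * DY, Z) - f(W, Y, Z)) / t
assert abs(fd - (gY * DY).sum()) < 1e-2
```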
We now study the product rule and the chain rule of first-order derivatives.
Theorem 4.5
Suppose that \(f(\mathbf{X}), g(\mathbf{X}): {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\) satisfy the middle assumption, with \(\mathbf{X} = (W, Y, Z)\). Then
\(\nabla (fg)(\mathbf{X}) = g(\mathbf{X}) \nabla f(\mathbf{X}) + f(\mathbf{X}) \nabla g(\mathbf{X}).\)
Proof
We have
Similarly, we have
and
Then
\(\square \)
Similarly, we may prove the following theorem.
Theorem 4.6
Suppose that \(f(\mathbf{X}): {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\) satisfies the middle assumption, with \(\mathbf{X} = (W, Y, Z)\), and \(\phi : {\mathbb R} \rightarrow {{\mathbb {R}}}\) is continuously differentiable. Then
\(\nabla (\phi \circ f)(\mathbf{X}) = \phi '(f(\mathbf{X})) \nabla f(\mathbf{X}).\)
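For example, with \(f(\mathbf{X}) = \Vert W\Vert _F^2\), for which \(\nabla f(\mathbf{X}) = (2W, O, O)\), taking \(\phi (t) = \sqrt{t}\) gives \(\nabla \Vert W\Vert _F = \left( W/\Vert W\Vert _F, O, O\right) \) wherever \(W \not = O\).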
In [4], the Kurdyka-Łojasiewicz inequality was established for real-valued functions of quaternion matrix variables.
4.2 Second-Order Derivatives
Suppose that \(f(\mathbf{X}) = f(W, Y, Z) : {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\) is twice continuously differentiable in the sense of the strong assumption. Then the second-order derivative of f exists. We first consider the second-order partial derivatives of f. As an example, we consider \({\partial ^2 \over \partial Y \partial W}f(\mathbf{X})\). It is more convenient to consider \({\partial ^2 \over \partial Y \partial W}f(\mathbf{X})\Delta Y\) and \({\partial ^2 \over \partial Y \partial W}f(\mathbf{X})\Delta Y \cdot \Delta W\).
Let \({\partial \over \partial W}f(\mathbf{X})\) be as considered in the last section. Let \(\Delta Y \in {{\mathbb {Q}}}^{m_2 \times n_2}\). Suppose that
where \(\phi \) is R-linear in \(\Delta Y\) in the sense that for any \(\alpha , \beta \in {{\mathbb {R}}}\) and \(\Delta Y^{(1)}, \Delta Y^{(2)} \in {{\mathbb {Q}}}^{m_2 \times n_2}\),
and
i.e.,
Then we have
Later, we will see that we may use this approach to calculate \({\partial ^2 \over \partial Y \partial W}f(\mathbf{X})\Delta Y\) and \({\partial ^2 \over \partial Y \partial W}f(\mathbf{X})\Delta Y \cdot \Delta W\), for f defined by the LRQD problem (5).
We may express the other second-order partial derivatives of f similarly.
Proposition 4.7
Under the strong assumption, we have
and
Proof
Let \(\Delta W = \Delta W_0 + \Delta W_1\mathbf {i}+ \Delta W_2\mathbf {j}+ \Delta W_3\mathbf {k}\in {{\mathbb {Q}}}^{m_1 \times n_1}\) and \(\Delta Y = \Delta Y_0 + \Delta Y_1\mathbf {i}+ \Delta Y_2\mathbf {j}+ \Delta Y_3\mathbf {k}\in {{\mathbb {Q}}}^{m_2 \times n_2}\). Then \({\partial ^2 \over \partial Y \partial W}f(\mathbf{X})\Delta Y \cdot \Delta W\) and \({\partial ^2 \over \partial W \partial Y}f(\mathbf{X})\Delta W \cdot \Delta Y\) are real. We have
The other two equalities can be proved similarly. \(\square \)
Then we may define \(\nabla ^2 f(\mathbf{X})\) by
Here, \(\varvec{\Delta X} = (\Delta W, \Delta Y, \Delta Z) \in {\mathbb H}\).
If \({1 \over 2}\nabla ^2 f(\mathbf{X}) \varvec{\Delta X} \cdot \varvec{\Delta X} \ge 0\) for any \(\varvec{\Delta X} \in {{\mathbb {H}}}\), then we say that \(\nabla ^2 f\) is positive semi-definite at \(\mathbf{X}\). If \({1 \over 2}\nabla ^2 f(\mathbf{X}) \varvec{\Delta X} \cdot \varvec{\Delta X} > 0\) for any \(\varvec{\Delta X} \in {{\mathbb {H}}}\) and \(\varvec{\Delta X} \not = \mathbf{O}\), then we say that \(\nabla ^2 f\) is positive definite at \(\mathbf{X}\).
Proposition 4.8
Under the strong assumption, we have
\(f(\mathbf{X} + \varvec{\Delta X}) = f(\mathbf{X}) + \nabla f(\mathbf{X}) \cdot \varvec{\Delta X} + {1 \over 2}\nabla ^2 f(\mathbf{X}) \varvec{\Delta X} \cdot \varvec{\Delta X} + o\left( \Vert \varvec{\Delta X}\Vert _F^2\right) .\)
Proof
Again, we may regard \(f(\mathbf{X})\) as a function of \(W_i, Y_i\) and \(Z_i\), \(i = 0, 1, 2, 3\). Then
We have (19). \(\square \)
We now consider the function f in the LRQD problem (5). Then, we have the following result.
Theorem 4.9
Suppose that \(f(\mathbf{X}) = \Vert YZ-W\Vert _F^2\), where \(W \in {\mathbb Q}^{m \times n}\), \(Y \in {{\mathbb {Q}}}^{m \times r}\) and \(Z \in {{\mathbb {Q}}}^{r \times n}\). Then,
Proof
By (14), we have
Thus,
This implies (20). By above, we also have
This implies (21). The formulas (22)-(28) can be proved similarly. \(\square \)
We study the second-order optimality conditions of (3) now.
Suppose that \(\mathbf{X}^{\#} \in {{\mathbb {H}}}\) is a stationary point of (3), i.e., there exist some Lagrangian multipliers \(\lambda _j, \mu _k \in {{\mathbb {R}}}\) for \(j = 1, \cdots , p\) and \(k = 1, \cdots , q\), such that (11), (12) and (13) are satisfied. Let
We say that \(\varvec{\Delta X} \in {{\mathbb {H}}}\) is in the critical cone \(C(\mathbf{X}^{\#})\) of (3) at \(\mathbf{X}^{\#}\), if
By regarding \(f, h_j, g_k\) for \(j = 1, \cdots , p\), \(k = 1, \cdots , q\) as functions of \(W_i, Y_i\) and \(Z_i\), \(i = 0, 1, 2, 3\), from the second-order optimality necessary condition and sufficient condition for real constrained optimization, we have the following two theorems.
Theorem 4.10
Suppose that the functions \(f, h_j\) and \(g_k\) for \(j = 1, \cdots , p\) and \(k = 1, \cdots , q\) satisfy the strong assumption. Assume that \(\mathbf{X}^{\#} = \left( W^{\#}, Y^{\#}, Z^{\#}\right) \in {{\mathbb {H}}}\) is an optimal solution of (3), and (10) is R-linearly independent. Then for any \(\varvec{\Delta X} \in C(\mathbf{X}^{\#})\),
Theorem 4.11
Suppose that the functions \(f, h_j\) and \(g_k\) for \(j = 1, \cdots , p\) and \(k = 1, \cdots , q\) satisfy the strong assumption. Assume that \(\mathbf{X}^{\#} = \left( W^{\#}, Y^{\#}, Z^{\#}\right) \in {{\mathbb {H}}}\) is a stationary point of (3), and for any \(\varvec{\Delta X} \in C(\mathbf{X}^{\#})\),
Then \(\mathbf{X}^{\#}\) is a local minimizer of (3).
5 Convex Functions of Quaternion Matrix Variables
In this section, we study convex functions of quaternion matrix variables.
Suppose that \(f(\mathbf{X}) : {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\). A natural definition for f to be convex is as follows. We say that f is a convex function if for any \(\mathbf{X}, {\hat{\mathbf{X}}} \in {{\mathbb {H}}}\), and any \(t \in {{\mathbb {R}}}\), \(0 \le t \le 1\), we have
\(f\left( t\mathbf{X} + (1-t){\hat{\mathbf{X}}}\right) \le t f(\mathbf{X}) + (1-t) f({\hat{\mathbf{X}}}).\)
It is possible to use the modern convex analysis terminology of epigraphs to define convex functions of quaternion matrix variables. Here, we use the classical definition, as it is more convenient for engineering readers.
A natural question is then: is f a convex function if and only if \(f^R\) is convex?
Proposition 5.1
Suppose that \(f(\mathbf{X}) : {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\). Then f is a convex function if and only if \(f^R\) is convex.
Proof
Let \(\mathbf{X} = \left( W, Y, Z\right) , {{\hat{\mathbf{X}}}} = \left( {\hat{W}}, {\hat{Y}}, {\hat{Z}}\right) \in {{\mathbb {H}}}\), \(t \in {{\mathbb {R}}}\) and \(0 \le t \le 1\). Then
\(f\left( t\mathbf{X} + (1-t){{\hat{\mathbf{X}}}}\right) = f^R\left( tR(\mathbf{X}) + (1-t)R({{\hat{\mathbf{X}}}})\right) ,\)
since \(R(\cdot )\) is linear.
From here, we may conclude that f is a convex function if and only if \(f^R\) is convex. \(\square \)
We now define subgradients and subdifferentials by the R-product. Suppose that \(f(\mathbf{X}) : {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\), and \({{\bar{\mathbf{X}}}} = \left( {\bar{W}}, {\bar{Y}}, {\bar{Z}}\right) \in {{\mathbb {H}}}\). Let \(\mathbf{G} = \left( A, B, C\right) \in {{\mathbb {H}}}\). We say that \(\mathbf{G}\) is a subgradient of f at \({{\bar{\mathbf{X}}}}\) if for any \(\mathbf{X} = \left( W, Y, Z\right) \in {{\mathbb {H}}}\), we have
\(f(\mathbf{X}) \ge f({{\bar{\mathbf{X}}}}) + \mathbf{G} \cdot \left( \mathbf{X} - {{\bar{\mathbf{X}}}}\right) .\)
The set of all subgradients of f at \({{\bar{\mathbf{X}}}}\) is called the subdifferential of f at \({{\bar{\mathbf{X}}}}\) and denoted as \(\partial f({{\bar{\mathbf{X}}}})\).
By the definition of R-product, we see that \(\mathbf{G}\) is a subgradient of f at \({{\bar{\mathbf{X}}}}\) if and only if \(R(\mathbf{G})\) is a subgradient of \(f^R\) at \(R({{\bar{\mathbf{X}}}})\).
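For example, for the convex function \(f(\mathbf{X}) = \Vert \mathbf{X}\Vert _F := \sqrt{\mathbf{X} \cdot \mathbf{X}}\), the subdifferential at any \({{\bar{\mathbf{X}}}} \not = \mathbf{O}\) is the singleton \(\left\{ {{\bar{\mathbf{X}}}}/\Vert {{\bar{\mathbf{X}}}}\Vert _F\right\} \), while \(\partial f(\mathbf{O}) = \left\{ \mathbf{G} \in {{\mathbb {H}}} : \Vert \mathbf{G}\Vert _F \le 1\right\} \), mirroring the Euclidean norm in the real form.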
This definition differs slightly in form from the definition of Jia, Ng and Song [9]. They used the real part of the inner product of \(\mathbf{G}\) and \(\mathbf{X}-{{\bar{\mathbf{X}}}}\), instead of the R-product here. By Proposition 4.1, these two definitions are the same. This also reveals that the subgradient concept introduced in [9] can be regarded as the subgradient concept for the corresponding functions of real matrix variables.
From Proposition 5.1, the definition of R-product and the knowledge of convex functions of real variables, we have the following proposition.
Proposition 5.2
Suppose that \(f(\mathbf{X}) : {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\) is a convex function. Then for any \({{\bar{\mathbf{X}}}} = \left( {\bar{W}}, {\bar{Y}}, \bar{Z}\right) \in {{\mathbb {H}}}\), the subdifferential \(\partial f({{\bar{\mathbf{X}}}})\) is a nonempty, convex and compact set in \({{\mathbb {H}}}\). The subdifferential \(\partial f({{\bar{\mathbf{X}}}})\) is a singleton if and only if f is differentiable at \({\bar{\mathbf{X}}}\). In this case, \(\nabla f({{\bar{\mathbf{X}}}})\) is the unique subgradient of f at \({\bar{\mathbf{X}}}\).
By Proposition 5.1 and the knowledge of convex functions of real variables, we have the following proposition.
Proposition 5.3
Suppose that \(f(\mathbf{X}) : {{\mathbb {H}}} \rightarrow {{\mathbb {R}}}\) satisfies the strong assumption. Then f is convex if and only if \(\nabla ^2 f\) is positive semi-definite at any \(\mathbf{X} \in {{\mathbb {H}}}\).
In the RQMC problem (6), the objective function is convex and the constraints are linear. Thus, it is a convex QMO problem. Optimality analysis for it was given in [9], and numerical examples were reported there.
6 A Sparse Color Image Denoising Problem
The task of this section is to analyze the optimality conditions of (8) to pave the way for further study on similar color image problems. As the \(\ell _0\)-norm and the rank function are not continuous, we have to develop general subdifferential calculus for proper functions of quaternion matrix variables first.
6.1 Generalized Subdifferentials of Lower Semi-Continuous Functions of Quaternion Matrix Variables
The generalized subdifferential calculus for real-valued functions of real variables can be found in standard references [6, 15, 16]. In this subsection, we extend it to proper functions of quaternion matrix variables.
Denote \({\bar{{\mathbb {R}}}} := {{\mathbb {R}}} \cup \{ +\infty \}\). For a function \(h(\mathbf{X}) : {{\mathbb {M}}} \rightarrow {\bar{{\mathbb {R}}}}\), denote its domain as dom\((h) = \left\{ \mathbf{X} \in {{\mathbb {M}}} : h(\mathbf{X}) < +\infty \right\} \). We say that h is a proper function if its domain is not empty.
Suppose that \(h(\mathbf{X}) : {{\mathbb {M}}} \rightarrow {\bar{{\mathbb {R}}}}\) is a proper function, \({{\bar{\mathbf{X}}}} = \left( {\bar{Y}}, {\bar{Z}}\right) \in \) dom(h). Let \(\mathbf{G} \in {{\mathbb {M}}}\). We say that \(\mathbf{G}\) is an F(réchet)-subgradient of h at \({{\bar{\mathbf{X}}}}\) if
\(\liminf _{\mathbf{X} \rightarrow {{\bar{\mathbf{X}}}}, \, \mathbf{X} \not = {{\bar{\mathbf{X}}}}} {h(\mathbf{X}) - h({{\bar{\mathbf{X}}}}) - \mathbf{G} \cdot \left( \mathbf{X} - {{\bar{\mathbf{X}}}}\right) \over \Vert \mathbf{X} - {{\bar{\mathbf{X}}}}\Vert _F} \ge 0.\)
The set of all F-subgradients of h at \({{\bar{\mathbf{X}}}}\) is called the Fréchet subdifferential of h at \({{\bar{\mathbf{X}}}}\) and denoted as \(\partial ^F h({{\bar{\mathbf{X}}}})\). On the other hand, \(\mathbf{G} \in {{\mathbb {M}}}\) is a limiting subgradient of h at \({{\bar{\mathbf{X}}}}\) if there are sequences \(\mathbf{X}^k \rightarrow {{\bar{\mathbf{X}}}}\) and \(\mathbf{G}^k \rightarrow \mathbf{G}\) such that \(h(\mathbf{X}^k) \rightarrow h({{\bar{\mathbf{X}}}})\) and \(\mathbf{G}^k \in \partial ^F h(\mathbf{X}^k)\) for all k.
The set of all limiting subgradients of h at \({{\bar{\mathbf{X}}}}\) is called the limiting subdifferential of h at \({{\bar{\mathbf{X}}}}\) and denoted as \(\partial ^L h({{\bar{\mathbf{X}}}})\).
Here, h is regarded as a function of \(Y_i\) and \(Z_i\) for \(i = 0, 1, 2, 3\), and such a function is denoted as \(h^R\); for \(\mathbf{X} = (Y, Z)\), denote \(R(Y) = (Y_0, Y_1, Y_2, Y_3)\), \(R(Z)= (Z_0, Z_1, Z_2, Z_3)\) and \(R(\mathbf{X}) = (R(Y), R(Z))\). By the definition of R-product, we see that \(\mathbf{G}\) is an F-subgradient (limiting subgradient) of h at \({{\bar{\mathbf{X}}}}\) if and only if \(R(\mathbf{G})\) is an F-subgradient (limiting subgradient) of \(h^R\) at \(R({{\bar{\mathbf{X}}}})\). By (3) of [10], we have
\(\partial ^F h({{\bar{\mathbf{X}}}}) \subseteq \partial ^L h({{\bar{\mathbf{X}}}}).\)
By Theorem 1 of [10], we have the following theorem.
Theorem 6.1
Let \(A = (a_{ij}) \in {{\mathbb {Q}}}^{m \times n}\). Then
\(\partial ^F \Vert A \Vert _0 = \partial ^L \Vert A \Vert _0 = {{\mathbb {Q}}}^{m\times n}_{\Gamma ^C_A},\)
where \({{\mathbb {Q}}}^{m\times n}_{\Gamma ^C_A} = \left\{ B \in {\mathbb Q}^{m \times n} : B_{\Gamma _A} = O \right\} \), \(\Gamma _A = \left\{ (i, j) : a_{ij} \not = 0 \right\} \) is the support of A, and \(\Gamma ^C_A\) is the complementary set of \(\Gamma _A\).
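In computational terms, Theorem 6.1 says that G is a (limiting) subgradient of the \(\ell _0\)-norm at A exactly when G vanishes on the support of A. A small membership test (our sketch, with the (4, m, n) storage used earlier):

```python
import numpy as np

# Theorem 6.1 as a membership test: G is a limiting subgradient of ||.||_0 at A
# exactly when G vanishes on the support Gamma_A of A.
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3, 3))
A[:, 0, :] = 0.0                                  # zero out the first row of A
support = np.sqrt((A ** 2).sum(axis=0)) > 0       # Gamma_A as a boolean mask

def in_l0_subdiff(G, support):
    """Check G_{Gamma_A} = O, i.e. membership in Q^{m x n}_{Gamma_A^C}."""
    return bool((np.sqrt((G ** 2).sum(axis=0))[support] == 0).all())

G = np.zeros_like(A)
G[:, 0, :] = 1.0                                  # supported on Gamma_A^C only
print(in_l0_subdiff(G, support), in_l0_subdiff(A, support))  # True False
```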
By using Theorem 4 of [10], we may also characterize the generalized subdifferential of the rank function of A.
6.2 The Feasible Set of The SCID Problem
In this subsection, we study the feasible set of the SCID problem (8). Let
\(S := \left\{ Y \in {{\mathbb {Q}}}^{m \times n} : \mathrm{rank}(Y) \le r \right\} \quad \mathrm{and} \quad R(S) := \left\{ R(Y) : Y \in S \right\} .\)
In this paper, we say that R(S) is the real representation set of S. Note that it is different from \(\left\{ Y^R : Y \in S \right\} \).
Recall that a set in a real finite-dimensional space is called a semi-algebraic set if it is defined by a set of polynomial equations and inequalities, and a real-valued function in that space is called semi-algebraic if its graph is semi-algebraic [1, 7].
Proposition 6.2
The set S is closed, and its real representation set R(S) is closed and semi-algebraic.
Proof
Suppose that \(Y^k \in S\) and \(Y^k \rightarrow Y\). Denote \(r_k:= {\text {rank}}(Y^k)\). Thus \(r_k\le r\). By Theorem 2.1, there exist unitary matrices \(U^k \in {{\mathbb {Q}}}^{m \times m}\) and \(V^k \in {{\mathbb {Q}}}^{n \times n}\), and real positive diagonal matrices \(\Sigma _{r_k}^k = {\text {diag}}\left( \sigma _{k,1}, \ldots , \sigma _{k,r_k}\right) \) such that
\(Y^k = U^k\left( \begin{array}{cc} \Sigma _{r_k}^k & O \\ O & O \end{array}\right) (V^k)^*.\)
For each k, introduce a diagonal matrix \(\Sigma _r^k ={\text {diag}}\left( \sigma _{k,1}, \ldots , \sigma _{k,r_k}, 0,\ldots , 0\right) \in {{\mathbb {R}}}^{r\times r}\). We can rewrite \(Y^k\) as
\(Y^k = U^k\left( \begin{array}{cc} \Sigma _r^k & O \\ O & O \end{array}\right) (V^k)^*.\)
This is also equivalent to
\((U^k)^* Y^k V^k = \left( \begin{array}{cc} \Sigma _r^k & O \\ O & O \end{array}\right) .\)
Since \(U^k\) and \(V^k\) are unitary, \(\{ U^k \}\) and \(\{ V^k \}\) are bounded. Thus, \(\{ U^k \}\) and \(\{ V^k \}\) have limit points. Without loss of generality, we may assume that \(U^k \rightarrow U\) and \(V^k \rightarrow V\). Then U and V are unitary. Taking the limit on both sides of (32) as \(k\rightarrow +\infty \), we can find some nonnegative diagonal matrix \(\Sigma _r \in {{\mathbb {R}}}^{r\times r}\) such that
\(U^* Y V = \left( \begin{array}{cc} \Sigma _r & O \\ O & O \end{array}\right) .\)
That is, \(Y = U\left( \begin{array}{cc} \Sigma _r & O \\ O & O \end{array}\right) V^*\), which indicates that rank\((Y) \le r\). Thus, \(Y\in S\), and the desired closedness of S is obtained.
Since S is closed, the set R(S) is also closed. By Theorem 1.8.4 of [22], if the rank of \(A \in {{\mathbb {Q}}}^{m \times n}\) is r, then the rank of its real representation \(A^R\) is 4r. Note that
\(R(S) = \left\{ R(Y) : Y \in {{\mathbb {Q}}}^{m \times n}, \ \mathrm{rank}\left( Y^R\right) \le 4r \right\} .\)
Since R(S) is characterized by the vanishing of all \((4r+1) \times (4r+1)\) minors of \(Y^R\), which are polynomials in the entries of R(Y), we conclude that the set R(S) is semi-algebraic. \(\square \)
For a nonempty closed set \(\Omega \), the indicator function with respect to \(\Omega \), denoted as \(\delta _\Omega \), is defined by \(\delta _\Omega (\mathbf{X}) = 0\) if \(\mathbf{X} \in \Omega \), and \(\delta _\Omega (\mathbf{X}) = +\infty \) otherwise.
Let \(A \in {{\mathbb {Q}}}^{m \times n}\). Then there is \(\delta > 0\) such that for all \(B \in {{\mathbb {Q}}}^{m \times n}\) satisfying \(\Vert B-A \Vert _F \le \delta \), we have \(\Vert B \Vert _0 \ge \Vert A \Vert _{0}\). This shows that the \(\ell _0\)-norm function of quaternion matrices is lower semi-continuous.
Proposition 6.3
Suppose that \(p : {{\mathbb {M}}} \rightarrow {\bar{{\mathbb {R}}}}\) is defined by \(p(\mathbf{X}) = \lambda \Vert Z\Vert _0 + \delta _S(Y)\), where \(\mathbf{X} = (Y, Z)\). Let \(p^R(R(Y), R(Z)) := p(\mathbf{X})\). Then,
(i) p is a proper lower semi-continuous function, and \(p^R\) is a semi-algebraic function;
(ii) the limiting subdifferential of p takes the form of
\(\partial ^L p(\mathbf{X}) = N_S(Y) \times \lambda \, {{\mathbb {Q}}}^{m\times n}_{\Gamma ^C_Z},\)
where \(N_S(Y)\) is the normal cone with respect to S at Y.
Proof
(i) For any \(Y \in S\) and \(Z \in {{\mathbb {Q}}}^{m \times n}\), \(p(\mathbf{X})\) is nonnegative and finite valued. Thus, p is proper. Since the \(\ell _0\)-norm function of quaternion matrices is lower semi-continuous, and S is closed by Proposition 6.2, p is lower semi-continuous. By Proposition 6.2, R(S) is semi-algebraic. The \(\ell _0\)-norm function has a piecewise linear graph, see [1]. Thus, \(p^R\) is a semi-algebraic function.
(ii) Note that p is separable in Y and Z. Thus,
\(\partial ^L p(\mathbf{X}) = \partial ^L \delta _S(Y) \times \lambda \, \partial ^L \Vert Z\Vert _0 = N_S(Y) \times \lambda \, {{\mathbb {Q}}}^{m\times n}_{\Gamma ^C_Z},\)
where the second equality comes from Theorem 6.1. \(\square \)
The semi-algebraic property of \(p^R\) will not be used in this paper, yet it is very important in convergence analysis of first-order algorithms for solving (8) [1].
6.3 Stationarity
The SCID problem (8) is a rank-constrained sparse optimization problem in quaternion matrix variables. Inspired by sparse optimization [2] and rank-constrained optimization [12], we may introduce stationary point concepts for (8).
We first introduce the proximal mapping for a lower semi-continuous function \(h : {{\mathbb {M}}} \rightarrow {{\mathbb {R}}} \cup \{ +\infty \}\), denoted as \(\hbox {Prox}_h\). It is defined as
\(\hbox {Prox}_h(\mathbf{X}) := \mathop {\mathrm{argmin}}\limits _{\mathbf{U} \in {{\mathbb {M}}}} \left\{ h(\mathbf{U}) + {1 \over 2}\Vert \mathbf{U} - \mathbf{X}\Vert _F^2 \right\} .\)
Then we define \(\beta \)-stationary points and stationary points for (8). Let
\(h(\mathbf{X}) := {1 \over 2}\left\| L(Y + Z) - D\right\| _F^2.\)
Let \(\beta \) be a positive number, \({{\hat{\mathbf{X}}}} = ({\hat{Y}}, {\hat{Z}}) \in {{\mathbb {M}}}\). We say that \({{\hat{\mathbf{X}}}}\) is a \(\beta \)-stationary point of (8) if
\({\hat{Y}} \in \Pi _S\left( {\hat{Y}} - \beta \nabla _Y h({{\hat{\mathbf{X}}}})\right) \quad \mathrm{and} \quad {\hat{Z}} \in \hbox {Prox}_{\beta \lambda \Vert \cdot \Vert _0}\left( {\hat{Z}} - \beta \nabla _Z h({{\hat{\mathbf{X}}}})\right) ;\)
we say that \({{\hat{\mathbf{X}}}}\) is a stationary point of (8) if
\(\mathbf{O} \in \nabla h({{\hat{\mathbf{X}}}}) + \partial ^L p({{\hat{\mathbf{X}}}}).\)
Here, \(\Pi _S\) is the projection operator with respect to S.
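Both operations in (34) are computable: \(\Pi _S\) is a rank-r truncated QSVD (justified by the quaternion Eckart-Young-Mirsky theorem used in the proof of Theorem 6.5), and the proximal mapping of \(\beta \lambda \Vert \cdot \Vert _0\) is entrywise hard thresholding of quaternion entries at level \(\sqrt{2\beta \lambda }\), a standard computation for the counting function carried over to quaternion moduli. A sketch of the latter:

```python
import numpy as np

def prox_l0(Zs, tau):
    """Prox of tau*||.||_0 on quaternion matrices stored (4, m, n): entrywise
    hard thresholding at sqrt(2*tau) applied to the entry moduli."""
    mod = np.sqrt((Zs ** 2).sum(axis=0, keepdims=True))
    return np.where(mod > np.sqrt(2 * tau), Zs, 0.0)

rng = np.random.default_rng(5)
Z = rng.standard_normal((4, 4, 4))
Zt = prox_l0(Z, tau=1.0)               # entries with modulus <= sqrt(2) vanish
print(int((np.sqrt((Zt ** 2).sum(axis=0)) > 0).sum()), "entries survive")
```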
By (16), we have
\(\nabla _Y h(\mathbf{X}) = \nabla _Z h(\mathbf{X}) = L^*\left( L(Y + Z) - D\right) ,\)
where \(L^*\) stands for the adjoint of L in the sense that
\(\mathrm{Re}\langle L(A), B \rangle = \mathrm{Re}\langle A, L^*(B) \rangle \quad \mathrm{for \ all} \ A, B \in {{\mathbb {Q}}}^{m \times n}.\)
Proposition 6.4
There is a gradient Lipschitz constant \(L_h = \sqrt{2\Vert L^*L\Vert } > 0\) (here \(\Vert L^*L\Vert \) is the spectral norm of the linear operator \(L^*L\)) such that for any \(\mathbf{X} = (Y, Z), {{\hat{\mathbf{X}}}} = ({\hat{Y}}, {\hat{Z}}) \in {{\mathbb {M}}}\),
\(\left\| \nabla h(\mathbf{X}) - \nabla h({{\hat{\mathbf{X}}}})\right\| _F \le L_h \left\| \mathbf{X} - {{\hat{\mathbf{X}}}}\right\| _F,\)
i.e., \(\nabla h^R\) is Lipschitz continuous with constant \(L_h\).
Proof
By (35) and the equivalence relation between \((h, \mathbf{X})\) and \((h^R, R(\mathbf{X}))\), we have the conclusions. \(\square \)
Theorem 6.5
For the SCID problem (8), we have the following conclusions:
(i) Any local minimizer is a stationary point;
(ii) Any \(\beta -\)stationary point for \(\beta > 0\) is a stationary point;
(iii) Any global minimizer \({\mathbf{X}^\#} = (Y^\#, Z^\#)\) is a \(\beta -\)stationary point for \(\beta \in (0, {1 \over L_h})\); furthermore, \(\Pi _S(Y^\#- \beta \nabla _Y f({\mathbf{X}^\#}))\times \) \(\hbox {Prox}_{\beta \lambda \Vert \cdot \Vert _0}(Z^\#- \beta \nabla _Zf({\mathbf{X}^\#}))\) is a singleton;
(iv) If \({\mathbf{X}^\#} = (Y^\#, Z^\#)\) is a \(\beta -\)stationary point and rank\((Y^\#)<r\), then \({\mathbf{X}^\#}\) is a local minimizer.
Proof
(i) Let \({{\hat{\mathbf{X}}}} = ({\hat{Y}}, {\hat{Z}})\) be a local minimizer of (8). Rewrite problem (8) as
\(\min \left\{ h(\mathbf{X}) + p(\mathbf{X}) : \mathbf{X} \in {{\mathbb {M}}} \right\} ,\)
where h is defined by (33) and p is defined in Proposition 6.3. Note that the quaternion matrix optimization problem (36) is equivalent to the real matrix optimization problem
\(\min \left\{ h^R(R(\mathbf{X})) + p^R(R(\mathbf{X})) : \mathbf{X} \in {{\mathbb {M}}} \right\} .\)
Apply the generalized Fermat rule [16, Theorem 10.1] to the real matrix optimization problem (37). Since \(\mathbf{G}\) is a F-subgradient (limiting subgradient) of h or p at \({{\hat{\mathbf{X}}}}\) if and only if \(R(\mathbf{G})\) is a F-subgradient (limiting subgradient) of \(h^R\) or \(p^R\) at \(R({{\hat{\mathbf{X}}}})\), we have the desired result.
(ii) Let \({{\hat{\mathbf{X}}}} = ({\hat{Y}}, {\hat{Z}})\) be a \(\beta -\)stationary point of (8). Note that
is equivalent to
which is further equivalent to
Combining this with the generalized Fermat rule [16, Theorem 10.1] and the fact that
[16, Exercise 8.14], we conclude that
which is exactly
due to the conic property of \(N_{R(S)}(R({{\hat{Y}}}))\). Note that (38) is equivalent to
On the other hand, the relation
indicates that
Combining with (39), we have Conclusion (ii).
(iii) Denote \(R_1^\# = \nabla _{R(Y)} h^R(R(Y^\#), R(Z^\#))\) and \(R_2^\# = \nabla _{R(Z)} h^R(R(Y^\#), R(Z^\#))\). By the Lipschitz continuity of \(\nabla h^R\) and the descent lemma [3], we have
for any \(Y, Y', Z, Z' \in {{\mathbb {Q}}}^{m \times n}\). This is equivalent to
for any \(Y, Y', Z, Z' \in {{\mathbb {Q}}}^{m \times n}\). We then prove both parts simultaneously. Assume on the contrary that \((Y^\#, Z^\#)\) is not a \(\beta -\)stationary point of problem (8) for some \(\beta < {1 \over L_h}\). Then there is \((Y_0, Z_0) \not = (Y^\#, Z^\#)\) such that
The inclusion \(Y_0 \in \Pi _S(Y^\#-\beta R_1^\#)\) indicates that
Thus,
On the other hand, the relation \(Z_0 \in \mathrm{Prox}_{\beta \lambda \Vert \cdot \Vert _0} (Z^\# - \beta R_2^\#)\) implies that
After simplification, we have
Note that \(\beta < {1 \over L_h}\) and \(\delta _S(Y_0) = \delta _S(Y^\#)\). Then (43) contradicts the global optimality of \((Y^\#, Z^\#)\). Thus, \((Y^\#, Z^\#)\) is the unique element in \(\Pi _S(Y^\#- \beta \nabla _Y f({\mathbf{X}^\#}))\times \) \(\hbox {Prox}_{\beta \lambda \Vert \cdot \Vert _0}(Z^\#- \beta \nabla _Zf({\mathbf{X}^\#}))\). This proves Conclusion (iii).
(iv) Since the Eckart-Young-Mirsky low rank approximation theorem can be applied to the case of quaternion matrices [9], we can easily verify the implication below:
If \({\mathbf{X}^\#} = (Y^\#, Z^\#)\) is a \(\beta -\)stationary point with rank\((Y^\#)<r\), then by the definition of \(\beta \)-stationarity in (34), together with (35), we have \(\nabla _Y h({\mathbf{X}^\#})= \nabla _Z h({\mathbf{X}^\#}) =O\) by employing (44). Additionally, the convexity of h yields that for any \(\mathbf{X}\in {{\mathbb {M}}}\),
Note that \(\Vert Z\Vert _0\) only takes integer values 0, 1, \(\ldots \), mn. Thus we can find some positive scalar \(\epsilon \) such that
where \(N(Z^\#, \epsilon ):=\left\{ Z: \Vert Z-Z^\#\Vert _F \le \epsilon \right\} \), and \(N(\mathbf{X}^\#, \epsilon )\) is defined analogously. Combining with (45), we can conclude that for any feasible solution \(\mathbf{X}\in N(\mathbf{X}^\#, \epsilon )\), \(h(\mathbf{X})+p(\mathbf{X})\ge h({\mathbf{X}^\#})+p({\mathbf{X}^\#})\), that is, \({\mathbf{X}^\#}\) is a local minimizer. This completes the proof. \(\square \)
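Although this paper stops at optimality analysis, the \(\beta \)-stationarity conditions (34) naturally suggest a proximal gradient iteration whose fixed points are exactly the \(\beta \)-stationary points. The following is a hypothetical sketch, not an algorithm stated in the paper: L is taken as the identity operator, and the projection \(\Pi _S\) (a rank-r truncated QSVD) is left as a user-supplied routine `proj_rank_r`:

```python
import numpy as np

def prox_l0(Zs, tau):
    """Entrywise hard thresholding: the prox of tau*||.||_0 (see above)."""
    mod = np.sqrt((Zs ** 2).sum(axis=0, keepdims=True))
    return np.where(mod > np.sqrt(2 * tau), Zs, 0.0)

def scid_prox_grad(Y, Z, D, lam, beta, proj_rank_r, n_iter=200):
    """A hypothetical proximal-gradient loop suggested by the beta-stationarity
    conditions (34). With L the identity, grad_Y h = grad_Z h = Y + Z - D;
    proj_rank_r is a user-supplied projection onto S = {rank <= r}."""
    for _ in range(n_iter):
        G = Y + Z - D                            # shared gradient block of h
        Y = proj_rank_r(Y - beta * G)            # Pi_S step on Y
        Z = prox_l0(Z - beta * G, beta * lam)    # hard-thresholding step on Z
    return Y, Z
```

A fixed point of this loop satisfies exactly the two inclusions in (34), so the scheme targets \(\beta \)-stationary points; its convergence analysis would rely on the semi-algebraic property established in Proposition 6.3.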
7 Final Remarks
In this paper, we proposed a general form of the QMO problem, presented its optimality analysis, introduced first- and second-order derivatives of real-valued functions of quaternion matrix variables, and established their calculation rules. Our approach is consistent with the subgradient concept for norms of quaternion matrix variables introduced in [9]. We established first- and second-order optimality conditions for the QMO problem. Optimization methods can be developed based upon these.
One key tool of our approach is the R-product. It turns out that the R-product of two quaternion matrix vectors is equal to the real part of their inner product. This is not by chance. Just as we may associate a third-order real tensor with a quaternion matrix, there are also operations which map quaternion matrices back to the real field, such as singular values and R-products.
Finally, we introduced the generalized subdifferentials of proper functions of quaternion matrices, and used them to analyze the optimality conditions of the SCID problem, a special case of the QMO problem. This combines the knowledge of quaternion matrices, color image processing and variational analysis.
Different QMO models have different structures. For example, the LRQD problem satisfies the strong assumption, so even second-order methods may be developed for it. On the other hand, the LRQA problem (7) involves all singular values of quaternion matrices, whose generalized subdifferentials need to be further studied.
We hope that our work is useful to people working with optimization models involving quaternion matrices.
Data Availability Statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2013)
Beck, A., Eldar, Y.C.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23, 1480–1509 (2013)
Bertsekas, D.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont, MA (1999)
Chen, Y., Qi, L., Zhang, X., Xu, Y.: A low rank quaternion decomposition algorithm and its application in color image inpainting, arXiv:2009.12203, (2020)
Chen, Y., Xiao, X., Zhou, Y.: Low-rank quaternion approximation for color image processing. IEEE Trans. Image Process. 29, 1426–1439 (2020)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
Hiriart-Urruty, J.-B., Le, H.Y.: From Eckart and Young approximation to Moreau envelopes and vice versa. RAIRO Oper. Res. 47, 299–310 (2013)
Hiriart-Urruty, J.-B., Le, H.Y.: A variational approach of the rank function. TOP 21, 207–240 (2013)
Jia, Z., Ng, M.K., Song, G.: Robust quaternion matrix completion with applications to image inpainting. Numer. Linear Algebra Appl. 26, e2245 (2019)
Le, H.Y.: Generalized subdifferentials of the rank function. Optim. Lett. 7, 731–743 (2013)
Le Bihan, N., Sangwine, S.J.: Quaternion principal component analysis of color images. IEEE Int. Conf. Image Process. 1, 809–812 (2003)
Li, X., Xie, N., Song, W.: Optimality conditions for rank-constrained matrix optimization. J. Oper. Res. Soc. China 7, 285–301 (2019)
Lu, Z., Zhang, Y., Li, X.: Penalty decomposition methods for rank minimization. Optim. Method Softw. 30, 531–558 (2015)
Mandic, D.P., Jahanchahi, C., Took, C.C.: A quaternion gradient operator and its applications. IEEE Signal Process. Lett. 18, 47–50 (2011)
Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation I: Basic Theory. Springer, Berlin (2006)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (2009)
Sangwine, S.J.: Fourier transforms of colour images using quaternion or hypercomplex numbers. Electron. Lett. 32, 1979–1980 (1996)
Sangwine, S.J., Le Bihan, N.: Quaternion singular value decomposition based on bidiagonalization to a real or complex matrix using quaternion Householder transformations. Appl. Math. Comput. 182, 727–738 (2006)
Xiao, X., Zhou, Y.: Two-dimensional quaternion PCA and sparse PCA. IEEE Trans. Neural Netw. Learn. Syst. 30, 2028–2042 (2018)
Xu, D., Jahanchahi, C., Took, C.C., Mandic, D.P.: Enabling quaternion derivatives: the generalized HR calculus. Roy. Soc. Open Sci. 2, 150255 (2015)
Xu, Y., Yu, L., Xu, H., Zhang, H., Nguyen, T.: Vector sparse representation of color image using quaternion matrix analysis. IEEE Trans. Image Process. 24, 1315–1329 (2015)
Wei, M., Li, Y., Zhang, F., Zhao, J.: Quaternion matrix computations. Nova Science Publisher, New York (2018)
Zeng, R., Wu, J., Shao, Z., Chen, Y., Chen, B., Senhadji, L., Su, H.: Color image classification via quaternion principal component analysis network. Neurocomputing 216, 416–428 (2016)
Zhang, F.: Quaternions and matrices of quaternions. Linear Algebra Appl. 251, 21–57 (1997)
Zhou, S., Xiu, N., Qi, H.: Global and quadratic convergence of Newton hard-thresholding pursuit. J. Machine Learn. Res. 22, 1–45 (2021)
Zou, C., Kou, K.I., Wang, Y.: Quaternion collaborative and sparse representation with application to color face recognition. IEEE Trans. Image Process. 25, 3287–3302 (2016)
Communicated by Boris S. Mordukhovich.
Ziyan Luo's work was supported by NSFC (Grant No. 11771038) and Beijing Natural Science Foundation (Grant No. Z190002). Qing-Wen Wang's work was supported by NSFC (Grant No. 11971294). Xinzhen Zhang's work was supported by NSFC (Grant No. 11871369). Dedicated to Professor Franco Giannessi on the occasion of his 85th birthday.