1 Introduction

This work deals with singular value problems under nonnegativity constraints on singular vectors. Let \(S_d\) be the unit sphere of \(\mathbb {R}^{d}\) and \(\mathbb {M}_{m,n}\) be the space of real matrices of size \(m\times n\). Classically, the set \(\Sigma (A)\) of singular values of a matrix \(A\in \mathbb {M}_{m,n}\) is defined as follows: a real number \(\sigma \) is a singular value of A if there exists a pair \((u,v) \in S_m\times S_n\) such that

$$\begin{aligned} Av = \sigma u,\; A^\top u = \sigma v, \end{aligned}$$
(1)

where the superscript \(\top \) denotes matrix transposition. In such a case, \((u,v,\sigma )\) is a singular triplet of A and u (respectively, v) is a left-singular (respectively, right-singular) vector of A. The equations in (1) can be viewed as the criticality conditions for the problem which consists in minimizing or maximizing the bilinear form \(\langle u,Av\rangle \) over the bi-sphere \( S_m\times S_n\). In a number of applications, the vectors u and v are subject to additional constraints: nonnegativity, monotonicity, sparsity, etc. For instance, Montanari and Richard  [2] consider a Principal Component Analysis problem

$$\begin{aligned} \mathop {\textrm{max}}\limits _{\begin{array}{c} \Vert u\Vert =1, \,\Vert v\Vert =1 \\ v\ge 0 \end{array}}\langle u, Av\rangle , \end{aligned}$$
(2)

where the constraint \(v\ge 0\) means that each component of v is nonnegative. Writing down the criticality condition for (2) leads to a hybrid complementarity problem

$$\begin{aligned} Av=\sigma u,\;\;0\le v\perp (\sigma v-A^\top u)\ge 0, \end{aligned}$$
(3)

where \(\perp \) denotes orthogonality. The system (3) is not just a matter of adding the inequality \(v\ge 0\) to the equations in (1): it is something more involved, pertaining to the realm of the theory of complementarity problems. We refer to (3) as a hybrid complementarity problem because it is a mixture of an equation and a complementarity problem. In this work, we give priority to minimization over maximization and assume that both singular vectors are subject to a nonnegativity constraint:

$$\begin{aligned} \varrho _{\textrm{min}}(A)\,=\mathop {\textrm{min}}\limits _{\begin{array}{c} \Vert u\Vert =1,\, \Vert v\Vert =1\\ u\ge 0, \, v\ge 0 \end{array}}\;\langle u,Av\rangle . \end{aligned}$$
(4)

We say that \((u,v) \in S_m\times S_n\) is a critical pair of (4) if there exists a real number \(\sigma \) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} \,0\le u\,\perp \,(Av-\sigma u) \;\;\ge 0,&{} \\ \,0\le v\,\perp \, (A^\top u -\sigma v)\ge 0.&{}\end{array}\right. } \end{aligned}$$
(5)
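For later use, let us sketch how (5) arises; this is a standard Karush–Kuhn–Tucker computation (assuming a suitable constraint qualification; the multiplier notation below is ours). The Lagrangian of (4) is

$$\begin{aligned} L(u,v,\alpha ,\beta ,\lambda ,\mu )= \langle u,Av\rangle -\alpha (\Vert u\Vert ^2-1)-\beta (\Vert v\Vert ^2-1)-\langle \lambda ,u\rangle -\langle \mu ,v\rangle , \end{aligned}$$

with \(\lambda \ge 0\) and \(\mu \ge 0\). Stationarity yields \(\lambda = Av-2\alpha u\) and \(\mu = A^\top u-2\beta v\), while complementary slackness yields \(\lambda \perp u\) and \(\mu \perp v\). Taking inner products with the unit vectors u and v gives \(2\alpha =2\beta =\langle u,Av\rangle \), so (5) holds with \(\sigma =\langle u,Av\rangle \).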

Since (4) is a nonconvex optimization problem, criticality is necessary but not sufficient for optimality. The system (5) of complementarity problems can be written in the equivalent form

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\mathbb {P}_m\ni u\,\perp \, (Av-\sigma u) \in \mathbb {P}_m,&{} \\ \,\mathbb {P}_n\ni v\,\perp \, (A^\top u -\sigma v)\in \mathbb {P}_n,&{}\end{array}\right. } \end{aligned}$$

where \(\mathbb {P}_d\) is the Pareto cone (or nonnegative orthant) of the space \(\mathbb {R}^d\). The partial order relation induced by \(\mathbb {P}_d\) is the componentwise ordering of \(\mathbb {R}^d\). Seeger and Sossa  [6] introduced a concept of singular value for matrices that is relative to a prescribed pair of nonzero closed convex cones. Such “ordering” cones can in fact be absolutely general, but in this work we focus on Pareto cones. In other words, saying that a vector is nonnegative means that it is nonnegative componentwise.

Definition 1

A real number \(\sigma \) is a Pareto singular value of A if the system (5) holds for some \((u,v)\in S_m\times S_n\). In such a case, \((u,v,\sigma )\) is called a Pareto singular triplet of A.

The set \(\Xi (A)\) of Pareto singular values of A is nonempty, because the feasible set of (4) is compact and every minimizer of (4) is a critical pair. Moreover, if \((u,v,\sigma )\) is a Pareto singular triplet of A, then taking the inner product of the first complementarity relation in (5) with the unit vector u gives \(\sigma =\langle u, Av \rangle \) and, therefore,

$$\begin{aligned} \Xi (A)=\{\langle u, Av \rangle : (u,v) \text{ is } \text{ a } \text{ critical } \text{ pair } \text{ of } (4)\} \end{aligned}$$

is the set of critical values of (4). As pointed out in [6], the set \(\Xi (A)\) is finite and \(\varrho _{\textrm{min}}(A)\) is its smallest element:

$$\begin{aligned} \varrho _{\textrm{min}}(A)\,= \min _{\sigma \in \Xi (A)}\sigma . \end{aligned}$$
(6)

A word of caution is however in order: the term

$$\begin{aligned} \varrho _{\textrm{max}}(A)\,=\mathop {\textrm{max}}\limits _{\begin{array}{c} \Vert u\Vert =1,\, \Vert v\Vert =1\\ u\ge 0, \, v\ge 0 \end{array}}\;\langle u,Av\rangle \end{aligned}$$
(7)

is not necessarily an element of \(\Xi (A)\). Indeed, the criticality conditions for the maximization problem (7) are not exactly the same as those for the minimization problem (4). The organization of this paper is as follows. Section 2 settles the background of our work and presents some preliminary results concerning the set \(\Xi (A)\). Unless the matrix A is of moderate size, the numerical computation of \(\Xi (A)\) is an expensive task. Section 3 characterizes \(\Xi (A)\) when A is a nonnegative matrix. In this special case, computing \(\Xi (A)\) boils down to evaluating the spectral norm of each submatrix of A. Section 4 concerns the maximum number of Pareto singular values in a matrix of prescribed size. We characterize the matrices A of size \(m\times n\) for which the cardinality of \(\,\Xi (A)\) is highest. Sections 5 and 6 are devoted to applications and algorithmic issues, respectively.

2 Background and preliminary results

Proposition 1 shows that \(\Xi (A)\) remains unchanged if A is transposed or multiplied by a permutation matrix. The notation \(\textrm{Perm}(d)\) stands for the set of permutation matrices of order d.

Proposition 1

Let \(A\in \mathbb {M}_{m,n}\). Then

  1. (a)

    \(\Xi (A^\top )=\Xi (A)\) and \(\,\Xi (tA)=t\,\Xi (A)\) for all \(t>0\).

  2. (b)

    \(\Xi (P A\, Q)=\Xi (A)\) for all \(P\in \textrm{Perm}(m)\) and \(Q\in \textrm{Perm}(n)\).

Proof

Part (a) is obvious, so we focus on (b). Let \(\sigma \in \Xi (A)\). This means that (5) holds for some \((u,v)\in S_m\times S_n\). The vectors \(z= Pu\) and \(w= Q^\top v\) have unit length and satisfy

$$\begin{aligned}{} & {} P^\top z\in \mathbb {P}_m, \;\; Q w\in \mathbb {P}_n,\\{} & {} AQw-\sigma P^\top z \in \mathbb {P}_m,\;\;A^\top P^\top z-\sigma Qw\in \mathbb {P}_n,\\{} & {} \langle P^\top z, AQw-\sigma P^\top z \rangle =0,\;\;\langle Qw, A^\top P^\top z -\sigma Qw\rangle =0. \end{aligned}$$

But a permutation matrix of order d leaves invariant the cone \( \mathbb {P}_d\) and preserves the angle between any two d-dimensional vectors. Hence,

$$\begin{aligned}{} & {} z\in \mathbb {P}_m, \;\; w\in \mathbb {P}_n,\\{} & {} PAQw-\sigma z \in \mathbb {P}_m,\;\;Q^\top A^\top P^\top z-\sigma w\in \mathbb {P}_n,\\{} & {} \langle z, PAQw-\sigma z \rangle =0,\;\;\langle w, Q^\top A^\top P^\top z -\sigma w\rangle =0. \end{aligned}$$

Thus, \((z,w,\sigma )\) is a Pareto singular triplet of \(B=P A\, Q\). In particular, \(\sigma \in \Xi (B)\). The reverse inclusion \(\Xi (B)\subseteq \Xi (A)\) is obtained by exchanging the roles of A and B. \(\square \)

The next proposition considers \(\Xi : \mathbb {M}_{m,n}\rightrightarrows \mathbb {R}\) as a multivalued map. Consistently, upper and lower semicontinuity are understood in the traditional multivalued sense, cf. Rockafellar and Wets  [4, p. 193].

Proposition 2

Let \(\max \{m,n\}\ge 2\).

  1. (a)

    \(\textrm{gr}(\Xi )=\{(A,\sigma )\in \mathbb {M}_{m,n}\times \mathbb {R}: \sigma \in \Xi (A)\}\) is a closed set.

  2. (b)

    \(\Xi \) is upper semicontinuous, but not lower semicontinuous.

Proof

Part (a). Let \(\{A_k\}_{k}\rightarrow A\) and \(\{\sigma _k\}_{k}\rightarrow \sigma \) be convergent sequences such that \(\sigma _k\in \Xi (A_k)\). Then, for each k, there exists \((u_k, v_k)\in S_m\times S_n\) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} \,0\le u_k\,\perp \,(A_kv_k-\sigma _k u_k) \,\ge 0,&{} \\ \,0\le v_k\,\perp \, (A_k^\top u_k -\sigma _k v_k)\ge 0.&{}\end{array}\right. } \end{aligned}$$
(8)

By taking subsequences if necessary, we may suppose that \(\{u_k\}_{k}\) and \(\{v_k\}_{k}\) converge to unit vectors u and v, respectively. Passing to the limit in (8) yields \(\sigma \in \Xi (A)\). Part (b). Since

$$\begin{aligned} \Xi (A)\,\subseteq \,\left[ \varrho _{\textrm{min}}(A), \varrho _{\textrm{max}}(A)\right] \end{aligned}$$

for all \(A\in \mathbb {M}_{m,n}\) and the functions \(\varrho _{\textrm{min}}\) and \( \varrho _{\textrm{max}}\) are continuous on \(\mathbb {M}_{m,n}\), it is clear that \(\Xi \) maps bounded sets into bounded sets. But mapping bounded sets into bounded sets and having a closed graph is sufficient for a multivalued map to be upper semicontinuous, cf.  [4, Theorem 5.19]. To prove that \(\Xi \) is not lower semicontinuous, we construct a counterexample. We suppose for instance that \(n\ge 2\) and, for each \(\varepsilon \ge 0\), we define \(A_\varepsilon \in \mathbb {M}_{m,n}\) by

$$\begin{aligned} (A_\varepsilon )_{i,j}= {\left\{ \begin{array}{ll} \;\;1 &{} \text{ if } i=1, j=1,\\ -\varepsilon &{} \text{ if } i=1, j=2,\\ \;\;0 &{} \text{ otherwise }.\end{array}\right. } \end{aligned}$$

Note that \(1\in \Xi (A_0)\). As \(\varepsilon >0\) tends to 0, \(A_\varepsilon \) tends to \(A_0\), but 1 remains away from \(\Xi (A_\varepsilon )\). Indeed, by writing down (5) for the matrix \(A_\varepsilon \), we get the system

$$\begin{aligned} {\left\{ \begin{array}{ll} \,v_1-\varepsilon v_2-\sigma u_1\ge 0,&{} \\ \,-\sigma u_i\ge 0\, \text{ for } i=2,\ldots , m&{}\\ \,u_1-\sigma v_1\ge 0,\; -\varepsilon u_1-\sigma v_2\ge 0&{} \\ \,-\sigma v_j\ge 0\, \text{ for } j=3,\ldots , n&{}\\ u_1(v_1-\varepsilon v_2)=\sigma &{} \end{array}\right. } \end{aligned}$$

together with the nonnegativity constraints on the unit vectors u and v. A quick examination of this system shows that \(\sigma \) cannot be positive. Hence, the distance from 1 to the set \(\Xi (A_\varepsilon )\subseteq \mathbb {R}_-\) is at least one. \(\square \)

2.1 Characterization and computation of \(\,\Xi (A)\)

The theory of Pareto singular values for rectangular matrices is related to the theory of Pareto eigenvalues for square matrices. Let E be a square matrix of order d. Seeger  [5] refers to a real number \(\lambda \) as a Pareto eigenvalue of E if the complementarity problem

$$\begin{aligned} 0\le x \perp (Ex-\lambda x)\ge 0 \end{aligned}$$
(9)

has a nonzero solution \(x\in \mathbb {R}^d\). The set of Pareto eigenvalues of E is denoted by \(\Pi (E)\) and is called the Pareto spectrum of E. A nonzero solution x to (9) can always be normalized so that \( \Vert x\Vert =1\). Up to a detail, it is possible to compute \(\Xi (A)\) by solving an eigenvalue problem of the type (9). Indeed, the pair of complementarity problems in (5) can be written as the single complementarity problem

$$\begin{aligned} \left[ \begin{array}{c} 0 \\ 0 \\ \end{array} \right] \le \left[ \begin{array}{c} u \\ v \\ \end{array} \right] \perp \left( \left[ \begin{array}{cc} 0 &{}\quad A \\ A^\top &{}\quad 0 \\ \end{array} \right] \left[ \begin{array}{c} u \\ v \\ \end{array} \right] -\sigma \left[ \begin{array}{c} u \\ v \\ \end{array} \right] \right) \ge \left[ \begin{array}{c} 0 \\ 0 \\ \end{array} \right] \end{aligned}$$
(10)

involving the Jordan–Wielandt matrix

$$\begin{aligned} E_A=\left[ \begin{array}{cc} 0 &{} A \\ A^\top &{} 0 \\ \end{array} \right] . \end{aligned}$$

The sets \(\Xi (A)\) and \(\Pi (E_A)\) are almost equal. Indeed, it is shown in [6, Proposition 3] that

$$\begin{aligned} \Pi (E_A)\,\backslash \{0 \} \;\subseteq \;\Xi (A)\;\subseteq \; \Pi (E_A). \end{aligned}$$
(11)

From an algorithmic point of view, transforming a Pareto singular value problem into a Pareto eigenvalue problem is not always a good strategy. In fact, since \(E_A\) is of order \(m+n\), the numerical handling of (10) could be rather expensive. We suggest computing the Pareto singular values of A without resorting to the theory of Pareto eigenvalues. One can use for instance the characterization of \(\Xi (A)\) given in Theorem 1. In what follows, \(\langle d\rangle =\{1,2,\ldots ,d\},\)

$$\begin{aligned} {\mathcal {J}}(m,n)= \{(I,J): \,\emptyset \not =I\subseteq \langle m \rangle ,\; \emptyset \not =J\subseteq \langle n \rangle \}, \end{aligned}$$

and \(A^{I,J}\) is the submatrix of A with entries \(a_{i,j}\) indexed by \(i\in I\) and \(j\in J\). In particular, \(A^{\langle m\rangle , \langle n\rangle }=A\). The symbol \(w>0\) means that w is a vector with positive components.

Theorem 1

A real number \(\sigma \) belongs to \(\Xi (A)\) if and only if there exist \((I,J)\in {\mathcal {J}}(m,n)\) and vectors \(\xi \in {\mathbb {R}}^{|I|}\) and \(\eta \in {\mathbb {R}}^{|J|}\) such that

$$\begin{aligned}{} & {} A^{I,J}\eta =\sigma \xi ,\; (A^{I,J})^\top \xi =\sigma \eta , \end{aligned}$$
(12)
$$\begin{aligned}{} & {} \xi>0, \, \eta >0, \end{aligned}$$
(13)
$$\begin{aligned}{} & {} \sum _{j\in J}a_{i,j}\,\eta _j\ge 0\;\;\; \forall i\in \langle m\rangle \backslash I, \end{aligned}$$
(14)
$$\begin{aligned}{} & {} \sum _{i\in I}a_{i,j}\,\xi _i\ge 0\;\;\; \forall j\in \langle n\rangle \backslash J. \end{aligned}$$
(15)

Theorem 1 is stated in [6], but we recall it here for the reader’s convenience. Without loss of generality, the system (12)–(15) can be written with the extra conditions \(\Vert \xi \Vert =1\) and \(\Vert \eta \Vert =1\). In such a case, a Pareto singular triplet \((u,v,\sigma )\) of A is obtained by setting \(u_i=\xi _i\) if \(i\in I\), \(u_i=0\) if \(i\in \langle m\rangle \backslash I\), \(v_j=\eta _j\) if \(j\in J\), and \(v_j=0\) if \(j\in \langle n\rangle \backslash J\). Note that \(\Vert u\Vert =\Vert \xi \Vert \) and \(\Vert v\Vert =\Vert \eta \Vert \). Theorem 1 shows that

$$\begin{aligned} \Xi (A)\;= \bigcup _{(I,J)\in {\mathcal {J}}(m,n)}\Xi _{I,J}\left( A\right) , \end{aligned}$$
(16)

where

$$\begin{aligned} \Xi _{I,J}(A)= \left\{ \sigma \in \mathbb {R}: (12){-}(15) \text{ holds } \text{ for } \text{ some } (\xi ,\eta )\in {\mathbb {R}}^{|I|}\times {\mathbb {R}}^{|J|}\right\} \end{aligned}$$
(17)

is either empty or a singleton. To see that \(\Xi _{I,J}(A)\) cannot have more than one element, it suffices to write (12) as an eigenvalue problem

$$\begin{aligned} \left[ \begin{array}{cc} 0 &{} A^{I,J} \\ (A^{I,J})^\top &{} 0 \\ \end{array} \right] \left[ \begin{array}{c} \xi \\ \eta \\ \end{array} \right] =\sigma \left[ \begin{array}{c} \xi \\ \eta \\ \end{array} \right] \end{aligned}$$

and observe that the Jordan–Wielandt matrix associated with \(A^{I,J}\) is symmetric: eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are orthogonal, and two vectors with positive components cannot be orthogonal. If \(\Xi _{I,J}(A)\) is nonempty, then the pair (I, J) is said to be successful for detecting a Pareto singular value of A. Beware that two different successful pairs, say \((I_0,J_0)\) and \((I_1,J_1)\), may yield the same \(\sigma \).

Table 1 Successful pairs (I, J) and Pareto singular values of the matrix (18)

Example 1

Consider the test example

$$\begin{aligned} A= \left[ \begin{array}{ccc} -6 &{}\quad 3 &{}\quad 4\\ 1 &{}\quad 2 &{}\quad 1 \\ \end{array} \right] . \end{aligned}$$
(18)

Table 1 is obtained by working out the system (12)–(15) for each of the \(21=(2^2-1)(2^3-1)\) choices of (I, J). The Pareto singular value \(\sqrt{5}=2.236068\) appears twice in Table 1, because it is obtained with two distinct choices of (I, J). In this example, there are 7 successful pairs (I, J), but only 6 Pareto singular values. A Pareto singular value that is obtained with different choices of (I, J) is counted only once.
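The computations behind Table 1 are easy to reproduce. The sketch below is an illustrative NumPy reimplementation (the experiments of Sect. 6 were run in Matlab; the function name pareto_singular_values is ours). For each pair (I, J) it scans the singular pairs of \(A^{I,J}\), flips signs so as to cover negative values of \(\sigma \), and keeps the values satisfying (12)–(15); it assumes that the singular values of each submatrix are simple, which is the generic situation.

```python
import itertools
import numpy as np

def nonempty_subsets(d):
    """All nonempty subsets of {0, ..., d-1}."""
    return itertools.chain.from_iterable(
        itertools.combinations(range(d), r) for r in range(1, d + 1))

def pareto_singular_values(A, tol=1e-9):
    """Brute-force computation of Xi(A) via Theorem 1 (illustrative sketch)."""
    m, n = A.shape
    found = set()
    for I in nonempty_subsets(m):
        Ic = [i for i in range(m) if i not in I]
        for J in nonempty_subsets(n):
            Jc = [j for j in range(n) if j not in J]
            U, S, Vt = np.linalg.svd(A[np.ix_(I, J)])
            for k in range(len(S)):
                # Sign flips of (xi, eta) generate the candidate solutions of
                # (12) built on this singular pair; sigma flips accordingly.
                for su, sv in itertools.product((1, -1), repeat=2):
                    xi, eta, sigma = su * U[:, k], sv * Vt[k], su * sv * S[k]
                    if (np.all(xi > tol) and np.all(eta > tol)             # (13)
                            and np.all(A[np.ix_(Ic, J)] @ eta >= -tol)     # (14)
                            and np.all(A[np.ix_(I, Jc)].T @ xi >= -tol)):  # (15)
                        found.add(round(float(sigma), 6))
    return sorted(found)

A = np.array([[-6.0, 3.0, 4.0],
              [1.0, 2.0, 1.0]])
print(pareto_singular_values(A))  # 6 values, among them sqrt(5) = 2.236068
```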

3 Pareto singular values of nonnegative matrices

If A is nonnegative, then the slackness inequalities (14)–(15) hold automatically and the set (17) depends only on the entries of the submatrix \(A^{I,J}\). In what follows, we use the notation \(B\unlhd A\) to indicate that B is a submatrix of A.

Proposition 3

Let A be nonnegative. Then \(\Xi (B)\subseteq \Xi (A)\) for all \( B\unlhd A\) and, in particular, each entry of A belongs to \(\Xi (A)\).

Proof

Let \(A\in \mathbb {M}_{m,n}\) be nonnegative. Consider a submatrix \(B=A^{I_0,J_0}\) with \((I_0,J_0)\in {\mathcal {J}}(m,n)\). Note that if \((I,J)\in {\mathcal {J}}(m,n)\) is such that \(I\subseteq I_0\) and \(J\subseteq J_0\), then

$$\begin{aligned} B^{I,J}=(A^{I_0,J_0})^{I,J}= A^{I,J}. \end{aligned}$$

Let \(\sigma \in \Xi (B)\). By Theorem 1, there exists a pair (I, J) as just mentioned such that (12)–(13) holds for some pair \((\xi ,\eta )\) of vectors of appropriate dimensions. If we view \(B^{I,J}\) as a submatrix of A and recall that the slackness inequalities (14)–(15) can be dropped, then it is clear that \(\sigma \in \Xi (A)\). \(\square \)

Proposition 3 can be generalized to the case in which only some entries of A are nonnegative. Consider for instance a matrix

$$\begin{aligned} A= \left[ \begin{array}{ccc} *&{}\quad N &{}\quad *\\ W &{}\quad B &{}\quad E\\ *&{}\quad S &{}\quad *\\ \end{array} \right] , \end{aligned}$$

where the entries of the central block B and of the four corner blocks have arbitrary signs. If the entries of the northern block N, southern block S, western block W, and eastern block E are nonnegative, then \(\Xi (B)\subseteq \Xi (A)\). The same conclusion holds for a matrix A having any of the following forms:

$$\begin{aligned} \left[ \begin{array}{ccc} *&{}\quad N \\ W &{}\quad B \\ \end{array} \right] ,\; \left[ \begin{array}{ccc} W &{}\quad B \\ *&{}\quad S \\ \end{array} \right] ,\;\left[ \begin{array}{cc} N &{}\quad *\\ B &{}\quad E\\ \end{array} \right] ,\; \left[ \begin{array}{ccc} B &{}\quad E\\ S &{}\quad *\\ \end{array} \right] . \end{aligned}$$

Example 2

The matrix

$$\begin{aligned} \left[ \begin{array}{ccc|cc} -6 &{}\quad 3 &{}\quad 4&{}\quad 2&{}\quad 3\\ 1 &{}\quad 2 &{}\quad 1 &{}\quad 0&{}\quad 1\\ \hline 8 &{}\quad 1 &{}\quad 5 &{}\quad -7&{}\quad 1\\ \end{array} \right] \end{aligned}$$
(19)

contains (18) in the northwestern corner. Since

$$\begin{aligned} \left[ \begin{array}{ccc} 8 &{}\quad 1 &{}\quad 5 \\ \end{array} \right] ,\; \left[ \begin{array}{cc} 2&{}\quad 3\\ 0&{}\quad 1\\ \end{array} \right] \end{aligned}$$

are nonnegative blocks, each Pareto singular value of (18) is a Pareto singular value of (19).

The next theorem is the main result of this section. We first establish two useful lemmas.

Lemma 1

\(A\in \mathbb {M}_{m,n}\) is nonnegative if and only if each Pareto singular value of A is nonnegative.

Proof

A matrix A is nonnegative if and only if \(\langle u,Av\rangle \ge 0\) for all \(u\ge 0\) and \(v\ge 0\). The latter condition is equivalent to saying that \(\varrho _{\textrm{min}}(A)\) is nonnegative. Formula (6) completes the job. \(\square \)

Let \(\lambda _{\textrm{max}}(E)\) be the largest eigenvalue of a symmetric matrix E. The spectral norm of a rectangular matrix A is given by

$$\begin{aligned} \sigma _{\textrm{max}}(A)\;=\; \left[ \lambda _{\textrm{max}}(A^\top A)\right] ^{1/2}\;=\;\max _{\Vert v\Vert =1} \Vert Av\Vert . \end{aligned}$$

This notation is consistent with the fact that \(\sigma _{\textrm{max}}(A)\) is the largest singular value of A.

Lemma 2

Let A be a nonnegative matrix. Then \(\sigma _{\textrm{max}}(A)\) belongs to \(\,\Xi (A)\). Furthermore,

$$\begin{aligned} \sigma _{\textrm{max}}(A)\,=\,\varrho _{\textrm{max}}(A)\,=\, \max _{\sigma \in \Xi (A)}\sigma . \end{aligned}$$
(20)

Proof

Note that \(\sigma \le \varrho _{\textrm{max}}(A)\le \sigma _{\textrm{max}}(A)\) for all \(A\in \mathbb {M}_{m,n}\) and all \(\sigma \in \Xi (A)\). Less easy to prove is that \(\sigma _{\textrm{max}}(A)\in \Xi (A)\) when A is nonnegative. For each \(\varepsilon >0\), we set \(A_\varepsilon = A+ \varepsilon \, 1_m 1_n^\top \), where \(1_d\) is the d-dimensional vector of all ones. Since \(A_\varepsilon \) is positive, so is \(A_\varepsilon ^\top A_\varepsilon \). Perron’s theorem ensures the existence of a positive unit vector \(v_\varepsilon \) such that

$$\begin{aligned} A_\varepsilon ^\top A_\varepsilon \,v_\varepsilon =\lambda _{\textrm{max}}(A_\varepsilon ^\top A_\varepsilon ) \,v_\varepsilon . \end{aligned}$$
(21)

Since \(\sigma _\varepsilon = [\lambda _{\textrm{max}}(A_\varepsilon ^\top A_\varepsilon )]^{1/2}= \sigma _{\textrm{max}}(A_\varepsilon )\) is positive, the vector

$$\begin{aligned} u_\varepsilon =(1/\sigma _\varepsilon ) A_\varepsilon v_\varepsilon \end{aligned}$$
(22)

is well defined and positive. By combining (21)–(22) and the definition of \(\sigma _\varepsilon \), we get

$$\begin{aligned} \Vert u_\varepsilon \Vert ^2= (1/\sigma _\varepsilon ^2)\, \Vert A_\varepsilon v_\varepsilon \Vert ^2= (1/\sigma _\varepsilon ^2)\,\langle v_\varepsilon , A_\varepsilon ^\top A_\varepsilon v_\varepsilon \rangle = \Vert v_\varepsilon \Vert ^2=1. \end{aligned}$$

Hence, \((u_\varepsilon ,v_\varepsilon )\) is a pair of positive unit vectors satisfying \(A_\varepsilon v_\varepsilon =\sigma _\varepsilon u_\varepsilon \) and \(A_\varepsilon ^\top u_\varepsilon =\sigma _\varepsilon v_\varepsilon \). This proves that \(\sigma _{\textrm{max}}(A_\varepsilon )\in \Xi (A_\varepsilon )\) for all \(\varepsilon >0\). But the function \(\sigma _{\textrm{max}}: \mathbb {M}_{m,n}\rightarrow \mathbb {R}\) is continuous and the multivalued map \(\Xi \) has a closed graph, cf. Proposition 2. So, by letting \(\varepsilon \rightarrow 0\), we obtain \(\sigma _{\textrm{max}}(A)\in \Xi (A)\). \(\square \)

The second formula in (20) looks similar to (6). However, (6) holds for all A, whereas the second formula in (20) may be false if A contains a negative entry. The next theorem shows that computing the set \(\Xi (A)\) for a nonnegative matrix A is essentially a matter of evaluating the spectral norm \(\sigma _{\textrm{max}}(B)\) of each submatrix B of A. If the size of B is reasonable, then \(\sigma _{\textrm{max}}(B)\) can be computed quickly and accurately.

Theorem 2

\(A\in \mathbb {M}_{m,n}\) is nonnegative if and only if \( \,\Xi (A) = \{\sigma _{\textrm{max}}(B): B\unlhd A\}\).

Proof

To avoid trivialities, we suppose that \(\max \{m,n\}\ge 2\). Each element of the set

$$\begin{aligned} \Gamma (A) = \{\sigma _{\textrm{max}}(B): B\unlhd A\} \end{aligned}$$
(23)

is nonnegative. If \(\Xi (A)=\Gamma (A)\), then A is nonnegative by Lemma 1. Conversely, suppose that A is nonnegative. To prove the inclusion \(\Xi (A)\subseteq \Gamma (A)\), we pick any \(\sigma \in \Xi (A)\). Since A is nonnegative, we know that \(\sigma \ge 0\), cf. Lemma 1. By Theorem 1, there exists \((I,J)\in {\mathcal {J}}(m,n)\) such that (12)–(13) holds for some pair \((\xi ,\eta )\) of unit vectors. Hence, \(M\eta = \sigma ^2 \eta \) with \(M= (A^{I,J})^\top A^{I,J}\). Let \(\lambda \) be the largest eigenvalue of the symmetric matrix M. Since M is nonnegative and positive semidefinite, there exists a nonnegative unit vector \(\gamma \in \mathbb {R}^{\vert J\vert }\) such that \(M\gamma =\lambda \gamma .\) Since \( \gamma \) belongs to the Pareto cone of the space \(\mathbb {R}^{\vert J\vert }\) and \(\eta \) belongs to the interior of the same cone, these eigenvectors of M cannot be orthogonal. Hence, \( \sigma ^2=\lambda \) and \(\sigma = \sqrt{\lambda }= \sigma _{\textrm{max}}(A^{I,J})\) belongs to \(\Gamma (A)\). To prove the reverse inclusion \(\Gamma (A)\subseteq \Xi (A)\), we pick any \(B\unlhd A\) and show that \(\sigma = \sigma _{\textrm{max}}(B)\) belongs to \(\Xi (A)\). We suppose that B is not of size \(1\times 1\), otherwise we are done because each entry of A belongs to \(\Xi (A)\). Since B is nonnegative, Lemma 2 implies that \(\sigma \in \Xi (B)\). By using Proposition 3, we deduce that \(\sigma \in \Xi (A)\). \(\square \)
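When A is nonnegative, Theorem 2 replaces the system (12)–(15) by one spectral norm per submatrix. A minimal sketch in the same spirit as before (illustrative; the helper nonempty_subsets is repeated so that the block is self-contained):

```python
import itertools
import numpy as np

def nonempty_subsets(d):
    """All nonempty subsets of {0, ..., d-1}."""
    return itertools.chain.from_iterable(
        itertools.combinations(range(d), r) for r in range(1, d + 1))

def pareto_singular_values_nonneg(A):
    """Xi(A) for a nonnegative matrix A, via Theorem 2."""
    assert np.all(A >= 0), "Theorem 2 applies to nonnegative matrices only"
    m, n = A.shape
    # One spectral norm per submatrix A^{I,J}; duplicates are merged by the set.
    return sorted({round(float(np.linalg.norm(A[np.ix_(I, J)], 2)), 6)
                   for I in nonempty_subsets(m)
                   for J in nonempty_subsets(n)})
```

In agreement with Lemma 2 and formula (6), the largest element of the returned list is \(\sigma _{\textrm{max}}(A)\) and the smallest one is \(\varrho _{\textrm{min}}(A)\).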

Example 3

Consider a nonnegative matrix of rank one, that is to say, a matrix of the form \(pq^\top \), where \(p\in \mathbb {P}_m\) and \(q\in \mathbb {P}_n\) are nonzero vectors. Note that

$$\begin{aligned} \sigma _{\textrm{max}}\left( (p q^\top )^{I,J}\right) = \sigma _{\textrm{max}}(p_I q_J^\top ) = \Vert p_I\Vert \, \Vert q_J\Vert , \end{aligned}$$

where \(p_I\) is the vector formed with the components of p indexed by I. The definition of \(q_J\) is analogous. Hence, Theorem 2 yields \(\,\Xi (pq^\top )= \left\{ \Vert p_I\Vert \, \Vert q_J\Vert : (I,J)\in {\mathcal {J}}(m,n)\right\} \).
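Example 3 is easy to confirm numerically. The following snippet (ours, reusing the helper nonempty_subsets defined above) checks the identity \(\sigma _{\textrm{max}}(p_I q_J^\top )=\Vert p_I\Vert \, \Vert q_J\Vert \) on random nonnegative data:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = rng.random(3), rng.random(4)   # nonzero vectors of the Pareto cones
A = np.outer(p, q)                    # nonnegative matrix of rank one

for I in nonempty_subsets(3):
    for J in nonempty_subsets(4):
        lhs = np.linalg.norm(A[np.ix_(I, J)], 2)
        rhs = np.linalg.norm(p[list(I)]) * np.linalg.norm(q[list(J)])
        assert abs(lhs - rhs) < 1e-12
```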

Let \(\mathbb {M}_{m,n}\) be equipped with the Frobenius inner product \(\langle Y, X\rangle = \textrm{tr}( Y^\top X)\) and let \({\mathcal {N}}_{m,n}\) be the closed convex cone of nonnegative matrices of size \(m\times n\). The next corollary is a consequence of Theorem 2, but it can be proven also in a direct way.

Corollary 1

Let \(A\in \mathbb {M}_{m,n}\) be nonnegative. Then

$$\begin{aligned} \varrho _{\textrm{min}}(A)\,=\, \min \{a_{i,j}: 1\le i\le m,\,1\le j\le n\} \end{aligned}$$

corresponds to the distance from A to the boundary of \({\mathcal {N}}_{m,n}\).

Consider again the case of a general matrix A. The set \(\Gamma (A)\) introduced in (23) collects the spectral norms of all submatrices of A. If A contains at least one negative entry, then \(\Xi (A)\) and \(\Gamma (A)\) can in fact be quite different. By way of example, consider the matrix (18) whose Pareto singular values are displayed in Table 1. Table 2 shows the spectral norm of each of the 21 submatrices of (18). For ease of visualization, we display these values in decreasing order. After removing repetitions, we get 18 different values. In this example, \(\Xi (A)\) and \(\Gamma (A)\) are not only different, but they also have different cardinalities.

Table 2 Spectral norm of each submatrix of the matrix (18)

For subsequent use, we introduce the following natural concept.

Definition 2

A matrix A is called spectral-norm scattered if the spectral norms of its submatrices are pairwise distinct.

Definition 2 makes sense even if A admits negative entries. By way of example, the matrix (18) is not spectral-norm scattered because the submatrices \( \left[ \begin{array}{ccc} 1&\quad 2 \end{array} \right] \) and \( \left[ \begin{array}{ccc} 2&\quad 1 \end{array} \right] \) have the same spectral norm. Random matrices following an absolutely continuous probability distribution are usually spectral-norm scattered. Topologically speaking,

$$\begin{aligned} \textrm{SNS}(m,n)=\{A\in \mathbb {M}_{m,n}: A \text{ is } \text{ spectral-norm } \text{ scattered }\} \end{aligned}$$

is an open set in \(\mathbb {M}_{m,n}\). The proof of this fact is incorporated in the proof of Proposition 4.

4 Counting Pareto singular values

In general, \(\Sigma (A)\) and \(\Xi (A)\) are non-comparable with respect to set inclusion, i.e., neither set is included in the other. The particular example

$$\begin{aligned} \Sigma \left( \left[ \begin{array}{ccc} 4 &{}\quad 1&{}\quad 3 \\ 8 &{}\quad 3&{}\quad -2 \\ \end{array} \right] \right)= & {} \{9.492982,\, 3.589331\},\\ \Xi \left( \left[ \begin{array}{ccc} 4 &{}\quad 1&{}\quad 3 \\ 8 &{}\quad 3&{}\quad -2 \\ \end{array} \right] \right)= & {} \{\,5.099020,\, 5.000000,\,4.123106,\,4.000000,\\{} & {} 2.854102, \,1.000000,\,-2.000000\} \end{aligned}$$

shows that classical and Pareto singular values may form disjoint sets. A matrix A of size \(m\times n\) has at most \(2\min \{m,n\}\) singular values. The factor 2 is because negative singular values are also counted. On the other hand, it is proven in [6, Theorem 4] that

$$\begin{aligned} \mathfrak {s}(m,n)\,= \max _{A\in \mathbb {M}_{m,n}} \textrm{card}[ \Xi (A)] \end{aligned}$$
(24)

grows exponentially in n and m separately. More precisely,

$$\begin{aligned} \mathfrak {s}(m,n)=(2^m-1)(2^n-1). \end{aligned}$$
(25)

The right-hand side of (25) corresponds to the cardinality of \({\mathcal {J}}(m,n)\) or, equivalently, to the number of submatrices of a matrix of size \(m\times n\). A matrix attaining the maximum in (24) is called PSV-saturated. The next proposition characterizes the set

$$\begin{aligned} \mathfrak {S}(m,n)=\{A\in \mathbb {M}_{m,n}: \textrm{card}[ \Xi (A)]= \mathfrak {s}(m,n)\} \end{aligned}$$

of PSV-saturated matrices of size \(m\times n\). A set in a vector space is said to be positively homogeneous if it is stable under multiplication by positive scalars.

Proposition 4

Let \(\max \{m,n\}\ge 2\). Then

$$\begin{aligned} \mathfrak {S}(m,n)= \{A\in \mathbb {M}_{m,n}: A \text{ is } \text{ positive } \text{ and } \text{ spectral-norm } \text{ scattered }\,\}. \end{aligned}$$
(26)

Furthermore, this set is open and positively homogeneous.

Proof

\(A\in \mathbb {M}_{m,n}\) is PSV-saturated if and only if the following two conditions hold true:

$$\begin{aligned}&\Xi _{I,J}(A)\not = \emptyset&\text{ for } \text{ all } (I,J)\in {\mathcal {J}}(m,n), \end{aligned}$$
(27)
$$\begin{aligned}&\Xi _{I_0,J_0}(A)\not = \Xi _{I_1,J_1}(A)&\text{ when } (I_0,J_0)\not = (I_1,J_1). \end{aligned}$$
(28)

This observation is due to (16) and the fact that each \(\Xi _{I,J}(A)\) is either empty or a singleton. Let \( A\in \mathfrak {S}(m,n)\). We claim that A is a nonnegative matrix. Suppose, to the contrary, that A has a negative entry. By Proposition 1 (b), we may consider the case \(a_{1, 1}<0\). By Proposition 1 (a) and the fact that \(\max \{m,n\}\ge 2\), we may suppose for instance that \(n\ge 2\). In such a case, the condition (27) is false for the choice \((I, J)= (\{1\},\{2\})\), because the slackness inequality (15) is violated for \(j=1\). We deduce that A is nonnegative. In such a case, \(\Xi _{\{i\},\{j\}}(A)= \{a_{i,j}\}\) for all \(i\in \langle m\rangle \) and \(j\in \langle n\rangle \). This implies that the entries of A are all distinct, otherwise (28) is violated. We now claim that A is positive. Suppose, to the contrary, that A has a zero entry. Such a zero entry is necessarily unique and, without loss of generality, we may take \(a_{1, 1}=0\). As before, we consider the case \(n\ge 2\). For the choice \((I,J)= (\{1\},\{1,2\})\), the system (12) becomes

$$\begin{aligned}{} & {} \left[ \begin{array}{cc} 0 &{} a_{1,2} \\ \end{array} \right] \left[ \begin{array}{c} \eta _1 \\ \eta _2 \\ \end{array} \right] =\sigma \xi _1,\;\, \left[ \begin{array}{c} 0 \\ a_{1,2} \\ \end{array} \right] \xi _1 = \sigma \left[ \begin{array}{c} \eta _1 \\ \eta _2 \\ \end{array} \right] . \end{aligned}$$

These equations being unsolvable for positive variables \(\xi _1\), \(\eta _1\), and \(\eta _2\), it follows that \(\Xi _{I,J}(A)= \emptyset \), contradicting (27). In conclusion, A is positive as claimed. In such a case, \(\Xi _{I,J}(A)=\left\{ \sigma _{\textrm{max}}(A^{I,J})\right\} \) for all \((I,J)\in {\mathcal {J}}(m,n)\) and condition (28) can be written as

$$\begin{aligned} (I_0,J_0)\not = (I_1,J_1) \;\; \Rightarrow \;\; \sigma _{\textrm{max}}\left( A^{I_0,J_0}\right) \not = \sigma _{\textrm{max}}\left( A^{I_1,J_1}\right) . \end{aligned}$$

This implication means that A is spectral-norm scattered. We have shown in this way that the first set in (26) is included in the second. The reverse inclusion is a consequence of Theorem 2. That \(\mathfrak {S}(m,n)\) is positively homogeneous follows from Proposition 1 (a). Finally, we prove that \(\mathfrak {S}(m,n)\) is open. The set of positive matrices of size \(m\times n\) is clearly open. For showing that \(\textrm{SNS}(m,n)\) is open, we proceed as follows. For each \((I,J)\in {\mathcal {J}}(m,n)\), let \(\pi _{I,J}:\mathbb {M}_{m,n}\rightarrow \mathbb {M}_{m,n}\) be the linear map given by

$$\begin{aligned}{}[\pi _{I,J}(A)]_{i,j}= {\left\{ \begin{array}{ll} \,a_{i,j}\quad \text{ if } i\in I, j\in J&{} \\ \,\;\;0 \;\,\quad \text{ otherwise }.&{}\end{array}\right. } \end{aligned}$$

One can check that \(\sigma _{\textrm{max}}( A^{I,J})= \sigma _{\textrm{max}} (\pi _{I,J}(A))\). Hence,

$$\begin{aligned} \textrm{SNS}(m,n)= & {} \bigcap _{\begin{array}{c} I_0,J_0, I_1,J_1\\ (I_0,J_0)\not = (I_1,J_1) \end{array}} \{A\in \mathbb {M}_{m,n}: (\sigma _{\textrm{max}}\circ \pi _{I_0,J_0}- \sigma _{\textrm{max}}\circ \pi _{I_1,J_1})(A)\not =0\} \end{aligned}$$

is expressible as intersection of finitely many open sets. \(\square \)

Constructing PSV-saturated matrices is not difficult. By way of example,

$$\begin{aligned} \left[ \begin{array}{cccc} 23 &{} 5&{} 9&{} 21\\ 7 &{} 13&{} 11&{} 31\\ 29 &{} 37&{} 19&{}27\\ \end{array} \right] \end{aligned}$$

belongs to \(\mathfrak {S}(3,4)\) because it is a matrix of size \(3\times 4\) with as many as \(105=(2^3-1)(2^4-1)\) Pareto singular values, as the sketch below confirms.
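Membership in \(\mathfrak {S}(3,4)\) can be confirmed with the characterization (26): one checks that the matrix is positive and that its 105 submatrices have pairwise distinct spectral norms. A sketch (ours):

```python
import itertools
import numpy as np

A = np.array([[23.0, 5.0, 9.0, 21.0],
              [7.0, 13.0, 11.0, 31.0],
              [29.0, 37.0, 19.0, 27.0]])

def nonempty_subsets(d):
    return itertools.chain.from_iterable(
        itertools.combinations(range(d), r) for r in range(1, d + 1))

norms = sorted(np.linalg.norm(A[np.ix_(I, J)], 2)
               for I in nonempty_subsets(3) for J in nonempty_subsets(4))
scattered = all(b - a > 1e-9 for a, b in zip(norms, norms[1:]))
print(len(norms), bool(np.all(A > 0) and scattered))  # 105 True
```

The next proposition considers the case of arbitrary dimensions m and n.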

Proposition 5

An example of a PSV-saturated matrix of size \(m\times n\) is the rank-one matrix

$$\begin{aligned} \left[ \begin{array}{cccc} \pi \\ \pi ^2 \\ \vdots \\ \pi ^m\\ \end{array} \right] \left[ \begin{array}{cccc} e^\pi \\ e^{2\pi } \\ \vdots \\ e^{n\pi }\\ \end{array} \right] ^\top =\left[ \begin{array}{cccc} \pi e^\pi &{} \pi e^{2\pi } &{} \ldots &{} \pi e^{n\pi } \\ \pi ^2 e^\pi &{} \pi ^2 e^{2\pi } &{} \ldots &{} \pi ^2 e^{n\pi } \\ \vdots &{} \vdots &{} &{} \vdots \\ \pi ^m e^\pi &{} \pi ^m e^{2\pi } &{} \ldots &{} \pi ^m e^{n\pi } \\ \end{array} \right] . \end{aligned}$$
(29)

Proof

We suppose that \(\max \{m,n\}\ge 2\), otherwise the result is trivial. Let \(p=(s,s^2,\ldots ,s^m)^\top \) and \(q=(t,t^2,\ldots ,t^n)^\top \), where s and t are real numbers to be chosen in a convenient way. Let \(A=pq^\top \). Pick distinct pairs \((I_0, J_0)\) and \( (I_1,J_1)\) in \({\mathcal {J}}(m,n)\) and consider the bivariate polynomial

$$\begin{aligned} \Phi (s,t)= & {} \left[ \sigma _{\textrm{max}}\left( A^{I_0,J_0}\right) \right] ^2 -\left[ \sigma _{\textrm{max}}\left( A^{I_1,J_1}\right) \right] ^2\\= & {} \left\| p_{I_0} \right\| ^2 \,\left\| q_{J_0} \right\| ^2 - \left\| p_{I_1} \right\| ^2 \,\left\| q_{J_1} \right\| ^2 \\= & {} \sum _{i\in I_0,\,j\in J_0}s^{2i}t^{2j}-\sum _{i\in I_1,\,j\in J_1}s^{2i}t^{2j}, \end{aligned}$$

which is not identically zero. Since all the coefficients of \(\Phi \) are integers, this polynomial does not vanish at a pair (s, t) of algebraically independent numbers. Hence, if we take for instance \((s,t)=(\pi ,e^{\pi })\), then we get a positive matrix A that is spectral-norm scattered. By Proposition 4, this proves that (29) belongs to \(\mathfrak {S}(m,n)\). \(\square \)

Remark 1

The above proof uses the fact that \(\pi \) and \(e^\pi \) are algebraically independent. We could have chosen any other pair of algebraically independent positive numbers. It is worth mentioning that all matrices in a small neighbourhood of (29) are still in \(\mathfrak {S}(m,n)\). Also, any positive multiple of (29) remains in \(\mathfrak {S}(m,n)\).

In a sense, the opposite of being PSV-saturated is to have a unique Pareto singular value.

Proposition 6

A negative matrix A has \(-\sigma _{\textrm{max}}(A)\) as its unique Pareto singular value.

Proof

We suppose that \(\max \{m,n\}\ge 2\), otherwise the proposition is trivial. Let A be negative. The slackness conditions (14)–(15) are violated, unless \((I,J)= (\langle m\rangle ,\langle n\rangle )\). For such a choice of the pair (I, J), the system (12)–(15) reduces to

$$\begin{aligned} Av=\sigma u,\; A^\top u =\sigma v, \;u>0,\;v>0. \end{aligned}$$

Since A is negative, the scalar \(\sigma \) is negative. Note that \(A^\top A\) is a positive square matrix. Perron’s theorem applied to the eigenvalue problem \(A^\top A v=\sigma ^2 v\) implies that \(\sigma ^2=\lambda _{\textrm{max}}(A^\top A)\) and, a posteriori, \(\sigma =-\sigma _{\textrm{max}}(A)\). \(\square \)

Table 3 Expected number of Pareto singular values in a Gaussian random matrix of size \(m\times n\)

In applications, A usually has both positive and negative entries. The presence of one or several negative entries reduces drastically the number of Pareto singular values. This is due to two reasons: firstly, for a number of pairs (I, J), the system (12)–(13) could be unsolvable. In other words, \(A^{I,J}\) may not have a singular value with positive singular vectors. And, secondly, even if (12)–(13) were solvable, it may happen that the slackness inequalities (14)–(15) are violated. Table 3 reports on a numerical experiment with randomly generated matrices of different dimensions. For each pair (m, n), we generated a sample \(\{A^{(1)},\ldots , A^{(N)}\}\) of size \(N=10^6\) from a Gaussian random matrix \(\textbf{A} \rightsquigarrow [{\mathcal {N}}(0,1)]^{m\times n}.\) This notation means that each entry of \(\textbf{A}\) follows a standard normal distribution. Table 3 displays the empirical mean \( (1/N) \sum _{k=1}^N\textrm{card}[\,\Xi (A^{(k)})\,] \) of the random variable \(\textrm{card}[\,\Xi (\textbf{A})]\). Each figure is rounded to 2 decimal places, but the second decimal place is not trustworthy (repeating the same experiment with another sample of size \(N=10^6\) may yield a slightly different value). Table 3 is symmetric with respect to n and m, so the portion below the diagonal is left blank. As one can see, the expected number of Pareto singular values in a random Gaussian matrix of size \(m\times n\) grows rather slowly with respect to m and n. This observation is consistent with the fact that a typical realization of \(\textbf{A}\) has a certain number of negative entries. By way of example, a Gaussian random matrix of size \(5\times 8\) has 9.11 Pareto singular values on average. This is far less than the number \(7905= (2^5-1)(2^8-1)\) of Pareto singular values in a PSV-saturated matrix of the same size.
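The experiment reported in Table 3 can be emulated with the brute-force routine pareto_singular_values sketched in Sect. 2.1 (an illustrative reimplementation; the sample size below is far smaller than the paper's \(N=10^6\)):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, N = 3, 4, 10_000   # the paper uses N = 10**6 and larger pairs (m, n)
counts = [len(pareto_singular_values(rng.standard_normal((m, n))))
          for _ in range(N)]
print(round(float(np.mean(counts)), 2))  # empirical mean of card[Xi(A)]
```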

5 Applications

5.1 Nash equilibria of bilinear forms on Pareto cones

As a first application of the concept of Pareto singular triplet of a matrix A, we consider the problem of finding a pair \(({\bar{u}}, {\bar{v}})\) such that

$$\begin{aligned}{} & {} \,{\bar{u}}\ge 0,\,\Vert {\bar{u}}\Vert =1,\, {\bar{v}}\ge 0,\,\Vert {\bar{v}}\Vert =1, \end{aligned}$$
(30)
$$\begin{aligned}{} & {} \langle {\bar{u}},A {\bar{v}}\rangle \le \langle u,A {\bar{v}}\rangle \;\;\text{ for } \text{ all } u\ge 0,\Vert u\Vert =1, \end{aligned}$$
(31)
$$\begin{aligned}{} & {} \langle {\bar{u}},A {\bar{v}}\rangle \le \langle {\bar{u}},Av\rangle \;\;\text{ for } \text{ all } v\ge 0,\Vert v\Vert =1. \end{aligned}$$
(32)

A pair \(( {\bar{u}}, {\bar{v}})\) as above is called a Nash equilibrium of (4) and the term \(\langle {\bar{u}},A {\bar{v}}\rangle \) is called a Nash value of (4). Satisfying (30)–(32) is stronger than being a critical pair of (4), but it is weaker than being a solution to (4). In other words, being a Nash equilibrium is a property that lies between optimality and criticality. The next result is less evident.

Proposition 7

Suppose that \(A\in \mathbb {M}_{m,n}\) has at least one negative entry. Let \(({\bar{u}}, {\bar{v}}, {\bar{\sigma }} )\) be a Pareto singular triplet of A with \({\bar{\sigma }} <0\). Then \(( {\bar{u}}, {\bar{v}})\) is a Nash equilibrium of (4).

Proof

We start by proving that the minimization problem

$$\begin{aligned} \mathop {\textrm{min}}\limits _{ \Vert u\Vert =1,\, u\ge 0}\;\langle u,A{\bar{v}}\rangle \end{aligned}$$
(33)

admits \({\bar{u}}\) as unique solution. Let \({\tilde{u}}\) be any solution to (33). We must show that \({\tilde{u}} ={\bar{u}}\). Since \(({\bar{u}}, {\bar{v}}, {\bar{\sigma }} )\) solves the system (5), we have in particular

$$\begin{aligned} (A{\bar{v}})_i={\bar{\sigma }} {\bar{u}}_i{} & {} \text{ for } \text{ all } i\in I, \end{aligned}$$
(34)
$$\begin{aligned} (A{\bar{v}})_i\ge 0{} & {} \text{ for } \text{ all } i\in \langle m\rangle \backslash I, \end{aligned}$$
(35)

where \(I=\{i\in \langle m\rangle : {\bar{u}}_i>0\}\). Since \({\tilde{u}}\) is a solution to (33), we have \(\Vert {\tilde{u}}\Vert =1\) and

$$\begin{aligned} 0 \le {\tilde{u}}\,\perp \, (A{\bar{v}}- {\tilde{\sigma }} {\tilde{u}}) \ge 0 \end{aligned}$$
(36)

for some real number \({\tilde{\sigma }}\) (viewed as a Lagrange multiplier associated to the equality constraint \(\Vert u\Vert ^2-1=0\)). By using (36), we get \({\tilde{\sigma }}=\langle {\tilde{u}},A{\bar{v}}\rangle \le \langle {\bar{u}},A{\bar{v}}\rangle ={\bar{\sigma }} <0\) and

$$\begin{aligned} (A{\bar{v}})_i={\tilde{\sigma }} {\tilde{u}}_i{} & {} \text{ for } \text{ all } i\in {\tilde{I}}, \end{aligned}$$
(37)
$$\begin{aligned} (A{\bar{v}})_i\ge 0{} & {} \text{ for } \text{ all } i\in \langle m\rangle \backslash {\tilde{I}}, \end{aligned}$$
(38)

where \({\tilde{I}}=\{i\in \langle m\rangle : {\tilde{u}}_i>0\}\). If \(i_0\in I\,\backslash {\tilde{I}}\), then the combination of (34) and (38) yields

$$\begin{aligned} (A{\bar{v}})_{i_0}={\bar{\sigma }} {\bar{u}}_{i_0}<0\le (A{\bar{v}})_{i_0}, \end{aligned}$$

which is a contradiction. Similarly, if \(i_0\in {\tilde{I}}\,\backslash I\), then the combination of (35) and (37) yields

$$\begin{aligned} (A{\bar{v}})_{i_0}={\tilde{\sigma }} {\tilde{u}}_{i_0}<0\le (A{\bar{v}})_{i_0}, \end{aligned}$$

which is again a contradiction. Hence, \(I={\tilde{I}}\). We get \({\bar{\sigma }} {\bar{u}}_i={\tilde{\sigma }} {\tilde{u}}_i\) for all \(i\in I\) and \({\bar{u}}_i= {\tilde{u}}_i=0 \) for all \(i \in \langle m\rangle \backslash I\). Thus, \({\bar{\sigma }} {\bar{u}}={\tilde{\sigma }} {\tilde{u}}.\) Since \({\bar{u}} \) and \({\tilde{u}}\) are unit vectors and \({\tilde{\sigma }}\le {\bar{\sigma }} <0\), we deduce that \({\tilde{\sigma }}={\bar{\sigma }}\) and \({\tilde{u}}= {\bar{u}}\). In a similar way, one can prove that the minimization problem

$$\begin{aligned} \mathop {\textrm{min}}\limits _{ \Vert v\Vert =1,\, v\ge 0}\;\langle {\bar{u}},A v\rangle \end{aligned}$$

admits \({\bar{v}}\) as unique solution. In particular, \(({\bar{u}}, {\bar{v}})\) is a Nash equilibrium of (4). \(\square \)

Corollary 2

Suppose that \(A\in \mathbb {M}_{m,n}\) has at least one negative entry. Then each negative Pareto singular value of A is a Nash value of (4).

5.2 Nonnegative matrix factorization

Let X be a data matrix of size \(m\times n\). For instance, the columns of X may correspond to n realizations of an m-dimensional random vector. Nonnegative matrix factorization (NMF) is a reduction strategy which consists in approximating X by the product of two low-rank nonnegative matrices. Given a small integer r, the issue at hand is that of finding a pair \((P,Q)\in \mathbb {M}_{m,r}\times \mathbb {M}_{n,r}\) that solves the minimization problem

$$\begin{aligned} \mathop {\textrm{min}}\limits _{ P\ge 0,\,Q\ge 0} \Vert X- PQ^\top \Vert , \end{aligned}$$
(39)

where \(\Vert \cdot \Vert \) is the Frobenius norm on \(\mathbb {M}_{m,n}\). NMF is of special relevance when X is a nonnegative matrix, but it also makes sense for an arbitrary data matrix. General information on (39) can be found in Gillis and Glineur  [1], Tolić et al.  [7, Section 2.1], and references therein. We briefly discuss the particular case \(r=1\), which corresponds to the rank-one NMF problem

$$\begin{aligned} \kappa (X)\,=\mathop {\textrm{min}}\limits _{p\ge 0,\,q\ge 0} \Vert X- pq^\top \Vert . \end{aligned}$$
(40)

We assume that X has at least one positive entry, otherwise \((p,q)=(0,0)\) is a solution to (40) and we are done. Note that p and q are not required to be unit vectors. The next theorem shows that solving (40) boils down to solving (4), and vice versa.

Theorem 3

Let X be a data matrix with at least one positive entry and let \(A=-X\). Then the optimal values of (4) and (40) are related by the formula

$$\begin{aligned} \left[ \kappa (X)\right] ^2+\left[ \varrho _{\textrm{min}}(A)\right] ^2= \Vert X\Vert ^2. \end{aligned}$$

Furthermore, the optimization problems (4) and (40) are equivalent in the following sense:

  1. (a)

    If \((u_0, v_0)\) is a solution to (4), then

    $$\begin{aligned} (p_0,q_0)=\left( \sqrt{-\varrho _{\textrm{min}}(A)} \,u_0,\sqrt{-\varrho _{\textrm{min}}(A)} \,v_0\right) \end{aligned}$$
    (41)

    is a solution to (40).

  2. (b)

    If \((p_1, q_1)\) is a solution to (40), then \((u_1,v_1)= \left( \Vert p_1\Vert ^{-1}\,p_1,\,\Vert q_1\Vert ^{-1}\,q_1\right) \) is a solution to (4).

Proof

Consider the problem (40). If we incorporate a nonnegative factor \(\sigma \) in front of \(pq^\top \), then we may assume that p and q are unit vectors. Hence, we write (40) in the equivalent form

$$\begin{aligned} \left[ \kappa (X)\right] ^2= & {} \mathop {\textrm{min}}\limits _{\begin{array}{c} \sigma \ge 0, u\ge 0,\,v\ge 0 \\ \Vert u\Vert =1,\,\Vert v\Vert =1 \end{array}} \Vert X- \sigma uv^\top \Vert ^2\nonumber \\= & {} \mathop {\textrm{min}}\limits _{\sigma \ge 0}\, \underbrace{\mathop {\textrm{min}}\limits _{\begin{array}{c} u\ge 0,\,v\ge 0 \\ \Vert u\Vert =1,\,\Vert v\Vert =1 \end{array}} \Vert X- \sigma uv^\top \Vert ^2}_{h(\sigma )}. \end{aligned}$$
(42)

The identity \(\Vert X- \sigma uv^\top \Vert ^2= \Vert X\Vert ^2 -2 \sigma \langle u,Xv\rangle + \sigma ^2\Vert u\Vert ^2\,\Vert v\Vert ^2\) yields

$$\begin{aligned} h(\sigma )= \Vert X\Vert ^2+ 2\sigma \left[ \mathop {\textrm{min}}\limits _{\begin{array}{c} \Vert u\Vert =1,\, \Vert v\Vert =1\\ u\ge 0, \, v\ge 0 \end{array}}\;\langle u,Av\rangle \right] + \sigma ^2, \end{aligned}$$

where the minimization problem between brackets is precisely (4). Since A has at least one negative entry, \(\varrho _{\textrm{min}}(A)\) is negative. Hence, \(\sigma _0= -\varrho _{\textrm{min}}(A)\) is the unique minimizer of h on \(\mathbb {R}_+\). We have shown that if \((u_0,v_0)\) is a solution to (4), then \((u_0,v_0,\sigma _0)\) is a solution to (42). In particular,

$$\begin{aligned} \kappa (X)= \Vert X- \sigma _0 u_0v_0^\top \Vert = \Vert X- (\sqrt{\sigma _0}\,u_0)(\sqrt{\sigma _0}\,v_0)^\top \Vert \end{aligned}$$

and the pair (41) is a solution to (40). Note also that

$$\begin{aligned} \left[ \kappa (X)\right] ^2\;=\; \Vert X- \sigma _0 u_0v_0^\top \Vert ^2\;=\;\Vert X\Vert ^2-\left[ \varrho _{\textrm{min}}(A)\right] ^2. \end{aligned}$$

Conversely, let \((p_1, q_1)\) be a solution to (40). We claim that \(p_1\not =0\) and \(q_1\not =0\). If the claim is false, then \(p_1q_1^\top =0\) and \( \Vert X-pq^\top \Vert \ge \Vert X\Vert \,\) for all \(p\ge 0\) and \( q\ge 0\). In particular,

$$\begin{aligned} \Vert X-tpq^\top \Vert \,= \, \Vert X-(\sqrt{t}p)(\sqrt{t}q)^\top \Vert \, \ge \, \Vert X\Vert \end{aligned}$$

for all \(t>0\). This leads to

$$\begin{aligned} -2\langle p,Xq\rangle =\lim _{t\rightarrow 0}\,(1/t)(\Vert X-tpq^\top \Vert ^2- \Vert X\Vert ^2) \;\ge 0 \end{aligned}$$

and contradicts the fact that X has a positive entry. Hence, the claim is true and \(u_1=\Vert p_1\Vert ^{-1} \,p_1\) and \(v_1=\Vert q_1\Vert ^{-1}\,q_1\) are well defined. If \(s_1=\Vert p_1\Vert \,\Vert q_1\Vert \), then \( {\bar{p}} =\sqrt{s_1}\, u_1\) and \({\bar{q}}=\sqrt{s_1}\,v_1\) are nonnegative vectors such that

$$\begin{aligned} {\bar{p}} \,{\bar{q}}^\top = (\sqrt{s_1}\,u_1)(\sqrt{s_1}\, v_1)^\top =\left( \sqrt{\Vert p_1\Vert \,\Vert q_1\Vert }\, \frac{p_1}{\Vert p_1\Vert } \right) \left( \sqrt{\Vert p_1\Vert \,\Vert q_1\Vert }\, \frac{q_1}{\Vert q_1\Vert } \right) ^\top = p_1 q_1^\top . \end{aligned}$$

Hence,

$$\begin{aligned} \kappa (X)= \Vert X- p_1 q_1^\top \Vert = \Vert X- {\bar{p}} \,{\bar{q}}^\top \Vert = \Vert X- s_1 u_1 v_1^\top \Vert \end{aligned}$$

and the triplet \((u_1, v_1, s_1)\) is a solution to (42). A posteriori, \((u_1, v_1)\) is a solution to (4). \(\square \)
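Theorem 3 lends itself to a direct numerical check: compute \(\varrho _{\textrm{min}}(A)\) with the brute-force routine of Sect. 2.1, approximate \(\kappa (X)\) with a solver, and compare. The sketch below is ours and assumes SciPy is available; since a local solver may miss the global minimum of (40), a few random restarts are used.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4))            # generic data; almost surely has a positive entry
rho_min = min(pareto_singular_values(-X))  # rho_min(A) for A = -X, by formula (6)

def loss(z):                               # squared objective of (40)
    return np.linalg.norm(X - np.outer(z[:3], z[3:])) ** 2

# Approximate kappa(X)^2 with a bound-constrained local solver; restarts
# reduce the risk of stopping at a non-global local minimum.
kappa_sq = min(minimize(loss, rng.random(7), bounds=[(0, None)] * 7).fun
               for _ in range(20))
print(kappa_sq + rho_min ** 2, np.linalg.norm(X) ** 2)  # the two should agree
```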

6 Algorithmic issues

An ambitious endeavor is to determine all the critical values of the minimization problem (4) and, for each critical value \(\sigma \), to display an associated critical pair (u, v). Such a goal is reasonable only if A is of moderate size. The only exhaustive method at our disposal is to apply Theorem 1: we must solve (12)–(15) for each one of the \((2^m-1)(2^n-1)\) choices of (I, J). In order to have a rough idea of the CPU time needed for determining \(\Xi (A)\) as a function of m and n, we have worked out the case of a test matrix \(A\in \mathbb {M}_{m,n}\) that contains positive and negative entries, namely,

$$\begin{aligned} A_{i,j}= (-1)^{ij} \cos \left( 10\,j/(i+j)\right) . \end{aligned}$$
(43)

Table 4 suggests that the cost of evaluating \(\Xi (A)\) grows exponentially in m and n, separately. The figures shown in Table 4 correspond to CPU times measured in seconds. Our numerical experiments were carried out on macOS Catalina with a 3.4 GHz Intel Core i5 processor and 8 GB of RAM. The codes are implemented in Matlab R2017a. If we deal with a nonnegative matrix A, then we can forget about the slackness inequalities and proceed directly to the computation of the spectral norm of each submatrix of A. As a test example of a nonnegative matrix \(A\in \mathbb {M}_{m,n}\), we worked out the case

$$\begin{aligned} A_{i,j}= 5 \arctan (j/(i+j)). \end{aligned}$$
(44)

As shown in Table 5, the CPU time needed for determining \(\Xi (A)\) still exhibits exponential growth with respect to n and m. Such CPU times are however significantly smaller than those displayed in Table 4. The fact that (44) is a nonnegative matrix is therefore useful information.

Table 4 CPU time for computing all the Pareto singular values of (43). Method based on Theorem 1
Table 5 CPU time for computing all the Pareto singular values of (44). Method based on Theorem 2

6.1 Alternating minimization

We come back to the case of a general matrix A. If the dimensions m and n are not moderate, then determining the entire set \(\Xi (A)\) is not a realistic goal. For instance, computing all the Pareto singular values of (43) when \((m,n)= (12,20)\) took us almost 59 hours. More reasonable goals are:

  1. (i)

To determine the smallest Pareto singular value \(\varrho _{\textrm{min}}(A)\) and, in tandem, to find a solution (u, v) to the minimization problem (4).

  2. (ii)

To find one or several Pareto singular values of A.

  3. (iii)

To find a negative Pareto singular value of A, knowing that A has at least one negative entry. By Proposition 7, such a Pareto singular value is a Nash value of (4).

We address the third goal. We saw that solving (4) is essentially the same task as solving (40). For the sake of convenience, we introduce the function \(f:\mathbb {R}^m\times \mathbb {R}^n\rightarrow \mathbb {R}\) given by

$$\begin{aligned} f(p,q)= & {} \Vert X- pq^\top \Vert ^2\\= & {} \Vert X\Vert ^2-2\langle p,Xq\rangle + \Vert p\Vert ^2\,\Vert q\Vert ^2. \end{aligned}$$

Note that if p and q are nonzero vectors, then \(f(p,\cdot \,)\) and \(f(\,\cdot , q)\) are strictly convex quadratic functions. The rank-one NMF problem (40) is about minimizing f(pq) with respect to \(p\ge 0\) and \(q\ge 0\) simultaneously. As in Sect. 5, we assume that X contains at least one positive entry, in which case both components of a solution \((p_*, q_*)\) are nonzero vectors. Any solution \((p_*, q_*)\) to (40) is a Nash equilibrium of (40) in the sense that \(p_*\) and \(q_*\) are nonnegative vectors such that

$$\begin{aligned} f(p_*,q_*)\le & {} f(p,q_*)\;\;\text{ for } \text{ all } p\ge 0, \end{aligned}$$
(45)
$$\begin{aligned} f(p_*,q_*)\le & {} f(p_*,q)\;\;\text{ for } \text{ all } q\ge 0. \end{aligned}$$
(46)

Since (45)–(46) is about minimizing f(pq) with respect to \(p\ge 0\) and \(q\ge 0\) separately, a natural way of handling (40) is to use the alternating minimization algorithm

$$\begin{aligned} {\left\{ \begin{array}{ll} \, q^{k+1}= \textrm{argmin}_{q\ge 0} \;f(p^k, q) ,&{} \\ \, p^{k+1}= \textrm{argmin}_{p\ge 0} \,f(p, q^{k+1}).&{}\end{array}\right. } \end{aligned}$$
(47)

Alternating minimization algorithms for solving unconstrained NMF problems have already been considered in the literature, cf. Gillis and Glineur  [1]. Given the special form of f and the nonnegativity constraints on p and q, we have

$$\begin{aligned} \textrm{argmin}_{q\ge 0} \,f(p, q)= & {} \frac{\textrm{Proj}\,(X^\top p,\mathbb {P}_n )}{\Vert p\Vert ^2}\;=\;\frac{ \max \{0, X^\top p\}}{\Vert p\Vert ^2}, \end{aligned}$$
(48)
$$\begin{aligned} \textrm{argmin}_{p\ge 0} \,f(p, q)= & {} \frac{\textrm{Proj}\,(Xq,\mathbb {P}_m )}{\Vert q\Vert ^2}\;\;=\; \frac{\max \{0, X q\}}{\Vert q\Vert ^2}, \end{aligned}$$
(49)

where \(\textrm{Proj}(w,K)\) denotes the metric projection of w onto K and the maximum of two vectors is taken componentwise. Hence, (47) can be written in the more explicit form

$$\begin{aligned} q^{k+1}_j= & {} \frac{\max \{0, (X^\top p^k)_j\}}{\Vert p^k\Vert ^2}\quad \;\text{ for } \text{ all }\;\;j\in \langle n\rangle , \\ p^{k+1}_i= & {} \frac{\max \{0, (X q^{k+1})_i\}}{ \Vert q^{k+1}\Vert ^{2}}\quad \text{ for } \text{ all }\;\;i\in \langle m\rangle . \end{aligned}$$
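In code, one sweep of (47) amounts to two projected and rescaled matrix–vector products. Below is a minimal NumPy sketch (ours; the experiments reported in this section were run in Matlab). The requirement on the initial point anticipates Lemma 3, and the stopping rule is the one used at the end of this section.

```python
import numpy as np

def alternating_minimization(X, p0, tol=1e-7, max_iter=100_000):
    """Iteration (47) for the rank-one NMF problem (40); illustrative sketch.

    The initial point must satisfy p0 >= 0 and max(X.T @ p0) > 0,
    which is the hypothesis of Lemma 3 below.
    """
    p, q = np.asarray(p0, dtype=float), None
    for _ in range(max_iter):
        q_new = np.maximum(0.0, X.T @ p) / (p @ p)            # update (48)
        p_new = np.maximum(0.0, X @ q_new) / (q_new @ q_new)  # update (49)
        if q is not None and max(np.linalg.norm(p_new - p),
                                 np.linalg.norm(q_new - q)) <= tol:
            return p_new, q_new
        p, q = p_new, q_new
    return p, q

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 30))
p0 = np.maximum(0.0, rng.standard_normal(20))  # generically max(X.T @ p0) > 0
p, q = alternating_minimization(X, p0)
sigma = -np.linalg.norm(p) * np.linalg.norm(q)  # under hypothesis (54), a negative
                                                # Pareto singular value of A = -X
```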

Recursion formulas of this kind are mentioned in [1]. We initialize the algorithm (47) at a convenient point \(p^0\in \mathbb {R}^m\). A fundamental criterion for choosing \(p^0\) is to ensure that the sequences \(\{p^k\}_{k\ge 1}\) and \(\{q^k\}_{k\ge 1}\) are well defined. This issue is addressed in the first part of Lemma 3. The second part of Lemma 3 is a useful boundedness result.

Lemma 3

Let X be a matrix of size \(m\times n\) with at least one positive entry. Let (47) be initialized at a point \(p^0\in {\mathbb {P}}_m\) such that \(-X^\top p^0\notin {\mathbb {P}}_n\). Then:

  1. (a)

    \(p^k\not =0\) and \(q^k \not =0\) for all \(\,k\ge 1\).

  2. (b)

    \(\Vert p^k\Vert \,\Vert q^k\Vert \) remains bounded as \(k\rightarrow \infty \).

Proof

Part (a). Since X has a positive entry, so does \(X^\top \), and the sets

$$\begin{aligned} {\mathcal {Q}}= & {} \{q\in {\mathbb {P}}_n: -X q\notin {\mathbb {P}}_m\},\\ {\mathcal {P}}= & {} \{p\in {\mathbb {P}}_m: -X^\top p\notin {\mathbb {P}}_n\} \end{aligned}$$

are nonempty. These sets are positively homogeneous and do not contain the origin. Note that the initial point \(p^0\) belongs to \({\mathcal {P}}\). We claim that \( p^k\in {\mathcal {P}}\) and \(q^k\in {\mathcal {Q}}\) for all \(k\ge 1\). To prove this claim, it suffices to show the following implications:

$$\begin{aligned} p\in {\mathcal {P}}\Rightarrow & {} \textrm{Proj}\,(X^\top p,\mathbb {P}_n )\in {\mathcal {Q}}, \end{aligned}$$
(50)
$$\begin{aligned} q\in {\mathcal {Q}}\Rightarrow & {} \textrm{Proj}\,(Xq,\mathbb {P}_m )\;\,\in {\mathcal {P}}. \end{aligned}$$
(51)

Let \(p\in {\mathcal {P}}\). Since \(z= \textrm{Proj}\,(X^\top p,\mathbb {P}_n )\) belongs to \(\mathbb {P}_n\), we just need to check that \(-Xz\notin {\mathbb {P}}_m\). We know that \(-X^\top p\notin {\mathbb {P}}_n\). Hence, \( r=\textrm{card}\{j\in \langle n \rangle :(X^\top p)_j>0\}\ge 1\). Permuting the columns of X if necessary, we may assume that \(X=[Y\,\vert \, W]\), where \(Y\in \mathbb {M}_{m, r}\) and \(W\in \mathbb {M}_{m, n-r}\) are such that

$$\begin{aligned} Y^\top p\in \textrm{int}\left( {\mathbb {P}}_{r}\right) ,\; W^\top p\le 0. \end{aligned}$$
(52)

Note that

$$\begin{aligned} z\,=\, \textrm{Proj}\,\left( \left[ \begin{array}{c} Y^\top p \\ W^\top p \end{array} \right] ,\mathbb {P}_n \right) = \left[ \begin{array}{c} Y^\top p \\ 0 \end{array} \right] . \end{aligned}$$

Suppose, to the contrary, that \(-Xz\in {\mathbb {P}}_m\). In such a case,

$$\begin{aligned} Y Y^\top p = \left[ \begin{array}{cc} Y&W \end{array} \right] \left[ \begin{array}{c} Y^\top p \\ 0 \end{array} \right] \,= \,Xz\,\in -{\mathbb {P}}_m. \end{aligned}$$
(53)

Since \(p\in {\mathbb {P}}_m\), condition (53) yields \(\left\| Y^\top p\right\| ^2= \langle p, Y Y^\top p\rangle \le 0.\) Hence, \(Y^\top p=0\), contradicting the first condition in (52). This completes the proof of (50). The proof of (51) is analogous. Part (b). Since the projection map onto \(\mathbb {P}_m \) is positively homogeneous, the combination of (49) and the second equation in (47) yields

$$\begin{aligned} \Vert q^{k}\Vert \,p^{k}= \textrm{Proj}\left( X\left( \frac{q^{k}}{\Vert q^{k}\Vert }\right) ,\mathbb {P}_m \right) . \end{aligned}$$

By passing to norms and using the fact that \(\Vert \textrm{Proj}(w,\mathbb {P}_m )\Vert \le \Vert w\Vert \) for all \(w\in \mathbb {R}^m,\) we get

$$\begin{aligned} \Vert p^k\Vert \,\Vert q^k\Vert \,=\, \left\| \textrm{Proj}\left( X\left( \frac{q^{k}}{\Vert q^{k}\Vert }\right) ,\mathbb {P}_m \right) \right\| \;\le \; \left\| X\left( \frac{q^{k}}{\Vert q^{k}\Vert }\right) \right\| \;\le \;\sigma _{\textrm{max}}(X), \end{aligned}$$

as desired. \(\square \)

If the matrix \(X=[x_1,\ldots , x_n]\) is given in terms of its column vectors, then the assumption made on the initial point \(p^0\in \mathbb {R}^m\) says that \(p^0\ge 0\) and \(\langle p^0, x_j\rangle >0\) for some \(j\in \langle n\rangle \). Finding such a point \(p^0\) offers no difficulty. The next theorem can be used to find a Nash equilibrium to (40) and, at the same time, to detect a negative Pareto singular value of \(A=-X\).

Theorem 4

Consider the same assumptions as in Lemma 3. Suppose that

$$\begin{aligned} {\left\{ \begin{array}{ll} \text{ the } \text{ limits } \,p_*= \lim _{k\rightarrow \infty } p^k\, \text{ and } \, q_*= \lim _{k\rightarrow \infty } q^k&{} \\ \text{ exist } \text{ and } \text{ are } \text{ nonzero } \text{ vectors }.\end{array}\right. } \end{aligned}$$
(54)

Then \((p_*, q_*)\) is a Nash equilibrium of (40). Furthermore,

$$\begin{aligned} (u_*, v_*, \sigma _*)= \left( \frac{p_*}{\Vert p_*\Vert }, \frac{q_*}{\Vert q_*\Vert }, -\Vert p_*\Vert \,\Vert q_*\Vert \right) \end{aligned}$$

is a Pareto singular triplet of \(A=-X\) and \((u_*, v_*)\) is a Nash equilibrium of (4).

Proof

Passing to limits in (47) yields

$$\begin{aligned} q_*= \frac{\textrm{Proj}\,(X^\top p_*,\mathbb {P}_n )}{\Vert p_*\Vert ^2},\;\; p_*=\frac{\textrm{Proj}\,(Xq_*,\mathbb {P}_m )}{\Vert q_*\Vert ^2}. \end{aligned}$$
(55)

The first equality in (55) asserts that \(q_*\) is a minimizer of \(f(p_*,\cdot \,)\) on \(\mathbb {P}_n\). Analogously, the second equality in (55) asserts that \(p_*\) is a minimizer of \(f(\,\cdot , q_*)\) on \(\mathbb {P}_m\). This proves that \((p_*, q_*)\) is a Nash equilibrium of (40). Next, we write (55) in the equivalent form

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\textrm{Proj}\left( X v_*,\mathbb {P}_m \right) \;\;= \,\varrho _*u_*,&{} \\ \,\textrm{Proj}\left( X^\top u_*,\mathbb {P}_n \right) =\,\varrho _*v_*,\end{array}\right. } \end{aligned}$$
(56)

where \(\varrho _*= \Vert p_*\Vert \,\Vert q_*\Vert \) is a positive real number. Except for the intervention of the projection maps, the system (56) looks similar to the unconstrained model (1). As pointed out in Quittner  [3], when \(\varrho \) is a positive real number and K is a closed convex cone, a “projected” equation of the type \(\,\textrm{Proj}\left( w,K\right) = \varrho \, y\,\) is equivalent to saying that \(y\in K\) and

$$\begin{aligned} \langle \varrho \, y -w, z-y\rangle \ge 0\quad \text{ for } \text{ all } z\in K. \end{aligned}$$

By using this observation, it is clear that (56) can be written as

$$\begin{aligned} {\left\{ \begin{array}{ll} \,0\le u_*\,\perp \,(\varrho _*u_*- Xv_*) \;\;\ge 0,&{} \\ \,0\le v_*\,\perp \, (\varrho _*v_*- X^\top u_*)\ge 0.&{}\end{array}\right. } \end{aligned}$$

The change \((A, \sigma _*)=(-X,-\varrho _*)\) shows that \((u_*, v_*, \sigma _*) \) is a Pareto singular triplet of A. That \((u_*, v_*)\) is a Nash equilibrium of (4) follows from Proposition 7 and the negativity of \(\sigma _*\). \(\square \)

Remark 2

Hypothesis (54) may seem strong at first sight, but numerical testing with random data suggests that this hypothesis is fulfilled in a vast majority of cases. We carried out the following experiment: for \((m,n)=(20,30)\), we generated \(10^6\) realizations from a Gaussian random matrix \(\textbf{X} \rightsquigarrow [{\mathcal {N}}(0,1)]^{m\times n}\) and the same number of realizations from a random vector \(\textbf{p}^{0}\in {\mathcal {P}}\). The hypothesis (54) turned out to be true for each one of \(10^6\) realizations of \((\textbf{X}, \textbf{p}^0\,)\).

The pair \((u_*, v_*)\) obtained in Theorem 4 is a Nash equilibrium of (4), but not necessarily a solution to (4). Consistently, the negative real number \(\sigma _*\) is a Nash value of (4), but not necessarily the optimal value of (4). Table 6 illustrates this fact with the help of a test example. We consider the matrix (43) and take \((m,n)=(3,8)\). The first column of Table 6 displays all the negative Pareto singular values of A. This information is unknown or hard to obtain when A is of large size, but in this example the dimensions m and n are small. The matrix \(X=-A\) contains at least one positive entry. We apply the algorithm (47) initialized at \(10^6\) random initial points \(p^0\in {\mathcal {P}}\) and count the number of times that each negative \(\sigma \) is detected. This operation yields the percentages displayed in the second column of Table 6. The last column displays the average number of iterations needed to detect a particular \(\sigma \), in case the algorithm converges to such a value. As stopping criterion for the algorithm (47), we consider the rule

$$\begin{aligned} \max \{\Vert p^{k+1}- p^{k}\Vert , \Vert q^{k+1}- q^{k}\Vert \}\le 10^{-7}, \end{aligned}$$

but other rules are also possible. If we decrease the tolerance level from \(10^{-7}\) to \(10^{-12}\), then the numbers of iterations in the last column of Table 6 almost double. The smallest Pareto singular value \(\varrho _{\textrm{min}}(A)= -1.440480\) was detected with \(31.24\%\) of the initial points, but more frequently we got convergence towards the Pareto singular value \(-1.170013 =- \Vert p_*\Vert \,\Vert q_*\Vert \), where

$$\begin{aligned} p_*= & {} (1.867104,\,1.242761,\,0)^\top ,\\ q_*= & {} \,(0.347799,\,0,\,0,\,0,\,0,\,0.158386,\,0,\,0.355069)^\top \end{aligned}$$

is just a Nash equilibrium of (40).

Table 6 Detecting negative Pareto singular values of (43) with algorithm (47). Case \((m, n)=(3,8)\)

The main interest of algorithm (47) is that m and n can be large. Table 7 deals again with the matrix (43), but now we take \((m,n)= (100, 120)\). Getting the entire list of negative Pareto singular values of a matrix of size \(100\times 120\) by means of Theorem 1 is cost prohibitive. So, we run the algorithm (47) with a large number of initial points and observe what happens. The first column of Table 7 displays the negative Pareto singular values of (43) detected with the help of \(10^6\) initial points. The value \(-29.801378\) is probably the smallest Pareto singular value of (43), but we do not have a formal proof of this fact.

Table 7 Detecting negative Pareto singular values of (43) with algorithm (47). Case \((m, n)=(100,120)\)